Anonymize a manuscript for double-blind peer review — without uploading the file

Before you upload to a journal submission system: a 7-minute workflow that strips the author name from the Word document, the document GUID and edit history from the PDF, and the camera/GPS metadata from every figure. Word and PDF both covered. Mac and Windows. The manuscript never leaves your computer.

Why this matters (the 34% number, and where it actually comes from)

Anonymization is a process across multiple layers — textual, metadata, institutional, figure-content, and venue. This article handles the metadata layer cleanly and gives you a manual checklist for the textual layer. The starting point is a peer-reviewed primary source that almost no other article on this topic cites correctly.

Katz, Proto, and Olmsted (2002) prospectively reviewed manuscripts submitted to two radiology journals with double-blinded peer-review policies:

"Of 880 prospectively evaluated manuscripts, 300 (34%) contained information that potentially or definitely revealed the author or institution. The editors then correctly identified the authors or institutions in 221 of the 300 (74% of the leaks; 25% of all submissions)."

Katz DS, Proto AV, Olmsted WW. Incidence and nature of unblinding by authors: our experience at two radiology journals with double-blinded peer review policies. AJR Am J Roentgenol. 2002 Dec;179(6):1415-21. PMID 12438428.

The leaks broke down into five recurring categories: author initials embedded in the manuscript text, references to the author's own work as 'in press', references to the author's previously-published papers (without third-person framing), institutional identifiers in figures (a hospital name on a CT scanner, a department logo on a chart), and institution names in the methods section.
Almost none of these leak categories are metadata. They are textual. But the metadata layer — Word's Author field, the PDF's document GUID and producer string, the EXIF data on every embedded figure — adds a sixth class that the Katz study did not measure (because in 2002 most manuscripts arrived on paper or as fresh-from-Word .doc files). Today, the metadata channel is the easiest to leak and the easiest to fix.
A 2017 study (Le Goues et al., 'Effectiveness of Anonymization in Double-Blind Review', arXiv:1709.01609) followed up at three software-engineering venues and found 70-86% of reviews contained no author guess and 74-90% had no correct guess — anonymization is imperfect but largely effective. A 2022 paper (arXiv:2211.07467) showed that neural networks can attribute authorship from text content alone with about 73% accuracy. Together: even with perfect metadata hygiene the textual channel exists. This article handles the metadata channel cleanly. The textual channel is a manual review step — there is a checklist at the bottom of the page.

Attribution note: The 34% figure is sometimes misattributed to Nature. The misattribution is widespread but incorrect. Nature publishes a separate double-anonymous peer-review checklist (cited below) about the recommended procedure; the failure-rate statistic is from Katz/Proto/Olmsted 2002.

The three layers of metadata identity (and which tool handles each)

Most publisher checklists treat 'file metadata' as a single step ('use Document Inspector'). It is not a single step. There are three distinct layers of metadata identity in a manuscript submission, and each is handled by a different tool. Naming them separately is the difference between a clean anonymization and the common failure where you run Document Inspector and the figures still carry EXIF.

1. Word document properties + tracked changes + comments

What it is: Author, Last Modified By, Manager, Company, Created date, Last Saved date, Revision Number, Last Printed date, document GUID, Title (often differs from the filename), Subject, Keywords, Hyperlink Base. Plus every tracked change still recorded inside word/document.xml even after you Accept All and turn off Track Changes. Plus every comment in word/comments.xml. Plus every co-author's name in word/people.xml.

Which tool handles it: Microsoft Word's Document Inspector. File → Info → Check for Issues → Inspect Document. Check 'Document Properties and Personal Information' and 'Comments, Revisions, and Versions'. Click Inspect, then Remove All on each section that returns results. This is Microsoft's tool — it is the right tool for the .docx layer, and FileHop does not replicate it.
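
To see what this layer looks like before and after Document Inspector runs, a read-only audit can help. The sketch below is an illustration, not part of Word or FileHop: the function name audit_docx and the list of fields it looks for are assumptions chosen to match the parts named above. It treats the .docx as the ZIP package it is and reports any author-like values it finds.

```python
import zipfile
import xml.etree.ElementTree as ET

# Package parts named in the text above; a given .docx may lack some of them.
PARTS = (
    "docProps/core.xml",
    "docProps/app.xml",
    "word/document.xml",
    "word/comments.xml",
    "word/people.xml",
)
# Element names that usually hold a person or organisation (assumed list, not exhaustive).
TEXT_FIELDS = {"creator", "lastModifiedBy", "Company", "Manager"}


def local(name: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def audit_docx(source) -> dict:
    """Return {part: [(field, value), ...]} for identity-bearing fields.

    `source` is a path or a binary file object. Nothing is modified.
    """
    findings = {}
    with zipfile.ZipFile(source) as zf:
        present = set(zf.namelist())
        for part in PARTS:
            if part not in present:
                continue
            hits = []
            for el in ET.fromstring(zf.read(part)).iter():
                tag = local(el.tag)
                text = (el.text or "").strip()
                if tag in TEXT_FIELDS and text:
                    hits.append((tag, text))
                # Comments, tracked changes, and people entries store the
                # name in an 'author' attribute.
                for attr, value in el.attrib.items():
                    if local(attr) == "author" and value:
                        hits.append((f"{tag}@author", value))
            if hits:
                findings[part] = hits
    return findings
```

Run it on the anonymized copy after Step 3 of Workflow A; an empty dictionary means none of the fields in this sketch were found, which is a useful signal but not a substitute for Document Inspector.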

2. PDF document properties + XMP metadata

What it is: The PDF Info dictionary (Author, Title, Subject, Keywords, Creator, Producer, CreationDate, ModDate; the Producer string often reveals which version of Word and which OS produced the file), the XMP metadata stream in the catalog (a duplicate of much of the same data plus an xmpMM:DocumentID and an xmpMM:InstanceID, the PDF equivalent of a GUID that follows the document through edits), and per-annotation /T author tags on any annotations that survived the .docx → PDF export.

Which tool handles it: FileHop PDF Compress with 'Remove metadata' enabled. Strips the entire Info dictionary from the trailer and removes XMP from the catalog. Runs locally on your computer; the manuscript never leaves your machine.
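
For a rough look at this layer without any PDF library, you can scan the raw bytes of the file. The sketch below is a heuristic under stated assumptions: scan_pdf_bytes is a hypothetical helper, it only sees Info keys and XMP packets stored uncompressed, and /Title can also match bookmark entries. A hit means something is there; a clean result is not proof of absence.

```python
import re

INFO_KEYS = (
    b"/Author", b"/Title", b"/Subject", b"/Keywords",
    b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate",
)


def scan_pdf_bytes(data: bytes) -> dict:
    """Report Info-dictionary keys and XMP markers visible in raw PDF bytes.

    Heuristic only: values inside compressed object streams are not seen,
    and /Title also appears in outline (bookmark) entries.
    """
    found = sorted(
        key.decode("ascii")
        for key in INFO_KEYS
        # A key followed by '(' or '<' is a key with a string value.
        if re.search(re.escape(key) + rb"\s*[(<]", data)
    )
    return {
        "info_keys": found,
        "has_xmp_packet": b"<x:xmpmeta" in data or b"<?xpacket" in data,
        "has_document_id": b"DocumentID" in data,
    }
```

A typical use is `scan_pdf_bytes(open("MyManuscript_anon.pdf", "rb").read())` before and after the strip, comparing the two results.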

3. Image EXIF on every embedded figure file

What it is: Camera make and model (a Canon EOS R6 in the EXIF tells the reviewer the lab uses Canon, and which lens), GPS coordinates (a photograph taken at 41.8902° N, 12.4922° E says 'Rome' even if the caption says 'European city'), software/user name (Photoshop CC 2024, user 'jsmith'), edit history, the original date the photograph was captured. Present on JPEG, TIFF, PNG (via tEXt/iTXt chunks), HEIC, and WebP.

Which tool handles it: FileHop image batch metadata removal — strips EXIF, XMP, and IPTC from JPEG (APP1/APP2/APP13 markers, JFIF/APP0 preserved); removes non-essential chunks from PNG (tEXt/iTXt/zTXt/eXIf/tIME); re-encodes WebP and HEIC/HEIF without metadata; processes TIFF. Drag a folder of figures in, get a clean folder out. Runs locally.
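
The JPEG markers mentioned above are easy to inspect yourself. The following is a minimal sketch, not FileHop's implementation: jpeg_metadata_segments and payload_label are hypothetical names, and the parser stops at the first start-of-scan marker, which is enough to list the APPn and comment segments where EXIF, XMP, and IPTC normally live.

```python
import struct


def payload_label(payload: bytes) -> str:
    """Identify a segment by the signature at the start of its payload."""
    if payload.startswith(b"JFIF\x00"):
        return "JFIF"
    if payload.startswith(b"Exif\x00\x00"):
        return "EXIF"
    if payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
        return "XMP"
    if payload.startswith(b"ICC_PROFILE\x00"):
        return "ICC profile"
    if payload.startswith(b"Photoshop 3.0\x00"):
        return "Photoshop/IPTC"
    return "other"


def jpeg_metadata_segments(data: bytes) -> list:
    """List (marker, label, length) for APPn and COM segments before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync; stop rather than guess
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        # The 2-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            segments.append((f"APP{marker - 0xE0}", payload_label(payload), length))
        elif marker == 0xFE:
            segments.append(("COM", "comment", length))
        pos += 2 + length
    return segments
```

On a cleaned figure you would expect at most the JFIF APP0 segment (and possibly an ICC profile) to remain; an EXIF, XMP, or Photoshop/IPTC entry means the file still carries metadata.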

Workflow A — you have a .docx manuscript (the Word path)

If the file you are sending started as a .docx, run this seven-step workflow. The first three steps happen inside Microsoft Word; steps 4-6 happen inside FileHop; step 7 is the manual textual review that no tool can do for you. Total time: about seven minutes.

Step 1: Work on a copy

File → Save As → MyManuscript_anon.docx. Do every following step on the anonymized copy. Keep the original — you need it for the camera-ready version after acceptance, and it preserves authorship for your records (important if your institution requires retention).

Step 2: Accept all tracked changes and delete all comments — in Microsoft Word

Review tab → Accept → Accept All Changes and Stop Tracking. Then Review tab → Delete → Delete All Comments in Document. Confirm Track Changes is OFF in the Review tab. This step alone does not remove the underlying revision history from the .docx; that requires the next step. What it does remove is the markup that reappears when a reader toggles the markup view, which the next step alone does not handle.

Step 3: Run Document Inspector (Microsoft's tool, in Word)

File → Info → Check for Issues → Inspect Document. Tick 'Document Properties and Personal Information', 'Comments, Revisions, and Versions', and 'Headers, Footers, and Watermarks'. Click Inspect. For each item that returns results, click Remove All. This removes the Author, Company, and revision history from docProps/core.xml and docProps/app.xml, the residual w:ins / w:del / w:commentRangeStart / w:commentRangeEnd markers from word/document.xml, the comments from word/comments.xml, and the co-author entries from word/people.xml. This is Microsoft's tool. FileHop does not do this step.

Step 4: Convert to PDF locally — in the FileHop desktop app

Open FileHop → Convert → DOCX to PDF. Drop the cleaned .docx in. The conversion runs on your machine; the file never uploads. The output is a PDF — but it still carries its own metadata layer (the PDF Info dictionary, XMP) from the conversion engine, so do not skip the next step. Note: 'Print to PDF' in Word produces similar output but the PDF Producer field then advertises the OS and Word version; running the convert step in FileHop gives a more neutral Producer string.

Step 5: Strip the PDF metadata — in FileHop

FileHop → Compress PDF → enable 'Remove metadata'. Run the operation. FileHop removes the Info dictionary (Author, Title, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream from the PDF catalog. The compress operation also re-encodes embedded images at the chosen quality, which incidentally removes any per-image EXIF that survived inside the PDF — if your figures are embedded inside the PDF rather than uploaded separately, this is the step that handles them. If your figures upload separately (most journals require them as separate files), run Step 6.

Step 6: Batch-strip EXIF from figure files — in FileHop

FileHop → Images → Remove Metadata. Drop the folder of figure files in. FileHop strips EXIF/XMP/IPTC from JPEG, removes non-essential chunks from PNG, re-encodes WebP and HEIC/HEIF without metadata, and processes TIFF. Output a clean folder. Upload these to the submission system, not the originals.
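
If you want to confirm that a PNG figure came out clean, the chunk structure is simple enough to walk by hand. This is a sketch under assumptions: png_metadata_chunks is a hypothetical helper, the chunk list mirrors the one named in the three-layers section (tEXt, iTXt, zTXt, eXIf, tIME), and CRCs are not verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"iTXt", b"zTXt"}
METADATA_CHUNKS = TEXT_CHUNKS | {b"eXIf", b"tIME"}


def png_metadata_chunks(data: bytes) -> list:
    """Return (chunk_type, keyword) for each metadata chunk found.

    The keyword is only set for text chunks. CRCs are skipped, not checked.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG (bad signature)")
    found = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if ctype in METADATA_CHUNKS:
            body = data[pos + 8:pos + 8 + length]
            keyword = ""
            if ctype in TEXT_CHUNKS:
                # Text chunks start with a Latin-1 keyword ended by a NUL byte.
                keyword = body.split(b"\x00", 1)[0].decode("latin-1")
            found.append((ctype.decode("ascii"), keyword))
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found
```

An empty list on every file in the cleaned folder is what you are looking for; a tEXt chunk with a keyword such as Author or Software is the PNG counterpart of a leftover EXIF field.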

Step 7: Manual self-reference review (no tool can do this for you)

This is the irreducible human step — the textual de-anonymization channel exists regardless of metadata hygiene. The 2022 'Cracking Double-Blind Review' paper (arXiv:2211.07467) demonstrated about 73% authorship attribution from text content alone, using citation patterns and writing style. Search the manuscript text for: your own initials (especially in the methods section — 'analyzed by JS' is a leak); any reference to your own work as 'in press' (rephrase to 'a related study'); any first-person reference to your prior publications ('we previously showed' → 'previous work by the authors showed'); any institutional identifiers in figure captions, scanner labels in radiographs, screen text in screenshots, or watermarks in supplementary material; any acknowledgements or grant numbers (move to the cover letter). No software can interpret these patterns for you.
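
A script cannot make the judgment call, but it can point you at the lines worth reading first. The sketch below is one possible pre-pass, with assumptions stated plainly: the pattern list and the name flag_self_references are illustrative, and the initials pattern will also flag harmless abbreviations such as 'by MRI'. Treat every hit as a prompt to look, not as a verdict.

```python
import re

# Illustrative patterns for the leak types listed above (assumed, not exhaustive).
PATTERNS = {
    "in-press self-reference": re.compile(r"\bin press\b", re.IGNORECASE),
    "first-person prior work": re.compile(
        r"\b(?:we|our)\s+(?:previously|prior|earlier|recent(?:ly)?)\b", re.IGNORECASE
    ),
    # Two or three capitals after 'by'/'and'; also matches abbreviations like MRI.
    "possible initials": re.compile(r"\b(?:by|and)\s+[A-Z]{2,3}\b"),
    "acknowledgements or grants": re.compile(
        r"\b(?:acknowledge?ments?|grant\s+(?:no\.?|number))", re.IGNORECASE
    ),
}


def flag_self_references(text: str) -> list:
    """Return (line_number, label, matched_text) for each suspicious phrase."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits
```

Paste the manuscript text into a plain-text file, run the function over it, and review each flagged line by hand.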

Workflow B — you already have a PDF (the PDF-only path)

If the file is already a PDF and you did not produce it from a .docx (e.g., a converted manuscript a collaborator sent you, a typeset preprint, a scan), the workflow is shorter: no Word phase, just the PDF metadata strip, the figure EXIF strip, and the manual textual review.

Step 1: Work on a copy

Duplicate the file. Operate on the copy. Keep the original.

Step 2: Strip the PDF metadata — in FileHop

FileHop → Compress PDF → enable 'Remove metadata'. Same as Workflow A Step 5: this removes the Info dictionary, the XMP catalog metadata, and, via the image re-encoding, any EXIF on figures embedded inside the PDF. If your figures are also uploaded as separate files, run Step 3 as well.

Step 3: Batch-strip EXIF from figure files — in FileHop

Same as Workflow A Step 6. Drop the figure folder into FileHop → Images → Remove Metadata. Upload the cleaned files, not the originals.

Step 4: Manual self-reference review

Same as Workflow A Step 7. The textual checklist applies regardless of the source-file format — citation patterns, in-press references, first-person 'we previously showed' constructions, institutional identifiers in figures, and acknowledgements all need a manual pass.

Verify in 60 seconds (works with any tool, not just FileHop)

A short verification routine catches the cases where the PDF metadata strip (Workflow A Step 5, Workflow B Step 2) didn't run, where the wrong file got uploaded, or where a figure copy slipped past the EXIF strip. The checks are tool-agnostic: they work with any anonymization workflow.

1 Open the final PDF in Preview on Mac (Tools → Show Inspector) or in your PDF reader's Document Properties dialog on Windows (Explorer's right-click → Properties → Details tab does not reliably show PDF fields). Confirm the Author, Title, and Subject fields are empty. The Producer field may still show a PDF library name (lopdf, PDFKit); that is fine, because it does not identify you.
2 In a terminal, run pdfinfo MyManuscript_anon.pdf (pdfinfo ships with Poppler: the poppler package in Homebrew on Mac, poppler-utils on Debian/Ubuntu). Confirm the Author, Title, Subject, Keywords, and Creator lines are empty or missing.
3 Open one of the separately-uploaded figure files. On Mac: File → Get Info → More Info → confirm no Camera, Lens, GPS, or Software fields. On Windows: right-click → Properties → Details → confirm Camera, GPS, and Author rows are empty.
4 Search the manuscript text (Cmd+F / Ctrl+F) for your last name, your first name, your initials, your institution's name, your institution's acronym, any grant number, and the word 'we'. The 'we' search is the most useful — it surfaces every place where you wrote in first person about your own prior work.
5 Open the PDF in a PDF reader, copy-paste a paragraph into a plain-text editor, and confirm the copied text matches what is on the page (no surprise residue from tracked changes that flattened into the page during a bad export).
6 If you have five more minutes, have a co-author or a labmate cold-read the manuscript and ask: 'who do you think wrote this?' If they can guess from the citation pattern or the writing tics, the textual channel is leaking and Step 7 of Workflow A needs another pass.
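
The name-and-institution search in check 4 is easy to repeat across revisions if you script it. Here is a minimal sketch, assuming you have the manuscript as plain text; count_identifiers is a hypothetical helper and the terms are whatever you supply (last name, institution acronym, grant number). It matches whole terms only, so 'Smith' does not fire on 'blacksmith'.

```python
import re


def count_identifiers(text: str, terms: list) -> dict:
    """Count case-insensitive whole-term occurrences of each identifier.

    Only terms that occur at least once appear in the result.
    """
    counts = {}
    for term in terms:
        # Lookarounds instead of \b so terms containing punctuation
        # (for example a grant number with a hyphen) still match as a unit.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        n = len(pattern.findall(text))
        if n:
            counts[term] = n
    return counts
```

An empty result for your own list of terms is the goal; any non-zero count tells you which term to search for in the manuscript itself.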

What this workflow does NOT do

• It does NOT remove the textual de-anonymization channel. Citation patterns, in-press self-references, first-person 'we previously showed' constructions, and writing-style attribution are textual — not metadata — and no tool can clean them automatically. Step 7 of Workflow A is the irreducible human step. The 2022 'Cracking Double-Blind Review' paper (arXiv:2211.07467) demonstrated about 73% authorship attribution from text content alone.
• It does NOT cover preprint-server de-anonymization. If you posted the manuscript to arXiv, bioRxiv, SSRN, OSF, or a personal/institutional repository before submission, reviewers may find it via Google during the review window. A 2020 arXiv paper (arXiv:2007.00177) showed statistically significant correlation between arXiv presence and acceptance at ICLR 2019/2020. Check your target venue's preprint policy.
• It does NOT certify journal/venue compliance. Each journal sets its own anonymization requirements (some want grant numbers retained in a separate file; some want clinical trial registration numbers removed; some have stricter EXIF policies). Read the target journal's author guidelines and use this workflow as the file-layer hygiene baseline. It also does not constitute IRB compliance — that is your institution's process.
• It does NOT do in-place .docx metadata scrubbing inside FileHop. The Word layer is Microsoft's Document Inspector territory; FileHop handles the convert-to-PDF + strip-PDF-metadata + EXIF-strip phases that come after.
• It runs on macOS and Windows only. Linux users — common in HPC and computational STEM — can substitute qpdf (for PDF metadata) and exiftool (for image EXIF) as a CLI workflow with equivalent results. See FAQ for the exact commands.

Why local processing matters for unpublished manuscripts

An unpublished manuscript is one of the most sensitive document categories in a researcher's working life: it carries unpublished data, unpublished interpretation, and the priority claim itself. Uploading it to a third-party metadata scrubber, an online PDF tool, or a 'free' AI-powered cleanup service hands a copy to a third-party server with terms of service you did not negotiate. That is why the headline says 'without uploading the file': keeping the manuscript on your machine is the point of this workflow.

• Every step in Workflows A and B runs on your computer. The DOCX-to-PDF conversion, the PDF metadata strip, and the image EXIF strip all happen in the FileHop desktop app. No file is uploaded. No file is sent to a server.
• No telemetry on file contents. We do not log what you scrubbed, what was in it, or which journal you are submitting to.
• No AI training on your files.
• Open output format. FileHop writes standard PDF — opens identically in the journal submission system, in Acrobat, in Preview, and in any PDF reader the editor uses.
• Honest scope on the .docx phase: Microsoft Word's Document Inspector is also a local tool — it does not phone home to Microsoft. The Word phase of this workflow is desktop-local end-to-end IF you run Word on your machine. Web-based Word (Office.com) is a different posture; the scrub still works but the file is on Microsoft's servers.
• Mac and Windows. Tested through FileHop v0.29 (2026).

Sources

Authoritative pages used to verify the workflow above. The Katz/Proto/Olmsted 2002 AJR paper is the load-bearing primary source for the 34% statistic. No endorsement implied.

Frequently asked questions

Does saving a Word document as PDF remove the metadata?

No. Saving a .docx as PDF preserves most of the document properties — the Author and Title fields carry through, and the PDF then gets its own Producer string and document GUID layered on top. You need both Word's Document Inspector on the .docx AND a PDF metadata strip on the resulting PDF. Workflow A covers both steps.

Is Microsoft Word's Document Inspector enough on its own?

For the .docx layer, yes — it removes the document properties, the residual tracked-changes markers, the comments, and the co-author metadata from the relevant XML files inside the .docx package. But it does not touch the PDF you produce afterwards, and it does not touch the EXIF on figure files. Document Inspector is one of three steps, not the whole job.

Does FileHop remove metadata from a .docx file directly?

No. FileHop handles the PDF half of the workflow: convert the cleaned .docx to PDF locally, strip the PDF Info dictionary and XMP metadata, and batch-strip EXIF from figure files. For the .docx itself, use Microsoft Word's own Document Inspector (File → Info → Check for Issues → Inspect Document). That is Microsoft's tool and the right one for the .docx layer.

What exactly does FileHop's PDF metadata strip remove?

The entire PDF Info dictionary in the trailer (Author, Title, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream referenced from the PDF catalog (a duplicate of much of that data plus the xmpMM:DocumentID and xmpMM:InstanceID identifiers). The strip runs as part of the Compress PDF operation when 'Remove metadata' is enabled. After the strip, the PDF Producer field may show a PDF library name (e.g., lopdf); that does not identify you.

Is my file uploaded to FileHop's servers?

No. FileHop is a desktop app for macOS and Windows; the PDF metadata strip, the image EXIF strip, and the .docx-to-PDF conversion all run on your computer. The manuscript file never leaves your machine. This is the advantage over online metadata scrubbers, which require uploading the (unpublished) manuscript to a third-party server.

Why do I have to remove EXIF from figures separately?

Because most journal submission systems require figures as separate files (TIFF, JPEG, PNG), not embedded inside the manuscript PDF. The PDF metadata strip cleans the manuscript file; the figures are their own files with their own metadata layer. Camera make/model, GPS coordinates, and the software/user name on a Photoshop edit can all appear in figure EXIF. FileHop's image batch metadata removal handles JPEG, PNG, WebP, HEIC/HEIF, and TIFF in one pass.

Does this workflow guarantee that my manuscript is anonymized?

No tool can guarantee that. Anonymization is a process across multiple layers: textual (citation patterns, in-press references, first-person self-references), metadata (document properties, PDF Info dictionary, EXIF), institutional (figure labels, scanner labels, screenshot text), and venue (preprint-server timing). This workflow handles the metadata layer cleanly and gives you a manual checklist for the textual layer. The interpretive 'is this paragraph identifying?' decision is yours.

Where did the 34% figure come from?

Katz DS, Proto AV, Olmsted WW. 'Incidence and nature of unblinding by authors: our experience at two radiology journals with double-blinded peer review policies.' AJR Am J Roentgenol. 2002 Dec;179(6):1415-21 (PMID 12438428). Of 880 prospectively reviewed manuscripts, 300 (34%) contained unblinding information. The figure is sometimes misattributed to Nature; the misattribution is widespread but incorrect. The Nature checklist is a separate document about the recommended procedure.

What about preprint servers: should I delay posting to arXiv?

Check the target journal's preprint policy. Some venues consider preprint-server presence a violation of double-blind protocols; others explicitly allow it. A 2020 arXiv study (2007.00177) found statistically significant correlation between arXiv presence and acceptance at ICLR 2019/2020, a pattern consistent with informal de-anonymization through the preprint. If your venue prohibits preprint posting during review, delay; if it allows, the timing is your call.

Does FileHop work on Linux?

No. FileHop is macOS and Windows only. Linux researchers can substitute qpdf + exiftool as a CLI workflow: 'qpdf --empty --pages input.pdf 1-z -- output.pdf && qpdf --remove-info output.pdf cleaned.pdf' for the PDF metadata strip (--remove-info is only available in recent qpdf releases), and 'exiftool -all= -overwrite_original *.jpg *.png *.tif *.tiff' for the figure EXIF strip. Both tools run locally and are well maintained. Run them on copies, as in Step 1: -overwrite_original replaces the files in place.

Will this break my reviewer's ability to load the file?

No. Removing metadata does not affect page content, fonts, bookmarks, internal links, or the text layer. The one visible change can come from the Compress PDF step itself, which re-encodes embedded images at the quality you choose, so pick a setting that keeps your figures legible. Otherwise the PDF opens and renders the same in Preview, Acrobat, the journal submission system viewer, and any other PDF reader. Only the hidden author/document-properties fields are emptied.

How do I cite my own previous work without revealing identity?

In the third person, by reference number rather than by name. Replace 'we previously showed (Smith et al. 2023)' with 'previous work demonstrated [12]' where reference 12 is your Smith et al. 2023 paper. Replace 'we are currently extending this analysis (Smith et al., in press)' with 'this approach has been described in a related study [13]'. Replace 'our prior method' with 'the prior method described in [14]'. Nature's checklist, IOP's checklist, and Scholastica's documentation all converge on the third-person reference-numbered form.

Download FileHop

Mac and Windows. The PDF metadata strip, the image EXIF strip, and the .docx-to-PDF conversion all run on your computer; the manuscript never uploads. The compress + remove-metadata step is in the PDF section; the image batch metadata removal is in the Images section. Free to install.
