Anonymize a manuscript for double-blind peer review — without uploading the file
Before you upload to a journal submission system: a 7-minute workflow that strips the author name from the Word document, the document GUID and edit history from the PDF, and the camera/GPS metadata from every figure. Word and PDF both covered. Mac and Windows. The manuscript never leaves your computer.
Why this matters (the 34% number, and where it actually comes from)
Anonymization is a process across multiple layers — textual, metadata, institutional, figure-content, and venue. This article handles the metadata layer cleanly and gives you a manual checklist for the textual layer. The starting point is a peer-reviewed primary source that almost no other article on this topic cites correctly.
Katz, Proto, and Olmsted (2002) prospectively reviewed manuscripts submitted to two radiology journals with double-blinded peer-review policies:
"Of 880 prospectively evaluated manuscripts, 300 (34%) contained information that potentially or definitely revealed the author or institution. The editors then correctly identified the authors or institutions in 221 of the 300 (74% of the leaks; 25% of all submissions)."
- The leaks broke down into five recurring categories: author initials embedded in the manuscript text, references to the author's own work as 'in press', references to the author's previously-published papers (without third-person framing), institutional identifiers in figures (a hospital name on a CT scanner, a department logo on a chart), and institution names in the methods section.
- Almost none of these leak categories are metadata. They are textual. But the metadata layer — Word's Author field, the PDF's document GUID and producer string, the EXIF data on every embedded figure — adds a sixth class that the Katz study did not measure (because in 2002 most manuscripts arrived on paper or as fresh-from-Word .doc files). Today, the metadata channel is the easiest to leak and the easiest to fix.
- A 2017 study (Le Goues et al., 'Effectiveness of Anonymization in Double-Blind Review', arXiv:1709.01609) followed up at three software-engineering and programming-language conferences and found that 70-86% of reviews contained no author guess and 74-90% contained no correct guess: anonymization is imperfect but largely effective. A 2022 paper (arXiv:2211.07467) showed that neural networks can attribute authorship from text content alone with up to 73% accuracy. Together, these results mean the textual channel exists even with perfect metadata hygiene. This article handles the metadata channel; the textual channel is a manual review step, covered by the checklist in Step 7 of Workflow A.
Attribution note: The 34% figure is often misattributed to Nature. Nature publishes a separate double-anonymous peer-review checklist (cited below) that covers the recommended procedure; the failure-rate statistic comes from Katz, Proto, and Olmsted (2002).
The three layers of metadata identity (and which tool handles each)
Most publisher checklists treat 'file metadata' as a single step ('use Document Inspector'). It is not a single step. There are three distinct layers of metadata identity in a manuscript submission, and each is handled by a different tool. Naming them separately is the difference between a clean anonymization and the common failure of running Document Inspector and finding that the figures still carry EXIF.
1. Word document properties + tracked changes + comments
What it is: Author, Last Modified By, Manager, Company, Created date, Last Saved date, Revision Number, Last Printed date, document GUID, Title (often differs from the filename), Subject, Keywords, Hyperlink Base. Plus every tracked change still recorded inside word/document.xml even after you Accept All and turn off Track Changes. Plus every comment in word/comments.xml. Plus every co-author's name in word/people.xml.
Which tool handles it: Microsoft Word's Document Inspector. File → Info → Check for Issues → Inspect Document. Check 'Document Properties and Personal Information' and 'Comments, Revisions, and Versions'. Click Inspect, then Remove All on each section that returns results. This is Microsoft's tool — it is the right tool for the .docx layer, and FileHop does not replicate it.
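To confirm that Document Inspector left nothing behind, the package parts listed above can be checked directly, because a .docx is a ZIP archive. The following is a minimal sketch using only the Python standard library. The function name `docx_identity_report` is hypothetical, and the regular-expression matching is an assumption that holds for typical Word output rather than a full XML parse.

```python
import re
import zipfile

# Package parts that hold comments and co-author names (from the list above).
IDENTITY_PARTS = ("word/comments.xml", "word/people.xml")
# Core-property elements that store the Author and Last Modified By values.
CORE_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def docx_identity_report(path_or_file):
    """Summarize identity-bearing content still present in a .docx package."""
    report = {"core": {}, "parts": [], "tracked_changes": 0}
    with zipfile.ZipFile(path_or_file) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = package.read("docProps/core.xml").decode("utf-8", "replace")
            for field in CORE_FIELDS:
                match = re.search(rf"<{field}[^>]*>([^<]*)</{field}>", core)
                if match and match.group(1).strip():
                    report["core"][field] = match.group(1).strip()
        report["parts"] = [part for part in IDENTITY_PARTS if part in names]
        if "word/document.xml" in names:
            body = package.read("word/document.xml").decode("utf-8", "replace")
            # w:ins and w:del elements are tracked insertions and deletions.
            report["tracked_changes"] = len(re.findall(r"<w:(?:ins|del)[ />]", body))
    return report
```

Calling `docx_identity_report("MyManuscript_anon.docx")` on a cleaned copy should return empty `core` and `parts` entries and a `tracked_changes` count of zero. Anything else suggests the Inspector pass is worth repeating.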
2. PDF document properties + XMP metadata
What it is: The PDF Info dictionary (Author, Title, Subject, Keywords, Creator, Producer, CreationDate, ModDate; the Producer string often reveals which version of Word and which OS produced the file), the XMP metadata stream in the catalog (a duplicate of much of the same data plus an xmpMM:DocumentID and an xmpMM:InstanceID, the PDF equivalent of a GUID that follows the document through edits), and per-annotation /T author tags on any annotations that survived the .docx → PDF export.
Which tool handles it: FileHop PDF Compress with 'Remove metadata' enabled. Strips the entire Info dictionary from the trailer and removes XMP from the catalog. Runs locally on your computer; the manuscript never leaves your machine.
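A quick way to see whether any of these fields remain is to scan the raw bytes of the PDF for the key names. The sketch below is a rough check under stated assumptions, not a PDF parser. The function name `scan_pdf_bytes` is hypothetical. It only sees objects stored uncompressed, so an Info dictionary packed into a compressed object stream would be missed. It can also over-report, because bookmarks use /Title as well.

```python
import re

# Info-dictionary keys named above; a value starts with "(" or "<".
INFO_KEYS = (b"/Author", b"/Title", b"/Subject", b"/Keywords",
             b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate")
# Markers of an XMP packet and its document identifiers.
XMP_MARKERS = (b"<x:xmpmeta", b"xmpMM:DocumentID", b"xmpMM:InstanceID")


def scan_pdf_bytes(data):
    """Return the metadata markers visible in the raw bytes of a PDF."""
    found = [key.decode() for key in INFO_KEYS
             if re.search(re.escape(key) + rb"\s*[(<]", data)]
    found += [marker.decode() for marker in XMP_MARKERS if marker in data]
    # A trailer that still points at an Info dictionary reads "/Info 12 0 R".
    if re.search(rb"/Info\s+\d+\s+\d+\s+R", data):
        found.append("/Info reference")
    return found
```

For a file on disk, `scan_pdf_bytes(open("MyManuscript_anon.pdf", "rb").read())` returning an empty list is a good sign but not proof. A proper PDF inspector remains the stronger check.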
3. Image EXIF on every embedded figure file
What it is: Camera make and model (a Canon EOS R6 in the EXIF tells the reviewer the lab uses Canon, and which lens), GPS coordinates (a photograph taken at 41.8902° N, 12.4922° E says 'Rome' even if the caption says 'European city'), software/user name (Photoshop CC 2024, user 'jsmith'), edit history, the original date the photograph was captured. Present on JPEG, TIFF, PNG (via tEXt/iTXt chunks), HEIC, and WebP.
Which tool handles it: FileHop image batch metadata removal — strips EXIF, XMP, and IPTC from JPEG (APP1/APP2/APP13 markers, JFIF/APP0 preserved); removes non-essential chunks from PNG (tEXt/iTXt/zTXt/eXIf/tIME); re-encodes WebP and HEIC/HEIF without metadata; processes TIFF. Drag a folder of figures in, get a clean folder out. Runs locally.
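To see which of the JPEG markers mentioned above a figure actually carries, the segment headers can be walked by hand. This is a minimal sketch that assumes a well-formed baseline file with no fill bytes between segments. The function name `jpeg_metadata_segments` is hypothetical, and the code only reports what it finds; it does not modify the file.

```python
import struct


def jpeg_metadata_segments(data):
    """List metadata-bearing APPn segments found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found, pos = [], 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("APP1 Exif")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("APP1 XMP")
        elif marker == 0xED:
            found.append("APP13 Photoshop/IPTC")
        pos += 2 + length
    return found
```

A cleaned figure should produce an empty list. A JFIF/APP0 segment is deliberately not reported, since it carries no identifying information.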
Workflow A — you have a .docx manuscript (the Word path)
If the file you are sending started as a .docx, run this seven-step workflow. The first three steps happen inside Microsoft Word; steps 4-6 happen inside FileHop; step 7 is the manual textual review that no tool can do for you. Total time: about seven minutes.
Step 1: Work on a copy
File → Save As → MyManuscript_anon.docx. Do every following step on the anonymized copy. Keep the original — you need it for the camera-ready version after acceptance, and it preserves authorship for your records (important if your institution requires retention).
Step 2: Accept all tracked changes and delete all comments — in Microsoft Word
Review tab → Accept → Accept All Changes and Stop Tracking. Then Review tab → Delete → Delete All Comments in Document. Confirm Track Changes is OFF in the Review tab. This step alone does not remove the underlying revision history from the .docx; that requires the next step. It does, however, clear the markup that would otherwise reappear when change display is toggled on, which the next step alone does not.
Step 3: Run Document Inspector (Microsoft's tool, in Word)
File → Info → Check for Issues → Inspect Document. Tick 'Document Properties and Personal Information', 'Comments, Revisions, and Versions', and 'Headers, Footers, and Watermarks'. Click Inspect. For each item that returns results, click Remove All. This removes the Author, Company, and revision history from docProps/core.xml and docProps/app.xml, the residual w:ins / w:del / w:commentRangeStart / w:commentRangeEnd markers from word/document.xml, the comments from word/comments.xml, and the co-author entries from word/people.xml. This is Microsoft's tool. FileHop does not do this step.
Step 4: Convert to PDF locally — in the FileHop desktop app
Open FileHop → Convert → DOCX to PDF. Drop the cleaned .docx in. The conversion runs on your machine; the file never uploads. The output is a PDF — but it still carries its own metadata layer (the PDF Info dictionary, XMP) from the conversion engine, so do not skip the next step. Note: 'Print to PDF' in Word produces similar output but the PDF Producer field then advertises the OS and Word version; running the convert step in FileHop gives a more neutral Producer string.
Step 5: Strip the PDF metadata — in FileHop
FileHop → Compress PDF → enable 'Remove metadata'. Run the operation. FileHop removes the Info dictionary (Author, Title, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the XMP metadata stream from the PDF catalog. The compress operation also re-encodes embedded images at the chosen quality, which incidentally removes any per-image EXIF that survived inside the PDF — if your figures are embedded inside the PDF rather than uploaded separately, this is the step that handles them. If your figures upload separately (most journals require them as separate files), run Step 6.
Step 6: Batch-strip EXIF from figure files — in FileHop
FileHop → Images → Remove Metadata. Drop the folder of figure files in. FileHop strips EXIF/XMP/IPTC from JPEG, removes non-essential chunks from PNG, re-encodes WebP and HEIC/HEIF without metadata, and processes TIFF. Output a clean folder. Upload these to the submission system, not the originals.
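For readers who want to understand what "removes non-essential chunks from PNG" means in practice, the idea can be sketched in a few lines. This is an illustration of the general technique and not FileHop's implementation. The function name `strip_png_chunks` and the `DROP` set are assumptions, taken from the chunk types named earlier in this article.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that can carry text, EXIF, or timestamps.
DROP = {b"tEXt", b"iTXt", b"zTXt", b"eXIf", b"tIME"}


def strip_png_chunks(data):
    """Return (cleaned_bytes, removed_chunk_names) for a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG (bad signature)")
    kept, removed, pos = [PNG_SIGNATURE], [], len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk layout: 4-byte length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if chunk_type in DROP:
            removed.append(chunk_type.decode("ascii"))
        else:
            kept.append(data[pos:end])  # copied verbatim, so the CRC stays valid
        pos = end
    return b"".join(kept), removed
```

Because each kept chunk is copied byte for byte, the pixel data is untouched and no CRC needs recomputing. Always write the result to a new file and keep the original.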
The irreducible human step
Step 7: Manual self-reference review (no tool can do this for you)
This is the irreducible human step: the textual de-anonymization channel exists regardless of metadata hygiene. The 2022 'Cracking Double-Blind Review' paper (arXiv:2211.07467) demonstrated up to 73% authorship attribution from text content alone, using citation patterns and writing style. Search the manuscript text for:
- your own initials, especially in the methods section ('analyzed by JS' is a leak);
- any reference to your own work as 'in press' (rephrase to 'a related study');
- any first-person reference to your prior publications ('we previously showed' → 'previous work by the authors showed');
- any institutional identifiers in figure captions, scanner labels in radiographs, screen text in screenshots, or watermarks in supplementary material;
- any acknowledgements or grant numbers (move these to the cover letter).
No software can interpret these patterns for you.
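A script cannot judge whether a sentence reveals who you are, but it can point you at the lines worth rereading. The sketch below flags candidate lines for the patterns in this step. The pattern list and the function name `find_self_references` are illustrative assumptions. A clean result does not mean the text is anonymous.

```python
import re

# Candidate patterns only; each hit still needs a human decision.
PATTERNS = {
    "in-press self-reference": r"\bin press\b",
    "first-person prior work": r"\b(?:we|our)\s+(?:previously|recently|earlier)\b",
    "acknowledgements or funding": r"\backnowledge?ments?\b|\bgrant\s+(?:no\.?|number)\b",
}


def find_self_references(text, initials=()):
    """Return (line_number, label) pairs for lines that deserve a manual look."""
    patterns = dict(PATTERNS)
    for item in initials:
        patterns[f"initials {item}"] = rf"\b{re.escape(item)}\b"
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in patterns.items():
            # Initials are matched case-sensitively to avoid flooding the output.
            flags = 0 if label.startswith("initials") else re.IGNORECASE
            if re.search(pattern, line, flags):
                hits.append((lineno, label))
    return hits
```

Run it on the plain text of the manuscript, for example `find_self_references(text, initials=["JS"])`, and then read each flagged line in context before deciding how to rephrase it.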
Workflow B — you already have a PDF (the PDF-only path)
If the file is already a PDF and you did not produce it from a .docx (e.g., a converted manuscript a collaborator sent you, a typeset preprint, a scan), the workflow is shorter: no Word phase, just the PDF metadata strip, the figure EXIF strip, and the manual textual review.
Step 1: Work on a copy
Duplicate the file. Operate on the copy. Keep the original.
Step 2: Strip the PDF metadata — in FileHop
FileHop → Compress PDF → enable 'Remove metadata'. Same as Workflow A Step 5: this removes the Info dictionary, the XMP catalog metadata, and, via the image re-encoding, any EXIF on figures embedded inside the PDF. If your figures are also uploaded as separate files, run Step 3 as well.
Step 3: Batch-strip EXIF from figure files — in FileHop
Same as Workflow A Step 6. Drop the figure folder into FileHop → Images → Remove Metadata. Upload the cleaned files, not the originals.
The irreducible human step
Step 4: Manual self-reference review
Same as Workflow A Step 7. The textual checklist applies regardless of the source-file format — citation patterns, in-press references, first-person 'we previously showed' constructions, institutional identifiers in figures, and acknowledgements all need a manual pass.
Verify in 60 seconds (works with any tool, not just FileHop)
A short verification routine catches the cases where Step 5 didn't run, where the wrong file got uploaded, or where a figure copy slipped past the EXIF strip. The checks are tool-agnostic — they work with any anonymization workflow.
1. Open the final PDF in Preview (Mac: Tools → Show Inspector) or in your PDF reader's Document Properties dialog (Windows). Confirm the Author, Title, and Subject fields are empty. The Producer field may still show a PDF library name (lopdf, PDFKit); that is fine, because it does not identify you.
2. In a terminal, run pdfinfo MyManuscript_anon.pdf (part of poppler on macOS via Homebrew, poppler-utils on most Linux distributions). Confirm the Author, Title, Subject, Keywords, and Creator lines are empty or missing. Without a terminal, the Document Properties view of any PDF reader shows the same fields.
3. Open one of the separately-uploaded figure files. On Mac: File → Get Info → More Info → confirm no Camera, Lens, GPS, or Software fields. On Windows: right-click → Properties → Details → confirm Camera, GPS, and Author rows are empty.
4. Search the manuscript text (Cmd+F / Ctrl+F) for your last name, your first name, your initials, your institution's name, your institution's acronym, any grant number, and the word 'we'. The 'we' search is the most useful: it surfaces every place where you wrote in first person about your own prior work.
5. Open the PDF in a PDF reader, copy-paste a paragraph into a plain-text editor, and confirm the copied text matches what is on the page (no surprise residue from tracked changes that flattened into the page during a bad export).
6. Have a co-author or a labmate cold-read the manuscript for 5 minutes and ask: 'who do you think wrote this?' If they can guess from the citation pattern or the writing tics, the textual channel is leaking and Step 7 of Workflow A needs another pass.
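The pdfinfo check in item 2 is easy to script if you submit often. This is a minimal sketch that assumes the `pdfinfo` command from poppler is on your PATH. The function names `leftover_fields` and `check_pdf` are hypothetical, and the parsing relies on pdfinfo's plain "Key: value" output lines.

```python
import subprocess

# Fields that should be empty or absent in an anonymized PDF.
CHECKED = ("Title", "Author", "Subject", "Keywords", "Creator")


def leftover_fields(pdfinfo_output):
    """Return the checked fields that still hold a value in pdfinfo output."""
    leftovers = {}
    for line in pdfinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in CHECKED and value.strip():
            leftovers[key.strip()] = value.strip()
    return leftovers


def check_pdf(path):
    """Run pdfinfo on a file and report any identifying fields left in it."""
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, check=True)
    return leftover_fields(result.stdout)
```

An empty dictionary from `check_pdf("MyManuscript_anon.pdf")` means none of the checked fields has a value. The Producer line is ignored on purpose, for the reason given in item 1.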
What this workflow does NOT do
- It does NOT remove the textual de-anonymization channel. Citation patterns, in-press self-references, first-person 'we previously showed' constructions, and writing-style attribution are textual, not metadata, and no tool can clean them automatically. Step 7 of Workflow A is the irreducible human step. The 2022 'Cracking Double-Blind Review' paper (arXiv:2211.07467) demonstrated up to 73% authorship attribution from text content alone.
- It does NOT cover preprint-server de-anonymization. If you posted the manuscript to arXiv, bioRxiv, SSRN, OSF, or a personal/institutional repository before submission, reviewers may find it via Google during the review window. A 2020 arXiv paper (arXiv:2007.00177) showed a statistically significant correlation between arXiv presence and acceptance at ICLR 2019/2020. Check your target venue's preprint policy.
- It does NOT certify journal/venue compliance. Each journal sets its own anonymization requirements (some want grant numbers retained in a separate file; some want clinical trial registration numbers removed; some have stricter EXIF policies). Read the target journal's author guidelines and use this workflow as the file-layer hygiene baseline. It also does not constitute IRB compliance; that is your institution's process.
- It does NOT do in-place .docx metadata scrubbing inside FileHop. The Word layer is Microsoft's Document Inspector territory; FileHop handles the convert-to-PDF + strip-PDF-metadata + EXIF-strip phases that come after.
- It runs on macOS and Windows only. Linux users (common in HPC and computational STEM) can substitute qpdf (for PDF metadata) and exiftool (for image EXIF) as a CLI workflow with equivalent results. See FAQ for the exact commands.
Why local processing matters for unpublished manuscripts
An unpublished manuscript is one of the most sensitive document categories in a researcher's working life: it carries unpublished data, unpublished interpretation, and the priority claim itself. Uploading it to a third-party metadata scrubber, an online PDF tool, or a 'free' AI-powered cleanup service hands a copy to a third-party server with terms of service you did not negotiate. That is why the headline says 'without uploading the file'.
- Every step in Workflows A and B runs on your computer. The DOCX-to-PDF conversion, the PDF metadata strip, and the image EXIF strip all happen in the FileHop desktop app. No file is uploaded. No file is sent to a server.
- No telemetry on file contents. We do not log what you scrubbed, what was in it, or which journal you are submitting to.
- No AI training on your files.
- Open output format. FileHop writes standard PDF, which opens identically in the journal submission system, in Acrobat, in Preview, and in any PDF reader the editor uses.
- Honest scope on the .docx phase: Microsoft Word's Document Inspector is also a local tool; it does not phone home to Microsoft. The Word phase of this workflow is desktop-local end-to-end IF you run Word on your machine. Web-based Word (Office.com) is a different posture; the scrub still works but the file is on Microsoft's servers.
- Mac and Windows. Tested through FileHop v0.29 (2026).
Sources
Authoritative pages used to verify the workflow above. The Katz/Proto/Olmsted 2002 AJR paper is the load-bearing primary source for the 34% statistic. No endorsement implied.
- Katz DS, Proto AV, Olmsted WW. Incidence and nature of unblinding by authors: our experience at two radiology journals with double-blinded peer review policies. AJR Am J Roentgenol. 2002 Dec;179(6):1415-21. PMID 12438428.
- Le Goues C, Brun Y, Apel S, Berger E, Khurshid S, Smaragdakis Y. Effectiveness of Anonymization in Double-Blind Review. arXiv:1709.01609 (2017).
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning. arXiv:2211.07467 (2022).
- De-anonymization of authors through arXiv submissions during double-blind review. arXiv:2007.00177 (2020).
- Nature — How to anonymize a manuscript for double-blind peer review (author checklist).
- Nature — Double-anonymized peer review guidelines (publisher policy).
- Microsoft Support — Remove hidden data and personal information by inspecting documents (Document Inspector).
- Taylor & Francis — How to submit your manuscript for anonymous peer review.
- IOP Publishing — Checklist for anonymising your manuscript for double-anonymous peer review.
- Scholastica — Anonymizing Your Manuscript Submission.
- Related article: Strip document metadata before sending (the same two-tool workflow, written for attorneys).
Frequently asked questions
Does saving a Word document as PDF remove the metadata?
Is Microsoft Word's Document Inspector enough on its own?
Does FileHop remove metadata from a .docx file directly?
What exactly does FileHop's PDF metadata strip remove?
Does the upload happen on FileHop's servers?
Why do I have to remove EXIF from figures separately?
Is this guaranteed to anonymize my manuscript?
Where did the 34% figure come from?
What about preprint servers: should I delay posting to arXiv?
Does FileHop work on Linux?
Will this break my reviewer's ability to load the file?
How do I cite my own previous work without revealing identity?
Download FileHop
Mac and Windows. The PDF metadata strip, the image EXIF strip, and the .docx-to-PDF conversion all run on your computer; the manuscript never uploads. The compress + remove-metadata step is in the PDF section; the image batch metadata removal is in the Images section. Free to install.