FileHop for Researchers
A private desktop workspace for the files you work on every day — papers, figures, scans, interview transcripts. Sits next to your reference manager, runs entirely on your machine.
Built for academic, industry, and independent researchers. Runs on Mac and Windows. No per-seat subscription, no cloud account.
Download FileHop Free
Six file workflows researchers do every day
And that your reference manager does not.
Annotate papers for a literature review
Highlight, sticky-note for marginalia, freehand for diagrams, stamp for 'READ' / 'CITE' workflow signals. Annotation Memory resumes your markup state when you reopen the file. Your reference library stays in Zotero; the markup stays on your disk.
Anonymize a manuscript for double-blind submission
Convert Word to clean PDF, strip the PDF's metadata (author, timestamps, tracked-changes residue), batch-strip EXIF on every embedded figure. Practical workflow for the 'Word File > Properties > Author' problem.
OCR a scanned research archive locally
Image-only PDFs from old papers, multipage TIFF, handwritten lab notebooks, scanned figures. Local VLM OCR (MiniCPM-V 2.6 for 8GB RAM; Chandra OCR for 12GB+ with handwriting, math, tables, and 40+ languages; olmOCR-2 as a higher-accuracy alternative) with a one-time model download. No archival material leaves your machine.
Organize a mixed-file research project
PDFs, screenshots, figure exports, and draft notes. Combine the right subset into one annotated dossier for an advisor, co-author, or supervisor. Your knowledge graph (Notion, Obsidian, OneNote) stays where it is; FileHop is where files come together before they leave the lab.
Redact an interview transcript before sharing
Destructive redaction: text glyphs and image pixels inside the redaction box are removed and the output is re-walked to confirm nothing redactable survives. Anonymize participant names, locations, and identifiers in qualitative data before sharing with a co-author — IRB and FERPA hygiene at the file layer.
Prepare figures for journal submission
Batch convert PNG, JPG, and TIFF figures to the journal's target format at the right DPI. Strip EXIF metadata (camera model, GPS, edit history, embedded software and user name) before submission; figure metadata is an under-discussed way double-blind submissions get de-anonymized.
Where FileHop fits in your stack
Your reference manager (Zotero / Mendeley / EndNote / Papers / Citavi / Paperpile)
↓ owns citations, library, BibTeX — when a file leaves it …
FileHop — file workspace. Annotate, OCR, redact, anonymize, convert, compress, combine.
↓ goes back to your co-authors, journal, preprint server, or archive
Submission portal · journal · data repository · collaborator
FileHop sits beside your citation library, not on top of it. Your reference manager owns the bibliography. FileHop owns the file work that happens around each paper — once it leaves the citation pane and before it reaches the journal, archive, or co-author.
What 'local-first' actually solves — and what it doesn't
What local-first solves
- ✓ Exposure of unpublished manuscripts to third-party servers during routine file work.
- ✓ Peer-review confidentiality violations from pasting reviewer copies into cloud LLMs.
- ✓ IRB-restricted interview data passing through transcription APIs.
- ✓ Metadata leakage that de-anonymizes double-blind submissions (Word properties, PDF document properties, EXIF on figures).
- ✓ 'Free' online OCR or conversion sites whose terms of service grant the operator rights to your uploaded text or figures.
- ✓ Vendor lock-in to a specific PDF cloud or reference-manager file format.
What local-first does NOT solve
- Citation management, bibliography, or BibTeX export: your reference manager owns this.
- Qualitative-coding analysis: NVivo, Atlas.ti, MAXQDA, and Dedoose own this.
- Statistical analysis or notebook environments: R, Stata, SPSS, Jupyter, and RStudio own this.
- Plagiarism or similarity detection: iThenticate and Turnitin own this.
- Systematic-review screening: Covidence, Rayyan, and DistillerSR own this.
- Your institution's IRB protocol, data-use agreement, or data-classification policy: those remain your responsibility.
- Verifying that your own anonymization, redaction, and metadata removal worked before submission or sharing: always check.
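Part of that final check can be scripted. The following is a minimal sketch, not a FileHop feature: it assumes the figure is a JPEG and only looks for an EXIF block (an APP1 segment whose payload starts with `Exif`). The function name `jpeg_has_exif` is hypothetical, and PNG or TIFF figures would need a different check.

```python
import struct

# Markers that carry no length field (TEM and RST0..RST7).
STANDALONE_MARKERS = {0x01} | set(range(0xD0, 0xD8))


def jpeg_has_exif(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment holding EXIF data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not at a marker; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE_MARKERS:
            pos += 2
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

To use it, read a stripped figure with `open(path, "rb").read()` and pass the bytes in. A `False` result only means no EXIF segment was found in the header; it says nothing about other metadata such as XMP or text baked into the image itself.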
The FileHop tools, in research terms
The same toolkit your daily file work needs, named the way you'd say it in a lab meeting.
| Workflow | FileHop tool |
|---|---|
| Strip identifying metadata from a manuscript draft for double-blind submission | Compress PDF (with opt-in metadata removal) |
| Hit a journal submission system's PDF size cap | Compress to 5MB |
| Combine paper + figures + supplementary into one PDF for advisor circulation | Merge PDF |
| Split a 600-page archive scan into chapters or sections | Split PDF |
| Password-protect an IRB consent form for secure transport | Protect PDF |
| Open a password-protected file a collaborator sent (with the password) | Unlock PDF |
| Extract the methods section or a single appendix as a standalone PDF | Extract Pages |
| Reorder appendices and supplementary sections | Reorder Pages |
| Rotate a sideways-scanned archive page | Rotate PDF |
| Convert a draft PDF back into an editable Word document | PDF to Word |
| Convert legacy TIFF figures to journal-acceptable formats | TIFF to JPG |
| Batch-convert dozens of figure exports at once | Batch Image Converter |
| OCR a stack of image-only archival PDFs locally | Extract Text from Image (local VLM OCR) |
How the privacy story actually works
- Files are processed on your computer. Conversion, compression, merging, redaction, annotation, and (with a downloaded model) OCR all run locally in the FileHop desktop app; your file does not transit our servers for any of these tasks.
- Local OCR runs as a downloadable Vision-Language Model on your machine. MiniCPM-V 2.6 for 8GB-RAM laptops; Chandra OCR for 12GB+ machines (handwriting, math, tables, 40+ languages including non-Latin scripts); olmOCR-2 as a higher-accuracy alternative. One-time model download (~5-8 GB depending on model). No prompt, page image, or extracted text leaves your device.
- Cloud OCR is available as an opt-in alternative when your machine does not meet the local-model RAM requirement. It is off by default and asks for explicit consent per task.
- Audio transcription runs locally using Whisper. Interview audio does not leave your computer.
- No telemetry on file contents. We do not log what you opened, what you OCR'd, what you redacted, or what you transcribed.
- No AI training on your files. We do not use your documents to train models.
- Open output formats. FileHop writes standard PDF, DOCX, JPG, PNG, TIFF, MP4, CSV, and MD: no proprietary container, no lock-in.
- One-time install, no account required for local features.
Reference reading
Cited as public reference points only — not endorsements of FileHop.
- Springer / Annals of Biomedical Engineering — hidden prompts in manuscripts
- Nature — anonymization checklist for double-blind peer review
- Elsevier — risks of AI-assisted academic writing
- noScribe (GitHub) — local offline transcription precedent for IRB-restricted material
Where researchers in different contexts put FileHop to work
FileHop is one workspace, not a separate product per discipline.
Academic (PhD / postdoc / faculty)
Literature-review annotation across 60+ papers, double-blind anonymization for journal submission, OCR of archival material that can't legally leave the institution.
Qualitative / social science
Interview-transcript redaction for IRB and FERPA compliance, mixed-file project organization (transcripts, field notes, consent forms), local Whisper transcription of interview audio.
STEM / lab science
Figure preparation for journal submission (TIFF / PNG / DPI / EXIF strip), batch image conversion across dozens of plots, supplementary-file combining for submission.
Medical / clinical
HIPAA-aware local workflow for IRB-restricted documents, scan-and-redact patient-adjacent material. Caveat: FileHop carries no compliance certification. It handles the file layer; your protocol and institutional review remain your responsibility.
Industry research / R&D
Proprietary-document handling, batch metadata strip for external sharing, offline competitive-document review without uploading to a free converter or cloud LLM.
Independent researcher / archive / library
OCR-ing legacy collections (MiniCPM-V for clean modern scans, Chandra OCR for handwriting and multi-language archival material), converting old scan formats, organizing mixed-format archives.
What FileHop is not
Honest scope keeps everyone out of trouble. FileHop is a file workspace, not a research-tech suite.
- Not a reference manager. Keep using Zotero, Mendeley, EndNote, Papers, Citavi, or Paperpile for citations, library, and BibTeX.
- Not an AI research assistant. We don't search papers, write literature reviews, or synthesize findings; Elicit, Scite, SciSpace, Consensus, and Undermind own that lane.
- Not a qualitative-coding platform. Use NVivo, Atlas.ti, MAXQDA, or Dedoose for coding analysis and theme querying.
- Not a plagiarism or similarity checker. iThenticate and Turnitin own that.
- Not a systematic-review screening platform. Covidence, Rayyan, and DistillerSR own that.
- Not a statistical or notebook environment. R, Stata, SPSS, Jupyter, and RStudio own that.
- Not certified for any specific IRB protocol, HIPAA Business Associate role, GDPR processor agreement, FERPA obligation, or institutional data-classification policy. Local processing reduces certain risk categories; compliance remains your and your institution's responsibility.
- Not a substitute for verifying your own anonymization, redaction, and metadata removal before submission or sharing.
- Not a Linux or iPad app today. Mac and Windows desktop only.
Frequently asked questions
Is FileHop a replacement for Zotero or Mendeley?
No. FileHop is a file workspace — it sits beside your reference manager. Zotero (or Mendeley, EndNote, Papers, Citavi, Paperpile) owns your citations, library, and BibTeX. FileHop handles the file work that happens around each paper: annotate, OCR archival scans, redact transcripts, anonymize a manuscript draft, prepare figures for submission, combine a dossier for an advisor.
Can I use FileHop for unpublished manuscripts under peer review?
FileHop processes your files on your computer — they do not transit our servers for the file-handling tasks (annotate, OCR locally, redact, anonymize, convert, compress, merge). That reduces your exposure to third-party server risk for unpublished work relative to free online converters or cloud LLMs. Peer-review confidentiality, however, is your responsibility — FileHop cannot certify it for you. For reviewer copies in particular, the publisher's confidentiality requirements always govern.
Does the OCR run locally or in the cloud?
Local by default. FileHop ships downloadable Vision-Language Models (VLMs) for OCR — MiniCPM-V 2.6 for 8GB-RAM laptops, Chandra OCR for 12GB+ RAM machines (handwriting, math, tables, 40+ languages including non-Latin scripts), and olmOCR-2 as a higher-accuracy alternative. After a one-time model download (~5-8 GB depending on model), OCR runs entirely on your machine. Cloud OCR is available as an opt-in fallback when your machine does not meet the local-model RAM requirement; it is off by default and asks for explicit consent per task.
What OCR models does FileHop ship with?
Three families, picked by your RAM budget. MiniCPM-V 2.6 (Q4_K_M, fits 8GB RAM) for clean modern scans. Chandra OCR (Q4_K_M, Q5_K_M, Q8_0 variants for 12GB+, 14GB+, 16GB+ RAM) for handwriting, math, tables, and 40+ languages. olmOCR-2 7B (Q4_K_M, Q5_K_M, Q8_0) as a higher-accuracy alternative. FileHop picks a default based on your RAM; you can override it. The models are downloaded on first use, not bundled with the installer.
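As a rough illustration of RAM-based selection, the sketch below maps the thresholds quoted in this answer to a suggested model. It is not FileHop's actual selection logic: the function name `suggest_ocr_model`, the preference for Chandra OCR once 12GB is available, and the `None` result below 8GB are all assumptions. olmOCR-2 is left out because the text gives no RAM figure per variant.

```python
from typing import Optional

# (minimum RAM in GB, label) pairs taken from the figures quoted above,
# ordered from the largest requirement down.
CHANDRA_TIERS = [
    (16, "Chandra OCR Q8_0"),
    (14, "Chandra OCR Q5_K_M"),
    (12, "Chandra OCR Q4_K_M"),
]
MINICPM_MIN_RAM_GB = 8


def suggest_ocr_model(ram_gb: float) -> Optional[str]:
    """Suggest a local OCR model for the given RAM, or None if none fits."""
    for min_ram, label in CHANDRA_TIERS:
        if ram_gb >= min_ram:
            return label
    if ram_gb >= MINICPM_MIN_RAM_GB:
        return "MiniCPM-V 2.6 Q4_K_M"
    # Assumption: below 8GB no local model fits, so the opt-in cloud
    # fallback would be the only option.
    return None
```

A default like this is only a starting point; the kind of material matters as much as RAM, which is why the choice can be overridden.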
Can the OCR read handwriting, math, and tables?
Chandra OCR is designed for handwriting, math expressions, and table layouts — that is its strength. MiniCPM-V 2.6 is fine for clean modern typeset scans but struggles with dense handwriting. olmOCR-2 is strong on scientific layouts. Be honest with yourself about your archive: a folder of clean modern PDFs is a job for MiniCPM-V; a stack of handwritten lab notebooks is a job for Chandra. All three are downloadable; you can experiment.
Does FileHop's redaction actually destroy the underlying text?
Yes. FileHop's redaction permanently removes text glyphs, image pixels, contained vector paths, and inline images inside the redaction region, then re-walks the output to confirm nothing redactable survives. It fails closed when content cannot be redacted faithfully. For anonymizing qualitative-research transcripts (participant names, locations, identifiers) or removing author-identifying paragraphs from a double-blind manuscript draft, this is the correct behavior. As a belt-and-braces precaution for the most sensitive material, you can also rasterize the page after redaction.
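The remove-then-verify idea can be shown with a toy model. This is a conceptual sketch only, not FileHop's implementation: it treats a page as a flat list of glyphs with bounding boxes, and every name in it (`Box`, `Glyph`, `redact`, `verify_clean`, `RedactionError`) is hypothetical. A real PDF also involves content streams, images, and vector paths, which this toy does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Axis-aligned overlap test; boxes that only touch at an edge do not count.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class Glyph:
    char: str
    box: Box


class RedactionError(Exception):
    """Raised when content still overlaps a redaction region."""


def verify_clean(glyphs: List[Glyph], region: Box) -> None:
    # Second pass over the output: fail closed if anything survives.
    survivors = [g for g in glyphs if g.box.intersects(region)]
    if survivors:
        raise RedactionError(f"{len(survivors)} glyph(s) survived redaction")


def redact(glyphs: List[Glyph], region: Box) -> List[Glyph]:
    # Destructive step: overlapping glyphs are dropped, not covered up.
    kept = [g for g in glyphs if not g.box.intersects(region)]
    verify_clean(kept, region)
    return kept
```

The design point is that the check runs on the output rather than trusting the removal step, and that a failed check raises instead of returning a partially redacted page.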
Will FileHop fully anonymize my manuscript for double-blind submission?
FileHop handles the file-layer hygiene: PDF metadata strip (author, title, timestamps, modification history), batch EXIF strip on every embedded figure, and destructive redaction of any author-identifying paragraphs you mark. The full workflow is: write in Word as usual, convert to PDF through FileHop, strip the PDF's metadata, batch-strip EXIF on the figure files, then visually review for self-references, 'Smith et al. (in prep)', distinctive grant numbers, or institutional affiliations in figure footers. FileHop does not in-place scrub the .docx — convert to PDF first, then strip the PDF metadata. The interpretive review (deciding what counts as identifying in your specific submission) is yours.
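To see what a Word draft carries before you convert it, the author fields can be read directly: a .docx file is a ZIP archive, and its core properties live in `docProps/core.xml`. The sketch below uses only the Python standard library and is independent of FileHop; the function name `docx_identity_fields` is hypothetical, and it reports only the creator and last-modified-by fields, not comments or tracked changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Open Packaging Conventions core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_identity_fields(source) -> dict:
    """Return non-empty author fields found in a .docx (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for label, path in (("creator", "dc:creator"),
                        ("lastModifiedBy", "cp:lastModifiedBy")):
        node = root.find(path, NS)
        if node is not None and node.text and node.text.strip():
            found[label] = node.text.strip()
    return found
```

Calling `docx_identity_fields("draft.docx")` (the file name is a placeholder) returns a dictionary of whatever names are still attached. An empty result does not mean the document is anonymous; the visual review described above still applies.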
Is this OK for IRB-restricted interview transcripts?
FileHop keeps your transcripts on your computer for the file-handling work (annotate, redact, organize, combine). Local Whisper transcription means the audio also stays on your computer. That reduces exposure to third-party transcription APIs and cloud LLMs — which is the specific risk category several IRBs have raised in recent years. It does not by itself satisfy your protocol's requirements; your IRB approval, data-classification policy, and institutional review continue to govern. Confirm with your IRB before changing your transcription workflow.
Does FileHop do citation extraction or BibTeX export?
No. Your reference manager handles that — Zotero's PDF-metadata retrieval, Mendeley, EndNote, Papers, Citavi, or Paperpile. FileHop is the file workspace that sits beside the reference manager; the bibliography stays in the reference manager.
How is FileHop different from Adobe Acrobat?
FileHop is a desktop-only app that runs locally without a cloud account; Acrobat is increasingly cloud-anchored and requires a per-seat subscription. FileHop is not a complete Acrobat replacement: Acrobat has features FileHop does not (e.g., certificate-backed e-signature, mature accessibility tooling). For the daily file work most researchers actually do (annotate papers, anonymize drafts, redact transcripts, prepare figures, compress for submission, OCR archives locally), FileHop covers it, and its local VLM OCR keeps IRB-restricted material off cloud OCR services. See our Adobe Acrobat alternative comparison for details.
Will it work on my Mac, Windows, or Linux machine?
Mac and Windows — including Apple Silicon. No Linux build today. No iPad or iOS app today. If those matter for your workflow, factor that in.
Can I try FileHop before committing?
Yes. FileHop has a free tier you can install and use without a subscription. Download it from the link above. The privacy story is architectural, not policy-based: the install you try is the same one that processes files locally.
Bring the file work back to your desk
FileHop is free to install and runs locally on Mac and Windows. Your unpublished work, IRB-restricted material, and peer-review copies stay on disk.
Download FileHop Free