n8n-nodes-mopdf
MoPDF is an n8n community node for local PDF processing. It extracts selectable text, converts PDF pages to images, and runs OCR, all locally, using MuPDF and Tesseract.js.
The project is intended for self-hosted n8n. Document processing does not depend on remote OCR services or SaaS APIs. Depending on the runtime setup, Tesseract.js language assets may need to be available locally or downloaded separately.
Installation
Requirements:
- self-hosted n8n
- Node.js 18+
In n8n, open Settings -> Community Nodes -> Install and enter the package name:
n8n-nodes-mopdf
Operations
| Operation | Input | Output | Notes |
|---|---|---|---|
| PDF to Images | PDF binary | PNG or JPEG binaries | Supports per-page export and page selection |
| OCR | Image binary | Text or Markdown | Optional word and line coordinates |
| Extract Text | PDF binary | Text, Markdown, JSON, or HTML | Uses direct PDF text extraction only |
| Text + OCR Fallback | PDF binary | Text, Markdown, JSON, or HTML | Falls back to OCR only for pages without selectable text |
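The per-page decision behind Text + OCR Fallback can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not MoPDF's implementation: the PageResult type, the extractPageText and ocrPage callbacks, and the "empty or whitespace-only text layer means no selectable text" heuristic are all hypothetical.

```typescript
// Hypothetical per-page result shape; not MoPDF's actual internal type.
interface PageResult {
  page: number;
  text: string;
  source: "text" | "ocr";
}

// Hypothetical callbacks supplied by the caller: one reads a page's
// selectable text layer, the other renders the page and OCRs the image.
type ExtractPageText = (page: number) => Promise<string>;
type OcrPage = (page: number) => Promise<string>;

// Assumed heuristic: a page has no selectable text when its text layer
// is empty or whitespace-only.
function hasSelectableText(text: string): boolean {
  return text.trim().length > 0;
}

// Process pages in order (1-based), running OCR only on pages whose
// text layer comes back empty.
async function extractWithOcrFallback(
  pageCount: number,
  extractPageText: ExtractPageText,
  ocrPage: OcrPage,
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const text = await extractPageText(page);
    if (hasSelectableText(text)) {
      results.push({ page, text, source: "text" });
    } else {
      // Only pages without a usable text layer pay the OCR cost.
      results.push({ page, text: await ocrPage(page), source: "ocr" });
    }
  }
  return results;
}
```

Passing the two steps in as functions keeps the fallback decision testable without a real PDF. In an actual node they would presumably wrap MuPDF text extraction and Tesseract.js OCR.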
Output formats
| Format | Description |
|---|---|
| Plain text | Clean extracted text without layout markup |
| Markdown | Compact, structure-aware output for humans and LLM pipelines |
| JSON | Structured extraction output with layout information |
| HTML | Raw HTML-style layout export |
Important n8n note:
- In n8n Schema and JSON views, multiline strings are shown with escaped \n sequences because the UI displays serialized JSON.
- The stored field value still contains real newline characters.
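The gap between the displayed and the stored value can be reproduced with plain JSON.stringify, used here as a stand-in for the serialization the n8n UI applies (an illustration, not n8n's own code). The item shape is hypothetical.

```typescript
// Hypothetical item: the field holds a real newline character (code 10).
const item = { text: "Page 1 heading\nFirst paragraph" };

// The stored value contains an actual line break...
const hasRealNewline = item.text.includes("\n");

// ...but serializing it, as a JSON view does, escapes the break as
// the two characters backslash and "n".
const serialized = JSON.stringify(item);

console.log(hasRealNewline); // true
console.log(serialized); // {"text":"Page 1 heading\nFirst paragraph"}

// Downstream code can split on "\n" directly; no unescaping is needed.
console.log(item.text.split("\n")); // [ 'Page 1 heading', 'First paragraph' ]
```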
Project docs
Detailed architecture, development, publishing and community docs live in the GitHub repository.
- Repository home
- See docs/ for technical documentation
- See .github/ for contribution and security guidance
Local development
Install dependencies once:
npm install
Common commands:
npm run build
npm run dev:docker:up
npm run dev:docker:reload
npm run dev:docker:logs
npm run fixtures:build
npm run fixtures:generate:windows
The Docker workflow mounts this repository directly into a local n8n container, so normal code iterations do not require reinstalling the package through the Community Nodes UI.
Detailed setup, environment variables, fixture generation and manual validation steps are documented in the GitHub repository under docs/ and tests/manual-test-plan.md.
npm package scope
The published npm package is intentionally minimal:
- built runtime files from dist/
- package metadata from package.json
- this README.md
- LICENSE
Repository-only assets such as source TypeScript files, Docker setup, fixtures, tests, scripts and GitHub community files stay in GitHub.
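One way to keep the published scope in check is a small pre-publish test over the list of packed paths (for example, the file list printed by npm pack --dry-run). The function below is a hypothetical sketch, not a script shipped with this repository; it only encodes the allowlist described above.

```typescript
// Hypothetical allowlist mirroring the npm package scope described above.
const ALLOWED_FILES = new Set(["package.json", "README.md", "LICENSE"]);
const ALLOWED_DIR = "dist/";

// Return every packed path that falls outside the allowlist.
// Paths are normalized to forward slashes without a leading "./".
function unexpectedPackedFiles(paths: string[]): string[] {
  return paths
    .map((p) => p.replace(/\\/g, "/").replace(/^\.\//, ""))
    .filter((p) => !ALLOWED_FILES.has(p) && !p.startsWith(ALLOWED_DIR));
}
```

An allowlist fails loudly when a new directory slips into the tarball, which a denylist of known repository-only folders would miss.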
Licensing
This package depends on:
- MuPDF - AGPL v3
- Tesseract.js - Apache 2.0
The repository source is licensed under MIT. Because the installed npm package pulls in MuPDF (AGPL v3) and Tesseract.js (Apache 2.0), review their license obligations before redistributing it. The bundled LICENSE file includes the repository license text and a dependency notice.