mopdf

n8n community node for PDF processing - convert PDF to images, extract text and run OCR

Package Information

Downloads: 2,094 weekly / 7,882 monthly
Latest Version: 0.3.1
Author: Kayjix

Documentation

n8n-nodes-mopdf

MoPDF is an n8n community node for local PDF processing. It extracts selectable text, converts PDF pages to images, and runs OCR locally with MuPDF and Tesseract.js.

The project is intended for self-hosted n8n. Document processing does not depend on remote OCR services or SaaS APIs. Depending on the runtime setup, Tesseract.js language assets may need to be available locally or downloaded separately.

Installation

Requirements:

  • self-hosted n8n
  • Node.js 18+

In n8n, open Settings -> Community Nodes -> Install and enter the package name:

n8n-nodes-mopdf

Operations

| Operation | Input | Output | Notes |
| --- | --- | --- | --- |
| PDF to Images | PDF binary | PNG or JPEG binaries | Supports per-page export and page selection |
| OCR | Image binary | Text or Markdown | Optional word and line coordinates |
| Extract Text | PDF binary | Text, Markdown, JSON, or HTML | Uses direct PDF text extraction only |
| Text + OCR Fallback | PDF binary | Text, Markdown, JSON, or HTML | Falls back to OCR only for pages without selectable text |
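
The per-page fallback described for Text + OCR Fallback can be sketched roughly as follows. This is a minimal illustration, not MoPDF's actual implementation: the `PageInput`, `PageResult` and `OcrFn` types and the `needsOcr` and `extractWithOcrFallback` functions are hypothetical names, and treating whitespace-only text as "no selectable text" is an assumption.

```typescript
// Hypothetical shapes for illustration only; not MoPDF's real types.
interface PageInput {
  pageNumber: number;
  selectableText: string; // result of direct PDF text extraction, may be empty
}

interface PageResult {
  pageNumber: number;
  text: string;
  source: "text" | "ocr";
}

// Hypothetical OCR callback, e.g. a wrapper around a local OCR engine.
type OcrFn = (pageNumber: number) => Promise<string>;

// Assumption: a page "has no selectable text" when extraction yields only whitespace.
function needsOcr(selectableText: string): boolean {
  return selectableText.trim().length === 0;
}

async function extractWithOcrFallback(
  pages: PageInput[],
  runOcr: OcrFn,
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (const page of pages) {
    if (needsOcr(page.selectableText)) {
      // OCR runs only for pages where direct extraction found nothing.
      const text = await runOcr(page.pageNumber);
      results.push({ pageNumber: page.pageNumber, text, source: "ocr" });
    } else {
      results.push({
        pageNumber: page.pageNumber,
        text: page.selectableText,
        source: "text",
      });
    }
  }
  return results;
}
```

In a mixed document, pages with a text layer skip OCR entirely, so only scanned pages pay the OCR cost.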

Output formats

| Format | Description |
| --- | --- |
| Plain text | Clean extracted text without layout markup |
| Markdown | Compact, structure-aware output for humans and LLM pipelines |
| JSON | Structured extraction output with layout information |
| HTML | Raw HTML-style layout export |

Important n8n note:

  • In n8n Schema and JSON views, multiline strings are shown with escaped \n sequences because the UI displays serialized JSON.
  • The stored field value still contains real newline characters.
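
The difference between the displayed and the stored value can be reproduced with plain JSON serialization. The snippet below is a small sketch that is independent of n8n; the field name `text` and the sample string are illustrative assumptions.

```typescript
// A field value containing one real newline character.
const fieldValue = "Line one\nLine two";

// Serializing to JSON escapes the newline as the two characters "\" and "n",
// which is what a JSON view displays.
const serialized = JSON.stringify({ text: fieldValue });

// Parsing the JSON restores the real newline character.
const restored: string = JSON.parse(serialized).text;

console.log(serialized);           // the escaped form, on a single line
console.log(restored.split("\n")); // two separate lines
```

Downstream nodes that read the field receive the restored form, so no manual unescaping should be needed.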

Project docs

Detailed architecture, development, publishing and community docs live in the GitHub repository.

  • Repository home
  • See docs/ for technical documentation
  • See .github/ for contribution and security guidance

Local development

Install dependencies once:

npm install

Common commands:

npm run build
npm run dev:docker:up
npm run dev:docker:reload
npm run dev:docker:logs
npm run fixtures:build
npm run fixtures:generate:windows

The Docker workflow mounts this repository directly into a local n8n container, so normal code iterations do not require reinstalling the package through the Community Nodes UI.

Detailed setup, environment variables, fixture generation and manual validation steps are documented in the GitHub repository under docs/ and tests/manual-test-plan.md.

npm package scope

The published npm package is intentionally minimal:

  • built runtime files from dist/
  • package metadata from package.json
  • this README.md
  • LICENSE

Repository-only assets such as source TypeScript files, Docker setup, fixtures, tests, scripts and GitHub community files stay in GitHub.

Licensing

This package depends on:

  • MuPDF (AGPL v3)
  • Tesseract.js (Apache 2.0)

The repository source is licensed under MIT. Because the installed npm package pulls in the dependencies above, review their upstream obligations before redistribution. The bundled LICENSE file includes the repository license text and a dependency notice.
