Package Information
Released: 9/16/2025
Latest Version: 4.4.2
Author: N8N Tools
N8N Tools - Document Processor
Process and analyze documents with OCR, text extraction, and format conversion. This N8N community node performs its document processing through the N8N Tools platform API, so it needs an N8N Tools account and API key.
✨ Features
- 📄 Text Extraction: Extract text from various document formats
- 🔍 OCR Processing: Extract text from images and scanned documents
- 🔄 Format Conversion: Convert documents to PDF, DOCX, TXT, HTML, MD, or RTF
- 📊 Metadata Extraction: Get document properties and information
- ✂️ Page Splitting: Split documents into individual pages
- 🔗 Document Merging: Combine multiple documents
- 🌍 Multi-language OCR: Support for Portuguese, English, Spanish, French, German
- 💰 Cost Tracking: Usage monitoring and budget controls
🚀 Quick Start
Installation
Install this node in your N8N instance:
Via Community Nodes (Recommended)
- Go to Settings > Community Nodes in your N8N interface
- Click Install a community node
- Enter n8n-nodes-n8ntools-document-processor
- Click Install
Via npm
npm install n8n-nodes-n8ntools-document-processor
Setup Credentials
- Sign up at N8N Tools and get your API key
- In N8N, create a new N8N Tools API credential
- Enter your API URL: https://api.n8ntools.io
- Enter your API key
📖 Usage
Supported Operations
| Operation | Description | Input | Output |
|---|---|---|---|
| Extract Text | Extract text content | PDF, DOCX, DOC, RTF | Plain text |
| Extract Metadata | Get document properties | Any document | JSON metadata |
| Convert Format | Change document format | Any supported input format | PDF, DOCX, TXT, HTML, MD, RTF |
| Split Pages | Split into individual pages | PDF, DOCX | ZIP with pages |
| Merge Documents | Combine multiple documents | Multiple files | Single document |
| OCR Processing | Extract text from scanned documents and images | PDF, PNG, JPG, TIFF | Extracted text |
Example Workflow
[File Trigger] → [N8N Tools Document Processor] → [Extract Data] → [Database/Email]
Configuration Example
Invoice Text Extraction:
{
"operation": "extractText",
"inputSource": "binaryData",
"binaryPropertyName": "data",
"advancedOptions": {
"extractImages": true,
"extractTables": true,
"preserveFormatting": true
}
}
⚙️ Node Parameters
Input Configuration
- Input Source: Binary Data, File URL, or Base64
- Binary Property: Name of binary property (default: "data")
- File URL: Direct URL to document file
- Base64 Data: Base64 encoded document content
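When Input Source is set to Base64, the node expects the document content as a Base64 string. The sketch below shows one way to produce that string with Node.js (18+, as listed under Requirements). The helper names encodeDocument, decodeDocument, and fileToBase64 are hypothetical and not part of the node; sending the bare Base64 string (no data: URI prefix) is also an assumption.

```typescript
import { readFileSync } from "node:fs";
import { Buffer } from "node:buffer";

// Hypothetical helper: encode raw document bytes as standard Base64
// (RFC 4648 alphabet with "=" padding).
function encodeDocument(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical helper: decode a Base64 string back into raw bytes,
// useful for checking that an encoded payload round-trips.
function decodeDocument(base64: string): Uint8Array {
  return new Uint8Array(Buffer.from(base64, "base64"));
}

// Hypothetical helper: read a local file and return its Base64 form
// for the node's Base64 Data field.
function fileToBase64(path: string): string {
  return encodeDocument(readFileSync(path));
}
```

For example, fileToBase64("invoice.pdf") returns the whole file as one Base64 string; keep the 100MB per-document limit in mind, since Base64 grows the payload by roughly a third.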
Operation-Specific Options
Format Conversion
- Target Format: PDF, DOCX, TXT, HTML, MD, RTF
Page Splitting
- Page Range: Specific pages (e.g., "1-5") or "all"
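The Page Range option takes either "all" or a range such as "1-5". A minimal sketch of expanding such a value into 1-based page numbers, for example to check a range against a document's pageCount before splitting. parsePageRange is hypothetical, and its support for comma-separated parts like "1-3,7" goes beyond what this README documents for the node.

```typescript
// Hypothetical helper: expand a page-range string into sorted 1-based pages.
// "all" -> every page, "1-5" -> 1..5, "2" -> [2].
// Comma-separated parts ("1-3,7") are an extension of this sketch only.
function parsePageRange(range: string, pageCount: number): number[] {
  const spec = range.trim().toLowerCase();
  if (spec === "all") {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part.trim());
    if (!match) throw new Error(`Invalid page range part: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    // Reject reversed ranges and pages outside the document.
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range "${part}" is outside 1-${pageCount}`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  // De-duplicated (via the Set) and sorted ascending.
  return Array.from(pages).sort((a, b) => a - b);
}
```

With the 15-page report from the metadata example, parsePageRange("1-5", 15) returns [1, 2, 3, 4, 5].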
OCR Processing
- Language: Portuguese, English, Spanish, French, German, Auto-detect
Advanced Options
- Extract Images: Include images from document
- Extract Tables: Parse table data
- Preserve Formatting: Maintain original formatting
- Password: For password-protected documents
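The parameters above can be modeled as a TypeScript type so that a configuration can be checked before the node runs. This is a sketch under assumptions: operation, inputSource, binaryPropertyName, fileUrl, ocrLanguage, targetFormat, and the advancedOptions keys appear in this README's examples, while base64Data, pageRange, the "base64" input-source value, and the extractMetadata / splitPages / mergeDocuments operation values are assumed names. validateParams is hypothetical.

```typescript
// extractText, convertFormat and ocrProcessing appear in this README;
// the other three operation values are assumed by analogy.
type Operation =
  | "extractText"
  | "extractMetadata"
  | "convertFormat"
  | "splitPages"
  | "mergeDocuments"
  | "ocrProcessing";

type InputSource = "binaryData" | "fileUrl" | "base64"; // "base64" is assumed
type TargetFormat = "pdf" | "docx" | "txt" | "html" | "md" | "rtf";

interface DocumentProcessorParams {
  operation: Operation;
  inputSource: InputSource;
  binaryPropertyName?: string; // README default: "data"
  fileUrl?: string;
  base64Data?: string; // assumed parameter name
  targetFormat?: TargetFormat;
  pageRange?: string; // assumed parameter name
  ocrLanguage?: "por" | "eng" | "spa" | "fra" | "deu" | "auto";
  advancedOptions?: {
    extractImages?: boolean;
    extractTables?: boolean;
    preserveFormatting?: boolean;
    password?: string;
  };
}

// Returns a list of problems; an empty list means the config looks usable.
function validateParams(p: DocumentProcessorParams): string[] {
  const problems: string[] = [];
  if (p.inputSource === "fileUrl") {
    try {
      // new URL throws a TypeError for strings that are not absolute URLs.
      const url = new URL(p.fileUrl ?? "");
      if (url.protocol !== "https:" && url.protocol !== "http:") {
        problems.push("fileUrl must be an http(s) URL");
      }
    } catch {
      problems.push("fileUrl is missing or not a valid URL");
    }
  }
  if (p.inputSource === "base64" && !p.base64Data) {
    problems.push("base64Data is required when inputSource is base64");
  }
  if (p.operation === "convertFormat" && !p.targetFormat) {
    problems.push("targetFormat is required for convertFormat");
  }
  return problems;
}
```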
📤 Output Data
Text Extraction Result
{
"text": "This is the extracted text content...",
"wordCount": 1250,
"pageCount": 3,
"hasImages": true,
"hasTables": true,
"images": [
{
"page": 1,
"base64": "iVBORw0KGgoAAAANSUhEUgAA...",
"format": "png"
}
],
"tables": [
{
"page": 2,
"rows": 5,
"columns": 3,
"data": [["Header1", "Header2", "Header3"], ...]
}
],
"success": true,
"operation": "extractText",
"creditsUsed": 2,
"originalFilename": "invoice.pdf"
}
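Each entry in tables carries its cells as a row-major data array. A sketch for turning one table into keyed records, assuming (as in the sample) that the first row holds the column headers. The ExtractedTable type mirrors only the fields shown above, and tableToRecords is hypothetical; strip the type annotations to reuse it in a JavaScript Code node.

```typescript
// Mirrors the table fields in the sample output; not an official type.
interface ExtractedTable {
  page: number;
  rows: number;
  columns: number;
  data: string[][];
}

// Convert a table into objects keyed by the header row.
// Assumes data[0] is the header row; cells missing from short rows become "".
function tableToRecords(table: ExtractedTable): Record<string, string>[] {
  const [header, ...body] = table.data;
  if (!header) return []; // empty table
  return body.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = row[i] ?? "";
    });
    return record;
  });
}
```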
Format Conversion Result
Returns the converted document as binary data with metadata:
{
"success": true,
"operation": "convertFormat",
"originalFilename": "document.pdf",
"convertedFilename": "document.docx",
"targetFormat": "docx",
"creditsUsed": 1
}
Metadata Extraction Result
{
"filename": "report.pdf",
"fileSize": 2048000,
"mimeType": "application/pdf",
"pageCount": 15,
"author": "John Doe",
"title": "Annual Report 2024",
"subject": "Company Performance",
"keywords": ["business", "report", "annual"],
"creationDate": "2024-01-15T10:30:00Z",
"modificationDate": "2024-01-16T14:20:00Z",
"hasPassword": false,
"isEncrypted": false,
"success": true
}
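The metadata output is enough to drive the Document Classification pattern listed under Advanced Use Cases. A minimal routing sketch: the fields mirror the sample above, while the route names, the "invoice" keyword hint, and routeDocument itself are hypothetical choices for a Switch-style branch later in the workflow.

```typescript
// Mirrors the sample metadata fields; not an official type.
interface DocumentMetadata {
  filename: string;
  fileSize: number; // bytes
  mimeType: string;
  pageCount: number;
  title?: string;
  keywords?: string[];
  hasPassword: boolean;
  isEncrypted: boolean;
}

// Hypothetical route names.
type Route = "needs-password" | "invoice" | "pdf" | "word" | "other";

function routeDocument(meta: DocumentMetadata): Route {
  // Protected files need a password before any other processing.
  if (meta.hasPassword || meta.isEncrypted) return "needs-password";
  // Simple content hint from keywords or title.
  const hints = [...(meta.keywords ?? []), meta.title ?? ""].map((s) => s.toLowerCase());
  if (hints.some((h) => h.includes("invoice"))) return "invoice";
  // Otherwise route by file type.
  switch (meta.mimeType) {
    case "application/pdf":
      return "pdf";
    case "application/msword": // DOC
    case "application/vnd.openxmlformats-officedocument.wordprocessingml.document": // DOCX
      return "word";
    default:
      return "other";
  }
}
```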
🔧 Supported File Formats
Input Formats
- PDF: PDF documents (including password-protected)
- Microsoft Word: DOCX, DOC
- Text: TXT, RTF
- Web: HTML, XML
- Images: PNG, JPG, TIFF (for OCR)
Output Formats
- PDF: Portable Document Format
- DOCX: Microsoft Word (Office Open XML format)
- TXT: Plain text
- HTML: HyperText Markup Language
- MD: Markdown
- RTF: Rich Text Format
🔍 OCR Capabilities
Supported Languages
- Portuguese (por): Optimized for Brazilian Portuguese
- English (eng): US and UK English
- Spanish (spa): Latin American and Iberian Spanish
- French (fra): French language support
- German (deu): German language support
- Auto-detect (auto): Automatic language detection
OCR Example
{
"operation": "ocrProcessing",
"inputSource": "fileUrl",
"fileUrl": "https://example.com/scanned-invoice.pdf",
"ocrLanguage": "por",
"advancedOptions": {
"extractTables": true,
"preserveFormatting": true
}
}
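A small builder that produces configurations in the shape of the OCR example above and rejects languages outside the documented list. It accepts either a display name or a code from Supported Languages; resolveOcrLanguage and buildOcrConfig are hypothetical helpers, and defaulting extractTables to true simply mirrors the example.

```typescript
// Display names and codes as listed under Supported Languages.
const OCR_LANGUAGES: Record<string, string> = {
  Portuguese: "por",
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  "Auto-detect": "auto",
};
const VALID_CODES = new Set(Object.values(OCR_LANGUAGES));

// Accept either a display name ("German") or a code ("deu").
function resolveOcrLanguage(input: string): string {
  const code = OCR_LANGUAGES[input] ?? input;
  if (!VALID_CODES.has(code)) {
    throw new Error(
      `Unsupported OCR language "${input}"; expected one of ${Array.from(VALID_CODES).join(", ")}`,
    );
  }
  return code;
}

// Build a config in the same shape as the OCR example above.
function buildOcrConfig(fileUrl: string, language: string, extractTables = true) {
  return {
    operation: "ocrProcessing",
    inputSource: "fileUrl",
    fileUrl,
    ocrLanguage: resolveOcrLanguage(language),
    advancedOptions: { extractTables, preserveFormatting: true },
  };
}
```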
🛠️ Advanced Use Cases
Invoice Processing Pipeline
[Email Trigger] → [Download Attachment] → [Extract Text] → [Parse Data] → [Update CRM]
Document Classification
[File Upload] → [Extract Metadata] → [Classify Type] → [Route to Process]
Bulk Document Conversion
[File Monitor] → [Document Processor] → [Convert to PDF] → [Archive]
Contract Analysis
[Document Input] → [Extract Text] → [Find Key Terms] → [Generate Summary]
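The [Parse Data] step of the invoice pipeline works on the text field returned by Extract Text. A regex-based sketch: the labels it looks for ("Invoice No", "Invoice Number", "Invoice #", "Total") and the dot-as-decimal number format are assumptions about the invoices being processed, not something the node provides. parseInvoice is hypothetical.

```typescript
interface InvoiceFields {
  invoiceNumber: string | null;
  total: number | null;
}

// Matches "Invoice No: INV-1", "Invoice Number 2024-001", "Invoice #A-77".
const INVOICE_NO = /invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)/i;
// Matches "Total: 1,234.56" or "Total $99.00"; \b keeps "Subtotal" out.
const TOTAL = /\btotal\b[^\d\n]*([\d,]+\.\d{2})/i;

function parseInvoice(text: string): InvoiceFields {
  const no = INVOICE_NO.exec(text);
  const total = TOTAL.exec(text);
  return {
    invoiceNumber: no ? no[1] : null,
    // Drop thousands separators before converting to a number.
    total: total ? Number(total[1].replace(/,/g, "")) : null,
  };
}
```

Regexes like these are brittle across invoice layouts; for varied suppliers, an NLP or LLM step after Extract Text may be the sturdier choice.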
📊 Processing Examples
Extract Contract Details
// Extract specific information from legal documents
{
"operation": "extractText",
"advancedOptions": {
"extractTables": true,
"preserveFormatting": true
}
}
// Then use regex or NLP to find specific clauses
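One regex-based way to find specific clauses in the extracted text, assuming clauses begin with numbered headings such as "12. Termination" at the start of a line (preserveFormatting is assumed to keep those line breaks). splitClauses and findClauses are hypothetical helpers.

```typescript
interface Clause {
  number: string; // e.g. "7" or "7.2"
  heading: string;
  body: string;
}

// Split text at lines that look like numbered headings ("7. Termination",
// "7.2 Notice"). Body lines that start with a number followed by a
// capitalized word are also treated as headings: a known limitation.
function splitClauses(text: string): Clause[] {
  const headingRe = /^(\d+(?:\.\d+)*)\.?[ \t]+([A-Z][^\n]*)$/gm;
  const heads: { start: number; end: number; number: string; heading: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = headingRe.exec(text)) !== null) {
    heads.push({ start: m.index, end: m.index + m[0].length, number: m[1], heading: m[2].trim() });
  }
  // Each clause body runs from the end of its heading to the next heading.
  return heads.map((h, i) => ({
    number: h.number,
    heading: h.heading,
    body: text.slice(h.end, i + 1 < heads.length ? heads[i + 1].start : text.length).trim(),
  }));
}

// Keep clauses whose heading or body mentions any key term (case-insensitive).
function findClauses(text: string, terms: string[]): Clause[] {
  const needles = terms.map((t) => t.toLowerCase());
  return splitClauses(text).filter((c) => {
    const haystack = `${c.heading}\n${c.body}`.toLowerCase();
    return needles.some((n) => haystack.includes(n));
  });
}
```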
Convert Legacy Documents
// Convert old DOC files to modern formats
{
"operation": "convertFormat",
"targetFormat": "docx"
}
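For bulk conversion of legacy files, a sketch that picks DOC files out of a list of URLs and builds one convertFormat config per file. Supplying files through the fileUrl input and filtering on the .doc extension are assumptions; planConversions is hypothetical.

```typescript
interface ConversionJob {
  operation: "convertFormat";
  inputSource: "fileUrl";
  fileUrl: string;
  targetFormat: "docx";
}

// Legacy extensions to modernize (only DOC here; extend as needed).
const LEGACY_EXTENSIONS = [".doc"];

function planConversions(fileUrls: string[]): ConversionJob[] {
  return fileUrls
    .filter((u) => {
      // Check the URL path only, so query strings do not hide the extension.
      const path = new URL(u).pathname.toLowerCase();
      return LEGACY_EXTENSIONS.some((ext) => path.endsWith(ext));
    })
    .map((fileUrl): ConversionJob => ({
      operation: "convertFormat",
      inputSource: "fileUrl",
      fileUrl,
      targetFormat: "docx",
    }));
}
```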
Process Scanned Forms
// OCR processing for form data extraction
{
"operation": "ocrProcessing",
"ocrLanguage": "eng",
"advancedOptions": {
"extractTables": true // For form fields
}
}
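If a scanned form's label/value pairs come back as two-column table rows (the data array shown under Text Extraction Result), they can be folded into a single object. A sketch under that assumption; rowsToFormFields is hypothetical.

```typescript
// Fold two-column rows ([label, value]) into one object.
// Labels are normalized: trailing colon removed, whitespace collapsed.
function rowsToFormFields(rows: string[][]): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const row of rows) {
    if (row.length < 2) continue; // not a label/value pair
    const label = row[0].replace(/:\s*$/, "").replace(/\s+/g, " ").trim();
    if (label === "") continue;
    fields[label] = row[1].trim();
  }
  return fields;
}
```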
💸 Pricing & Limits
- Text Extraction: 1 credit per document
- Format Conversion: 1 credit per conversion
- OCR Processing: 2 credits per document
- Page Splitting: 1 credit per document
- Document Merging: 1 credit per operation
- File Size Limit: 100MB per document
- Page Limit: 500 pages per document
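The rates and limits above allow a rough pre-flight check before a batch is sent. A sketch with stated assumptions: 100MB is read as 100 × 1024 × 1024 bytes, and the listed per-operation rates are treated as the whole cost. The latter may not hold, since the sample Text Extraction Result reports creditsUsed: 2 for an extractText call. Metadata extraction has no listed rate, so it is left out of the type rather than guessed. estimateBatch and the operation keys reused from the examples are illustrative.

```typescript
type PricedOperation =
  | "extractText"
  | "convertFormat"
  | "ocrProcessing"
  | "splitPages"
  | "mergeDocuments";

// Credits as listed under Pricing & Limits.
const CREDITS: Record<PricedOperation, number> = {
  extractText: 1,
  convertFormat: 1,
  ocrProcessing: 2,
  splitPages: 1,
  mergeDocuments: 1,
};

const MAX_FILE_BYTES = 100 * 1024 * 1024; // assumption: 100MB = 100 MiB
const MAX_PAGES = 500;

interface PlannedJob {
  operation: PricedOperation;
  fileSize: number; // bytes
  pageCount: number;
}

interface Estimate {
  credits: number;
  rejected: { index: number; reason: string }[];
}

// Sum listed credits for jobs within the limits; report the rest.
function estimateBatch(jobs: PlannedJob[]): Estimate {
  const result: Estimate = { credits: 0, rejected: [] };
  jobs.forEach((job, index) => {
    if (job.fileSize > MAX_FILE_BYTES) {
      result.rejected.push({ index, reason: "file exceeds 100MB" });
    } else if (job.pageCount > MAX_PAGES) {
      result.rejected.push({ index, reason: "more than 500 pages" });
    } else {
      result.credits += CREDITS[job.operation];
    }
  });
  return result;
}
```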
🚨 Error Handling
Common errors and solutions:
// Password-protected document
{
"error": "Document is password protected",
"success": false,
"suggestion": "Provide password in advancedOptions"
}
// Unsupported format
{
"error": "Unsupported file format: .xyz",
"success": false,
"suggestion": "Check supported input formats"
}
// OCR language not detected
{
"error": "Could not detect document language",
"success": false,
"suggestion": "Specify OCR language manually"
}
Password-Protected Documents
{
"advancedOptions": {
"password": "your-document-password"
}
}
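Each documented error comes with a suggestion that can be automated. A sketch that derives a retry configuration from a failed result, assuming the error messages stay exactly as shown above; matching on message text is brittle, so treat this as illustrative. nextAttempt and the "eng" fallback language are hypothetical choices.

```typescript
interface ProcessorConfig {
  operation: string;
  ocrLanguage?: string;
  advancedOptions?: { password?: string; [key: string]: unknown };
  [key: string]: unknown;
}

interface FailedResult {
  success: false;
  error: string;
  suggestion?: string;
}

// Return an adjusted config to retry with, or null if a retry will not help.
function nextAttempt(
  config: ProcessorConfig,
  result: FailedResult,
  opts: { password?: string; fallbackLanguage?: string } = {},
): ProcessorConfig | null {
  if (result.error === "Document is password protected") {
    // No password to offer, or one was already tried: give up.
    if (!opts.password || config.advancedOptions?.password) return null;
    return { ...config, advancedOptions: { ...config.advancedOptions, password: opts.password } };
  }
  if (result.error === "Could not detect document language") {
    const lang = opts.fallbackLanguage ?? "eng";
    if (config.ocrLanguage === lang) return null;
    return { ...config, ocrLanguage: lang };
  }
  // For example an unsupported file format: the same input will fail again.
  return null;
}
```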
🔄 Integration Examples
With PDF Generator
[Data] → [Generate PDF] → [Extract Text] → [Validate Content]
With Web Scraper
[Scrape URLs] → [Download PDFs] → [Process Documents] → [Store Data]
With Email
[Email Attachment] → [Process Document] → [Extract Key Info] → [Reply with Summary]
🔗 Related Packages
- PDF Generator: Create PDFs from processed data
- Web Scraper: Scrape documents from websites
📋 Requirements
- N8N version 0.174.0 or higher
- N8N Tools account and API key
- Node.js 18+ (for development)
🆘 Support
- 📧 Email: support@n8ntools.io
- 📖 Documentation: docs.n8ntools.io
- 💬 Community: Discord
- 🐛 Issues: GitHub
📄 License
MIT License - see LICENSE file for details.
Part of the N8N Tools ecosystem