N8N Tools - Document Processor
Overview
The N8N Tools - Document Processor node enables automated processing of documents using OCR (Optical Character Recognition), text extraction, and AI-powered analysis. It supports multiple input sources including binary files, URLs, and Base64-encoded content. The node sends documents to an external API service for processing and can handle asynchronous job polling with fallback to synchronous processing if needed.
This node is beneficial in scenarios such as:
- Extracting structured text and tables from scanned PDFs or images.
- Performing OCR on image-based documents to convert them into searchable text.
- Analyzing document content with AI to classify or summarize information.
- Automating workflows that require metadata extraction or content transformation into various formats like JSON, Markdown, or HTML.
Practical examples:
- Automatically extracting invoice data from PDF attachments received via email.
- Converting scanned contracts into editable text with preserved table structures.
- Generating summarized reports from large documents using AI analysis.
- Archiving documents with extracted metadata and searchable content.
Properties
| Name | Meaning |
|---|---|
| Document URL | Publicly accessible URL of the document to process. |
| Processing Options | Collection of options controlling how the document is processed:<br>- Output Format: Format of extracted content (json, text, markdown, html).<br>- Language: OCR language code (e.g., auto, en, pt, es, fr, de).<br>- Include Images: Whether to extract and include images from the document (true/false).<br>- Include Tables: Whether to extract and preserve table structures (true/false).<br>- AI Analysis: Enable AI-powered content analysis (true/false).<br>- Extract Metadata: Extract document metadata such as author and creation date (true/false). |
| Output | How to return the processed results:<br>- JSON Response: Return processed data as JSON.<br>- Binary File: Return processed document as a binary file.<br>- Both: Return both JSON data and binary file. |
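As a rough illustration, the Processing Options collection could be modelled as a typed object. The field names (`outputFormat`, `includeImages`, and so on), the `withDefaults` helper, and the default values are assumptions made for this sketch; this document does not specify the parameter names the node actually sends to the API.

```typescript
// Hypothetical shape for the Processing Options collection.
// Field names and defaults are illustrative, not taken from the node's source.
type OutputFormat = "json" | "text" | "markdown" | "html";

interface ProcessingOptions {
  outputFormat: OutputFormat;
  language: string; // OCR language code, e.g. "auto", "en", "pt"
  includeImages: boolean;
  includeTables: boolean;
  aiAnalysis: boolean;
  extractMetadata: boolean;
}

// Assumed defaults, chosen for this sketch only.
const defaultOptions: ProcessingOptions = {
  outputFormat: "json",
  language: "auto",
  includeImages: false,
  includeTables: true,
  aiAnalysis: false,
  extractMetadata: false,
};

// Merge caller-supplied overrides on top of the assumed defaults.
function withDefaults(overrides: Partial<ProcessingOptions>): ProcessingOptions {
  return { ...defaultOptions, ...overrides };
}

// Example: options for extracting invoice data from a Portuguese PDF.
const invoiceOptions = withDefaults({ language: "pt", extractMetadata: true });
```

Here `invoiceOptions` keeps the assumed `json` output format and table extraction while overriding only the language and metadata flags.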
Output
The node outputs data depending on the selected output mode:
- JSON Response: Outputs a JSON object containing the processed document data, which may include extracted text, tables, images (if requested), metadata, AI analysis results, and job-related information such as job ID and processing time.
- Binary File: Outputs the processed document as a binary file attached to the output item. This typically represents the processed document in PDF or another format returned by the API.
- Both: Outputs both the JSON data and the binary file together in the same item.
Common fields in the JSON output include:
- Extracted textual content or structured data.
- Metadata about the document.
- Job identifiers and processing timestamps.
- Flags indicating success or fallback usage.
If the API response includes binary data, the node decodes it from Base64 and attaches it to the output item as binary data for downstream nodes.
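The three output modes can be sketched as a small function that shapes one output item. Everything named here is an assumption for illustration: the `ApiResult` fields (`data`, `fileBase64`, `mimeType`, `fileName`), the fallback MIME type and file name, and the `buildOutputItem` helper do not come from the node's source. The Base64 decoder is written out by hand only to keep the sketch dependency-free; a real n8n node would more likely rely on Node's `Buffer` and n8n's binary helpers.

```typescript
type OutputMode = "json" | "binary" | "both";

// Hypothetical API response shape.
interface ApiResult {
  data: Record<string, unknown>;
  fileBase64?: string;
  mimeType?: string;
  fileName?: string;
}

interface BinaryAttachment {
  data: Uint8Array;
  mimeType: string;
  fileName: string;
}

interface OutputItem {
  json: Record<string, unknown>;
  binary?: { data: BinaryAttachment };
}

// Minimal standard-alphabet Base64 decoder (no URL-safe variant).
function decodeBase64(input: string): Uint8Array {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const clean = input.replace(/\s+/g, "").replace(/=+$/, "");
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = alphabet.indexOf(clean.charAt(i));
    if (value < 0) throw new Error(`Invalid Base64 character at index ${i}`);
    buffer = ((buffer << 6) | value) & 0xffff; // keep only the low bits
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return new Uint8Array(bytes);
}

// Shape one output item according to the selected output mode.
function buildOutputItem(result: ApiResult, mode: OutputMode): OutputItem {
  const item: OutputItem = { json: mode === "binary" ? {} : result.data };
  if (mode !== "json" && result.fileBase64 !== undefined) {
    item.binary = {
      data: {
        data: decodeBase64(result.fileBase64),
        mimeType: result.mimeType ?? "application/octet-stream",
        fileName: result.fileName ?? "document",
      },
    };
  }
  return item;
}
```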
Dependencies
- Requires an API key credential for authenticating with the external N8N Tools API platform.
- The node communicates with the N8N Tools API endpoints over HTTPS.
- Proper network access to the API URL and publicly accessible document URLs is necessary.
- No additional local dependencies; all processing is done remotely via the API.
Troubleshooting
- Invalid subscription or API key: If the API returns 401 or 403 errors during validation, verify that the API key credential is correct and active.
- No binary data found under property: When using the binary input source, ensure the specified binary property exists and contains valid data.
- Unknown operation or input source: Confirm that the selected operation and input source are supported and correctly configured.
- Document processing timeout: If asynchronous processing times out (after ~30 seconds), the node attempts synchronous fallback. Persistent timeouts may indicate issues with the API's Redis service or processing queue availability.
- Document processing failed: Errors returned from the API during processing are surfaced by the node. Check the error message for details and verify document accessibility and format.
- Network or connectivity issues: Ensure the node can reach the API endpoint and the document URL is publicly accessible.
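The timeout and fallback behaviour described above can be sketched as a polling loop. This is a minimal sketch under stated assumptions, not the node's actual implementation: the `DocumentApi` interface, its method names, the `JobStatus` states, and the attempt-based timeout are all hypothetical. The API calls and the `sleep` function are injected so the control flow can be exercised without a network.

```typescript
// Hypothetical job states returned by a status endpoint.
type JobStatus =
  | { state: "pending" }
  | { state: "completed"; result: unknown }
  | { state: "failed"; error: string };

// Hypothetical client interface; method names are assumptions.
interface DocumentApi {
  submitJob(documentUrl: string): Promise<string>; // resolves to a job ID
  getJobStatus(jobId: string): Promise<JobStatus>;
  processSync(documentUrl: string): Promise<unknown>;
}

interface PollSettings {
  maxAttempts: number; // e.g. 30 attempts with a 1-second sleep is roughly 30 s
  sleep: () => Promise<void>;
}

interface ProcessOutcome {
  jobId: string;
  result: unknown;
  usedFallback: boolean;
}

async function processDocument(
  api: DocumentApi,
  documentUrl: string,
  settings: PollSettings,
): Promise<ProcessOutcome> {
  const jobId = await api.submitJob(documentUrl);

  // Poll until the job finishes, fails, or the attempt budget runs out.
  for (let attempt = 0; attempt < settings.maxAttempts; attempt++) {
    const status = await api.getJobStatus(jobId);
    if (status.state === "completed") {
      return { jobId, result: status.result, usedFallback: false };
    }
    if (status.state === "failed") {
      throw new Error(`Document processing failed: ${status.error}`);
    }
    await settings.sleep();
  }

  // Asynchronous processing timed out: fall back to a synchronous request.
  const result = await api.processSync(documentUrl);
  return { jobId, result, usedFallback: true };
}
```

The `usedFallback` flag in this sketch corresponds to the kind of fallback indicator the JSON output is described as carrying; a failed job is surfaced as an error rather than retried synchronously.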
Links and References
- n8n Documentation
- N8N Tools Platform API Documentation
- OCR Language Codes Reference
- Working with Binary Data in n8n