N8N Tools - Document Processor
Overview
The N8N Tools - Document Processor node enables automated processing of documents using OCR (Optical Character Recognition), text extraction, and AI-powered analysis. It supports multiple input sources including binary files, URLs, and Base64-encoded content. The node sends documents to an external API service for processing and can return results in various formats.
This node is beneficial in scenarios such as:
- Extracting searchable text from scanned PDFs or images.
- Analyzing document content with AI to classify or summarize.
- Converting documents into structured data including tables and metadata.
- Automating workflows that require document ingestion and processing without manual intervention.
Practical examples:
- Automatically extracting invoice data from PDF attachments in emails.
- Processing scanned contracts to extract key clauses and metadata.
- Performing OCR on images uploaded by users to convert them into editable text.
- Using AI analysis to categorize large batches of documents by topic or type.
Properties
| Name | Meaning |
|---|---|
| Operation | The action to perform on the document. Options: Process Document, Process from URL, Extract Text Only, OCR Document, Analyze Document, Process Sync. |
| Input Source | Source of the document input. Options: Binary File (from workflow binary data), URL (publicly accessible document URL), Base64 (Base64 encoded document content). |
| Binary Property | Name of the binary property containing the file when Input Source is Binary File. Default is "data". |
| Document URL | Publicly accessible URL of the document when Input Source is URL or operation is Process from URL. |
| Document (Base64) | Base64 encoded document content when Input Source is Base64. |
| Processing Options | Collection of options controlling processing details: • Output Format: JSON, Text, Markdown, HTML • Language: OCR language code (auto, en, pt, es, fr, de, etc.) • Include Images: whether to extract images • Include Tables: preserve table structures • AI Analysis: enable AI-powered content analysis • Extract Metadata: extract document metadata like author, creation date |
| Output | How to return processed results. Options: JSON Response (default), Binary File (processed document as binary), Both (JSON data and binary file). |
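The dependencies between these properties (which fields are required for which input source) can be made explicit with a small validation sketch. The option lists below are copied from the table; the snake_case field names, the `validate` helper, and the `False` defaults for the boolean options are hypothetical and are not taken from the node's source.

```python
from dataclasses import dataclass, field
from typing import List

# Option values copied from the Properties table.
OPERATIONS = {
    "Process Document", "Process from URL", "Extract Text Only",
    "OCR Document", "Analyze Document", "Process Sync",
}
INPUT_SOURCES = {"Binary File", "URL", "Base64"}
OUTPUT_FORMATS = {"JSON", "Text", "Markdown", "HTML"}
OUTPUT_MODES = {"JSON Response", "Binary File", "Both"}


@dataclass
class ProcessingOptions:
    # Boolean defaults are placeholders; the table does not state them.
    output_format: str = "JSON"
    language: str = "auto"
    include_images: bool = False
    include_tables: bool = False
    ai_analysis: bool = False
    extract_metadata: bool = False


@dataclass
class NodeParameters:
    operation: str
    input_source: str
    binary_property: str = "data"      # documented default
    document_url: str = ""
    document_base64: str = ""
    output: str = "JSON Response"      # documented default
    options: ProcessingOptions = field(default_factory=ProcessingOptions)


def validate(params: NodeParameters) -> List[str]:
    """Return a list of problems; an empty list means the parameters are consistent."""
    problems = []
    if params.operation not in OPERATIONS:
        problems.append(f"Unknown operation: {params.operation}")
    if params.input_source not in INPUT_SOURCES:
        problems.append(f"Unknown input source: {params.input_source}")
    if params.output not in OUTPUT_MODES:
        problems.append(f"Unknown output mode: {params.output}")
    if params.options.output_format not in OUTPUT_FORMATS:
        problems.append(f"Unknown output format: {params.options.output_format}")

    # Document URL applies to the URL input source and to Process from URL.
    needs_url = params.input_source == "URL" or params.operation == "Process from URL"
    if needs_url and not params.document_url:
        problems.append("Document URL is required")
    if params.input_source == "Base64" and not params.document_base64:
        problems.append("Document (Base64) is required")
    if params.input_source == "Binary File" and not params.binary_property:
        problems.append("Binary Property is required")
    return problems
```

For example, `validate(NodeParameters(operation="OCR Document", input_source="Binary File"))` returns an empty list, because the Binary Property falls back to its default of "data".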
Output
The node outputs data in one of three ways depending on the selected Output property:
JSON Response: Returns a JSON object containing extracted text, metadata, AI analysis results, and other structured information about the document. This includes fields like `extractedText`, `content`, `metadata`, `jobId`, and processing timestamps.
Binary File: Returns the processed document as a binary file attached to the output item. The binary data is prepared with an appropriate filename and MIME type (usually PDF).
Both: Returns both the JSON data and the binary file together in the output item.
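The three output modes can be illustrated with a minimal sketch of how an output item might be assembled. It assumes the common n8n item shape with a `json` key and an optional `binary` key holding Base64 data, a MIME type, and a filename. The function name `build_output_item`, the `"data"` binary property name, and the default filename are hypothetical.

```python
import base64


def build_output_item(result: dict, file_bytes: bytes, mode: str,
                      file_name: str = "processed.pdf",
                      mime_type: str = "application/pdf") -> dict:
    """Assemble one output item for the selected Output mode."""
    if mode not in ("JSON Response", "Binary File", "Both"):
        raise ValueError(f"Unknown output mode: {mode}")

    item = {"json": {}}
    if mode in ("JSON Response", "Both"):
        # Structured results (extracted text, metadata, and so on).
        item["json"] = result
    if mode in ("Binary File", "Both"):
        # Binary payloads are carried as Base64 text next to their MIME type.
        item["binary"] = {
            "data": {
                "data": base64.b64encode(file_bytes).decode("ascii"),
                "mimeType": mime_type,
                "fileName": file_name,
            }
        }
    return item
```

With `mode="Both"`, the returned item carries both keys, so downstream nodes can read the structured fields and still pass the file along.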
If asynchronous processing is used, the node polls the job status up to 3 times with 10-second intervals. If the job does not complete in time, it falls back to synchronous processing to ensure results are returned.
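The poll-then-fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the node's actual implementation: `get_status` and `process_sync` are hypothetical callables standing in for the API calls, and waiting before each status check (rather than after) is a guess, chosen so that three polls add up to roughly 30 seconds.

```python
import time
from typing import Callable, Optional


def poll_then_fallback(get_status: Callable[[], Optional[dict]],
                       process_sync: Callable[[], dict],
                       max_polls: int = 3,
                       interval_seconds: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll an async job a fixed number of times, then fall back to sync processing.

    get_status returns the finished result, or None while the job is still running.
    """
    for _ in range(max_polls):
        sleep(interval_seconds)      # give the job time before each check
        result = get_status()
        if result is not None:
            return result            # job finished within the polling window
    # The job did not complete in time: process synchronously instead.
    return process_sync()
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in normal use the default `time.sleep` applies.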
Dependencies
- Requires an API key credential for authenticating with the external N8N Tools API service.
- The node makes HTTP requests to the configured API URL endpoint.
- The external service must be reachable and the API key valid.
- The workflow environment should allow outgoing HTTPS requests.
- No additional local dependencies are required.
Troubleshooting
Invalid subscription or API key: A 401 or 403 status code in the error message means the API key is invalid or the subscription has expired. Verify and update the credentials.
No binary data found under property: When using binary input source, ensure the specified binary property exists and contains valid file data.
Unknown operation or input source: Check that the selected operation and input source values are supported and correctly spelled.
Document processing timeout: If the async job does not complete within ~30 seconds (3 polls at 10-second intervals), the node attempts synchronous fallback. Persistent timeouts may indicate issues with the external processing queue or Redis service connectivity.
Document processing failed: Generic failure messages may include error details from the API. Review document format compatibility and API service status.
Missing or malformed document URL/Base64: Ensure URLs are publicly accessible and Base64 strings are properly encoded.
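Malformed URL and Base64 inputs can be caught before the document is sent to the API. The sketch below uses only the Python standard library; the helper names `is_http_url` and `decode_base64_document` are hypothetical, and accepting a `data:` URI prefix is an assumption about how Base64 content might be supplied. It checks only the shape of a URL; confirming that a URL is publicly accessible would require an actual request.

```python
import base64
import binascii
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """Check that the value is an absolute http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def decode_base64_document(value: str) -> bytes:
    """Decode Base64 document content, raising ValueError if it is malformed."""
    # Drop an optional "data:<mime>;base64," prefix.
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    # Remove whitespace and line breaks that often appear in wrapped Base64.
    cleaned = "".join(value.split())
    if not cleaned:
        raise ValueError("Base64 document is empty")
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Malformed Base64 document: {exc}") from exc
```

Running these checks in a step before the node gives a clearer error message than a generic processing failure returned by the API.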
Links and References
This summary is based solely on static analysis of the provided source code and property definitions.