N8N Tools - Document Processor

Process documents with OCR, text extraction, and AI analysis using the N8N Tools platform.

Overview

The N8N Tools - Document Processor node enables automated processing of documents using OCR (Optical Character Recognition), text extraction, and AI-powered analysis. It supports multiple input sources including binary files, URLs, and Base64-encoded content. The node sends documents to an external API service for processing and can return results in various formats.

This node is useful in scenarios such as:

  • Extracting searchable text from scanned PDFs or images.
  • Performing AI-driven classification or content analysis on documents.
  • Converting documents into structured data including tables and metadata.
  • Automating workflows that require document digitization and understanding.

Practical examples:

  • Automatically extracting invoice data from uploaded PDF files.
  • Analyzing contracts or legal documents for key clauses using AI.
  • Processing images of receipts captured via mobile devices to extract text and tables.
  • Archiving documents with embedded metadata and searchable content.

Properties

  • Operation: The type of document operation to perform. Options: Process Document, Process from URL, Extract Text Only, OCR Document, Analyze Document, Process Sync.
  • Input Source: Source of the document input. Options: Binary File (from workflow binary data), URL (publicly accessible document URL), Base64 (Base64-encoded document content).
  • Binary Property: Name of the binary property containing the file when Input Source is Binary File. Default is "data".
  • Document URL: Publicly accessible URL of the document when Input Source is URL or when using the Process from URL operation.
  • Document (Base64): Base64-encoded document content when Input Source is Base64.
  • Processing Options: Collection of options controlling processing details:
      • Output Format: JSON, Text, Markdown, HTML
      • Language: OCR language code (auto, en, pt, es, fr, de, etc.)
      • Include Images: whether to extract images
      • Include Tables: whether to preserve table structures
      • AI Analysis: whether to enable AI-powered content analysis
      • Extract Metadata: whether to extract document metadata such as author and creation date
  • Output: How to return processed results. Options: JSON Response (processed data as JSON), Binary File (processed document as binary), Both (JSON data and binary file).
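
As a rough illustration of how these properties relate to one another, the sketch below models them as a TypeScript type and checks that the field required by each Input Source is present. The type, field names, and helper functions are hypothetical and chosen for readability; they are not the node's internal parameter names.

```typescript
// Hypothetical parameter shape mirroring the properties listed above.
type InputSource = "binary" | "url" | "base64";

interface DocumentProcessorParams {
  operation: string;         // e.g. "Process Document"
  inputSource: InputSource;
  binaryProperty?: string;   // used when inputSource is "binary"
  documentUrl?: string;      // used when inputSource is "url"
  documentBase64?: string;   // used when inputSource is "base64"
  options?: {
    outputFormat?: "json" | "text" | "markdown" | "html";
    language?: string;       // OCR language code such as "auto" or "en"
    includeImages?: boolean;
    includeTables?: boolean;
    aiAnalysis?: boolean;
    extractMetadata?: boolean;
  };
}

// Returns a list of problems; an empty list means the parameters look consistent.
function validateParams(p: DocumentProcessorParams): string[] {
  const problems: string[] = [];
  if (p.inputSource === "url" && !p.documentUrl) {
    problems.push("Document URL is required when Input Source is URL");
  }
  if (p.inputSource === "base64" && !p.documentBase64) {
    problems.push("Document (Base64) is required when Input Source is Base64");
  }
  return problems;
}

// Resolves the binary property name, falling back to the documented default "data".
function binaryPropertyName(p: DocumentProcessorParams): string {
  return p.binaryProperty && p.binaryProperty.length > 0 ? p.binaryProperty : "data";
}
```

A configuration with Input Source set to URL but no Document URL would yield one problem from `validateParams`, while a Binary File configuration without an explicit property name resolves to "data".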

Output

The node outputs data in one of three ways depending on the selected Output property:

  • JSON Response: Outputs a JSON object containing extracted text, metadata, AI analysis results, and other structured information returned by the API. The object includes fields such as extractedText, content, metadata, jobId, and processing timestamps.

  • Binary File: Outputs the processed document as a binary file attached to the output item. The binary data is Base64-decoded and given an appropriate filename and MIME type (usually PDF).

  • Both: Outputs both the JSON response and the binary file together in the same item.

Additionally, each output includes metadata about the operation, success status, timestamp, and node version.
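
The three output modes described above could map onto an output item roughly as follows. This is a minimal sketch under stated assumptions: the `fileBase64`, `fileName`, and `mimeType` response fields, the default filename, and the `buildOutputItem` helper are hypothetical, and for brevity the payload stays Base64-encoded instead of being decoded.

```typescript
type OutputMode = "json" | "binary" | "both";

// Hypothetical API response. Only extractedText, content, metadata and jobId
// are named in the documentation; the file-related fields are assumptions.
interface ApiResponse {
  jobId?: string;
  extractedText?: string;
  content?: string;
  metadata?: Record<string, unknown>;
  fileBase64?: string;
  fileName?: string;
  mimeType?: string;
}

interface BinaryEntry {
  data: string; // kept Base64-encoded in this sketch
  fileName: string;
  mimeType: string;
}

interface OutputItem {
  json: Record<string, unknown>;
  binary?: { data: BinaryEntry };
}

function buildOutputItem(mode: OutputMode, res: ApiResponse): OutputItem {
  // Operation metadata attached to every item, whatever the mode.
  const meta: Record<string, unknown> = {
    success: true,
    timestamp: new Date().toISOString(),
  };

  // Extracted data is only included for the "json" and "both" modes.
  const json: Record<string, unknown> =
    mode === "binary"
      ? meta
      : {
          ...meta,
          jobId: res.jobId,
          extractedText: res.extractedText,
          content: res.content,
          metadata: res.metadata,
        };

  const item: OutputItem = { json };

  if (mode === "binary" || mode === "both") {
    if (!res.fileBase64) {
      throw new Error("No binary data returned from API");
    }
    item.binary = {
      data: {
        data: res.fileBase64,
        fileName: res.fileName ?? "document.pdf",    // assumed default name
        mimeType: res.mimeType ?? "application/pdf", // the text says "usually PDF"
      },
    };
  }
  return item;
}
```

In this sketch, requesting binary output from a response without file content fails early with the same message listed under Troubleshooting.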

Dependencies

  • Requires an API key credential for the N8N Tools API platform.
  • The node communicates with the external N8N Tools API endpoints over HTTPS.
  • The API URL and API key must be configured in the node credentials.
  • The external service handles document processing asynchronously; the node polls the job status for up to 30 seconds before falling back to synchronous processing.
  • Network connectivity to the API endpoint is required.
  • No internal Redis or queue configuration is needed on the user side, but timeouts may indicate backend service issues.
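
The poll-then-fall-back behavior described in the list above can be sketched as follows. This is an illustration only, not the node's actual implementation: the `getStatus` and `processSync` callbacks, the job status shape, and the one-second polling interval are all assumptions. The clock and sleep functions are injectable so the logic can be exercised without waiting.

```typescript
// Hypothetical job status shape returned by a status endpoint.
type JobStatus =
  | { state: "pending" }
  | { state: "completed"; result: unknown }
  | { state: "failed"; error: string };

interface PollDeps {
  getStatus: (jobId: string) => Promise<JobStatus>; // hypothetical status call
  processSync: () => Promise<unknown>;              // hypothetical synchronous fallback
  sleep?: (ms: number) => Promise<void>;            // injectable for testing
  now?: () => number;                               // injectable for testing
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

async function waitForJob(
  jobId: string,
  deps: PollDeps,
  timeoutMs = 30000, // the 30-second window mentioned above
  intervalMs = 1000, // assumed polling interval
): Promise<unknown> {
  const sleep = deps.sleep ?? defaultSleep;
  const now = deps.now ?? (() => Date.now());
  const deadline = now() + timeoutMs;

  while (now() < deadline) {
    const status = await deps.getStatus(jobId);
    if (status.state === "completed") {
      return status.result;
    }
    if (status.state === "failed") {
      throw new Error(`Document processing failed: ${status.error}`);
    }
    await sleep(intervalMs);
  }

  // The job did not finish within the window: fall back to synchronous processing.
  return deps.processSync();
}
```

If the fallback call also rejects, the error propagates to the caller, which matches the advice under Troubleshooting to check the external service status in that case.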

Troubleshooting

  • No binary data found under property: Occurs if the specified binary property does not exist or contains no data. Verify the binary property name matches the incoming data.

  • Invalid subscription or API key: Indicates authentication failure with the external API. Check that the API key credential is valid and has proper permissions.

  • Document processing failed: Generic error returned if the API reports failure during processing. Review the error message for details.

  • Document processing timeout after 30 seconds: The asynchronous job did not complete in time, which may indicate backend queue or Redis service issues. The node attempts a synchronous fallback; if that also fails, check the external service status.

  • Unknown operation or input source: A configuration error raised when an unsupported operation or input source is selected. Choose one of the options listed under Properties.

  • No binary data returned from API: Raised when binary output is requested but the API response contains no binary content. This may indicate processing issues or an unsupported document type.
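
The first error in the list above, a missing binary property, is often the quickest to diagnose. The sketch below shows one way such a check could look; the `WorkflowItem` shape and the `requireBinary` helper are hypothetical, and listing the available property names in the message is an addition for debugging, not something the node is documented to do.

```typescript
// Hypothetical shape of an incoming workflow item with optional binary attachments.
interface WorkflowItem {
  json: Record<string, unknown>;
  binary?: Record<string, { data: string; mimeType?: string; fileName?: string }>;
}

// Returns the binary entry stored under propertyName, or throws a descriptive error.
function requireBinary(item: WorkflowItem, propertyName = "data") {
  const entry = item.binary?.[propertyName];
  if (!entry || entry.data.length === 0) {
    const available = Object.keys(item.binary ?? {});
    const hint =
      available.length > 0 ? ` (available: ${available.join(", ")})` : " (item has no binary data)";
    throw new Error(`No binary data found under property "${propertyName}"${hint}`);
  }
  return entry;
}
```

For an item whose file arrived under a property named "file", calling `requireBinary(item)` with the default name would fail and list "file" as available, which points directly at the Binary Property setting to correct.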

To resolve most issues:

  • Confirm all required properties are set correctly.
  • Validate API credentials.
  • Ensure the document input is accessible and properly formatted.
  • Check network connectivity.
  • Retry or contact the API provider if backend errors persist.
