N8N Tools - Document Processor

Process documents with OCR, text extraction, and AI analysis using the N8N Tools platform.

Overview

The N8N Tools - Document Processor node enables automated processing of documents through OCR (Optical Character Recognition), text extraction, and AI-powered analysis. It supports multiple input sources including binary files, URLs, and Base64-encoded content. The node sends documents to the N8N Tools platform API for processing and returns extracted text, metadata, images, tables, or analyzed content in various formats.

This node is useful for:

  • Extracting searchable text from scanned PDFs or images.
  • Automating data extraction from invoices, contracts, or reports.
  • Performing AI-driven classification or summarization of document contents.
  • Integrating document processing into workflows without manual intervention.

Practical examples:

  • Automatically extracting invoice details from uploaded PDF files.
  • Processing publicly accessible URLs of documents to extract text and tables.
  • Using AI analysis to classify documents by type or topic within a workflow.

Properties

  • Input Source: selects the source of the document to process:
    • Binary File: process a file from binary data in the workflow.
    • URL: process a document from a public URL.
    • Base64: process a document from a Base64 string.
  • Binary Property (required if Input Source is Binary File): name of the binary property containing the file to process.
  • Document URL (required if Input Source is URL): publicly accessible URL of the document to process.
  • Document (Base64) (required if Input Source is Base64): Base64-encoded content of the document.
  • Processing Options: collection of options to customize processing:
    • Output Format: format of the extracted content (JSON, Text, Markdown, or HTML).
    • Language: OCR language code, or 'auto'.
    • Include Images: whether to extract images.
    • Include Tables: whether to preserve table structures.
    • AI Analysis: enable AI-powered content analysis.
    • Extract Metadata: extract document metadata such as author and creation date.
  • Output: how to return results:
    • JSON Response: return the processed data as JSON.
    • Binary File: return the processed document as a binary file.
    • Both: return both the JSON data and the binary file.
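The "required if" rules above can be sketched as a small pre-flight check that runs before a document is handed to the node. This is only an illustration: the parameter keys (inputSource, binaryProperty, documentUrl, documentBase64) and the function name are hypothetical, since the node's internal parameter names are not documented here.

```python
import base64
import binascii
from urllib.parse import urlparse

# Hypothetical parameter keys (assumption): the node's real internal names
# are not documented on this page.
REQUIRED_BY_SOURCE = {
    "binary": "binaryProperty",
    "url": "documentUrl",
    "base64": "documentBase64",
}


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means the parameters look usable."""
    source = params.get("inputSource")
    if source not in REQUIRED_BY_SOURCE:
        return [f"unsupported input source: {source!r}"]

    key = REQUIRED_BY_SOURCE[source]
    value = params.get(key)
    if not value:
        return [f"{key} is required when input source is {source}"]

    problems = []
    if source == "url":
        # Only a syntactic check; it cannot tell whether the URL is public.
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("documentUrl must be an absolute http(s) URL")
    elif source == "base64":
        try:
            base64.b64decode(value, validate=True)
        except (binascii.Error, ValueError):
            problems.append("documentBase64 is not valid Base64")
    return problems
```

A check like this catches the most common configuration mistakes (a missing property for the chosen input source, a malformed URL, a corrupted Base64 string) before any API request is made.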

Output

The node outputs an array of items with the following structure depending on the selected output mode:

  • JSON Response:
    The json field contains the processed document data, which may include:

    • Extracted text or structured content (tables, images if enabled).
    • Metadata such as author, creation date (if enabled).
    • AI analysis results (if enabled).
    • Job and processing status information.
  • Binary File:
    The binary field contains the processed document file (e.g., PDF) returned from the API, prepared for further use in the workflow. The json field includes metadata about the file.

  • Both:
    Contains both the JSON data and the binary file as described above.

If the API returns no binary data when binary output is requested, a warning is included in the JSON output.
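A downstream step may want to check which parts of an item actually arrived. The sketch below assumes the usual n8n item shape with json and binary keys; the "warning" field name and the output-mode strings ("json", "binary", "both") are assumptions for illustration, not documented names.

```python
def summarize_item(item: dict, output_mode: str) -> dict:
    """Report which parts of an output item are present.

    Assumptions: output_mode is one of "json", "binary", "both", and the
    warning (if any) is stored under a hypothetical "warning" key in json.
    """
    json_part = item.get("json") or {}
    binary_part = item.get("binary") or {}
    wants_binary = output_mode in ("binary", "both")
    return {
        "has_json": bool(json_part),
        "has_binary": bool(binary_part),
        # True when a binary file was requested but the API returned none.
        "missing_binary": wants_binary and not binary_part,
        "warning": json_part.get("warning"),
    }
```

For example, an item requested with the Both mode that carries only JSON would be flagged with missing_binary set to True, so the workflow can branch instead of failing later on a missing file.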

Dependencies

  • Requires an API key credential for the N8N Tools platform to authenticate requests.
  • The node communicates with the N8N Tools API endpoints over HTTP.
  • Network access to the API URL and to any document URLs used must be available.
  • No additional external libraries are required beyond n8n's built-in helpers.

Troubleshooting

  • Common Issues:

    • A missing or invalid API key causes authentication errors.
    • An incorrect binary property name, or missing binary data, causes a "No binary data found" error.
    • Document URLs must be publicly accessible; private or protected URLs will fail.
    • Large documents or complex processing may exceed the 30-second asynchronous timeout, in which case the node falls back to synchronous processing.
  • Error Messages:

    • "N8N Tools API: Invalid subscription or API key. Please check your credentials."
      Indicates authentication failure; verify API key correctness and subscription status.
    • "No binary data found under property \"<propertyName>\""
      Means the specified binary property does not exist or is empty; check the input data.
    • "Document processing failed: <error>"
      General processing failure; review error details and ensure document validity.
    • "Document processing timeout after 30 seconds..."
      Indicates async processing took too long; the node attempts synchronous fallback automatically.
    • "Unknown operation: <operation>"
      The operation parameter is invalid; select a supported operation.
  • Recommendations:

    • Ensure all required parameters are set according to the chosen input source.
    • Use small test documents initially to validate configuration.
    • Check network connectivity to the API and document URLs.
    • Enable "Continue On Fail" in the node settings to handle errors gracefully in workflows.
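The timeout behaviour described under Error Messages (asynchronous processing for up to 30 seconds, then an automatic synchronous fallback) can be sketched roughly as follows. This is not the node's actual implementation: submit_async, poll_status, and process_sync are hypothetical callables standing in for the API requests, and the "state", "result", and "error" keys are assumed for illustration.

```python
import time


def process_with_fallback(submit_async, poll_status, process_sync,
                          timeout_s=30.0, interval_s=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll an async job until it finishes or the timeout elapses,
    then fall back to a synchronous call.

    All three callables are hypothetical placeholders for API requests.
    """
    job_id = submit_async()
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = poll_status(job_id)
        if status.get("state") == "completed":
            return status["result"]
        if status.get("state") == "failed":
            raise RuntimeError(f"Document processing failed: {status.get('error')}")
        sleep(interval_s)
    # Timeout reached: retry the whole document synchronously.
    return process_sync()
```

Passing the clock and sleep functions as parameters keeps the sketch easy to test without real waiting, which is also why small test documents are a sensible first step when validating a configuration.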
