PDF & Excel Processor

Process PDF and Excel files

Overview

The PDF & Excel Processor node handles PDF and Excel files supplied as binary data. It can extract text from PDFs (with or without OCR), retrieve PDF metadata, read data from a specific worksheet in an Excel file, or list all worksheets in a workbook. This makes it useful for automating document-processing workflows, such as extracting information from scanned invoices, digitizing paper documents with OCR, or feeding spreadsheet data into automated pipelines.

Practical examples:

  • Extracting text from a batch of PDF reports and sending the content to an email or database.
  • Using OCR to process scanned PDF receipts and extract relevant fields.
  • Reading sales data from a specific worksheet in an Excel file and pushing it to a CRM system.

Properties

Name             Type     Meaning
File Type        options  Select whether the input file is a PDF or an Excel document.
Binary Property  string   The name of the binary property containing the file data (e.g., "data").

Output

  • The output is a JSON object with the following structure:
    • All original json properties from the input item are preserved.
    • A new field pdfResults is added, which contains:
      • The result of the selected operation (e.g., extracted text, metadata, worksheet data).
      • operation: The operation performed.
      • success: Boolean indicating if the operation succeeded.
      • timestamp: ISO timestamp of when the operation was performed.
  • The original binary data is passed through unchanged.

Example output:

{
  "json": {
    "...": "original input fields",
    "pdfResults": {
      "...": "operation-specific results",
      "operation": "extractText",
      "success": true,
      "timestamp": "2024-06-01T12:34:56.789Z"
    }
  },
  "binary": {
    "...": "original binary data"
  }
}
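
The output shape described above can be illustrated with a minimal Python sketch. It is not the node's implementation: the function name build_output_item, the sample field invoiceId, and the result key text are hypothetical and chosen only for illustration. It shows the original json fields being preserved, pdfResults being added with operation, success, and timestamp, and the binary data being passed through unchanged.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_output_item(
    item: Dict[str, Any],
    operation: str,
    result: Dict[str, Any],
    success: bool = True,
) -> Dict[str, Any]:
    """Return a new item with a pdfResults field merged into the original json."""
    # ISO 8601 timestamp in UTC with millisecond precision, e.g. 2024-06-01T12:34:56.789Z
    timestamp = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    pdf_results = {
        **result,  # operation-specific results (hypothetical keys)
        "operation": operation,
        "success": success,
        "timestamp": timestamp,
    }
    return {
        # Original json properties are preserved; pdfResults is added alongside them.
        "json": {**item.get("json", {}), "pdfResults": pdf_results},
        # Binary data is passed through unchanged.
        "binary": item.get("binary", {}),
    }


# Hypothetical input item; "JVBERi0=" is the base64 encoding of the bytes "%PDF-".
item = {"json": {"invoiceId": 42}, "binary": {"data": {"data": "JVBERi0="}}}
out = build_output_item(item, "extractText", {"text": "Hello"})
```

A downstream step would typically check out["json"]["pdfResults"]["success"] before reading the operation-specific fields.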

Dependencies

  • No external API keys required.
  • Relies on internal processor modules (ProcessorFactory) for handling PDF and Excel operations.
  • Requires the binary data to be present in the specified property.

Troubleshooting

Common issues:

  • No binary data found:
    Error: "No binary data found"
    Resolution: Ensure that the incoming item has a binary property with the expected name.

  • Binary property 'X' not found:
    Error: "Binary property 'X' not found"
    Resolution: Double-check the value set for "Binary Property" and ensure the input item contains this property.

  • Binary data in property 'X' is invalid or missing data content:
    Error: "Binary data in property 'X' is invalid or missing data content"
    Resolution: Make sure the binary data is correctly attached and not empty.

  • Failed to create buffer from binary data:
    Error: "Failed to create buffer from binary data: ..."
    Resolution: The binary data is likely corrupted or not valid base64. Re-create the binary property from the original file in the upstream step.
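
The four errors above suggest a sequence of checks on the incoming item. The following Python sketch reproduces that sequence so an item can be checked before it reaches the node. It is an assumption about the order of the checks, not the node's actual code; the function names read_binary and diagnose are hypothetical, and the item layout (a binary object whose entries carry base64 text in a data field) is inferred from the error messages.

```python
import base64
import binascii
from typing import Any, Dict, Optional


def read_binary(item: Dict[str, Any], property_name: str = "data") -> bytes:
    """Decode the base64 payload stored under item["binary"][property_name]."""
    binary = item.get("binary")
    if not binary:
        raise ValueError("No binary data found")
    if property_name not in binary:
        raise ValueError(f"Binary property '{property_name}' not found")
    entry = binary[property_name]
    if not isinstance(entry, dict) or not entry.get("data"):
        raise ValueError(
            f"Binary data in property '{property_name}' is invalid or missing data content"
        )
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(entry["data"], validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"Failed to create buffer from binary data: {exc}") from exc


def diagnose(item: Dict[str, Any], property_name: str = "data") -> Optional[str]:
    """Return the error message read_binary would raise, or None if the item is fine."""
    try:
        read_binary(item, property_name)
    except ValueError as exc:
        return str(exc)
    return None


# "JVBERi0=" is the base64 encoding of the bytes "%PDF-".
pdf_bytes = read_binary({"binary": {"data": {"data": "JVBERi0="}}})
```

Calling diagnose(item, "data") on a sample item before running the workflow gives a quick indication of which of the four cases applies.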
