Overview
The PDF & Excel Processor node processes binary PDF and Excel files. It can extract text (with or without OCR) and retrieve metadata from PDFs, read data from a specific worksheet in an Excel file, or list all worksheets. This makes it useful for automating document processing workflows, such as extracting information from scanned PDF invoices, digitizing paper documents with OCR, or feeding spreadsheet data into automated pipelines.
Practical examples:
- Extracting text from a batch of PDF reports and sending the content to an email or database.
- Using OCR to process scanned PDF receipts and extract relevant fields.
- Reading sales data from a specific worksheet in an Excel file and pushing it to a CRM system.
Properties
| Name | Type | Meaning |
|---|---|---|
| File Type | options | Select whether the input file is a PDF or Excel document. |
| Binary Property | string | The name of the binary property containing the file data (e.g., "data"). |
Output
- The output is a JSON object with the following structure:
  - All original `json` properties from the input item are preserved.
  - A new field `pdfResults` is added, which contains:
    - The result of the selected operation (e.g., extracted text, metadata, worksheet data).
    - `operation`: The operation performed.
    - `success`: Boolean indicating whether the operation succeeded.
    - `timestamp`: ISO timestamp of when the operation was performed.
- The original binary data is passed through unchanged.
Example output:
```json
{
  "json": {
    "...": "original input fields",
    "pdfResults": {
      "...": "operation-specific results",
      "operation": "extractText",
      "success": true,
      "timestamp": "2024-06-01T12:34:56.789Z"
    }
  },
  "binary": {
    "...": "original binary data"
  }
}
```
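A downstream step usually wants only the operation-specific part of `pdfResults`. The following is a minimal sketch of how that could be done, not part of the node itself. The interface names, the `extractPayload` helper, and the `text` and `invoiceId` fields are hypothetical and used only for illustration; the only fields taken from the structure above are `operation`, `success`, and `timestamp`.

```typescript
// Bookkeeping fields documented above. Any other key inside pdfResults is
// operation-specific, so it is typed as unknown here.
interface PdfResults {
  operation: string;
  success: boolean;
  timestamp: string;
  [key: string]: unknown;
}

interface ProcessorOutputJson {
  pdfResults?: PdfResults;
  [key: string]: unknown;
}

const BOOKKEEPING_KEYS = ["operation", "success", "timestamp"];

// Hypothetical helper: returns the operation-specific payload, or throws if
// pdfResults is absent or the node did not report success.
function extractPayload(json: ProcessorOutputJson): Record<string, unknown> {
  const results = json.pdfResults;
  if (!results) {
    throw new Error("pdfResults is missing from the item");
  }
  if (!results.success) {
    throw new Error(`Operation "${results.operation}" did not succeed`);
  }
  const payload: Record<string, unknown> = {};
  for (const key of Object.keys(results)) {
    if (BOOKKEEPING_KEYS.indexOf(key) === -1) {
      payload[key] = results[key];
    }
  }
  return payload;
}

// Example item. "invoiceId" and "text" are made-up fields for illustration.
const sampleJson: ProcessorOutputJson = {
  invoiceId: "A-1001",
  pdfResults: {
    text: "Total due: 42.00",
    operation: "extractText",
    success: true,
    timestamp: "2024-06-01T12:34:56.789Z",
  },
};

const payload = extractPayload(sampleJson);
```

Checking `success` before reading the payload keeps a failed operation from being passed along as if it were empty data.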
Dependencies
- No external API keys required.
- Relies on internal processor modules (`ProcessorFactory`) for handling PDF and Excel operations.
- Requires the binary data to be present in the specified property.
Troubleshooting
Common issues:
- No binary data found:
  - Error: "No binary data found"
  - Resolution: Ensure that the incoming item has a binary property with the expected name.
- Binary property 'X' not found:
  - Error: "Binary property 'X' not found"
  - Resolution: Double-check the value set for "Binary Property" and ensure the input item contains this property.
- Binary data in property 'X' is invalid or missing data content:
  - Error: "Binary data in property 'X' is invalid or missing data content"
  - Resolution: Make sure the binary data is correctly attached and not empty.
- Failed to create buffer from binary data:
  - Error: "Failed to create buffer from binary data: ..."
  - Resolution: The binary data may be corrupted or not properly base64-encoded.
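To catch these problems before the item reaches the node, the same checks can be run in an earlier step. The sketch below mirrors the four documented errors in order. It is not the node's actual implementation: the `diagnoseBinaryInput` function, the interface names, and the assumption that each binary property holds its content as a base64 string under `data` are all illustrative, and the suffix of the last message is this sketch's own wording.

```typescript
// Assumed item shape: each binary property stores its content as a base64
// string under "data". This is an assumption made for the sketch.
interface BinaryEntry {
  data?: string;
  mimeType?: string;
  fileName?: string;
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry | undefined>;
}

// Loose base64 check: groups of four characters with optional "=" padding.
const BASE64_PATTERN =
  /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

// Hypothetical pre-check: returns a message matching the documented error
// that the input would likely trigger, or null if no problem is detected.
function diagnoseBinaryInput(item: InputItem, propertyName: string): string | null {
  if (!item.binary) {
    return "No binary data found";
  }
  const entry = item.binary[propertyName];
  if (!entry) {
    return `Binary property '${propertyName}' not found`;
  }
  if (typeof entry.data !== "string" || entry.data.length === 0) {
    return `Binary data in property '${propertyName}' is invalid or missing data content`;
  }
  if (!BASE64_PATTERN.test(entry.data)) {
    // The text after the colon is this sketch's wording, not the node's.
    return "Failed to create buffer from binary data: not valid base64";
  }
  return null;
}
```

For example, `diagnoseBinaryInput(item, "data")` on an item whose file was attached under a different property name returns the "not found" message, which points directly at the "Binary Property" setting.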