
PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Actions: 80

Overview

This node converts PDF documents into Excel spreadsheets. The PDF can be supplied as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the file. The conversion can be tuned by choosing a quality level (Draft or Quality), enabling OCR for scanned PDFs, and deciding whether to merge all sheets into one or keep them separate. The node is useful when tabular data embedded in PDFs must be extracted for analysis, reporting, or spreadsheet-based workflows.

Practical examples:

  • Extracting financial tables from monthly PDF reports into Excel for data analysis.
  • Converting scanned invoices or receipts into editable Excel files using OCR.
  • Automating data extraction from PDF forms submitted online and converting them into structured Excel sheets.
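
The three input methods described in the overview can be sketched as a small resolver that turns each one into raw PDF bytes. This is a minimal illustration, not the node's actual implementation: the function names, the `"binaryData"`, `"base64"`, and `"url"` identifiers, and the `fetch` parameter are all hypothetical.

```python
import base64
from typing import Callable, Mapping, Optional
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download the PDF bytes from a URL (used for the URL input type)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def resolve_pdf_bytes(
    input_data_type: str,
    *,
    binary_fields: Optional[Mapping[str, bytes]] = None,
    binary_field_name: str = "data",
    base64_content: str = "",
    pdf_url: str = "",
    fetch: Callable[[str], bytes] = fetch_url,
) -> bytes:
    """Return the PDF as bytes for one of the three (hypothetically named) input types."""
    if input_data_type == "binaryData":
        fields = binary_fields or {}
        if binary_field_name not in fields:
            raise KeyError(f"No binary property named {binary_field_name!r} on the input item")
        return fields[binary_field_name]
    if input_data_type == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(base64_content, validate=True)
    if input_data_type == "url":
        return fetch(pdf_url)
    raise ValueError(f"Unknown input data type: {input_data_type!r}")
```

Passing the downloader in as `fetch` keeps the URL branch testable without network access.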

Properties

Name | Meaning
--- | ---
Input Data Type | Choose how to provide the PDF file to convert to Excel. Options: Binary Data (from previous node), Base64 String (provide encoded content), URL (link to PDF file).
Input Binary Field | Name of the binary property containing the PDF file when using Binary Data input type (usually "data").
Base64 PDF Content | Base64 encoded string representing the PDF document content, used when Input Data Type is Base64 String.
PDF URL | URL to the PDF file to convert, used when Input Data Type is URL.
Quality Type | Select conversion quality: Draft (faster, suitable for simple PDFs with clear tables) or Quality (slower but more accurate, better for complex layouts).
Language | OCR language setting for text recognition in images or scanned PDFs (e.g., English).
Merge All Sheets | Boolean option to combine all Excel sheets into one sheet (true) or keep them as separate sheets (false).
Output Format | Boolean option to preserve original formatting when possible during conversion.
OCR When Needed | Boolean option to enable OCR (Optical Character Recognition) for scanned PDFs that require text extraction.
Output File Name | Name for the resulting Excel output file (default: "PDF_to_EXCEL_output.xlsx").
Document Name | Name of the source PDF file for reference purposes (default: "output.pdf").
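
To make the conversion options concrete, the sketch below models them as a Python dataclass with basic validation. It is an illustration only. The class name, the snake_case field names, and every default except the two file names documented above are assumptions, and the resulting dictionary is not the PDF4me API's actual request schema.

```python
from dataclasses import asdict, dataclass

# The two quality levels listed in the properties table.
QUALITY_TYPES = ("Draft", "Quality")


@dataclass
class PdfToExcelOptions:
    """Hypothetical container mirroring the node's conversion properties."""

    quality_type: str = "Draft"        # assumed default
    language: str = "English"          # assumed default
    merge_all_sheets: bool = True      # assumed default
    output_format: bool = True         # assumed default
    ocr_when_needed: bool = True       # assumed default
    output_file_name: str = "PDF_to_EXCEL_output.xlsx"  # documented default
    document_name: str = "output.pdf"                   # documented default

    def __post_init__(self) -> None:
        if self.quality_type not in QUALITY_TYPES:
            raise ValueError(f"quality_type must be one of {QUALITY_TYPES}")
        # Design choice of this sketch: make sure the output name ends in .xlsx.
        if not self.output_file_name.lower().endswith(".xlsx"):
            self.output_file_name += ".xlsx"

    def to_payload(self) -> dict:
        """Return the options as a plain dictionary (illustrative key names)."""
        return asdict(self)
```

For example, `PdfToExcelOptions(quality_type="Quality", merge_all_sheets=False).to_payload()` yields a dictionary that keeps one sheet per source table and uses the slower, more accurate mode.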

Output

The node outputs JSON data describing the result of the PDF to Excel conversion. Typically, the Excel file content is included either as binary data attached to the output item or as a downloadable file, named according to the "Output File Name" property.

If the node supports binary output, the output item contains the Excel file in binary form, which can be passed to subsequent nodes for saving, emailing, or further processing.
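
When the Excel file is handled outside n8n, a downstream step might sanity-check the bytes before saving them. The sketch below relies on the fact that an .xlsx workbook is a ZIP archive containing a `[Content_Types].xml` part. The function names and the `directory` parameter are hypothetical.

```python
import io
import zipfile
from pathlib import Path


def looks_like_xlsx(content: bytes) -> bool:
    """Return True if the bytes are a ZIP archive with a '[Content_Types].xml' part."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return False
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return "[Content_Types].xml" in archive.namelist()


def save_excel_output(
    content: bytes,
    directory: Path,
    output_file_name: str = "PDF_to_EXCEL_output.xlsx",
) -> Path:
    """Write the converted workbook to `directory` and return the final path."""
    if not looks_like_xlsx(content):
        raise ValueError("Output does not look like an .xlsx workbook")
    # Path(...).name drops any directory components in the supplied file name.
    target = directory / Path(output_file_name).name
    target.write_bytes(content)
    return target
```

The check is deliberately shallow: it confirms the container format, not that the workbook holds the expected sheets.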

Dependencies

  • Requires access to the PDF4me API, the external service that performs the PDF to Excel conversion, including OCR.
  • Needs proper authentication credentials (such as an API key or token) configured in n8n to interact with the external service.
  • Network access to fetch PDF files if URLs are used as input.
  • The OCR language setting requires the service to support the specified language.

Troubleshooting

  • Common issues:

    • Invalid or inaccessible PDF URL leading to download failures.
    • Incorrect binary property name causing the node to not find the PDF file in input data.
    • Unsupported PDF formats or encrypted PDFs may cause conversion errors.
    • OCR failures if the language is not supported or the PDF quality is too low.
    • Large or complex PDFs might lead to timeouts or slow processing.
  • Error messages and resolutions:

    • "File not found in binary property": Verify the binary property name matches the actual input.
    • "Failed to download PDF from URL": Check URL accessibility and network connectivity.
    • "Conversion failed due to unsupported format": Ensure the PDF is not corrupted or encrypted.
    • "OCR language not supported": Use a supported language code or disable OCR if not needed.
    • "API authentication error": Confirm that the API key or credentials are correctly set up in n8n.
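
Several of the issues above can be caught before the file is sent for conversion. The following is a rough pre-flight check, assuming the PDF bytes are already in memory. The function name and messages are hypothetical, and the encryption test is only a heuristic that looks for an `/Encrypt` entry, so it can produce false positives.

```python
from typing import List


def preflight_pdf(content: bytes) -> List[str]:
    """Return a list of likely problems; an empty list means none were found."""
    if not content:
        return ["input is empty"]
    problems: List[str] = []
    # A PDF file starts with a '%PDF-' header; allow some leading junk bytes.
    if b"%PDF-" not in content[:1024]:
        problems.append("missing %PDF- header (file may be corrupted or not a PDF)")
    # Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer.
    if b"/Encrypt" in content:
        problems.append("file appears to be encrypted")
    return problems
```

Running this before the conversion call gives a clearer local error than waiting for the service to reject the file.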
