
PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Actions: 80

Overview

This node parses documents using a specified parse configuration. The document can be supplied as binary data from a previous node, as a base64 encoded string, or as a URL pointing to the document file. The parsed output is returned either as JSON data or as a text file.
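
For the Base64 String option, the document bytes must be encoded before they reach the node. A minimal Python sketch using only the standard library; it assumes the node expects plain base64 (standard alphabet, "=" padding, no `data:` URI prefix), which this page does not state. The helper names are hypothetical.

```python
import base64
from pathlib import Path


def encode_bytes(raw: bytes) -> str:
    # Standard base64 alphabet with "=" padding and no line breaks.
    # Assumption: the node expects plain base64, without a
    # "data:application/pdf;base64," prefix.
    return base64.b64encode(raw).decode("ascii")


def encode_document(path: str) -> str:
    # Read the whole file into memory and encode it.
    return encode_bytes(Path(path).read_bytes())
```

For example, `encode_document("invoice.pdf")` (a placeholder path) returns a string that could be supplied as Base64 Document Content.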

Use it when structured data must be extracted from documents automatically, for example from invoices, contracts, reports, or any other documents that need to be analyzed programmatically and converted into usable data formats.

Practical examples:

  • Extracting invoice details like amounts, dates, and vendor information from PDF files.
  • Parsing contract documents to identify key clauses or metadata.
  • Converting scanned documents or PDFs into structured JSON for further automation workflows.

Properties

  • Input Data Type: How to provide the document to parse. Options: Binary Data (document file from a previous node), Base64 String (base64 encoded document content), URL (link to the document file).
  • Input Binary Field: Name of the binary property that contains the document file. Used only when Input Data Type is Binary Data.
  • Base64 Document Content: Base64 encoded content of the document. Used only when Input Data Type is Base64 String.
  • Document URL: URL of the document file to parse. Used only when Input Data Type is URL.
  • Document Name: Name of the source document file, for reference purposes.
  • Parse ID: GUID that identifies the parse configuration (template) to apply to the document.
  • Output Format: Format of the parsed output. Options: JSON (parsed data as JSON), Text File (parsed data as a downloadable text file).
  • Output File Name: Name of the output file when Output Format is Text File.
  • Advanced Options: Collection of additional options, including Custom Profiles, which lets you pass extra API call options as JSON profiles for advanced parsing configurations.
  • Binary Data Output Name: Custom name for the binary data field in the node's output.
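
The "used only when" rules above mean that exactly one input field matters for each Input Data Type. The following Python sketch checks a parameter set against those rules and against the GUID requirement for Parse ID before a workflow runs. The `ParseParams` class, its field names, and `validate` are hypothetical stand-ins for the property labels, not the node's internal parameter names; the input type values are the option labels listed above.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParseParams:
    # Hypothetical field names mirroring the property labels; they are not
    # the node's internal parameter identifiers.
    input_data_type: str  # "Binary Data", "Base64 String", or "URL"
    parse_id: str
    binary_field: Optional[str] = None
    base64_content: Optional[str] = None
    document_url: Optional[str] = None


def validate(p: ParseParams) -> List[str]:
    """Return a list of problems; an empty list means the parameters look consistent."""
    problems: List[str] = []

    # Parse ID must be a GUID; uuid.UUID raises ValueError for malformed strings.
    try:
        uuid.UUID(p.parse_id)
    except ValueError:
        problems.append("Parse ID is not a valid GUID")

    # Each input type reads exactly one field; the other two are ignored.
    required = {
        "Binary Data": ("Input Binary Field", p.binary_field),
        "Base64 String": ("Base64 Document Content", p.base64_content),
        "URL": ("Document URL", p.document_url),
    }
    if p.input_data_type not in required:
        problems.append(f"Unknown Input Data Type: {p.input_data_type!r}")
    else:
        label, value = required[p.input_data_type]
        if not value:
            problems.append(f"{label} is required when Input Data Type is {p.input_data_type}")
    return problems
```

For example, `validate(ParseParams("URL", "not-a-guid"))` reports both the malformed Parse ID and the missing Document URL.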

Output

The node returns one output item for each input item it processes. Each output item contains:

  • A json field with the parsed document data, when Output Format is JSON.
  • When Output Format is Text File, the parsed content as binary data (the text file content) under the binary property set by Binary Data Output Name (default "data").

The output structure therefore depends on the selected output format: either structured JSON data or a downloadable text file.
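
A downstream step can handle either output shape. This Python sketch assumes the items follow n8n's usual `{"json": ..., "binary": ...}` structure, that binary content is stored inline as base64 (n8n's default in-memory binary mode; with filesystem storage the content is not inline), and that the text file is UTF-8. `read_parsed` is a hypothetical helper name.

```python
import base64
from typing import Any, Dict, Union


def read_parsed(item: Dict[str, Any], binary_name: str = "data") -> Union[Dict[str, Any], str]:
    """Return the parsed result from one output item (hypothetical helper).

    Assumes n8n's {"json": ..., "binary": ...} item shape, inline base64
    binary content, and UTF-8 text in the Text File case.
    """
    binary = item.get("binary") or {}
    if binary_name in binary:
        # Text File output: the file content is base64 under binary[<name>]["data"].
        return base64.b64decode(binary[binary_name]["data"]).decode("utf-8")
    # JSON output: the parsed fields are the item's "json" object.
    return item["json"]
```

Pass the value of Binary Data Output Name as `binary_name` if it was changed from the default.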

Dependencies

  • Requires access to the external document parsing service (the PDF4me API), where the parse configuration referenced by Parse ID must exist.
  • Needs appropriate API authentication credentials configured in n8n to communicate with the parsing service.
  • Network access to fetch documents if URLs are used as input.
  • The internal credential name is not documented here; configure your API key or token in n8n according to your environment.

Troubleshooting

  • Common issues:

    • Invalid or missing Parse ID: The node requires a valid GUID for the parsing template; ensure this is correctly provided.
    • Incorrect input data type or missing binary data: When using binary input, verify the binary property name matches the actual input.
    • Network errors when using URL input: Ensure the URL is accessible and publicly reachable by the service.
    • Malformed base64 string: Check that the base64 content is complete and correctly encoded.
  • Error messages:

    • Authentication errors indicate missing or misconfigured API credentials.
    • Parsing errors may occur if the document format is unsupported or the parse configuration does not match the document structure.
    • "File not found" or access errors with URL input point to network or permission problems.
  • Resolutions:

    • Double-check all required parameters and credentials.
    • Test document accessibility independently before running the node.
    • Use the advanced options to customize parsing profiles if default parsing fails.
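
The base64 and URL checks from the list above can be run before executing the node. A minimal Python sketch using the standard library; the function names are hypothetical. A successful URL check only shows that the document is reachable from the machine running the script, not necessarily from the parsing service.

```python
import base64
import urllib.request


def is_valid_base64(text: str) -> bool:
    """Strictly decode base64 content, ignoring whitespace such as line breaks."""
    cleaned = "".join(text.split())
    if not cleaned:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # bad padding also raises binascii.Error, a subclass of ValueError.
        base64.b64decode(cleaned, validate=True)
    except ValueError:
        return False
    return True


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request from this machine and report whether it succeeds.

    Some servers reject HEAD; retry with GET if this returns False unexpectedly.
    """
    if not url.lower().startswith(("http://", "https://")):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False
```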
