Overview
This node parses a document using a specified parse configuration (template). The document can be supplied as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the file. The parsed output is returned either as JSON data or as a text file.
The node is useful when structured data must be extracted automatically from documents such as invoices, contracts, or reports and converted into a format that later workflow steps can consume.
Practical examples:
- Extracting invoice details like amounts, dates, and vendor information from PDF files.
- Parsing contract documents to identify key clauses or metadata.
- Converting scanned documents or PDFs into structured JSON for further automation workflows.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | How the document to parse is provided. Options: Binary Data (document file from a previous node), Base64 String (base64-encoded document content), URL (link to the document file). |
| Input Binary Field | Name of the binary property containing the document file (used only if Input Data Type is Binary Data). |
| Base64 Document Content | Base64-encoded content of the document (used only if Input Data Type is Base64 String). |
| Document URL | URL to the document file to parse (used only if Input Data Type is URL). |
| Document Name | Name of the source document file for reference purposes. |
| Parse ID | GUID identifying the parse configuration/template to use for parsing the document. |
| Output Format | Format of the parsed document output. Options: JSON (parsed data as JSON), Text File (parsed data as a downloadable text file). |
| Output File Name | Name for the output file when Output Format is set to Text File. |
| Advanced Options | Collection of additional options. Includes "Custom Profiles", which sets extra API call options through a JSON profile for advanced parsing configurations. |
| Binary Data Output Name | Custom name for the binary data field in the node's output. |
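The table implies that exactly one source field is needed, depending on Input Data Type, and that Parse ID is always needed. That dependency can be sketched as a small pre-flight check. This is only an illustration and not the node's implementation: the keys `inputDataType`, `binaryPropertyName`, `base64Content`, `documentUrl`, and `parseId` are hypothetical names chosen for this example.

```python
# Hypothetical parameter names; the node's internal names may differ.
REQUIRED_SOURCE_FIELD = {
    "binaryData": "binaryPropertyName",
    "base64": "base64Content",
    "url": "documentUrl",
}


def missing_parameters(params: dict) -> list:
    """Return the names of required parameters that are absent or empty."""
    missing = []
    source_field = REQUIRED_SOURCE_FIELD.get(params.get("inputDataType"))
    if source_field is None:
        # Unknown or absent input type: the source field cannot be determined.
        missing.append("inputDataType")
    elif not params.get(source_field):
        missing.append(source_field)
    if not params.get("parseId"):
        missing.append("parseId")
    return missing
```

An empty list means every field required for the chosen input type is present; it says nothing about whether the values themselves are valid.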
Output
The node outputs an array of items, one for each input item processed. Each output item contains:
- A `json` field with the parsed document data when the output format is JSON.
- If the output format is Text File, the parsed content is provided as binary data under a custom-named binary property (default name "data"), representing the text file content.
Thus, the output structure adapts based on the selected output format, providing either structured JSON data or a downloadable text file.
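Code that consumes the output can branch on which field is present. The sketch below assumes the common n8n item layout, in which binary content is held base64-encoded under `binary[<property>]["data"]`. That layout is an assumption rather than something this page specifies, and it may not hold when n8n is configured to store binary data outside memory. The helper name `read_parsed_output` is hypothetical.

```python
import base64


def read_parsed_output(item: dict, binary_property: str = "data"):
    """Return the parsed data from one output item.

    Returns a str for Text File output and the JSON payload for JSON output.
    Assumes binary content is stored base64-encoded inside the item.
    """
    binary = item.get("binary") or {}
    if binary_property in binary:
        # Text File output: decode the base64 payload into text.
        raw = base64.b64decode(binary[binary_property]["data"])
        return raw.decode("utf-8")
    # JSON output: the parsed data is already structured.
    return item.get("json", {})
```

If Binary Data Output Name was changed from its default, pass the same name as `binary_property`.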
Dependencies
- Requires access to an external document parsing service configured with the specified parse configuration ID.
- Needs appropriate API authentication credentials configured in n8n to communicate with the parsing service.
- Network access to fetch documents if URLs are used as input.
- No internal credential names are exposed; users must configure their API keys or tokens as per their environment.
Troubleshooting
Common issues:
- Invalid or missing Parse ID: The node requires a valid GUID for the parsing template; ensure this is correctly provided.
- Incorrect input data type or missing binary data: When using binary input, verify the binary property name matches the actual input.
- Network errors when using URL input: Ensure the URL is accessible and publicly reachable by the service.
- Malformed base64 string: Validate the base64 content is correctly encoded without corruption.
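Two of the issues above, an invalid Parse ID and a malformed base64 string, can be caught locally before the node runs. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and accepting only the hyphenated GUID form is an assumption of this example; the service may accept other forms.

```python
import base64
import binascii
import uuid


def is_valid_guid(value) -> bool:
    """True if value is a GUID in hyphenated form (case-insensitive)."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def is_valid_base64(value: str) -> bool:
    """True if value is non-empty and strictly valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return bool(value)
```

Strict validation also rejects embedded whitespace and line breaks, so strip those first if the string was copied from a wrapped source.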
Error messages:
- Errors related to authentication failures indicate misconfigured or missing API credentials.
- Parsing errors may occur if the document format is unsupported or the parse configuration does not match the document structure.
- File not found or inaccessible errors when using URLs suggest network or permission issues.
Resolutions:
- Double-check all required parameters and credentials.
- Test document accessibility independently before running the node.
- Use the advanced options to customize parsing profiles if default parsing fails.
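Document accessibility can be tested independently with a short script. The sketch below sends a HEAD request using the Python standard library. The function name `is_reachable` and its `opener` parameter (which exists only so the check can be exercised without network access) are hypothetical. It reports reachability from the machine that runs it, which does not guarantee that the parsing service can reach the same URL, and some servers reject HEAD requests even when GET works.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Send a HEAD request and report whether the final response is 2xx."""
    try:
        # Request() raises ValueError for URLs without a scheme.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```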
Links and References
- PDF4me Developer Profiles Documentation — for customizing parsing profiles and advanced options.
- General documentation on document parsing APIs and best practices for preparing input data.