PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Actions: 80

Overview

This node extracts tables from PDF documents. The PDF can be supplied in three ways: as binary data from a previous node, as a base64 encoded string, or as a URL pointing to the file. This lets table extraction fit into workflows where PDFs arrive from different sources.
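As a minimal sketch of preparing the base64 input outside n8n, the snippet below encodes PDF bytes with Python's standard library. The helper name `pdf_to_base64` and the placeholder bytes are illustrative assumptions and are not part of the node.

```python
import base64


def pdf_to_base64(pdf_bytes: bytes) -> str:
    """Return the given bytes as an ASCII base64 string without line breaks."""
    return base64.b64encode(pdf_bytes).decode("ascii")


# Placeholder bytes for illustration only; in practice, read a real file,
# e.g. pathlib.Path("invoice.pdf").read_bytes() (hypothetical path).
sample = b"%PDF-1.4\n% placeholder, not a complete PDF\n"
encoded = pdf_to_base64(sample)
```

The resulting string is what would be pasted or mapped into the base64 content field when the base64 input method is selected.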

Common scenarios where this node is beneficial include:

- Automating data extraction from invoices, reports, or forms contained in PDFs.
- Extracting tabular data for further processing, analysis, or storage in databases.
- Integrating with document management systems to parse and index table content automatically.

For example, a user could upload a PDF invoice as binary data, then use this node to extract the invoice line items table for accounting automation.

Properties

Name	Meaning
Input Data Type	Choose how to provide the PDF file to extract tables from. Options: Binary Data (use the PDF file from a previous node), Base64 String (provide the PDF content as a base64 encoded string), or URL (provide a URL to the PDF file).
Input Binary Field	Name of the binary property that contains the PDF file (usually "data" for file uploads). Required if Input Data Type is Binary Data.
Base64 PDF Content	Base64 encoded PDF document content. Required if Input Data Type is Base64 String.
PDF URL	URL to the PDF file to extract tables from. Required if Input Data Type is URL.
Document Name	Name of the document used for processing. Defaults to "document.pdf".
Advanced Options	Collection of additional options for customizing the extraction process. For example, you can specify custom profiles in JSON format to adjust API call properties or enable specific features supported by the underlying service.
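The required-field rules in the table above can be sketched as a small lookup. This is an illustration only: the dictionary keys are the display names from the table, and the node's internal parameter names may differ. The function `missing_required_field` is hypothetical.

```python
from typing import Optional

# Mapping taken from the property table: each input data type
# requires exactly one accompanying field. Display names are used
# here; the node's internal parameter names are not documented.
REQUIRED_FIELD = {
    "Binary Data": "Input Binary Field",
    "Base64 String": "Base64 PDF Content",
    "URL": "PDF URL",
}


def missing_required_field(params: dict) -> Optional[str]:
    """Return the name of the missing required field, or None if it is set."""
    input_type = params.get("Input Data Type")
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown Input Data Type: {input_type!r}")
    field = REQUIRED_FIELD[input_type]
    value = params.get(field)
    # Treat absent, non-string, or whitespace-only values as missing.
    if isinstance(value, str) and value.strip():
        return None
    return field
```

For example, calling it with `{"Input Data Type": "Base64 String"}` reports `"Base64 PDF Content"` as the field that still needs a value.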

Output

The node outputs JSON data representing the extracted tables from the PDF document. The structure typically includes rows and columns corresponding to the tables found within the PDF. Each item in the output array corresponds to one input item processed.
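The exact JSON schema is not documented here, so the following is a sketch under a stated assumption: each output item is taken to hold a `tables` list whose entries carry a `rows` list of cell strings. Both key names are hypothetical and should be adjusted to the actual response. Under that assumption, each table can be turned into CSV text for storage or further processing.

```python
import csv
import io


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize one table (a list of rows of cell strings) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# HYPOTHETICAL item shape, for illustration only; the real output
# of the node may use different keys and nesting.
item = {"tables": [{"rows": [["Item", "Qty"], ["Widget", "2"]]}]}

csv_texts = [table_to_csv(table["rows"]) for table in item["tables"]]
```

Using the `csv` module instead of joining cells by hand keeps commas and quotes inside cell values correctly escaped.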

No binary output is documented for this operation; the main output is structured JSON describing the extracted tables.

Dependencies

- Requires access to the PDF4me API, the external service that performs the table extraction.
- Needs valid API credentials or authentication tokens configured in n8n to access that service.
- Requires network access to reach the service; with the URL input method, the PDF URL must also be reachable.

Troubleshooting

Common Issues:
- Providing incorrect or inaccessible URLs will cause failures in fetching the PDF.
- Incorrect base64 encoding or corrupted binary data will result in extraction errors.
- Leaving the field required by the selected input data type empty will cause the node to fail.
Error Messages:
- Errors related to invalid PDF format or unreadable content usually indicate issues with the input file.
- Authentication or permission errors suggest misconfigured API credentials.
- Timeout or network errors when using URL input indicate connectivity problems.
Resolutions:
- Verify the correctness and accessibility of the PDF source (binary, base64, or URL).
- Ensure API credentials are valid and have necessary permissions.
- Use the "Document Name" property to help identify files during troubleshooting.
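Some of the input problems listed above can be caught before the node runs. The sketch below is a heuristic pre-flight check, not the node's own validation: it confirms that a string is strict base64 that decodes to data starting with the `%PDF-` header, and that a URL is an absolute http(s) address. The function names are hypothetical, and the URL check verifies form only, not reachability.

```python
import base64
import binascii
from typing import Optional
from urllib.parse import urlparse


def check_base64_pdf(content: str) -> Optional[str]:
    """Return a problem description, or None if the input looks like a PDF."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(content, validate=True)
    except (binascii.Error, ValueError):
        return "not valid base64"
    if not raw.startswith(b"%PDF-"):
        return "decoded data does not start with the %PDF- header"
    return None


def check_pdf_url(url: str) -> Optional[str]:
    """Return a problem description, or None if the URL is well formed."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "URL must be an absolute http(s) address"
    return None
```

A `None` result means no problem was detected by these checks; the file can still be rejected by the service if it is corrupted further in.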

Links and References

PDF4me API Documentation — Reference for advanced options and custom profiles.
General background on PDF table extraction techniques and best practices is available in the documentation of PDF processing libraries and services.