Overview
This node extracts tables from PDF documents. The PDF can be supplied in three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the file. This lets table extraction fit into workflows where PDFs arrive from different sources.
Common scenarios where this node is beneficial include:
- Automating data extraction from invoices, reports, or forms contained in PDFs.
- Extracting tabular data for further processing, analysis, or storage in databases.
- Integrating with document management systems to parse and index table content automatically.
For example, a user could upload a PDF invoice as binary data, then use this node to extract the invoice line items table for accounting automation.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Choose how to provide the PDF file to extract tables from. Options: • Binary Data (use PDF file from previous node) • Base64 String (provide PDF content as base64 encoded string) • URL (provide URL to PDF file) |
| Input Binary Field | Name of the binary property that contains the PDF file (usually "data" for file uploads). Required if Input Data Type is Binary Data. |
| Base64 PDF Content | Base64 encoded PDF document content. Required if Input Data Type is Base64 String. |
| PDF URL | URL to the PDF file to extract tables from. Required if Input Data Type is URL. |
| Document Name | Name of the document used for processing. Defaults to "document.pdf". |
| Advanced Options | Collection of additional options for customizing the extraction process. For example, you can specify custom profiles in JSON format to adjust API call properties or enable specific features supported by the underlying service. |
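The table above implies a simple rule: each Input Data Type makes exactly one other property mandatory. The sketch below expresses that rule as a pre-flight check. It is illustrative only: the dictionary keys reuse the display labels from the table, not the node's internal parameter names, which are not documented here, and `missing_required_field` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Assumed mapping from each "Input Data Type" option to the property it
# makes mandatory. Keys and values mirror the labels in the table above.
REQUIRED_FIELD = {
    "Binary Data": "Input Binary Field",
    "Base64 String": "Base64 PDF Content",
    "URL": "PDF URL",
}


def missing_required_field(params: Mapping[str, str]) -> Optional[str]:
    """Return the label of the mandatory property that is empty, or None."""
    input_type = params.get("Input Data Type", "")
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown Input Data Type: {input_type!r}")
    field = REQUIRED_FIELD[input_type]
    value = params.get(field, "")
    # Whitespace-only values count as missing.
    return None if value.strip() else field
```

Running a check like this before the node executes would surface a missing field as a clear message instead of a failed API call.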
Output
The node outputs JSON data representing the extracted tables from the PDF document. The structure typically includes rows and columns corresponding to the tables found within the PDF. Each item in the output array corresponds to one input item processed.
Binary output is not documented for this node; the main output is structured JSON describing the extracted tables.
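Because the exact JSON schema is not specified here, the following is only a sketch of downstream handling. It assumes, purely for illustration, that one extracted table has already been reduced to a list of rows, each row being a list of cell values; the real output may nest or name these differently. `table_to_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Sequence


def table_to_csv(rows: Sequence[Sequence[object]]) -> str:
    """Serialize one table (assumed shape: list of rows of cells) as CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

For instance, `table_to_csv([["Item", "Qty"], ["Widget", 2]])` yields a two-line CSV string that can be written to a file or passed to a spreadsheet step.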
Dependencies
- Requires the PDF4me API, an external PDF processing service that performs the table extraction.
- Needs PDF4me API credentials configured in n8n to access the service.
- Network access is required if using the URL input method to fetch the PDF file.
Troubleshooting
Common Issues:
- Providing incorrect or inaccessible URLs will cause failures in fetching the PDF.
- Incorrect base64 encoding or corrupted binary data will result in extraction errors.
- Leaving a required input field empty for the selected input data type will cause the node to fail.
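Base64 and corruption problems can often be caught before the PDF reaches the node. The sketch below decodes a base64 string strictly and checks for the `%PDF-` signature that normally begins a PDF file. The function names are hypothetical, and the signature test is a simplification: it rejects files with leading bytes before the header, which some PDF readers tolerate.

```python
import base64
import binascii


def looks_like_pdf(data: bytes) -> bool:
    """Check for the '%PDF-' signature at the start of the data."""
    return data.startswith(b"%PDF-")


def decode_base64_pdf(content: str) -> bytes:
    """Decode a base64 string strictly and confirm it holds a PDF."""
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # including embedded whitespace and line breaks.
        raw = base64.b64decode(content, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Not valid base64: {exc}") from exc
    if not looks_like_pdf(raw):
        raise ValueError("Decoded content does not start with '%PDF-'")
    return raw
```

If the base64 string was wrapped across lines by another tool, strip the line breaks first or drop `validate=True`, depending on how strict the check needs to be.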
Error Messages:
- Errors related to invalid PDF format or unreadable content usually indicate issues with the input file.
- Authentication or permission errors suggest misconfigured API credentials.
- Timeout or network errors when using URL input indicate connectivity problems.
Resolutions:
- Verify the correctness and accessibility of the PDF source (binary, base64, or URL).
- Ensure API credentials are valid and have necessary permissions.
- Use the "Document Name" property to help identify files during troubleshooting.
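When the source is a URL, a quick syntactic check can rule out the most common mistakes before any network call is made. The sketch below only inspects the URL's shape; it does not confirm that the file is reachable or that it is a PDF. Restricting the scheme to `http` and `https` is an assumption, and `is_plausible_pdf_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_plausible_pdf_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A URL that passes this check can still fail at fetch time, for example when it requires authentication or points to a private network.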
Links and References
- PDF4me API Documentation: reference for advanced options and custom profiles.
- General information on PDF table extraction techniques and best practices can be found in various PDF processing libraries and services documentation.