Extract Resources
Overview
The Extract Resources operation extracts resources, such as text and images, from PDF documents. The PDF can be supplied in one of three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the file. Extraction options control whether text, images, or both are extracted, and advanced options select which pages to process and pass custom API profiles.
The operation suits workflows that process PDF content automatically, such as extracting text for analysis, retrieving embedded images for reuse, or preparing document contents for later workflow steps. For example, it can extract all images from a set of invoices or pull the text from specific pages of a contract PDF.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Choose how to provide the PDF file to extract resources from. Options: Binary Data (from previous node), Base64 String (directly provide base64 encoded PDF content), URL (link to PDF file). |
| Input Binary Field | Name of the binary property containing the PDF file when using Binary Data input type. Usually "data" for file uploads. |
| Base64 PDF Content | Base64 encoded string representing the PDF document content. Used when Input Data Type is "Base64 String". |
| PDF URL | URL to the PDF file to extract resources from. Used when Input Data Type is "URL". |
| Document Name | Name assigned to the document during processing. Defaults to "document.pdf". |
| Extraction Options | Collection of options that select what to extract from the PDF. Extract Text (boolean): whether to extract text content. Extract Images (boolean): whether to extract images. |
| Advanced Options | Additional extraction settings. Pages: the pages to extract from, given as "all", a comma-separated list such as "1,2", or a range such as "2-5". Custom Profiles: a JSON string for custom properties or API-specific options. |
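The Pages formats listed in the table can be checked before a workflow runs. The following is a minimal sketch in Python, not part of the node itself: the function name `parse_pages`, the 1-based page numbering, and the acceptance of mixed specs such as "1,3-4" are all assumptions made for illustration, and the service may apply different rules.

```python
import re


def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a page spec ("all", "1,2", "2-5") into 1-based page numbers.

    Hypothetical helper: the node's own validation rules are not documented
    here, so this only mirrors the three formats shown in the table.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        # A token is either a single page ("3") or an inclusive range ("2-5").
        match = re.fullmatch(r"(\d+)(?:\s*-\s*(\d+))?", part)
        if not match:
            raise ValueError(f"invalid page token: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page range out of bounds: {part!r}")
        pages.extend(range(start, end + 1))

    # Keep first-seen order while dropping duplicate page numbers.
    return list(dict.fromkeys(pages))
```

Calling `parse_pages("2-5", 6)` yields pages 2 through 5, while a reversed range such as "5-2" raises a `ValueError` that can be surfaced before the request is sent.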
Output
The output contains the resources extracted from the PDF document. The json field holds the extracted text, the extracted images, or both, depending on the selected extraction options. Extracted images may be returned as binary data or as encoded strings that later nodes in an n8n workflow can process.
Dependencies
- Requires access to the PDF file either as binary data, base64 content, or via a URL.
- Depends on an external PDF processing service to perform the extraction. The service is not named explicitly, but the PDF4me API documentation listed under Links and References suggests it is PDF4me.
- Requires API credentials or an authentication token for that service, configured in n8n.
Troubleshooting
- Common Issues:
  - Providing an incorrect or inaccessible URL may cause failures in fetching the PDF.
  - Incorrect base64 encoding or corrupted binary data will prevent successful extraction.
  - Specifying invalid page ranges in the "Pages" option could lead to errors or empty results.
- Error Messages:
  - Errors related to file retrieval usually indicate network issues or invalid URLs.
  - Extraction errors might mention unsupported file formats or corrupted PDFs.
- Resolutions:
  - Verify URLs and ensure the PDF is publicly accessible or properly authenticated.
  - Confirm base64 strings are correctly encoded without extra characters.
  - Use valid page range syntax and test with "all" if unsure.
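The base64 check from the resolutions above can be sketched in Python using only the standard library. The helper name `decode_pdf_base64` is hypothetical, and the stripping of a `data:` URI prefix is an assumption about how the string might have been produced. The node itself may accept or reject input differently.

```python
import base64
import binascii


def decode_pdf_base64(content: str) -> bytes:
    """Decode a base64 string and confirm the result looks like a PDF.

    Hypothetical pre-flight check, not the node's own validation.
    """
    # Assumption: the string may carry a data-URI prefix such as
    # "data:application/pdf;base64,"; drop it if present.
    if content.startswith("data:") and "," in content:
        content = content.split(",", 1)[1]

    # Remove line breaks and spaces that often sneak in when copying.
    cleaned = "".join(content.split())

    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc

    # PDF files normally begin with "%PDF-"; allow for a few leading bytes.
    if b"%PDF-" not in raw[:1024]:
        raise ValueError("decoded data does not contain a %PDF- header")
    return raw
```

Running this on the value intended for Base64 PDF Content separates encoding problems (a `ValueError` mentioning base64) from cases where the string decodes cleanly but is not a PDF.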
Links and References
- PDF4me API Documentation — for details on custom profiles and advanced extraction options.