Actions (80)
- Add Attachment To PDF
- Add Barcode To PDF
- Add Form Fields To PDF
- Add HTML Header Footer
- Add Image Stamp To PDF
- Add Image Watermark To Image
- Add Margin To PDF
- Add Page Number To PDF
- Add Text Stamp To PDF
- Add Text Watermark To Image
- AI-Invoice Parser
- AI-Process Contract
- AI-Process HealthCard
- Classify Document
- Compress Image
- Compress PDF
- Convert HTML To PDF
- Convert Image Format
- Convert JSON To Excel
- Convert Markdown To PDF
- Convert PDF To Editable PDF Using OCR
- Convert PDF To Excel
- Convert PDF To PowerPoint
- Convert PDF To Word
- Convert To PDF
- Convert URL to PDF
- Convert VISIO
- Convert Word to PDF Form
- Create Images From PDF
- Create PDF/A
- Create Swiss QR Bill
- Crop Image
- Delete Blank Pages From PDF
- Delete Unwanted Pages From PDF
- Disable Tracking Changes In Word
- Enable Tracking Changes In Word
- Extract Attachment From PDF
- Extract Form Data From PDF
- Extract Pages From PDF
- Extract Resources
- Extract Table From PDF
- Extract Text By Expression
- Extract Text From Word
- Fill PDF Form
- Find And Replace Text
- Flip Image
- Flatten PDF
- Generate Barcode
- Generate Document Single
- Generate Documents Multiple
- Get Document From Pdf4me
- Get Image Metadata
- Get PDF Metadata
- Get Tracking Changes In Word
- Image Extract Text
- Linearize PDF
- Merge Multiple PDFs
- Overlay PDFs
- Parse Document
- Protect PDF
- Read Barcode From Image
- Read Barcode From PDF
- Read SwissQR Code
- Remove EXIF Tags From Image
- Repair PDF Document
- Replace Text With Image
- Replace Text With Image In Word
- Resize Image
- Rotate Document
- Rotate Image
- Rotate Image By EXIF Data
- Rotate PDF Page
- Sign PDF
- Split PDF By Barcode
- Split PDF By Swiss QR
- Split PDF By Text
- Split PDF Regular
- Unlock PDF
- Update Hyperlinks Annotation
- Upload File To PDF4me
Overview
The Extract Resources operation extracts text content and embedded images from a PDF document. The PDF can be supplied in one of three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the PDF file.
Typical use cases include:
- Extracting textual content for indexing, searching, or further text processing.
- Extracting images for analysis, archiving, or reuse.
- Processing specific pages or the entire document.
- Returning extracted images either as metadata in JSON or as binary data for downstream nodes.
For example, a user might upload a PDF invoice and extract all text and images to automate data entry or archival workflows.
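When the PDF is supplied as a base64 string, the raw file bytes have to be encoded first. The following is a minimal sketch of that step using only the Python standard library; the helper name `pdf_bytes_to_base64` and the tiny stand-in payload are illustrative assumptions, not part of the node.

```python
import base64


def pdf_bytes_to_base64(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as an ASCII base64 string."""
    return base64.b64encode(pdf_bytes).decode("ascii")


# Stand-in for real PDF content; in practice the bytes would come from
# something like open("invoice.pdf", "rb").read() (hypothetical file name).
sample = b"%PDF-1.4\n%%EOF\n"
encoded = pdf_bytes_to_base64(sample)
```

The resulting string is what would be pasted or mapped into the Base64 PDF Content property.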
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Choose how to provide the PDF file to extract resources from. Options: • Binary Data (from previous node) • Base64 String (PDF content encoded as base64) • URL (link to PDF file) |
| Input Binary Field | Name of the binary property containing the PDF file (usually "data" for file uploads). Required if Input Data Type is Binary Data. |
| Base64 PDF Content | Base64 encoded PDF document content. Required if Input Data Type is Base64 String. |
| PDF URL | URL to the PDF file to extract resources from. Required if Input Data Type is URL. |
| Document Name | Name of the document used internally during processing. Defaults to "document.pdf". |
| Extract Text | Boolean flag indicating whether to extract text content from the PDF. Default is true. |
| Extract Images | Boolean flag indicating whether to extract images from the PDF. Default is false. |
| Return Images as Binary | Boolean flag indicating whether to return extracted images as binary data in addition to JSON metadata. Default is false. |
| Binary Data Name | Name for the binary data property in the output when returning images as binary. Default is "image". Only relevant if Return Images as Binary is true. |
| Advanced Options | Collection of additional options: • Pages: Specify pages to extract resources from using formats like "all", "1,2", or "2-5". • Custom Profiles: JSON string to adjust custom properties for API calls (advanced). |
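Because a malformed Pages value is easy to enter, it can help to check it before the node runs. The sketch below validates the three formats listed above ("all", a comma-separated list, and a range). It is an assumption that mixed values such as "1,3-5" are accepted by the service; `parse_pages` is a hypothetical helper, not part of the node.

```python
import re
from typing import List, Optional

# One token is either a single page ("3") or an inclusive range ("2-5").
_TOKEN = re.compile(r"^\d+(-\d+)?$")


def parse_pages(spec: str) -> Optional[List[int]]:
    """Return None for "all", otherwise sorted unique 1-based page numbers.

    Raises ValueError when the syntax is not recognised.
    """
    spec = spec.strip()
    if spec.lower() == "all":
        return None
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not _TOKEN.match(part):
            raise ValueError(f"invalid page token: {part!r}")
        start, _, end = part.partition("-")
        first, last = int(start), int(end or start)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)
```

For example, `parse_pages("2-5")` yields the pages 2 through 5, while `parse_pages("5-2")` raises `ValueError`.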
Output
The node outputs an array of items where each item contains a json field with extracted resource data:
- If Extract Text is enabled, the JSON includes the extracted text content from the specified pages.
- If Extract Images is enabled, the JSON includes metadata about the extracted images such as image type, size, and position.
- If Return Images as Binary is enabled, the node also outputs the actual image files as binary data under the property name specified by Binary Data Name (default "image").
This allows downstream nodes to process extracted text and/or images either as structured JSON data or as raw binary files.
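A downstream step might read one output item roughly as follows. This is a sketch under stated assumptions: the JSON keys `text` and `images` are hypothetical (the actual key names are not documented here), and the binary entry is assumed to hold base64 data under a `data` key, as n8n does when binary data is kept in memory.

```python
import base64


def summarize_item(item: dict, binary_name: str = "image") -> dict:
    """Summarize one output item: text length, image count, decoded image bytes."""
    payload = item.get("json") or {}
    text = payload.get("text") or ""        # hypothetical key name
    images = payload.get("images") or []    # hypothetical key name
    binary = (item.get("binary") or {}).get(binary_name)
    # Decode the image only when the item actually carries binary data.
    image_bytes = base64.b64decode(binary["data"]) if binary else None
    return {
        "text_length": len(text),
        "image_count": len(images),
        "image_bytes": image_bytes,
    }


# Hand-built sample item, not real node output.
sample_item = {
    "json": {"text": "Invoice 001", "images": [{"type": "png"}]},
    "binary": {
        "image": {
            "data": base64.b64encode(b"\x89PNG").decode("ascii"),
            "mimeType": "image/png",
        }
    },
}
summary = summarize_item(sample_item)
```

If Binary Data Name is changed from its default, the same name would be passed as `binary_name`.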
Dependencies
- Requires access to the PDF4me API service for PDF processing.
- Needs an API key credential configured in n8n to authenticate requests to the PDF4me service.
- Internet access is required if providing PDF input via URL.
Troubleshooting
Common issues:
- Providing an invalid or inaccessible PDF URL will cause extraction to fail.
- Incorrect base64 encoding or corrupted binary data input may result in errors.
- Specifying an invalid page range in the advanced options can cause errors or return no data.
- Disabling both Extract Text and Extract Images results in empty output. Note that Extract Images is off by default.
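Some of these input problems can be caught before the request is sent. The sketch below checks that a string is valid base64 and that the decoded bytes begin with the `%PDF-` signature. It is a heuristic only: the function name is hypothetical, and a file that passes can still be corrupted further in.

```python
import base64
import binascii


def looks_like_pdf_base64(value: str) -> bool:
    """Return True if value is valid base64 whose decoded bytes start with %PDF-."""
    # Drop line breaks and spaces, which often appear in wrapped base64.
    compact = "".join(value.split())
    try:
        raw = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(b"%PDF-")
```

A `False` result points to an encoding problem or a non-PDF input rather than to the node configuration.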
Error messages:
- Errors related to authentication usually indicate missing or invalid API credentials.
- File format errors suggest the input is not a valid PDF.
- Network errors occur if the URL is unreachable or the API service is down.
Resolutions:
- Verify the PDF input source and format.
- Check API key configuration and permissions.
- Validate page range syntax in advanced options.
- Enable continue-on-fail mode in n8n to handle individual item failures gracefully.
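The effect of continue-on-fail can be pictured as a per-item error boundary: a failing item yields an error record instead of stopping the whole run. The following is a generic illustration of that pattern, not the node's actual implementation; `run_items`, `handler`, and the shape of the error record are assumptions made for the example.

```python
from typing import Callable, List


def run_items(items: List[dict],
              handler: Callable[[dict], dict],
              continue_on_fail: bool = False) -> List[dict]:
    """Apply handler to each item; optionally record failures instead of raising."""
    results = []
    for index, item in enumerate(items):
        try:
            results.append(handler(item))
        except Exception as exc:
            if not continue_on_fail:
                raise  # default behaviour: the first failure stops the run
            # Assumed error-record shape, for illustration only.
            results.append({"json": {"error": str(exc), "itemIndex": index}})
    return results


def demo_handler(item: dict) -> dict:
    """Hypothetical handler that rejects items without a 'pdf' key."""
    if "pdf" not in item:
        raise ValueError("missing PDF input")
    return {"json": {"ok": True}}


outcome = run_items([{"pdf": "a"}, {}, {"pdf": "b"}], demo_handler,
                    continue_on_fail=True)
```

With the flag enabled, the second item produces an error record while the first and third are processed normally.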