PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Overview

The Extract Resources operation extracts text content and embedded images from a PDF document. The PDF can be supplied in one of three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the PDF file.

Typical use cases include:

  • Extracting textual content for indexing, searching, or further text processing.
  • Extracting images for analysis, archiving, or reuse.
  • Processing specific pages or the entire document.
  • Returning extracted images either as metadata in JSON or as binary data for downstream nodes.

For example, a user might upload a PDF invoice and extract all text and images to automate data entry or archival workflows.

Properties

Input Data Type: Choose how to provide the PDF file to extract resources from. Options:
  • Binary Data (from previous node)
  • Base64 String (PDF content encoded as base64)
  • URL (link to PDF file)
Input Binary Field: Name of the binary property containing the PDF file (usually "data" for file uploads). Required if Input Data Type is Binary Data.
Base64 PDF Content: Base64-encoded PDF document content. Required if Input Data Type is Base64 String.
PDF URL: URL of the PDF file to extract resources from. Required if Input Data Type is URL.
Document Name: Name of the document used internally during processing. Default is "document.pdf".
Extract Text: Boolean flag indicating whether to extract text content from the PDF. Default is true.
Extract Images: Boolean flag indicating whether to extract images from the PDF. Default is false.
Return Images as Binary: Boolean flag indicating whether to return extracted images as binary data in addition to JSON metadata. Default is false.
Binary Data Name: Name of the binary data property in the output when returning images as binary. Default is "image". Only relevant if Return Images as Binary is true.
Advanced Options: Collection of additional options:
  • Pages: Specify pages to extract resources from using formats like "all", "1,2", or "2-5".
  • Custom Profiles: JSON string to adjust custom properties for API calls (advanced).
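The Pages option can be checked before a workflow runs. The following is a minimal sketch in Python, assuming the accepted grammar is exactly what the three example formats suggest ("all", comma-separated page numbers, and hyphenated ranges). The helper name `parse_pages` is hypothetical, and whether the PDF4me API accepts mixed forms such as "1,3-4" is an assumption, not something this page confirms.

```python
import re
from typing import List, Optional

# Assumed grammar, inferred from the documented examples "all", "1,2", "2-5".
_PAGE_PART = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_pages(spec: str) -> Optional[List[int]]:
    """Return None for "all", otherwise a sorted list of 1-based page numbers.

    Raises ValueError for anything outside the assumed grammar.
    """
    spec = spec.strip()
    if spec.lower() == "all":
        return None
    pages = set()
    for part in spec.split(","):
        match = _PAGE_PART.match(part.strip())
        if not match:
            raise ValueError(f"invalid page spec part: {part!r}")
        start = int(match.group(1))
        # A single page "3" is treated as the range 3-3.
        end = int(match.group(2) or start)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Calling `parse_pages("2-5")` yields the individual page numbers, while a reversed range such as "5-2" raises `ValueError` locally instead of producing an empty or failed extraction later.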

Output

The node outputs an array of items where each item contains a json field with extracted resource data:

  • If Extract Text is enabled, the JSON includes the extracted text content from the specified pages.
  • If Extract Images is enabled, the JSON includes metadata about the extracted images such as image type, size, and position.
  • If Return Images as Binary is enabled, the node also outputs the actual image files as binary data under the property name specified by Binary Data Name (default "image").

This allows downstream nodes to process extracted text and/or images either as structured JSON data or as raw binary files.
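As an illustration of consuming the binary output downstream, here is a minimal Python sketch. It assumes each item follows the general n8n item layout, with an optional `binary` dictionary whose entries carry base64 text in `data` plus a `fileName`. That layout, the helper name `collect_images`, and the use of a single property name for every image are assumptions; this page does not specify the exact structure the node emits.

```python
import base64


def collect_images(items, binary_name="image"):
    """Return (file_name, raw_bytes) pairs for items that carry an image.

    `binary_name` should match the node's Binary Data Name setting.
    Items without that binary property (for example, text-only items)
    are skipped.
    """
    images = []
    for item in items:
        entry = item.get("binary", {}).get(binary_name)
        if entry is None:
            continue
        # Assumed: the image payload is stored as base64 text in "data".
        raw = base64.b64decode(entry["data"])
        images.append((entry.get("fileName", "unnamed"), raw))
    return images
```

If n8n is configured to keep binary data on disk rather than in memory, `data` may hold a reference instead of the payload, in which case this sketch would not apply as written.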

Dependencies

  • Requires access to the PDF4me API service for PDF processing.
  • Needs an API key credential configured in n8n to authenticate requests to the PDF4me service.
  • Internet access is required if providing PDF input via URL.

Troubleshooting

  • Common issues:

    • Providing an invalid or inaccessible PDF URL will cause extraction to fail.
    • Incorrect base64 encoding or corrupted binary data input may result in errors.
    • Specifying invalid page ranges in the advanced options can lead to no data being extracted or errors.
    • Forgetting to enable extraction flags (text/images) will result in empty outputs.
  • Error messages:

    • Errors related to authentication usually indicate missing or invalid API credentials.
    • File format errors suggest the input is not a valid PDF.
    • Network errors occur if the URL is unreachable or the API service is down.
  • Resolutions:

    • Verify the PDF input source and format.
    • Check API key configuration and permissions.
    • Validate page range syntax in advanced options.
    • Enable continue-on-fail mode in n8n to handle individual item failures gracefully.
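Malformed base64 input and non-PDF content can be caught before the request is sent. The sketch below, with the hypothetical helper name `check_base64_pdf`, uses only the Python standard library: it decodes the string strictly and checks for the `%PDF-` marker that PDF files begin with. It is a local sanity check under those assumptions, not a description of how the PDF4me service validates input.

```python
import base64
import binascii


def check_base64_pdf(content: str) -> bytes:
    """Decode a base64 string and confirm it looks like a PDF.

    Returns the decoded bytes, or raises ValueError with a reason.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # including whitespace and line breaks.
        raw = base64.b64decode(content, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    if not raw.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with the %PDF- header")
    return raw
```

Because strict decoding rejects line breaks, base64 text that was wrapped across lines needs its whitespace removed before it is passed in.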
