PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Overview

This node splits a PDF document wherever specific text occurs in it. It is useful when you have a large PDF file and want to divide it into smaller parts at each occurrence of a marker text, such as splitting a report by chapter titles, a batch of invoices by invoice number, or a contract by section headers.

Typical use cases include:

  • Automatically segmenting multi-page PDFs into logical sections for easier processing.
  • Extracting individual documents from a batch PDF by searching for identifying text.
  • Preparing documents for separate distribution or archival based on content markers.

For example, if you have a PDF containing multiple invoices concatenated together, you can search for text that appears on each invoice, such as its invoice number label, to split the PDF into individual invoice files.
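
The splitting behaviour described above can be illustrated with a small sketch. It is not the PDF4me implementation: it assumes the text of each page is already available as a string, treats the match as a plain substring search, and uses a hypothetical helper name, `split_ranges`, to show how "before" and "after" splits would turn matching pages into page ranges.

```python
from typing import List, Tuple


def split_ranges(page_texts: List[str], needle: str, mode: str = "before") -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) page ranges for each split part.

    Hypothetical model of the node's behaviour, not the service's actual code.
    """
    if mode not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    total = len(page_texts)
    if total == 0:
        return []

    # Pages (1-based) whose text contains the search string.
    hits = [number for number, text in enumerate(page_texts, start=1) if needle in text]

    # Each segment starts at page 1 or at a cut point derived from a hit.
    if mode == "before":
        starts = {1} | set(hits)
    else:
        # A split after the last page would create an empty part, so skip it.
        starts = {1} | {page + 1 for page in hits if page < total}

    ordered = sorted(starts)
    ranges = []
    for position, start in enumerate(ordered):
        end = ordered[position + 1] - 1 if position + 1 < len(ordered) else total
        ranges.append((start, end))
    return ranges
```

With five pages where pages 1, 3, and 5 contain the search text, a "before" split in this model yields the ranges (1, 2), (3, 4), (5, 5), while an "after" split yields (1, 1), (2, 3), (4, 5). If no page matches, the whole document comes back as a single range.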

Properties

Name: Meaning

Input Data Type: Choose how to provide the PDF data. Options:
  • Base64 String: Provide the PDF content as a base64-encoded string.
  • Binary Data: Use the PDF file from a previous node.
  • URL: Provide a URL to the PDF file.

Binary Property Name: The name of the binary property that contains the PDF file when using "Binary Data" as the input type.

Base64 Content: The base64-encoded PDF content when using "Base64 String" as the input type.

PDF URL: The URL of the PDF file when using "URL" as the input type.

Text to Search: The text string to search for in the PDF to determine where to split the document.

Split Text Page: Defines where to split relative to the page containing the searched text. Options:
  • After: Split after the page containing the text.
  • Before: Split before the page containing the text.

File Naming: The naming convention for the resulting split files. Options:
  • Name As Per Order: Files are named according to their order.
  • Name As Per Page: Files are named according to the page number where the split occurs.

Advanced Options: Optional JSON string to specify custom profiles and additional API call options for advanced control over the splitting process.
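
For the "Base64 String" input type, the value of Base64 Content has to be produced outside the node. A minimal sketch of one way to do that follows; the function name `pdf_to_base64` is hypothetical, and the header check is only a quick sanity test based on PDF files normally starting with `%PDF-`, not a full validation.

```python
import base64


def pdf_to_base64(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 string suitable for a text field."""
    # Quick sanity check: PDF files normally begin with the "%PDF-" marker.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")
    # Standard base64 alphabet, returned as text without line breaks.
    return base64.b64encode(pdf_bytes).decode("ascii")
```

In practice the bytes would come from reading the file in binary mode, for example `pdf_to_base64(open("batch.pdf", "rb").read())`, where `batch.pdf` is a placeholder file name.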

Output

The node outputs one item per PDF file created by splitting the original document at the specified text locations. Each output item typically includes:

  • A JSON field with metadata about the split file.
  • A binary field containing the actual PDF data of the split segment.
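
As an illustration of consuming this output, the sketch below collects the split PDFs from a list of items. It assumes the items follow n8n's usual shape (a `json` key plus a `binary` key whose entries carry base64 `data`, `fileName`, and `mimeType`), that the binary property is named `data`, and that the binary content is stored inline as base64. None of these details are confirmed by this page, and the helper name `collect_split_pdfs` is hypothetical.

```python
import base64
from typing import Any, Dict, List


def collect_split_pdfs(items: List[Dict[str, Any]], binary_property: str = "data") -> Dict[str, bytes]:
    """Map each output file name to its decoded PDF bytes."""
    files: Dict[str, bytes] = {}
    for index, item in enumerate(items, start=1):
        entry = item.get("binary", {}).get(binary_property)
        if entry is None:
            # Item carries no binary data under the assumed property name.
            continue
        # Fall back to an order-based name if the item has no file name.
        name = entry.get("fileName") or f"split_{index}.pdf"
        files[name] = base64.b64decode(entry["data"])
    return files
```

If the node uses a different binary property name, pass it as `binary_property`.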

Dependencies

  • Requires access to the PDF4me API, the external service that performs the splitting operation.
  • Needs PDF4me API credentials configured in n8n to authorize requests to the service.
  • Requires internet access to call the API and to reach the PDF when it is provided via URL.

Troubleshooting

  • Common Issues:

    • Incorrect or missing API credentials will cause authentication failures.
    • Providing invalid PDF data (corrupted file, wrong base64 encoding, or inaccessible URL) will result in errors.
    • Specifying a text string that does not exist in the PDF will likely produce no splits or return the original file unchanged.
    • Misconfiguration of the binary property name when using binary input may cause the node to fail to locate the PDF data.
  • Error Messages:

    • Authentication errors: Check API key/token configuration.
    • File not found or inaccessible URL: Verify the URL is correct and publicly accessible.
    • Invalid PDF format: Ensure the input PDF is valid and correctly encoded.
    • No matching text found: Confirm the search text exists exactly as specified in the PDF.

Resolving these usually involves verifying input data correctness, ensuring proper credential setup, and confirming the search text matches the PDF content.
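
When a split produces no result, it can help to compare the search text against text copied or extracted from the PDF. The sketch below is a hypothetical debugging helper (`diagnose_search_text` is not part of the node). It assumes you already have the page text as a string, and it reports whether a mismatch is down to letter case or whitespace. Whether the service itself matches case-sensitively is not stated here, so treat the result as a hint only.

```python
import re


def diagnose_search_text(page_text: str, needle: str) -> str:
    """Classify how the search text relates to the extracted page text."""
    if not needle:
        return "empty"
    if needle in page_text:
        return "exact"
    if needle.lower() in page_text.lower():
        return "case-mismatch"

    def squash(value: str) -> str:
        # Collapse runs of spaces, tabs, and line breaks into single spaces.
        return re.sub(r"\s+", " ", value).strip()

    if squash(needle).lower() in squash(page_text).lower():
        return "whitespace-mismatch"
    return "missing"
```

A result of "case-mismatch" or "whitespace-mismatch" suggests adjusting Text to Search so that it matches the PDF content character for character.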
