Overview
This node splits a PDF document wherever a specified text string occurs. It is useful when a large PDF needs to be divided into smaller parts at recurring text markers, for example a report at its chapter titles, a batch of invoices at the invoice number label, or a contract at its section headers.
Typical use cases include:
- Automatically segmenting multi-page PDFs into logical sections for easier processing.
- Extracting individual documents from a batch PDF by searching for identifying text.
- Preparing documents for separate distribution or archival based on content markers.
For example, if a PDF contains several invoices concatenated together, you can search for text that appears on every invoice, such as the invoice number label, and split the PDF into one file per invoice.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | How the PDF data is provided. Options: Base64 String (PDF content as a base64-encoded string), Binary Data (PDF file from a previous node), URL (link to the PDF file). |
| Binary Property Name | The name of the binary property that contains the PDF file when using "Binary Data" as input type. |
| Base64 Content | The base64 encoded PDF content when using "Base64 String" as input type. |
| PDF URL | The URL to the PDF file when using "URL" as input type. |
| Text to Search | The text string to search for in the PDF to determine where to split the document. |
| Split Text Page | Where to split relative to the page containing the searched text. Options: After (split after the page containing the text), Before (split before the page containing the text). |
| File Naming | Naming convention for the resulting split files. Options: Name As Per Order (files are named according to their order), Name As Per Page (files are named according to the page number where the split occurs). |
| Advanced Options | Optional JSON string to specify custom profiles and additional API call options for advanced control over the splitting process. |
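
The difference between the Before and After options of Split Text Page can be illustrated with a small sketch. This is not the service's implementation: `split_ranges` is a hypothetical helper, it works on a list of already extracted page texts, and details such as case sensitivity or how a match on the first or last page is handled are assumptions.

```python
from typing import List


def split_ranges(page_texts: List[str], needle: str, mode: str = "before") -> List[List[int]]:
    """Group 1-based page numbers into segments around pages containing `needle`.

    mode="before": a new segment starts at each matching page.
    mode="after":  a segment ends at each matching page.
    """
    if mode not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    segments: List[List[int]] = []
    current: List[int] = []
    for number, text in enumerate(page_texts, start=1):
        hit = needle in text  # assumption: plain, case-sensitive substring match
        if hit and mode == "before" and current:
            segments.append(current)
            current = []
        current.append(number)
        if hit and mode == "after":
            segments.append(current)
            current = []
    if current:  # pages after the last match form the final segment
        segments.append(current)
    return segments


pages = ["Invoice No. 1", "line items", "Invoice No. 2", "line items", "totals"]
print(split_ranges(pages, "Invoice No.", "before"))  # [[1, 2], [3, 4, 5]]
print(split_ranges(pages, "Invoice No.", "after"))   # [[1], [2, 3], [4, 5]]
```

For documents where the marker sits on the first page of each logical unit (as with invoices), Before is usually the grouping you want; After suits markers that close a unit, such as a signature line.
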
Output
The output consists of one or more JSON objects representing the split PDF files. Each output item typically includes:
- A JSON field with metadata about the split file.
- A binary field containing the actual PDF data of the split segment.
Each binary field holds one of the individual PDF files created by splitting the original document at the pages where the search text was found.
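
As a rough sketch of consuming such output outside n8n, the helper below writes each item's binary field to disk. It assumes the usual n8n item shape (a `json` object plus a `binary` object whose entries carry base64 `data` and an optional `fileName`); the binary property name `data` and the function `save_split_items` are illustrative assumptions, not something this node's documentation confirms.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def save_split_items(items: List[Dict[str, Any]], out_dir: str,
                     binary_property: str = "data") -> List[Path]:
    """Write the PDF held in each item's binary field to `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, item in enumerate(items, start=1):
        entry = item.get("binary", {}).get(binary_property)
        if entry is None:
            continue  # item carries metadata only, nothing to write
        # Fall back to an order-based name when no file name is present.
        name = entry.get("fileName") or f"split_{index}.pdf"
        target = out / Path(name).name  # drop any directory part of the name
        target.write_bytes(base64.b64decode(entry["data"]))
        written.append(target)
    return written
```

Calling `save_split_items(items, "out")` returns the list of paths it wrote, one per item that actually contained binary data.
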
Dependencies
- Requires an external PDF processing service, reached over its API, to perform the split. The node's other actions and the reference documentation point to PDF4me.
- Needs appropriate API authentication credentials configured in n8n to authorize requests to the PDF processing service.
- Internet access is required to reach the external API and, when the PDF is supplied by URL, to retrieve the file.
Troubleshooting
Common Issues:
- Incorrect or missing API credentials will cause authentication failures.
- Providing invalid PDF data (corrupted file, wrong base64 encoding, or inaccessible URL) will result in errors.
- Specifying a text string that does not exist in the PDF will likely produce no splits or return the original file unchanged.
- Misconfiguration of the binary property name when using binary input may cause the node to fail to locate the PDF data.
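
Invalid base64 input can be caught before the request is sent. The check below is a minimal sketch: `looks_like_pdf_base64` is a hypothetical helper, and looking for the `%PDF-` marker within the first 1024 bytes is a lenient heuristic, not a full validation of the file.

```python
import base64
import binascii


def looks_like_pdf_base64(content: str) -> bool:
    """Return True if `content` is valid base64 that decodes to PDF-like bytes."""
    compact = "".join(content.split())  # tolerate line-wrapped base64
    try:
        raw = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False  # not base64 at all, or wrong padding
    # Heuristic: a PDF header marker should appear near the start of the file.
    return b"%PDF-" in raw[:1024]


good = base64.b64encode(b"%PDF-1.7\n%demo").decode("ascii")
print(looks_like_pdf_base64(good))            # True
print(looks_like_pdf_base64("not base64!!"))  # False
```

A `False` result for a string that starts with `data:application/pdf;base64,` usually means the data-URI prefix needs to be stripped before the content is passed on.
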
Error Messages:
- Authentication errors: Check API key/token configuration.
- File not found or inaccessible URL: Verify the URL is correct and publicly accessible.
- Invalid PDF format: Ensure the input PDF is valid and correctly encoded.
- No matching text found: Confirm the search text exists exactly as specified in the PDF.
Resolving these usually involves verifying input data correctness, ensuring proper credential setup, and confirming the search text matches the PDF content.
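
When the search text seems to be present but no split happens, the cause is often a difference in whitespace or letter case between the search text and the text stored in the PDF. The sketch below classifies such near misses on text you have already extracted by other means; `diagnose_match` is a hypothetical helper, and whether the service itself matches case-sensitively is not stated here.

```python
import re


def _squash(value: str) -> str:
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def diagnose_match(page_text: str, needle: str) -> str:
    """Classify how `needle` relates to `page_text`.

    Returns "exact", "whitespace" (matches once whitespace is normalized),
    "case" (matches once case is also ignored), or "missing".
    """
    if needle in page_text:
        return "exact"
    if _squash(needle) in _squash(page_text):
        return "whitespace"
    if _squash(needle).casefold() in _squash(page_text).casefold():
        return "case"
    return "missing"


print(diagnose_match("Invoice Number: 42", "Invoice Number"))     # exact
print(diagnose_match("Invoice  \nNumber: 42", "Invoice Number"))  # whitespace
print(diagnose_match("INVOICE NUMBER: 42", "Invoice Number"))     # case
print(diagnose_match("Receipt 42", "Invoice Number"))             # missing
```

A result of "whitespace" or "case" suggests adjusting Text to Search to match the PDF's stored text exactly, or choosing a shorter marker that does not span a line break.
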
Links and References
- PDF4me API Documentation — For details on custom profiles and advanced options.
- General PDF splitting concepts: https://en.wikipedia.org/wiki/PDF_split_and_merge_tools