PDF.co Api

Generate PDFs, extract data from PDFs, and split, merge, or convert them. Fill PDF forms, add text and images to PDFs, and much more with PDF.co!

Overview

This node provides functionality to split PDF files based on different criteria. It supports splitting a PDF by specifying page numbers or ranges, searching for specific text within the PDF, or detecting barcodes to determine split points. This is useful in scenarios where you need to extract sections of a large PDF into smaller documents automatically, such as splitting invoices, contracts, or reports.

Practical examples:

  • Splitting a multi-page invoice PDF into individual pages or groups of pages.
  • Extracting sections of a PDF document that contain a certain keyword or phrase.
  • Dividing a PDF based on barcode markers embedded in the document, useful for batch processing scanned forms.

Properties

  • Url: The URL of the PDF file to split.
  • Split By: The method used to split the PDF. One of:
      - Page Numbers: Split based on comma-separated page indices or ranges.
      - Search Text: Split based on occurrences of specified text.
      - Barcode: Split based on detected barcodes.
  • Page Numbers/Ranges: Comma-separated list of page numbers or ranges to split by (e.g., "1,2-5,7-"). Used when splitting by page numbers.
  • Text Search String: The text string to search for in the PDF to determine split points. Used when splitting by text search.
  • Barcode Search String: The barcode string to search for in the PDF to determine split points. Used when splitting by barcode.
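To make the Page Numbers/Ranges format concrete, here is a minimal Python sketch that expands a spec such as "1,2-5,7-" into explicit page lists. It is not the node's or the service's implementation. The function name `parse_page_ranges`, the `page_count` argument, and the assumption that pages are numbered from 1 are all illustrative.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[list[int]]:
    """Expand a spec such as "1,2-5,7-" into lists of page numbers.

    Assumption: pages are 1-based, and an open-ended range ("7-")
    runs to the last page of the document.
    """
    groups = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page spec: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # "7-" has no end value, so fall back to the last page.
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        groups.append(list(range(start, end + 1)))
    return groups
```

For an 8-page document, `parse_page_ranges("1,2-5,7-", 8)` yields three groups: page 1, pages 2 to 5, and pages 7 to 8. Checking a spec locally like this can catch format mistakes before a request is sent.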

Advanced Options for Text Search

  • Enable Case-Sensitive Search: Whether the text search should be case-sensitive.
  • Enable Regular Expression Search: Whether to treat the search string as a regular expression.
  • Exclude Pages with Identified Text: Whether to exclude pages containing the identified text from output.
  • OCR Language Name or ID: Language used for OCR if needed. Can be selected from a list or specified via expression.
  • File Name: Name of the output file.
  • Webhook URL: Callback URL or webhook to receive the output data asynchronously.
  • Output Links Expiration (In Minutes): Time in minutes before the output link expires.
  • Inline: Whether to return the output directly in the response or not.
  • HTTP Username: Username for HTTP authentication if required to access the source URL.
  • HTTP Password: Password for HTTP authentication if required to access the source URL.
  • Custom Profiles: JSON string to specify custom API call options, e.g., output format.
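The following Python sketch illustrates how the case-sensitivity, regular expression, and exclusion options could interact. It is a local model only, not the service's algorithm. It assumes that each page containing a match starts a new output part, that matching is case-insensitive unless the option is enabled, and that page text is already available as strings. The names `page_matches` and `split_on_matches` are hypothetical.

```python
import re


def page_matches(text, query, case_sensitive=False, use_regex=False):
    """Return True if the query occurs in the page text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Without regex mode, escape the query so it is matched literally.
    pattern = query if use_regex else re.escape(query)
    return re.search(pattern, text, flags) is not None


def split_on_matches(pages, query, exclude_matched=False, **options):
    """Group 0-based page indices, starting a new group at each match.

    Assumption: a matching page opens a new part. With exclude_matched,
    the matching page itself is dropped from the output.
    """
    groups, current = [], []
    for index, text in enumerate(pages):
        if page_matches(text, query, **options):
            if current:
                groups.append(current)
            current = [] if exclude_matched else [index]
        else:
            current.append(index)
    if current:
        groups.append(current)
    return groups
```

With five pages where pages 0 and 2 contain "Invoice" and "INVOICE", a case-insensitive search for "invoice" gives two parts, while a case-sensitive search for the same lowercase string finds nothing and leaves the document in one part.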

Advanced Options for Barcode Search

These mirror the text search options, applied to barcode-based splitting: case-sensitive search, regular expression search, exclusion of pages with identified barcodes, and similar callback and authentication options.

Advanced Options for Page Number Split

Includes file naming, callback URL, expiration time, inline response option, HTTP authentication, and custom profile settings.

Output

The node outputs JSON data representing the result of the PDF split operation. This typically includes links to, or base64-encoded data of, the resulting split PDF files. If a webhook URL is provided, the output may be delivered asynchronously via callback.

If binary data is returned, it represents the actual PDF content of the split parts.
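When a split part arrives as base64-encoded data, it can be decoded and sanity-checked before being written to disk. The Python sketch below assumes the encoded string has already been pulled out of the node's JSON output. The exact field that carries it is not documented here, so none is named. The helper `decode_pdf_part` is hypothetical.

```python
import base64


def decode_pdf_part(encoded: str) -> bytes:
    """Decode one base64-encoded split part and check it looks like a PDF."""
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    # A well-formed PDF file begins with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded data does not look like a PDF")
    return data
```

The returned bytes can then be written with `open(path, "wb")`. Invalid base64 input raises `binascii.Error`, which is a subclass of `ValueError`, so a single `except ValueError` covers both failure cases.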

Dependencies

  • Requires access to the PDF file via a URL, which may require HTTP basic authentication.
  • Uses an external PDF processing service (implied by references to API profiles and callbacks).
  • Optional webhook URL for asynchronous delivery of results.
  • Supports OCR language selection, implying dependency on OCR capabilities of the external service.

Troubleshooting

  • Invalid URL or inaccessible PDF: Ensure the URL is correct and accessible. If authentication is required, provide valid HTTP username and password.
  • Incorrect page number/range format: Use proper comma-separated values and ranges (e.g., "1,3-5,7-").
  • Text or barcode not found: Verify the search strings are correct. If the match may differ in letter case, disable case-sensitive search; if the text varies between pages, consider enabling the regex option.
  • Webhook callback failures: Confirm the callback URL is reachable and correctly configured to accept POST requests.
  • Expired output links: Adjust the expiration time if links expire too quickly.
  • API errors due to malformed custom profiles: Validate JSON syntax in custom profiles.
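The last item can be checked locally before running the node. The Python sketch below validates a Custom Profiles string and reports where the JSON is malformed. The requirement that the value be a JSON object is an assumption, and both `check_custom_profiles` and the `exampleOption` key are hypothetical.

```python
import json


def check_custom_profiles(raw: str) -> dict:
    """Parse a Custom Profiles string, raising ValueError with a location."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    # Assumption: the service expects a JSON object, not a list or scalar.
    if not isinstance(parsed, dict):
        raise ValueError("custom profiles must be a JSON object")
    return parsed
```

For example, `check_custom_profiles('{"exampleOption": true}')` returns a dictionary, while a string with a trailing comma or single-quoted keys raises an error that points at the offending position.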

This summary is based on static analysis of the node's properties and bundled code, focusing on the "Split PDF" operation under the "Default" resource.
