
PDF-LIB

Perform operations on PDF files (get info, split)

Overview

This node allows you to perform operations on PDF files, specifically extracting information about a PDF or splitting a PDF into smaller chunks of pages. It is useful when you need to analyze PDF documents or divide large PDFs into manageable parts for further processing or distribution.

Common scenarios include:

  • Extracting the total number of pages from a PDF to decide subsequent workflow steps.
  • Splitting a large PDF into smaller PDFs each containing a specified number of pages, for example, splitting a 100-page document into 10 PDFs of 10 pages each.
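
The chunking arithmetic behind the second scenario can be sketched as follows. This is a minimal illustration, not the node's actual source: the function name `computePageRanges` is hypothetical, and the assumption that a trailing single-page chunk is written as "7-7" is a guess, since only the "1-3", "4-6" style is documented.

```typescript
// Hypothetical helper (not taken from the node's source): lists the
// 1-based page ranges produced when a PDF is cut into fixed-size chunks.
function computePageRanges(pageCount: number, chunkSize: number): string[] {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  // Without this guard, a chunk size of 0 would make the loop below never advance.
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= pageCount; start += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    const end = Math.min(start + chunkSize - 1, pageCount);
    ranges.push(`${start}-${end}`);
  }
  return ranges;
}

const ranges = computePageRanges(100, 10);
console.log(ranges.length); // 10
console.log(ranges[0], ranges[9]); // 1-10 91-100
```

When the page count is not a multiple of the chunk size, the final chunk simply holds the remaining pages.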

Practical examples:

  • Automatically splitting invoices or reports received as a single PDF into individual page groups for separate processing.
  • Reading the page count and file name of uploaded PDFs to validate them before archiving.

Properties

Name | Meaning
Operation | Choose between "Get PDF Info" (extract info) and "Split PDF" (split into page chunks).
Binary Property | The name of the binary property that contains the input PDF file. Default is "data".
Chunk Size | Number of pages per chunk when splitting a PDF. Only applicable if Operation is "Split PDF".

Output

The output JSON structure depends on the selected operation:

  • Get PDF Info:

    • pageCount: Number of pages in the PDF.
    • operation: Always "getInfo".
    • fileName: Original file name of the PDF or "unknown.pdf" if not available.
  • Split PDF:

    • count: Number of resulting PDF chunks.
    • pageRanges: Array of strings indicating the page ranges for each chunk (e.g., "1-3", "4-6").
    • operation: Always "split".
    • originalFileName: Original file name of the PDF or "unknown.pdf" if not available.
    • binary: An object where each key is like pdf1, pdf2, etc., representing each split PDF chunk. Each chunk includes:
      • data: Base64 encoded PDF data.
      • fileName: Generated file name such as "split_1.pdf".
      • mimeType: Always "application/pdf".

If the node encounters an error and is set to continue on fail, the output JSON will contain an error field with the error message.
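
For downstream Code nodes, the shapes described above could be modelled roughly as below. The interface and function names are illustrative assumptions, not part of the node. The sketch covers only the JSON fields and leaves out the binary chunks.

```typescript
// Shapes inferred from the field descriptions above; all names are illustrative.
interface PdfInfoResult {
  operation: "getInfo";
  pageCount: number;
  fileName: string;
}

interface PdfSplitResult {
  operation: "split";
  count: number;
  pageRanges: string[];
  originalFileName: string;
}

// Assumed shape of an item produced when the node continues on fail.
interface PdfErrorResult {
  error: string;
}

type PdfNodeJson = PdfInfoResult | PdfSplitResult | PdfErrorResult;

// Summarizes one output item, branching on the error field and the operation tag.
function describeResult(json: PdfNodeJson): string {
  if ("error" in json) {
    return `failed: ${json.error}`;
  }
  if (json.operation === "getInfo") {
    return `${json.fileName}: ${json.pageCount} page(s)`;
  }
  return `${json.originalFileName}: ${json.count} chunk(s) [${json.pageRanges.join(", ")}]`;
}

console.log(describeResult({ operation: "getInfo", pageCount: 12, fileName: "report.pdf" }));
// report.pdf: 12 page(s)
```

Checking for the error field first keeps failed items from being mistaken for valid results when "Continue On Fail" is enabled.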

Dependencies

  • Uses the pdf-lib library to load, manipulate, and save PDF documents.
  • Requires the input PDF file to be provided as base64-encoded binary data in the specified binary property.
  • No external API keys or services are required.
  • Runs entirely within the n8n environment.
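
Because the input arrives as base64 text, a cheap sanity check before passing it on can catch obviously wrong inputs. The sketch below assumes a Node.js runtime. The helper `decodePdfBase64` is hypothetical and not confirmed to match what the node does internally. It only checks for the "%PDF-" header, which is not a full validation of the file.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: decodes a base64 string and verifies that the
// result starts with the "%PDF-" header bytes. This is a quick sanity
// check only; a file can pass it and still be corrupt.
function decodePdfBase64(base64: string): Uint8Array {
  const bytes = Buffer.from(base64, "base64");
  const header = bytes.subarray(0, 5).toString("latin1");
  if (header !== "%PDF-") {
    throw new Error("Input does not start with a PDF header");
  }
  // Copy into a plain Uint8Array for libraries that expect raw bytes.
  return new Uint8Array(bytes);
}

// Build a tiny stand-in payload (not a real PDF) to exercise the helper.
const sample = Buffer.from("%PDF-1.7\n", "latin1").toString("base64");
console.log(decodePdfBase64(sample).length); // 9
```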

Troubleshooting

  • Missing binary data: If the specified binary property does not exist or does not contain valid PDF data, the node will throw an error stating no binary data was found.
  • Invalid PDF format: If the input data is not a valid PDF, loading the document will fail.
  • Chunk size issues: Setting chunk size to zero or negative values may cause unexpected behavior; ensure chunk size is a positive integer.
  • Large PDFs: Processing very large PDFs might consume significant memory and time.

To resolve errors:

  • Verify the binary property name matches the actual input.
  • Confirm the input file is a valid PDF.
  • Use reasonable chunk sizes.
  • Enable "Continue On Fail" to handle errors gracefully in workflows.
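
If the chunk size comes from a workflow expression, it may arrive as a string or an unexpected value. A small pre-flight check along these lines can reject bad values before the split runs. Both function names are hypothetical and the sketch is independent of the node's own validation, which is not documented here.

```typescript
// Hypothetical pre-flight check: accepts a number or a numeric string and
// returns a positive integer, or throws with a descriptive message.
function normalizeChunkSize(value: unknown): number {
  const n = typeof value === "string" ? Number(value) : value;
  if (typeof n !== "number" || !Number.isInteger(n) || n < 1) {
    throw new RangeError(`Chunk size must be a positive integer, got ${String(value)}`);
  }
  return n;
}

// Number of chunks to expect for a given page count; useful for spotting
// chunk sizes that would generate an impractical number of output files.
function expectedChunkCount(pageCount: number, chunkSize: unknown): number {
  return Math.ceil(pageCount / normalizeChunkSize(chunkSize));
}

console.log(expectedChunkCount(100, "10")); // 10
console.log(expectedChunkCount(7, 3)); // 3
```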
