PDF Tools Enhanced

Advanced PDF operations: extract info and split documents with flexible page ranges and custom document creation

Overview

The PDF Tools Enhanced node provides two PDF operations for n8n workflows: "Get PDF Info", which extracts metadata and per-page information from a PDF file, and "Split PDF", which splits a PDF into smaller documents by chunk size, by page ranges, or by custom document definitions.

Use cases include:

  • Quickly retrieving PDF metadata such as title, author, creation date, page count, page sizes, orientations, and encryption status.
  • Splitting large PDFs into smaller chunks for easier processing or distribution.
  • Creating custom PDF documents by selecting specific pages or page ranges.
  • Automating PDF handling tasks in document management, reporting, or archival workflows.

For example, you might use the "Get PDF Info" operation to verify document properties before further processing, or use the "Split PDF" operation to break a large report into chapters or sections automatically.


Properties

Name             Meaning
Binary Property  Name of the binary property containing the PDF file. Default is "data".

Note: The "Binary Property" input specifies which binary data field contains the PDF file to process. This applies to both "Get PDF Info" and "Split PDF" operations.


Output

For "Get PDF Info" operation:

The output JSON contains detailed information about the PDF, including:

  • pageCount: Total number of pages.
  • fileName: Original file name of the PDF.
  • fileSizeBytes: Size of the PDF file in bytes.
  • fileSizeMB: Size of the PDF file in megabytes (rounded).
  • operation: The string "getInfo".
  • metadata: Object with PDF metadata fields:
    • title, author, subject, creator, producer, keywords
    • creationDate and modificationDate as ISO 8601 strings (or null if unavailable)
  • technicalInfo: Technical details:
    • version: PDF version (e.g., "PDF 1.4")
    • isEncrypted: Boolean indicating if the PDF is encrypted
    • hasAcroForm: Boolean indicating presence of AcroForm (interactive forms)
  • pageStatistics: Summary statistics about pages:
    • totalPages, landscapePages, portraitPages, rotatedPages
    • hasUniformSize: Boolean indicating if all pages have the same size
    • uniqueSizes: Array of unique page size strings (e.g., "612x792")
  • pageDetails: Array of objects describing each page:
    • pageNumber: Page index starting at 1
    • width and height: Dimensions in points (rounded to two decimals)
    • orientation: Either "portrait" or "landscape"
    • rotation: Rotation angle in degrees (0 if none)
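
The relationship between pageDetails and pageStatistics can be sketched as follows. This is an illustration of the output shape, not the node's actual implementation: the type names and the summarizePages helper are hypothetical, and the "<width>x<height>" size format is an assumption based on the "612x792" example above.

```typescript
// Hypothetical types mirroring the "pageDetails" and "pageStatistics" fields described above.
interface PageDetail {
  pageNumber: number; // 1-based
  width: number; // points
  height: number; // points
  orientation: "portrait" | "landscape";
  rotation: number; // degrees, 0 if none
}

interface PageStatistics {
  totalPages: number;
  landscapePages: number;
  portraitPages: number;
  rotatedPages: number;
  hasUniformSize: boolean;
  uniqueSizes: string[];
}

// Derives summary statistics from per-page details.
// Assumption: a size string is "<width>x<height>", as in the "612x792" example.
function summarizePages(pages: PageDetail[]): PageStatistics {
  const sizes = pages.map((p) => `${p.width}x${p.height}`);
  // Keep only the first occurrence of each size string.
  const uniqueSizes = sizes.filter((s, i) => sizes.indexOf(s) === i);
  return {
    totalPages: pages.length,
    landscapePages: pages.filter((p) => p.orientation === "landscape").length,
    portraitPages: pages.filter((p) => p.orientation === "portrait").length,
    rotatedPages: pages.filter((p) => p.rotation !== 0).length,
    hasUniformSize: uniqueSizes.length <= 1,
    uniqueSizes,
  };
}

const stats = summarizePages([
  { pageNumber: 1, width: 612, height: 792, orientation: "portrait", rotation: 0 },
  { pageNumber: 2, width: 792, height: 612, orientation: "landscape", rotation: 0 },
]);
// stats.uniqueSizes is ["612x792", "792x612"], so stats.hasUniformSize is false
```

A downstream step could use a check like hasUniformSize to decide whether a document needs normalizing before it is printed or merged.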

For "Split PDF" operation:

The output JSON includes:

  • count: Number of resulting split documents.
  • pageRanges: Array of page range strings representing each split part (e.g., "1-3", "5")
  • operation: The string "split".
  • splitMode: Mode used for splitting ("chunkSize", "pageRanges", or "customDocuments").
  • originalFileName: Original PDF file name.
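
To make the "chunkSize" mode concrete, the sketch below shows one way the pageRanges strings could be derived from a page count and a chunk size. The chunkPageRanges function is hypothetical, and writing a single-page chunk as "7" rather than "7-7" is an assumption based on the "5" example above; the node's own logic may differ.

```typescript
// Splits pages 1..totalPages into consecutive chunks of at most chunkSize pages
// and returns one range string per chunk.
function chunkPageRanges(totalPages: number, chunkSize: number): string[] {
  if (Math.floor(totalPages) !== totalPages || totalPages < 1) {
    throw new Error("totalPages must be a positive integer");
  }
  if (Math.floor(chunkSize) !== chunkSize || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    const end = Math.min(start + chunkSize - 1, totalPages);
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

const sevenPagesByThree = chunkPageRanges(7, 3);
// ["1-3", "4-6", "7"], which would correspond to a count of 3
```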

In "customDocuments" mode, the output also includes a documents array that describes each created document:

  • name: Document name.
  • pageCount: Number of pages in that document.
  • pages: Array of page numbers included.
  • pageRange: String representation of the page range.

Additionally, the node outputs binary data for each split PDF document under keys like pdf1, pdf2, etc., with:

  • data: Base64-encoded PDF content.
  • fileName: Generated file name reflecting the original name and split info.
  • mimeType: Always "application/pdf".
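
A downstream step that consumes the split parts may want them in numeric order. The sketch below is one way to collect them, assuming the keys follow the pdf1, pdf2, ... pattern described above; the SplitBinaryEntry type and orderedSplitParts function are hypothetical names, and only the three fields listed above are modelled.

```typescript
// Hypothetical type covering only the fields listed above.
interface SplitBinaryEntry {
  data: string; // Base64-encoded PDF content
  fileName: string;
  mimeType: string;
}

// Returns the split parts sorted by the number in their key (pdf1, pdf2, ..., pdf10),
// ignoring any binary properties that do not match the pattern.
function orderedSplitParts(binary: { [key: string]: SplitBinaryEntry }): SplitBinaryEntry[] {
  return Object.keys(binary)
    .filter((key) => /^pdf\d+$/.test(key))
    // Sort numerically so that "pdf10" comes after "pdf2".
    .sort((a, b) => Number(a.slice(3)) - Number(b.slice(3)))
    .map((key) => binary[key]);
}
```

Sorting on the numeric suffix avoids the lexicographic trap where "pdf10" would otherwise be placed before "pdf2".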

Dependencies

  • Uses the pdf-lib library (bundled internally) for PDF parsing, metadata extraction, and manipulation.
  • Reads the PDF from the workflow's binary data, or from a filesystem path when one is available.
  • Requires the input item to contain valid binary PDF data under the specified binary property.
  • No external API or service dependencies; all processing is local within the node.

Troubleshooting

  • Missing binary data error: If the specified binary property does not exist or does not contain PDF data, the node throws an error stating no binary data found. Ensure the input item has the correct binary property set with a valid PDF file.

  • Failed to load PDF: This error can occur when the PDF is corrupted or unreadable. The node tries to load the file from both the filesystem and the binary buffer; if both attempts fail, the error message mentions both.

  • Invalid page ranges: When using page range inputs (for splitting), invalid formats or out-of-bound page numbers cause errors specifying the problematic range or page number. Verify page ranges are correctly formatted and within the total page count.

  • Empty or missing pages in custom documents: Custom document definitions require non-empty page specifications. Missing or empty page lists will trigger errors.

  • Encryption and unsupported features: The node attempts to load encrypted PDFs anyway, but heavily encrypted or protected files may fail to load or return incomplete information.

  • To handle errors gracefully, enable the node's "Continue On Fail" option to receive error details in output instead of stopping execution.
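
To catch the "Invalid page ranges" case before it reaches the node, page range input can be validated in an earlier step. The sketch below is a minimal validator, not the node's own parser: parsePageRanges is a hypothetical helper, the accepted syntax (comma-separated "N" or "N-M" parts) is an assumption based on the "1-3" and "5" examples, and the error messages are illustrative.

```typescript
// Parses a spec such as "1-3, 5" into a list of 1-based page numbers,
// throwing on malformed parts or pages beyond totalPages.
function parsePageRanges(spec: string, totalPages: number): number[] {
  const pages: number[] = [];
  for (const raw of spec.split(",")) {
    const part = raw.trim();
    // Accept a single page ("5") or an inclusive range ("1-3").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    if (end > totalPages) {
      throw new Error(`Page ${end} is out of bounds (document has ${totalPages} pages)`);
    }
    for (let page = start; page <= end; page++) {
      pages.push(page);
    }
  }
  return pages;
}

const selected = parsePageRanges("1-3, 5", 10);
// [1, 2, 3, 5]
```

The pageCount value from the "Get PDF Info" operation is a natural source for totalPages in a check like this.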

