PDF Tools Enhanced

Advanced PDF operations: extract info and split documents with flexible page ranges and custom document creation

Actions

- Get PDF Info
- Split PDF

Overview

The PDF Tools Enhanced node extracts information from PDF documents and splits them into smaller files. It is useful when you need to inspect PDF metadata or break a large PDF into more manageable files by chunk size, page range, or custom page selection.

Common scenarios include:

- Extracting detailed metadata and page layout info from PDFs for cataloging or auditing.
- Splitting a large PDF into equal-sized chunks for easier distribution or processing.
- Creating custom PDF documents by selecting arbitrary pages or page ranges from the original file.

Practical examples:

- A user receives a 100-page report and wants to split it into 10 smaller PDFs, each containing 10 pages.
- Extracting author, title, and page orientation details from a batch of PDFs before archiving.
- Generating multiple customized PDFs where each contains selected pages relevant to different departments.
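
The first example above reduces to simple range arithmetic. The sketch below illustrates that arithmetic and is not the node's actual implementation: `chunkPageRanges` and `PageRange` are hypothetical names, and the code assumes 1-based, inclusive page numbers and a shorter final chunk when the page count is not a multiple of the chunk size.

```typescript
// Hypothetical helper (not part of the node): 1-based, inclusive page range.
interface PageRange {
  start: number;
  end: number;
}

// Compute the page range covered by each chunk of a chunk-size split.
function chunkPageRanges(totalPages: number, chunkSize: number): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new Error("totalPages must be a positive integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    // Assumption: the last chunk is simply shorter if pages run out.
    ranges.push({ start, end: Math.min(start + chunkSize - 1, totalPages) });
  }
  return ranges;
}

// A 100-page report with a chunk size of 10 yields 10 ranges: 1-10, 11-20, ..., 91-100.
const reportRanges = chunkPageRanges(100, 10);
console.log(reportRanges.length, reportRanges[0], reportRanges[9]);
```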

Properties

Name	Meaning
Binary Property	Name of the binary property that holds the input PDF file. Default is `"data"`.
Operation	The action to perform. Get PDF Info: extract metadata and page count. Split PDF: split the PDF into smaller documents.
Split Mode	Method used to split the PDF (Split PDF operation only). By Chunk Size: split into equal-sized chunks by number of pages. By Page Ranges: split using specified page ranges. Custom Documents: create multiple documents with custom page selections.
Chunk Size	Number of pages per chunk when splitting by chunk size. Default is 1.
Page Ranges	Comma-separated list of page ranges used when splitting by page ranges. Examples: `"1-3,5,7-10"`, `"1,3,5"`, or `"2-5"`.
Document Definitions	Defines multiple custom documents with specific page selections. Each document has a Document Name (name for the output PDF, without extension) and Pages (comma-separated page numbers or ranges, e.g., `"1,4,7"` or `"1-3,5,7-10"`).
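
The page-range syntax used by Page Ranges and by the Pages field can be expanded with a few lines of code. The following is a minimal sketch for checking a specification before running the node, not the node's own parser: `parsePageRanges` is a hypothetical name, and the sketch assumes 1-based page numbers, ascending ranges, and one group of pages per comma-separated entry.

```typescript
// Hypothetical helper (not the node's parser): expand a specification such as
// "1-3,5,7-10" into one array of 1-based page numbers per comma-separated entry.
function parsePageRanges(spec: string, totalPages: number): number[][] {
  const groups: number[][] = [];
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    // Accept a single page ("5") or a range ("7-10"); anything else is malformed.
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Malformed page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Assumption: ranges must be ascending and lie within 1..totalPages.
    if (start < 1 || end > totalPages || start > end) {
      throw new Error(
        `Invalid page range "${part}" for a ${totalPages}-page document`
      );
    }
    const pages: number[] = [];
    for (let page = start; page <= end; page++) {
      pages.push(page);
    }
    groups.push(pages);
  }
  return groups;
}

// For a 10-page PDF this yields [[1, 2, 3], [5], [7, 8, 9, 10]].
console.log(JSON.stringify(parsePageRanges("1-3,5,7-10", 10)));
```

Returning one group per entry keeps the sketch neutral about whether each entry becomes its own output document; flatten the groups if a single page list is needed.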

Output

The node outputs JSON data and binary PDF files depending on the operation:

Get PDF Info: Outputs a JSON object containing:
- pageCount: Total number of pages.
- fileName: Original file name.
- fileSizeBytes and fileSizeMB: File size metrics.
- metadata: Title, author, subject, creator, producer, keywords, creation and modification dates.
- technicalInfo: PDF version, encryption status, presence of AcroForm.
- pageStatistics: Counts of landscape, portrait, rotated pages, uniformity of page sizes, unique page sizes.
- pageDetails: Array with details per page including page number, width, height, orientation, and rotation angle.
Split PDF: Outputs:
- JSON summary with:
  - count: Number of resulting documents.
  - pageRanges: List of page ranges for each split document.
  - operation: Always "split".
  - splitMode: The mode used (chunkSize, pageRanges, or customDocuments).
  - documents: For custom documents only, an array describing each output document's name, page count, pages, and page range.
- Binary data for each split PDF as separate binary properties named like pdf1, pdf2, etc., each containing:
  - Base64 encoded PDF data.
  - File name constructed based on original file name and split mode.
  - MIME type set to "application/pdf".
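
A downstream step, such as a Code node, may need to gather the split PDFs from the output item. The sketch below shows one way to do that under stated assumptions: the interfaces are local stand-ins modeled on the description above rather than types imported from n8n, `collectSplitPdfs` is a hypothetical helper, and the binary data is assumed to be held in memory as a Base64 string.

```typescript
// Local stand-ins for the output shape described above; not imported from n8n.
interface BinaryEntry {
  data: string; // Base64-encoded file content (assumes in-memory binary data)
  fileName?: string;
  mimeType: string;
}

interface SplitOutputItem {
  json: { count: number };
  binary?: Record<string, BinaryEntry>;
}

interface DecodedPdf {
  property: string;
  fileName: string;
  bytes: Buffer;
}

// Hypothetical helper: decode pdf1..pdfN, where N is the reported count.
function collectSplitPdfs(item: SplitOutputItem): DecodedPdf[] {
  const result: DecodedPdf[] = [];
  for (let i = 1; i <= item.json.count; i++) {
    const property = `pdf${i}`;
    const entry = item.binary?.[property];
    if (!entry) {
      throw new Error(`Missing binary property "${property}"`);
    }
    result.push({
      property,
      // Fall back to the property name if no file name was set.
      fileName: entry.fileName ?? `${property}.pdf`,
      bytes: Buffer.from(entry.data, "base64"),
    });
  }
  return result;
}
```

Each returned entry carries the decoded bytes, which can then be written to disk or passed to another library.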

Dependencies

Uses the pdf-lib library (bundled internally) for PDF parsing, manipulation, and creation.
Requires the input PDF to be provided as binary data in the specified binary property.
Reads the PDF from that binary data, or attempts to load it from the filesystem if available.
No external API keys or services are required.

Troubleshooting

No binary data property found: If the specified binary property does not exist on the input item, the node will throw an error. Ensure the correct binary property name is set and that the input contains valid PDF binary data.
Failed to load PDF: Errors loading the PDF can occur if the file is corrupted or encrypted beyond what the library can handle. Check the input file integrity.
Invalid page ranges: When specifying page ranges or custom documents, invalid page numbers (out of bounds or malformed) will cause errors. Verify page ranges conform to the total page count and use proper syntax.
Empty custom documents: Defining custom documents without any pages or with empty page strings will result in errors. Make sure each document has at least one valid page.
Continue on Fail: If enabled, the node will continue processing other items even if one fails, returning error details in the output JSON.
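
The first two failures above can often be caught before the node runs. The following sketch is a hypothetical pre-check, not part of the node: `checkPdfInput` and `InputItem` are illustrative names, the binary data is assumed to be an in-memory Base64 string, and the header test is deliberately strict (a PDF file starts with the `%PDF-` marker, although some readers tolerate leading bytes).

```typescript
// Local stand-in for an input item; not imported from n8n.
interface InputItem {
  binary?: Record<string, { data: string; mimeType?: string }>;
}

// Hypothetical pre-check: returns a problem description, or null if the input looks usable.
function checkPdfInput(item: InputItem, binaryProperty = "data"): string | null {
  const entry = item.binary?.[binaryProperty];
  if (!entry) {
    return `No binary property "${binaryProperty}" on the input item`;
  }
  const bytes = Buffer.from(entry.data, "base64");
  // Strict simplification: require the "%PDF-" marker at the very start.
  if (bytes.subarray(0, 5).toString("latin1") !== "%PDF-") {
    return `Binary property "${binaryProperty}" does not start with a PDF header`;
  }
  return null;
}
```

A non-null result can be logged or routed to an error branch; a null result does not guarantee that the file will load, since corruption or encryption is only detected when the PDF is actually parsed.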

Links and References

pdf-lib GitHub Repository — The underlying PDF manipulation library used.
n8n Documentation on Binary Data — Understanding how to work with binary data in n8n nodes.
PDF Specification Overview — For deeper understanding of PDF structure and metadata.