PDF Tools

Manipulate PDF files with various operations

Overview

The "PDF Tools" node provides a comprehensive set of operations to manipulate PDF files within n8n workflows. It supports tasks such as merging multiple PDFs, adding images or watermarks, deleting or extracting pages, reordering or rotating pages, splitting PDFs, reading metadata, and extracting text content.

This node is beneficial in scenarios where automated PDF processing is required, such as:

  • Combining multiple reports or documents into a single PDF.
  • Adding branding or signatures by inserting images or watermarks.
  • Extracting specific pages for review or distribution.
  • Reordering or rotating pages to correct document orientation.
  • Splitting large PDFs into smaller parts.
  • Reading metadata for cataloging or auditing.
  • Extracting text for indexing or further text analysis.

Practical example:
A user receives several PDF invoices each day and wants to merge them into one consolidated PDF before sending it to accounting. Selecting the "Merge PDFs" operation and listing the binary fields that hold each invoice in "PDF Binary Field Names" produces the combined file without any manual steps.

Properties

Name                      Meaning
PDF Binary Field Names    Comma-separated list of binary field names containing PDFs to merge (for Merge operation).
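As a rough illustration of how a comma-separated value such as this one might be turned into individual field names, here is a minimal sketch. The helper name parseBinaryFieldNames and the sample field names are hypothetical and not taken from the node's source; the node's actual parsing may differ.

```typescript
// Hypothetical helper (an assumption, not the node's real code):
// split a comma-separated property value into clean field names.
function parseBinaryFieldNames(raw: string): string[] {
  return raw
    .split(",")
    .map((name) => name.trim()) // tolerate spaces around commas
    .filter((name) => name.length > 0); // drop empty entries such as a trailing comma
}

// Sample value a user might type into "PDF Binary Field Names".
const fields = parseBinaryFieldNames("invoice1, invoice2,invoice3,");
```

Trimming and dropping empty entries makes the value forgiving of stray spaces and trailing commas, which are easy to introduce when typing the list by hand.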


Output

The node outputs the result as binary data representing a PDF file. The output structure includes:

  • json: An empty object {} for most operations. The Read Metadata and Extract Text operations populate it instead (see below).
  • binary: Contains a single binary field named "output" holding:
    • data: Base64-encoded PDF content resulting from the operation.
    • fileName: A default filename "output.pdf".
    • mimeType: Always "application/pdf".
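The structure above can be sketched as a small TypeScript type plus a builder. The interface names, buildOutputItem, and the hand-written toBase64 helper are assumptions made for illustration so the example has no dependencies; only the field names and default values come from the description above.

```typescript
// Illustrative types (assumed names) mirroring the documented output shape.
interface BinaryEntry {
  data: string; // Base64-encoded PDF content
  fileName: string;
  mimeType: string;
}

interface OutputItem {
  json: Record<string, unknown>;
  binary: { output: BinaryEntry };
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Dependency-free Base64 encoder, included only to keep the sketch self-contained.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += has1 ? B64[((b1 & 15) << 2) | (b2 >> 6)] : "=";
    out += has2 ? B64[b2 & 63] : "=";
  }
  return out;
}

// Wrap raw PDF bytes in the documented item shape.
function buildOutputItem(pdfBytes: Uint8Array): OutputItem {
  return {
    json: {},
    binary: {
      output: {
        data: toBase64(pdfBytes),
        fileName: "output.pdf",
        mimeType: "application/pdf",
      },
    },
  };
}

// The four bytes of the ASCII string "%PDF", standing in for a real file.
const item = buildOutputItem(new Uint8Array([0x25, 0x50, 0x44, 0x46]));
```

A downstream node would read the result from the binary field named "output".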

For the Read Metadata operation, the output JSON contains metadata fields such as title, author, subject, keywords, creator, producer, creationDate, and modificationDate.

For the Extract Text operation, the output JSON contains a single field text with the extracted textual content.
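A downstream step may need to tell these output shapes apart. The following is a minimal sketch of one way to do that; classifyOutput and OutputKind are hypothetical names, and the check relies only on the field names listed above, not on any knowledge of the value types the node emits.

```typescript
// Metadata field names as listed in this section.
const METADATA_KEYS = [
  "title",
  "author",
  "subject",
  "keywords",
  "creator",
  "producer",
  "creationDate",
  "modificationDate",
];

type OutputKind = "text" | "metadata" | "pdf";

// Hypothetical helper: guess which kind of operation produced a json payload.
function classifyOutput(json: Record<string, unknown>): OutputKind {
  if (typeof json["text"] === "string") return "text"; // Extract Text
  if (METADATA_KEYS.some((key) => key in json)) return "metadata"; // Read Metadata
  return "pdf"; // empty json: the result lives in the binary output
}
```

For example, classifyOutput({}) yields "pdf", while an object carrying a title or author is treated as metadata.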

Dependencies

  • Uses the pdf-lib library for PDF manipulation.
  • Uses pdf-parse for text extraction from PDFs.
  • Requires input PDFs and images (PNG or JPEG) to be provided as binary data in the workflow.
  • No external API keys or services are needed; all processing is done locally within the node.

Troubleshooting

  • Missing binary data error: If the specified binary field name does not exist or contains no data, the node throws an error indicating the missing binary data. Ensure that the input data contains the correct binary fields with valid PDF content.
  • Invalid MIME type error: The node validates that inputs are PDFs (application/pdf) or images (image/png, image/jpeg) where applicable. Providing files with unsupported MIME types will cause errors.
  • Insufficient PDFs for merge: The Merge operation requires at least two PDFs. Providing fewer will trigger an error.
  • Invalid page numbers: For operations involving page selection (e.g., delete, extract, rotate), invalid page ranges or numbers outside the document's page count will cause errors.
  • Missing required parameters: Some operations require mandatory parameters like watermark text, pages to delete/extract/rotate, or new page order. Omitting these will result in errors.
  • General error handling: Errors during processing log the message and halt execution. Review error messages carefully to identify missing or incorrect inputs.
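The checks described in this list can be sketched as plain validation functions. Everything here is an assumption for illustration: the function names, the error messages, and in particular the use of 1-based page numbers are not confirmed by the node's documentation.

```typescript
interface BinaryEntry {
  data: string;
  mimeType: string;
}

// Hypothetical check: the field must exist, hold data, and have an allowed MIME type.
function requireBinary(
  binary: Record<string, BinaryEntry | undefined>,
  field: string,
  allowedMimeTypes: string[],
): BinaryEntry {
  const entry = binary[field];
  if (!entry || entry.data.length === 0) {
    throw new Error(`No binary data found in field "${field}"`);
  }
  if (!allowedMimeTypes.includes(entry.mimeType)) {
    throw new Error(`Unsupported MIME type "${entry.mimeType}" in field "${field}"`);
  }
  return entry;
}

// Hypothetical check: Merge needs at least two input PDFs.
function requireMergeInputs(fields: string[]): void {
  if (fields.length < 2) {
    throw new Error("Merge requires at least two PDFs");
  }
}

// Hypothetical check: page numbers are assumed to be 1-based integers.
function validatePages(pages: number[], pageCount: number): void {
  if (pages.length === 0) {
    throw new Error("At least one page number is required");
  }
  for (const page of pages) {
    if (!Number.isInteger(page) || page < 1 || page > pageCount) {
      throw new Error(`Page ${page} is outside the range 1-${pageCount}`);
    }
  }
}
```

Running such checks before any PDF processing starts means a bad input fails fast with a message that names the offending field or page.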
