PDF Tools

Manipulate PDF files with various operations

Overview

The PDF Tools node provides operations for manipulating PDF files within an n8n workflow: adding images or watermarks, deleting or extracting pages, merging multiple PDFs, reading metadata, reordering or rotating pages, splitting PDFs, and extracting text content.

The node is useful wherever PDF processing needs to be automated, for example:

  • Adding company logos or signatures as images to PDF reports.
  • Applying watermarks for document protection.
  • Extracting specific pages from large documents for sharing.
  • Merging multiple PDF invoices into a single file.
  • Reading metadata for cataloging or auditing purposes.
  • Rotating scanned pages that are upside down.
  • Splitting large PDFs into smaller parts.
  • Extracting text for indexing or further text analysis.

Practical example: Automatically extract the first 3 pages of a contract PDF and add a watermark before sending it out.

Properties

Name | Meaning
PDF Binary Field (pdfBinaryName) | Name of the binary field containing the PDF file to process.

For the "Read Metadata" operation (Resource: Default, Operation: Read Metadata), only the PDF Binary Field property is relevant.

Output

  • The output contains a JSON object with the extracted metadata fields from the PDF:

    • title: Title of the PDF document.
    • author: Author of the document.
    • subject: Subject description.
    • keywords: Keywords associated with the PDF.
    • creator: Creator application or tool.
    • producer: Producer information.
    • creationDate: Creation date as an ISO 8601 string.
    • modificationDate: Last modification date as an ISO 8601 string.
  • No binary data is output for this operation; the output is purely JSON metadata.

Example output JSON:

{
  "title": "Sample Document",
  "author": "John Doe",
  "subject": "Contract",
  "keywords": ["contract", "agreement"],
  "creator": "PDF Generator",
  "producer": "PDF Producer Tool",
  "creationDate": "2023-01-15T10:00:00.000Z",
  "modificationDate": "2023-02-01T12:30:00.000Z"
}
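
As a rough illustration of how a loaded document could be mapped onto the JSON shape above, here is a minimal sketch. The getter names mirror pdf-lib's PDFDocument metadata getters, but the PdfMetadataSource interface, the toMetadataJson helper, the whitespace split of keywords, and the use of null for missing fields are all assumptions made for this example, not the node's confirmed implementation.

```typescript
// Hypothetical stand-in for a loaded PDF document. The getter names follow
// pdf-lib's PDFDocument, but this interface exists only so the sketch runs
// without the library installed.
interface PdfMetadataSource {
  getTitle(): string | undefined;
  getAuthor(): string | undefined;
  getSubject(): string | undefined;
  getKeywords(): string | undefined;
  getCreator(): string | undefined;
  getProducer(): string | undefined;
  getCreationDate(): Date | undefined;
  getModificationDate(): Date | undefined;
}

// Output shape matching the fields listed above (null for absent values is
// an assumption of this sketch).
interface PdfMetadataJson {
  title: string | null;
  author: string | null;
  subject: string | null;
  keywords: string[];
  creator: string | null;
  producer: string | null;
  creationDate: string | null;
  modificationDate: string | null;
}

function toMetadataJson(doc: PdfMetadataSource): PdfMetadataJson {
  const rawKeywords = doc.getKeywords();
  return {
    title: doc.getTitle() ?? null,
    author: doc.getAuthor() ?? null,
    subject: doc.getSubject() ?? null,
    // Assumption: keywords arrive as one whitespace-separated string.
    keywords: rawKeywords ? rawKeywords.split(/\s+/).filter(Boolean) : [],
    creator: doc.getCreator() ?? null,
    producer: doc.getProducer() ?? null,
    // Dates are serialized as ISO 8601 strings.
    creationDate: doc.getCreationDate()?.toISOString() ?? null,
    modificationDate: doc.getModificationDate()?.toISOString() ?? null,
  };
}
```

With pdf-lib available, a document returned by PDFDocument.load(bytes) could be passed to this helper, since it exposes getters with these names.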

Dependencies

  • Requires the input PDF file to be provided as binary data in the specified binary field.
  • Uses the pdf-lib library to load and read PDF metadata.
  • No external API keys or services are needed.
  • The node expects the PDF binary data to have MIME type application/pdf.
  • The PDF Binary Field property must match the name of the binary field that holds the PDF.

Troubleshooting

  • Error: No binary data found for [field name]
    This means the specified binary field does not exist or is empty in the input item. Ensure the correct binary field name is set and that the input contains valid PDF binary data.

  • Error: The file must be in PDF format (MIME type: application/pdf). Received: [type]
    The input binary data is not recognized as a PDF. Verify that the input file is a valid PDF and that its MIME type is correctly set.

  • Empty or missing metadata fields
    Some PDFs may not contain all metadata fields. This is normal if the PDF creator did not embed them.

  • Operation not supported error
    This appears when the selected operation is not one the node supports. Verify that the operation name matches one of the supported options.
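
The first two errors above come from input validation, which can be sketched roughly as follows. The BinaryEntry and InputItem types are simplified stand-ins for n8n's item structure, getPdfBinary is a hypothetical helper, and the exact message formatting is assumed from the error texts quoted above.

```typescript
// Simplified stand-ins for an n8n input item and its binary entries
// (assumed shapes for illustration only).
interface BinaryEntry {
  mimeType: string;
  data: string; // base64-encoded file content
}

interface InputItem {
  binary?: Record<string, BinaryEntry>;
}

// Hypothetical helper: returns the PDF binary entry or throws one of the
// two validation errors described in the troubleshooting list.
function getPdfBinary(item: InputItem, pdfBinaryName: string): BinaryEntry {
  const entry = item.binary?.[pdfBinaryName];

  // Missing field or empty payload.
  if (!entry || !entry.data) {
    throw new Error(`No binary data found for ${pdfBinaryName}`);
  }

  // Present, but not declared as a PDF.
  if (entry.mimeType !== "application/pdf") {
    throw new Error(
      `The file must be in PDF format (MIME type: application/pdf). Received: ${entry.mimeType}`,
    );
  }

  return entry;
}
```

Checking for the field before checking the MIME type keeps the two failure cases distinct, so a misspelled field name is not reported as a wrong file type.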
