PDF Tools

Manipulate PDF files with various operations

Actions: 10

Overview

The PDF Tools node provides operations for manipulating PDF files within an n8n workflow: adding images or watermarks, deleting or extracting pages, merging multiple PDFs, reading metadata, reordering or rotating pages, splitting PDFs, and extracting text.

The node is useful wherever PDF processing needs to be automated, for example:

Adding company logos or signatures as images to PDF reports.
Applying watermarks for document protection.
Extracting specific pages from large documents for sharing or archiving.
Merging multiple PDF invoices into a single file.
Reading metadata for document management systems.
Rotating scanned pages that are incorrectly oriented.
Splitting large PDFs into smaller parts.
Extracting textual content for indexing or analysis.

For example, you could use "Extract Text" to pull all text from a PDF invoice for further processing, or "Add Watermark" to stamp sensitive documents as confidential.

Properties

Name	Meaning
PDF Binary Field (`pdfBinaryName`)	Name of the binary field containing the PDF file to process.
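The following TypeScript sketch illustrates how the field named by PDF Binary Field might be resolved from an incoming item. The `Item` and `BinaryEntry` types and the `getPdfBuffer` helper are hypothetical simplifications, not n8n's actual interfaces, and the sketch assumes the file is held in memory as a base64 string, which may not hold for every n8n binary-data storage mode.

```typescript
// Hypothetical, simplified stand-ins for an n8n item and its binary entry;
// the real n8n interfaces carry more fields.
interface BinaryEntry {
  data: string; // base64-encoded file contents (assumes in-memory storage)
  mimeType: string;
  fileName?: string;
}

interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Hypothetical helper: look up the field named by pdfBinaryName and decode it.
// Throws if the field is absent or empty, mirroring the documented error.
function getPdfBuffer(item: Item, pdfBinaryName: string): Buffer {
  const entry = item.binary?.[pdfBinaryName];
  if (!entry || entry.data.length === 0) {
    throw new Error(`No binary data found for field "${pdfBinaryName}"`);
  }
  return Buffer.from(entry.data, "base64");
}

// Usage: an item whose "data" field holds only a PDF header, for illustration.
const item: Item = {
  json: {},
  binary: {
    data: { data: Buffer.from("%PDF-1.7").toString("base64"), mimeType: "application/pdf" },
  },
};
console.log(getPdfBuffer(item, "data").toString("latin1")); // prints %PDF-1.7
```

Decoding happens only after the lookup succeeds, so a wrong field name surfaces as a clear error rather than as an empty or corrupt buffer later in the pipeline.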

Other operations of the node take additional properties that are not listed here, for example:

Pages (for some operations): Specifies which pages to target, e.g., "1", "1,3-5", or "all". This property does not apply to "Extract Text".
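The page specification format can be illustrated with a small parser. This is a sketch under assumptions: `parsePageSpec` is a hypothetical helper, and the rules it enforces (1-based pages, comma-separated single pages or inclusive ranges, duplicates merged) are inferred from the examples above rather than taken from the node's source.

```typescript
// Hypothetical parser for page specifications such as "1", "1,3-5", or "all".
// Assumed rules: 1-based pages, comma-separated single pages or inclusive
// ranges, duplicates merged. Returns sorted page numbers or throws.
function parsePageSpec(spec: string, pageCount: number): number[] {
  const normalized = spec.trim().toLowerCase();
  if (normalized === "all") {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  const pages = new Set<number>();
  for (const part of normalized.split(",")) {
    const token = part.trim();
    // Accept either "N" or "N-M"; anything else is malformed.
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (match === null) {
      throw new Error(`Invalid page token "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start > end) {
      throw new Error(`Range "${token}" is reversed`);
    }
    if (start < 1 || end > pageCount) {
      throw new Error(`Range "${token}" is outside pages 1-${pageCount}`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageSpec("1,3-5", 10)); // logs [ 1, 3, 4, 5 ]
```

Under these rules, inputs such as "0-2", "4-2", or "1,x" throw, which is the kind of page selection error described under Troubleshooting.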

For the "Extract Text" operation, PDF Binary Field (`pdfBinaryName`) is the only input property.

Output

The node outputs JSON with the extracted text in the `text` field:

{
  "text": "extracted text content from the PDF"
}

The "Extract Text" operation produces no binary output; it returns only the plain text extracted from the PDF.
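A step that consumes this output might first check whether anything useful was extracted. In the sketch below, `ExtractTextOutput` models the documented `{ "text": ... }` shape, while `hasExtractableText` and its `minChars` threshold are hypothetical; treating whitespace-only text as empty is a heuristic for spotting scanned PDFs that need OCR.

```typescript
// Assumed shape of the JSON emitted by the "Extract Text" operation.
interface ExtractTextOutput {
  text: string;
}

// Hypothetical downstream check: returns false when the extracted text has
// fewer than minChars non-whitespace characters, a hint that the PDF may be
// a scan without a text layer and needs OCR first.
function hasExtractableText(output: ExtractTextOutput, minChars = 1): boolean {
  const visibleChars = output.text.replace(/\s+/g, "").length;
  return visibleChars >= minChars;
}

console.log(hasExtractableText({ text: "Quarterly report\nPage 1" })); // true
console.log(hasExtractableText({ text: " \n\t " })); // false
```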

Dependencies

The node uses the external library pdf-parse to extract text from PDF files.
It also depends on pdf-lib for other PDF manipulations (not directly used in "Extract Text").
No special API keys or external services are required; all processing is done locally within the node.
Input PDFs must be provided as binary data fields in the workflow.

Troubleshooting

Missing binary data error: If the specified binary field does not exist or is empty, the node throws an error stating that no binary data was found for that field name. Check that PDF Binary Field names the correct field and that the incoming item actually contains the PDF.
Invalid MIME type error: The node checks that the input file has the MIME type application/pdf and throws an error otherwise. Verify that the binary data really is a PDF.
Empty text extraction: If the PDF contains no extractable text (for example, scanned images without OCR), the `text` output may be empty. Run the PDF through OCR first if you need its text.
Page selection errors: In operations that take a page range (this does not apply to "Extract Text"), invalid page numbers or malformed ranges cause an error.
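The MIME type check described above can be sketched as follows. `assertIsPdf` is a hypothetical helper: the MIME comparison reflects the documented behavior, while the additional test for the `%PDF-` header bytes is an extra safeguard the node is not documented to perform. The header test is strict; some real-world files have stray bytes before the header, so a production check might scan a little further into the file.

```typescript
// Hypothetical validation helper. The MIME check reflects the documented
// behavior; the "%PDF-" header check is an extra, assumed safeguard.
function assertIsPdf(mimeType: string, data: Uint8Array): void {
  // Ignore parameters such as "; charset=binary" and compare case-insensitively.
  const declared = mimeType.split(";")[0].trim().toLowerCase();
  if (declared !== "application/pdf") {
    throw new Error(`Expected application/pdf but got "${mimeType}"`);
  }
  // Strict check: the first five bytes must be "%PDF-".
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
  const hasSignature = signature.every((byte, i) => data[i] === byte);
  if (!hasSignature) {
    throw new Error('File does not start with the "%PDF-" header');
  }
}

assertIsPdf("application/pdf", Buffer.from("%PDF-1.7\n")); // passes silently
```

Checking the bytes as well as the declared type catches files that were mislabeled upstream, for example an HTML error page saved with a .pdf name.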

Links and References

pdf-parse GitHub repository – Used for extracting text from PDFs.
pdf-lib GitHub repository – Library for PDF manipulation.
n8n Documentation – General information about creating and using custom nodes.