PDF Utils

Inspect and split PDF files using pure npm packages

Actions: 3

Overview

The PDF Utils node inspects and splits PDF files. Inspection reports whether a PDF is vectorial (text-based); splitting turns a multi-page PDF into individual pages. Typical uses are extracting single pages for downstream workflow steps and splitting conditionally on content type. For example, the node can split a multi-page invoice PDF into separate pages, or inspect a PDF first and decide from its text content whether splitting is necessary.

Use Case Examples

Splitting a multi-page PDF into single-page PDFs for individual processing.
Inspecting a PDF to check if it contains enough text to be considered vectorial before deciding to split it.
Conditionally splitting PDFs only if they are not vectorial, preserving vectorial PDFs as is.
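
The conditional use case above boils down to a small decision over the inspection result. The following is a minimal sketch of that decision, not the node's actual code: the `PdfInspection` shape borrows field names from the node's JSON output, while `SplitMode`, `shouldSplit`, and the rule that single-page files are never split are assumptions made for illustration.

```typescript
// Hypothetical subset of the inspection result; field names follow the node's JSON output.
interface PdfInspection {
  pageCount: number;
  isMultiPage: boolean;
  isVectorial: boolean;
}

// Hypothetical mode names, introduced only for this sketch.
type SplitMode = "always" | "onlyIfNotVectorial";

// Decide whether a PDF should be split into single pages.
function shouldSplit(info: PdfInspection, mode: SplitMode): boolean {
  // Assumption: a single-page PDF is passed through unchanged.
  if (!info.isMultiPage) return false;
  if (mode === "always") return true;
  // Conditional mode: vectorial PDFs are preserved as is.
  return !info.isVectorial;
}
```

Under these assumptions, a three-page scanned PDF is split in both modes, while a three-page vectorial PDF is split only in `"always"` mode.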

Properties

Name	Meaning
Binary Property	Name of the binary property containing the PDF file to be processed.
Output Binary Property	Name for the output binary property where the split PDF pages will be stored.
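
To illustrate how the Output Binary Property name might be used, the sketch below builds one output item per split page and stores each page under that property name. The item shape, the `buildPageItems` helper, and the `_page_N.pdf` file naming are hypothetical; the node's real item structure and naming may differ.

```typescript
// Hypothetical, simplified item shape for this sketch.
interface SplitPageItem {
  json: { pageNumber: number; originalFileName: string };
  binary: Record<string, { data: Uint8Array; fileName: string; mimeType: string }>;
}

// Build one item per page; `pages` holds the bytes of each already-split single-page PDF.
function buildPageItems(
  originalFileName: string,
  pages: Uint8Array[],
  outputBinaryProperty: string,
): SplitPageItem[] {
  // Strip a trailing ".pdf" so the page suffix can be appended (naming scheme is an assumption).
  const baseName = originalFileName.replace(/\.pdf$/i, "");
  return pages.map((data, index) => {
    const pageNumber = index + 1; // page numbers are 1-based
    return {
      json: { pageNumber, originalFileName },
      binary: {
        [outputBinaryProperty]: {
          data,
          fileName: `${baseName}_page_${pageNumber}.pdf`,
          mimeType: "application/pdf",
        },
      },
    };
  });
}
```

A downstream step would then read each page from whatever name was passed as `outputBinaryProperty`.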

Output

JSON

pageNumber - The page number of the split PDF page (1-based index).
originalFileName - The original file name of the PDF being processed.
isVectorial - Whether the inspected PDF is vectorial (text-based).
pageCount - Total number of pages in the PDF.
isMultiPage - Boolean indicating if the PDF has multiple pages.
textLength - Length of the text extracted from the first page, used to determine if the PDF is vectorial.
firstPageText - A snippet of the text extracted from the first page of the PDF.
error - Error message if the node fails to process a PDF and continueOnFail is enabled.
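
The inspection fields listed above relate to each other in a simple way, which the following sketch tries to make concrete. It assumes that the first-page text has already been extracted, and that `isVectorial` is derived by comparing `textLength` against a minimum length. The `buildInspectOutput` helper, the threshold of 50 characters, the 200-character snippet, and the whitespace trimming are all assumptions, not values taken from the node.

```typescript
// Inspection fields as documented in the JSON output list.
interface InspectOutput {
  originalFileName: string;
  pageCount: number;
  isMultiPage: boolean;
  textLength: number;
  firstPageText: string;
  isVectorial: boolean;
}

function buildInspectOutput(
  originalFileName: string,
  pageCount: number,
  firstPageText: string,
  minTextLength = 50, // hypothetical threshold for "enough text"
  snippetLength = 200, // hypothetical snippet size
): InspectOutput {
  // Assumption: surrounding whitespace does not count as text.
  const text = firstPageText.trim();
  return {
    originalFileName,
    pageCount,
    isMultiPage: pageCount > 1,
    textLength: text.length,
    firstPageText: text.slice(0, snippetLength),
    isVectorial: text.length >= minTextLength,
  };
}
```

With this rule, a scanned document whose first page yields no extractable text reports `textLength` 0 and `isVectorial` false.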

Dependencies

Uses 'pdfjs-dist' for PDF inspection and 'pdf-lib' for PDF splitting.

Troubleshooting

Common issues include providing an incorrect binary property name or a non-PDF file, which will cause errors during inspection or splitting.
Errors like 'Failed to inspect PDF' or 'Failed to split PDF' indicate issues with reading or processing the PDF file; ensure the input binary data is a valid PDF.
If the node is set to continue on fail, errors will be output as JSON with an 'error' property for easier debugging.
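
The failure modes above can be sketched as a guard plus a continue-on-fail wrapper. This is an illustration under stated assumptions, not the node's implementation: `looksLikePdf`, `processItem`, and the exact error wording are hypothetical. The only external fact relied on is that PDF files normally begin with the ASCII signature `%PDF-`.

```typescript
// Check for the "%PDF-" signature at the start of the data.
function looksLikePdf(bytes: Uint8Array): boolean {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) return false;
  return signature.every((byte, i) => bytes[i] === byte);
}

type ItemResult = { json: Record<string, unknown> };

// Run `inspect` on one item; on failure either rethrow or emit an `error` field.
function processItem(
  bytes: Uint8Array | undefined,
  binaryProperty: string,
  continueOnFail: boolean,
  inspect: (pdf: Uint8Array) => Record<string, unknown>,
): ItemResult {
  try {
    if (bytes === undefined) {
      // Typical cause: the Binary Property name does not match the incoming item.
      throw new Error(`No binary data found in property "${binaryProperty}"`);
    }
    if (!looksLikePdf(bytes)) {
      throw new Error("Failed to inspect PDF: input does not start with %PDF-");
    }
    return { json: inspect(bytes) };
  } catch (err) {
    if (!continueOnFail) throw err;
    const message = err instanceof Error ? err.message : String(err);
    return { json: { error: message } };
  }
}
```

When `continueOnFail` is true, a bad input produces an item with an `error` string instead of stopping the run, which makes it possible to route failed files to a separate branch.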
