PDF Utils

Inspect and split PDF files using pure npm packages

Actions: 3

Overview

The PDF Utils node inspects and splits PDF files using pure npm packages. It analyzes a PDF to determine whether it is text-based (vectorial) or image-based, and can optionally split a multi-page PDF into individual pages. Typical uses are checking a PDF's content type before further processing and breaking a large PDF into single pages for downstream handling.

Use Case Examples

Inspect a PDF to get its page count and check whether it is vectorial, based on the length of the text extracted from the first page.
Inspect a PDF and split it into individual pages only if it is not vectorial (that is, image-based).
Split a multi-page PDF into separate single-page PDF files.
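
The three use cases above differ only in when splitting happens. A minimal sketch of that decision follows. The operation identifiers inspect and inspectAndSplit appear in this document; treating "split" as the third identifier, and the helper name shouldSplit, are assumptions made for illustration and are not taken from the node's source.

```typescript
// Hypothetical operation identifiers; "split" is assumed from the output notes.
type Operation = "inspect" | "split" | "inspectAndSplit";

// Decide whether a PDF should be split into single pages.
// - inspect: never splits, it only reports on the PDF
// - split: always splits
// - inspectAndSplit: splits only when the PDF is not vectorial (image-based)
function shouldSplit(operation: Operation, isVectorial: boolean): boolean {
  switch (operation) {
    case "inspect":
      return false;
    case "split":
      return true;
    case "inspectAndSplit":
      return !isVectorial;
  }
}
```

For example, shouldSplit("inspectAndSplit", true) is false, so a text-based PDF would pass through unsplit under these assumptions.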

Properties

Name	Meaning
Binary Property	Name of the binary property containing the PDF file to be processed.
Text Threshold	Minimum text length on the first page for the PDF to be considered vectorial (text-based). Applies only to the inspect and inspectAndSplit operations.

Output

JSON

pageCount - Total number of pages in the PDF.
isMultiPage - Boolean indicating if the PDF has more than one page.
isVectorial - Boolean indicating if the PDF is considered vectorial (text-based) based on the text threshold.
textLength - Length of the extracted text from the first page.
firstPageText - Extracted text snippet (up to 200 characters) from the first page.
pageNumber - Page number of the split page (only present in split and inspectAndSplit outputs).
originalFileName - Original file name of the PDF being processed (only present in split and inspectAndSplit outputs).
error - Error message if the node fails and continueOnFail is enabled.
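
To make the relationship between the inspection fields concrete, the sketch below derives them from a page count and the text of the first page. It is an illustration only: the interface name InspectResult, the function buildInspectResult, and the choice to compare the raw (untrimmed) text length against the threshold are assumptions, not the node's actual implementation.

```typescript
// Hypothetical shape mirroring the inspection fields listed above.
interface InspectResult {
  pageCount: number;
  isMultiPage: boolean;
  isVectorial: boolean;
  textLength: number;
  firstPageText: string;
}

// Build the inspection fields from values a PDF parser would supply.
// Assumption: "minimum text length" means textLength >= textThreshold.
function buildInspectResult(
  pageCount: number,
  firstPageText: string,
  textThreshold: number,
): InspectResult {
  const textLength = firstPageText.length;
  return {
    pageCount,
    isMultiPage: pageCount > 1,
    isVectorial: textLength >= textThreshold,
    textLength,
    // Only a snippet of up to 200 characters is returned.
    firstPageText: firstPageText.slice(0, 200),
  };
}
```

Note that textLength reports the full length of the extracted text, while firstPageText is capped at 200 characters, so the two can legitimately disagree for long pages.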

Dependencies

Uses 'pdfjs-dist' for PDF inspection and 'pdf-lib' for splitting PDFs.

Troubleshooting

Common errors include failure to read or parse the PDF file, which may be caused by corrupted or unsupported PDF formats.
If the binary property name is incorrect or the binary data is missing, the node will throw an error.
To resolve errors, ensure the input binary data contains a valid PDF file and the binary property name matches the input data.
If the node fails during splitting, verify the PDF is not encrypted or corrupted.
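
The first two checks above (a matching binary property name and a valid PDF payload) can be approximated before the data reaches the node. The sketch below is a hypothetical pre-check, not part of PDF Utils: the function names and the simplified input shape (a map from property name to raw bytes) are assumptions. It relies only on the fact that PDF files begin with the bytes "%PDF-". It is a strict check, so a file with leading bytes before the header, which some readers tolerate, would be rejected, and it does not detect encryption or corruption further into the file.

```typescript
// Byte values of the PDF header "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// True when the data starts with the PDF header bytes.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}

// Return a human-readable problem description, or null when the input looks usable.
// `binary` is a simplified, hypothetical stand-in for an item's binary properties.
function describeInputProblem(
  binary: Record<string, Uint8Array> | undefined,
  propertyName: string,
): string | null {
  if (!binary) return "Item has no binary data";
  const bytes = binary[propertyName];
  if (!bytes) return `Binary property "${propertyName}" not found`;
  if (!looksLikePdf(bytes)) {
    return `Binary property "${propertyName}" does not start with a PDF header`;
  }
  return null;
}
```

A null result only means the input passed these basic checks; the PDF may still fail to parse or split.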
