Overview
This node, named PDF Utils, inspects and splits PDF files. Inspection determines whether a PDF is vectorial (text-based), using the length of the text extracted from its first page, and reports its page count. Splitting turns a multi-page PDF into one PDF per page. Typical uses are extracting individual pages for later workflow steps and splitting PDFs conditionally based on their content type. For example, the node can split a multi-page invoice PDF into separate pages, or inspect a PDF's text content to decide whether splitting is needed.
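The vectorial check can be pictured as a text-length heuristic over the first page, matching the textLength and firstPageText fields listed under Output. This is a minimal TypeScript sketch, not the node's implementation. It assumes the first-page text has already been extracted (for example with pdfjs-dist). The threshold MIN_TEXT_LENGTH, the snippet length SNIPPET_LENGTH, and the whitespace normalization are all hypothetical choices.

```typescript
// Hypothetical values: the node's real threshold and snippet length are not documented.
const MIN_TEXT_LENGTH = 50;
const SNIPPET_LENGTH = 200;

interface InspectionResult {
  isVectorial: boolean;
  pageCount: number;
  isMultiPage: boolean;
  textLength: number;
  firstPageText: string;
}

// Classify a PDF from the text already extracted from its first page
// and its total page count.
function classifyPdf(firstPageText: string, pageCount: number): InspectionResult {
  // ASSUMPTION: collapse whitespace so layout spacing does not inflate the length.
  const normalized = firstPageText.replace(/\s+/g, " ").trim();
  return {
    isVectorial: normalized.length >= MIN_TEXT_LENGTH, // enough text => text-based
    pageCount,
    isMultiPage: pageCount > 1,
    textLength: normalized.length,
    firstPageText: normalized.slice(0, SNIPPET_LENGTH), // short preview only
  };
}
```

For a scanned three-page invoice with no extractable text, classifyPdf("", 3) reports isVectorial: false and isMultiPage: true.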
Use Case Examples
- Splitting a multi-page PDF into single-page PDFs for individual processing.
- Inspecting a PDF to check if it contains enough text to be considered vectorial before deciding to split it.
- Conditionally splitting PDFs only if they are not vectorial, preserving vectorial PDFs as is.
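The conditional-split use case comes down to a small decision over the inspection result. The helper below is hypothetical. Treating single-page PDFs as "keep" is an assumption layered on top of the documented rule that vectorial PDFs are preserved as is.

```typescript
// Subset of the inspection output needed for the decision.
interface InspectionSummary {
  isVectorial: boolean;
  isMultiPage: boolean;
}

type PdfAction = "split" | "keep";

// Hypothetical helper: split only non-vectorial, multi-page PDFs.
function decideAction(info: InspectionSummary): PdfAction {
  if (info.isVectorial) return "keep"; // text-based PDFs are preserved as is
  if (!info.isMultiPage) return "keep"; // ASSUMPTION: one page means nothing to split
  return "split";
}
```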
Properties
| Name | Meaning |
|---|---|
| Binary Property | Name of the binary property containing the PDF file to be processed. |
| Output Binary Property | Name for the output binary property where the split PDF pages will be stored. |
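A wrong Binary Property name is a common cause of failure. The sketch below shows one way to resolve and validate the input before processing. The Item and BinaryData shapes are simplified, hypothetical stand-ins for a workflow item that carries JSON plus named, base64-encoded binary attachments. The error messages are illustrative and are not the node's own.

```typescript
// Hypothetical, simplified shapes for an input item and its binary attachment.
interface BinaryData {
  data: string; // base64-encoded file contents
  mimeType: string;
  fileName?: string;
}

interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryData>;
}

// Resolve the attachment named by the "Binary Property" parameter and
// reject anything that is missing or not labelled as a PDF.
function getPdfBinary(item: Item, binaryPropertyName: string): BinaryData {
  const binary = item.binary?.[binaryPropertyName];
  if (!binary) {
    throw new Error(`No binary property named "${binaryPropertyName}" on this item`);
  }
  if (binary.mimeType !== "application/pdf") {
    throw new Error(`Binary property "${binaryPropertyName}" is ${binary.mimeType}, not a PDF`);
  }
  return binary;
}
```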
Output
JSON
| Field | Meaning |
|---|---|
| pageNumber | The page number of the split PDF page (1-based index). |
| originalFileName | The original file name of the PDF being processed. |
| isVectorial | Indicates whether the PDF is vectorial (text-based) when inspecting. |
| pageCount | Total number of pages in the PDF. |
| isMultiPage | Boolean indicating whether the PDF has more than one page. |
| textLength | Length of the text extracted from the first page, used to determine whether the PDF is vectorial. |
| firstPageText | A snippet of the text extracted from the first page of the PDF. |
| error | Error message if the node fails to process a PDF and continueOnFail is enabled. |
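When splitting, each page becomes its own output with a 1-based pageNumber. pdf-lib addresses pages by 0-based index (for example in PDFDocument.copyPages), so the conversion is an easy place for an off-by-one error. This hypothetical helper builds only the per-page JSON. Which fields the node actually attaches to split pages (here pageNumber, originalFileName and pageCount) is an assumption.

```typescript
// Hypothetical per-page JSON for split output.
interface SplitPageJson {
  pageNumber: number;
  originalFileName: string;
  pageCount: number;
}

function splitPageMetadata(originalFileName: string, pageCount: number): SplitPageJson[] {
  const pages: SplitPageJson[] = [];
  // Iterate with 0-based indices, as pdf-lib does for page access.
  for (let index = 0; index < pageCount; index++) {
    pages.push({
      pageNumber: index + 1, // report a 1-based page number
      originalFileName,
      pageCount,
    });
  }
  return pages;
}
```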
Dependencies
- Uses 'pdfjs-dist' for PDF inspection and 'pdf-lib' for PDF splitting.
Troubleshooting
- Common issues include providing an incorrect binary property name or a non-PDF file, which will cause errors during inspection or splitting.
- Errors such as 'Failed to inspect PDF' or 'Failed to split PDF' mean the PDF could not be read or processed; ensure the input binary data is a valid PDF.
- If the node is set to continue on fail, errors will be output as JSON with an 'error' property for easier debugging.
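The continue-on-fail behaviour can be pictured as a per-file try/catch that turns exceptions into items with an error field. Everything here is hypothetical: processAll, the handleFile callback, and the "%PDF-" header pre-check are illustrations. The node may validate its input differently.

```typescript
interface ResultItem {
  json: Record<string, unknown>;
}

// Cheap pre-check: a PDF file normally starts with the bytes "%PDF-".
// Some PDF readers tolerate leading bytes before the header; this check is deliberately strict.
function looksLikePdf(bytes: Uint8Array): boolean {
  const header = "%PDF-";
  if (bytes.length < header.length) return false;
  for (let i = 0; i < header.length; i++) {
    if (bytes[i] !== header.charCodeAt(i)) return false;
  }
  return true;
}

function processAll(
  files: Uint8Array[],
  continueOnFail: boolean,
  handleFile: (bytes: Uint8Array) => Record<string, unknown>,
): ResultItem[] {
  const results: ResultItem[] = [];
  for (const bytes of files) {
    try {
      if (!looksLikePdf(bytes)) throw new Error("Failed to inspect PDF: input is not a PDF");
      results.push({ json: handleFile(bytes) });
    } catch (err) {
      if (!continueOnFail) throw err; // stop on the first failure
      // Emit the failure as data so later steps can inspect it.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ json: { error: message } });
    }
  }
  return results;
}
```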
Links
- pdf-lib - Library used for splitting PDF documents.
- pdfjs-dist - Library used for inspecting PDF content and structure.