Actions (19)
- AI Invoice Parser
- URL/HTML to PDF
- Merge PDF
- Split PDF
- Convert To PDF
- Convert From PDF
- Add Text/Images to PDF
- Fill a PDF Form
- PDF Information & Form Fields
- Compress PDF
- PDF Security
- Rotate PDF Pages
- Delete PDF Pages
- Search in PDF
- Search & Replace Text or Delete
- Barcode Reader
- Barcode Generator
- Make PDF Searchable or Unsearchable
- Upload File
Overview
This node converts a PDF file at a given URL into another format: CSV, HTML, images (JPG, PNG, TIFF, WEBP), JSON (a legacy format and two enhanced variants), text (with layout, or fast without layout), Excel (XLS, XLSX), or XML. It is useful for automating the extraction and transformation of PDF content into structured or more accessible formats for further processing or analysis.
Common scenarios include:
- Extracting tabular data from invoices or reports in PDF to CSV or Excel.
- Converting PDFs to images for preview or archival purposes.
- Extracting raw or structured text for indexing or search.
- Using OCR language support to convert scanned documents into editable text.
- Integrating with webhooks to asynchronously receive converted data.
Practical example:
- Automatically download an invoice PDF from a URL, convert it to CSV to extract line items, and then import that data into an accounting system.
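The last step of this example, turning the converted CSV into records an accounting system can import, could look roughly like the sketch below. It assumes the node returned the CSV inline as text and that the extracted table has a header row. The column names (`Description`, `Quantity`, `Unit Price`) and the sample data are hypothetical, since real invoices vary.

```python
import csv
import io
from decimal import Decimal


def parse_line_items(csv_text: str) -> list[dict]:
    """Turn converted CSV text into a list of line-item records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for row in reader:
        # Skip rows without a description (blank rows, subtotal rows).
        description = (row.get("Description") or "").strip()
        if not description:
            continue
        items.append({
            "description": description,
            "quantity": int(row["Quantity"]),
            # Decimal avoids floating-point rounding on money values.
            "unit_price": Decimal(row["Unit Price"]),
        })
    return items


# Hypothetical CSV, standing in for the node's inline output.
sample = "Description,Quantity,Unit Price\nWidget,2,9.50\nGadget,1,20.00\n"
items = parse_line_items(sample)
total = sum(item["quantity"] * item["unit_price"] for item in items)
print(total)  # 39.00
```

Real invoice tables often need extra cleanup (currency symbols, thousands separators, merged cells), so treat the parsing rules here as a starting point.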
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the PDF file to convert. |
| Convert Type | The desired output format. Options include: PDF to CSV, HTML, JPG, JSON (legacy and two enhanced versions), PNG, Text (normal and fast no-layout), TIFF, WEBP, XLS, XLSX, XML. |
| Advanced Options | Additional settings depending on the chosen convert type. These include: |
| - File Name | Name of the output file. |
| - Pages | Specific pages to convert (e.g., "0" for all pages or page ranges). |
| - Inline | Whether to return the output directly in the response or via links/webhook. |
| - Line Grouping | For table-like outputs, whether to group lines into cells. |
| - Unwrap | When line grouping is enabled, whether to unwrap lines into a single line within cells. |
| - OCR Language Name or ID | Language used for OCR on scanned documents. Selectable from a list or specified by expression. |
| - Extraction Region | Coordinates defining a rectangular region of the document to extract (format: left, top, right, bottom). |
| - Webhook URL | URL to send the output data asynchronously via webhook callback. |
| - Output Links Expiration (In Minutes) | Duration before generated output links expire. |
| - HTTP Username | Username for HTTP authentication if required to access the source PDF URL. |
| - HTTP Password | Password for HTTP authentication if required to access the source PDF URL. |
| - Custom Profiles | JSON string to specify custom API call profiles for advanced configuration (see external API documentation). |
Note: The "Advanced Options" property has three variants depending on the convert type:
- General advanced options for most formats like CSV, HTML, JSON, Text, XLS, XML.
- Image-specific advanced options for JPG, PNG, WEBP, TIFF.
- Simple text conversion advanced options for the fast no-layout text conversion.
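As a rough illustration of how the three variants relate to the convert type, and of the `left, top, right, bottom` format used by Extraction Region, consider the sketch below. The identifiers (`options_variant`, `parse_region`, and the `TEXT_SIMPLE` key for the fast no-layout text conversion) are hypothetical and are not taken from the node's source.

```python
# Convert types that use the image-specific advanced options.
IMAGE_TYPES = {"JPG", "PNG", "WEBP", "TIFF"}
# Hypothetical key for the fast no-layout text conversion.
SIMPLE_TEXT_TYPES = {"TEXT_SIMPLE"}


def options_variant(convert_type: str) -> str:
    """Return which advanced-options variant applies to a convert type."""
    key = convert_type.strip().upper()
    if key in IMAGE_TYPES:
        return "image"
    if key in SIMPLE_TEXT_TYPES:
        return "simple_text"
    # CSV, HTML, JSON, Text, XLS, XLSX, XML and similar formats.
    return "general"


def parse_region(region: str) -> tuple[float, float, float, float]:
    """Parse an extraction region written as 'left, top, right, bottom'."""
    parts = [part.strip() for part in region.split(",")]
    if len(parts) != 4:
        raise ValueError("region needs exactly four comma-separated values")
    left, top, right, bottom = (float(part) for part in parts)
    return left, top, right, bottom


print(options_variant("png"))            # image
print(options_variant("csv"))            # general
print(parse_region("10, 20, 300, 400"))  # (10.0, 20.0, 300.0, 400.0)
```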
Output
The node outputs JSON data containing the result of the PDF conversion. Depending on the selected convert type and options:
- For inline responses, the output will contain the converted data directly, typically as base64-encoded strings or structured JSON objects.
- For non-inline or webhook-based conversions, the output includes URLs pointing to the converted files, which are valid for the configured expiration time.
- When converting to image formats, the output contains either the image data or links to the generated images.
- For JSON conversions, the output contains structured representations of the PDF content, including text objects, metadata, headers, styles, or legacy JSON formats.
- For text conversions, plain text or simplified text without layout is returned.
- Binary data (images, spreadsheets) is either embedded as base64 or provided via downloadable links.
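A downstream step usually has to tell these cases apart. The sketch below shows one way to do that. The field names `urls` and `body` are assumptions made for illustration, so check the node's actual output in your workflow before relying on them.

```python
import base64


def read_result(item: dict, binary: bool = False) -> dict:
    """Classify one output item as links, decoded bytes, or inline data.

    `urls` and `body` are hypothetical field names.
    """
    urls = item.get("urls")
    if urls:
        # Non-inline or webhook-based conversion: links to converted files.
        return {"kind": "links", "data": list(urls)}
    body = item.get("body")
    if binary:
        # Images and spreadsheets embedded as base64 strings.
        return {"kind": "bytes", "data": base64.b64decode(body)}
    # Text, CSV, or structured JSON returned inline.
    return {"kind": "inline", "data": body}


encoded = base64.b64encode(b"\x89PNG").decode("ascii")
print(read_result({"body": encoded}, binary=True)["data"])  # b'\x89PNG'
print(read_result({"urls": ["https://example.com/out.csv"]})["kind"])  # links
```

The caller passes `binary` explicitly because plain text can happen to be valid base64, so guessing from the content alone is unreliable.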
Dependencies
- Requires an API key credential for authenticating requests to the PDF conversion service.
- The node depends on an external PDF conversion API capable of handling multiple output formats and OCR.
- Optional HTTP basic authentication credentials may be needed to access protected PDF URLs.
- The node supports webhook callbacks for asynchronous processing, requiring a reachable webhook URL.
- The node uses internal helper functions and resource loaders to fetch supported fonts and OCR languages dynamically.
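For the webhook dependency, the receiving side only needs to accept a POST request at a reachable URL. The following is a minimal sketch using Python's standard library. It assumes the callback body is a JSON object, which the text above does not confirm, and the port and names are placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(raw: bytes) -> dict:
    """Decode a callback body, assuming it is a JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("callback body must be a JSON object")
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_callback(self.rfile.read(length))
        except ValueError:
            # Malformed body: reject so the problem is visible to the sender.
            self.send_response(400)
            self.end_headers()
            return
        # Hand the payload to the rest of the workflow here.
        print("received keys:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for callbacks until interrupted (placeholder port)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. In an n8n workflow, a Webhook trigger node would normally play this role instead.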
Troubleshooting
- Invalid URL or inaccessible PDF: Ensure the provided URL is correct and publicly accessible, or that the HTTP Username and HTTP Password options are set correctly for a protected URL.
- Unsupported convert type or invalid advanced options: Verify that the selected convert type matches the advanced options provided.
- OCR language not recognized: Choose a valid OCR language from the list or provide a correct language ID.
- Webhook callback failures: Confirm the webhook URL is reachable and properly configured to accept POST requests.
- Expired output links: If using output links, ensure they are accessed before the expiration time elapses.
- API authentication errors: Check that the API key credential is valid and has sufficient permissions.
- Large PDF files or complex layouts: Conversion might take longer or fail; consider restricting the conversion to specific pages (Pages option) or to a smaller area (Extraction Region option).