PDF.co Api

Generate PDFs, extract data from PDFs, and split, merge, or convert PDFs. Fill PDF forms, add text and images to PDFs, and much more with PDF.co!

Overview

This node converts a PDF file at a given URL into one of several output formats: CSV, HTML, images (JPG, PNG, TIFF, WEBP), JSON (multiple variants), text (with different layout options), Excel (XLS, XLSX), and XML. It is useful for automating the extraction and transformation of PDF content into structured or more accessible formats for further processing or analysis.

Common scenarios include:

Extracting tabular data from invoices or reports in PDF to CSV or Excel.
Converting PDFs to images for preview or archival purposes.
Extracting raw or structured text for indexing or search.
Using OCR language support to convert scanned documents into editable text.
Integrating with webhooks to asynchronously receive converted data.

Practical example:

Automatically download an invoice PDF from a URL, convert it to CSV to extract line items, and then import that data into an accounting system.
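
The last step of this scenario, turning the converted CSV into line items, can be sketched in a few lines of Python. This is only an illustration: the function name `parse_line_items` and the sample column headers are assumptions, and the real columns depend entirely on the invoice layout.

```python
import csv
import io


def parse_line_items(csv_text: str) -> list[dict[str, str]]:
    """Parse converted CSV text into one dict per line item.

    Assumes the first CSV row holds the column headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Drop rows whose cells are all empty (padding rows in table exports).
    return [
        row
        for row in reader
        if any(str(value or "").strip() for value in row.values())
    ]


# Hypothetical sample of what a converted invoice table might contain.
sample_csv = "Description,Quantity,Unit Price\nWidget,2,9.50\n,,\nGadget,1,20.00\n"
line_items = parse_line_items(sample_csv)
```

Each resulting dict can then be mapped onto whatever fields the accounting system expects.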

Properties

Name	Meaning
Url	The URL of the PDF file to convert.
Convert Type	The desired output format. Options include: PDF to CSV, HTML, JPG, JSON (legacy and two enhanced versions), PNG, Text (normal and fast no-layout), TIFF, WEBP, XLS, XLSX, XML.
Advanced Options	Additional settings depending on the chosen convert type. These include:
- File Name	Name of the output file.
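- Pages	Specific pages or page ranges to convert (e.g., "0" or "0-2"). Page indices are zero-based in the PDF.co API, so "0" is the first page; leaving the field empty converts all pages.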
- Inline	Whether to return the output directly in the response or via links/webhook.
- Line Grouping	For table-like outputs, whether to group lines into cells.
- Unwrap	When line grouping is enabled, whether to unwrap lines into a single line within cells.
- OCR Language Name or ID	Language used for OCR on scanned documents. Selectable from a list or specified by expression.
- Extraction Region	Coordinates defining a rectangular region of the document to extract (format: left, top, right, bottom).
- Webhook URL	URL to send the output data asynchronously via webhook callback.
- Output Links Expiration (In Minutes)	Duration before generated output links expire.
- HTTP Username	Username for HTTP authentication if required to access the source PDF URL.
- HTTP Password	Password for HTTP authentication if required to access the source PDF URL.
- Custom Profiles	JSON string to specify custom API call profiles for advanced configuration (see external API documentation).
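
To make the relationship between these properties more concrete, the following sketch assembles them into a single options dictionary and validates the extraction region before use. The key names (`convertType`, `extractionRegion`, `webhookUrl`, `expirationMinutes`) and both function names are hypothetical, chosen to mirror the property names above; they are not the node's internal field names. The check that left is less than right and top is less than bottom assumes a top-left coordinate origin.

```python
from typing import Any, Optional


def parse_region(region: str) -> tuple[float, float, float, float]:
    """Parse "left, top, right, bottom" into four numbers and sanity-check them."""
    parts = [float(part) for part in region.split(",")]
    if len(parts) != 4:
        raise ValueError("region must have exactly four comma-separated numbers")
    left, top, right, bottom = parts
    # Assumption: origin is the top-left corner, so right > left and bottom > top.
    if right <= left or bottom <= top:
        raise ValueError("region must satisfy left < right and top < bottom")
    return left, top, right, bottom


def build_options(
    url: str,
    convert_type: str,
    *,
    pages: Optional[str] = None,
    inline: bool = False,
    region: Optional[str] = None,
    webhook_url: Optional[str] = None,
    expiration_minutes: Optional[int] = None,
) -> dict[str, Any]:
    """Collect conversion settings, omitting options that were not set."""
    options: dict[str, Any] = {"url": url, "convertType": convert_type, "inline": inline}
    if pages is not None:
        options["pages"] = pages
    if region is not None:
        parse_region(region)  # raises ValueError on a malformed region
        options["extractionRegion"] = region
    if webhook_url is not None:
        options["webhookUrl"] = webhook_url
    if expiration_minutes is not None:
        if expiration_minutes <= 0:
            raise ValueError("expiration must be a positive number of minutes")
        options["expirationMinutes"] = expiration_minutes
    return options


options = build_options(
    "https://example.com/invoice.pdf", "csv", pages="0-1", region="0, 0, 200, 100"
)
```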

Note: The "Advanced Options" property has three variants depending on the convert type:

General advanced options for most formats like CSV, HTML, JSON, Text, XLS, XML.
Image-specific advanced options for JPG, PNG, WEBP, TIFF.
Simple text conversion advanced options for the fast no-layout text conversion.
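
The choice among these three variants can be pictured as a simple lookup on the convert type. The sketch below is illustrative only: the identifier strings (such as `text-simple` for the fast no-layout text conversion) and the variant labels are assumptions, not values taken from the node.

```python
# Hypothetical convert-type identifiers; the node's real values may differ.
IMAGE_TYPES = {"jpg", "png", "webp", "tiff"}
SIMPLE_TEXT_TYPES = {"text-simple"}


def advanced_options_variant(convert_type: str) -> str:
    """Return which advanced-options variant applies to a convert type."""
    key = convert_type.strip().lower()
    if key in IMAGE_TYPES:
        return "image"
    if key in SIMPLE_TEXT_TYPES:
        return "simple-text"
    # Everything else (CSV, HTML, JSON, Text, XLS, XML, ...) uses the general set.
    return "general"
```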

Output

The node outputs JSON data containing the result of the PDF conversion. Depending on the selected convert type and options:

For inline responses, the output will contain the converted data directly, typically as base64-encoded strings or structured JSON objects.
For non-inline or webhook-based conversions, the output includes URLs pointing to the converted files, which are valid for the configured expiration time.
When converting to image formats, the output represents the image data or links to the images.
For JSON conversions, the output contains structured representations of the PDF content, including text objects, metadata, headers, styles, or legacy JSON formats.
For text conversions, plain text or simplified text without layout is returned.
Binary data (images, spreadsheets) is either embedded as base64 or provided via downloadable links.
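
A downstream step usually has to tell inline results apart from link-based results. The following sketch shows one way to do that, assuming the output item carries the inline data under a `body` key and links under `url` or `urls`; these key names are assumptions and should be checked against the actual node output.

```python
from typing import Any


def extract_result(item: dict[str, Any]) -> dict[str, Any]:
    """Normalize a conversion result into inline data or a list of links."""
    if "body" in item:
        # Inline response: the converted content is embedded directly.
        return {"kind": "inline", "data": item["body"]}
    # Link response: a single file URL, or several (e.g. one image per page).
    urls = item.get("urls") or ([item["url"]] if item.get("url") else [])
    if urls:
        return {"kind": "links", "links": list(urls)}
    raise ValueError("result contains neither inline data nor output links")
```

Links returned this way should be downloaded before the configured expiration time passes.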

Dependencies

Requires an API key credential for authenticating requests to the PDF conversion service.
The node depends on an external PDF conversion API capable of handling multiple output formats and OCR.
Optional HTTP basic authentication credentials may be needed to access protected PDF URLs.
The node supports webhook callbacks for asynchronous processing, requiring a reachable webhook URL.
The node uses internal helper functions and resource loaders to fetch supported fonts and OCR languages dynamically.
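
For readers who want to see what an API-key-authenticated request looks like outside the node, here is a minimal sketch using only the Python standard library. The endpoint URL is a placeholder, and the `x-api-key` header name is an assumption about how the conversion service expects the key; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the API key."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        # Assumed header name for the API key; verify against the service docs.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and key, for illustration only.
request = build_request(
    "https://api.example.com/convert",
    "YOUR_API_KEY",
    {"url": "https://example.com/invoice.pdf"},
)
```

Passing the request to `urllib.request.urlopen` would send it; inside n8n the node's credential handling takes care of this step.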

Troubleshooting

Invalid URL or inaccessible PDF: Ensure the provided URL is correct and publicly accessible or that HTTP credentials are correctly set.
Unsupported convert type or invalid advanced options: Verify that the advanced options provided belong to the variant used by the selected convert type.
OCR language not recognized: Choose a valid OCR language from the list or provide a correct language ID.
Webhook callback failures: Confirm the webhook URL is reachable and properly configured to accept POST requests.
Expired output links: If using output links, ensure they are accessed before the expiration time elapses.
API authentication errors: Check that the API key credential is valid and has sufficient permissions.
Large PDF files or complex layouts: Conversion might take longer or fail; consider limiting the pages converted or the region extracted.
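
One way to limit pages for a large document is to convert it in batches and pass each batch as the Pages value. The helper below is a sketch under two assumptions: page indices are zero-based, and an inclusive "start-end" string is an accepted range syntax. The function name `page_batches` is hypothetical.

```python
def page_batches(total_pages: int, batch_size: int) -> list[str]:
    """Split a document into page-range strings of at most batch_size pages."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    batches = []
    for start in range(0, total_pages, batch_size):
        # Inclusive end index, clamped to the last page of the document.
        end = min(start + batch_size, total_pages) - 1
        batches.append(str(start) if start == end else f"{start}-{end}")
    return batches
```

For a 10-page document and a batch size of 4, this yields the ranges "0-3", "4-7", and "8-9", each of which can be run as a separate conversion.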