Overview
This node uses Puppeteer, a Node.js library for automating browsers (typically in headless mode), to load web pages and generate PDF documents from them. The "Get PDF" operation navigates to a specified URL, renders the page as a PDF according to user-defined settings (such as paper size, margins, scale, orientation, and header/footer templates), and outputs the PDF as binary data.
Common scenarios where this node is beneficial include:
- Automatically generating PDFs of web reports or dashboards.
- Archiving web pages in PDF format for compliance or record-keeping.
- Creating printable versions of dynamic web content.
- Generating invoices, tickets, or other documents rendered via web technologies.
Practical example:
- You want to convert a dynamically generated invoice page on your website into a PDF file for emailing or storage. Given the invoice URL and PDF options such as page size and margins, the node returns the PDF as binary data that can be attached or stored directly.
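
The invoice scenario above comes down to two steps: navigate, then render. The sketch below illustrates that flow under stated assumptions. `PageLike` is a hypothetical stand-in that mirrors the names of Puppeteer's `page.goto()` and `page.pdf()` methods, the `networkidle0` wait condition is just one possible choice, and the fake page exists only so the flow can run without a browser. It is not the node's actual implementation.

```typescript
// Minimal stand-in for the parts of a Puppeteer page this sketch needs.
// The method names mirror page.goto() and page.pdf(); the interface itself
// is an illustrative assumption.
interface PageLike {
  goto(url: string, options?: { waitUntil?: string; timeout?: number }): Promise<unknown>;
  pdf(options?: Record<string, unknown>): Promise<Uint8Array>;
}

// Navigate to the invoice page, then render it as PDF bytes.
async function renderInvoicePdf(
  page: PageLike,
  invoiceUrl: string,
  pdfOptions: Record<string, unknown> = { format: "A4" },
): Promise<Uint8Array> {
  // Assumed wait condition: resolve once the network has gone idle.
  await page.goto(invoiceUrl, { waitUntil: "networkidle0" });
  return page.pdf(pdfOptions);
}

// A fake page that records calls, so the flow can be exercised without a browser.
const recordedCalls: string[] = [];
const fakePage: PageLike = {
  async goto(url) {
    recordedCalls.push(`goto ${url}`);
    return null;
  },
  async pdf(options) {
    recordedCalls.push(`pdf ${JSON.stringify(options)}`);
    return new Uint8Array([0x25, 0x50, 0x44, 0x46]); // the bytes of "%PDF"
  },
};
```

With a real browser, the object returned by Puppeteer's `browser.newPage()` would be passed in place of `fakePage`.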
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to convert to PDF. |
| Property Name | The name of the binary property where the resulting PDF data will be stored. |
| Page Ranges | Specific page ranges to print, e.g., "1-5, 8, 11-13". Optional. |
| Scale | Scale factor for rendering the page, between 0.1 and 2. |
| Prefer CSS Page Size | Whether to prioritize any CSS @page size declarations over explicit width/height or format options. |
| Format | Paper format type when printing PDF (e.g., Letter, Legal, A4). Only used if "Prefer CSS Page Size" is false. |
| Height | Custom height of the paper. Can be a number or string with units. Used only if "Prefer CSS Page Size" is false. |
| Width | Custom width of the paper. Can be a number or string with units. Used only if "Prefer CSS Page Size" is false. |
| Landscape | Whether to print the PDF in landscape orientation (true) or portrait (false). |
| Margin | Collection of margin sizes (top, bottom, left, right) for the PDF. Each can be a string with units. |
| Display Header/Footer | Whether to show header and footer in the PDF. |
| Header Template | HTML template for the header. Elements with the special classes date, title, url, pageNumber, and totalPages have the corresponding values injected into them. Only shown if "Display Header/Footer" is true. |
| Footer Template | HTML template for the footer. Supports the same special classes as the header template (date, title, url, pageNumber, totalPages). Only shown if "Display Header/Footer" is true. |
| Transparent Background | If true, omits the default white background so the PDF can be rendered with transparency. |
| Background Graphics | Whether to include background graphics in the PDF. |
| Query Parameters | Additional query parameters to append to the URL before loading the page. |
| Options | Various advanced Puppeteer launch and navigation options including: batch size, browser WebSocket endpoint, headers, device emulation, executable path, file name for output, launch arguments, timeouts, wait conditions, caching, headless mode, stealth mode, human typing simulation, proxy server, and container argument flags. |
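
To make the table more concrete, here is a minimal sketch of how these properties might be validated and translated into an options object for Puppeteer's `page.pdf()`. The `PdfSettings` field names and the two helper functions are hypothetical. The output keys (`scale`, `pageRanges`, `format`, `printBackground`, `omitBackground`, and so on) are real `page.pdf()` option names, but the mapping from "Background Graphics" to `printBackground` and from "Transparent Background" to `omitBackground` is an assumption, not something this document confirms.

```typescript
// Hypothetical shape of the node's PDF settings, following the table above.
interface PdfSettings {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  transparentBackground?: boolean;
  backgroundGraphics?: boolean;
}

// Accepts strings such as "1-5, 8, 11-13"; an empty string means "all pages".
function isValidPageRanges(ranges: string): boolean {
  if (ranges.trim() === "") return true;
  return ranges.split(",").every((part) => {
    const match = /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/.exec(part);
    if (!match) return false;
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    return start >= 1 && end >= start;
  });
}

function buildPdfOptions(settings: PdfSettings): Record<string, unknown> {
  const scale = settings.scale ?? 1;
  if (scale < 0.1 || scale > 2) {
    throw new RangeError(`Scale must be between 0.1 and 2, got ${scale}`);
  }
  const pageRanges = settings.pageRanges ?? "";
  if (!isValidPageRanges(pageRanges)) {
    throw new Error(`Invalid page ranges: "${pageRanges}"`);
  }
  const options: Record<string, unknown> = {
    scale,
    pageRanges,
    landscape: settings.landscape ?? false,
    margin: settings.margin ?? {},
    preferCSSPageSize: settings.preferCSSPageSize ?? false,
    printBackground: settings.backgroundGraphics ?? false, // assumed mapping
    omitBackground: settings.transparentBackground ?? false, // assumed mapping
    displayHeaderFooter: settings.displayHeaderFooter ?? false,
  };
  // Per the table, Format, Width, and Height apply only when CSS page size is not preferred.
  if (!settings.preferCSSPageSize) {
    if (settings.format !== undefined) options.format = settings.format;
    if (settings.width !== undefined) options.width = settings.width;
    if (settings.height !== undefined) options.height = settings.height;
  }
  // Templates are only passed along when header/footer display is on.
  if (settings.displayHeaderFooter) {
    options.headerTemplate = settings.headerTemplate ?? "";
    options.footerTemplate = settings.footerTemplate ?? "";
  }
  return options;
}

// Example: A4 invoice with a page-number footer built from the special classes.
const invoiceOptions = buildPdfOptions({
  format: "A4",
  margin: { top: "20mm", bottom: "20mm" },
  displayHeaderFooter: true,
  footerTemplate:
    '<div style="font-size:8px;width:100%;text-align:center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
});
```

Validating the scale and page ranges before calling the browser keeps configuration mistakes out of the rendering step, where they are harder to diagnose.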
Output
The node outputs an array of items, each containing:
- binary: An object with a property named as per the "Property Name" input, holding the PDF data as binary.
- json: Metadata about the HTTP response including:
  - headers: HTTP response headers from the page request.
  - statusCode: HTTP status code of the page request.
  - url: The final URL loaded (including query parameters).
The binary data represents the generated PDF file content.
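
The output structure described above can be sketched as a typed object. This is a simplified illustration: the `PdfOutputItem` type, the `buildOutputItem` helper, and the `mimeType` and `fileName` fields are assumptions made for the example, since this document does not specify the exact binary container format.

```typescript
// Metadata about the page request, as listed in the Output section.
interface ResponseMetadata {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// Hypothetical, simplified item shape; the real binary container is not
// specified in this document.
interface PdfOutputItem {
  binary: Record<string, { data: Uint8Array; mimeType: string; fileName: string }>;
  json: ResponseMetadata;
}

function buildOutputItem(
  propertyName: string,
  pdfBytes: Uint8Array,
  response: ResponseMetadata,
  fileName = "document.pdf",
): PdfOutputItem {
  return {
    binary: {
      // The binary key is whatever the "Property Name" setting says.
      [propertyName]: { data: pdfBytes, mimeType: "application/pdf", fileName },
    },
    json: response,
  };
}

const outputItem = buildOutputItem("data", new Uint8Array([0x25, 0x50, 0x44, 0x46]), {
  headers: { "content-type": "text/html; charset=utf-8" },
  statusCode: 200,
  url: "https://example.com/invoices/42?lang=en",
});
```

A downstream step would read the PDF from `outputItem.binary.data` here, or from whichever key was configured as the property name.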
Dependencies
- Requires Puppeteer and puppeteer-extra libraries with plugins for stealth and human typing modes.
- Optionally connects to an existing browser instance via WebSocket endpoint.
- May require an API key credential or token if connecting to a secured browser WebSocket endpoint.
- No internal credential names are exposed; users must configure appropriate API authentication tokens or keys as needed.
- Environment variables can influence behavior such as allowed built-in modules and external dependencies.
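
The choice between connecting to an existing browser and launching a local one can be sketched as a small decision function. The `BrowserOptions` field names and the `planBrowser` helper are hypothetical. The resulting objects are shaped for Puppeteer's `puppeteer.connect()` and `puppeteer.launch()` calls, and `--proxy-server` is a standard Chromium command-line switch. How the node itself wires these options together is not stated in this document.

```typescript
// Hypothetical subset of the node's "Options" collection.
interface BrowserOptions {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless?: boolean;
  launchArguments?: string[];
  proxyServer?: string;
}

interface LaunchOptions {
  headless: boolean;
  executablePath?: string;
  args: string[];
}

type BrowserPlan =
  | { kind: "connect"; connectOptions: { browserWSEndpoint: string } }
  | { kind: "launch"; launchOptions: LaunchOptions };

// Decide whether to attach to a remote browser or start a local one.
function planBrowser(options: BrowserOptions): BrowserPlan {
  if (options.browserWSEndpoint) {
    // Would be passed to puppeteer.connect(connectOptions).
    return { kind: "connect", connectOptions: { browserWSEndpoint: options.browserWSEndpoint } };
  }
  const args = (options.launchArguments ?? []).slice();
  if (options.proxyServer) {
    // Chromium reads the proxy from a command-line switch.
    args.push(`--proxy-server=${options.proxyServer}`);
  }
  const launchOptions: LaunchOptions = { headless: options.headless ?? true, args };
  if (options.executablePath) {
    launchOptions.executablePath = options.executablePath;
  }
  // Would be passed to puppeteer.launch(launchOptions).
  return { kind: "launch", launchOptions };
}
```

Keeping the decision separate from the actual `connect` or `launch` call makes it possible to check the configuration without starting a browser.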
Troubleshooting
- Failed to launch/connect to browser: Indicates issues starting Puppeteer or connecting to a remote browser. Check executable path, WebSocket URL, network connectivity, and credentials.
- Request failed with status code XXX: The target URL returned an error HTTP status. Verify the URL is correct and accessible.
- Invalid URL: The provided URL or query parameters are malformed. Ensure the URL is valid and query parameters are properly formatted.
- Custom script must return an array of items: When using custom scripts, ensure the script returns an array of objects.
- Memory or CPU overload: Using large batch sizes or many simultaneous pages may exhaust system resources. Reduce batch size in options.
- Timeouts: Navigation or protocol timeouts can occur if the page takes too long to load. Adjust timeout settings accordingly.
- Binary data missing or corrupted: Ensure the "Property Name" is set correctly and that the node has permission to write binary data.
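
For the "Invalid URL" case above, it can help to check the URL and query parameters before handing them to the node. The following is a minimal sketch using the standard `URL` class. The helper names `parseOrThrow` and `buildTargetUrl` are hypothetical, and the node's own validation may differ.

```typescript
// Parse a URL string, converting the built-in TypeError into a clearer message.
function parseOrThrow(rawUrl: string): URL {
  try {
    return new URL(rawUrl);
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }
}

// Build the final URL from a base address and extra query parameters.
function buildTargetUrl(baseUrl: string, queryParameters: Record<string, string> = {}): string {
  const parsed = parseOrThrow(baseUrl);
  Object.keys(queryParameters).forEach((key) => {
    // append() percent-encodes keys and values as needed.
    parsed.searchParams.append(key, queryParameters[key]);
  });
  return parsed.toString();
}

const reportUrl = buildTargetUrl("https://example.com/report", { lang: "en", id: "42" });
```

Running the same check on a value such as `"not a url"` throws immediately, which separates malformed input from network or browser failures.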