Overview
This node uses Puppeteer, a headless browser automation library, to interact with web pages and generate PDF documents from them. The "Get PDF" operation navigates to a specified URL, optionally applies query parameters, and renders the page as a PDF file with customizable options such as page size, margins, orientation, headers/footers, background graphics, and scaling.
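The navigate-then-render flow can be sketched in TypeScript as below. This is a minimal sketch, not the node's implementation. `PageLike`, `ResponseLike`, and `getPdf` are hypothetical names. Their methods mirror the parts of Puppeteer's `Page` (`goto`, `pdf`) and `HTTPResponse` (`status`, `headers`, `url`) that the flow uses, so the example runs against a stub instead of a real browser.

```typescript
// Hypothetical structural types mirroring the parts of Puppeteer's
// HTTPResponse and Page that this flow touches.
interface ResponseLike {
  status(): number;
  headers(): Record<string, string>;
  url(): string;
}

interface PageLike {
  goto(url: string, options?: { timeout?: number }): Promise<ResponseLike | null>;
  pdf(options?: Record<string, unknown>): Promise<Uint8Array>;
}

interface PdfResult {
  pdf: Uint8Array;
  statusCode: number;
  headers: Record<string, string>;
  url: string;
}

// Navigate to the (already parameterized) URL, reject HTTP errors,
// then render the loaded page as a PDF.
async function getPdf(
  page: PageLike,
  url: string,
  pdfOptions: Record<string, unknown>,
): Promise<PdfResult> {
  const response = await page.goto(url);
  if (response === null) {
    throw new Error(`No response received for ${url}`);
  }
  const statusCode = response.status();
  if (statusCode >= 400) {
    throw new Error(`Request failed with status code ${statusCode}`);
  }
  const pdf = await page.pdf(pdfOptions);
  return { pdf, statusCode, headers: response.headers(), url };
}
```

With a real browser, a Puppeteer `Page` obtained from `browser.newPage()` would take the place of the stub. The status check mirrors the "Request failed with status code" error listed under Troubleshooting.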
Common scenarios where this node is beneficial include:
- Automatically generating PDFs of web reports or dashboards.
- Archiving web pages in PDF format for compliance or record-keeping.
- Creating printable versions of dynamic web content.
- Automating PDF generation for invoices, tickets, or other documents rendered via web pages.
Practical example:
- You want to convert a dynamically generated invoice page at https://example.com/invoice?id=123 into a PDF with A4 paper size, landscape orientation, and custom header/footer templates. Set the URL to https://example.com/invoice and add the query parameter id=123; the node navigates to the resulting URL and produces a PDF binary output ready for further processing or storage.
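For the invoice example, the URL composition and the matching PDF options might look like the sketch below. It assumes query parameters are appended with the WHATWG `URL` API. The helper name `applyQueryParameters` and the template markup are illustrative. `format`, `landscape`, `displayHeaderFooter`, `headerTemplate`, `footerTemplate`, and `margin` are real option names of Puppeteer's `page.pdf()`.

```typescript
// Append each key/value pair to the base URL's query string.
// The WHATWG URL constructor throws a TypeError for malformed input,
// a likely source of an "Invalid URL" failure.
function applyQueryParameters(baseUrl: string, params: Record<string, string>): string {
  const url = new URL(baseUrl);
  for (const key of Object.keys(params)) {
    url.searchParams.append(key, params[key]);
  }
  return url.toString();
}

// Invoice example: A4, landscape, custom header/footer.
const invoiceUrl = applyQueryParameters("https://example.com/invoice", { id: "123" });

const invoicePdfOptions = {
  format: "A4",
  landscape: true,
  displayHeaderFooter: true,
  // Elements with special classes ("title", "pageNumber", "totalPages") are filled in at print time.
  headerTemplate:
    '<div style="font-size:10px;width:100%;text-align:center;"><span class="title"></span></div>',
  footerTemplate:
    '<div style="font-size:10px;width:100%;text-align:center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  // Header and footer are drawn inside the page margins, so leave room for them.
  margin: { top: "20mm", bottom: "20mm" },
};

console.log(invoiceUrl); // https://example.com/invoice?id=123
```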
Properties
| Name | Meaning |
|---|---|
| URL | The web page URL to navigate to and render as PDF. |
| Property Name | The name of the binary property where the resulting PDF data will be stored. |
| Page Ranges | Specifies which pages to print, e.g., "1-5, 8, 11-13". Optional. |
| Scale | Scales the rendering of the web page; must be between 0.1 and 2. Default is 1 (normal scale). |
| Prefer CSS Page Size | If true, any CSS @page size declared on the page takes priority over width, height, or format options. |
| Format | Paper format type when printing PDF (e.g., Letter, Legal, A4). Used only if "Prefer CSS Page Size" is false. |
| Height | Custom paper height (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set. |
| Width | Custom paper width (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set. |
| Landscape | Whether to print the PDF in landscape orientation (true) or portrait (false). |
| Margin | Collection to specify top, bottom, left, and right margins for the PDF. |
| Display Header/Footer | Whether to show header and footer in the PDF. |
| Header Template | HTML template for the header. Supports classes like .date, .title, .url, .pageNumber, .totalPages to inject values. Only used if "Display Header/Footer" is true. |
| Footer Template | HTML template for the footer. Supports the same special classes as the header template (.date, .title, .url, .pageNumber, .totalPages). Only used if "Display Header/Footer" is true. |
| Transparent Background | If true, hides the default white background, allowing PDFs with transparency. |
| Background Graphics | If true, includes background graphics in the PDF. |
| Query Parameters | Key-value pairs appended as query parameters to the URL before navigation. |
| Options | Various advanced options including: batch size (max pages opened simultaneously), browser WebSocket endpoint and authorization, device emulation, executable path, extra HTTP headers, launch arguments, timeouts, caching, headless mode, stealth mode, human typing simulation, proxy server, and container environment optimizations. |
| File Name | Filename to assign to the binary PDF data. |
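One way to map the properties above onto Puppeteer's `page.pdf()` options, including the Scale range check and the paper-size precedence, is sketched below. `NodePdfParams` and `buildPdfOptions` are hypothetical names. `PdfOptions` re-declares only a subset of Puppeteer's option names, with looser types, so the sketch compiles without Puppeteer installed. Treating Transparent Background as `omitBackground` and Background Graphics as `printBackground` is an assumption based on how Puppeteer describes those options.

```typescript
type Margin = { top?: string; bottom?: string; left?: string; right?: string };

// Hypothetical shape of the node's PDF inputs (field names are assumptions).
interface NodePdfParams {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  margin?: Margin;
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  transparentBackground?: boolean;
  backgroundGraphics?: boolean;
}

// Subset of option names accepted by Puppeteer's page.pdf() (types simplified).
interface PdfOptions {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  margin?: Margin;
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  omitBackground?: boolean;
  printBackground?: boolean;
}

function buildPdfOptions(p: NodePdfParams): PdfOptions {
  const scale = p.scale ?? 1;
  if (scale < 0.1 || scale > 2) {
    throw new RangeError(`Scale must be between 0.1 and 2, got ${scale}`);
  }
  const options: PdfOptions = {
    scale,
    landscape: p.landscape ?? false,
    omitBackground: p.transparentBackground ?? false,
    printBackground: p.backgroundGraphics ?? false,
  };
  if (p.pageRanges) options.pageRanges = p.pageRanges;
  if (p.margin) options.margin = p.margin;

  // Paper-size precedence: CSS @page size, then Format, then Width/Height.
  if (p.preferCSSPageSize) {
    options.preferCSSPageSize = true;
  } else if (p.format) {
    options.format = p.format;
  } else {
    if (p.width !== undefined) options.width = p.width;
    if (p.height !== undefined) options.height = p.height;
  }

  // Templates only matter when header/footer display is enabled.
  if (p.displayHeaderFooter) {
    options.displayHeaderFooter = true;
    if (p.headerTemplate !== undefined) options.headerTemplate = p.headerTemplate;
    if (p.footerTemplate !== undefined) options.footerTemplate = p.footerTemplate;
  }
  return options;
}
```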
Output
The node outputs an array of items, each containing:
- binary: An object with a property named after the "Property Name" input (default "data") that holds the PDF file data as binary.
- json: Metadata about the page response:
  - headers: HTTP response headers from the page request.
  - statusCode: HTTP status code of the page response.
  - url: The final URL after applying query parameters.
The binary data represents the generated PDF document, ready for saving or further processing.
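The shape of one output item can be modeled as below. This is a simplified, hypothetical model. n8n's own binary entries carry more fields and store the payload differently (nodes usually build them with the `this.helpers.prepareBinaryData` helper), so `BinaryEntry`, `ResponseMeta`, `OutputItem`, and `toOutputItem` are illustrative names only.

```typescript
// Simplified, hypothetical model of one output item (not n8n's exact types).
interface BinaryEntry {
  data: Uint8Array; // raw PDF bytes in this sketch
  mimeType: string;
  fileName?: string;
}

interface ResponseMeta {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

interface OutputItem {
  binary: Record<string, BinaryEntry>;
  json: ResponseMeta;
}

// Place the PDF under the configured binary property (default "data").
function toOutputItem(
  pdf: Uint8Array,
  meta: ResponseMeta,
  propertyName = "data",
  fileName?: string,
): OutputItem {
  const entry: BinaryEntry = { data: pdf, mimeType: "application/pdf" };
  if (fileName !== undefined) entry.fileName = fileName;
  return { binary: { [propertyName]: entry }, json: { ...meta } };
}

const item = toOutputItem(
  new Uint8Array([0x25, 0x50, 0x44, 0x46]), // the bytes "%PDF"
  { headers: { "content-type": "text/html" }, statusCode: 200, url: "https://example.com/invoice?id=123" },
  "data",
  "invoice.pdf",
);
console.log(item.binary.data.mimeType); // application/pdf
```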
Dependencies
- Requires Puppeteer and puppeteer-extra libraries with plugins for stealth and human typing modes.
- Can connect to a local or remote Chromium-based browser instance, either by launching one or connecting via a WebSocket endpoint.
- Supports configuration of browser executable path, launch arguments, proxy servers, and device emulation.
- Requires network access to the target URLs.
- The node does not use n8n credentials. If the browser WebSocket connection requires authentication, an authorization header can be provided in the options.
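The choice between launching a browser and connecting to an existing one can be sketched as a pure function that returns the options to pass to `puppeteer.launch()` or `puppeteer.connect()`. The field names of `BrowserOptions`, the `planBrowser` helper, and the two container flags (`--no-sandbox` and `--disable-dev-shm-usage`, commonly used for Chromium in Docker) are assumptions about what the node does. `--proxy-server` is a standard Chromium switch, and recent Puppeteer versions accept a `headers` field in `connect()` options for the WebSocket handshake.

```typescript
// Hypothetical subset of the node's "Options" collection (field names are assumptions).
interface BrowserOptions {
  browserWSEndpoint?: string;
  wsAuthorization?: string; // sent as the Authorization header on connect
  executablePath?: string;
  launchArgs?: string[];
  proxyServer?: string;
  headless?: boolean;
  container?: boolean;
}

interface ConnectPlan {
  mode: "connect";
  options: { browserWSEndpoint: string; headers?: Record<string, string> };
}

interface LaunchPlan {
  mode: "launch";
  options: { executablePath?: string; args: string[]; headless: boolean };
}

// Connect when a WebSocket endpoint is configured, otherwise launch locally.
function planBrowser(o: BrowserOptions): ConnectPlan | LaunchPlan {
  if (o.browserWSEndpoint) {
    const connectPlan: ConnectPlan = {
      mode: "connect",
      options: { browserWSEndpoint: o.browserWSEndpoint },
    };
    if (o.wsAuthorization) {
      connectPlan.options.headers = { Authorization: o.wsAuthorization };
    }
    return connectPlan;
  }
  const args = [...(o.launchArgs ?? [])];
  if (o.proxyServer) args.push(`--proxy-server=${o.proxyServer}`);
  // Assumed container optimizations: common Chromium flags for Docker.
  if (o.container) args.push("--no-sandbox", "--disable-dev-shm-usage");
  const launchPlan: LaunchPlan = { mode: "launch", options: { args, headless: o.headless ?? true } };
  if (o.executablePath) launchPlan.options.executablePath = o.executablePath;
  return launchPlan;
}
```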
Troubleshooting
- Invalid URL error: Occurs if the provided URL is malformed or cannot be parsed. Ensure the URL is valid and properly formatted.
- Request failed with status code XXX: Indicates the page returned an HTTP error (e.g., 404, 500). Verify the URL and server availability.
- Failed to launch/connect to browser: Happens if Puppeteer cannot start or connect to the browser instance. Check executable path, WebSocket endpoint, and required permissions.
- Timeout errors: Navigation or protocol timeouts may occur if the page takes too long to load. Adjust the timeout settings or verify network conditions.
- Memory or CPU overload: Using a large batch size to open many pages simultaneously can exhaust system resources. Reduce batch size accordingly.
- Stealth mode issues: Enabling stealth mode may cause unexpected behavior on some sites. Disable it if problems arise.
- Human typing mode: Intended for simulating realistic typing; disable it when it is not needed, to improve performance.
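The effect of the batch size option can be illustrated with a helper that never keeps more than `batchSize` tasks in flight. This is a sketch under the assumption that pages are processed in fixed-size chunks; `inBatches` is a hypothetical name and the node may schedule work differently.

```typescript
// Run `worker` over `items`, keeping at most `batchSize` calls in flight:
// each chunk must finish before the next one starts.
async function inBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const chunk = items.slice(i, i + batchSize);
    // Results keep the input order because Promise.all preserves it.
    results.push(...(await Promise.all(chunk.map((item) => worker(item)))));
  }
  return results;
}
```

Lowering the batch size trades throughput for lower peak memory and CPU use, since each in-flight task would hold an open page.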