Overview
This node uses Puppeteer, a headless browser automation library, to interact with web pages and generate PDF documents from them. The "Get PDF" operation navigates to a specified URL, optionally applies query parameters, and renders the page as a PDF file with customizable options such as page size, margins, orientation, scaling, headers/footers, and background settings.
Common scenarios where this node is beneficial include:
- Automatically generating PDFs of web reports or dashboards.
- Archiving web pages in PDF format for compliance or record-keeping.
- Creating printable versions of dynamic web content.
- Generating invoices, tickets, or other documents rendered as web pages.
Practical example:
- You want to convert a dynamically generated invoice page at https://example.com/invoice/123 into a PDF with A4 paper size, landscape orientation, and custom header/footer templates. This node can navigate to that URL, apply those settings, and output the PDF binary data for further use or storage.
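As a rough illustration of the scenario above, the settings could map onto Puppeteer's `page.pdf()` options along these lines. The `PdfSettings` interface and the `invoicePdfOptions` constant are hypothetical names for this sketch. The field names mirror Puppeteer's documented PDF options, but how the node assembles them internally is an assumption, as are the template markup and margin values.

```typescript
// Hypothetical subset of Puppeteer's PDF options, declared locally so the
// sketch compiles without importing puppeteer.
interface PdfSettings {
  format?: string;
  landscape?: boolean;
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
}

// Assumed values for the invoice example: A4, landscape, custom header/footer.
const invoicePdfOptions: PdfSettings = {
  format: 'A4',
  landscape: true,
  displayHeaderFooter: true,
  // Elements with these class names are filled in at print time.
  headerTemplate:
    '<div style="font-size:8px;width:100%;text-align:center;"><span class="title"></span></div>',
  footerTemplate:
    '<div style="font-size:8px;width:100%;text-align:center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  // Header and footer are drawn inside the margins, so leave room for them.
  margin: { top: '20mm', bottom: '20mm', left: '10mm', right: '10mm' },
};

// With a real Puppeteer page, the object would be passed as:
//   const pdf = await page.pdf(invoicePdfOptions);
```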
Properties
| Name | Meaning |
|---|---|
| URL | The web page URL to load and render as PDF. |
| Property Name | The name of the binary property where the resulting PDF data will be stored. |
| Page Ranges | Specifies which pages to print, e.g., "1-5, 8, 11-13". Optional. |
| Scale | Scales the rendering of the web page; must be between 0.1 and 2. Default is 1. |
| Prefer CSS Page Size | If true, any CSS @page size declared on the page takes priority over width, height, or format options. |
| Format | Paper format type when printing PDF (e.g., Letter, Legal, Tabloid, Ledger, A0-A6). Only used if "Prefer CSS Page Size" is false. |
| Height | Custom paper height (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set. |
| Width | Custom paper width (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set. |
| Landscape | Whether to print in landscape orientation (true) or portrait (false). |
| Margin | Collection to specify top, bottom, left, and right margins for the PDF. |
| Display Header/Footer | Whether to show header and footer in the PDF. |
| Header Template | HTML template for the header when displaying header/footer. Supports classes like .date, .title, .url, .pageNumber, .totalPages. |
| Footer Template | HTML template for the footer when displaying header/footer. Supports the same classes as the header template (.date, .title, .url, .pageNumber, .totalPages). |
| Transparent Background | If true, hides the default white background, allowing PDFs with a transparent background. |
| Background Graphics | If true, includes background graphics in the PDF. |
| Query Parameters | Key-value pairs of query parameters to append to the URL before loading the page. |
| Options | Additional Puppeteer launch and navigation options including: batch size, browser WebSocket endpoint, device emulation, executable path, extra HTTP headers, file name for output, launch arguments, timeout, waitUntil event, caching, headless mode, stealth mode, proxy server, and container argument toggling. |
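The page-size precedence and the scale limits described in the table can be sketched as follows. This is a minimal illustration that follows the table as written, not the node's actual implementation; `resolvePageSize` and `validateScale` are hypothetical helper names.

```typescript
interface PageSizeInput {
  preferCSSPageSize: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
}

// Precedence as the table describes it:
// CSS @page size first, then a named format, then custom width/height.
function resolvePageSize(input: PageSizeInput): PageSizeInput {
  if (input.preferCSSPageSize) {
    return { preferCSSPageSize: true };
  }
  if (input.format) {
    return { preferCSSPageSize: false, format: input.format };
  }
  return { preferCSSPageSize: false, width: input.width, height: input.height };
}

// Scale must lie between 0.1 and 2 (inclusive); reject anything else early.
function validateScale(scale: number): number {
  if (!isFinite(scale) || scale < 0.1 || scale > 2) {
    throw new RangeError('Scale must be between 0.1 and 2, got ' + scale);
  }
  return scale;
}

// A named format wins over custom dimensions when both are supplied.
const resolved = resolvePageSize({
  preferCSSPageSize: false,
  format: 'Letter',
  width: '10in',
  height: '8in',
});
```

Validating before the browser is launched gives a clearer error than a failure during PDF generation.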
Output
The node outputs an array of items corresponding to each input item processed. Each output item contains:
- A `binary` field with a property named as per the "Property Name" input, containing the PDF file data in binary form.
- A `json` field with metadata about the request, including:
  - `headers`: HTTP response headers from the page request.
  - `statusCode`: HTTP status code of the page response.
  - `url`: The final URL loaded (including query parameters).
The binary data can be saved or passed to subsequent nodes for storage, emailing, or other processing.
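To make the shape of an output item concrete, here is a small sketch that appends query parameters to a URL and assembles an item with `json` and `binary` fields. The helper names `buildUrl` and `toOutputItem` are hypothetical, and the inner layout of the binary entry (`data`, `mimeType`, `fileName`) is an assumption for illustration rather than something the description above specifies.

```typescript
// Append key-value query parameters to a URL. `new URL` throws a TypeError
// when the input is not a valid absolute URL.
function buildUrl(baseUrl: string, queryParameters: { [key: string]: string }): string {
  const url = new URL(baseUrl);
  for (const key of Object.keys(queryParameters)) {
    url.searchParams.append(key, queryParameters[key]);
  }
  return url.toString();
}

interface ResponseMeta {
  headers: { [name: string]: string };
  statusCode: number;
  url: string;
}

interface PdfOutputItem {
  json: ResponseMeta;
  // Assumed binary entry layout: base64 payload plus basic file metadata.
  binary: { [propertyName: string]: { data: string; mimeType: string; fileName: string } };
}

// Store the PDF under the configured property name, next to the metadata.
function toOutputItem(propertyName: string, pdfBase64: string, meta: ResponseMeta): PdfOutputItem {
  return {
    json: { headers: meta.headers, statusCode: meta.statusCode, url: meta.url },
    binary: {
      [propertyName]: { data: pdfBase64, mimeType: 'application/pdf', fileName: 'document.pdf' },
    },
  };
}

const item = toOutputItem('data', 'JVBERi0=' /* placeholder base64, not a real PDF */, {
  headers: { 'content-type': 'text/html' },
  statusCode: 200,
  url: buildUrl('https://example.com/invoice/123', { lang: 'en' }),
});
```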
Dependencies
- Requires the Puppeteer and puppeteer-extra libraries for headless browser automation; the stealth plugin is optional.
- No direct external API keys are required, but the node needs network access to the target URLs.
- Optionally supports connecting to an existing browser instance via WebSocket endpoint.
- Environment should allow launching Chromium or Chrome browsers unless connecting remotely.
- For device emulation, uses Puppeteer's known devices list.
Troubleshooting
- Failed to launch/connect to browser: Indicates issues starting or connecting to the Chromium browser. Check executable path, permissions, and system dependencies. Using a remote browser requires a valid WebSocket URL.
- Invalid URL: The provided URL or query parameters may be malformed. Ensure URLs are properly formatted.
- Request failed with status code X: The page returned a non-200 HTTP status. Verify the URL is accessible and correct.
- Timeouts: Navigation may time out if the page takes too long to load. Adjust the "Timeout" property or check network conditions.
- PDF generation errors: Invalid margin values, unsupported formats, or conflicting options may cause failures. Review property values carefully.
- Memory/CPU usage: Opening many pages simultaneously (batch size) can consume significant resources. Reduce batch size if encountering performance issues.
- Stealth mode issues: Enabling stealth mode may help avoid detection but can sometimes cause unexpected behavior depending on the target site.
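The effect of the batch size on resource usage can be illustrated with a small chunking helper: items are split into groups, and only one group's pages would be open at a time. This is a sketch under that assumption; `chunk` is a hypothetical helper, and whether the node batches items exactly this way is not confirmed by the description above.

```typescript
// Split items into consecutive batches of at most `batchSize` elements.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (!(batchSize >= 1)) {
    throw new RangeError('batchSize must be at least 1');
  }
  const size = Math.floor(batchSize);
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Five URLs with a batch size of 2 yield three batches: 2, 2, and 1 items.
const urlBatches = chunk(
  [
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/c',
    'https://example.com/d',
    'https://example.com/e',
  ],
  2,
);
```

Each batch would then be processed concurrently and awaited before the next one starts, so a smaller batch size trades throughput for lower peak memory and CPU usage.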