Overview
This node uses the Playwright library to interact with web pages programmatically, specifically to generate PDF documents from given URLs. It launches or connects to a Chromium browser instance, navigates to the specified URL, and produces a PDF snapshot of the page content according to user-defined settings.
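
The navigate-then-print flow described above can be sketched in Python. This is an illustration under stated assumptions, not the node's actual implementation: `render_pdf` and its return shape are hypothetical names, and `page` is assumed to be a Playwright `Page` object (anything exposing `goto()` and `pdf()` with the same keywords would work).

```python
from typing import Any, Dict


def render_pdf(page: Any, url: str, pdf_options: Dict[str, Any],
               timeout_ms: float = 30_000, wait_until: str = "load") -> Dict[str, Any]:
    """Navigate to url, then return the PDF bytes plus basic response metadata."""
    # goto() may return None (for example on same-document navigation).
    response = page.goto(url, timeout=timeout_ms, wait_until=wait_until)
    if response is not None and response.status >= 400:
        raise RuntimeError(f"Request failed with status code {response.status}")

    # pdf() returns the document as bytes when no path is given.
    pdf_bytes = page.pdf(**pdf_options)
    return {
        "pdf": pdf_bytes,
        "status_code": response.status if response is not None else None,
        "url": response.url if response is not None else url,
    }
```

With a real browser, `page` would come from something like `browser.new_page()` inside a `sync_playwright()` context.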
Common scenarios where this node is beneficial include:
- Automatically generating PDFs of web pages for archiving or reporting.
- Creating printable versions of dynamic web content.
- Capturing styled documents that respect CSS page size rules.
- Batch processing multiple URLs to PDFs in workflows.
Practical examples:
- A marketing team wants to archive daily snapshots of competitor landing pages as PDFs for analysis.
- A legal department needs to convert online contracts into standardized PDF files with custom headers and footers.
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to convert to PDF. |
| Property Name | The name of the binary property where the generated PDF data will be stored. |
| Page Ranges | Specifies which pages to print in the PDF (e.g., "1-5, 8, 11-13"). |
| Scale | Scales the rendering of the web page; must be between 0.1 and 2. |
| Prefer CSS Page Size | If true, any CSS @page size declared on the page takes priority over width, height, or format options. |
| Format | Paper format type when printing PDF (e.g., Letter, Legal, A4). Only used if "Prefer CSS Page Size" is false. |
| Height | Custom paper height (number or string with unit). Used only if "Prefer CSS Page Size" is false and set. |
| Width | Custom paper width (number or string with unit). Used only if "Prefer CSS Page Size" is false and set. |
| Landscape | Whether to print the PDF in landscape orientation. |
| Margin | Collection to set PDF margins: top, bottom, left, right. |
| Display Header/Footer | Whether to show header and footer in the PDF. |
| Header Template | HTML template for the print header, supporting classes like .date, .title, .url, .pageNumber, .totalPages. |
| Footer Template | HTML template for the print footer, supporting classes like .date. |
| Transparent Background | If true, hides the default white background so the PDF can have a transparent background. |
| Background Graphics | Whether to include background graphics in the PDF. |
| Query Parameters | Additional query parameters to append to the URL before loading the page. |
| Options | Advanced options: Batch Size (number of pages processed simultaneously); Browser WebSocket Endpoint (connect to an existing browser); Emulate Device (emulate a specific device viewport); Executable Path (path to the browser executable); Extra Headers (HTTP headers to send); File Name (filename for the binary data); Launch Arguments (additional command-line arguments); Timeout (maximum navigation time); Protocol Timeout (maximum wait for a protocol response); Wait Until (event at which navigation is considered successful); Page Caching (enable or disable caching); Headless Mode (run the browser headless); Proxy Server (proxy configuration); Add Container Arguments (add recommended container arguments). |
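
As a rough illustration of how the PDF-related properties above could map onto Playwright's `page.pdf()` keywords, here is a minimal Python sketch. The function `build_pdf_options` and the snake_case property keys are assumptions made for this example; the node's real internal names are not documented here. Properties without a `page.pdf()` keyword used in this sketch (such as Transparent Background) are left out.

```python
from typing import Any, Dict


def build_pdf_options(props: Dict[str, Any]) -> Dict[str, Any]:
    """Translate node-style properties (hypothetical keys) into page.pdf() keywords."""
    scale = float(props.get("scale", 1))
    if not 0.1 <= scale <= 2:
        raise ValueError("Scale must be between 0.1 and 2")

    options: Dict[str, Any] = {
        "scale": scale,
        "landscape": bool(props.get("landscape", False)),
        "print_background": bool(props.get("background_graphics", False)),
        "prefer_css_page_size": bool(props.get("prefer_css_page_size", False)),
    }
    if props.get("page_ranges"):
        options["page_ranges"] = props["page_ranges"]  # e.g. "1-5, 8, 11-13"
    if props.get("margin"):
        options["margin"] = dict(props["margin"])  # keys: top, bottom, left, right

    # Format, width and height are only passed when the CSS @page size is not preferred.
    if not options["prefer_css_page_size"]:
        for key in ("format", "width", "height"):
            if props.get(key):
                options[key] = props[key]

    if props.get("display_header_footer"):
        options["display_header_footer"] = True
        options["header_template"] = props.get("header_template", "")
        options["footer_template"] = props.get("footer_template", "")
    return options
```

Assuming `page` is a Playwright page, the result could be passed as `page.pdf(**build_pdf_options(props))`.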
Output
The node outputs an array of items corresponding to each input item processed. Each output item contains:
- binary: A binary property named as per the "Property Name" input, containing the generated PDF file data.
- json: Metadata about the HTTP response, including:
  - headers: HTTP response headers from the page request.
  - statusCode: HTTP status code of the page response.
  - url: The final URL loaded (including query parameters).
The binary data represents the PDF document generated from the webpage.
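
To make the output shape concrete, the following Python sketch appends query parameters to a URL and assembles one output item. Both helper names are hypothetical, and the inner layout of the binary entry (`data`, `mimeType`, `fileName`) is an assumption for illustration; only the `json` fields `headers`, `statusCode`, and `url` come from the description above.

```python
from typing import Any, Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_parameters(url: str, params: Dict[str, str]) -> str:
    """Return url with params added after any query string it already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_output_item(pdf_bytes: bytes, property_name: str, headers: Dict[str, str],
                      status_code: int, url: str,
                      file_name: str = "document.pdf") -> Dict[str, Any]:
    """Assemble one output item: response metadata plus the PDF under property_name."""
    return {
        "json": {"headers": headers, "statusCode": status_code, "url": url},
        "binary": {
            # Assumed layout of the binary entry; the real structure may differ.
            property_name: {
                "data": pdf_bytes,
                "mimeType": "application/pdf",
                "fileName": file_name,
            }
        },
    }
```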
Dependencies
- Requires the Playwright library to launch or connect to a Chromium browser instance.
- May require an API key or authentication if accessing protected URLs (handled externally).
- Environment variables can influence behavior, e.g., executable path for the browser.
- Supports connecting to an existing browser via WebSocket endpoint.
- Optional device emulation depends on Playwright's predefined devices.
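
The choice between launching a browser and connecting to an existing one might look like the Python sketch below. The option keys and the `open_browser` helper are hypothetical, and whether the node connects through Playwright's `connect()` or a different method is an assumption. The `chromium` argument is expected to be Playwright's `BrowserType` for Chromium (for example `p.chromium` inside `sync_playwright()`).

```python
from typing import Any, Dict


def open_browser(chromium: Any, options: Dict[str, Any]) -> Any:
    """Connect to a running browser if an endpoint is given, otherwise launch one."""
    endpoint = options.get("browser_ws_endpoint")
    if endpoint:
        # Reuse an existing browser exposed over a WebSocket endpoint.
        return chromium.connect(endpoint)

    launch_kwargs: Dict[str, Any] = {
        "headless": options.get("headless", True),
        "args": list(options.get("launch_arguments", [])),
    }
    if options.get("executable_path"):
        launch_kwargs["executable_path"] = options["executable_path"]
    if options.get("proxy_server"):
        launch_kwargs["proxy"] = {"server": options["proxy_server"]}
    return chromium.launch(**launch_kwargs)
```

Passing the browser type in as a parameter keeps the helper independent of how Playwright itself is started, which also makes it easy to exercise with a stand-in object.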
Troubleshooting
- Failed to launch/connect to browser: Check that the browser executable path is correct or that the WebSocket endpoint is reachable. Ensure required dependencies for Playwright are installed.
- Invalid URL error: Verify the URL syntax and ensure it is properly formatted.
- Request failed with status code >= 400: The target page returned an error; check URL accessibility and permissions.
- Timeout errors: Increase the navigation timeout or protocol timeout if pages take longer to load.
- Failed to generate PDF: Could be due to page load issues or invalid PDF options; verify all PDF-related properties.
- Memory or CPU overload: Reduce batch size to limit simultaneous page loads.
- Container environment issues: Enable "Add Container Arguments" to add recommended flags for running Chromium in containers.
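
To show why a smaller batch size eases memory and CPU pressure, here is a minimal Python sketch of batch-limited processing. It is one possible approach, not necessarily how the node schedules work; `process_in_batches` and `worker` are hypothetical names, with `worker` standing in for a coroutine that renders one URL to a PDF.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_in_batches(items: Sequence[T], batch_size: int,
                             worker: Callable[[T], Awaitable[R]]) -> List[R]:
    """Run worker over items, with at most batch_size items in flight at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Wait for the whole batch to finish before starting the next one.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results
```

Results come back in input order, and no more than `batch_size` pages are open at the same time, so lowering the value trades throughput for a smaller resource footprint.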