Playwright

Automate browser interactions using Playwright

Overview

This node uses the Playwright library to interact with web pages programmatically, specifically to generate PDF documents from given URLs. It launches or connects to a Chromium browser instance, navigates to the specified URL, and produces a PDF snapshot of the page content according to user-defined settings.
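
The launch-or-connect step can be sketched as a small decision helper. This is only an illustration: the names `BrowserOptions`, `BrowserSource`, and `resolveBrowserSource` are hypothetical, and the mapping onto Playwright's `chromium.connect()` and `chromium.launch()` is an assumption about how such a node could be wired, not a description of its actual source.

```typescript
// Hypothetical subset of the node's options that affect how a browser is obtained.
interface BrowserOptions {
  browserWSEndpoint?: string; // "Browser WebSocket Endpoint"
  executablePath?: string; // "Executable Path"
  headless?: boolean; // "Headless mode"
  launchArguments?: string[]; // "Launch Arguments"
}

// Either connect to an existing browser or launch a new one.
type BrowserSource =
  | { mode: "connect"; wsEndpoint: string }
  | {
      mode: "launch";
      launchOptions: { executablePath?: string; headless: boolean; args: string[] };
    };

function resolveBrowserSource(options: BrowserOptions): BrowserSource {
  const endpoint = options.browserWSEndpoint?.trim();
  if (endpoint) {
    // Assumed usage with Playwright: chromium.connect(wsEndpoint)
    return { mode: "connect", wsEndpoint: endpoint };
  }
  // Assumed usage with Playwright: chromium.launch(launchOptions)
  return {
    mode: "launch",
    launchOptions: {
      executablePath: options.executablePath,
      headless: options.headless ?? true,
      args: options.launchArguments ?? [],
    },
  };
}
```

In this sketch a non-empty WebSocket endpoint always wins over the launch settings; whether the node applies the same precedence is not stated here.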

Common scenarios where this node is beneficial include:

  • Automatically generating PDFs of web pages for archiving or reporting.
  • Creating printable versions of dynamic web content.
  • Capturing styled documents that respect CSS page size rules.
  • Batch processing multiple URLs to PDFs in workflows.

Practical examples:

  • A marketing team wants to archive daily snapshots of competitor landing pages as PDFs for analysis.
  • A legal department needs to convert online contracts into standardized PDF files with custom headers and footers.

Properties

| Name | Meaning |
|---|---|
| URL | The web address of the page to convert to PDF. |
| Property Name | The name of the binary property where the generated PDF data will be stored. |
| Page Ranges | Specifies which pages to print in the PDF (e.g., "1-5, 8, 11-13"). |
| Scale | Scales the rendering of the web page; must be between 0.1 and 2. |
| Prefer CSS Page Size | If true, any CSS @page size declared on the page takes priority over the width, height, or format options. |
| Format | Paper format used when printing the PDF (e.g., Letter, Legal, A4). Only used if "Prefer CSS Page Size" is false. |
| Height | Custom paper height (number or string with unit). Used only if set and "Prefer CSS Page Size" is false. |
| Width | Custom paper width (number or string with unit). Used only if set and "Prefer CSS Page Size" is false. |
| Landscape | Whether to print the PDF in landscape orientation. |
| Margin | Collection to set PDF margins: top, bottom, left, right. |
| Display Header/Footer | Whether to show the header and footer in the PDF. |
| Header Template | HTML template for the print header, supporting classes like .date, .title, .url, .pageNumber, .totalPages. |
| Footer Template | HTML template for the print footer, supporting classes like .date. |
| Transparent Background | If true, hides the default white background, allowing PDFs with a transparent background. |
| Background Graphics | Whether to include background graphics in the PDF. |
| Query Parameters | Additional query parameters to append to the URL before loading the page. |
| Options | Various advanced options, listed below. |

Options:

  • Batch Size: Number of pages processed simultaneously.
  • Browser WebSocket Endpoint: Connect to an existing browser.
  • Emulate Device: Emulate a specific device viewport.
  • Executable Path: Path to the browser executable.
  • Extra Headers: HTTP headers to send.
  • File Name: File name for the binary data.
  • Launch Arguments: Additional command-line arguments.
  • Timeout: Maximum navigation time.
  • Protocol Timeout: Maximum time to wait for a protocol response.
  • Wait Until: Event at which navigation is considered successful.
  • Page Caching: Enable or disable caching.
  • Headless Mode: Run the browser headless.
  • Proxy Server: Proxy configuration.
  • Add Container Arguments: Add recommended container arguments.
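
The page-size precedence described in the properties above can be sketched as a pure mapping from node properties to an options object for Playwright's `page.pdf()`. The `PdfNodeProperties` interface and the `buildPdfOptions` helper are hypothetical names chosen for illustration, and the node's real internals may differ. Playwright documents `format` as taking priority over `width` and `height`, so this sketch leaves `format` out when custom dimensions are given.

```typescript
// Hypothetical shape of the node's PDF-related properties.
interface PdfNodeProperties {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  printBackground?: boolean; // "Background Graphics"
}

// Subset of the options accepted by Playwright's page.pdf().
interface PdfOptions {
  pageRanges?: string;
  scale: number;
  preferCSSPageSize: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
  displayHeaderFooter: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  printBackground: boolean;
}

function buildPdfOptions(props: PdfNodeProperties): PdfOptions {
  const scale = props.scale ?? 1;
  // The negated form also rejects NaN.
  if (!(scale >= 0.1 && scale <= 2)) {
    throw new RangeError(`Scale must be between 0.1 and 2, got ${scale}`);
  }
  const options: PdfOptions = {
    scale,
    preferCSSPageSize: props.preferCSSPageSize ?? false,
    landscape: props.landscape ?? false,
    displayHeaderFooter: props.displayHeaderFooter ?? false,
    printBackground: props.printBackground ?? false,
  };
  if (props.pageRanges) options.pageRanges = props.pageRanges;
  if (props.margin) options.margin = props.margin;
  if (options.displayHeaderFooter) {
    options.headerTemplate = props.headerTemplate;
    options.footerTemplate = props.footerTemplate;
  }
  // Format, width, and height only apply when the CSS @page size is not preferred.
  if (!options.preferCSSPageSize) {
    if (props.width !== undefined || props.height !== undefined) {
      options.width = props.width;
      options.height = props.height;
    } else if (props.format) {
      options.format = props.format;
    }
  }
  return options;
}
```

Validating the scale before calling Playwright keeps the failure close to the misconfigured property instead of surfacing later as a generic PDF generation error.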

Output

The node outputs one item for each input item processed. Each output item contains:

  • binary: A binary property named as per the "Property Name" input, containing the generated PDF file data.
  • json: Metadata about the HTTP response including:
    • headers: HTTP response headers from the page request.
    • statusCode: HTTP status code of the page response.
    • url: The final URL loaded (including query parameters).

The binary data represents the PDF document generated from the webpage.
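
As an illustration of the output shape, the sketch below appends query parameters to a URL and assembles an item with the `json` fields listed above. The type and function names (`ResponseMetadata`, `OutputItem`, `buildTargetUrl`, `toOutputItem`) are hypothetical, and the binary payload is kept as raw bytes here because this document does not specify how the node encodes it.

```typescript
// Hypothetical shapes; the field names under `json` follow the list above.
interface ResponseMetadata {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

interface OutputItem {
  json: ResponseMetadata;
  // Keyed by the "Property Name" setting.
  binary: Record<string, Uint8Array>;
}

// Append extra query parameters to the URL before loading the page.
// `new URL` throws a TypeError if `baseUrl` is not a valid absolute URL.
function buildTargetUrl(baseUrl: string, queryParameters: Record<string, string>): string {
  const url = new URL(baseUrl);
  for (const name of Object.keys(queryParameters)) {
    url.searchParams.append(name, queryParameters[name]);
  }
  return url.toString();
}

// Combine the PDF bytes and response metadata into one output item.
function toOutputItem(
  propertyName: string,
  pdf: Uint8Array,
  metadata: ResponseMetadata,
): OutputItem {
  return { json: metadata, binary: { [propertyName]: pdf } };
}
```

For example, `buildTargetUrl("https://example.com/report?a=1", { lang: "en" })` returns `https://example.com/report?a=1&lang=en`, keeping the query string that was already present.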

Dependencies

  • Requires the Playwright library to launch or connect to a Chromium browser instance.
  • May require an API key or authentication if accessing protected URLs (handled externally).
  • Environment variables can influence behavior, e.g., executable path for the browser.
  • Supports connecting to an existing browser via WebSocket endpoint.
  • Optional device emulation depends on Playwright's predefined devices.

Troubleshooting

  • Failed to launch/connect to browser: Check that the browser executable path is correct or that the WebSocket endpoint is reachable. Ensure required dependencies for Playwright are installed.
  • Invalid URL error: Verify the URL syntax and ensure it is properly formatted.
  • Request failed with status code >= 400: The target page returned an error; check URL accessibility and permissions.
  • Timeout errors: Increase the navigation timeout or protocol timeout if pages take longer to load.
  • Failed to generate PDF: This can be caused by page load issues or invalid PDF options; verify all PDF-related properties.
  • Memory or CPU overload: Reduce batch size to limit simultaneous page loads.
  • Container environment issues: Enable "Add Container Arguments" to add recommended flags for running Chromium in containers.
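
To show why a smaller batch size eases memory and CPU pressure, here is a minimal sketch of batch-limited processing. The helpers `chunk` and `processInBatches` are hypothetical and not taken from the node's source. Only one batch of pages is in flight at a time, so the batch size caps the number of simultaneous page loads.

```typescript
// Split a list into consecutive batches of at most `batchSize` items.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (Math.floor(batchSize) !== batchSize || batchSize < 1) {
    throw new RangeError(`Batch size must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Run `worker` on every item, one batch at a time.
// Items within a batch run concurrently; batches run sequentially.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

With a batch size of 1 the pages are processed strictly one after another, which is the slowest but lightest setting in this sketch.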
