Puppeteer

Automate browser interactions using Puppeteer


Overview

This node uses Puppeteer, a headless browser automation library, to interact with web pages and generate PDF documents from them. The "Get PDF" operation navigates to a specified URL, optionally applies query parameters, and renders the page as a PDF file with customizable options such as page size, margins, orientation, scaling, headers/footers, and background settings.

Common use cases include:

Automatically generating PDFs of web reports or dashboards.
Archiving web pages in PDF format for compliance or record-keeping.
Creating printable versions of dynamic web content.
Generating invoices, tickets, or other documents rendered as web pages.

Practical example:

You want to convert a dynamically generated invoice page at https://example.com/invoice/123 into a PDF with A4 paper size, landscape orientation, and custom header/footer templates. This node can navigate to that URL, apply those settings, and output the PDF binary data for further use or storage.
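The settings from this example could be expressed as the options object that Puppeteer's `page.pdf()` accepts. The sketch below declares a local subset of those option names so it runs without Puppeteer installed. The `invoicePdfOptions` helper, the template markup, and the margin values are illustrative assumptions, not the node's internals.

```typescript
// Local subset of Puppeteer's page.pdf() option names, declared here so the
// sketch runs without installing Puppeteer.
interface PdfOptionsSketch {
  format?: string;
  landscape?: boolean;
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  printBackground?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
}

// Hypothetical helper: builds PDF options for the invoice example.
function invoicePdfOptions(invoiceId: string): PdfOptionsSketch {
  return {
    format: "A4",
    landscape: true,
    displayHeaderFooter: true,
    // Elements with the classes date, pageNumber, and totalPages are filled
    // in by the browser when the PDF is rendered.
    headerTemplate: `<div style="font-size:10px;width:100%;text-align:center;">Invoice ${invoiceId}, printed <span class="date"></span></div>`,
    footerTemplate:
      '<div style="font-size:10px;width:100%;text-align:center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    printBackground: true,
    // Top and bottom margins reserve space for the header and footer.
    margin: { top: "20mm", bottom: "20mm", left: "10mm", right: "10mm" },
  };
}

console.log(JSON.stringify(invoicePdfOptions("123"), null, 2));
```

In a standalone Puppeteer script, an object like this would be passed to `page.pdf()` after `page.goto()` has loaded the invoice URL.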

Properties

Name	Meaning
URL	The web page URL to load and render as PDF.
Property Name	The name of the binary property where the resulting PDF data will be stored.
Page Ranges	Specifies which pages to print, e.g., "1-5, 8, 11-13". Optional; leave empty to print all pages.
Scale	Scales the rendering of the web page; must be between 0.1 and 2. Default is 1.
Prefer CSS Page Size	If true, any CSS `@page` size declared on the page takes priority over width, height, or format options.
Format	Paper format type when printing PDF (e.g., Letter, Legal, Tabloid, Ledger, A0-A6). Only used if "Prefer CSS Page Size" is false.
Height	Custom paper height (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set.
Width	Custom paper width (number or string with unit). Used only if "Prefer CSS Page Size" is false and no format is set.
Landscape	Whether to print in landscape orientation (true) or portrait (false).
Margin	Collection to specify top, bottom, left, and right margins for the PDF.
Display Header/Footer	Whether to show header and footer in the PDF.
Header Template	HTML template for the header when displaying header/footer. Supports classes like `.date`, `.title`, `.url`, `.pageNumber`, `.totalPages`.
Footer Template	HTML template for the footer when displaying header/footer. Supports the same classes as the header template: `.date`, `.title`, `.url`, `.pageNumber`, `.totalPages`.
Transparent Background	If true, hides the default white background allowing transparent PDFs.
Background Graphics	If true, includes background graphics in the PDF.
Query Parameters	Key-value pairs of query parameters to append to the URL before loading the page.
Options	Additional Puppeteer launch and navigation options, including: batch size, browser WebSocket endpoint, device emulation, executable path, extra HTTP headers, output file name, launch arguments, navigation timeout, `waitUntil` event, caching, headless mode, stealth mode, proxy server, and whether to add container-specific launch arguments.
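To make the precedence rules among Prefer CSS Page Size, Format, and Width/Height concrete, here is a minimal sketch that maps a subset of these properties onto Puppeteer's `page.pdf()` option names. The `PdfProperties` field names and the `toPdfOptions` helper are hypothetical. It also assumes Transparent Background maps to Puppeteer's `omitBackground` and Background Graphics to `printBackground`, which matches the descriptions above but is not confirmed by this page.

```typescript
// Hypothetical node-side property names (not the node's internal names).
interface PdfProperties {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  transparentBackground?: boolean;
  backgroundGraphics?: boolean;
}

// Subset of Puppeteer page.pdf() option names, declared locally.
interface PdfOptionsSketch {
  pageRanges?: string;
  scale?: number;
  preferCSSPageSize?: boolean;
  format?: string;
  width?: string | number;
  height?: string | number;
  landscape?: boolean;
  omitBackground?: boolean;
  printBackground?: boolean;
}

function toPdfOptions(p: PdfProperties): PdfOptionsSketch {
  const scale = p.scale ?? 1;
  // Puppeteer documents scale as valid between 0.1 and 2.
  if (scale < 0.1 || scale > 2) {
    throw new RangeError(`Scale must be between 0.1 and 2, got ${scale}`);
  }
  const opts: PdfOptionsSketch = {
    scale,
    landscape: p.landscape ?? false,
    omitBackground: p.transparentBackground ?? false, // assumed mapping
    printBackground: p.backgroundGraphics ?? false, // assumed mapping
  };
  // An empty or missing range means "print all pages".
  if (p.pageRanges) opts.pageRanges = p.pageRanges;

  if (p.preferCSSPageSize) {
    // CSS @page size wins, so no explicit size is sent.
    opts.preferCSSPageSize = true;
  } else if (p.format) {
    // A paper format takes priority over width/height.
    opts.format = p.format;
  } else {
    // Custom size is used only when neither of the above applies.
    if (p.width !== undefined) opts.width = p.width;
    if (p.height !== undefined) opts.height = p.height;
  }
  return opts;
}

console.log(toPdfOptions({ format: "A4", landscape: true, scale: 0.8 }));
```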

Output

The node outputs one item for each input item processed. Each output item contains:

A `binary` field with a property named after the "Property Name" input, containing the PDF file data.
A `json` field with metadata about the request, including:
- headers: HTTP response headers from the page request.
- statusCode: HTTP status code of the page response.
- url: The final URL loaded (including query parameters).

The binary data can be saved or passed to subsequent nodes for storage, emailing, or other processing.
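A minimal sketch of how Query Parameters could be appended to the URL before navigation, using the standard WHATWG `URL` API. The `buildTargetUrl` helper and the `PdfItemJson` interface are hypothetical; the interface only mirrors the three `json` fields listed above.

```typescript
// Hypothetical helper: appends key-value query parameters to a base URL,
// keeping any parameters the URL already has.
function buildTargetUrl(base: string, query: Record<string, string>): string {
  const url = new URL(base); // throws a TypeError on a malformed URL
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.append(key, value);
  }
  return url.toString();
}

// Mirrors the json metadata fields described above.
interface PdfItemJson {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

const exampleJson: PdfItemJson = {
  headers: { "content-type": "text/html" },
  statusCode: 200,
  url: buildTargetUrl("https://example.com/invoice/123", { lang: "en" }),
};
console.log(exampleJson);
```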

Dependencies

Requires the Puppeteer and puppeteer-extra libraries, with the optional puppeteer-extra-plugin-stealth plugin for stealth mode.
No direct external API keys are required, but the node needs network access to the target URLs.
Optionally supports connecting to an existing browser instance via WebSocket endpoint.
The host environment must be able to launch Chromium or Chrome unless the node connects to a remote browser.
Device emulation uses Puppeteer's built-in list of known device descriptors.
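The choice between launching a local browser and connecting to a remote one can be sketched as a pure function that returns the arguments one would pass to `puppeteer.launch()` or `puppeteer.connect()`. The option names, the `planBrowser` helper, and the exact set of container flags are assumptions for illustration. `--proxy-server` and the sandbox-related flags are standard Chromium switches, but this page does not say which ones the node adds.

```typescript
// Hypothetical option names (not the node's internal parameter names).
interface BrowserOptions {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless?: boolean;
  proxyServer?: string;
  addContainerArgs?: boolean;
  args?: string[];
}

type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | { mode: "launch"; options: { headless: boolean; executablePath?: string; args: string[] } };

// Assumed set of flags often used when running Chromium inside a container.
const CONTAINER_ARGS = ["--no-sandbox", "--disable-setuid-sandbox", "--disable-dev-shm-usage"];

function planBrowser(o: BrowserOptions): BrowserPlan {
  if (o.browserWSEndpoint) {
    // Remote browser: these options would go to puppeteer.connect().
    return { mode: "connect", options: { browserWSEndpoint: o.browserWSEndpoint } };
  }
  const args = [...(o.args ?? [])];
  if (o.proxyServer) args.push(`--proxy-server=${o.proxyServer}`);
  if (o.addContainerArgs) {
    for (const flag of CONTAINER_ARGS) {
      if (!args.includes(flag)) args.push(flag); // avoid duplicate flags
    }
  }
  // Local browser: these options would go to puppeteer.launch().
  return {
    mode: "launch",
    options: { headless: o.headless ?? true, executablePath: o.executablePath, args },
  };
}

console.log(planBrowser({ addContainerArgs: true }));
```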

Troubleshooting

Failed to launch/connect to browser: Chromium could not be started or reached. Check the executable path, permissions, and required system libraries. Connecting to a remote browser requires a valid WebSocket endpoint URL.
Invalid URL: The provided URL or query parameters may be malformed. Ensure the URL is complete, including its scheme (for example, https://).
Request failed with status code X: The page returned a non-200 HTTP status. Verify the URL is accessible and correct.
Timeouts: Navigation may time out if the page takes too long to load. Increase the Timeout option, choose a less strict `waitUntil` event (for example, `load` instead of `networkidle0`), or check network conditions.
PDF generation errors: Invalid margin values, unsupported formats, out-of-range scale values, or conflicting options may cause failures. Review the property values.
Memory/CPU usage: Processing many items at once (controlled by the batch size option) can consume significant resources. Reduce the batch size if you encounter performance issues.
Stealth mode issues: Enabling stealth mode may help avoid bot detection but can cause unexpected behavior on some sites.
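This page doesn't show how the node schedules its work, so the following is a minimal sketch assuming Batch Size caps how many items are rendered at once: items are split into batches, each batch runs concurrently, and the next batch starts only after the previous one finishes. `processInBatches` is a hypothetical helper.

```typescript
// Runs `worker` over `items`, at most `batchSize` at a time. Each batch is
// awaited in full before the next one starts; result order matches input order.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Items within one batch run concurrently.
    const batchResults = await Promise.all(batch.map((item, i) => worker(item, start + i)));
    results.push(...batchResults);
  }
  return results;
}

processInBatches(["a", "b", "c"], 2, async (s) => s.toUpperCase()).then((out) => console.log(out));
```

In practice the worker would open a page, navigate, call `page.pdf()`, and close the page. One trade-off of this design: a single slow page holds up its whole batch; a fixed-size worker pool avoids that at the cost of slightly more code.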