Puppeteer

Automate browser interactions using Puppeteer

Overview

This node uses Puppeteer, a headless browser automation library, to interact with web pages and generate PDF documents from them. The "Get PDF" operation navigates to a specified URL, optionally applies query parameters, and renders the page as a PDF file with customizable options such as page size, margins, orientation, scaling, headers/footers, and background settings.

Common scenarios where this node is beneficial include:

  • Automatically generating PDFs of web reports or dashboards.
  • Archiving web pages in PDF format for compliance or record-keeping.
  • Creating printable versions of dynamic web content.
  • Generating invoices, tickets, or other documents rendered via web technologies.

Practical example: You want to create a PDF snapshot of a sales dashboard hosted on an internal website. You provide the dashboard URL, specify A4 paper size, enable header/footer with custom HTML templates, and set margins. The node fetches the page, renders it as a PDF, and outputs the binary PDF data for further use or storage.
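
The settings in the example above could be translated into a PDF options object along these lines. This is a minimal sketch, not the node's actual implementation: `PdfSettings` and `buildPdfOptions` are hypothetical names, and the templates and margin values are placeholders. The option field names themselves (`format`, `margin`, `displayHeaderFooter`, `headerTemplate`, `footerTemplate`, `printBackground`, `preferCSSPageSize`, `scale`, `landscape`) follow Puppeteer's `page.pdf()` options.

```typescript
// Hypothetical settings type; field names mirror Puppeteer's page.pdf() options.
interface PdfSettings {
  format?: string;
  width?: string | number;
  height?: string | number;
  preferCSSPageSize?: boolean;
  scale?: number;
  landscape?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
  displayHeaderFooter?: boolean;
  headerTemplate?: string;
  footerTemplate?: string;
  printBackground?: boolean;
}

// Hypothetical helper: validate the settings and keep only the options that apply.
function buildPdfOptions(s: PdfSettings): PdfSettings {
  const scale = s.scale ?? 1;
  // The documented range for Scale is 0.1 to 2 (this check also rejects NaN).
  if (!(scale >= 0.1 && scale <= 2)) {
    throw new RangeError(`scale must be between 0.1 and 2, got ${scale}`);
  }

  const options: PdfSettings = {
    scale,
    landscape: s.landscape ?? false,
    printBackground: s.printBackground ?? false,
    preferCSSPageSize: s.preferCSSPageSize ?? false,
  };
  if (s.margin) options.margin = s.margin;

  // Format, Width and Height are only used when CSS page size is not preferred.
  if (!options.preferCSSPageSize) {
    if (s.format !== undefined) options.format = s.format;
    if (s.width !== undefined) options.width = s.width;
    if (s.height !== undefined) options.height = s.height;
  }

  // Templates only matter when the header/footer is displayed.
  if (s.displayHeaderFooter) {
    options.displayHeaderFooter = true;
    if (s.headerTemplate !== undefined) options.headerTemplate = s.headerTemplate;
    if (s.footerTemplate !== undefined) options.footerTemplate = s.footerTemplate;
  }
  return options;
}

// Sales dashboard example: A4, margins, and header/footer templates.
const dashboardOptions = buildPdfOptions({
  format: "A4",
  margin: { top: "20mm", bottom: "20mm", left: "10mm", right: "10mm" },
  displayHeaderFooter: true,
  headerTemplate: '<span class="title"></span>',
  footerTemplate: '<span class="pageNumber"></span> / <span class="totalPages"></span>',
});
console.log(dashboardOptions);
```

With Puppeteer itself, an object of this shape would be passed to `page.pdf()` after navigating to the dashboard URL.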

Properties

Name | Meaning
--- | ---
URL | The web address of the page to convert into a PDF.
Property Name | The name of the binary property where the generated PDF data will be stored.
Page Ranges | Specifies which pages to print, e.g., "1-5, 8, 11-13". Optional.
Scale | Scales the rendering of the web page; must be between 0.1 and 2. Default is 1 (normal scale).
Prefer CSS Page Size | If true, any CSS @page size declared in the page takes priority over width, height, or format options.
Format | Paper format type when printing PDF (e.g., Letter, Legal, Tabloid, Ledger, A0-A6). Only used if "Prefer CSS Page Size" is false.
Height | Custom paper height (number or string with unit). Used only if "Prefer CSS Page Size" is false.
Width | Custom paper width (number or string with unit). Used only if "Prefer CSS Page Size" is false.
Landscape | Whether to print the PDF in landscape orientation (true) or portrait (false).
Margin | Collection to set PDF margins: top, bottom, left, right. Each margin can be a string with units (e.g., "10mm").
Display Header/Footer | Whether to show header and footer in the PDF.
Header Template | HTML template for the header. Supports classes like .date, .title, .url, .pageNumber, .totalPages to inject dynamic values. Only shown if "Display Header/Footer" is true.
Footer Template | HTML template for the footer. Supports .date class for formatted print date. Only shown if "Display Header/Footer" is true.
Transparent Background | If true, hides the default white background allowing transparent PDFs.
Background Graphics | If true, includes background graphics in the PDF.
Query Parameters | Key-value pairs appended as query parameters to the URL before loading the page.
Options | Additional Puppeteer launch and navigation options including: batch size, browser WebSocket endpoint, device emulation, executable path, extra HTTP headers, file name for output, launch arguments, timeout, waitUntil event, caching, headless mode, stealth mode, proxy server, and container environment arguments.
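
How the URL and Query Parameters properties combine can be sketched with the standard `URL` API. The helper name `buildTargetUrl` and the example address are hypothetical. The source does not say whether the node appends to or replaces parameters already present in the URL; this sketch appends.

```typescript
// Hypothetical helper: validate the URL and append key-value pairs before navigation.
function buildTargetUrl(rawUrl: string, query: Record<string, string> = {}): string {
  let url: URL;
  try {
    url = new URL(rawUrl); // the URL constructor throws on malformed input
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }
  for (const [key, value] of Object.entries(query)) {
    // searchParams handles percent-encoding of keys and values
    url.searchParams.append(key, value);
  }
  return url.toString();
}

console.log(buildTargetUrl("https://example.com/report", { region: "EU", year: "2024" }));
// prints "https://example.com/report?region=EU&year=2024"
```

Rejecting a malformed URL before any browser work starts gives the same early failure as the "Invalid URL" error listed under Troubleshooting.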

Output

The node outputs items containing:

  • A binary property (name defined by "Property Name") holding the generated PDF file data in standard PDF format (application/pdf).
  • A JSON object with metadata about the request, including:
    • headers: HTTP response headers from the page request.
    • statusCode: HTTP status code of the page response.
    • url: The final URL loaded (including query parameters).

This allows downstream nodes to access both the PDF binary data and related HTTP metadata.
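
The shape of one output item might look like the sketch below. The `PdfOutputItem` type and `toOutputItem` helper are hypothetical, and the base64-encoded `data` field with a `mimeType` is an assumption based on how n8n commonly represents in-memory binary data; the node's real output may carry additional fields.

```typescript
// Hypothetical item shape: JSON metadata plus one named binary property.
interface PdfOutputItem {
  json: { headers: Record<string, string>; statusCode: number; url: string };
  binary: Record<string, { data: string; mimeType: string; fileName?: string }>;
}

// Hypothetical helper: wrap raw PDF bytes and response metadata into one item.
function toOutputItem(
  pdf: Uint8Array,
  propertyName: string,
  meta: { headers: Record<string, string>; statusCode: number; url: string },
  fileName?: string,
): PdfOutputItem {
  return {
    json: meta,
    binary: {
      [propertyName]: {
        data: Buffer.from(pdf).toString("base64"), // assumption: binary stored as base64
        mimeType: "application/pdf",
        ...(fileName ? { fileName } : {}),
      },
    },
  };
}

// Usage with placeholder bytes (the ASCII text "%PDF") standing in for a real PDF.
const item = toOutputItem(
  new Uint8Array([0x25, 0x50, 0x44, 0x46]),
  "data",
  { headers: { "content-type": "text/html" }, statusCode: 200, url: "https://example.com/report" },
  "report.pdf",
);
console.log(Object.keys(item.binary), item.json.statusCode);
```

A downstream node would read the PDF from the property named by "Property Name" (`data` in this usage) and the HTTP metadata from the JSON part.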

Dependencies

  • Requires Puppeteer and puppeteer-extra libraries for browser automation.
  • Optionally uses a stealth plugin to avoid detection when enabled.
  • Can connect to a remote browser instance via WebSocket or launch a local Chromium/Chrome browser.
  • May require an API key or authentication if the target URL requires it (handled externally).
  • Environment variables can control allowed Node.js built-in modules and external modules for script execution.
  • For containerized environments, recommended launch arguments are added automatically unless disabled.

Troubleshooting

  • Invalid URL error: Occurs if the provided URL is malformed. Ensure the URL is valid and properly encoded.
  • Request failed with status code X: The page returned a non-200 HTTP status. Check the URL accessibility and permissions.
  • Failed to launch/connect to browser: Indicates issues starting Puppeteer or connecting to a remote browser. Verify executable paths, WebSocket URLs, and system dependencies.
  • Custom script errors: When running custom scripts, ensure they return an array of items as expected.
  • Timeouts: Navigation may time out if the page takes too long to load. Adjust the "Timeout" property or check network conditions.
  • Memory/CPU usage: Opening many pages simultaneously (batch size) can consume significant resources. Reduce batch size if encountering performance issues.
  • Headless detection: Some sites detect headless browsers. Enable stealth mode to reduce detection risk.
  • Proxy configuration: Incorrect proxy server strings can cause connection failures. Use correct formats like localhost:8080 or socks5://localhost:1080.
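
For the proxy item above, a string can be checked before it reaches the browser. This is a rough sketch under assumptions: `proxyLaunchArg` is a hypothetical helper, the pattern is a deliberately simplified check (it does not cover IPv6 hosts or credentials), and it assumes the proxy is passed to Chromium through its `--proxy-server` launch flag.

```typescript
// Hypothetical helper: validate a proxy string and turn it into a Chromium launch argument.
function proxyLaunchArg(proxy: string): string {
  const trimmed = proxy.trim();
  // Simplified pattern: optional scheme, then host:port (no IPv6, no credentials).
  const pattern = /^(?:(?:https?|socks4|socks5):\/\/)?[A-Za-z0-9.-]+:\d{1,5}$/;
  if (!pattern.test(trimmed)) {
    throw new Error(`Unrecognized proxy format: "${proxy}"`);
  }
  return `--proxy-server=${trimmed}`;
}

console.log(proxyLaunchArg("localhost:8080"));          // prints "--proxy-server=localhost:8080"
console.log(proxyLaunchArg("socks5://localhost:1080")); // prints "--proxy-server=socks5://localhost:1080"
```

Failing fast on a malformed string makes the cause clearer than a generic connection failure during navigation.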
