Puppeteer

Automate browser interactions using Puppeteer

Overview

This node uses Puppeteer, a browser automation library that runs headless by default, to interact with web pages programmatically. The Get Screenshot operation captures a screenshot of a web page given its URL. It can capture either the full scrollable page or only the visible viewport, and lets you choose the image format and quality.

Common scenarios where this node is beneficial include:

  • Automatically generating website previews or thumbnails.
  • Monitoring visual changes on web pages over time.
  • Archiving web page appearances for compliance or record-keeping.
  • Creating images for reports or social media from dynamic web content.

For example, you can supply the URL of a product page and receive a PNG screenshot of the entire page, which can then be used in marketing materials or automated reports.
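The flow described above (navigate to a URL, then capture the whole page as a PNG) can be sketched roughly as follows. This is a minimal illustration, not the node's actual source: `PageLike`, `ResponseLike`, `ScreenshotResult`, and `captureFullPagePng` are hypothetical names, and the two interfaces are modelled on Puppeteer's `page.goto` and `page.screenshot` calls so the sketch can run without a browser.

```typescript
// Minimal structural types covering only the calls used below.
// They are modelled on Puppeteer's Page and HTTPResponse; the names are local to this sketch.
interface ResponseLike {
  status(): number;
  headers(): Record<string, string>;
  url(): string;
}

interface PageLike {
  goto(
    url: string,
    options: { waitUntil: "load" | "domcontentloaded" | "networkidle0" | "networkidle2"; timeout: number },
  ): Promise<ResponseLike | null>;
  screenshot(options: { type: "png" | "jpeg" | "webp"; fullPage: boolean }): Promise<Uint8Array>;
}

interface ScreenshotResult {
  image: Uint8Array;
  statusCode: number;
  headers: Record<string, string>;
  url: string;
}

// Hypothetical helper: navigate with the documented defaults (wait for "load",
// 30,000 ms timeout), then capture the entire scrollable page as a PNG.
async function captureFullPagePng(page: PageLike, url: string): Promise<ScreenshotResult> {
  const response = await page.goto(url, { waitUntil: "load", timeout: 30_000 });
  if (response === null) {
    throw new Error(`No response received for ${url}`);
  }
  const image = await page.screenshot({ type: "png", fullPage: true });
  return {
    image,
    statusCode: response.status(),
    headers: response.headers(),
    url: response.url(),
  };
}
```

In a real workflow the `page` argument would come from a Puppeteer browser instance; here it is injected so the logic can be exercised with a stand-in object.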

Properties

| Name | Meaning |
| --- | --- |
| URL | The web address of the page to capture. Required. |
| Property Name | The name of the binary property in which the screenshot image data is stored in the output. |
| Type | The image format for the screenshot. Options: PNG, JPEG, or WebP. |
| Quality | Image quality from 0 to 100. Applies only to JPEG and WebP (ignored for PNG). Default is 100. |
| Full Page | Whether to capture the entire scrollable page (true) or just the visible viewport (false). |
| Query Parameters | Additional query parameters to append to the URL before loading the page. Each parameter has a name and a value. |
| Batch Size | Maximum number of pages to open simultaneously. Higher values increase resource usage. Default is 1. |
| Browser WebSocket Endpoint | Optional WebSocket URL for connecting to an existing browser instance instead of launching a new one. |
| Emulate Device | Optionally emulate a specific device's screen size and user agent (e.g., iPhone, iPad). |
| Executable Path | Path to a custom browser executable to use instead of the bundled one. Ignored when connecting via WebSocket. |
| Extra Headers | Custom HTTP headers to send with the page request. |
| File Name | File name to assign to the binary data output. Applies only to screenshot and PDF operations. |
| Launch Arguments | Additional command-line arguments to pass to the browser instance at launch. |
| Timeout | Maximum navigation time in milliseconds. Set to 0 to disable the timeout. Default is 30,000 ms. |
| Wait Until | When to consider navigation successful. Options: load, domcontentloaded, networkidle0, networkidle2. Default is load. |
| Page Caching | Enable or disable page-level caching. Defaults to enabled (true). |
| Headless Mode | Run the browser in headless mode (no UI). Defaults to true. |
| Use Chrome Headless Shell | Run the browser in headless shell mode. Requires chrome-headless-shell to be on the system path. Defaults to false. |
| Stealth Mode | Apply techniques that make the headless browser harder to detect. Defaults to false. |
| Proxy Server | Use a custom proxy server for browser requests (e.g., localhost:8080, socks5://localhost:1080). |
| Add Container Arguments | Add recommended launch arguments for container environments (e.g., --no-sandbox). Defaults to true. |
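As an illustration of how the Type, Quality, and Full Page properties interact, the sketch below builds an options object using the `type`, `quality`, and `fullPage` fields that Puppeteer's screenshot call accepts. `buildScreenshotOptions` is a hypothetical helper, and clamping and rounding the quality value is an assumption of this sketch rather than documented node behavior.

```typescript
type ImageType = "png" | "jpeg" | "webp";

interface ScreenshotOptions {
  type: ImageType;
  fullPage: boolean;
  quality?: number;
}

// Hypothetical helper: translate the node properties into screenshot options.
function buildScreenshotOptions(type: ImageType, quality: number, fullPage: boolean): ScreenshotOptions {
  const options: ScreenshotOptions = { type, fullPage };
  if (type !== "png") {
    // Quality applies only to JPEG and WebP, so it is left out entirely for PNG.
    // Assumption: out-of-range values are clamped to the 0-100 interval.
    options.quality = Math.min(100, Math.max(0, Math.round(quality)));
  }
  return options;
}

const pngOptions = buildScreenshotOptions("png", 80, true);
// pngOptions is { type: "png", fullPage: true }

const jpegOptions = buildScreenshotOptions("jpeg", 80, false);
// jpegOptions is { type: "jpeg", fullPage: false, quality: 80 }
```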

Output

The node outputs items containing:

  • A binary property named as specified by the "Property Name" input, holding the screenshot image data in the chosen format (PNG, JPEG, or WebP).
  • A json object with metadata about the response, including:
    • headers: HTTP response headers from the page request.
    • statusCode: HTTP status code of the page load (should be 200 for success).
    • url: The final URL loaded (including any appended query parameters).

The binary data can be used downstream in workflows for saving files, sending emails, or further processing.
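The `url` field reflects the address after any Query Parameters have been appended. One way such a URL could be assembled is with the standard `URL` and `URLSearchParams` APIs, as in the sketch below. `QueryParameter` and `appendQueryParameters` are hypothetical names, and the node's own implementation may differ.

```typescript
interface QueryParameter {
  name: string;
  value: string;
}

// Hypothetical helper: append name/value pairs to a URL, keeping any existing query string.
function appendQueryParameters(rawUrl: string, parameters: QueryParameter[]): string {
  // The URL constructor throws a TypeError when rawUrl cannot be parsed,
  // which is one way a malformed address surfaces as an "Invalid URL" error.
  const url = new URL(rawUrl);
  for (const { name, value } of parameters) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

const finalUrl = appendQueryParameters("https://example.com/products?id=42", [
  { name: "ref", value: "preview" },
]);
// finalUrl is "https://example.com/products?id=42&ref=preview"
```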

Dependencies

  • Requires the Puppeteer and puppeteer-extra libraries for browser automation.
  • Optionally uses the puppeteer-extra-plugin-stealth plugin to evade detection when stealth mode is enabled.
  • If using device emulation, relies on Puppeteer's known device descriptors.
  • The node itself requires no credentials. To access pages behind authentication, handle the login externally or supply the necessary values through Extra Headers.
  • Environment variables can influence behavior, e.g., enabling stdout logging or allowing external modules in custom scripts.
  • For advanced use, a running browser instance can be connected via WebSocket endpoint.
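The choice between connecting to a running browser and launching a new one can be sketched as a small decision function. Everything named here (`BrowserSettings`, `BrowserPlan`, `planBrowserStartup`) is hypothetical. The option names `browserWSEndpoint`, `executablePath`, `headless`, and `args` follow Puppeteer's connect and launch options, and `--proxy-server` is a Chromium command-line flag. The exact set of container arguments the node adds is not stated in this document, so the sketch includes only `--no-sandbox`.

```typescript
interface BrowserSettings {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless: boolean;
  launchArguments: string[];
  proxyServer?: string;
  addContainerArguments: boolean;
}

interface LaunchOptions {
  headless: boolean;
  args: string[];
  executablePath?: string;
}

type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | { mode: "launch"; options: LaunchOptions };

// Hypothetical helper: decide how the browser should be obtained.
function planBrowserStartup(settings: BrowserSettings): BrowserPlan {
  if (settings.browserWSEndpoint) {
    // A WebSocket endpoint takes precedence, so launch-only settings
    // such as the executable path are ignored in this branch.
    return { mode: "connect", options: { browserWSEndpoint: settings.browserWSEndpoint } };
  }

  const args = [...settings.launchArguments];
  if (settings.addContainerArguments) {
    // Assumption: only the flag named in the properties table is added here.
    args.push("--no-sandbox");
  }
  if (settings.proxyServer) {
    args.push(`--proxy-server=${settings.proxyServer}`);
  }

  const options: LaunchOptions = { headless: settings.headless, args };
  if (settings.executablePath) {
    options.executablePath = settings.executablePath;
  }
  return { mode: "launch", options };
}
```

The returned plan would then be handed to Puppeteer's connect or launch call, depending on `mode`.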

Troubleshooting

  • Failed to launch/connect to browser:
    This error indicates Puppeteer could not start or connect to a browser instance. Check that the executable path is correct, dependencies are installed, and no conflicting processes block browser launch. Also verify that the WebSocket endpoint URL is valid if used.

  • Invalid URL:
    If the provided URL is malformed or cannot be parsed, the node will throw an error. Ensure URLs are complete and properly formatted.

  • Request failed with status code X:
    A non-200 HTTP response indicates the page did not load successfully. Verify that the URL is accessible, is not blocked by a firewall, and does not require authentication.

  • Timeout errors:
    If navigation takes longer than the configured timeout, increase the timeout value or check network conditions.

  • Binary data missing or empty:
    This can happen if screenshot capture fails silently. Check the logs for errors during screenshot generation.

  • Stealth mode issues:
    Enabling stealth mode may cause unexpected behavior on some sites. Disable it if problems occur.

  • Resource limits:
    Opening many pages simultaneously (a high Batch Size) can exhaust memory or CPU. Reduce the batch size if performance degrades.
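To show why Batch Size bounds resource usage, the sketch below processes items in fixed-size chunks, so at most `batchSize` tasks are in flight at any moment. `processInBatches` is a hypothetical helper. Chunking is only one way to limit concurrency, and the node may schedule pages differently.

```typescript
// Hypothetical helper: run `worker` over `items`, at most `batchSize` at a time.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Every task in the batch starts together; the next batch waits until all have settled.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

With a batch size of 1 (the default), pages are handled strictly one after another. Larger values trade memory and CPU for throughput.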
