Puppeteer

Automate browser interactions using Puppeteer

Overview

This node runs custom JavaScript code with Puppeteer, a browser automation library that typically drives a headless browser. Scripts run in a sandboxed environment with direct access to the Puppeteer browser instance ($browser), the current page ($page), and the Puppeteer library itself ($puppeteer), which enables advanced web scraping, automation, and testing scenarios.

Typical use cases include:

  • Extracting data from dynamic websites that require JavaScript execution.
  • Automating form submissions or interactions on web pages.
  • Taking screenshots or generating PDFs of web pages (the node handles these through separate operations, not through the custom script).
  • Running any custom Puppeteer script tailored to specific needs, such as IP lookups, content extraction, or navigation flows.

For example, the default script navigates to an IP lookup service, extracts the IP address from the page content, logs it, and returns it as output.
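The flow just described can be sketched roughly as follows. This is a minimal illustration, not the node's actual default script: the helper names extractIPv4 and lookupIp, the page and inputJson parameters, and whatever URL is passed in are all hypothetical. Inside the node, the same steps would use the provided $page object directly and end with a return statement.

```javascript
// Hypothetical helper: pull the first IPv4-looking string out of page text.
function extractIPv4(text) {
  const match = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/);
  return match ? match[0] : null;
}

// Sketch of the script body as a function. `page` stands in for $page and
// `inputJson` for the incoming item's JSON; both names are assumptions.
async function lookupIp(page, inputJson, url) {
  await page.goto(url);                  // navigate to the lookup service
  const content = await page.content();  // full HTML of the loaded page
  const ip = extractIPv4(content);       // null if no address is found
  console.log('Detected IP:', ip);       // visible in the debug output
  return [{ ip, ...inputJson }];         // the node expects an array of items
}
```

Keeping the extraction in a small pure function makes it easy to check the parsing logic without launching a browser.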

Properties

| Name | Meaning |
| --- | --- |
| Script Code | JavaScript code to execute with Puppeteer. You have access to the $browser, $page, and $puppeteer objects, which represent the Puppeteer browser instance, the page, and the Puppeteer library respectively. This code runs in a sandboxed VM and must return an array of items. |
| Use $page, $browser, or $puppeteer vars to access Puppeteer. Special vars/methods are available. Debug by using console.log() statements and viewing their output in the browser console. | Informational notice explaining how to use Puppeteer variables and debug scripts. |
| Options | Collection of optional settings controlling Puppeteer behavior. The remaining rows list the individual options. |
| Batch Size | Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage. |
| Browser WebSocket Endpoint | WebSocket URL to connect to an existing browser instance instead of launching a new one. |
| Browser WebSocket Headers | HTTP headers to send when connecting to the browser WebSocket endpoint. |
| Emulate Device | Emulate a specific device profile (viewport size, user agent, etc.) from Puppeteer's known devices. |
| Executable path | Path to a bundled browser executable to launch. Ignored if connecting via WebSocket. |
| Extra Headers | Additional HTTP headers to send with each page request. |
| File Name | File name to assign to binary outputs (only relevant for screenshot or PDF operations). |
| Launch Arguments | Extra command line arguments passed to the browser process. |
| Timeout | Maximum navigation time in milliseconds (not applicable to the "Run Custom Script" operation). |
| Protocol Timeout | Maximum time to wait for protocol responses in milliseconds. |
| Wait Until | When to consider navigation succeeded (load, domcontentloaded, networkidle0, networkidle2). Not applicable to "Run Custom Script". |
| Page Caching | Enable or disable page-level caching (default true). |
| Headless mode | Run browser in headless mode (default true). |
| Use Chrome Headless Shell | Run browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH). |
| Stealth mode | Apply techniques to make Puppeteer harder to detect as headless. |
| Human typing mode | Enables the .typeHuman() function on pages to simulate human-like typing. |
| Human Typing Options | Configuration for human typing delays, typo chances, and backspace simulation. |
| Proxy Server | Proxy server configuration string (e.g., localhost:8080, socks5://localhost:1080). |
| Add Container Arguments | Automatically add recommended arguments for container environments like --no-sandbox (default true). |
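
To make the interaction between these options more concrete, here is a minimal sketch of how they might translate into Puppeteer connect or launch settings. It is not the node's implementation: buildBrowserOptions and the field names on opts are hypothetical, and only --no-sandbox is added as a container argument because it is the only one named above.

```javascript
// Hypothetical mapping from the node's options to Puppeteer settings.
// Field names on `opts` are assumptions made for this sketch.
function buildBrowserOptions(opts = {}) {
  // A WebSocket endpoint means "attach to a running browser", so the
  // executable path and launch arguments are ignored in that case.
  if (opts.browserWSEndpoint) {
    return {
      mode: 'connect',
      options: {
        browserWSEndpoint: opts.browserWSEndpoint,
        headers: opts.browserWSHeaders || {},
      },
    };
  }

  const args = [...(opts.launchArguments || [])];
  if (opts.proxyServer) args.push(`--proxy-server=${opts.proxyServer}`);
  // Container arguments default to on; only --no-sandbox is assumed here.
  if (opts.addContainerArgs !== false && !args.includes('--no-sandbox')) {
    args.push('--no-sandbox');
  }

  const options = { headless: opts.headless !== false, args };
  if (opts.executablePath) options.executablePath = opts.executablePath;
  if (opts.protocolTimeout) options.protocolTimeout = opts.protocolTimeout;
  return { mode: 'launch', options };
}
```

The returned options object has the shape accepted by puppeteer.connect() or puppeteer.launch(), depending on the mode.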

Output

The node expects the custom script to return an array of items. Each item is an object that will be normalized and returned as the node's output JSON data.

  • The output JSON structure depends entirely on what the custom script returns.
  • For example, the default script returns an array with one object that combines an ip field, holding the extracted IP address, with all fields of the original input JSON.
  • If the script returns binary data (not typical for this operation), it would be handled accordingly, but this operation focuses on JSON results.
  • Console logs inside the script appear in the workflow execution logs or UI console depending on the mode.
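
The return contract can be illustrated with a small sketch. It assumes that normalization means wrapping each returned object under a json key, which is a guess about the host workflow tool and not something stated here; normalizeItems is a hypothetical name and the error text is abbreviated.

```javascript
// Hypothetical normalizer: takes whatever a script returned and produces
// items of the shape { json: {...} }. The wrapping key is an assumption.
function normalizeItems(result) {
  if (!Array.isArray(result)) {
    throw new Error('Custom script must return an array of items');
  }
  return result.map((item) => {
    const isWrapped = item !== null && typeof item === 'object' && 'json' in item;
    return isWrapped ? item : { json: item }; // leave pre-wrapped items alone
  });
}
```

Under this assumption, a script ending in return [{ ip }] and one ending in return [{ json: { ip } }] would produce the same output.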

Dependencies

  • Requires Puppeteer and Puppeteer Extra libraries with plugins for stealth and human typing.
  • Supports connecting to an existing browser instance via WebSocket or launching a new Chromium browser.
  • Optional device emulation uses Puppeteer's known device profiles.
  • May require an API key credential or authentication token if connecting to a secured browser WebSocket endpoint.
  • Environment variables can control allowed built-in and external modules for the sandboxed script execution.
  • Recommended container arguments are added automatically for running inside containerized environments.

Troubleshooting

  • Common issues:

    • Script does not return an array: The node throws an error if the custom script does not return an array of items.
    • Browser launch/connect failures: Errors occur if the browser executable path is invalid or WebSocket connection fails.
    • Navigation errors: If a page navigation returns a status code >= 400, the node reports a request failure.
    • Resource limits: Opening too many pages simultaneously (batch size) may cause high memory/CPU usage or crashes.
    • Script runtime errors: Any exceptions thrown inside the custom script are caught and reported with context.
  • Error messages and resolutions:

    • "Custom script must return an array of items...": Ensure your script ends with return [{...}] or similar returning an array.
    • "Failed to launch/connect to browser: ...": Verify browser executable path, WebSocket URL, credentials, and network connectivity.
    • "Request failed with status code XXX": The page responded with an HTTP status code of 400 or higher. Check the target URL and network availability.
    • "Invalid URL: ...": Confirm URLs used in the script or parameters are valid and properly formatted.
  • Use console.log() inside your script to debug and inspect variables during execution.
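
Several of the failures listed above can be caught early inside a script. The following is a minimal sketch under stated assumptions: gotoChecked is a hypothetical helper, page stands in for $page, and the error messages only mimic the ones listed here.

```javascript
// Hypothetical guard around navigation. `page` stands in for $page.
async function gotoChecked(page, rawUrl) {
  let url;
  try {
    url = new URL(rawUrl); // throws TypeError on malformed input
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }
  const response = await page.goto(url.href);
  // page.goto can resolve to null, e.g. for same-page anchor navigation.
  if (response && response.status() >= 400) {
    throw new Error(`Request failed with status code ${response.status()}`);
  }
  return response;
}
```

Validating the URL before navigating separates formatting mistakes from genuine network or server errors, which makes the resulting error message easier to act on.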
