Puppeteer

Automate browser interactions using Puppeteer

Overview

This node runs custom JavaScript code with Puppeteer, a browser automation library that drives a headless browser by default. Your script gets direct access to Puppeteer's browser and page objects through the $browser and $page variables (a $puppeteer variable is also available), which enables advanced web scraping, automated browsing, and interaction with web pages.

Common scenarios include:

  • Extracting data from websites that require JavaScript rendering.
  • Automating form submissions or navigation flows.
  • Running complex scraping logic that cannot be achieved with simple HTTP requests.
  • Debugging or testing web pages by executing arbitrary scripts in a controlled browser environment.

For example, you can write a script to navigate to an IP lookup service, extract the IP address shown on the page, and return it as output.
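The IP lookup example could be sketched roughly as follows. This is a minimal illustration, not the node's reference script: the URL is an assumed lookup service, and lookupIp and extractIp are hypothetical names. The body of lookupIp is what would go into Script Code, where $page is already provided by the node.

```javascript
// Sketch only: wrapped in a function so it can be read in isolation.
// In the node, $page is injected and the function body is the script itself.
async function lookupIp($page) {
  // Assumed lookup service; replace with the service you actually use.
  await $page.goto('https://httpbin.org/ip');

  // Read the rendered text of the page inside the browser context.
  const bodyText = await $page.evaluate(() => document.body.innerText);

  // The node expects an array of items as the return value.
  return [{ ip: extractIp(bodyText) }];
}

// Hypothetical helper: returns the first IPv4-looking token in the text, or null.
function extractIp(text) {
  const match = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/);
  return match ? match[0] : null;
}
```

Keeping the text parsing in a small pure function makes the extraction logic easy to check without launching a browser.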

Properties

Name | Meaning
Script Code | JavaScript code to execute with Puppeteer. You have access to $browser, $page, and $puppeteer variables representing the Puppeteer browser and page instances. Use this to define your custom automation or scraping logic.
Options | A collection of optional settings to control Puppeteer behavior:
  - Batch Size: Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage.
  - Browser WebSocket Endpoint: WebSocket URL to connect to an existing browser instance instead of launching a new one.
  - Emulate Device: Select a device profile to emulate (e.g., mobile devices).
  - Executable Path: Path to the browser executable to use. Ignored if connecting via WebSocket endpoint.
  - Extra Headers: Custom HTTP headers to send with each request.
  - File Name: Filename for binary outputs like PDFs or screenshots (not applicable for custom script operation).
  - Launch Arguments: Additional command line arguments passed to the browser instance.
  - Timeout: Maximum navigation time in milliseconds (ignored for custom script operation).
  - Wait Until: When to consider navigation succeeded (ignored for custom script operation). Options: load, domcontentloaded, networkidle0, networkidle2.
  - Page Caching: Enable or disable page-level caching (default enabled).
  - Headless Mode: Run browser in headless mode (default true).
  - Use Chrome Headless Shell: Run browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH).
  - Stealth Mode: Apply techniques to make Puppeteer harder to detect as a bot.
  - Proxy Server: Use a custom proxy server for browser traffic (e.g., localhost:8080, socks5://localhost:1080).
  - Add Container Arguments: Automatically add recommended arguments for container environments (e.g., --no-sandbox). Default is true.
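To make the interaction between these options concrete, the sketch below shows one way they could translate into Puppeteer launch or connect settings. It is an illustration under stated assumptions, not the node's actual implementation: buildBrowserOptions and the option key names (browserWSEndpoint, launchArguments, addContainerArgs, proxyServer) are hypothetical, while --no-sandbox and --proxy-server are real Chromium flags.

```javascript
// Hypothetical helper, NOT the node's real code. It only illustrates how the
// documented options could be combined into settings for Puppeteer.
function buildBrowserOptions(options = {}) {
  // A WebSocket endpoint means "connect to an existing browser", so launch
  // settings such as the executable path are ignored.
  if (options.browserWSEndpoint) {
    return {
      mode: 'connect',
      connect: { browserWSEndpoint: options.browserWSEndpoint },
    };
  }

  // Start from the user-supplied launch arguments, if any.
  const args = [...(options.launchArguments || [])];

  // Container arguments are added unless explicitly disabled (default true).
  if (options.addContainerArgs !== false) {
    args.push('--no-sandbox');
  }

  // Route browser traffic through a proxy when one is configured.
  if (options.proxyServer) {
    args.push(`--proxy-server=${options.proxyServer}`);
  }

  // Headless mode defaults to true.
  const launch = { headless: options.headless !== false, args };
  if (options.executablePath) {
    launch.executablePath = options.executablePath;
  }
  return { mode: 'launch', launch };
}
```

With Puppeteer installed, the result could be passed to puppeteer.launch(plan.launch) or puppeteer.connect(plan.connect), depending on plan.mode.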

Output

The node expects the custom script to return an array of items, where each item is an object containing JSON data. The output structure is:

[
  {
    "json": {
      // Your returned key-value pairs from the script
    }
  }
]

Each element of the returned array becomes one output item, and every item is normalized to the structure shown above.
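The normalization step can be pictured with the sketch below. It assumes that items without a json key are wrapped in one and that items already in that shape pass through unchanged; the actual behavior may differ in detail, and toOutputItems is a hypothetical name rather than part of the node.

```javascript
// Hypothetical illustration of output normalization (assumed behavior).
function toOutputItems(result) {
  // Mirrors the documented error for non-array return values.
  if (!Array.isArray(result)) {
    throw new Error('Custom script must return an array of items');
  }

  // Wrap plain values in { json: ... }; leave already-wrapped items untouched.
  return result.map((item) =>
    item !== null && typeof item === 'object' && 'json' in item
      ? item
      : { json: item }
  );
}
```

Under that assumption, a script returning [{ ip: '203.0.113.7' }] would yield [{ json: { ip: '203.0.113.7' } }].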

Binary data output is not applicable for the "Run Custom Script" operation itself but is supported in other operations of the node (like screenshots or PDFs).

Dependencies

  • Requires Puppeteer and Puppeteer Extra libraries bundled with the node.
  • Supports stealth plugin to avoid detection when enabled.
  • Optionally connects to an existing browser instance via WebSocket.
  • No external API keys are required unless your script uses them explicitly.
  • Environment variables can influence allowed built-in and external modules for script execution.

Troubleshooting

  • Error: Custom script must return an array of items
    Ensure your script returns an array, for example: return [{ key: value }]. Returning a single object, or not returning anything, causes this error.

  • Failed to launch/connect to browser
    Check that the browser executable path is correct or that the WebSocket endpoint is reachable. Also verify permissions and dependencies for running Puppeteer.

  • Request failed with status code XXX
    Indicates navigation to a URL failed or returned an unexpected HTTP status. Verify the URL and network connectivity.

  • Timeouts or slow performance
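    Lower the Batch Size option to reduce the number of simultaneous pages. The Timeout option is ignored for custom scripts, so set navigation timeouts inside the script itself (for example, through the options passed to page.goto).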

  • Stealth mode not working as expected
    Some sites may still detect Puppeteer despite stealth mode. Consider additional evasion techniques or manual debugging.

  • Errors closing pages or browser
    These are logged but usually do not stop execution. They indicate cleanup issues after page operations.
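Two of the issues above (a non-array return value and navigation timeouts) can be handled inside the script itself. The following is a minimal sketch under assumptions: toItems and scrapeTitle are hypothetical names, the 30000 ms timeout is an arbitrary example value, and in the node the body of scrapeTitle would be the script, with $page provided.

```javascript
// Hypothetical helper: coerce whatever the scraping logic produced into an array.
function toItems(result) {
  if (Array.isArray(result)) return result;
  if (result === undefined || result === null) return [];
  return [result];
}

// Sketch of a script body wrapped in a function; in the node, $page is injected.
async function scrapeTitle($page, url) {
  try {
    // The node's Timeout and Wait Until options are ignored for custom scripts,
    // so the equivalent settings are passed to page.goto directly.
    await $page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
    return toItems({ url, title: await $page.title() });
  } catch (error) {
    // Return the failure as data instead of letting the whole run abort.
    return toItems({ url, error: error.message });
  }
}
```

Returning errors as items is a design choice: it keeps the workflow running and lets a later step decide how to handle failed URLs. Rethrow instead if a failed navigation should stop execution.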
