
Puppeteer

Automate browser interactions using Puppeteer

Overview

This node runs custom JavaScript code with Puppeteer, a Node.js library for headless browser automation. Your script gets direct access to Puppeteer's browser and page objects ($browser and $page), which enables advanced web scraping, automated browsing, and other interaction with web pages.

Common scenarios include:

  • Extracting data from websites that require JavaScript rendering.
  • Automating form submissions or navigation flows.
  • Capturing screenshots or PDFs of web pages (the node also provides separate operations for these).
  • Running complex custom scripts that interact with the page DOM or network.

For example, a script can navigate to an IP lookup service, extract the IP address shown on the page, and return it as output. This flexibility makes the node well suited to users who need finer-grained control over browser automation than the predefined operations offer.
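A minimal sketch of such a script, assuming https://httpbin.org/ip as the lookup service (it responds with JSON such as {"origin": "1.2.3.4"}). Only the function body belongs in the Script Code field, where $page is already defined; the wrapper function and its name ipLookup are hypothetical and exist only so the sketch can be run outside n8n with a stand-in page object.

```javascript
// Assumption: https://httpbin.org/ip is the IP lookup service.
// In the node, paste only the body of this function into Script Code;
// $page is provided there. The wrapper exists for standalone testing.
async function ipLookup($page) {
  // Navigate and keep the HTTPResponse of the main document.
  const response = await $page.goto('https://httpbin.org/ip');
  if (!response || !response.ok()) {
    throw new Error('IP lookup request failed');
  }

  // Parse the JSON body and pull out the "origin" field (the IP address).
  const { origin } = await response.json();

  // The node requires an array of items.
  return [{ ip: origin }];
}
```

Reading the JSON from the response avoids depending on how the browser renders a JSON document. For an ordinary HTML page, you would extract the text from the DOM with $page.evaluate() or $page.$eval() instead.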

Properties

Script Code: JavaScript code to execute with Puppeteer. The script can use $browser (the Puppeteer browser instance), $page (the page), and $puppeteer (the Puppeteer library).
Options: Collection of optional settings that configure Puppeteer's behavior:
- Batch Size: Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage.
- Browser WebSocket Endpoint: WebSocket URL of an existing browser instance to connect to instead of launching a new one.
- Emulate Device: Device profile to emulate (e.g., iPhone, iPad).
- Executable Path: Path to the browser executable. Ignored when connecting via WebSocket.
- Extra Headers: Custom HTTP headers to send with requests, given as one or more name-value pairs.
- File Name: Filename assigned to binary output. Used only by the screenshot and PDF operations, not by custom scripts.
- Launch Arguments: Additional command-line arguments passed to the browser at launch.
- Timeout: Maximum navigation time in milliseconds. Disabled (0) by default for this operation.
- Wait Until: When to consider navigation complete (load, domcontentloaded, networkidle0, or networkidle2). Not applicable to the custom script operation.
- Page Caching: Enables or disables page-level caching. Enabled by default.
- Headless Mode: Runs the browser without a visible UI. Defaults to true.
- Use Chrome Headless Shell: Runs the browser in headless shell mode. Requires chrome-headless-shell on the system PATH and Headless Mode enabled.
- Stealth Mode: Applies techniques that make Puppeteer harder to detect as a bot. Defaults to false.
- Proxy Server: Custom proxy server for browser traffic (e.g., localhost:8080, socks5://localhost:1080).
- Add Container Arguments: Automatically adds recommended arguments for container environments (e.g., --no-sandbox). Defaults to true.
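To show how several of these options relate, here is a hypothetical helper (buildBrowserOptions is not part of the node, and its input field names are assumptions) that turns option values into arguments for Puppeteer's puppeteer.launch() or puppeteer.connect(). It assumes the container flags are --no-sandbox and --disable-setuid-sandbox, and it uses headless: 'shell', which recent Puppeteer versions accept for chrome-headless-shell. The node's actual implementation may differ.

```javascript
// Hypothetical helper for illustration only; not the node's source code.
// Input field names mirror the Options list but are assumptions.
function buildBrowserOptions(opts = {}) {
  // Browser WebSocket Endpoint: connect instead of launching, so
  // launch-only settings such as Executable Path are ignored.
  if (opts.browserWSEndpoint) {
    return { method: 'connect', options: { browserWSEndpoint: opts.browserWSEndpoint } };
  }

  const headless = opts.headless ?? true; // Headless Mode defaults to true
  if (opts.useHeadlessShell && !headless) {
    throw new Error('Use Chrome Headless Shell requires Headless Mode');
  }

  // Launch Arguments are passed through as browser command-line flags.
  const args = [...(opts.launchArguments ?? [])];
  if (opts.addContainerArguments ?? true) {
    // Assumed flag set; the node may add different or additional flags.
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  if (opts.proxyServer) {
    // Chromium reads the proxy from --proxy-server, e.g. socks5://localhost:1080
    args.push(`--proxy-server=${opts.proxyServer}`);
  }

  const options = { headless: opts.useHeadlessShell ? 'shell' : headless, args };
  if (opts.executablePath) {
    options.executablePath = opts.executablePath;
  }
  return { method: 'launch', options };
}
```

With Puppeteer installed, the result could be used as const { method, options } = buildBrowserOptions({ proxyServer: 'localhost:8080' }); followed by await puppeteer[method](options).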

Output

The node expects the custom script to return an array of items, where each item is an object containing JSON data. The output structure is:

[
  {
    "json": {
      // user-defined key-value pairs returned by the script
    }
  }
]

For example, if your script returns [{ ip: "1.2.3.4" }], the node outputs one item whose json field is { ip: "1.2.3.4" }.

If the script does not return an array, the node throws an error.
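The sketch below restates these output rules as code. toItems is a hypothetical name, and the assumption that plain objects are wrapped in a json field (while objects that already have one pass through unchanged) reflects n8n's usual item handling rather than this node's confirmed source. The per-item object check is likewise an assumption.

```javascript
// Illustrative sketch of the output rules above; not the node's actual source.
function toItems(result) {
  // A non-array return value is rejected with the error from Troubleshooting.
  if (!Array.isArray(result)) {
    throw new Error('Custom script must return an array of items');
  }
  return result.map((entry, index) => {
    // Hypothetical check: each item must be a plain object.
    if (entry === null || typeof entry !== 'object' || Array.isArray(entry)) {
      throw new Error(`Item ${index} is not an object`);
    }
    // Assumed behavior: wrap plain objects, keep { json: ... } items as-is.
    return 'json' in entry ? entry : { json: entry };
  });
}
```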

Note: This operation does not produce binary data outputs like screenshots or PDFs; those are handled by other operations.

Dependencies

  • Requires the Puppeteer and puppeteer-extra libraries, which are bundled with the node.
  • Uses the puppeteer-extra stealth plugin to evade bot detection when Stealth Mode is enabled.
  • Supports connecting to an external browser instance via WebSocket.
  • No external API keys or credentials are required by default. If your script makes authenticated requests, it must supply the necessary tokens itself.
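For instance, a script can attach a bearer token to every request the page makes by calling Puppeteer's page.setExtraHTTPHeaders() before navigating (the Extra Headers option is a no-code alternative). In this sketch, the URL, the YOUR_API_TOKEN placeholder, and the wrapper name fetchProtectedTitle are hypothetical; paste only the function body into Script Code.

```javascript
// Hypothetical URL and token placeholder; replace them with your own.
// In the node, paste only the function body into Script Code.
async function fetchProtectedTitle($page) {
  const token = 'YOUR_API_TOKEN'; // placeholder, not a real credential

  // Every request this page makes from now on carries the header.
  await $page.setExtraHTTPHeaders({ Authorization: `Bearer ${token}` });
  await $page.goto('https://example.com/dashboard', { waitUntil: 'domcontentloaded' });

  // Read the document title from the rendered page.
  const title = await $page.title();
  return [{ title }];
}
```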

Troubleshooting

  • Error: Custom script must return an array of items
    Your script did not return an array. Make sure it ends with a return statement such as: return [{ key: value }];

  • Failed to launch/connect to browser
    Possible causes include an invalid executable path, missing browser binaries, or an incorrect WebSocket endpoint. Verify the configured paths and URLs.

  • Invalid URL
    If your script or parameters specify a URL, ensure it is valid and properly formatted.

  • Timeouts or navigation failures
    Although the navigation timeout is disabled by default for this operation, network issues or page errors can still cause failures. Check your script's navigation logic.

  • Memory or CPU overload
    Setting batch size too high can exhaust resources. Reduce batch size if you encounter performance issues.

  • Stealth mode not working as expected
    Some sites may still detect Puppeteer despite stealth mode. Consider additional evasion techniques or manual debugging.
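Because the Timeout option is disabled (0) by default for this operation, a script can set its own navigation limit and handle failures explicitly. The sketch below uses Puppeteer's page.setDefaultNavigationTimeout() and returns an item describing the error instead of failing the node. The URL, the 30-second limit, and the wrapper name loadWithTimeout are arbitrary assumptions; paste only the function body into Script Code.

```javascript
// Hypothetical URL and timeout value. In the node, paste only the
// function body into Script Code.
async function loadWithTimeout($page) {
  // The Timeout option defaults to 0 (no limit) here, so set one in the script.
  $page.setDefaultNavigationTimeout(30000); // milliseconds

  try {
    await $page.goto('https://example.com', { waitUntil: 'networkidle2' });
    return [{ ok: true, url: $page.url() }];
  } catch (error) {
    // Report the failure as an item instead of stopping the execution.
    // Replace this with `throw error;` if the node should fail instead.
    return [{ ok: false, error: error.message }];
  }
}
```

Both branches return an array, so the script never triggers the "must return an array of items" error; whether to swallow or rethrow navigation errors depends on how the rest of the workflow should react.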
