Puppeteer

Automate browser interactions using Puppeteer

Overview

This node runs custom JavaScript with Puppeteer, a browser automation library that typically drives the browser in headless mode. Scripts get direct access to Puppeteer's browser and page objects ($browser and $page), which enables advanced web scraping, automation, and page interaction within n8n workflows.

Typical use cases include:

  • Extracting data from websites that require JavaScript rendering.
  • Automating form submissions or navigation flows.
  • Downloading files or screenshots from dynamic pages.
  • Running complex scripts that interact with the page DOM or network requests.

For example, you can write a script that navigates to a webpage and extracts the IP address shown on it, or one that downloads files by intercepting network requests.
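
The IP-address case might look roughly like the sketch below. It is written as a standalone function that takes the page object as a parameter so it can be tried outside n8n; inside the node's Script Code field the body would use $page directly. The function name extractIp and the URL in the usage note are placeholders, not part of the node.

```javascript
// Hypothetical helper: extractIp is an illustrative name, not a node API.
async function extractIp(page, url) {
  // Navigate to the page (Puppeteer waits for the load event by default).
  await page.goto(url);
  // Read the visible text of the page from inside the browser context.
  const text = await page.evaluate(() => document.body.innerText);
  // Pick the first IPv4-looking token; this loose pattern does not validate octet ranges.
  const match = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/);
  // Return n8n-style items: an array of objects with a json property.
  return [{ json: { ip: match ? match[0] : null } }];
}
```

In a custom script this would be called as something like return await extractIp($page, "https://example.com/ip"); where the URL is an assumed example.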

Properties

Name: Meaning
Script Code: JavaScript code to execute with Puppeteer. The script has access to the $browser, $page, $fetch, and $puppeteer variables; $browser and $page are the Puppeteer browser and page instances. The script must return an array of items for output.
Options: Collection of optional settings:
- Batch Size: Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage.
- Browser WebSocket Endpoint: WebSocket URL to connect to an existing browser instance instead of launching a new one.
- Emulate Device: Select a device profile to emulate (e.g., iPhone, iPad).
- Executable Path: Path to the browser executable Puppeteer should use. Ignored if connecting via WebSocket.
- Extra Headers: Custom HTTP headers to send with each request.
- File Name: Filename to assign to binary outputs (only applies to certain operations like PDF or screenshot capture).
- Launch Arguments: Additional command line arguments to pass to the browser instance. Ignored if connecting via WebSocket.
- Timeout: Maximum navigation time in milliseconds. Pass 0 to disable the timeout. Does not affect the "Run Custom Script" operation.
- Wait Until: When to consider navigation succeeded (load, domcontentloaded, networkidle0, networkidle2). Does not affect the "Run Custom Script" operation.
- Page Caching: Enable or disable page-level caching (default true).
- Headless Mode: Run the browser in headless mode (default true).
- Use Chrome Headless Shell: Run the browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH).
- Stealth Mode: Apply techniques to make headless Puppeteer harder to detect (default false).
- Proxy Server: Use a custom proxy server (e.g., localhost:8080, socks5://localhost:1080).
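
Because Timeout and Wait Until are documented as not affecting the "Run Custom Script" operation, a script that needs them would presumably have to pass equivalents to Puppeteer itself. The sketch below shows one way to do that with page.setExtraHTTPHeaders and the waitUntil and timeout options of page.goto. The function name navigateWithOptions and its opts shape are assumptions for illustration, not part of the node.

```javascript
// Hypothetical helper showing Puppeteer-level equivalents of some node options.
async function navigateWithOptions(page, url, opts = {}) {
  const { headers = {}, waitUntil = 'load', timeout = 30000 } = opts;
  // Extra Headers equivalent: sent with every request the page makes.
  if (Object.keys(headers).length > 0) {
    await page.setExtraHTTPHeaders(headers);
  }
  // Timeout / Wait Until equivalents: timeout is in milliseconds, 0 disables it.
  const response = await page.goto(url, { waitUntil, timeout });
  // page.goto can resolve with null, so guard before reading the status.
  return [{ json: { url, status: response ? response.status() : null } }];
}
```

Inside the node this could be invoked with $page as the first argument, for example with { waitUntil: "networkidle2", timeout: 60000 } for slow, script-heavy pages.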

Output

The node expects the custom script to return an array of items formatted as n8n items:

  • Each item is an object with a json property containing extracted data.
  • Items may optionally include a binary property for binary data such as downloaded files or screenshots.
  • Binary data is base64 encoded and includes metadata like filename and MIME type.

Example output item structure:

{
  "json": {
    "key": "value"
  },
  "binary": {
    "fileName.ext": {
      "data": "<base64-encoded-content>",
      "fileName": "fileName.ext",
      "mimeType": "application/octet-stream"
    }
  }
}

If the script does not return an array, the node throws an error.
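
As a minimal sketch of building such an item, the helper below base64-encodes raw bytes and attaches the filename and MIME type in the structure shown above. The name toBinaryItem is hypothetical, and the sketch assumes Node's Buffer is available in the script environment.

```javascript
// Hypothetical helper: builds one item in the documented { json, binary } shape.
function toBinaryItem(json, bytes, fileName, mimeType = 'application/octet-stream') {
  return {
    json,
    binary: {
      [fileName]: {
        // Buffer.from accepts a Buffer or a Uint8Array, so either input works.
        data: Buffer.from(bytes).toString('base64'),
        fileName,
        mimeType,
      },
    },
  };
}

// Example with in-memory bytes; in a script these might come from page.screenshot().
const item = toBinaryItem({ source: 'demo' }, Buffer.from('hello'), 'hello.txt', 'text/plain');
console.log(item.binary['hello.txt'].data); // aGVsbG8=
```

A script would then return [item] so that the result is still an array.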

Dependencies

  • Requires the Puppeteer and puppeteer-extra libraries, with the stealth and recaptcha plugins.
  • Optionally integrates with a third-party captcha-solving service via an API key.
  • Uses a sandboxed VM environment to safely run user-provided JavaScript code.
  • Access to Puppeteer browser and page objects is provided automatically.
  • No additional external services are mandatory unless used in the custom script.

Troubleshooting

  • Error: Custom script must return an array of items
    Ensure your script returns an array, for example: return [{ json: { key: "value" } }];

  • Failed to launch/connect to browser
    Check that the browser executable path is correct or the WebSocket endpoint is reachable. Verify system dependencies for Puppeteer.

  • Request failed with status code XXX
    The page navigation returned an HTTP error. Verify the URL and network connectivity.

  • Timeouts or slow performance
    Lower the Batch Size to limit concurrent pages. Increase the Timeout option if needed, keeping in mind that it does not affect the "Run Custom Script" operation.

  • Binary data issues
    Make sure to encode binary content as base64 and provide proper MIME types and filenames.

  • Headless browsing detected by sites
    If sites detect and block headless browsing, enable the Stealth Mode option to reduce the detection risk.
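
To guard against the "must return an array" error, a script could normalize its result before returning it. The helper below is one possible sketch; ensureItems is a hypothetical name, and wrapping bare values under a value key is an assumption made here, not a convention taken from the node.

```javascript
// Hypothetical helper: coerces a script result into an array of { json } items.
function ensureItems(result) {
  // Nothing to output: return an empty array rather than null or undefined.
  if (result === undefined || result === null) return [];
  const list = Array.isArray(result) ? result : [result];
  return list.map((entry) => {
    const isObject = entry !== null && typeof entry === 'object';
    // Already shaped like an item: keep it unchanged.
    if (isObject && 'json' in entry) return entry;
    // Plain objects become the json payload; primitives are wrapped under "value".
    return { json: isObject ? entry : { value: entry } };
  });
}

console.log(JSON.stringify(ensureItems({ title: 'Example' })));
// [{"json":{"title":"Example"}}]
```

The last line of a script would then be return ensureItems(result); instead of returning result directly.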
