Puppeteer

Automate browser interactions using Puppeteer

Actions: 4

Overview

This node runs custom JavaScript with Puppeteer, a Node.js library for browser automation that runs headless by default. Your script has direct access to Puppeteer's browser and page objects (`$browser` and `$page`), which enables advanced web scraping, automated browsing, and other interaction with web pages.

Common scenarios include:

- Extracting data from websites that require JavaScript rendering.
- Automating form submissions or navigation flows.
- Capturing screenshots or PDFs of web pages (these are handled by separate operations, not by the custom script).
- Running complex custom scripts that interact with the page DOM or network.

For example, you can write a script to navigate to an IP lookup service, extract the IP address shown on the page, and return it as output. This flexibility makes it ideal for users needing fine-grained control over browser automation beyond predefined operations.
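
The IP lookup scenario could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference script: the URL `https://example.com/ip` is a placeholder, `extractIp` and `lookupIp` are hypothetical names, and the regular expression only finds IPv4-looking text without checking octet ranges. Inside the node, the body of `lookupIp` would go in the Script Code field, where `$page` is already defined.

```javascript
// Hypothetical helper: return the first IPv4-looking token in a string, or null.
// The pattern does not check that each octet is <= 255.
function extractIp(text) {
  const match = String(text).match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/);
  return match ? match[0] : null;
}

// Hypothetical wrapper around what would be the Script Code body.
// `$page` is the Puppeteer page object the node exposes to the script.
async function lookupIp($page) {
  // Placeholder URL (assumption): substitute a real IP lookup service.
  await $page.goto('https://example.com/ip');
  // Read the rendered text of the page from inside the browser context.
  const text = await $page.evaluate(() => document.body.innerText);
  // Return an array, as the node requires.
  return [{ ip: extractIp(text) }];
}
```

Keeping the text parsing in a plain function separates it from the browser calls, so it can be checked without launching a browser.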

Properties

Name	Meaning
Script Code	JavaScript code to execute with Puppeteer. You have access to `$browser`, `$page`, and `$puppeteer` variables representing the Puppeteer browser instance, page, and Puppeteer library respectively.
Options	Collection of optional settings to configure Puppeteer behavior:
- Batch Size	Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage.
- Browser WebSocket Endpoint	WebSocket URL to connect to an existing browser instance instead of launching a new one.
- Emulate Device	Select a device profile to emulate (e.g., iPhone, iPad).
- Executable Path	Path to the browser executable to use. Ignored if connecting via WebSocket.
- Extra Headers	Custom HTTP headers to send with requests. Specify multiple name-value pairs.
- File Name	Filename to assign to binary outputs (only relevant for screenshot or PDF operations, not used in custom script).
- Launch Arguments	Additional command line arguments to pass to the browser on launch.
- Timeout	Maximum navigation time in milliseconds. A value of 0 disables the timeout, which is the default for this operation.
- Wait Until	When to consider navigation succeeded (load, domcontentloaded, networkidle0, networkidle2). Not applicable for custom script operation.
- Page Caching	Enable or disable page-level caching. Defaults to enabled.
- Headless Mode	Run browser in headless mode (no UI). Defaults to true.
- Use Chrome Headless Shell	Run the browser in headless shell mode. Requires `chrome-headless-shell` in the system PATH and only takes effect when Headless Mode is enabled.
- Stealth Mode	Apply techniques to make Puppeteer harder to detect as a bot. Defaults to false.
- Proxy Server	Use a custom proxy server for browser traffic (e.g., `localhost:8080`, `socks5://localhost:1080`).
- Add Container Arguments	Automatically add recommended arguments for container environments (e.g., `--no-sandbox`). Defaults to true.
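
To make the relationship between these options and Puppeteer concrete, the sketch below shows one way a few of them could translate into arguments for Puppeteer's `launch()` or `connect()`. It is an illustration only and not the node's actual implementation: `buildBrowserOptions` and the option keys it reads (`browserWSEndpoint`, `launchArguments`, `addContainerArgs`, `proxyServer`, `headless`, `executablePath`) are hypothetical names, and `--no-sandbox` is the only container argument included because it is the only one named in the table.

```javascript
// Hypothetical helper: translate a few of the node options above into the
// arguments Puppeteer's launch() or connect() would receive.
// The option keys are illustrative names chosen for this sketch,
// not the node's internal parameter names.
function buildBrowserOptions(options = {}) {
  // An existing browser is reused when a WebSocket endpoint is given;
  // launch-only settings such as the executable path are then ignored.
  if (options.browserWSEndpoint) {
    return {
      mode: 'connect',
      puppeteerOptions: { browserWSEndpoint: options.browserWSEndpoint },
    };
  }

  const args = [...(options.launchArguments || [])];
  // Add Container Arguments defaults to true.
  if (options.addContainerArgs !== false) args.push('--no-sandbox');
  // Chromium accepts the proxy as a command-line flag.
  if (options.proxyServer) args.push(`--proxy-server=${options.proxyServer}`);

  const puppeteerOptions = {
    headless: options.headless !== false, // Headless Mode defaults to true
    args,
  };
  if (options.executablePath) {
    puppeteerOptions.executablePath = options.executablePath;
  }
  return { mode: 'launch', puppeteerOptions };
}
```

Depending on `mode`, the resulting `puppeteerOptions` object would be passed to `puppeteer.launch(...)` or `puppeteer.connect(...)`.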

Output

The node expects the custom script to return an array of items, where each item is an object containing JSON data. The output structure is:

[
  {
    "json": {
      // user-defined key-value pairs returned by the script
    }
  }
]

For example, if your script returns [ { ip: "1.2.3.4" } ], the output contains one item whose json field holds the IP address.

If the script does not return an array, the node throws an error.

Note: This operation does not produce binary data outputs like screenshots or PDFs; those are handled by other operations.
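
The output contract described in this section can be sketched as a small function. This is an assumption-laden illustration rather than the node's source code: `toOutputItems` is a hypothetical name, and it assumes every returned object is wrapped under a `json` key, as in the IP address example.

```javascript
// Hypothetical helper mirroring the output contract described above.
function toOutputItems(result) {
  if (!Array.isArray(result)) {
    // Same wording as the error message listed under Troubleshooting.
    throw new Error('Custom script must return an array of items');
  }
  // Wrap each returned object under a "json" key.
  return result.map((entry) => ({ json: entry }));
}
```

Under these assumptions, `toOutputItems([{ ip: "1.2.3.4" }])` yields a single item with the IP address under `json`, while passing a bare object instead of an array throws.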

Dependencies

- Requires the Puppeteer and puppeteer-extra libraries, which are bundled with the node.
- Optionally uses the stealth plugin to evade bot detection when Stealth Mode is enabled.
- Supports connecting to an external browser instance via WebSocket.
- No external API keys or credentials are required by default, but your script can perform authenticated requests if you provide the necessary tokens inside the script.

Troubleshooting

Error: Custom script must return an array of items
Your script did not return an array. Ensure your script ends with a return statement such as `return [{ key: value }];`
Failed to launch/connect to browser
This can be caused by an invalid executable path, missing browser binaries, or an incorrect WebSocket endpoint. Verify the paths and URLs in the node options.
Invalid URL
If your script or parameters specify a URL, ensure it is valid and properly formatted.
Timeouts or navigation failures
Although the navigation timeout is disabled by default for this operation, network issues or page errors can still cause failures. Check your script's navigation logic.
Memory or CPU overload
Setting batch size too high can exhaust resources. Reduce batch size if you encounter performance issues.
Stealth mode not working as expected
Some sites may still detect Puppeteer despite stealth mode. Consider additional evasion techniques or manual debugging.
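
For the invalid URL and navigation failure cases, a script can guard its own navigation step. The sketch below is one possible approach, not behavior built into the node: `isValidHttpUrl` and `safeGoto` are hypothetical names, the 30000 ms default is an arbitrary choice, and the `ok`/`error` fields in the returned items are illustrative.

```javascript
// Hypothetical helper: true only for strings that parse as http(s) URLs.
function isValidHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false; // new URL() throws a TypeError on malformed input
  }
}

// Hypothetical wrapper for a script's navigation step.
// `$page` is the Puppeteer page object exposed by the node.
async function safeGoto($page, url, timeoutMs = 30000) {
  if (!isValidHttpUrl(url)) {
    return [{ url, ok: false, error: 'Invalid URL' }];
  }
  try {
    // An explicit timeout bounds this call even though the node-level
    // timeout is disabled (0) by default for this operation.
    await $page.goto(url, { timeout: timeoutMs, waitUntil: 'domcontentloaded' });
    return [{ url, ok: true }];
  } catch (err) {
    // Report the failure as data instead of failing the whole run.
    return [{ url, ok: false, error: err.message }];
  }
}
```

Returning the failure as an item keeps the array contract intact, so downstream nodes can branch on `ok` instead of the workflow stopping on the first bad URL.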

Links and References

Puppeteer Documentation – Official Puppeteer API reference.
n8n Puppeteer Node Docs – General info about Puppeteer node usage in n8n.
Special Variables and Methods in n8n – Details on $page, $browser, and $puppeteer available in the script environment.