Puppeteer Cartier

Automate browser interactions using Puppeteer

Overview

This node integrates Puppeteer, a powerful browser automation library, into n8n workflows. It allows users to programmatically control a Chromium-based browser to perform tasks such as loading web pages, taking screenshots, generating PDFs, and running custom scripts within the browser context.

Common scenarios where this node is beneficial include:

  • Automating website data extraction or scraping.
  • Generating visual snapshots or PDFs of web pages for reporting.
  • Testing web page rendering or behavior under different devices or conditions.
  • Running custom JavaScript in the browser environment to interact with dynamic content.

For example, you could use this node to automatically capture screenshots of product pages on an e-commerce site or generate PDFs of invoices from a web application.

Properties

Name | Meaning
Batch Size | Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage.
Browser WebSocket Endpoint Authorization | Authorization header value used when connecting to an existing browser instance via WebSocket. Only applies if "Browser WebSocket Endpoint" is set.
Browser WebSocket Endpoint | WebSocket URL of an already running browser to connect to instead of launching a new one.
Emulate Device | Emulates a specific device's viewport and user agent (e.g., iPhone, iPad). Options are loaded dynamically from known Puppeteer devices.
Executable path | Path to the browser executable to launch. Ignored if connecting to a browser via WebSocket.
Extra Headers | Additional HTTP headers to send with each request. Specify multiple name-value pairs.
File Name | Filename to assign to binary output files (screenshots or PDFs).
Launch Arguments | Additional command line arguments passed to the browser on launch. Ignored if connecting via WebSocket.
Timeout | Maximum navigation time in milliseconds before timing out. Set to 0 to disable timeout. Does not affect custom script execution.
Protocol Timeout | Maximum time in milliseconds to wait for protocol responses. Set to 0 to disable timeout.
Wait Until | Defines when navigation is considered finished. Options include 'load', 'domcontentloaded', 'networkidle0', and 'networkidle2'.
Page Caching | Enables or disables page-level caching. Defaults to enabled.
Headless mode | Runs the browser in headless mode (no GUI). Defaults to true.
Use Chrome Headless Shell | Runs the browser in headless shell mode, which requires headless mode enabled and the chrome-headless-shell binary available in PATH. Defaults to false.
Stealth mode | Applies techniques to make Puppeteer harder to detect as a bot. Defaults to false.
Human typing mode | Enables a human-like typing function .typeHuman() that simulates natural typing delays and typos. Defaults to false.
Human Typing Options | Configuration for human typing behavior, including delays between keystrokes, backspace delays, and typo chances. Visible only if Human typing mode is enabled.
Proxy Server | Configures a proxy server for the browser to use (e.g., localhost:8080, socks5://localhost:1080).
Add Container Arguments | Adds recommended arguments for running Puppeteer inside container environments (--no-sandbox, --disable-setuid-sandbox, etc.). Defaults to true.
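
Several of the properties above interact: a WebSocket endpoint switches the node from launching a browser to connecting to one, and the launch-only settings are then ignored. The sketch below illustrates that decision as a pure function. The parameter names, the `planBrowser` helper, and the choice to apply the proxy only when launching are assumptions made for illustration, not the node's actual implementation.

```typescript
// Hypothetical parameter bag mirroring a few properties from the table above.
interface NodeParams {
  browserWSEndpoint?: string;
  browserWSEndpointAuthorization?: string;
  executablePath?: string;
  launchArguments?: string[];
  headless?: boolean;
  proxyServer?: string;
  addContainerArguments?: boolean;
  protocolTimeout?: number;
}

interface ConnectPlanOptions {
  browserWSEndpoint: string;
  headers?: Record<string, string>;
  protocolTimeout?: number;
}

interface LaunchPlanOptions {
  headless: boolean;
  args: string[];
  executablePath?: string;
  protocolTimeout?: number;
}

type BrowserPlan =
  | { mode: "connect"; options: ConnectPlanOptions }
  | { mode: "launch"; options: LaunchPlanOptions };

function planBrowser(params: NodeParams): BrowserPlan {
  if (params.browserWSEndpoint) {
    // Connecting to a running browser: executable path and launch arguments are ignored.
    const options: ConnectPlanOptions = { browserWSEndpoint: params.browserWSEndpoint };
    if (params.browserWSEndpointAuthorization) {
      options.headers = { Authorization: params.browserWSEndpointAuthorization };
    }
    if (params.protocolTimeout !== undefined) {
      options.protocolTimeout = params.protocolTimeout;
    }
    return { mode: "connect", options };
  }

  // Launching a new browser: start from the user-supplied arguments.
  const args: string[] = [...(params.launchArguments ?? [])];
  if (params.addContainerArguments ?? true) {
    // Only the two flags named in the table; the node may add more.
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  if (params.proxyServer) {
    // Assumption: the proxy is passed through Chromium's --proxy-server flag.
    args.push(`--proxy-server=${params.proxyServer}`);
  }

  const options: LaunchPlanOptions = { headless: params.headless ?? true, args };
  if (params.executablePath) {
    options.executablePath = params.executablePath;
  }
  if (params.protocolTimeout !== undefined) {
    options.protocolTimeout = params.protocolTimeout;
  }
  return { mode: "launch", options };
}
```

The resulting `options` object uses the same option names as Puppeteer's `puppeteer.connect()` and `puppeteer.launch()`, so a caller could pass it to whichever call matches `mode`.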

Output

The node outputs an array of items corresponding to the input items processed. Each output item contains:

  • json: Metadata about the page request, including HTTP headers, status code, and URL.
  • binary (optional): Contains binary data for operations that produce files:
    • For screenshot operations, the binary field holds image data (PNG, JPEG, etc.).
    • For PDF generation, the binary field contains the generated PDF file.

If the operation is "Run Custom Script", the output is built by normalizing the array of objects that the user script returns.
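
As a rough illustration of what normalizing a script's return value could look like, the sketch below wraps plain objects in a `json` field and leaves already item-shaped objects alone. The `normalizeScriptResult` helper, its error messages, and the decision to reject non-array results are assumptions; the node's real normalization rules may differ.

```typescript
type JsonObject = Record<string, unknown>;

// Hypothetical item shape: a json payload plus optional binary data.
interface ScriptItem {
  json: JsonObject;
  binary?: JsonObject;
}

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function normalizeScriptResult(result: unknown): ScriptItem[] {
  if (!Array.isArray(result)) {
    throw new TypeError("The custom script must return an array of objects");
  }
  return result.map((entry: unknown, index: number): ScriptItem => {
    if (!isPlainObject(entry)) {
      throw new TypeError(`Returned element ${index} is not an object`);
    }
    const json = entry["json"];
    if (isPlainObject(json)) {
      // Already item-shaped: keep json and, if present, binary.
      const binary = entry["binary"];
      return isPlainObject(binary) ? { json, binary } : { json };
    }
    // Plain object: wrap it so every output item exposes a json field.
    return { json: entry };
  });
}
```

With this sketch, a script returning `[{ title: "A" }]` and one returning `[{ json: { title: "A" } }]` produce the same output item.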

If processing an item fails, the node emits an output item whose json field contains the error information, paired with the original input item.
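
The item shapes described in this section can be sketched as follows. The interface names, the `statusCode` and `error` keys, the `data` binary property name, and the use of n8n's `pairedItem` index to link an output item to its input are illustrative assumptions rather than the node's confirmed field names.

```typescript
// Hypothetical metadata collected from the page response.
interface PageResponseInfo {
  url: string;
  statusCode: number;
  headers: Record<string, string>;
}

// Hypothetical binary entry: base64 payload plus file metadata.
interface BinaryFile {
  data: string;
  mimeType: string;
  fileName: string;
}

interface ResultItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryFile>;
  pairedItem: { item: number };
}

// Build an output item for a successful page request.
function toSuccessItem(info: PageResponseInfo, inputIndex: number, file?: BinaryFile): ResultItem {
  const item: ResultItem = {
    json: { url: info.url, statusCode: info.statusCode, headers: info.headers },
    pairedItem: { item: inputIndex },
  };
  if (file) {
    // Only screenshot and PDF operations attach binary data.
    item.binary = { data: file };
  }
  return item;
}

// Build an output item for a failed page request.
function toErrorItem(error: unknown, inputIndex: number): ResultItem {
  const message = error instanceof Error ? error.message : String(error);
  return { json: { error: message }, pairedItem: { item: inputIndex } };
}
```

Keeping the input index on both success and error items lets downstream nodes trace each result back to the input that produced it.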

Dependencies

  • Requires Puppeteer and puppeteer-extra libraries along with plugins for stealth mode and human typing simulation.
  • Optionally connects to an existing browser instance via WebSocket if configured.
  • No n8n credential type is required. When connecting to a remote browser that requires authentication, supply the token through "Browser WebSocket Endpoint Authorization".
  • The node supports additional configuration for proxy servers and container-friendly launch arguments.
  • If using headless shell mode, the chrome-headless-shell binary must be installed and accessible in the system PATH.

Troubleshooting

  • Failed to launch/connect to browser: The browser could not be started, or the WebSocket endpoint could not be reached. Check the executable path, the WebSocket URL, and the authorization header, and make sure no other process is occupying the port.
  • Invalid URL: Occurs if the provided URL parameter is malformed. Verify URLs are correctly formatted.
  • Request failed with status code XXX: The page returned an HTTP error status. Confirm the target URL is accessible and correct.
  • Timeout errors: Navigation or protocol timeouts occur when pages take too long to load or the browser is slow to respond. Increase "Timeout" or "Protocol Timeout", or check network connectivity.
  • Memory/CPU overload: Setting a very high batch size can exhaust system resources. Reduce batch size accordingly.
  • Binary data missing or corrupt: Ensure file names are valid and that the node has write permissions if saving files locally.
  • Stealth mode not working: Some sites may still detect Puppeteer despite stealth mode. Consider updating plugins or adjusting browser arguments.
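
To make the batch-size advice above concrete, here is a minimal sketch of fixed-size batching: at most `batchSize` pages are in flight at once, and the next batch starts only after the current one finishes. The `chunk` and `processInBatches` helpers and the `worker` callback are hypothetical; the node may schedule pages differently.

```typescript
// Split items into consecutive groups of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("Batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Run `worker` over all items, never exceeding `batchSize` concurrent calls.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    // Every page in the batch runs concurrently; the loop waits for all of them.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Here `worker` stands in for whatever opens a page and produces a result. Lowering `batchSize` trades throughput for a smaller peak in memory and CPU use.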
