Puppeteer

Automate browser interactions using Puppeteer

Overview

This node integrates Puppeteer, a powerful browser automation library, into n8n workflows. It allows users to programmatically control a headless (or full) browser instance to perform web scraping, automated testing, screenshot capture, PDF generation, and custom script execution within a browser context.

Common scenarios where this node is beneficial include:

  • Extracting dynamic content from websites that require JavaScript rendering.
  • Taking screenshots or generating PDFs of web pages for reporting or archiving.
  • Running custom scripts inside the browser environment to interact with page elements or gather data.
  • Automating repetitive browser tasks such as form submissions or navigation flows.

Practical examples:

  • Automatically capturing screenshots of product pages for visual monitoring.
  • Generating PDFs of invoices or reports from web applications.
  • Scraping data from single-page applications (SPAs) that load content dynamically.
  • Executing custom JavaScript to extract specific information not accessible via simple HTTP requests.

Properties

| Name | Meaning |
| --- | --- |
| Batch Size | Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage. |
| Browser WebSocket Endpoint | The WebSocket URL to connect to an existing browser instance. When set, the node connects instead of launching a new browser. |
| Emulate Device | Emulates a specific device's viewport and user agent. Options are loaded dynamically from known devices (e.g., iPhone, iPad). |
| Executable Path | File system path to the browser executable. Ignored if connecting via WebSocket endpoint. |
| Extra Headers | Custom HTTP headers to send with each request. Specify multiple name-value pairs. |
| File Name | Filename to assign to binary output data (applies only to "Get PDF" and "Get Screenshot" operations). |
| Launch Arguments | Additional command line arguments passed to the browser on launch. Ignored if connecting via WebSocket endpoint. |
| Timeout | Maximum navigation time in milliseconds. Set to 0 to disable timeout. Does not affect "Run Custom Script" operation. |
| Wait Until | Defines when navigation is considered successful: load, domcontentloaded, networkidle0, or networkidle2. Does not affect "Run Custom Script" operation. |
| Page Caching | Enables or disables page-level caching. Defaults to enabled (true). |
| Headless mode | Runs the browser in headless mode (no UI). Defaults to enabled (true). |
| Use Chrome Headless Shell | Runs the browser in headless shell mode, which requires chrome-headless-shell in the system PATH. Only works if headless mode is enabled. Defaults to disabled (false). |
| Stealth mode | Applies techniques to make headless Puppeteer harder to detect by websites. Defaults to disabled (false). |
| Proxy Server | Configures a proxy server for browser traffic (e.g., localhost:8080, socks5://localhost:1080). |
| Add Container Arguments | Adds recommended arguments for running Puppeteer in container environments (--no-sandbox, --disable-setuid-sandbox, etc.). Defaults to enabled (true). |
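
The precedence rules in the table (a WebSocket endpoint overrides the executable path and launch arguments; container arguments and the proxy are added on launch) can be sketched as a small pure function. This is a minimal illustration, not the node's actual source: the NodeOptions field names, the BrowserPlan type, and planBrowser are hypothetical, and only the two container flags named in the table are included.

```typescript
// Hypothetical subset of the node's properties; field names are illustrative.
interface NodeOptions {
  browserWSEndpoint?: string;
  executablePath?: string;
  launchArguments?: string[];
  headless?: boolean;
  proxyServer?: string;
  addContainerArguments?: boolean;
}

// Either connect to an existing browser or launch a new one.
type BrowserPlan =
  | { mode: "connect"; browserWSEndpoint: string }
  | { mode: "launch"; headless: boolean; executablePath?: string; args: string[] };

// The two container-friendly flags named in the properties table.
const CONTAINER_ARGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function planBrowser(options: NodeOptions): BrowserPlan {
  // A WebSocket endpoint wins: executable path and launch arguments are ignored.
  if (options.browserWSEndpoint) {
    return { mode: "connect", browserWSEndpoint: options.browserWSEndpoint };
  }

  // Copy user-supplied arguments so the caller's array is not mutated.
  const args = [...(options.launchArguments ?? [])];

  // Container arguments default to enabled, matching the documented default.
  if (options.addContainerArguments ?? true) {
    for (const flag of CONTAINER_ARGS) {
      if (args.indexOf(flag) === -1) args.push(flag);
    }
  }

  // Chromium accepts the proxy as a --proxy-server command line flag.
  if (options.proxyServer) {
    args.push(`--proxy-server=${options.proxyServer}`);
  }

  return {
    mode: "launch",
    headless: options.headless ?? true, // headless defaults to enabled
    executablePath: options.executablePath,
    args,
  };
}
```

A caller would hand a "connect" plan to puppeteer.connect and a "launch" plan to puppeteer.launch; how the node itself wires these together is not shown on this page.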

Output

The node outputs an array of items corresponding to the input items processed. Each output item contains:

  • json: Metadata about the page request, including:

    • headers: HTTP response headers.
    • statusCode: HTTP status code of the response.
    • url: Final URL after navigation and redirects.
    • body: HTML content of the page ("Get Page Content" operation only).
  • binary (optional): Binary data for operations that generate files, including the filename and MIME type as configured:

    • For "Get Screenshot": image data in the specified format (e.g., PNG, JPEG).
    • For "Get PDF": PDF file data.
  • pairedItem: Links output to the corresponding input item index.
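
The item shape described above can be written down as types, which may help when consuming the output in a later node. This is a sketch under assumptions: the type names, the binary property key, the base64 encoding of data, and the `{ item: number }` form of pairedItem are illustrative and not confirmed by this page.

```typescript
// Illustrative types only; field names follow the list above.
interface BinaryEntry {
  data: string;     // assumed: base64-encoded file contents
  mimeType: string; // e.g. "image/png" or "application/pdf"
  fileName: string; // taken from the File Name property
}

interface PuppeteerNodeItem {
  json: {
    headers: Record<string, string>;
    statusCode: number;
    url: string;
    body?: string; // present only for "Get Page Content"
  };
  // Present only for "Get Screenshot" and "Get PDF"; the key name is an assumption.
  binary?: Record<string, BinaryEntry>;
  // Assumed form of the link back to the input item index.
  pairedItem: { item: number };
}

// True when the item carries at least one generated file.
function hasFile(item: PuppeteerNodeItem): boolean {
  return item.binary !== undefined && Object.keys(item.binary).length > 0;
}

// Example of what a "Get Page Content" result could look like.
const pageContentItem: PuppeteerNodeItem = {
  json: {
    headers: { "content-type": "text/html; charset=utf-8" },
    statusCode: 200,
    url: "https://example.com/",
    body: "<!doctype html><title>Example</title>",
  },
  pairedItem: { item: 0 },
};
```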

Dependencies

  • Requires Puppeteer and puppeteer-extra libraries for browser automation.
  • Uses puppeteer-extra-plugin-stealth for stealth mode functionality.
  • Supports connection to an existing browser via WebSocket or launching a new browser instance.
  • If using "Use Chrome Headless Shell", requires chrome-headless-shell executable available in system PATH.
  • Optional proxy configuration supported.
  • The node exposes no built-in credentials; any API keys or authentication tokens that target websites require must be provided externally.

Troubleshooting

  • Failed to launch/connect to browser: Check that the browser executable path is correct or that the WebSocket endpoint is reachable. Ensure no conflicting browser instances block connections.
  • Timeout errors during navigation: Increase the Timeout property or verify network connectivity and page responsiveness.
  • Invalid URL error: Ensure URLs and query parameters are correctly formatted.
  • Custom script errors: The custom script must return an array of items. Errors in script syntax or logic will cause failures.
  • High resource usage: Reduce Batch Size to limit simultaneous pages and lower CPU/memory consumption.
  • Stealth mode not working: Some sites may still detect headless browsers despite stealth mode; consider additional anti-detection measures.
  • Container environment issues: Ensure Add Container Arguments is enabled to avoid sandboxing problems.
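
Two of the points above lend themselves to a short sketch: checking that a custom script returned an array of items, and splitting inputs into groups no larger than Batch Size. Both helpers are hypothetical illustrations rather than the node's own code, and the requirement that each item has a json object is an assumption based on the usual n8n item shape.

```typescript
// Assumed minimal item shape: an object with a `json` object.
interface Item {
  json: Record<string, unknown>;
}

// Throws a descriptive error unless `value` is an array of items.
function assertItemArray(value: unknown): Item[] {
  if (!Array.isArray(value)) {
    throw new Error(`Custom script must return an array of items, got ${typeof value}`);
  }
  value.forEach((entry, index) => {
    const ok =
      typeof entry === "object" &&
      entry !== null &&
      typeof entry.json === "object" &&
      entry.json !== null;
    if (!ok) {
      throw new Error(`Item at index ${index} is missing a "json" object`);
    }
  });
  return value as Item[];
}

// Splits inputs into consecutive groups of at most `batchSize` elements.
// Invalid sizes (zero, negative, NaN) fall back to one element per group.
function toBatches<T>(inputs: T[], batchSize: number): T[][] {
  const size = isFinite(batchSize) && batchSize >= 1 ? Math.floor(batchSize) : 1;
  const batches: T[][] = [];
  for (let i = 0; i < inputs.length; i += size) {
    batches.push(inputs.slice(i, i + size));
  }
  return batches;
}
```

Processing one group at a time caps the number of pages open at once, which is the effect a smaller Batch Size is meant to have on memory and CPU usage.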
