Puppeteer

Automate browser interactions using Puppeteer

Overview

This n8n node provides browser automation using Puppeteer. It lets you control a headless or headed browser to load web pages, take screenshots, generate PDFs, scrape content, and run custom scripts in the browser context. The node is highly configurable: it supports device emulation, proxy settings, stealth mode, human-like typing simulation, and more.

Common scenarios:

  • Automated website testing or monitoring.
  • Web scraping and data extraction from dynamic sites.
  • Generating screenshots or PDFs of web pages for reporting or archiving.
  • Running custom JavaScript in the context of a loaded page.
  • Bypassing anti-bot measures with stealth and human typing plugins.

Practical examples:

  • Capture daily screenshots of a dashboard for archival.
  • Scrape product prices from an e-commerce site that requires JavaScript rendering.
  • Generate PDFs of invoices from a web application.
  • Automate login and navigation flows for testing purposes.

Properties

The node supports the following input properties:

Display Name | Type | Description
Options | collection | A group of advanced configuration options for browser behavior and performance.
├─ Batch Size | number | Maximum number of pages to open simultaneously. Higher values use more memory/CPU.
├─ Browser WebSocket Endpoint | string | WebSocket URL to connect to an existing browser instance instead of launching a new one.
├─ Emulate Device | options | Emulate a specific device (e.g., mobile, tablet).
├─ Executable path | string | Path to the browser executable. Ignored if connecting via WebSocket.
├─ Extra Headers | fixedCollection | Custom HTTP headers to send with requests.
├─ File Name | string | File name for binary output (PDF/Screenshot).
├─ Launch Arguments | fixedCollection | Additional command-line arguments for the browser.
├─ Timeout | number | Maximum navigation time in ms (0 disables the timeout).
├─ Protocol Timeout | number | Maximum time to wait for a protocol response in ms (0 disables the timeout).
├─ Wait Until | options | When to consider navigation successful (load, domcontentloaded, networkidle0, networkidle2).
├─ Page Caching | boolean | Enable/disable page-level caching.
├─ Headless mode | boolean | Run the browser in headless mode (no UI).
├─ Use Chrome Headless Shell | boolean | Use chrome-headless-shell (requires it in $PATH).
├─ Stealth mode | boolean | Makes detection of headless Puppeteer harder.
├─ Human typing mode | boolean | Simulates human-like typing in input fields.
├─ Human Typing Options | collection | Fine-tune delays and typo probabilities for human typing simulation.
├─ Proxy Server | string | Use a custom proxy (e.g., localhost:8080, socks5://localhost:1080).
└─ Add Container Arguments | boolean | Adds recommended args for container environments (e.g., --no-sandbox).
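
To make the options above more concrete, here is a minimal sketch of how they could be translated into the arguments of Puppeteer's launch() and connect() calls. It is not the node's actual implementation: the helper names (buildLaunchOptions, planBrowserStart) and the input field names are hypothetical, and the second container flag is an assumption beyond the --no-sandbox mentioned in the table.

```javascript
// Hypothetical helper: turn node-style options into an object that could be
// passed to puppeteer.launch(). Input field names are illustrative assumptions.
function buildLaunchOptions(opts = {}) {
  const args = [...(opts.launchArguments ?? [])];

  // "Add Container Arguments": --no-sandbox comes from the table; the second
  // flag is a common companion in containers and is an assumption here.
  if (opts.addContainerArgs) {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }

  // "Proxy Server" reaches Chromium as a command-line switch.
  if (opts.proxyServer) {
    args.push(`--proxy-server=${opts.proxyServer}`);
  }

  const launchOptions = { headless: opts.headless ?? true, args };
  if (opts.executablePath) launchOptions.executablePath = opts.executablePath;
  if (opts.protocolTimeout !== undefined) {
    launchOptions.protocolTimeout = opts.protocolTimeout; // 0 disables the timeout
  }
  return launchOptions;
}

// Hypothetical helper: a WebSocket endpoint means "connect to an existing
// browser", in which case the executable path and launch arguments are ignored.
function planBrowserStart(opts = {}) {
  if (opts.browserWSEndpoint) {
    return { method: "connect", options: { browserWSEndpoint: opts.browserWSEndpoint } };
  }
  return { method: "launch", options: buildLaunchOptions(opts) };
}

const startPlan = planBrowserStart({
  proxyServer: "socks5://localhost:1080",
  addContainerArgs: true,
  protocolTimeout: 0,
});
console.log(startPlan.method, startPlan.options.args);
```

With Puppeteer installed, such a plan could be used as `await puppeteer[startPlan.method](startPlan.options)`.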

Output

The structure of the output depends on the operation performed. In general, each output item contains:

  • json:

    • For page content:
      {
        "body": "<html>...</html>",
        "headers": { ... },
        "statusCode": 200,
        "url": "https://example.com"
      }
      
    • For errors:
      {
        "error": "Error message",
        "url": "https://example.com" // optional
      }
      
    • For other operations, relevant metadata (headers, status code, url).
  • binary (for PDF or Screenshot operations):

    • Contains the file data under the property name specified by the user (e.g., "data").
    • Each binary entry carries the file's MIME type (image/png, image/jpeg, or application/pdf) and, if set, the file name.
  • pairedItem:

    • Links the output to the corresponding input item.
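
The item shapes described above can be sketched as plain objects. The helper names below are hypothetical, and the layout of the binary entry (a base64 string under `data` next to `mimeType` and `fileName`) is an assumption about how n8n stores binary data rather than something this page states.

```javascript
// Hypothetical helpers; the item shapes follow the Output section above.

// JSON-only item for page content.
function pageContentItem({ body, headers, statusCode, url }, itemIndex) {
  return {
    json: { body, headers, statusCode, url },
    pairedItem: { item: itemIndex },
  };
}

// Item for a PDF or screenshot. Assumption: the binary payload is kept as a
// base64 string under `data`, with `mimeType` and an optional `fileName`.
function binaryItem(fileBuffer, { propertyName = "data", mimeType, fileName }, meta, itemIndex) {
  const entry = { data: fileBuffer.toString("base64"), mimeType };
  if (fileName) entry.fileName = fileName;
  return {
    json: meta,
    binary: { [propertyName]: entry },
    pairedItem: { item: itemIndex },
  };
}

// Error item; `url` is included only when it is known.
function errorItem(message, itemIndex, url) {
  const json = { error: message };
  if (url) json.url = url;
  return { json, pairedItem: { item: itemIndex } };
}

const pdfItem = binaryItem(
  Buffer.from("%PDF-1.4 placeholder"),
  { mimeType: "application/pdf", fileName: "invoice.pdf" },
  { statusCode: 200, url: "https://example.com" },
  0
);
console.log(Object.keys(pdfItem.binary), pdfItem.binary.data.mimeType);
```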

Troubleshooting

Common issues:

  • Browser fails to launch:

    • Check that the browser executable exists at the specified path, or that Docker/container permissions allow execution.
    • If using "Use Chrome Headless Shell", ensure it's installed and in $PATH.
  • Timeouts:

    • Increase the "Timeout" or "Protocol Timeout" values if pages take longer to load.
    • Setting these to 0 disables the respective timeouts.
  • Invalid URL error:

    • Ensure the "URL" parameter is a valid, fully qualified URL.
  • Proxy errors:

    • Verify the proxy server address and credentials.
    • Ensure the proxy is reachable from the n8n host.
  • Custom script errors:

    • Scripts must return an array of items. If not, you'll see:
      "Custom script must return an array of items. Please ensure your script returns an array, e.g., return [{ key: value }]."
    • Syntax errors or runtime exceptions in the script will be reported in the output's error field.
  • Resource limitations:

    • High "Batch Size" or many simultaneous pages may exhaust system resources (memory/CPU).
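
The effect of "Batch Size" on resource use can be illustrated with a small batching loop. This is a sketch of the general technique, not the node's own code: chunk, runInBatches, and the worker callback are hypothetical names, and the worker stands in for whatever opens and processes a page.

```javascript
// Split a list into consecutive chunks of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Run `worker` over all items, with at most `batchSize` calls in flight.
// Each batch must finish before the next one starts, so memory/CPU use is
// bounded by the batch size rather than by the total number of items.
async function runInBatches(items, batchSize, worker) {
  const results = [];
  let index = 0;
  for (const batch of chunk(items, batchSize)) {
    const firstIndex = index;
    const batchResults = await Promise.all(
      batch.map((item, offset) => worker(item, firstIndex + offset))
    );
    results.push(...batchResults);
    index += batch.length;
  }
  return results;
}

// Stand-in worker; a real one would open a page, navigate, and close it.
runInBatches(["https://example.com/a", "https://example.com/b", "https://example.com/c"], 2,
  async (url, i) => ({ url, index: i })
).then((out) => console.log(out));
```

A smaller batch size lowers peak resource use at the cost of a longer total run time. Because Promise.all rejects as soon as one worker rejects, a real implementation would likely catch errors per item.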
