Overview
This node integrates Puppeteer, a powerful browser automation library, into n8n workflows. It allows users to programmatically control a headless (or headed) browser instance to perform web scraping, automated testing, screenshot capture, PDF generation, and custom script execution within a browser context.
Common scenarios where this node is beneficial include:
- Extracting dynamic content from websites that require JavaScript rendering.
- Taking screenshots or generating PDFs of web pages for reporting or archiving.
- Running custom scripts inside the browser environment to interact with page elements or gather data.
- Automating repetitive browser tasks such as form submissions or navigation flows.
Practical examples:
- Automatically capturing screenshots of product pages for visual monitoring.
- Generating PDFs of invoices or reports from web applications.
- Scraping data from single-page applications (SPAs) that load content dynamically.
- Executing custom JavaScript to extract specific information not accessible via simple HTTP requests.
Properties
| Name | Meaning |
|---|---|
| Batch Size | Maximum number of pages to open simultaneously. Higher values increase memory and CPU usage. |
| Browser WebSocket Endpoint | The WebSocket URL to connect to an existing browser instance. When set, the node connects instead of launching a new browser. |
| Emulate Device | Emulates a specific device's viewport and user agent. Options are loaded dynamically from known devices (e.g., iPhone, iPad). |
| Executable Path | File system path to the browser executable. Ignored if connecting via WebSocket endpoint. |
| Extra Headers | Custom HTTP headers to send with each request. Specify multiple name-value pairs. |
| File Name | Filename to assign to binary output data (applies only to "Get PDF" and "Get Screenshot" operations). |
| Launch Arguments | Additional command line arguments passed to the browser on launch. Ignored if connecting via WebSocket endpoint. |
| Timeout | Maximum navigation time in milliseconds. Set to 0 to disable timeout. Does not affect "Run Custom Script" operation. |
| Wait Until | Defines when navigation is considered successful: load, domcontentloaded, networkidle0, or networkidle2. Does not affect "Run Custom Script" operation. |
| Page Caching | Enables or disables page-level caching. Defaults to enabled (true). |
| Headless mode | Runs the browser in headless mode (no UI). Defaults to enabled (true). |
| Use Chrome Headless Shell | Runs the browser in headless shell mode, which requires chrome-headless-shell in the system PATH. Only works if headless mode is enabled. Defaults to disabled (false). |
| Stealth mode | Applies techniques to make headless Puppeteer harder to detect by websites. Defaults to disabled (false). |
| Proxy Server | Configures a proxy server for browser traffic (e.g., localhost:8080, socks5://localhost:1080). |
| Add Container Arguments | Adds recommended arguments for running Puppeteer in container environments (--no-sandbox, --disable-setuid-sandbox, etc.). Defaults to enabled (true). |
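As a rough illustration of how the properties above might translate into Puppeteer options, the sketch below builds either connect options or launch options from a settings object. The `NodeOptions` shape and the `buildBrowserOptions` helper are hypothetical names invented for this example and are not the node's actual code. `browserWSEndpoint`, `executablePath`, `headless`, and `args` are standard Puppeteer option names, `--proxy-server` is a Chromium command-line switch, and `headless: "shell"` assumes a recent Puppeteer version.

```typescript
// Hypothetical subset of the node's properties (names invented for this sketch).
interface NodeOptions {
  browserWSEndpoint?: string;
  executablePath?: string;
  launchArguments?: string[];
  headless?: boolean;
  useHeadlessShell?: boolean;
  proxyServer?: string;
  addContainerArguments?: boolean;
}

type BrowserOptions =
  | { mode: "connect"; browserWSEndpoint: string }
  | {
      mode: "launch";
      headless: boolean | "shell";
      executablePath?: string;
      args: string[];
    };

function buildBrowserOptions(opts: NodeOptions): BrowserOptions {
  // A WebSocket endpoint takes precedence: the executable path and
  // launch arguments are ignored when connecting to an existing browser.
  if (opts.browserWSEndpoint) {
    return { mode: "connect", browserWSEndpoint: opts.browserWSEndpoint };
  }

  const args = [...(opts.launchArguments ?? [])];

  // Container arguments default to enabled. Only the two flags named in the
  // table are added here; any further flags the node adds are not shown.
  if (opts.addContainerArguments ?? true) {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  if (opts.proxyServer) {
    args.push(`--proxy-server=${opts.proxyServer}`);
  }

  // Headless defaults to enabled; headless shell only applies when it is on.
  const headless = opts.headless ?? true;
  return {
    mode: "launch",
    headless: headless && opts.useHeadlessShell ? "shell" : headless,
    executablePath: opts.executablePath,
    args,
  };
}
```

With these assumptions, a `connect` result would be passed to `puppeteer.connect` and a `launch` result to `puppeteer.launch`, after dropping the `mode` field.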
Output
The node outputs an array of items corresponding to the input items processed. Each output item contains:
- json: Metadata about the page request, including:
  - headers: HTTP response headers.
  - statusCode: HTTP status code of the response.
  - url: Final URL after navigation and redirects.
  - For the "Get Page Content" operation: body, containing the HTML content of the page.
- binary (optional): Binary data for operations that generate files, including the filename and MIME type as configured:
  - For "Get Screenshot": image data in the specified format (e.g., PNG, JPEG).
  - For "Get PDF": PDF file data.
- pairedItem: Links output to the corresponding input item index.
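The output shape described above can be pictured with a few illustrative types. This is a sketch, not the node's real type definitions: the `json` field names come from the description, while the binary property key `data`, the base64 encoding of the payload, and the `{ item: number }` form of `pairedItem` are assumptions based on common n8n conventions.

```typescript
// Illustrative types only; not taken from the node's source.
interface OutputJson {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
  body?: string; // present only for "Get Page Content"
}

interface BinaryEntry {
  data: string; // assumed to be base64-encoded file content
  fileName: string;
  mimeType: string;
}

interface OutputItem {
  json: OutputJson;
  binary?: Record<string, BinaryEntry>;
  pairedItem: { item: number };
}

// Returns true when the item carries at least one binary attachment.
function hasBinary(item: OutputItem): boolean {
  return item.binary !== undefined && Object.keys(item.binary).length > 0;
}

// Example item as "Get Page Content" might produce it (values are made up).
const pageContentItem: OutputItem = {
  json: {
    headers: { "content-type": "text/html; charset=utf-8" },
    statusCode: 200,
    url: "https://example.com/",
    body: "<!doctype html><html><body>Hello</body></html>",
  },
  pairedItem: { item: 0 },
};

// Example item as "Get Screenshot" might produce it (values are made up).
const screenshotItem: OutputItem = {
  json: {
    headers: { "content-type": "text/html; charset=utf-8" },
    statusCode: 200,
    url: "https://example.com/",
  },
  binary: {
    // "data" as the property key is an assumption.
    data: {
      data: "iVBORw0KGgo=", // truncated placeholder, not a complete image
      fileName: "screenshot.png",
      mimeType: "image/png",
    },
  },
  pairedItem: { item: 0 },
};
```

A downstream step could use a check like `hasBinary` to decide whether an item should be written to disk or handled as page metadata only.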
Dependencies
- Requires Puppeteer and puppeteer-extra libraries for browser automation.
- Uses puppeteer-extra-plugin-stealth for stealth mode functionality.
- Supports connection to an existing browser via WebSocket or launching a new browser instance.
- If using "Use Chrome Headless Shell", requires the chrome-headless-shell executable to be available in the system PATH.
- Optional proxy configuration supported.
- The node defines no credentials of its own; any API keys or authentication tokens required by target websites must be supplied separately.
Troubleshooting
- Failed to launch/connect to browser: Check that the browser executable path is correct or that the WebSocket endpoint is reachable. Ensure no conflicting browser instances block connections.
- Timeout errors during navigation: Increase the Timeout property or verify network connectivity and page responsiveness.
- Invalid URL error: Ensure URLs and query parameters are correctly formatted.
- Custom script errors: The custom script must return an array of items. Errors in script syntax or logic will cause failures.
- High resource usage: Reduce Batch Size to limit simultaneous pages and lower CPU/memory consumption.
- Stealth mode not working: Some sites may still detect headless browsers despite stealth mode; consider additional anti-detection measures.
- Container environment issues: Ensure Add Container Arguments is enabled to avoid sandboxing problems.
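The effect of Batch Size on resource usage can be pictured with a small sketch of batched processing: items are split into groups, and only one group is in flight at a time. How the node actually schedules pages is not documented here, so `chunk` and `processInBatches` are hypothetical helpers, and the `worker` callback stands in for whatever opens a page and performs the operation.

```typescript
// Split items into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Run `worker` over all items, with at most `batchSize` calls in flight.
// Results are returned in input order.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  let offset = 0;
  for (const batch of chunk(items, batchSize)) {
    const base = offset;
    // Every item in the batch starts together; the next batch waits
    // until all of them have finished.
    const batchResults = await Promise.all(
      batch.map((item, i) => worker(item, base + i)),
    );
    results.push(...batchResults);
    offset += batch.length;
  }
  return results;
}
```

Under this model, peak memory and CPU use scale with the batch size rather than with the total number of input items, which is why lowering the value helps on constrained hosts.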