Overview
This n8n node automates a browser through Puppeteer. It lets you control a headless or full browser session, interact with web pages, extract content, take screenshots, and generate PDFs, all within your workflow. Common use cases include:
- Web scraping and data extraction from dynamic websites.
- Automated website testing or monitoring.
- Generating screenshots or PDFs of web pages for reporting.
- Filling out forms or simulating user interactions on web pages.
Practical examples:
- Extracting product prices from an e-commerce site.
- Taking periodic screenshots of a dashboard for archival.
- Downloading invoices as PDFs after logging into a portal.
Properties
| Name | Meaning |
|---|---|
| Global options | Collection of settings that apply to all Puppeteer nodes in the workflow. These must be set on the first Puppeteer node; later changes are ignored. Options: Emulate Device, Executable path, Extra Headers, Launch Arguments, Viewport, Timeout, Wait Until (`load`, `networkidle0`, `networkidle2`), Time to Wait, Wait for Selector, Page Caching, Headless mode, Stealth mode, Proxy Server, Inject HTML, Inject CSS, Inject JS. |
| Node options | Collection of settings that override the global options for this specific node. Options: Timeout, Wait Until (`load`, `networkidle0`, `networkidle2`), Time to Wait, Wait for Selector, Inject HTML, Inject CSS, Inject JS. |
| URL | The target URL to navigate to. Leave empty to stay on the current page (must be set on the first Puppeteer node). |
| Query Parameters | List of query parameters to append to the URL. Each parameter has a name and value. |
| Interactions | List of actions to perform on the page, such as clicking elements or filling fields. Each interaction specifies: Selector (CSS selector for the element), Value (optional; if provided, the field is filled with it, otherwise the element is clicked), Wait for navigation (if true, waits for the page to load after the action). |
| Output | Specifies what to extract or generate from the page. Multiple outputs can be defined. Page content: Property Name (key for the extracted content), CSS selector (extracts content from matching elements), Select All (return all matches), innerHTML (use innerHTML instead of outerHTML), HTML to JSON (convert HTML to JSON), No attributes (ignore attributes when converting). Screenshot: Property Name (key for the binary image), CSS selector (screenshot a specific area), Type (image format: `jpeg`, `png`, `webp`), Quality (for JPEG/WebP), Full Page (capture the entire page). PDF: Property Name (key for the binary PDF), Page Ranges, Scale, Prefer CSS Page Size, Format, Height, Width, Landscape, Margin, Display Header/Footer, Header/Footer Template, Transparent Background, Background Graphics. |
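Three of the behaviours described in the table can be sketched in a few lines: node options overriding global options, query parameters being appended to the URL, and an interaction becoming a fill or a click depending on whether a value is set. This is only an illustration, not the node's actual implementation. The type and function names (`WaitOptions`, `resolveOptions`, `buildUrl`, `planInteraction`) are hypothetical, the millisecond unit for the timeout is assumed, and treating an empty value as "not provided" is also an assumption.

```typescript
// Hypothetical shapes; the node's real internal types are not shown in this document.
interface WaitOptions {
  timeout?: number; // assumed to be milliseconds
  waitUntil?: "load" | "networkidle0" | "networkidle2";
  timeToWait?: number;
  waitForSelector?: string;
}

interface QueryParameter {
  name: string;
  value: string;
}

interface Interaction {
  selector: string;
  value?: string;
  waitForNavigation?: boolean;
}

// Node options win; any field left unset on the node falls back to the global value.
function resolveOptions(globalOptions: WaitOptions, nodeOptions: WaitOptions): WaitOptions {
  return {
    timeout: nodeOptions.timeout ?? globalOptions.timeout,
    waitUntil: nodeOptions.waitUntil ?? globalOptions.waitUntil,
    timeToWait: nodeOptions.timeToWait ?? globalOptions.timeToWait,
    waitForSelector: nodeOptions.waitForSelector ?? globalOptions.waitForSelector,
  };
}

// Appends each name/value pair to the URL's query string, in order.
function buildUrl(baseUrl: string, params: QueryParameter[]): string {
  const url = new URL(baseUrl);
  for (const { name, value } of params) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

// Describes the action an interaction would trigger.
// Assumption: an empty string counts as "no value", so the element is clicked.
function planInteraction(interaction: Interaction): string {
  const action = interaction.value
    ? `fill "${interaction.selector}" with "${interaction.value}"`
    : `click "${interaction.selector}"`;
  return interaction.waitForNavigation ? `${action}, then wait for navigation` : action;
}
```

For instance, calling `buildUrl("https://example.com/search", [{ name: "q", value: "n8n" }, { name: "page", value: "2" }])` returns `https://example.com/search?q=n8n&page=2`.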
Output
- JSON output:
  - Contains the results of the specified outputs. For example:
    - If "Page content" is selected, the output will have a property (as named in "Property Name") containing the extracted HTML/text or its JSON representation.
    - If "Screenshot" or "PDF" is selected, the output will include a binary property (as named in "Property Name") containing the image or PDF data.
- Binary output:
  - When screenshot or PDF generation is requested, the corresponding binary data is included in the output under the specified property name. The MIME type is set appropriately (`image/png`, `image/jpeg`, `image/webp`, or `application/pdf`).
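The relationship between the requested output and the MIME type on the binary property can be illustrated with a small mapping. This is a sketch under assumptions: `BinaryOutput`, `mimeTypeFor`, and `qualityApplies` are hypothetical names, and the node's own code may derive these values differently.

```typescript
// Hypothetical description of a binary output request.
type ScreenshotType = "jpeg" | "png" | "webp";
type BinaryOutput =
  | { kind: "screenshot"; type: ScreenshotType }
  | { kind: "pdf" };

// Maps an output request to one of the four MIME types listed above.
function mimeTypeFor(output: BinaryOutput): string {
  return output.kind === "pdf" ? "application/pdf" : `image/${output.type}`;
}

// The Quality setting is documented as applying to JPEG and WebP only.
function qualityApplies(type: ScreenshotType): boolean {
  return type === "jpeg" || type === "webp";
}
```

A downstream node can use the MIME type to decide how to store or attach the file, for example by choosing a file extension.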
Dependencies
- External Services: None required by default.
- API Keys / Credentials:
  - Requires n8n API credentials for internal communication (`n8nApi`).
- Node.js Dependencies:
- n8n Configuration:
  - Ensure the node has access to the necessary environment for running Puppeteer (e.g., proper permissions, headless browser support, and any custom executable paths if needed).
Troubleshooting
Common Issues:
- Browser launch failures:
  - May occur if the system lacks required dependencies for Chromium or if the executable path is incorrect.
- Timeout errors:
  - If the page takes too long to load or a selector is not found, increase the "Timeout" or check the selector's correctness.
- Invalid selectors or missing elements:
  - Double-check CSS selectors used in interactions or output definitions.
- Binary data issues:
  - If binary output is missing or corrupt, ensure the correct property name is used and that the page renders as expected.
Error Messages:
- "Error: <message>"
  - Indicates a failure during browser launch or execution. Review the error message for details (e.g., invalid options, navigation errors).
- "Cannot find module 'puppeteer'"
  - Ensure Puppeteer is installed in the environment where n8n runs.
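When errors are handled in a later workflow step, the guidance above could be turned into a simple lookup. The helper below is a hypothetical sketch: `troubleshootingHint` is not part of the node, and apart from the `Cannot find module 'puppeteer'` message quoted above, the matched substrings are assumptions about what typical error text looks like.

```typescript
// Hypothetical helper: maps an error message to the matching hint from the
// troubleshooting list. Only the "Cannot find module" text comes from this
// document; the other patterns are assumed, not taken from the node's source.
function troubleshootingHint(message: string): string {
  if (message.includes("Cannot find module 'puppeteer'")) {
    return "Install Puppeteer in the environment where n8n runs.";
  }
  if (/failed to launch/i.test(message)) {
    return "Check Chromium's system dependencies and the executable path.";
  }
  if (/timeout/i.test(message)) {
    return 'Increase the "Timeout" option or check the selector.';
  }
  // Fallback mirrors the generic advice for "Error: <message>".
  return "Review the full error message for details.";
}
```

The order of the checks matters only in that the most specific message is tested first; anything unrecognized falls through to the generic advice.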