Puppeteer

Request a webpage using Puppeteer

Overview

This n8n node automates a browser with Puppeteer. It lets you control a headless (or full) browser session, interact with web pages, extract content, take screenshots, and generate PDFs, all within your workflow. Common use cases include:

  • Web scraping and data extraction from dynamic websites.
  • Automated website testing or monitoring.
  • Generating screenshots or PDFs of web pages for reporting.
  • Filling out forms or simulating user interactions on web pages.

Practical examples:

  • Extracting product prices from an e-commerce site.
  • Taking periodic screenshots of a dashboard for archival.
  • Downloading invoices as PDFs after logging into a portal.

Properties

Each property is listed by name, followed by its meaning.

  • Global options: Settings that apply to all Puppeteer nodes in the workflow. They must be set on the first Puppeteer node; changes made on later nodes are ignored. Options:
    - Emulate Device
    - Executable path
    - Extra Headers
    - Launch Arguments
    - Viewport
    - Timeout
    - Wait Until (load, networkidle0, networkidle2)
    - Time to Wait
    - Wait for Selector
    - Page Caching
    - Headless mode
    - Stealth mode
    - Proxy Server
    - Inject HTML
    - Inject CSS
    - Inject JS
  • Node options: Settings that override the global options for this specific node. Options:
    - Timeout
    - Wait Until (load, networkidle0, networkidle2)
    - Time to Wait
    - Wait for Selector
    - Inject HTML
    - Inject CSS
    - Inject JS
  • URL: The target URL to navigate to. Leave empty to stay on the current page. The first Puppeteer node in the workflow must set a URL.
  • Query Parameters: List of query parameters to append to the URL. Each parameter has a name and a value.
  • Interactions: List of actions to perform on the page, such as clicking elements or filling fields. Each interaction specifies:
    - Selector: CSS selector for the element.
    - Value (optional): If provided, fills the field; otherwise, clicks the element.
    - Wait for navigation: If true, waits for the page to load after the action.
  • Output: Specifies what to extract or generate from the page. Multiple outputs can be defined:
    • Page content:
      - Property Name: Key for the extracted content.
      - CSS selector: Extracts content from matching elements.
      - Select All: Return all matches.
      - innerHTML: Use innerHTML instead of outerHTML.
      - HTML to JSON: Convert the extracted HTML to JSON.
      - No attributes: Ignore element attributes when converting to JSON.
    • Screenshot:
      - Property Name: Key for the binary image.
      - CSS selector: Screenshot a specific area.
      - Type: Image format (jpeg, png, webp).
      - Quality: Image quality; applies to JPEG and WebP only.
      - Full Page: Capture the entire page.
    • PDF:
      - Property Name: Key for the binary PDF.
      - Page Ranges, Scale, Prefer CSS Page Size, Format, Height, Width, Landscape, Margin, Display Header/Footer, Header/Footer Template, Transparent Background, Background Graphics.
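
The override, query-parameter, and interaction rules described above can be sketched in TypeScript as follows. This is a minimal illustration, not the node's actual implementation: the interface shapes, field names, and the functions `resolveOptions`, `buildUrl`, and `planInteraction` are all hypothetical, and treating an empty Value as "not provided" is an assumption.

```typescript
// Hypothetical shapes for illustration only; the node's real schema may differ.
interface PageOptions {
  timeout?: number;
  waitUntil?: "load" | "networkidle0" | "networkidle2";
  timeToWait?: number;
  waitForSelector?: string;
}

interface QueryParameter {
  name: string;
  value: string;
}

interface Interaction {
  selector: string;
  value?: string;
  waitForNavigation?: boolean;
}

type PlannedAction =
  | { kind: "fill"; selector: string; value: string; waitForNavigation: boolean }
  | { kind: "click"; selector: string; waitForNavigation: boolean };

// Node options win over global options whenever the node defines a setting.
function resolveOptions(globalOptions: PageOptions, nodeOptions: PageOptions): PageOptions {
  return {
    timeout: nodeOptions.timeout ?? globalOptions.timeout,
    waitUntil: nodeOptions.waitUntil ?? globalOptions.waitUntil,
    timeToWait: nodeOptions.timeToWait ?? globalOptions.timeToWait,
    waitForSelector: nodeOptions.waitForSelector ?? globalOptions.waitForSelector,
  };
}

// Append each name/value pair to the target URL, with standard URL encoding.
function buildUrl(baseUrl: string, params: QueryParameter[]): string {
  const url = new URL(baseUrl);
  for (const { name, value } of params) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

// A value means "fill the field"; no value means "click the element".
// Assumption: an empty string counts as "no value".
function planInteraction(interaction: Interaction): PlannedAction {
  const waitForNavigation = interaction.waitForNavigation ?? false;
  if (interaction.value !== undefined && interaction.value !== "") {
    return { kind: "fill", selector: interaction.selector, value: interaction.value, waitForNavigation };
  }
  return { kind: "click", selector: interaction.selector, waitForNavigation };
}

// Example usage with made-up values.
console.log(resolveOptions({ timeout: 30000, waitUntil: "load" }, { timeout: 5000 }));
console.log(buildUrl("https://example.com/products", [{ name: "page", value: "2" }]));
console.log(planInteraction({ selector: "#login", waitForNavigation: true }));
```

In this sketch a setting left unset on the node falls back to the global value, which mirrors the "override" wording above; how the real node treats unset fields is not stated in this document.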

Output

  • JSON output:
    • Contains the results of each "Page content" output: a property (named by "Property Name") holding the extracted HTML/text, or its JSON representation when "HTML to JSON" is enabled.
  • Binary output:
    • Each "Screenshot" or "PDF" output adds a binary property (named by "Property Name") containing the image or PDF data. The MIME type matches the output: image/png, image/jpeg, image/webp, or application/pdf.
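
The pairing of binary output and MIME type can be sketched as a small lookup. The MIME types come from the list above; the type `BinaryOutputKind`, the function `describeBinaryOutput`, and the returned object shape are hypothetical and do not reflect n8n's internal binary data format.

```typescript
// Hypothetical helper: map an output kind to the MIME type listed above.
type BinaryOutputKind = "png" | "jpeg" | "webp" | "pdf";

const MIME_TYPES: Record<BinaryOutputKind, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
  pdf: "application/pdf",
};

interface BinaryProperty {
  propertyName: string; // the "Property Name" configured on the output
  mimeType: string;
}

function describeBinaryOutput(propertyName: string, kind: BinaryOutputKind): BinaryProperty {
  return { propertyName, mimeType: MIME_TYPES[kind] };
}

// Example usage with made-up property names.
console.log(describeBinaryOutput("dashboardShot", "png"));
console.log(describeBinaryOutput("invoice", "pdf"));
```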

Dependencies

  • External Services: None required by default.
  • API Keys / Credentials:
    • Requires n8n API credentials for internal communication (n8nApi).
  • Node.js Dependencies:
    • puppeteer, which must be installed in the environment where n8n runs.
  • n8n Configuration:
    • Ensure the n8n host can run Puppeteer: proper permissions, headless browser support, and a valid executable path if a custom browser is used.

Troubleshooting

Common Issues:

  • Browser launch failures:
    • May occur if the system lacks required dependencies for Chromium or if the executable path is incorrect.
  • Timeout errors:
    • Occur when the page takes too long to load or a selector is not found. Increase the "Timeout", or verify that the selector matches an element on the page.
  • Invalid selectors or missing elements:
    • Double-check CSS selectors used in interactions or output definitions.
  • Binary data issues:
    • If binary output is missing or corrupt, ensure the correct property name is used and that the page renders as expected.

Error Messages:

  • "Error: <message>"
    • Indicates a failure during browser launch or execution. Review the error message for details (e.g., invalid options, navigation errors).
  • "Cannot find module 'puppeteer'"
    • Ensure Puppeteer is installed in the environment where n8n runs.
