Puppeteer

Automate browser interactions using Puppeteer

Overview

The Get Page Content operation of the Puppeteer node retrieves the full HTML content of a web page. It automates browser actions using Puppeteer, allowing you to fetch dynamic or static web pages as they would appear in a real browser. This is particularly useful for scraping data from websites that require JavaScript rendering, testing web page output, or archiving web content.

Common scenarios:

- Scraping product details from e-commerce sites that use client-side rendering.
- Capturing the state of a web page after user interactions or authentication.
- Monitoring changes on dynamic web pages.

Example:
You can use this node to fetch the rendered HTML of a news article page, including content loaded via JavaScript.

Properties

Below are the supported input properties for the Get Page Content operation:

Display Name	Type	Meaning
URL	String (required)	The web address of the page to retrieve.
Query Parameters	Collection	List of key-value pairs to append as query parameters to the URL.
Options	Collection	Advanced settings for browser behavior and request customization.
├─ Batch Size	Number	Maximum number of pages to open simultaneously. Higher values use more memory/CPU.
├─ Browser WebSocket Endpoint	String	Connects to an existing browser instance via WebSocket instead of launching a new one.
├─ Emulate Device	Options	Emulates a specific device (e.g., mobile, tablet) for the browser session.
├─ Executable path	String	Path to the browser executable. Ignored if WebSocket endpoint is set.
├─ Extra Headers	Collection	Additional HTTP headers to send with the request.
├─ File Name	String	Not used in this operation. (Relevant for PDF/Screenshot only.)
├─ Launch Arguments	Collection	Additional command-line arguments for the browser process.
├─ Timeout	Number	Maximum navigation time in milliseconds (default: 30000).
├─ Protocol Timeout	Number	Maximum time in milliseconds to wait for a protocol response (default: 30000).
├─ Wait Until	Options	Event that marks navigation as complete: load, domcontentloaded, networkidle0, or networkidle2.
├─ Page Caching	Boolean	Enable/disable page-level caching (default: true).
├─ Headless mode	Boolean	Run browser in headless mode (default: true).
├─ Use Chrome Headless Shell	Boolean	Use the chrome-headless-shell binary (requires headless mode and chrome-headless-shell in $PATH).
├─ Stealth mode	Boolean	Makes detection of automation harder (anti-bot evasion).
├─ Human typing mode	Boolean	Simulates human-like typing in input fields.
├─ Human Typing Options	Collection	Fine-tune delays and typo simulation for human typing mode.
├─ Proxy Server	String	Use a proxy server for outgoing requests.
└─ Add Container Arguments	Boolean	Adds recommended flags for container environments (default: true).
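
The Query Parameters property appends key-value pairs to the URL. The sketch below illustrates that idea with the standard URL API; it is not the node's actual implementation. The helper buildUrl and the QueryParameter shape are hypothetical names introduced for illustration only.

```typescript
// Hypothetical shape for one key-value pair; the node's internal
// representation may differ.
interface QueryParameter {
  name: string;
  value: string;
}

// Hypothetical helper: appends each pair to the base URL's query string.
// new URL() throws a TypeError when baseUrl is malformed or lacks a protocol.
function buildUrl(baseUrl: string, params: QueryParameter[]): string {
  const url = new URL(baseUrl);
  for (const { name, value } of params) {
    // append() percent-encodes as needed and keeps existing parameters.
    url.searchParams.append(name, value);
  }
  return url.toString();
}

const target = buildUrl("https://example.com/search", [
  { name: "q", value: "puppeteer node" },
  { name: "page", value: "2" },
]);
console.log(target); // https://example.com/search?q=puppeteer+node&page=2
```

Because new URL() rejects strings without a protocol, the same call doubles as a quick validity check before a URL is passed to the node.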

Output

The node outputs an array of items, each containing the following structure in the json field:

{
  "body": "<string>",         // The full HTML content of the fetched page.
  "headers": { ... },         // HTTP response headers returned by the server.
  "statusCode": <number>,     // HTTP status code of the response.
  "url": "<string>"           // The final URL after any redirects.
}

If an error occurs, the output will contain an error field with the error message.

Note: This operation does not output binary data.
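
For downstream processing, for example in a Code node, the structure above can be modelled with types. This is a minimal sketch under assumptions: the type names and the summarize helper are hypothetical, and it assumes a failed item carries only the error field, which the source does not confirm.

```typescript
// Models the json field of a successful item as documented above.
interface PageContentResult {
  body: string;
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// Assumption: a failed item exposes the message in an "error" field.
interface PageContentError {
  error: string;
}

type PageContentItem = PageContentResult | PageContentError;

// Type guard that separates failed items from successful ones.
function isError(item: PageContentItem): item is PageContentError {
  return "error" in item;
}

// Hypothetical helper: produces one summary line per item.
function summarize(items: PageContentItem[]): string[] {
  return items.map((item) =>
    isError(item)
      ? `failed: ${item.error}`
      : `${item.statusCode} ${item.url} (${item.body.length} chars)`
  );
}
```

Checking for the error field first keeps later steps from reading body or statusCode on an item that does not have them.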

Dependencies

External Services: None required for basic usage.
API Keys: Not required.
n8n Configuration:
- For advanced options, you may need:
  - A compatible version of Puppeteer and its plugins.
  - Access to a browser executable (Chrome/Chromium) if not connecting via WebSocket.
  - Proper environment variables if running in a containerized environment (for example, to ensure Chrome runs correctly).

Troubleshooting

Common Issues:

Invalid URL:
- Error: "Invalid URL: <your-url>"
- Cause: The provided URL is malformed or missing.
- Solution: Ensure the URL is complete and valid (including protocol, e.g., https://).
Navigation Timeout:
- Error: "Navigation timeout of <timeout> ms exceeded"
- Cause: The page took too long to load.
- Solution: Increase the "Timeout" option, check your network connection, or choose a less strict "Wait Until" event (for example, load instead of networkidle0).
Request failed with status code X:
- Error: "Request failed with status code <number>"
- Cause: The server responded with an error (e.g., 404, 500).
- Solution: Check the target URL and server availability.
Failed to launch/connect to browser:
- Error: "Failed to launch/connect to browser: <details>"
- Cause: Missing browser executable, incompatible environment, or misconfigured options.
- Solution: Verify Puppeteer dependencies, browser path, and environment setup.
Resource Limits:
- Cause: A high "Batch Size" opens many pages at once, which can exhaust memory or CPU.
- Solution: Lower the "Batch Size" or increase available memory/CPU.
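
To see why a lower batch size reduces resource pressure, the sketch below processes inputs in fixed-size groups so that at most batchSize tasks run at once. It illustrates the general batching technique only; the node's internal scheduling may differ, and inBatches is a hypothetical helper, not part of the node or of Puppeteer.

```typescript
// Hypothetical helper: runs `worker` over `inputs`, at most `batchSize`
// at a time, and returns the results in input order.
async function inBatches<T, R>(
  inputs: T[],
  batchSize: number,
  worker: (input: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < inputs.length; start += batchSize) {
    const batch = inputs.slice(start, start + batchSize);
    // Every task in this batch starts now; the next batch waits
    // until all of them have settled successfully.
    const batchResults = await Promise.all(batch.map(worker));
    results.push(...batchResults);
  }
  return results;
}
```

With five inputs and a batch size of 2, the worker runs in groups of 2, 2, and 1, so peak concurrency never exceeds 2. In a real workflow the worker would be the page fetch, and each concurrent call would hold one open browser page.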