Overview
The Get Page Content operation of the Puppeteer node retrieves the full HTML content of a web page. It automates browser actions using Puppeteer, allowing you to fetch dynamic or static web pages as they would appear in a real browser. This is particularly useful for scraping data from websites that require JavaScript rendering, testing web page output, or archiving web content.
Common scenarios:
- Scraping product details from e-commerce sites that use client-side rendering.
- Capturing the state of a web page after user interactions or authentication.
- Monitoring changes on dynamic web pages.
Example:
You can use this node to fetch the rendered HTML of a news article page, including content loaded via JavaScript.
Properties
Below are the supported input properties for the Get Page Content operation:
| Display Name | Type | Meaning |
|---|---|---|
| URL | String (required) | The web address of the page to retrieve. |
| Query Parameters | Collection | List of key-value pairs to append as query parameters to the URL. |
| Options | Collection | Advanced settings for browser behavior and request customization. |
| ├─ Batch Size | Number | Maximum number of pages to open simultaneously. Higher values use more memory/CPU. |
| ├─ Browser WebSocket Endpoint | String | Connects to an existing browser instance via WebSocket instead of launching a new one. |
| ├─ Emulate Device | Options | Emulates a specific device (e.g., mobile, tablet) for the browser session. |
| ├─ Executable path | String | Path to the browser executable. Ignored if WebSocket endpoint is set. |
| ├─ Extra Headers | Collection | Additional HTTP headers to send with the request. |
| ├─ File Name | String | Not used in this operation. (Relevant for PDF/Screenshot only.) |
| ├─ Launch Arguments | Collection | Additional command-line arguments for the browser process. |
| ├─ Timeout | Number | Maximum navigation time in milliseconds (default: 30000). |
| ├─ Protocol Timeout | Number | Max time to wait for protocol responses (default: 30000 ms). |
| ├─ Wait Until | Options | When to consider navigation successful (e.g., load, domcontentloaded, networkidle0, networkidle2). |
| ├─ Page Caching | Boolean | Enable/disable page-level caching (default: true). |
| ├─ Headless mode | Boolean | Run browser in headless mode (default: true). |
| ├─ Use Chrome Headless Shell | Boolean | Use chrome-headless-shell binary (requires headless mode and shell in $PATH). |
| ├─ Stealth mode | Boolean | Makes detection of automation harder (anti-bot evasion). |
| ├─ Human typing mode | Boolean | Simulates human-like typing in input fields. |
| ├─ Human Typing Options | Collection | Fine-tune delays and typo simulation for human typing mode. |
| ├─ Proxy Server | String | Use a proxy server for outgoing requests. |
| └─ Add Container Arguments | Boolean | Adds recommended flags for container environments (default: true). |
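To illustrate how the URL and Query Parameters properties combine, here is a minimal sketch using the standard URL API. The `buildUrl` helper and the `QueryParameter` shape are hypothetical names chosen for this example; the node's internal implementation may differ.

```typescript
// Hypothetical shape for one entry of the Query Parameters collection.
interface QueryParameter {
  name: string;
  value: string;
}

// Hypothetical helper: appends key-value pairs to a base URL.
function buildUrl(baseUrl: string, params: QueryParameter[]): string {
  // new URL() throws a TypeError for malformed input, such as a missing protocol.
  const url = new URL(baseUrl);
  for (const { name, value } of params) {
    // append() percent-encodes the value and keeps existing parameters.
    url.searchParams.append(name, value);
  }
  return url.toString();
}

const target = buildUrl("https://example.com/search", [
  { name: "q", value: "n8n puppeteer" },
  { name: "page", value: "2" },
]);
console.log(target); // https://example.com/search?q=n8n+puppeteer&page=2
```

Because `new URL()` rejects input without a protocol, a helper like this also surfaces malformed URLs before any browser is launched.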
Output
The node outputs an array of items, each containing the following structure in the json field:
{
"body": "<string>", // The full HTML content of the fetched page.
"headers": { ... }, // HTTP response headers returned by the server.
"statusCode": <number>, // HTTP status code of the response.
"url": "<string>" // The final URL after any redirects.
}
- If an error occurs, the output will contain an error field with the error message.
Note: This operation does not output binary data.
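When consuming this output in later code, it can help to distinguish successful items from failed ones. The sketch below models the documented fields as TypeScript types. The type names, the `Record<string, string>` header type, and the assumption that a failed item carries only an error field are illustrative, not taken from the node's source.

```typescript
// Successful item, following the output structure described above.
interface PageContent {
  body: string;
  headers: Record<string, string>; // assumed: header values are plain strings
  statusCode: number;
  url: string;
}

// Assumed shape of a failed item: only an error message.
interface PageError {
  error: string;
}

type OutputJson = PageContent | PageError;

// Type guard that narrows an item to the error shape.
function isPageError(json: OutputJson): json is PageError {
  return "error" in json;
}

// Produces one human-readable line per item.
function summarize(items: OutputJson[]): string[] {
  return items.map((json) =>
    isPageError(json)
      ? `failed: ${json.error}`
      : `${json.statusCode} ${json.url} (${json.body.length} chars)`
  );
}

const sample: OutputJson[] = [
  {
    body: "<html></html>",
    headers: { "content-type": "text/html" },
    statusCode: 200,
    url: "https://example.com/",
  },
  { error: "Navigation timeout of 30000 ms exceeded" },
];
console.log(summarize(sample));
```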
Dependencies
- External Services: None required for basic usage.
- API Keys: Not required.
- n8n Configuration:
  - For advanced options, you may need:
    - A compatible version of Puppeteer and its plugins.
    - Access to a browser executable (Chrome/Chromium) if not connecting via WebSocket.
    - Proper environment variables if running in a containerized environment (for example, to ensure Chrome runs correctly).
Troubleshooting
Common Issues:
Invalid URL:
- Error: "Invalid URL: <your-url>"
- Cause: The provided URL is malformed or missing.
- Solution: Ensure the URL is complete and valid, including the protocol (e.g., https://).
Navigation Timeout:
- Error: "Navigation timeout of <timeout> ms exceeded"
- Cause: The page took too long to load.
- Solution: Increase the "Timeout" property or check your network connection.
Request failed with status code X:
- Error: "Request failed with status code <number>"
- Cause: The server responded with an error (e.g., 404, 500).
- Solution: Check the target URL and server availability.
Failed to launch/connect to browser:
- Error: "Failed to launch/connect to browser: <details>"
- Cause: Missing browser executable, incompatible environment, or misconfigured options.
- Solution: Verify Puppeteer dependencies, browser path, and environment setup.
Resource Limits:
- High batch sizes or multiple simultaneous pages may exhaust system resources.
- Solution: Lower the "Batch Size" or increase available memory/CPU.
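To see why a lower Batch Size reduces resource pressure, consider how a list of URLs might be split into groups so that only one group of pages is open at a time. This is a rough sketch of the general batching idea, not the node's actual code; `chunk` is a hypothetical helper and the URLs are placeholders.

```typescript
// Hypothetical helper: splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  // Reject zero, negative, fractional, and NaN sizes.
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const urls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
  "https://example.com/d",
  "https://example.com/e",
];

// With a batch size of 2, at most two pages would be open at the same time.
const batches = chunk(urls, 2);
console.log(batches.map((batch) => batch.length)); // [ 2, 2, 1 ]
```

Halving the batch size roughly halves the number of pages held in memory at once, at the cost of more sequential rounds.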