Overview
The Get Page Content operation of this custom n8n node uses Puppeteer (with optional stealth mode) to fetch and render the HTML content of a web page. It lets you specify advanced browser options, emulate devices, set headers, use proxies, and control navigation timing. This is particularly useful for scraping dynamic websites that require JavaScript execution or that use bot detection.
Common scenarios:
- Scraping content from pages that require JavaScript rendering.
- Extracting data from sites protected by anti-bot measures.
- Automating website testing or monitoring changes in rendered HTML.
Practical examples:
- Fetching product details from an e-commerce site that loads data dynamically.
- Capturing the fully rendered HTML of a news article for further processing.
- Integrating with APIs that require browser-based authentication flows.
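The flow described above (navigate, wait for the page to render, read back the HTML) can be sketched roughly as follows. This is a minimal illustration and not the node's actual source: `PageLike` and `ResponseLike` are hypothetical stand-ins that mirror only the Puppeteer methods used here (`goto` and `content` on a page, `status` and `headers` on a response), `getPageContent` is a name made up for the example, and the 30000 ms default is Puppeteer's own navigation default rather than a confirmed setting of this node.

```typescript
// Hypothetical minimal interfaces mirroring the parts of Puppeteer's
// page and response objects that this sketch relies on.
interface ResponseLike {
  status(): number;
  headers(): Record<string, string>;
}

interface PageLike {
  goto(
    url: string,
    options: { waitUntil: string; timeout: number },
  ): Promise<ResponseLike | null>;
  content(): Promise<string>;
}

interface PageContent {
  body: string;
  headers: Record<string, string>;
  statusCode: number;
}

// Navigate to the URL, wait for the chosen event, then read the rendered HTML.
async function getPageContent(
  page: PageLike,
  url: string,
  waitUntil: string = "load",
  timeoutMs: number = 30000, // assumption: Puppeteer's default navigation timeout
): Promise<PageContent> {
  const response = await page.goto(url, { waitUntil, timeout: timeoutMs });
  if (response === null) {
    // goto() can resolve with null, for example when navigating to about:blank.
    throw new Error(`No response received for ${url}`);
  }
  const body = await page.content(); // serialized DOM after scripts have run
  return { body, headers: response.headers(), statusCode: response.status() };
}
```

In a real workflow the `page` argument would be a Puppeteer page object; a small fake object with the same two methods is enough to exercise the logic in isolation.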
Properties
| Name | Type | Meaning |
|---|---|---|
| URL | string | The target web page address to fetch. |
| Query Parameters | fixedCollection | List of query parameters (name/value pairs) to append to the URL. |
| Options | collection | Advanced settings for browser behavior (see below). |
| └ Emulate Device | options | Emulates a specific device (e.g., iPhone, Pixel) for the browser session. |
| └ Executable path | string | Path to a custom Chromium/Chrome executable for Puppeteer to use. |
| └ Extra Headers | fixedCollection | Additional HTTP headers (name/value pairs) to send with the request. |
| └ File Name | string | If binary output is generated, sets the file name (not used in Get Page Content). |
| └ Launch Arguments | fixedCollection | Additional command-line arguments for launching the browser instance. |
| └ Timeout | number | Maximum navigation time in milliseconds (default: 30 seconds; 0 disables the timeout). |
| └ Wait Until | options | Event to consider navigation complete: load, DOMContentLoaded, networkidle0, or networkidle2. |
| └ Page Caching | boolean | Enables/disables browser cache during navigation (default: enabled). |
| └ Headless mode | boolean | Runs browser in headless mode (no UI, default: true). |
| └ Stealth mode | boolean | Applies anti-detection techniques to make automation harder to detect (default: false). |
| └ Proxy Server | string | Proxy server configuration (e.g., localhost:8080, socks5://localhost:1080). |
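As a rough illustration of how the name/value properties in the table might be turned into values a browser session can use, here is a minimal sketch. The `NameValue` and `NodeOptions` shapes and all three helper names are hypothetical, and passing the proxy through Chromium's `--proxy-server` flag is an assumption about how the node works internally, not something the table confirms.

```typescript
// Hypothetical shapes for illustration; the node's internal types may differ.
interface NameValue {
  name: string;
  value: string;
}

interface NodeOptions {
  launchArguments?: string[];
  proxyServer?: string;
}

// Append name/value pairs to the URL's query string, keeping existing parameters.
function buildUrl(baseUrl: string, params: NameValue[] = []): string {
  const url = new URL(baseUrl);
  for (const { name, value } of params) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

// Collapse name/value pairs into a plain header object, the form that
// Puppeteer's page.setExtraHTTPHeaders() accepts.
function buildHeaders(headers: NameValue[] = []): Record<string, string> {
  const result: Record<string, string> = {};
  for (const { name, value } of headers) {
    result[name] = value;
  }
  return result;
}

// Combine user-supplied launch arguments with Chromium's --proxy-server flag
// (assumption: the node forwards the Proxy Server option this way).
function buildLaunchArgs(options: NodeOptions): string[] {
  const args = [...(options.launchArguments ?? [])];
  if (options.proxyServer) {
    args.push(`--proxy-server=${options.proxyServer}`);
  }
  return args;
}
```

For example, `buildUrl("https://example.com/search?q=a", [{ name: "page", value: "2" }])` yields `https://example.com/search?q=a&page=2`, so parameters already present in the URL are preserved.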
Output
The node outputs a single item per input, with the following JSON structure:
{
  "body": "<string>",      // The full HTML content of the fetched page
  "headers": { ... },      // Object containing response headers
  "statusCode": <number>   // HTTP status code returned by the server
}
- No binary data is produced by this operation.
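A later step in the workflow could consume this item along the following lines. This is only a sketch: `PageContentItem`, `isSuccess`, and `extractTitle` are illustrative names, and typing the header values as strings is an assumption, since the output description only says that `headers` is an object.

```typescript
// Assumed typing of the output item described above.
interface PageContentItem {
  body: string;
  headers: Record<string, string>;
  statusCode: number;
}

// True for any 2xx status code.
function isSuccess(item: PageContentItem): boolean {
  return item.statusCode >= 200 && item.statusCode < 300;
}

// Return the trimmed text of the first <title> element, or null if there is none.
function extractTitle(item: PageContentItem): string | null {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(item.body);
  return match ? match[1].trim() : null;
}
```

A regular expression is enough for a single well-known tag such as `<title>`; for anything more structured, a proper HTML parser is the safer choice.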
Dependencies
- External Services: None required by default, but the node will access external web pages as specified by the URL property.
- API Keys: Not required unless accessing protected resources.
- Node.js Packages:
  - puppeteer
  - puppeteer-extra
  - puppeteer-extra-plugin-stealth
- n8n Configuration:
  - Ensure the environment running n8n is able to install and run Puppeteer and its dependencies.
  - For custom Chrome/Chromium executables, ensure the path is accessible and compatible.
Troubleshooting
Common issues:
- Timeouts: If the page takes too long to load, increase the "Timeout" option or set it to 0 to disable.
- Blocked by anti-bot: Enable "Stealth mode" to reduce detection risk.
- Invalid URL: Ensure the "URL" property is a valid, reachable address.
- Proxy errors: Double-check proxy format and credentials if using "Proxy Server".
- Missing browser executable: If specifying a custom "Executable path", verify the path is correct and points to a supported browser.
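Two of the issues above (invalid URL, malformed proxy setting) can be caught before a browser is ever launched. The helpers below are a minimal sketch with made-up names; the proxy check only recognizes the `host:port` and `scheme://host:port` forms shown in the Proxy Server examples and deliberately does not handle embedded credentials or IPv6 literals.

```typescript
// True when the value parses as an absolute http(s) URL.
function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // new URL() throws on strings it cannot parse
  }
}

// Accepts "host:port" or "scheme://host:port".
function isProxyServer(value: string): boolean {
  return /^(?:[a-z][a-z0-9+.-]*:\/\/)?[^\s:\/]+:\d{1,5}$/i.test(value);
}
```

Running these checks on the URL and Proxy Server values first gives a clearer failure than a navigation error raised later by the browser.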
Error messages:
- Request failed with status code <code>: The target server responded with a non-200 status. Check the URL, headers, and any required authentication.
- Navigation timeout: The page did not finish loading within the specified timeout. Increase the timeout or check network conditions.
- Cannot find module 'puppeteer': Ensure all required npm packages are installed in your n8n environment.
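If these messages need to be handled programmatically, for instance in an error branch of a workflow, a small lookup along the following lines could map them to the advice given above. `hintForError` is a hypothetical helper, and the patterns match only the three messages documented here; the exact wording produced at runtime may differ.

```typescript
// Map an error message to a troubleshooting hint, based on the three
// messages documented above.
function hintForError(message: string): string {
  const status = /Request failed with status code (\d+)/.exec(message);
  if (status) {
    return `Server responded with ${status[1]}: check the URL, headers, and authentication.`;
  }
  if (/Navigation timeout/i.test(message)) {
    return "Increase the Timeout option or check network conditions.";
  }
  if (/Cannot find module 'puppeteer/.test(message)) {
    return "Install the required npm packages in the n8n environment.";
  }
  return "Unrecognized error: inspect the full message.";
}
```

The last pattern intentionally omits the closing quote so that it also matches `puppeteer-extra` and `puppeteer-extra-plugin-stealth`.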