Overview
This node uses Puppeteer, a browser automation library, to interact with web pages programmatically. The "Get Page Content" operation loads a specified URL and returns the full HTML content of the page along with the HTTP response headers, the status code, and the final URL after any redirects.
Common scenarios where this node is beneficial include:
- Scraping raw HTML content for further parsing or data extraction.
- Monitoring website changes by fetching page content regularly.
- Testing or validating web page responses in workflows.
- Integrating dynamic web content into automation pipelines.
For example, you can use this node to fetch the HTML of a product page on an e-commerce site, then parse it in subsequent nodes to extract pricing or availability information.
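As a rough illustration of that downstream parsing step, the sketch below pulls a price out of the returned `body` field. The `price` class name, the dollar-sign format, and the `extractPrice` helper are all hypothetical, and a regular expression is used only to keep the example short; a real workflow would more likely use an HTML parser.

```typescript
// Minimal item shape, modelled on the fields this operation returns.
interface PageContent {
  body: string;
  statusCode: number;
}

// Extract a price from markup such as <span class="price">$19.99</span>.
// The "price" class and the currency format are assumptions for illustration.
function extractPrice(page: PageContent): number | null {
  if (page.statusCode !== 200) return null; // skip error pages
  const match = page.body.match(
    /<span[^>]*class="price"[^>]*>\s*\$?([\d.,]+)\s*<\/span>/i,
  );
  if (!match) return null;
  // Drop thousands separators before converting to a number.
  const value = Number(match[1].replace(/,/g, ""));
  return Number.isNaN(value) ? null : value;
}

const sample: PageContent = {
  statusCode: 200,
  body: '<html><body><span class="price">$1,299.50</span></body></html>',
};
console.log(extractPrice(sample)); // 1299.5
```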
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to load and retrieve content from. |
| Query Parameters | Optional key-value pairs appended as query parameters to the URL before loading the page. |
| Options | A collection of advanced settings controlling browser behavior and page loading: |
| - Batch Size | Maximum number of pages to open simultaneously (affects memory and CPU usage). |
| - Browser WebSocket Endpoint | WebSocket URL to connect to an existing browser instance instead of launching a new one. |
| - Browser WebSocket Headers | Headers sent when connecting to the browser WebSocket endpoint. |
| - Emulate Device | Select a device profile to emulate (e.g., mobile devices with specific screen sizes and user agents). |
| - Executable Path | Path to a custom browser executable to use instead of the bundled one. |
| - Extra Headers | Additional HTTP headers to send with page requests. |
| - File Name | Filename to assign to binary outputs (not applicable for Get Page Content but used in other operations). |
| - Launch Arguments | Extra command line arguments passed to the browser on launch. |
| - Timeout | Maximum navigation time in milliseconds before the page load is aborted. Set to 0 to disable the timeout. |
| - Protocol Timeout | Maximum time in milliseconds to wait for a browser protocol response. Set to 0 to disable the timeout. |
| - Wait Until | Event that determines when navigation is considered finished: load, domcontentloaded, networkidle0, or networkidle2. |
| - Page Caching | Enable or disable page-level caching (default enabled). |
| - Headless Mode | Run browser without UI (default true). |
| - Use Chrome Headless Shell | Run browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH). |
| - Stealth Mode | Apply techniques to make headless browser detection harder. |
| - Human Typing Mode | Enables .typeHuman() function to simulate human-like typing. |
| - Human Typing Options | Settings controlling delays and typo chances for human typing simulation. |
| - Proxy Server | Custom proxy server configuration (e.g., localhost:8080 or socks5://localhost:1080). |
| - Add Container Arguments | Automatically add recommended arguments for running inside container environments (--no-sandbox, etc.). |
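To make the Query Parameters property concrete, here is a minimal sketch of how key-value pairs can be appended to a URL with the standard `URL` API. The `buildUrl` helper is hypothetical and only illustrates the idea; the node's actual implementation may differ, for example in how it encodes values or handles duplicate keys.

```typescript
// Append key-value pairs to a base URL, keeping any query string already present.
// buildUrl is an illustrative helper, not part of the node.
function buildUrl(
  baseUrl: string,
  queryParameters: Record<string, string>,
): string {
  const url = new URL(baseUrl); // throws a TypeError on malformed input
  for (const [name, value] of Object.entries(queryParameters)) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

console.log(
  buildUrl("https://example.com/search?lang=en", { q: "usb cable", page: "2" }),
);
// https://example.com/search?lang=en&q=usb+cable&page=2
```

Note that `URLSearchParams` encodes spaces as `+`, which is why `usb cable` becomes `usb+cable` in the result.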
Output
The output contains JSON data with the following structure:
```json
{
  "body": "<html>...</html>",        // Full HTML content of the loaded page
  "headers": {                       // HTTP response headers received
    "content-type": "text/html; charset=UTF-8",
    ...
  },
  "statusCode": 200,                 // HTTP status code of the response
  "url": "https://example.com"       // Final URL after any redirects
}
```
- The output is paired with the input item it corresponds to.
- No binary data is produced by this operation.
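A later step in the workflow can check an output item before parsing it. The sketch below types the structure shown above and tests whether the response looks like usable HTML. The `PageContentOutput` interface and `isUsableHtml` helper are hypothetical names, and the code assumes header names arrive in lowercase, as in the example output.

```typescript
// Shape of one output item, following the structure documented above.
interface PageContentOutput {
  body: string;
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// True when the response succeeded, declares an HTML content type,
// and carries a non-empty body. Assumes lowercase header names.
function isUsableHtml(item: PageContentOutput): boolean {
  const contentType = item.headers["content-type"] ?? "";
  const succeeded = item.statusCode >= 200 && item.statusCode < 300;
  return (
    succeeded &&
    contentType.toLowerCase().startsWith("text/html") &&
    item.body.trim().length > 0
  );
}

const item: PageContentOutput = {
  body: "<html><body>Hello</body></html>",
  headers: { "content-type": "text/html; charset=UTF-8" },
  statusCode: 200,
  url: "https://example.com",
};
console.log(isUsableHtml(item)); // true
```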
Dependencies
- Requires Puppeteer and puppeteer-extra libraries with stealth and human typing plugins.
- Optionally connects to an existing browser via WebSocket if configured.
- If using WebSocket connection with authentication, requires an API key credential or bearer token credential configured in n8n.
- For emulating devices, relies on Puppeteer's known device descriptors.
- Running in containerized environments may require enabling container-specific launch arguments.
- If using headless shell mode, chrome-headless-shell must be installed and available in the system PATH.
Troubleshooting
- Timeout errors: If navigation takes longer than the configured timeout, increase the "Timeout" option or set it to 0 to disable the timeout entirely.
- Invalid URL error: Ensure the URL provided is valid and properly formatted.
- Browser launch failures: Check that the executable path is correct or that the environment supports launching Chromium. In containers, ensure sandboxing flags are set correctly.
- WebSocket connection issues: Verify the WebSocket endpoint URL and authentication headers if connecting to an existing browser.
- Page content empty or incomplete: Adjust the "Wait Until" option to wait for appropriate page load events.
- Stealth mode not working: Some sites may still detect headless browsers despite stealth mode; consider disabling or adjusting stealth settings.
- Human typing simulation slow: Adjust human typing delay options to balance realism and speed.
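Invalid URL errors can often be caught before the node runs, for example in a preceding code step. The sketch below is one possible pre-check using the standard `URL` constructor. The `validatePageUrl` helper is hypothetical, and limiting input to http and https is an assumption suited to typical scraping, not a documented restriction of this node.

```typescript
// Returns null when the input looks like a loadable web address,
// otherwise a short description of the problem.
// validatePageUrl is an illustrative helper, not part of the node.
function validatePageUrl(input: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(input.trim());
  } catch {
    // new URL() throws when the string is not an absolute URL.
    return "not a parseable absolute URL";
  }
  // Assumption: only http(s) pages are of interest here.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return `unsupported protocol ${parsed.protocol}`;
  }
  return null;
}

console.log(validatePageUrl("https://example.com/products")); // null
console.log(validatePageUrl("example.com")); // not a parseable absolute URL
console.log(validatePageUrl("ftp://example.com")); // unsupported protocol ftp:
```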