Overview
This node uses the Playwright library to interact with web pages programmatically. It supports operations such as retrieving the full HTML content of a page ("Get Page Content"), taking screenshots, generating PDFs, and running custom scripts within the browser context.
The "Get Page Content" operation loads the specified URL in a browser and returns the page's full HTML together with the HTTP response headers and status code. This is useful for web scraping, monitoring website changes, or extracting data from dynamic web pages that require a real browser environment.
Practical examples:
- Extracting product details from an e-commerce site.
- Monitoring news headlines by fetching updated page content.
- Collecting metadata or SEO information from webpages.
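
As an illustration of the metadata use case, the sketch below pulls the page title and named meta tags out of an HTML string such as the one returned in the body field. It is a minimal example built on Python's standard html.parser module; the names MetadataParser and extract_metadata are hypothetical and not part of the node.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") and attr_map.get("content") is not None:
            # Meta names are stored lowercased so lookups are case-insensitive.
            self.meta[attr_map["name"].lower()] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(body):
    """Return the title and named meta tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(body)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}
```

In a workflow, a function like this would typically run in a later step that receives the node's output and reads the HTML from each item's body field.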
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to retrieve content from. Must be a valid URL string. |
| Query Parameters | Optional key-value pairs appended to the URL as query parameters. Useful for customizing requests or passing filters. |
| Options | A collection of advanced settings: |
| - Batch Size | Maximum number of pages to open simultaneously. Higher values increase resource usage (CPU, memory). |
| - Browser WebSocket Endpoint | WebSocket URL to connect to an existing browser instance instead of launching a new one. |
| - Emulate Device | Select a device profile to emulate (viewport size, user agent). Examples include various mobile phones and tablets. |
| - Executable Path | File system path to a specific browser executable to use. Ignored if connecting via WebSocket. |
| - Extra Headers | Additional HTTP headers to send with the request, e.g., custom User-Agent or authentication tokens. |
| - File Name | Filename to assign to binary outputs (screenshots or PDFs). Not applicable for "Get Page Content". |
| - Launch Arguments | Extra command line arguments passed to the browser process on launch, e.g., disabling sandboxing. |
| - Timeout | Maximum navigation time in milliseconds before aborting. Set to 0 to disable the timeout. |
| - Protocol Timeout | Maximum wait time for protocol responses in milliseconds. Set to 0 to disable the timeout. |
| - Wait Until | Defines when navigation is considered finished: load (page load event), domcontentloaded (DOM ready), or networkidle (no network connections for at least 500 ms). |
| - Page Caching | Enable or disable caching at the page level. Defaults to enabled. |
| - Headless mode | Run the browser without a visible UI. Defaults to true (headless). |
| - Proxy Server | Custom proxy server address to route browser traffic through, e.g., localhost:8080 or socks5://localhost:1080. |
| - Add Container Arguments | Automatically add recommended browser launch arguments for container environments (e.g., --no-sandbox). Defaults to true. |
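
To make the Query Parameters property concrete, here is a rough sketch of how key-value pairs can be appended to a URL. It uses only Python's urllib.parse. The function name build_url is hypothetical, and the choice to keep parameters already present in the URL is an assumption; the node's actual merging behavior is not documented here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_url(url, query_parameters):
    """Append key-value pairs to a URL as query parameters."""
    parts = urlsplit(url)
    # Assumption: parameters already in the URL are kept and new ones follow.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(query_parameters.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


final_url = build_url(
    "https://example.com/search?page=1",
    {"q": "red shoes", "sort": "price"},
)
# final_url == "https://example.com/search?page=1&q=red+shoes&sort=price"
```

Values are percent-encoded by urlencode, so spaces and reserved characters in a filter value do not break the request URL.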
Output
The output is an array of items, each corresponding to an input item processed.
For the "Get Page Content" operation, each output item contains:
- json:
  - body: The full HTML content of the loaded page as a string.
  - headers: An object containing HTTP response headers.
  - statusCode: The HTTP status code returned by the server.
  - url: The final URL after any redirects.
No binary data is produced for this operation.
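
The following sketch shows one way a downstream step might inspect such an output item. The example_item values are made up for illustration, and summarize is a hypothetical helper. Header names are matched case-insensitively because their exact casing in the node's output is an assumption.

```python
# Illustrative item shaped like the output described above (values are invented).
example_item = {
    "json": {
        "body": "<html>...</html>",
        "headers": {"content-type": "text/html; charset=utf-8"},
        "statusCode": 200,
        "url": "https://example.com/",
    }
}


def summarize(item, requested_url):
    """Derive a few quick facts from one output item."""
    page = item["json"]
    content_type = next(
        (value for key, value in page["headers"].items() if key.lower() == "content-type"),
        "",
    )
    return {
        "ok": page["statusCode"] < 400,
        # The final URL differs from the requested one after a redirect.
        "redirected": page["url"] != requested_url,
        "content_type": content_type,
        "body_length": len(page["body"]),
    }


summary = summarize(example_item, "http://example.com")
```

Comparing url against the requested address is a simple way to detect redirects, for example from http to https.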
Dependencies
- Playwright: The node relies on the Playwright library to launch or connect to Chromium-based browsers for page rendering and interaction.
- Browser Executable: Requires access to a Chromium browser executable either bundled with Playwright or specified via options.
- Optional External Browser: Can connect to an existing browser instance via a WebSocket endpoint.
- n8n Environment Variables: Supports configuration via environment variables for executable paths and allowed modules.
Troubleshooting
- Invalid URL Error: If the provided URL is malformed or invalid, the node will throw an error indicating "Invalid URL". Ensure the URL is correctly formatted.
- Navigation Timeout: If the page takes longer than the specified timeout to load, a timeout error may occur. Increase the timeout or set it to 0 to disable.
- Failed to Launch Browser: Errors during browser launch often relate to missing executables or incompatible launch arguments. Verify the executable path and arguments.
- Request Failed with Status Code: If the server returns an HTTP error (status code >= 400), the node reports this. Check the URL and query parameters.
- Resource Limits: Setting a very high batch size can exhaust system resources, causing failures or slowdowns. Adjust the batch size according to available CPU and memory.
- Proxy Issues: Incorrect proxy server settings can prevent page loading. Confirm proxy format and availability.
- Page Closing Errors: Occasionally, errors closing pages or the browser are logged but do not stop execution. These can usually be ignored unless persistent.
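
To show why batch size bounds resource usage, the sketch below processes URLs in fixed-size groups so that no more than batch_size fetches are in flight at once. It is a conceptual illustration using asyncio, not the node's implementation; process_in_batches and the fetch callback are hypothetical names.

```python
import asyncio


async def process_in_batches(urls, batch_size, fetch):
    """Run fetch(url) for every URL, at most batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # Each batch finishes completely before the next one starts.
        results.extend(await asyncio.gather(*(fetch(url) for url in batch)))
    return results
```

With this pattern, peak memory and CPU use scale with batch_size rather than with the total number of input items, which is why lowering the value helps on constrained hosts.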