Playwright

Automate browser interactions using Playwright


Overview

This node uses the Playwright library to interact with web pages programmatically. It supports operations such as retrieving the full HTML content of a page ("Get Page Content"), taking screenshots, generating PDFs, and running custom scripts within the browser context.

The "Get Page Content" operation loads a specified URL and returns the complete HTML content of the page, together with the HTTP response headers and status code. This is useful for web scraping, monitoring website changes, or extracting data from dynamic web pages that require a real browser environment.

Practical examples:

Extracting product details from an e-commerce site.
Monitoring news headlines by fetching updated page content.
Collecting metadata or SEO information from webpages.

Properties

Name	Meaning
URL	The web address of the page to retrieve content from. Must be a valid URL string.
Query Parameters	Optional key-value pairs appended to the URL as query parameters. Useful for customizing requests or passing filters.
Options	A collection of advanced settings:
- Batch Size	Maximum number of pages to open simultaneously. Higher values increase resource usage (CPU, memory).
- Browser WebSocket Endpoint	WebSocket URL to connect to an existing browser instance instead of launching a new one.
- Emulate Device	Select a device profile to emulate (viewport size, user agent). Examples include various mobile phones and tablets.
- Executable Path	File system path to a specific browser executable to use. Ignored if connecting via WebSocket.
- Extra Headers	Additional HTTP headers to send with the request, e.g., custom User-Agent or authentication tokens.
- File Name	Filename to assign to binary outputs (screenshots or PDFs). Not applicable for "Get Page Content".
- Launch Arguments	Extra command line arguments passed to the browser process on launch, e.g., disabling sandboxing.
- Timeout	Maximum navigation time in milliseconds before the navigation is aborted. Set to 0 to disable the timeout.
- Protocol Timeout	Maximum time in milliseconds to wait for a response to a browser protocol call. Set to 0 to disable the timeout.
- Wait Until	Defines when navigation is considered finished: `load` (page load event), `domcontentloaded` (DOM ready), or `networkidle` (no network connections for 500ms).
- Page Caching	Enable or disable caching at the page level. Defaults to enabled.
- Headless mode	Run the browser without a visible UI. Defaults to true (headless).
- Proxy Server	Custom proxy server address to route browser traffic through, e.g., `localhost:8080` or `socks5://localhost:1080`.
- Add Container Arguments	Automatically add recommended browser launch arguments for container environments (e.g., `--no-sandbox`). Defaults to true.
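The following TypeScript sketch shows how the URL, Query Parameters, and Options above could map onto Playwright's own settings. Its `NodeOptions` shape and the helper names `buildTargetUrl` and `buildPlaywrightOptions` are hypothetical. The node's real field names, default timeout, and container argument list may differ. The returned objects use real Playwright option names and are meant to be passed to `chromium.launch()`, `browser.newContext()`, and `page.goto()` respectively. Device emulation, WebSocket connection, and page caching are left out.

```typescript
// Hypothetical shape of the node's "Options" collection (field names are illustrative).
interface NodeOptions {
  timeout?: number; // Timeout in ms; 0 disables it
  waitUntil?: "load" | "domcontentloaded" | "networkidle";
  headless?: boolean; // Headless mode; defaults to true
  executablePath?: string; // Executable Path
  launchArguments?: string[]; // Launch Arguments
  proxyServer?: string; // Proxy Server, e.g. "socks5://localhost:1080"
  addContainerArgs?: boolean; // Add Container Arguments; defaults to true
  extraHeaders?: Record<string, string>; // Extra Headers
}

// Assumed container flag, taken from the option description; the node's list may be longer.
const CONTAINER_ARGS = ["--no-sandbox"];

// Appends Query Parameters to the URL. `new URL` throws a TypeError for malformed input.
function buildTargetUrl(rawUrl: string, queryParams: Record<string, string> = {}): string {
  const url = new URL(rawUrl);
  for (const [key, value] of Object.entries(queryParams)) {
    url.searchParams.append(key, value);
  }
  return url.toString();
}

// Maps node options onto objects shaped like Playwright's launch, context, and goto options.
function buildPlaywrightOptions(opts: NodeOptions) {
  const args = [...(opts.launchArguments ?? [])];
  if (opts.addContainerArgs ?? true) {
    for (const arg of CONTAINER_ARGS) {
      if (!args.includes(arg)) args.push(arg); // avoid duplicating user-supplied flags
    }
  }
  return {
    launch: {
      headless: opts.headless ?? true,
      executablePath: opts.executablePath,
      args,
      proxy: opts.proxyServer ? { server: opts.proxyServer } : undefined,
    },
    context: { extraHTTPHeaders: opts.extraHeaders ?? {} },
    goto: {
      waitUntil: opts.waitUntil ?? "load", // Playwright's own default
      timeout: opts.timeout ?? 30000, // assumed default; `??` keeps an explicit 0
    },
  };
}
```

For example, `buildTargetUrl("https://example.com/products", { page: "2" })` returns `https://example.com/products?page=2`. By contrast, `buildTargetUrl("example.com")` throws a `TypeError` because the scheme is missing, which corresponds to the "Invalid URL" error described under Troubleshooting.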

Output

The output is an array of items, one for each processed input item.

For the "Get Page Content" operation, each output item contains:

json:
- body: The full HTML content of the loaded page as a string.
- headers: An object containing HTTP response headers.
- statusCode: The HTTP status code returned by the server.
- url: The final URL after any redirects.

No binary data is produced for this operation.
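Here is a minimal sketch of how the four output fields could be collected with Playwright's `page.goto()`, `page.content()`, `page.url()`, `response.status()`, and `response.headers()`. So that it runs without a browser, it is typed against small hypothetical interfaces (`PageLike`, `ResponseLike`) that mirror the subset of Playwright's `Page` and `Response` it uses. A real Playwright page should be structurally compatible with them. The fallback to status code 0 and empty headers when `goto()` returns `null` is an assumption of this sketch, not documented node behavior.

```typescript
// Minimal structural subsets of Playwright's Response and Page types (hypothetical names).
interface ResponseLike {
  status(): number;
  headers(): { [key: string]: string };
}

type WaitUntil = "load" | "domcontentloaded" | "networkidle";

interface PageLike {
  goto(url: string, options?: { waitUntil?: WaitUntil; timeout?: number }): Promise<ResponseLike | null>;
  content(): Promise<string>;
  url(): string;
}

// Shape of the `json` field of each "Get Page Content" output item.
interface PageContentOutput {
  body: string;
  headers: { [key: string]: string };
  statusCode: number;
  url: string;
}

async function getPageContent(
  page: PageLike,
  targetUrl: string,
  waitUntil: WaitUntil = "load",
  timeout = 30000,
): Promise<PageContentOutput> {
  // goto() resolves with the main-document response; Playwright returns null
  // for some navigations (e.g. same-document hash changes).
  const response = await page.goto(targetUrl, { waitUntil, timeout });
  return {
    body: await page.content(), // HTML of the page as currently rendered
    headers: response?.headers() ?? {},
    statusCode: response?.status() ?? 0, // assumed fallback when there is no response
    url: page.url(), // final URL after any redirects
  };
}
```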

Dependencies

Playwright: The node relies on the Playwright library to launch or connect to Chromium-based browsers for page rendering and interaction.
Browser Executable: Requires access to a Chromium browser executable, either the one bundled with Playwright or one specified via the Executable Path option.
Optional External Browser: Can connect to an existing browser instance via a WebSocket endpoint.
n8n Environment Variables: Supports configuration via environment variables for executable paths and allowed modules.

Troubleshooting

Invalid URL Error: If the provided URL is malformed or invalid, the node will throw an error indicating "Invalid URL". Ensure the URL is correctly formatted.
Navigation Timeout: If the page takes longer than the configured timeout to load, a timeout error may occur. Increase the timeout, or set it to 0 to disable it.
Failed to Launch Browser: Errors during browser launch often relate to missing executables or incompatible launch arguments. Verify the executable path and arguments.
Request Failed with Status Code: If the server returns an HTTP error (status code >= 400), the node reports this. Check the URL and query parameters.
Resource Limits: A very high batch size can exhaust system resources, causing failures or slowdowns. Choose a batch size that fits the available CPU and memory.
Proxy Issues: Incorrect proxy server settings can prevent pages from loading. Confirm that the proxy address is correctly formatted and that the proxy is reachable.
Page Closing Errors: Errors that occur while closing pages or the browser are logged but do not stop execution. They can usually be ignored unless they persist.
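The Batch Size option and the resource-limit advice above can be illustrated with a fixed-size batching helper. The `runInBatches` function below is hypothetical and assumes items are processed in sequential batches, with at most `batchSize` pages open at once. The node itself may schedule pages differently. In practice the worker would open a page, run the operation, and close the page.

```typescript
// Runs `worker` over `items` with at most `batchSize` calls in flight at a time.
// Batches run one after another; results keep the order of the input items.
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Promise.all waits for the whole batch and preserves order within it.
    const batchResults = await Promise.all(batch.map((item, offset) => worker(item, start + offset)));
    results.push(...batchResults);
  }
  return results;
}
```

Lowering `batchSize` directly lowers the peak number of simultaneous pages, and therefore peak CPU and memory usage. The cost is a longer total run time.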
