Playwright

Automate browser interactions using Playwright

Overview

This node uses the Playwright library to interact with web pages programmatically. It supports operations such as retrieving the full HTML content of a page ("Get Page Content"), taking screenshots, generating PDFs, and running custom scripts within the browser context.

The "Get Page Content" operation loads the specified URL and returns the page's complete HTML together with the response headers and HTTP status code. This is useful for web scraping, monitoring website changes, or extracting data from dynamic web pages that only render fully in a real browser environment.

Practical examples:

  • Extracting product details from an e-commerce site.
  • Monitoring news headlines by fetching updated page content.
  • Collecting metadata or SEO information from webpages.
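As a rough illustration of the metadata use case, the sketch below pulls the title and named meta tags out of an HTML string such as the one this operation returns. It uses only the Python standard library; the names MetadataParser, extract_metadata, and sample_body are hypothetical, and the sample HTML is invented for the example.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") and attr_map.get("content") is not None:
            # Store meta names lower-cased so lookups are case-insensitive.
            self.meta[attr_map["name"].lower()] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(body: str) -> dict:
    """Return the page title and named meta tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(body)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


# Hypothetical HTML standing in for the page content returned by the node.
sample_body = """<html><head><title>Example Shop</title>
<meta name="description" content="Handmade goods">
<meta name="robots" content="index,follow">
</head><body><h1>Welcome</h1></body></html>"""

metadata = extract_metadata(sample_body)
```

For heavier extraction work a dedicated HTML parsing library is usually a better fit; the standard-library parser is used here only to keep the sketch dependency-free.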

Properties

  • URL: The web address of the page to retrieve content from. Must be a valid URL string.
  • Query Parameters: Optional key-value pairs appended to the URL as query parameters. Useful for customizing requests or passing filters.
  • Options: A collection of advanced settings:
    • Batch Size: Maximum number of pages to open simultaneously. Higher values increase resource usage (CPU, memory).
    • Browser WebSocket Endpoint: WebSocket URL to connect to an existing browser instance instead of launching a new one.
    • Emulate Device: Select a device profile to emulate (viewport size, user agent). Examples include various mobile phones and tablets.
    • Executable Path: File system path to a specific browser executable to use. Ignored if connecting via WebSocket.
    • Extra Headers: Additional HTTP headers to send with the request, e.g., a custom User-Agent or authentication tokens.
    • File Name: Filename to assign to binary outputs (screenshots or PDFs). Not applicable for "Get Page Content".
    • Launch Arguments: Extra command-line arguments passed to the browser process on launch, e.g., disabling sandboxing.
    • Timeout: Maximum navigation time in milliseconds before aborting. Set to 0 to disable the timeout.
    • Protocol Timeout: Maximum wait time for protocol responses in milliseconds. Set to 0 to disable the timeout.
    • Wait Until: Defines when navigation is considered finished: load (page load event), domcontentloaded (DOM ready), or networkidle (no network connections for at least 500 ms).
    • Page Caching: Enable or disable caching at the page level. Defaults to enabled.
    • Headless Mode: Run the browser without a visible UI. Defaults to true (headless).
    • Proxy Server: Custom proxy server address to route browser traffic through, e.g., localhost:8080 or socks5://localhost:1080.
    • Add Container Arguments: Automatically add recommended browser launch arguments for container environments (e.g., --no-sandbox). Defaults to true.
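To make the Query Parameters property more concrete, here is a minimal sketch of how key-value pairs can be appended to a URL. It is not the node's actual implementation: the helper name with_query_params is hypothetical, and keeping an existing query string rather than overwriting it is an assumption made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_params(url: str, params: dict[str, str]) -> str:
    """Return `url` with `params` appended, keeping any query already present."""
    parts = urlsplit(url)
    # Preserve the existing query pairs, then add the new ones in order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and parameters, for illustration only.
final_url = with_query_params(
    "https://example.com/search?lang=en",
    {"q": "coffee beans", "page": "2"},
)
# final_url == "https://example.com/search?lang=en&q=coffee+beans&page=2"
```

Values are percent-encoded by urlencode, so spaces and reserved characters in a parameter value do not corrupt the resulting URL.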

Output

The output is an array of items, each corresponding to an input item processed.

For the "Get Page Content" operation, each output item contains:

  • json:
    • body: The full HTML content of the loaded page as a string.
    • headers: An object containing HTTP response headers.
    • statusCode: The HTTP status code returned by the server.
    • url: The final URL after any redirects.

No binary data is produced for this operation.
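The item shape described above can be sketched as a typed dictionary, which may help when post-processing results in downstream code. The type name PageContent, the helper summarize, and the sample values are hypothetical; lower-case header names are an assumption, not something this page guarantees.

```python
from typing import TypedDict


class PageContent(TypedDict):
    """Mirrors the fields listed under `json` for "Get Page Content"."""
    body: str
    headers: dict[str, str]
    statusCode: int
    url: str


def summarize(item: dict) -> str:
    """Build a one-line summary of a single output item."""
    page: PageContent = item["json"]
    # Assumes header names are lower-case; adjust the key if they are not.
    content_type = page["headers"].get("content-type", "unknown")
    return f'{page["statusCode"]} {page["url"]} ({content_type}, {len(page["body"])} chars)'


# Hypothetical output item, for illustration only.
sample_item = {
    "json": {
        "body": "<html><body>Hello</body></html>",
        "headers": {"content-type": "text/html; charset=utf-8"},
        "statusCode": 200,
        "url": "https://example.com/",
    }
}

summary = summarize(sample_item)
# summary == "200 https://example.com/ (text/html; charset=utf-8, 31 chars)"
```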

Dependencies

  • Playwright: The node relies on the Playwright library to launch or connect to Chromium-based browsers for page rendering and interaction.
  • Browser Executable: Requires access to a Chromium browser executable either bundled with Playwright or specified via options.
  • Optional External Browser: Can connect to an existing browser instance via a WebSocket endpoint.
  • n8n Environment Variables: Supports configuration via environment variables for executable paths and allowed modules.

Troubleshooting

  • Invalid URL Error: If the provided URL is malformed or invalid, the node will throw an error indicating "Invalid URL". Ensure the URL is correctly formatted.
  • Navigation Timeout: If the page takes longer than the specified timeout to load, a timeout error may occur. Increase the timeout, or set it to 0 to disable it.
  • Failed to Launch Browser: Errors during browser launch often relate to missing executables or incompatible launch arguments. Verify the executable path and arguments.
  • Request Failed with Status Code: If the server returns an HTTP error (status code >= 400), the node reports this. Check the URL and query parameters.
  • Resource Limits: A very high batch size can exhaust system resources, causing failures or slowdowns. Adjust the batch size to the available CPU and memory.
  • Proxy Issues: Incorrect proxy server settings can prevent page loading. Confirm proxy format and availability.
  • Page Closing Errors: Occasionally, errors closing pages or the browser are logged but do not stop execution. These can usually be ignored unless persistent.
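Two of the issues above, malformed URLs and oversized batches, can be caught before items ever reach the node. The following is a sketch of such pre-flight checks, not part of the node itself: is_valid_http_url and chunk are hypothetical helpers, and restricting URLs to http and https is an assumption that may be stricter than the node's own validation.

```python
from urllib.parse import urlsplit


def is_valid_http_url(url: str) -> bool:
    """True when `url` has an http(s) scheme and a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def chunk(items: list, batch_size: int) -> list[list]:
    """Split `items` into consecutive groups of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical input URLs; two of them fail the check.
urls = [
    "https://example.com/a",
    "example.com/b",          # missing scheme
    "https://example.com/c",
    "ftp://example.com/d",    # not http(s)
    "https://example.com/e",
]
valid = [u for u in urls if is_valid_http_url(u)]
batches = chunk(valid, 2)
```

Filtering first and batching second keeps each batch full of loadable URLs, so a small batch size still makes steady progress.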
