Puppeteer

Automate browser interactions using Puppeteer

Overview

This node uses Puppeteer, a headless browser automation library, to interact with web pages programmatically. Specifically, the Get Screenshot operation captures screenshots of web pages given their URLs. It supports capturing full-page screenshots or just the visible viewport, and allows customization of image format and quality.

Common scenarios where this node is beneficial include:

- Automatically generating website previews or thumbnails.
- Monitoring visual changes on web pages over time.
- Archiving web page appearances for compliance or record-keeping.
- Creating images for social media sharing or reports.

For example, you can supply the URL of a product page and receive a PNG screenshot of the entire scrollable page, which can then be used in marketing materials or automated reports.

Properties

Name	Meaning
URL	The web address of the page to capture. Required.
Property Name	The name of the binary property where the screenshot image data will be stored in the output.
Type	The image format for the screenshot. Options: `PNG`, `JPEG`, `WebP`.
Quality	Image quality from 0 to 100, applicable only for JPEG and WebP formats (ignored for PNG). Default is 100.
Full Page	Whether to capture the entire scrollable page (`true`) or just the visible viewport (`false`).
Query Parameters	Additional query parameters to append to the URL before loading the page. Each parameter has a name and value.
Batch Size	Maximum number of pages to open simultaneously. Higher values increase resource usage. Default is 1.
Browser WebSocket Endpoint	Optional WebSocket URL to connect to an existing browser instance instead of launching a new one.
Emulate Device	Optionally emulate a specific device's viewport and user agent (e.g., iPhone, iPad).
Executable Path	Path to a custom browser executable to use instead of the bundled one. Ignored if connecting via WebSocket.
Extra Headers	Custom HTTP headers to send with the page request.
File Name	Filename to assign to the binary data output. Only applies to screenshot and PDF operations.
Launch Arguments	Additional command line arguments to pass when launching the browser. Ignored if connecting via WebSocket.
Timeout	Maximum navigation time in milliseconds. Set to 0 to disable timeout.
Wait Until	When to consider navigation successful. Options: `load`, `domcontentloaded`, `networkidle0`, `networkidle2`.
Page Caching	Enable or disable page-level caching. Defaults to enabled (`true`).
Headless mode	Run the browser in headless mode (no UI). Defaults to `true`.
Use Chrome Headless Shell	Run the browser in headless shell mode. Requires `chrome-headless-shell` to be available on the system path. Defaults to `false`.
Stealth mode	Apply techniques to make headless browser detection harder. Defaults to `false`.
Proxy Server	Use a custom proxy server for browser requests (e.g., `localhost:8080`, `socks5://localhost:1080`).
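
As a rough illustration of how the URL, Query Parameters, Type, Quality, and Full Page properties might combine, the sketch below builds a target URL and an options object. The option names `type`, `quality`, and `fullPage` are the ones Puppeteer's `page.screenshot()` accepts. The helper names and interfaces (`buildTargetUrl`, `buildScreenshotOptions`, `QueryParameter`, `ScreenshotSettings`) are hypothetical and are not taken from the node's source.

```typescript
// Hypothetical shapes for illustration; the node's internal types are not shown in this document.
type ImageType = "png" | "jpeg" | "webp";

interface QueryParameter {
  name: string;
  value: string;
}

interface ScreenshotSettings {
  type: ImageType;
  quality: number; // 0 to 100
  fullPage: boolean;
}

// Append each Query Parameters entry to the URL before the page is loaded.
function buildTargetUrl(baseUrl: string, params: QueryParameter[]): string {
  const target = new URL(baseUrl); // throws if baseUrl is not an absolute URL
  for (const param of params) {
    target.searchParams.append(param.name, param.value);
  }
  return target.toString();
}

// Build an object in the shape accepted by Puppeteer's page.screenshot().
// Assumption: quality is left out for PNG, matching the "ignored for PNG" note above.
function buildScreenshotOptions(settings: ScreenshotSettings): {
  type: ImageType;
  fullPage: boolean;
  quality?: number;
} {
  const options: { type: ImageType; fullPage: boolean; quality?: number } = {
    type: settings.type,
    fullPage: settings.fullPage,
  };
  if (settings.type !== "png") {
    options.quality = settings.quality;
  }
  return options;
}

// Usage with example values
const exampleTarget = buildTargetUrl("https://example.com/product", [
  { name: "ref", value: "n8n" },
  { name: "lang", value: "en" },
]);
// exampleTarget is "https://example.com/product?ref=n8n&lang=en"
const exampleOptions = buildScreenshotOptions({ type: "jpeg", quality: 80, fullPage: true });
```

Leaving `quality` out for PNG, rather than passing it through, keeps the options object valid for every image type.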

Output

The node outputs an array of items corresponding to each input item processed. For the Get Screenshot operation, each output item contains:

- A binary property with the screenshot image data stored under the user-defined property name (e.g., "data"). This binary data includes the image buffer and metadata such as filename and MIME type (image/png, image/jpeg, or image/webp).
- A json property containing metadata about the HTTP response, including:
  - headers: The HTTP response headers from the page request.
  - statusCode: The HTTP status code returned by the page.
  - url: The final URL loaded (including any query parameters).

This structure allows downstream nodes to access both the raw image data and relevant HTTP information.
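
The output item described above can be pictured with the following type sketch. The `json` fields come from the list above. The binary field names (`data`, `mimeType`, `fileName`), the base64 encoding, and the helpers `mimeTypeFor` and `hasUsableScreenshot` are assumptions made for illustration only.

```typescript
// Hypothetical type names; the json fields mirror the description above.
type ImageType = "png" | "jpeg" | "webp";

interface ScreenshotBinary {
  data: string; // assumption: base64-encoded image bytes
  mimeType: string;
  fileName: string;
}

interface ScreenshotOutputItem {
  json: {
    headers: Record<string, string>;
    statusCode: number;
    url: string;
  };
  // Keyed by the user-defined Property Name, e.g. "data"
  binary: Record<string, ScreenshotBinary>;
}

const MIME_TYPES: Record<ImageType, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

// Look up the MIME type that corresponds to the selected image Type.
function mimeTypeFor(type: ImageType): string {
  return MIME_TYPES[type];
}

// Example downstream check: the page returned a 2xx status and the
// screenshot exists under the expected property name.
function hasUsableScreenshot(item: ScreenshotOutputItem, propertyName: string): boolean {
  const image = item.binary[propertyName];
  const ok = item.json.statusCode >= 200 && item.json.statusCode < 300;
  return image !== undefined && ok;
}
```

A downstream step could call `hasUsableScreenshot(item, "data")` before passing the image on, so that screenshots of error pages are filtered out.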

Dependencies

- Requires Puppeteer and puppeteer-extra libraries for browser automation.
- Supports optional integration with a CAPTCHA solving service via an API key credential (used internally if configured).
- Can connect to an existing browser instance via WebSocket or launch a new Chromium-based browser.
- Uses environment variables to control allowed Node.js modules and console output behavior.
- If stealth mode is enabled, it uses a plugin to reduce detection of headless browsing.
- To emulate devices, it relies on Puppeteer's known device descriptors.

Troubleshooting

Failed to launch/connect to browser:
Ensure that the specified executable path is correct and accessible, or that the WebSocket endpoint URL is valid and reachable. Also verify that required dependencies like Chromium are installed.
Invalid URL error:
The URL provided must be a valid absolute URL. Check for typos or missing protocol (e.g., https://).
Timeout errors:
If navigation takes longer than the configured timeout, increase the timeout value or check network connectivity.
Unsupported image type or quality settings:
Quality settings apply only to JPEG and WebP formats; using them with PNG will have no effect.
Memory or CPU overload with high batch size:
Opening many pages simultaneously consumes more resources. Reduce batch size if the node crashes or slows down.
Stealth mode not working as expected:
Some websites may still detect headless browsers despite stealth mode. Consider additional anti-detection measures or manual interaction.
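
Two of the issues above, invalid URLs and overload from a large batch size, can be caught before any page is opened. The sketch below shows one possible pre-check. `isAbsoluteHttpUrl` and `chunk` are hypothetical helpers, and the way the node itself validates URLs or schedules pages is not documented here.

```typescript
// Return true only for absolute http(s) URLs; "example.com/page" fails
// because it has no protocol.
function isAbsoluteHttpUrl(candidate: string): boolean {
  try {
    const parsed = new URL(candidate);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}

// Split a list of items into groups of at most batchSize, so that no more
// than batchSize pages would be open at once. Values below 1 are treated as 1.
function chunk<T>(items: T[], batchSize: number): T[][] {
  const size = Math.max(1, Math.floor(batchSize));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage with example values
const candidateUrls = ["https://example.com/a", "example.com/b", "https://example.com/c"];
const validUrls = candidateUrls.filter(isAbsoluteHttpUrl);
const urlBatches = chunk(validUrls, 1); // Batch Size default of 1: one page at a time
```

Filtering first means a malformed entry is rejected on its own instead of failing a batch that is already in progress.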
