Overview
This node uses Puppeteer, a headless browser automation library, to interact with web pages programmatically. Specifically, the Get Screenshot operation captures screenshots of web pages given their URLs. It supports capturing full-page screenshots or just the visible viewport, and allows customization of image format and quality.
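The following TypeScript sketch shows how the format, quality, and full-page settings could map onto the options object passed to Puppeteer's `page.screenshot()`. The `type`, `quality`, and `fullPage` fields are Puppeteer's. The `buildScreenshotOptions` helper and the local type are hypothetical, and the type only mirrors a subset of Puppeteer's own so the example runs without Puppeteer installed. This is an illustration, not the node's actual source.

```typescript
// Hypothetical mirror of a subset of Puppeteer's ScreenshotOptions,
// declared locally so this sketch runs without Puppeteer installed.
type ImageType = "png" | "jpeg" | "webp";

interface ScreenshotOptionsSubset {
  type: ImageType;
  fullPage: boolean; // true: whole scrollable page, false: viewport only
  quality?: number; // only meaningful for jpeg and webp
}

// Hypothetical helper: turn the node's Type / Quality / Full Page
// settings into an options object for page.screenshot().
function buildScreenshotOptions(
  type: ImageType,
  quality: number,
  fullPage: boolean,
): ScreenshotOptionsSubset {
  const options: ScreenshotOptionsSubset = { type, fullPage };
  // PNG is lossless, so quality is omitted entirely; Puppeteer's
  // page.screenshot() does not accept a quality value for PNG.
  if (type !== "png") {
    // Clamp to the documented 0-100 range and use a whole number.
    options.quality = Math.min(100, Math.max(0, Math.round(quality)));
  }
  return options;
}

// Example: a full-page JPEG at quality 80.
console.log(buildScreenshotOptions("jpeg", 80, true));
// → { type: 'jpeg', fullPage: true, quality: 80 }
```

Dropping the field for PNG, rather than passing it through, is what makes the Quality setting safely "ignored" for that format.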
Common scenarios where this node is beneficial include:
- Automatically generating website previews or thumbnails.
- Monitoring visual changes on web pages over time.
- Archiving web page appearances for compliance or record-keeping.
- Creating images for social media sharing or reports.
For example, you can provide the URL of a product page and receive a PNG screenshot of the entire scrollable page, which can then be used in marketing materials or automated reports.
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to capture. Required. |
| Property Name | The name of the binary property where the screenshot image data will be stored in the output. |
| Type | The image format for the screenshot. Options: PNG, JPEG, WebP. |
| Quality | Image quality from 0 to 100, applicable only for JPEG and WebP formats (ignored for PNG). Default is 100. |
| Full Page | Whether to capture the entire scrollable page (true) or just the visible viewport (false). |
| Query Parameters | Additional query parameters to append to the URL before loading the page. Each parameter has a name and value. |
| Batch Size | Maximum number of pages to open simultaneously. Higher values increase resource usage. Default is 1. |
| Browser WebSocket Endpoint | Optional WebSocket URL to connect to an existing browser instance instead of launching a new one. |
| Emulate Device | Optionally emulate a specific device's viewport and user agent (e.g., iPhone, iPad). |
| Executable Path | Path to a custom browser executable to use instead of the bundled one. Ignored if connecting via WebSocket. |
| Extra Headers | Custom HTTP headers to send with the page request. |
| File Name | Filename to assign to the binary data output. Only applies to screenshot and PDF operations. |
| Launch Arguments | Additional command line arguments to pass when launching the browser. Ignored if connecting via WebSocket. |
| Timeout | Maximum navigation time in milliseconds. Set to 0 to disable timeout. |
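| Wait Until | When to consider navigation successful. Options: load (the load event has fired), domcontentloaded (the DOMContentLoaded event has fired), networkidle0 (no network connections for at least 500 ms), networkidle2 (no more than 2 network connections for at least 500 ms). |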
| Page Caching | Enable or disable page-level caching. Defaults to enabled (true). |
| Headless mode | Run the browser in headless mode (no UI). Defaults to true. |
| Use Chrome Headless Shell | Run the browser in headless shell mode. Requires chrome-headless-shell to be available on the system path. Defaults to false. |
| Stealth mode | Apply techniques to make headless browser detection harder. Defaults to false. |
| Proxy Server | Use a custom proxy server for browser requests (e.g., localhost:8080, socks5://localhost:1080). |
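Several of these properties shape navigation rather than the screenshot itself. As a rough sketch (not the node's implementation), the helper below appends Query Parameters to the URL and pairs the result with Wait Until and Timeout in the shape accepted by Puppeteer's `page.goto(url, { waitUntil, timeout })`. The `QueryParameter` shape and the `buildNavigation` name are hypothetical. The fallback values of `load` and 30000 ms are Puppeteer's own defaults and may differ from the node's.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

// Hypothetical shape of one Query Parameters entry (name + value).
interface QueryParameter {
  name: string;
  value: string;
}

// Hypothetical result: the arguments one would pass to page.goto().
interface NavigationPlan {
  url: string;
  options: { waitUntil: WaitUntil; timeout: number };
}

function buildNavigation(
  baseUrl: string,
  queryParameters: QueryParameter[],
  waitUntil: WaitUntil = "load",
  timeout = 30000,
): NavigationPlan {
  // new URL() throws a TypeError for relative or malformed input,
  // e.g. "example.com" without a protocol.
  const url = new URL(baseUrl);
  for (const { name, value } of queryParameters) {
    // append() keeps any parameters already present in the URL.
    url.searchParams.append(name, value);
  }
  // A timeout of 0 disables the navigation timeout in Puppeteer.
  return { url: url.toString(), options: { waitUntil, timeout } };
}

const nav = buildNavigation(
  "https://example.com/products?id=42",
  [{ name: "utm_source", value: "n8n" }],
  "networkidle2",
  0,
);
console.log(nav.url); // → https://example.com/products?id=42&utm_source=n8n
```

Building the URL with `URL`/`searchParams` rather than string concatenation handles encoding and existing query strings, and it surfaces the "Invalid URL" case early.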
Output
The node outputs an array of items corresponding to each input item processed. For the Get Screenshot operation, each output item contains:
- A `binary` property with the screenshot image data stored under the user-defined property name (e.g., `data`). This binary data includes the image buffer and metadata such as the file name and MIME type (`image/png`, `image/jpeg`, or `image/webp`).
- A `json` property containing metadata about the HTTP response, including:
  - `headers`: The HTTP response headers from the page request.
  - `statusCode`: The HTTP status code returned by the page.
  - `url`: The final URL loaded (including any query parameters).
This structure allows downstream nodes to access both the raw image data and relevant HTTP information.
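To make the output structure concrete, here is a sketch that assembles one item from a base64-encoded image and the response details. The interfaces are simplified, hypothetical stand-ins rather than n8n's own type definitions. The `screenshot.<type>` fallback file name is also an assumption for illustration.

```typescript
type ImageType = "png" | "jpeg" | "webp";

// Simplified, hypothetical stand-in for one binary entry.
interface BinaryEntry {
  data: string; // base64-encoded image bytes
  mimeType: string;
  fileName: string;
}

interface ResponseInfo {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// Simplified, hypothetical stand-in for one output item.
interface OutputItem {
  json: ResponseInfo;
  binary: Record<string, BinaryEntry>;
}

function buildOutputItem(
  propertyName: string,
  type: ImageType,
  base64Image: string,
  response: ResponseInfo,
  fileName?: string,
): OutputItem {
  return {
    // HTTP metadata for downstream nodes.
    json: { headers: response.headers, statusCode: response.statusCode, url: response.url },
    binary: {
      // Stored under the user-defined Property Name, e.g. "data".
      [propertyName]: {
        data: base64Image,
        // image/png, image/jpeg or image/webp
        mimeType: `image/${type}`,
        // Assumed fallback when no File Name is configured.
        fileName: fileName ?? `screenshot.${type}`,
      },
    },
  };
}

// "UklGRg==" is only a placeholder base64 string, not a real image.
const item = buildOutputItem("data", "webp", "UklGRg==", {
  headers: { "content-type": "text/html" },
  statusCode: 200,
  url: "https://example.com/?ref=n8n",
});
console.log(item.binary.data.mimeType); // → image/webp
```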
Dependencies
- Requires Puppeteer and puppeteer-extra libraries for browser automation.
- Supports optional integration with a CAPTCHA solving service via an API key credential (used internally if configured).
- Can connect to an existing browser instance via WebSocket or launch a new Chromium-based browser.
- Uses environment variables to control allowed Node.js modules and console output behavior.
- If stealth mode is enabled, it uses a plugin to reduce detection of headless browsing.
- To emulate devices, it relies on Puppeteer's known device descriptors.
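The choice between connecting to an existing browser and launching a new one can be sketched as follows. `browserWSEndpoint`, `headless`, `executablePath`, and `args` are real Puppeteer option names, and `--proxy-server` is a standard Chromium switch. The `BrowserSettings` shape and the `planBrowser` helper are hypothetical. Passing the proxy as a launch argument, and therefore ignoring it when connecting, is an assumption of this sketch, not documented node behavior.

```typescript
// Hypothetical subset of the node's browser-related settings.
interface BrowserSettings {
  browserWSEndpoint?: string;
  executablePath?: string;
  launchArguments?: string[];
  headless?: boolean;
  proxyServer?: string;
}

// Either the options for puppeteer.connect() or for puppeteer.launch().
type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | {
      mode: "launch";
      options: { headless: boolean; executablePath?: string; args: string[] };
    };

function planBrowser(settings: BrowserSettings): BrowserPlan {
  if (settings.browserWSEndpoint) {
    // Launch-only settings (executable path, arguments) are ignored
    // when connecting to an already running browser.
    return { mode: "connect", options: { browserWSEndpoint: settings.browserWSEndpoint } };
  }
  // Copy so the caller's array is not mutated.
  const args = [...(settings.launchArguments ?? [])];
  if (settings.proxyServer) {
    // Assumption: the proxy is applied through Chromium's --proxy-server switch.
    args.push(`--proxy-server=${settings.proxyServer}`);
  }
  return {
    mode: "launch",
    options: {
      headless: settings.headless ?? true, // headless by default
      executablePath: settings.executablePath, // undefined: use the bundled browser
      args,
    },
  };
}

const launchPlan = planBrowser({
  launchArguments: ["--no-sandbox"],
  proxyServer: "socks5://localhost:1080",
});
console.log(launchPlan.mode); // → launch
```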
Troubleshooting
Failed to launch/connect to browser:
Ensure that the specified executable path is correct and accessible, or that the WebSocket endpoint URL is valid and reachable. Also verify that required dependencies such as Chromium are installed.
Invalid URL error:
The URL must be a valid absolute URL. Check for typos or a missing protocol (e.g., `https://`).
Timeout errors:
If navigation takes longer than the configured timeout, increase the Timeout value, choose a less strict Wait Until option (networkidle0 in particular can time out on pages that keep connections open), or check network connectivity.
Unsupported image type or quality settings:
Quality settings apply only to the JPEG and WebP formats; with PNG they have no effect.
Memory or CPU overload with high batch size:
Opening many pages simultaneously consumes more memory and CPU. Reduce Batch Size if the node crashes or slows down.
Stealth mode not working as expected:
Some websites may still detect headless browsers despite stealth mode. Consider additional anti-detection measures or manual interaction.
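The batch-size advice can be illustrated with a sketch that keeps at most `batchSize` pages open at a time by processing inputs in fixed-size groups. Fixed groups are one simple strategy and an assumption here; the node may schedule pages differently. `captureOne` is a hypothetical stand-in that simulates page work with a timer instead of driving a browser.

```typescript
// Run `worker` over `items`, starting at most `batchSize` at once and
// waiting for each group to finish before starting the next.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const size = Math.max(1, Math.floor(batchSize)); // guard against 0 or fractions
  const results: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    const batch = items.slice(i, i + size);
    // Promise.all preserves input order in its results.
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
  }
  return results;
}

let openPages = 0;
let peakOpenPages = 0;

// Hypothetical stand-in for "open page, navigate, screenshot, close page".
async function captureOne(url: string): Promise<string> {
  openPages++;
  peakOpenPages = Math.max(peakOpenPages, openPages);
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated page work
  openPages--;
  return `screenshot of ${url}`;
}

processInBatches(
  ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
  2,
  captureOne,
).then((shots) => {
  console.log(shots.length, peakOpenPages); // → 3 2
});
```

Peak resource use scales with the batch size rather than with the number of input items, which is why lowering Batch Size helps when the node runs out of memory or CPU.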