Overview
This node uses Puppeteer, a headless browser automation library, to interact with web pages programmatically. Specifically, the "Get Screenshot" operation captures screenshots of web pages given their URLs. It supports capturing full-page or viewport-only screenshots in various image formats (PNG, JPEG, WebP) and allows customization of image quality for applicable formats.
Common scenarios where this node is beneficial include:
- Automatically generating website previews or thumbnails.
- Monitoring visual changes on web pages over time.
- Archiving web page appearances for compliance or record-keeping.
- Creating images for social media sharing or reports.
For example, you can input a URL of a product page and get a PNG screenshot of the entire scrollable page, which can then be used in marketing materials or automated reports.
Properties
| Name | Meaning |
|---|---|
| URL | The web address of the page to capture. |
| Property Name | The name of the binary property where the screenshot image data will be stored. |
| Type | The image format for the screenshot: PNG, JPEG, or WebP. |
| Quality | Image quality from 0 to 100; applies only to JPEG and WebP formats (not PNG). |
| Full Page | Whether to capture the entire scrollable page (true) or just the visible viewport (false). |
| Query Parameters | Additional query parameters to append to the URL before loading the page. |
| Batch Size | Number of pages to open simultaneously; higher values use more memory and CPU. |
| Browser WebSocket Endpoint | WebSocket URL to connect to an existing browser instance instead of launching a new one. |
| Emulate Device | Optionally emulate a specific device's viewport and user agent. |
| Executable Path | Path to a custom browser executable to use instead of the bundled one. |
| Extra Headers | Custom HTTP headers to send with the page request. |
| File Name | Filename to assign to the binary data output (useful for saving files downstream). |
| Launch Arguments | Additional command line arguments to pass when launching the browser. |
| Timeout | Maximum navigation time in milliseconds; 0 disables timeout. |
| Wait Until | Event to wait for before considering navigation complete: load, domcontentloaded, networkidle0, networkidle2. |
| Page Caching | Enable or disable page-level caching (default enabled). |
| Headless mode | Run browser in headless mode (default true). |
| Use Chrome Headless Shell | Run browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH). |
| Stealth mode | Apply techniques to make headless browser detection harder. |
| Proxy Server | Proxy server configuration string (e.g., localhost:8080, socks5://localhost:1080). |
| Add Container Arguments | Add recommended launch arguments for container environments (--no-sandbox, etc.). |
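
The interaction between Type, Quality, Full Page, and Query Parameters can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the names `buildScreenshotOptions` and `appendQueryParameters` are hypothetical, and the only assumption carried over from Puppeteer is that its screenshot call accepts `type`, `quality`, and `fullPage` options, with `quality` not applicable to PNG.

```typescript
// Hypothetical types mirroring the Type, Quality, and Full Page properties.
type ImageType = "png" | "jpeg" | "webp";

interface ScreenshotOptions {
  type: ImageType;
  fullPage: boolean;
  quality?: number;
}

// Build the options object, dropping quality for PNG because it only
// applies to the lossy formats (JPEG and WebP).
function buildScreenshotOptions(
  type: ImageType,
  fullPage: boolean,
  quality?: number,
): ScreenshotOptions {
  const options: ScreenshotOptions = { type, fullPage };
  if (type !== "png" && quality !== undefined) {
    if (quality < 0 || quality > 100) {
      throw new RangeError("quality must be between 0 and 100");
    }
    options.quality = quality;
  }
  return options;
}

// Append extra query parameters to the URL before the page is loaded.
function appendQueryParameters(
  url: string,
  params: Record<string, string>,
): string {
  const target = new URL(url);
  for (const [name, value] of Object.entries(params)) {
    target.searchParams.append(name, value);
  }
  return target.toString();
}
```

Under these assumptions, requesting a PNG with a quality value simply ignores the quality, while the same request for JPEG or WebP keeps it.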
Output
The node outputs items containing binary data representing the screenshot image. The binary property name is configurable via the "Property Name" input. Each item includes:
- binary: Contains the image data in the specified format (PNG, JPEG, or WebP).
- json: Metadata about the response, including:
  - headers: HTTP response headers from the page request.
  - statusCode: HTTP status code of the page response.
  - url: The final URL loaded (including any query parameters).
The binary data can be used downstream for saving to disk, uploading, or further processing.
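
To make the item shape concrete, the sketch below models one output item. The `json` fields come from the description above; the layout of the binary entry (`data` as a base64 string, plus `mimeType` and `fileName`) is an assumption based on how n8n commonly represents binary data, and `toScreenshotItem` is a hypothetical helper rather than part of the node.

```typescript
type ImageType = "png" | "jpeg" | "webp";

// Assumed shape of a single binary entry (base64 payload plus metadata).
interface BinaryEntry {
  data: string;
  mimeType: string;
  fileName?: string;
}

interface ScreenshotItem {
  json: {
    headers: Record<string, string>;
    statusCode: number;
    url: string;
  };
  binary: Record<string, BinaryEntry>;
}

// Standard MIME types for the three supported image formats.
const MIME_TYPES: Record<ImageType, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

// Assemble an item, storing the image under the configured property name.
function toScreenshotItem(
  propertyName: string,
  base64Image: string,
  type: ImageType,
  response: { headers: Record<string, string>; statusCode: number; url: string },
  fileName?: string,
): ScreenshotItem {
  const entry: BinaryEntry = { data: base64Image, mimeType: MIME_TYPES[type] };
  if (fileName !== undefined) {
    entry.fileName = fileName;
  }
  return {
    json: { ...response },
    binary: { [propertyName]: entry },
  };
}
```

A downstream step would then read the image from `item.binary[propertyName]`, using whatever value was set in the "Property Name" input.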
Dependencies
- Requires Puppeteer and puppeteer-extra libraries for browser automation.
- Uses puppeteer-extra-plugin-stealth if stealth mode is enabled.
- Supports connecting to an existing browser instance via WebSocket endpoint or launching a new Chromium browser.
- No internal credential types are required, but if accessing protected pages, appropriate authentication headers or proxy settings may be needed.
- Environment variables can influence behavior, e.g., enabling stdout logging or allowing external modules.
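
The choice between connecting to an existing browser and launching a new one could be planned roughly as below. This is a sketch under stated assumptions: `BrowserSettings` and `planBrowserStartup` are hypothetical names, the rule that a WebSocket endpoint takes precedence over launching is assumed, and only `--no-sandbox` is added for containers because the full list of container arguments is not given here. The resulting plan would be handed to Puppeteer's `connect` or `launch` call; `--proxy-server` is the standard Chromium switch for a proxy.

```typescript
// Hypothetical settings object; field names are illustrative only.
interface BrowserSettings {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless: boolean;
  launchArguments: string[];
  proxyServer?: string;
  addContainerArguments: boolean;
}

type StartupPlan =
  | { mode: "connect"; browserWSEndpoint: string }
  | { mode: "launch"; headless: boolean; executablePath?: string; args: string[] };

function planBrowserStartup(settings: BrowserSettings): StartupPlan {
  // Assumption: an existing browser is preferred whenever an endpoint is set.
  if (settings.browserWSEndpoint) {
    return { mode: "connect", browserWSEndpoint: settings.browserWSEndpoint };
  }

  // Copy the user-supplied arguments so the input is not mutated.
  const args = [...settings.launchArguments];
  if (settings.addContainerArguments && !args.includes("--no-sandbox")) {
    args.push("--no-sandbox");
  }
  if (settings.proxyServer) {
    args.push(`--proxy-server=${settings.proxyServer}`);
  }

  return {
    mode: "launch",
    headless: settings.headless,
    ...(settings.executablePath ? { executablePath: settings.executablePath } : {}),
    args,
  };
}
```

Keeping the decision in a pure function like this makes the launch configuration easy to inspect before any browser process is started.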
Troubleshooting
- Invalid URL error: If the provided URL is malformed, the node will throw an error indicating an invalid URL. Ensure URLs are properly formatted.
- Navigation timeout: If the page takes longer than the configured timeout to load, a timeout error occurs. Increase the timeout or check network conditions.
- Failed to launch/connect to browser: Errors launching Chromium or connecting to a WebSocket endpoint indicate misconfiguration or missing dependencies. Verify executable paths, WebSocket URLs, and that Chromium is installed.
- Permission errors in container environments: If running inside containers, ensure container-specific launch arguments are enabled (Add Container Arguments) to avoid sandboxing issues.
- Stealth mode issues: Enabling stealth mode may cause unexpected behavior on some sites; disable it if problems arise.
- Memory/CPU overload: Setting a high batch size opens many pages simultaneously, which can exhaust system resources. Reduce batch size if performance degrades.
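
Two of the issues above, malformed URLs and oversized batches, can be caught before the node runs. The helpers below are a hypothetical pre-check, not the node's own logic: `isLoadableUrl` relies on the standard `URL` constructor throwing for malformed input and additionally assumes only `http:` and `https:` pages are wanted, while `toBatches` shows how a batch size limits how many pages are handled at once.

```typescript
// Return true only for well-formed http(s) URLs.
// Restricting the protocol is an assumption made for this sketch.
function isLoadableUrl(candidate: string): boolean {
  try {
    const parsed = new URL(candidate);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // The URL constructor throws for malformed input.
    return false;
  }
}

// Split a list of URLs into groups of at most batchSize entries,
// so that only one group is processed at a time.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}
```

With five URLs and a batch size of 2, this yields three groups of sizes 2, 2, and 1; lowering the batch size trades throughput for lower peak memory and CPU use.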