Puppeteer

Automate browser interactions using Puppeteer

Actions: 4

Overview

This node uses Puppeteer, a browser automation library that drives Chrome or Chromium (headless by default), to interact with web pages programmatically. The "Get Screenshot" operation captures a screenshot of a web page from its URL. It can capture either the full scrollable page or only the visible viewport, in PNG, JPEG, or WebP format, and lets you set the image quality for JPEG and WebP.

Common use cases include:

Automatically generating website previews or thumbnails.
Monitoring visual changes on web pages over time.
Archiving web page appearances for compliance or record-keeping.
Creating images for social media sharing or reports.

For example, you can input a URL of a product page and get a PNG screenshot of the entire scrollable page, which can then be used in marketing materials or automated reports.
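
As a rough illustration of how the format, quality, and full-page choices interact, the sketch below builds an options object shaped like the one Puppeteer's page.screenshot() accepts (type, quality, fullPage). The helper name buildScreenshotOptions and the clamping behavior are assumptions made for this example, not the node's actual implementation.

```typescript
// Formats named in this document; the type and helper names are illustrative.
type ImageType = "png" | "jpeg" | "webp";

interface ScreenshotOptions {
  type: ImageType;
  fullPage: boolean;
  quality?: number; // only meaningful for JPEG and WebP
}

// Build an options object shaped like the argument of page.screenshot().
function buildScreenshotOptions(
  type: ImageType,
  fullPage: boolean,
  quality?: number,
): ScreenshotOptions {
  const options: ScreenshotOptions = { type, fullPage };
  // Quality does not apply to PNG, so it is dropped rather than passed through.
  if (type !== "png" && quality !== undefined) {
    // Clamp to the documented 0-100 range (an assumption of this sketch).
    options.quality = Math.min(100, Math.max(0, Math.round(quality)));
  }
  return options;
}

// Full-page PNG of a product page: the quality value is ignored.
const productShot = buildScreenshotOptions("png", true, 80);
```

With a real Puppeteer page object, the result could be passed along as page.screenshot(productShot).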

Properties

Name	Meaning
URL	The web address of the page to capture.
Property Name	The name of the binary property where the screenshot image data will be stored.
Type	The image format for the screenshot: PNG, JPEG, or WebP.
Quality	Image quality from 0 to 100; applies only to JPEG and WebP formats (not PNG).
Full Page	Whether to capture the entire scrollable page (true) or just the visible viewport (false).
Query Parameters	Additional query parameters to append to the URL before loading the page.
Batch Size	Number of pages to open simultaneously; higher values use more memory and CPU.
Browser WebSocket Endpoint	WebSocket URL to connect to an existing browser instance instead of launching a new one.
Emulate Device	Optionally emulate a specific device's viewport and user agent.
Executable Path	Path to a custom browser executable to use instead of the bundled one.
Extra Headers	Custom HTTP headers to send with the page request.
File Name	Filename to assign to the binary data output (useful for saving files downstream).
Launch Arguments	Additional command line arguments to pass when launching the browser.
Timeout	Maximum navigation time in milliseconds; 0 disables timeout.
Wait Until	Event to wait for before considering navigation complete: load, domcontentloaded, networkidle0, networkidle2.
Page Caching	Enable or disable page-level caching (default enabled).
Headless mode	Run browser in headless mode (default true).
Use Chrome Headless Shell	Run browser in headless shell mode (requires headless mode enabled and chrome-headless-shell in PATH).
Stealth mode	Apply techniques to make headless browser detection harder.
Proxy Server	Proxy server configuration string (e.g., localhost:8080, socks5://localhost:1080).
Add Container Arguments	Add recommended launch arguments for container environments (--no-sandbox, etc.).
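
The Query Parameters, Timeout, and Wait Until settings all affect how the page is loaded. The following is a minimal sketch of that step, assuming the standard URL class for appending parameters and an options object shaped like the second argument of Puppeteer's page.goto(). The function names are hypothetical.

```typescript
// Wait Until values listed in the table above.
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

// Append extra query parameters to a URL; existing parameters are kept.
function withQueryParameters(url: string, params: Record<string, string>): string {
  const parsed = new URL(url); // throws a TypeError if the URL is malformed
  for (const [name, value] of Object.entries(params)) {
    parsed.searchParams.append(name, value);
  }
  return parsed.toString();
}

// Options shaped like the second argument of page.goto().
function buildNavigationOptions(timeoutMs: number, waitUntil: WaitUntil) {
  // A timeout of 0 disables the navigation timeout.
  return { timeout: timeoutMs, waitUntil };
}

const finalUrl = withQueryParameters("https://example.com/product?id=7", { ref: "report" });
// finalUrl is "https://example.com/product?id=7&ref=report"
const navigation = buildNavigationOptions(30000, "networkidle2");
```

With a real page object, these would be used as page.goto(finalUrl, navigation).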

Output

The node outputs items containing binary data representing the screenshot image. The binary property name is configurable via the "Property Name" input. Each item includes:

binary: Contains the image data in the specified format (PNG, JPEG, or WebP).
json: Metadata about the response including:
- headers: HTTP response headers from the page request.
- statusCode: HTTP status code of the page response.
- url: The final URL loaded (including any query parameters).

The binary data can be used downstream for saving to disk, uploading, or further processing.
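
To make the item layout concrete, here is a hedged sketch of one output item. The json fields follow the list above; the fields inside binary (data, mimeType, fileName) and the helper toScreenshotItem are assumptions for illustration and may not match the node's real internal representation.

```typescript
type ImageType = "png" | "jpeg" | "webp";

// Metadata fields listed above.
interface ResponseMetadata {
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// Hypothetical shape for one output item; the fields inside `binary` are assumptions.
interface ScreenshotItem {
  binary: Record<string, { data: Uint8Array; mimeType: string; fileName?: string }>;
  json: ResponseMetadata;
}

const MIME_TYPES: Record<ImageType, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

// Store the image under the configured property name, next to the response metadata.
function toScreenshotItem(
  image: Uint8Array,
  type: ImageType,
  propertyName: string,
  metadata: ResponseMetadata,
  fileName?: string,
): ScreenshotItem {
  const entry: { data: Uint8Array; mimeType: string; fileName?: string } = {
    data: image,
    mimeType: MIME_TYPES[type],
  };
  if (fileName !== undefined) entry.fileName = fileName;
  return { binary: { [propertyName]: entry }, json: metadata };
}
```

A downstream step would then read the image from item.binary[propertyName] and the status from item.json.statusCode.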

Dependencies

Requires the Puppeteer and puppeteer-extra libraries for browser automation.
Uses puppeteer-extra-plugin-stealth if stealth mode is enabled.
Supports connecting to an existing browser instance via WebSocket endpoint or launching a new Chromium browser.
No internal credential types are required, but if accessing protected pages, appropriate authentication headers or proxy settings may be needed.
Environment variables can influence behavior, e.g., enabling stdout logging or allowing external modules.
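
The choice between connecting to an existing browser and launching a new one can be sketched as a small decision function. This is an illustration only: the settings names mirror the properties in this document, the resulting objects are shaped like the arguments of puppeteer.connect() and puppeteer.launch(), and the exact container flags beyond --no-sandbox are an assumption.

```typescript
// Settings named after the node properties; this interface is hypothetical.
interface BrowserSettings {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless?: boolean;
  launchArguments?: string[];
  proxyServer?: string;
  addContainerArguments?: boolean;
}

interface LaunchOptions {
  headless: boolean;
  executablePath?: string;
  args: string[];
}

type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | { mode: "launch"; options: LaunchOptions };

// Assumed container flags; this document only names --no-sandbox explicitly.
const CONTAINER_ARGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function planBrowser(settings: BrowserSettings): BrowserPlan {
  // A WebSocket endpoint takes precedence over launching a new browser.
  if (settings.browserWSEndpoint) {
    return { mode: "connect", options: { browserWSEndpoint: settings.browserWSEndpoint } };
  }
  const args = [...(settings.launchArguments ?? [])];
  if (settings.addContainerArguments) args.push(...CONTAINER_ARGS);
  // --proxy-server is a Chromium command line switch.
  if (settings.proxyServer) args.push(`--proxy-server=${settings.proxyServer}`);
  const options: LaunchOptions = { headless: settings.headless ?? true, args };
  if (settings.executablePath) options.executablePath = settings.executablePath;
  return { mode: "launch", options };
}
```

A caller would pass plan.options to puppeteer.connect() when plan.mode is "connect" and to puppeteer.launch() otherwise.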

Troubleshooting

Invalid URL error: If the provided URL is malformed, the node will throw an error indicating an invalid URL. Ensure URLs are properly formatted.
Navigation timeout: If the page takes longer than the configured timeout to load, a timeout error occurs. Increase the timeout or check network conditions.
Failed to launch/connect to browser: Errors launching Chromium or connecting to a WebSocket endpoint indicate misconfiguration or missing dependencies. Verify executable paths, WebSocket URLs, and that Chromium is installed.
Permission errors in container environments: If running inside containers, ensure container-specific launch arguments are enabled (Add Container Arguments) to avoid sandboxing issues.
Stealth mode issues: Enabling stealth mode may cause unexpected behavior on some sites; disable it if problems arise.
Memory/CPU overload: Setting a high batch size opens many pages simultaneously, which can exhaust system resources. Reduce batch size if performance degrades.
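
Two of the issues above, malformed URLs and resource exhaustion from large batches, can be illustrated with a short sketch. It assumes URLs are checked with the standard URL class and that pages inside a batch run concurrently while batches run one after another. The function names and the capture callback are hypothetical stand-ins for the real screenshot step.

```typescript
// Return true when the string parses as an absolute URL.
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate);
    return true;
  } catch (_err) {
    return false;
  }
}

// Split work into groups of at most batchSize items (minimum of 1).
function chunk<T>(items: T[], batchSize: number): T[][] {
  const size = Math.max(1, Math.floor(batchSize));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run a capture function over the URLs, one batch at a time.
async function captureInBatches<R>(
  urls: string[],
  batchSize: number,
  capture: (url: string) => Promise<R>,
): Promise<R[]> {
  for (const url of urls) {
    if (!isValidUrl(url)) throw new Error(`Invalid URL: ${url}`);
  }
  const results: R[] = [];
  for (const batch of chunk(urls, batchSize)) {
    // At most batchSize captures are in flight at any moment.
    const batchResults = await Promise.all(batch.map((url) => capture(url)));
    results.push(...batchResults);
  }
  return results;
}
```

A smaller batch size lowers peak memory and CPU use at the cost of a longer total run time.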
