Puppeteer

Automate browser interactions using Puppeteer

Actions: 4

Overview

This node uses Puppeteer to automate browser actions for web scraping and content retrieval. It supports operations like fetching page HTML content, taking screenshots, generating PDFs, and running custom scripts on web pages. It is useful for scenarios such as extracting data from websites, capturing visual snapshots, or automating interactions with web pages.

Use Case Examples

Extract the HTML content of a webpage for data analysis.
Capture a screenshot of a webpage for visual documentation.
Generate a PDF of a webpage for offline reading or archiving.
Run custom JavaScript on a webpage to interact with dynamic content or scrape specific elements.
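As a rough illustration of how the first three use cases might map onto Puppeteer calls, the sketch below dispatches on an operation name. It is not the node's actual implementation. `PageLike` is a hypothetical structural type loosely modelled on the Puppeteer `Page` methods `goto`, `content`, `screenshot`, and `pdf`, so the block compiles without importing Puppeteer. The operation names, the `load` wait event, and the 30000 ms timeout are assumptions.

```typescript
// Hypothetical minimal type covering only the page methods this sketch uses.
interface PageLike {
  goto(url: string, options?: { waitUntil?: string; timeout?: number }): Promise<unknown>;
  content(): Promise<string>;
  screenshot(): Promise<Uint8Array>;
  pdf(): Promise<Uint8Array>;
}

// Hypothetical operation names; the node's real identifiers may differ.
type Operation = "content" | "screenshot" | "pdf";

// Navigate to the URL, then return HTML text or binary data
// depending on the requested operation.
async function runOperation(
  page: PageLike,
  url: string,
  operation: Operation,
): Promise<string | Uint8Array> {
  // Assumed defaults: wait for the "load" event, give up after 30 seconds.
  await page.goto(url, { waitUntil: "load", timeout: 30000 });
  switch (operation) {
    case "content":
      return page.content();
    case "screenshot":
      return page.screenshot();
    case "pdf":
      return page.pdf();
  }
}
```

With real Puppeteer, the `page` argument would typically come from `browser.newPage()`. Running custom JavaScript would go through `page.evaluate`, which is left out here to keep the sketch small.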

Properties

Name	Meaning
URL	The web page URL to navigate to and interact with.
Query Parameters	Additional query parameters to append to the URL when making the request.
Batch Size	Maximum number of pages to open simultaneously to control resource usage.
Browser WebSocket Endpoint	WebSocket URL to connect to an existing browser instance instead of launching a new one.
Protocol	Protocol to use for browser communication, e.g., Chrome DevTools Protocol or WebDriver BiDi.
Emulate Device	Emulate a specific device's viewport and user agent.
Executable path	Path to a custom browser executable to use instead of the bundled one.
Extra Headers	Custom HTTP headers to send with the page requests.
File Name	File name to assign to binary data outputs like screenshots or PDFs.
Launch Arguments	Additional command line arguments to pass to the browser instance.
Timeout	Maximum navigation time in milliseconds before timing out.
Protocol Timeout	Maximum time to wait for a protocol response in milliseconds.
Wait Until	Event to wait for to consider navigation successful (e.g., load, domcontentloaded).
Page Caching	Enable or disable page-level caching.
Headless mode	Run the browser in headless mode (no UI).
Use Chrome Headless Shell	Run the browser in headless shell mode; requires chrome-headless-shell in PATH.
Stealth mode	Apply techniques to make headless Puppeteer harder to detect.
Human typing mode	Enable human-like typing simulation on input elements.
Human Typing Options	Settings to customize the human typing simulation behavior.
Proxy Server	Custom proxy server configuration for browser requests.
Capture Downloads	Automatically capture and return files downloaded during script execution.
Add Container Arguments	Automatically add recommended Chrome arguments when running in container environments.
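To make the URL, Query Parameters, and Batch Size properties more concrete, here is a minimal sketch of how they could combine. Both `buildUrl` and `runInBatches` are hypothetical helpers written for illustration, not functions exposed by the node, and the batching strategy (finish one batch before starting the next) is an assumption about one simple way to cap simultaneous pages.

```typescript
// Hypothetical helper: append query parameters to a base URL.
function buildUrl(base: string, params: Record<string, string>): string {
  const url = new URL(base);
  for (const key of Object.keys(params)) {
    url.searchParams.append(key, params[key]);
  }
  return url.toString();
}

// Hypothetical helper: run `task` over `items` in consecutive batches so that
// at most `batchSize` tasks are in flight at the same time.
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch to finish before starting the next one.
    const batchResults = await Promise.all(batch.map((item) => task(item)));
    results.push(...batchResults);
  }
  return results;
}
```

For example, `buildUrl("https://example.com/search", { q: "puppeteer", page: "2" })` yields `https://example.com/search?q=puppeteer&page=2`. In a real workflow, the `task` callback would be the place to open a page and navigate to each URL.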

Output

JSON

body - The HTML content of the page (for the Get Page Content operation).
headers - HTTP response headers from the page request.
statusCode - HTTP status code of the page response.
url - The final URL of the page after navigation and redirects.
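The fields listed above can be described with a small type when consuming the output in downstream code. This is a sketch only: the field names come from the list, but the value types (for instance, headers as a flat string map) are assumptions, and `isSuccessful` and `getHeader` are hypothetical helpers.

```typescript
// Shape of one JSON output item; field names follow the list above,
// value types are assumed.
interface PageResult {
  body: string;
  headers: Record<string, string>;
  statusCode: number;
  url: string;
}

// Hypothetical check: treat any 2xx status code as a successful fetch.
function isSuccessful(result: PageResult): boolean {
  return result.statusCode >= 200 && result.statusCode < 300;
}

// Hypothetical helper: look up a response header without caring about case.
function getHeader(result: PageResult, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(result.headers)) {
    if (key.toLowerCase() === wanted) {
      return result.headers[key];
    }
  }
  return undefined;
}

// Made-up sample item for illustration only.
const sample: PageResult = {
  body: "<html><body>Hello</body></html>",
  headers: { "Content-Type": "text/html" },
  statusCode: 200,
  url: "https://example.com/",
};
```

Comparing `url` with the URL originally requested is one way to notice that a redirect took place.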

Dependencies

puppeteer-extra
puppeteer-extra-plugin-stealth
puppeteer-extra-plugin-human-typing
puppeteer
vm2

Troubleshooting

Ensure the URL is valid and accessible to avoid navigation errors.
Check that the browser executable path is correct if using a custom browser.
If running in a container, check whether the Add Container Arguments option should be enabled or disabled for your environment.
Timeout errors can occur if the page takes too long to load; adjust the timeout settings accordingly.
When using stealth mode, some websites may still detect automation; try toggling stealth mode off if issues arise.
If capturing downloads, ensure the download path is writable and has sufficient space.
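The first tip, checking that the URL is valid, can be done before the node ever runs. The sketch below uses the standard `URL` constructor; `isValidHttpUrl` is a hypothetical helper, and restricting the check to `http` and `https` is an assumption rather than a rule the node is known to enforce. It only checks syntax, so it cannot tell whether the page is actually reachable.

```typescript
// Hypothetical helper: return true only for syntactically valid
// http or https URLs.
function isValidHttpUrl(candidate: string): boolean {
  let parsed: URL;
  try {
    // The URL constructor throws on input it cannot parse.
    parsed = new URL(candidate);
  } catch {
    return false;
  }
  // Assumption: only web protocols are wanted here.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```

A value such as `example.com` fails this check because it lacks a protocol, which is a common cause of navigation errors.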
