Overview
This node uses Puppeteer to control a browser and interact with web pages programmatically. It supports operations such as fetching page content, taking screenshots, and generating PDFs. Typical uses include web scraping, automated testing, and capturing visual records of pages: for example, extracting the HTML of a URL, capturing a screenshot of a page for documentation, or creating a PDF report of a page.
Use Case Examples
- Extract HTML content from a product page to analyze pricing information.
- Capture a screenshot of a webpage to monitor visual changes over time.
- Generate a PDF of a webpage for offline reading or archiving.
Properties
| Name | Meaning |
|---|---|
| URL | Address of the page to load. Required for fetching page content, taking screenshots, and generating PDFs. |
| Query Parameters | Optional query parameters to append to the URL when making the request. |
| Options | Additional Puppeteer settings, including batch size, browser connection details, device emulation, headers, timeout, caching, headless mode, stealth mode, and proxy settings, among others. |
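The URL and Query Parameters properties combine into a single request address. The sketch below assumes the node appends parameters with standard form encoding, as the WHATWG `URL` API does; `buildRequestUrl` is a hypothetical helper for illustration, not part of the node.

```typescript
// Hypothetical helper: merges query parameters into a base URL.
// Assumes standard application/x-www-form-urlencoded serialization
// (spaces become "+"); the node's actual behavior may differ.
function buildRequestUrl(
  baseUrl: string,
  queryParameters: Record<string, string> = {},
): string {
  // Throws a TypeError if baseUrl is not a valid absolute URL.
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(queryParameters)) {
    // append() keeps any parameters already present in baseUrl.
    url.searchParams.append(name, value);
  }
  return url.toString();
}

console.log(
  buildRequestUrl("https://example.com/products?page=2", { q: "red shoes", sort: "price" }),
); // → https://example.com/products?page=2&q=red+shoes&sort=price
```

Because `new URL()` throws on malformed input, a check like this surfaces an invalid URL before any browser is launched.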
Output
Binary
When the operation captures a screenshot or generates a PDF, the resulting file is returned as binary data.
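Downstream steps sometimes need to tell whether a binary item holds an image or a PDF. One node-independent way is to inspect the file's leading "magic" bytes. `detectBinaryType` is a hypothetical helper; it covers PNG and JPEG because Puppeteer screenshots are commonly produced in those formats (Puppeteer can also emit WebP, which this sketch does not detect).

```typescript
type BinaryKind = "png" | "jpeg" | "pdf" | "unknown";

// Hypothetical helper: identifies a file by its leading signature bytes.
function detectBinaryType(bytes: Uint8Array): BinaryKind {
  // True when the buffer begins with every byte of the given signature.
  const startsWith = (signature: number[]): boolean =>
    signature.length <= bytes.length &&
    signature.every((byte, i) => bytes[i] === byte);

  // PNG signature: 89 50 4E 47 0D 0A 1A 0A
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // JPEG signature: FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  // PDF signature: the ASCII text "%PDF-"
  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf";
  return "unknown";
}

console.log(detectBinaryType(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31]))); // → pdf
```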
JSON
- body: HTML content of the fetched page (Get Page Content operation).
- headers: HTTP response headers returned for the page request.
- statusCode: HTTP status code of the page response.
- url: Final URL of the page after any redirects.
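When the JSON output is processed in code, its fields can be described with a TypeScript interface. The field names come from the output description; the types are assumptions (for example, that headers is a flat string map), `isSuccess` and `wasRedirected` are hypothetical helpers, and the sample object is made-up data.

```typescript
// Assumed shape of the JSON output; types are inferred, not confirmed.
interface PageContentOutput {
  body: string;                    // HTML of the page (Get Page Content)
  headers: Record<string, string>; // HTTP response headers
  statusCode: number;              // HTTP status code
  url: string;                     // final URL after redirects
}

// True for any 2xx status code.
function isSuccess(output: PageContentOutput): boolean {
  return output.statusCode >= 200 && output.statusCode < 300;
}

// Compares the requested URL with the final URL to detect a redirect.
// Both are normalized by the URL parser, so "https://a.com" equals "https://a.com/".
function wasRedirected(requestedUrl: string, output: PageContentOutput): boolean {
  return new URL(requestedUrl).href !== new URL(output.url).href;
}

// Made-up sample output for illustration.
const sample: PageContentOutput = {
  body: "<html><body>Hello</body></html>",
  headers: { "content-type": "text/html" },
  statusCode: 200,
  url: "https://example.com/home",
};

console.log(isSuccess(sample));                             // → true
console.log(wasRedirected("https://example.com/", sample)); // → true
```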
Dependencies
- puppeteer-extra
- puppeteer-extra-plugin-stealth
- puppeteer
Troubleshooting
- Ensure the URL is valid and accessible; invalid URLs cause request failures.
- Timeout errors occur when a page takes longer to load than the configured timeout; increase the timeout option for slow pages.
- Launching the browser may fail if the executable path is incorrect or dependencies are missing.
- Stealth mode relies on the puppeteer-extra and puppeteer-extra-plugin-stealth packages; if it causes errors, disable stealth mode.
- Proxy server settings must be correctly formatted; invalid proxies cause connection failures.
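A proxy value can be sanity-checked before the workflow runs. This sketch assumes proxies are written as scheme://host:port with one of the schemes http, https, socks4, or socks5; that format, the scheme list, and the `isValidProxy` helper are assumptions rather than the node's documented rules.

```typescript
// Hypothetical validator for proxy strings of the form scheme://host:port.
// IPv6 literals and embedded credentials are deliberately not handled.
const PROXY_PATTERN = /^(http|https|socks4|socks5):\/\/([A-Za-z0-9.-]+):(\d{1,5})$/;

function isValidProxy(proxy: string): boolean {
  const match = PROXY_PATTERN.exec(proxy.trim());
  if (match === null) return false;
  // The regex allows up to five digits, so check the real TCP port range.
  const port = Number(match[3]);
  return port >= 1 && port <= 65535;
}

console.log(isValidProxy("http://127.0.0.1:8080"));         // → true
console.log(isValidProxy("http://proxy.example.com:70000")); // → false
```

Chromium's own proxy parser is more permissive than this check, so a value rejected here is not necessarily invalid for the browser; the point is to catch obvious typos early.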
Links
- Puppeteer Documentation - Official documentation for Puppeteer, the underlying library used by this node.
- puppeteer-extra GitHub - Repository for puppeteer-extra, which extends Puppeteer with plugins like stealth mode.