Capture

Capture website screenshots, generate PDFs, extract content and metadata


Overview

This node captures data from webpages through an external capture service. It supports four operations: taking screenshots, generating PDFs, extracting webpage content, and retrieving metadata. It is useful for automating web monitoring, archiving pages, or gathering structured information from websites.

For example:

Automatically take a screenshot of a product page to archive its current state.
Generate a PDF report of a webpage for offline review.
Extract the main content or metadata (like title, description) from a news article for further processing.

Properties

Name	Meaning
URL	The URL of the webpage to capture.
Additional Options	A collection of optional settings to customize the capture:
- Best Format	Whether to automatically select the optimal image format.
- Block Ads	Whether to block advertisements on the page.
- Block Cookie Banners	Whether to automatically dismiss cookie consent banners.
- Block Trackers	Whether to block tracking scripts.
- Bypass Bot Detection	Whether to bypass bot detection systems.
- Dark Mode	Whether to enable dark mode rendering.
- Emulate Device	Specify a device name (e.g., "iPhone X") to emulate for screenshots.
- File Name	Custom filename for the saved file.
- Fresh	Whether to force a new capture, ignoring any cached results.
- HTTP Authentication	HTTP Basic Authentication credentials encoded in base64url format.
- Mobile	Whether to emulate a mobile device.
- User Agent	Custom user agent string to use when accessing the webpage.
- Wait For ID	Element ID to wait for before capturing.
- Wait For Selector	CSS selector to wait for before capturing.
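
As a rough illustration of how these options might be turned into a request, the sketch below encodes HTTP Basic Authentication credentials as base64url and serializes a few options into a query string. The parameter names (darkMode, blockAds, fresh, waitFor, httpAuth), the CaptureOptions type, and both helper functions are assumptions made for illustration, not taken from the node's source. Check the capture service's API reference for the real names.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical subset of the Additional Options; the key names are assumptions.
interface CaptureOptions {
  darkMode?: boolean;
  blockAds?: boolean;
  fresh?: boolean;
  waitFor?: string; // CSS selector ("Wait For Selector")
  httpAuth?: string; // base64url of "username:password"
}

// Encode Basic Auth credentials as unpadded, URL-safe base64 (base64url).
function encodeHttpAuth(username: string, password: string): string {
  return Buffer.from(`${username}:${password}`, "utf8").toString("base64url");
}

// Serialize the target URL plus every option that is actually set.
// Options that are undefined, false, or empty are left out of the query.
function buildQuery(url: string, options: CaptureOptions = {}): string {
  const pairs: string[] = [`url=${encodeURIComponent(url)}`];
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === false || value === "") continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return pairs.join("&");
}

const query = buildQuery("https://example.com/", {
  darkMode: true,
  blockAds: false,
  httpAuth: encodeHttpAuth("user", "pass"),
});
console.log(query);
```

Skipping unset options keeps the query short and means a disabled toggle behaves the same as one that was never configured.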

Output

The node outputs JSON data describing the result of the capture operation. The structure varies depending on the operation:

Metadata Operation: Outputs metadata extracted from the webpage, merged into the JSON output along with the original URL and operation type.
Screenshot Operation: Outputs either a URL to the captured image or binary image data (PNG, JPG, or WebP) if configured to download the file.
PDF Operation: Outputs either a URL to the generated PDF or binary PDF data if configured to download the file.
Content Operation: Outputs the extracted content of the webpage as JSON.

If binary data is included (for screenshots or PDFs), it is attached under a binary property named "data" with the appropriate MIME type.
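
To make the output shape more concrete, the sketch below assembles an item with a JSON part and a binary attachment under the "data" property. The json/binary layout with base64 data, mimeType, and fileName fields follows the way n8n items generally carry binary data. The CaptureItem type, the buildBinaryItem helper, and the JSON field names are hypothetical and simplified.

```typescript
// MIME types for the file formats named above.
const MIME_TYPES = {
  png: "image/png",
  jpg: "image/jpeg",
  webp: "image/webp",
  pdf: "application/pdf",
} as const;

type CaptureFormat = keyof typeof MIME_TYPES;

// Simplified stand-in for an n8n item (hypothetical, not the real n8n type).
interface CaptureItem {
  json: { url: string; operation: string };
  binary?: { data: { data: string; mimeType: string; fileName: string } };
}

// Attach an already base64-encoded file under the binary property "data".
function buildBinaryItem(
  url: string,
  operation: string,
  base64Data: string,
  format: CaptureFormat,
  fileName = "capture",
): CaptureItem {
  return {
    json: { url, operation },
    binary: {
      data: {
        data: base64Data,
        mimeType: MIME_TYPES[format],
        fileName: `${fileName}.${format}`,
      },
    },
  };
}

// "JVBERi0=" is a short placeholder payload, not a complete PDF.
const item = buildBinaryItem("https://example.com/", "pdf", "JVBERi0=", "pdf", "report");
console.log(item.binary?.data.mimeType, item.binary?.data.fileName);
```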

Dependencies

Requires a credential for the external capture service (API key and secret).
Uses the capture service's API at https://cdn.capture.page.
Relies on helper functions for URL validation, building capture URLs, downloading binary content, and creating binary data within n8n.
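
The helper functions themselves are not shown here. As a minimal sketch of what building a capture URL could look like, the code below assumes a layout of https://cdn.capture.page/<key>/<token>/<endpoint>?<query>, where the token is an MD5 hash of the API secret followed by the query string. That layout, the endpoint names, and the buildCaptureUrl function are all assumptions and should be confirmed against the capture service's documentation.

```typescript
import { createHash } from "node:crypto";

const BASE_URL = "https://cdn.capture.page";

// Assumed mapping from node operation to URL path segment.
const ENDPOINTS = {
  screenshot: "image",
  pdf: "pdf",
  content: "content",
  metadata: "metadata",
} as const;

type Operation = keyof typeof ENDPOINTS;

// Build a signed capture URL (assumed scheme: token = md5(secret + query)).
function buildCaptureUrl(
  key: string,
  secret: string,
  operation: Operation,
  targetUrl: string,
): string {
  const query = `url=${encodeURIComponent(targetUrl)}`;
  const token = createHash("md5").update(secret + query).digest("hex");
  return `${BASE_URL}/${key}/${token}/${ENDPOINTS[operation]}?${query}`;
}

const captureUrl = buildCaptureUrl("my-key", "my-secret", "screenshot", "https://example.com/");
console.log(captureUrl);
```

Under this assumed scheme the secret never appears in the URL itself; only the key and the derived token do.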

Troubleshooting

Invalid URL Error: If the provided URL is malformed or invalid, the node throws an error indicating "Invalid URL". Ensure the URL is correctly formatted and accessible.
Unsupported Operation Error: If an unsupported operation is specified, the node will throw an error. Verify that the operation is one of: screenshot, pdf, content, or metadata.
API Authentication Issues: If the capture service rejects the credentials, requests fail. Confirm that the API key and secret are correctly configured.
Timeouts or Delays: If the webpage takes too long to load or required elements do not appear, set the "Wait For ID" or "Wait For Selector" option so the capture waits for a specific element, or increase the delay setting if the node exposes one.
Binary Data Handling: When outputting binary files, ensure downstream nodes can handle binary data properly.
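
The first two checks in this list can be sketched as a small validation step. The validateInput function, the restriction to http and https, and the wording of the unsupported-operation message are illustrative assumptions. Only the "Invalid URL" text and the four operation names come from the description above.

```typescript
const SUPPORTED_OPERATIONS = ["screenshot", "pdf", "content", "metadata"] as const;
type SupportedOperation = (typeof SUPPORTED_OPERATIONS)[number];

// Type guard: narrows an arbitrary string to one of the four operations.
function isSupportedOperation(value: string): value is SupportedOperation {
  return (SUPPORTED_OPERATIONS as readonly string[]).includes(value);
}

// Reject malformed or non-HTTP(S) URLs and unknown operations before any request is made.
function validateInput(url: string, operation: string): SupportedOperation {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    throw new Error("Invalid URL");
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("Invalid URL");
  }
  if (!isSupportedOperation(operation)) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  return operation;
}

console.log(validateInput("https://example.com/", "pdf"));
```

Validating up front turns a confusing failure from the capture service into a clear error at the node itself.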

Links and References

Capture Page API Documentation
n8n Documentation on Binary Data
HTTP Basic Authentication
