Capture

Capture website screenshots, generate PDFs, extract content and metadata


Overview

This node captures website data in various formats including screenshots, PDFs, webpage content, and metadata. It is useful for automating the extraction of visual or textual information from web pages without manual browsing.

Common scenarios include:

Generating PDF reports of webpages for archiving or sharing.
Taking screenshots of websites for monitoring or documentation.
Extracting raw HTML content or metadata for analysis or integration.
Automating capture workflows triggered by other processes.

For example, you can use this node to generate a PDF snapshot of a product page after a delay to ensure all dynamic content loads, or capture a full-page screenshot of a news article for record keeping.

Properties

Name	Meaning
URL	The URL of the webpage to capture.
Format	PDF page format. Options: A3, A4, A5, Legal, Letter, Tabloid.
Orientation	PDF page orientation. Options: Portrait, Landscape.
Full Page	Whether to generate the PDF from the full scrollable page (true/false).
Delay	Delay in seconds before generating the PDF (0–30).
Output	How to return the PDF data. Options: Binary Data (download PDF), URL Only (return PDF URL).
Scale	Scale factor for PDF rendering (0.1–2).
Print Background	Whether to print background graphics in the PDF (true/false).
Margins	PDF page margins with options for Top, Right, Bottom, Left (e.g., "1cm").
Additional Options	Collection of advanced settings including:
	- Best Format: Automatically select optimal image format (boolean)
	- Block Ads: Block advertisements (boolean)
	- Block Cookie Banners: Dismiss cookie banners automatically (boolean)
	- Block Trackers: Block tracking scripts (boolean)
	- Bypass Bot Detection: Bypass bot detection systems (boolean)
	- Dark Mode: Enable dark mode (boolean)
	- Emulate Device: Device name to emulate for screenshots (string, e.g., "iPhone X")
	- File Name: Custom filename for saved file (string)
	- Fresh: Force new capture ignoring cache (boolean)
	- HTTP Authentication: Base64url-encoded "username:password" string for HTTP Basic Auth (string)
	- Mobile: Emulate mobile device (boolean)
	- User Agent: Custom user agent string (string)
	- Wait For ID: Element ID to wait for before capturing (string)
	- Wait For Selector: CSS selector to wait for before capturing (string)
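The documented ranges (Delay 0–30 seconds, Scale 0.1–2) and the fixed list of page formats can be checked before any request is sent. The following is a minimal sketch, assuming the options arrive as a plain object; the field names and the validatePdfOptions helper are hypothetical and are not the node's internal parameter names.

```typescript
// Allowed page formats, copied from the Properties table.
const PDF_FORMATS = ["A3", "A4", "A5", "Legal", "Letter", "Tabloid"] as const;
type PdfFormat = (typeof PDF_FORMATS)[number];

// Hypothetical parameter shape; the field names are illustrative only.
interface PdfOptions {
  url: string;
  format: PdfFormat;
  orientation: "portrait" | "landscape";
  fullPage: boolean;
  delay: number; // seconds, documented range 0–30
  output: "binary" | "url";
  scale: number; // documented range 0.1–2
  printBackground: boolean;
}

// Collects every problem instead of stopping at the first one, so all
// invalid fields can be fixed in one pass. An empty array means "valid".
function validatePdfOptions(opts: PdfOptions): string[] {
  const problems: string[] = [];
  // Runtime check, since options may arrive as untyped JSON.
  if ((PDF_FORMATS as readonly string[]).indexOf(opts.format) === -1) {
    problems.push(`unsupported format: ${opts.format}`);
  }
  if (!Number.isFinite(opts.delay) || opts.delay < 0 || opts.delay > 30) {
    problems.push(`delay must be 0–30 seconds, got ${opts.delay}`);
  }
  if (!Number.isFinite(opts.scale) || opts.scale < 0.1 || opts.scale > 2) {
    problems.push(`scale must be 0.1–2, got ${opts.scale}`);
  }
  return problems;
}
```

Returning a list rather than throwing on the first error lets a workflow report every out-of-range field at once.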

Output

The node outputs JSON data describing the capture operation and its parameters, along with either:

Binary data containing the PDF file, if "Binary Data" output is selected. Downstream nodes can save or further process this binary data.
A URL pointing to the generated PDF if "URL Only" output is selected.

The JSON output includes fields such as:

url: The URL of the generated PDF.
operation: The operation performed ("pdf").
format: The PDF page format used.
orientation: The page orientation.
fullPage: Whether full page was captured.

If binary output is selected, the PDF file is attached under the binary property named "data".
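The shape of a single output item described above can be sketched as follows. This is an illustration only: BinaryEntry and OutputItem are simplified, hypothetical stand-ins for n8n's own item types (the real node creates binary data through n8n's helpers), and the default file name "capture.pdf" is an assumption.

```typescript
// Simplified stand-in for an n8n binary entry (hypothetical).
interface BinaryEntry {
  data: string; // base64-encoded file contents
  mimeType: string;
  fileName: string;
}

// JSON fields listed in the Output section, plus the optional "data" binary property.
interface OutputItem {
  json: {
    url: string;
    operation: "pdf";
    format: string;
    orientation: string;
    fullPage: boolean;
  };
  binary?: { data: BinaryEntry };
}

// Builds one item. pdfBase64 is passed only when "Binary Data" output is selected;
// with "URL Only" the item carries JSON alone.
function buildPdfItem(
  pdfUrl: string,
  opts: { format: string; orientation: string; fullPage: boolean; fileName?: string },
  pdfBase64?: string,
): OutputItem {
  const item: OutputItem = {
    json: {
      url: pdfUrl,
      operation: "pdf",
      format: opts.format,
      orientation: opts.orientation,
      fullPage: opts.fullPage,
    },
  };
  if (pdfBase64 !== undefined) {
    item.binary = {
      data: {
        data: pdfBase64,
        mimeType: "application/pdf",
        fileName: opts.fileName ?? "capture.pdf", // hypothetical default name
      },
    };
  }
  return item;
}
```

Keeping the JSON fields identical in both modes means downstream nodes can always read the PDF URL, whether or not the file itself was downloaded.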

Dependencies

Requires a Capture API credential (API key and secret) for authentication with the external Capture API service.
The node communicates with the Capture API at https://cdn.capture.page.
No additional environment variables are required beyond the API credentials.
Uses internal helper functions to build URLs, download binary content, and create binary data objects.
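The URL-building step can be illustrated with a minimal sketch that passes options as query parameters. The path layout, the parameter names, and the helpers buildCaptureUrl and toBase64Url are all assumptions; the real API may also require a request signature derived from the secret, which this sketch does not compute. toBase64Url shows one way to produce the Base64url value expected by the HTTP Authentication option.

```typescript
type QueryValue = string | number | boolean | undefined;

// Base64url = standard base64 with "+" -> "-" and "/" -> "_".
// Stripping "=" padding is common for Base64url but is an assumption here.
// btoa only handles Latin-1 input, which is fine for ASCII credentials.
function toBase64Url(text: string): string {
  return btoa(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical helper: joins a base URL, a path, and option values into one URL.
function buildCaptureUrl(
  base: string,
  path: string,
  params: Record<string, QueryValue>,
): string {
  const url = new URL(path, base);
  for (const key of Object.keys(params)) {
    const value = params[key];
    // Omit unset options so the service can fall back to its defaults.
    if (value === undefined) continue;
    // URLSearchParams percent-encodes values such as nested URLs.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

For example, buildCaptureUrl("https://cdn.capture.page", "/KEY/pdf", { url: "https://example.com", delay: 3, httpAuth: toBase64Url("user:pass") }) yields a URL whose query string carries the target page, the delay, and the encoded credentials (KEY and httpAuth are placeholders).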

Troubleshooting

Invalid URL error: If the provided URL is malformed, the node throws an "Invalid URL" error. Ensure the URL is correctly formatted and accessible.
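A pre-check that surfaces the same "Invalid URL" message before a request is made might look like the sketch below. assertValidTargetUrl is a hypothetical helper built on the standard URL constructor, which throws on input it cannot parse (for example a bare "example.com" with no scheme). Rejecting non-HTTP(S) schemes is an extra assumption, not documented node behavior, and reachability cannot be checked this way.

```typescript
// Hypothetical validator mirroring the node's "Invalid URL" error.
function assertValidTargetUrl(input: string): URL {
  let parsed: URL;
  try {
    // The URL constructor throws a TypeError for unparseable input.
    parsed = new URL(input.trim());
  } catch {
    throw new Error("Invalid URL");
  }
  // Assumption: only web pages are capturable, so restrict to http(s).
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("Invalid URL");
  }
  return parsed;
}
```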
Unsupported operation error: If an unsupported operation is specified, the node throws an error saying so. Use one of the supported operations: "pdf", "screenshot", "content", or "metadata".
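The check against the four supported operations can be sketched with a type guard. isSupportedOperation and describeOperation are hypothetical helpers, and the error text is illustrative rather than the node's exact message; the comparison is case-sensitive, so "PDF" is rejected.

```typescript
const SUPPORTED_OPERATIONS = ["pdf", "screenshot", "content", "metadata"] as const;
type Operation = (typeof SUPPORTED_OPERATIONS)[number];

// Type guard: narrows an arbitrary string to one of the supported operations.
function isSupportedOperation(op: string): op is Operation {
  return (SUPPORTED_OPERATIONS as readonly string[]).indexOf(op) !== -1;
}

// A Record keyed by Operation forces every supported operation to have an entry.
const OPERATION_DESCRIPTIONS: Record<Operation, string> = {
  pdf: "render the page to a PDF",
  screenshot: "capture an image of the page",
  content: "extract the page's HTML content",
  metadata: "extract the page's metadata",
};

// Hypothetical dispatcher: rejects unknown operations before doing any work.
function describeOperation(op: string): string {
  if (!isSupportedOperation(op)) {
    throw new Error(`The operation "${op}" is not supported`);
  }
  return OPERATION_DESCRIPTIONS[op];
}
```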
API authentication errors: If the API key or secret is missing or incorrect, requests to the Capture API will fail. Verify that valid credentials are configured.
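Timeouts or delays: If the webpage takes too long to load and the capture is incomplete, increase the Delay parameter (0–30 seconds) to allow full rendering, or use Wait For Selector or Wait For ID so the capture starts once a specific element appears.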
Binary data handling: When selecting binary output, downstream nodes must support binary data input to process the PDF correctly.
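Before passing the binary data on, a downstream step can confirm that the payload really is a PDF and not, for example, an HTML error page. The sketch below assumes the content is available as a standard base64 string; looksLikePdf is a hypothetical helper, and the check relies only on the fact that PDF files begin with the "%PDF-" signature.

```typescript
// PDF files begin with the ASCII signature "%PDF-".
const PDF_SIGNATURE = "%PDF-";

// Returns true if a standard base64 payload decodes to bytes that start with
// the PDF signature. Invalid base64 returns false instead of throwing.
function looksLikePdf(base64Data: string): boolean {
  try {
    // 12 base64 characters decode to the first 9 bytes, enough for the check.
    const head = atob(base64Data.slice(0, 12));
    return head.startsWith(PDF_SIGNATURE);
  } catch {
    return false;
  }
}
```

Only the first few bytes are decoded, so the check stays cheap even for large PDFs.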

