Capture

Capture website screenshots, generate PDFs, extract content and metadata

Overview

This node captures website data in four forms: screenshots, PDFs, extracted content, and metadata. Use it to document web pages visually, archive them as PDFs, or extract text and metadata for further processing.

Common scenarios include:

  • Generating visual snapshots of webpages for monitoring or reporting.
  • Creating PDF versions of web content for offline reading or archival.
  • Extracting raw HTML content or metadata such as page titles, descriptions, or other SEO-related data.
  • Automating web capture workflows without manual browser interaction.

For example, you could use this node to automatically take daily screenshots of a competitor’s homepage, save them as images or PDFs, and analyze changes over time.

Properties

  • URL: The webpage URL to capture. Must be a valid URL.
  • Additional Options: A collection of optional settings to customize the capture behavior:
      - Best Format: Automatically select the optimal image format for screenshots.
      - Block Ads: Block advertisements on the page during capture.
      - Block Cookie Banners: Automatically dismiss cookie consent banners.
      - Block Trackers: Block tracking scripts used for analytics or ad tracking.
      - Bypass Bot Detection: Attempt to bypass bot detection mechanisms on the target site.
      - Dark Mode: Enable dark mode rendering for the capture.
      - Emulate Device: Specify a device name (e.g., "iPhone X") to emulate its viewport and user agent.
      - File Name: Custom filename prefix for saved files (screenshots or PDFs).
      - Fresh: Force a new capture, ignoring any cached results.
      - HTTP Authentication: Provide HTTP Basic Auth credentials encoded in base64url format for sites requiring authentication.
      - Mobile: Emulate a mobile device viewport and user agent.
      - User Agent: Custom user agent string to use when requesting the page.
      - Wait For ID: Wait for an element with this ID to appear before capturing.
      - Wait For Selector: Wait for an element matching this CSS selector to appear before capturing.
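
The HTTP Authentication option expects credentials in base64url form. The following is a minimal sketch of producing such a value, assuming the input is the usual Basic Auth pair written as username:password. The helper name encode_basic_auth is hypothetical, and whether the service expects trailing "=" padding to be kept or stripped is not stated here, so the sketch leaves the padding in place.

```python
import base64


def encode_basic_auth(username: str, password: str) -> str:
    """Encode 'username:password' as base64url (RFC 4648, section 5).

    Assumption: the credentials are joined with a colon, as in HTTP Basic Auth.
    """
    raw = f"{username}:{password}".encode("utf-8")
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'.
    return base64.urlsafe_b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(encode_basic_auth("user", "pass"))
```

The resulting string can be pasted into the HTTP Authentication field.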

Additional properties are available depending on the operation selected (e.g., viewport size, delay, output format).

Output

The node outputs JSON objects describing the capture result, including:

  • url: The URL used for the capture request.
  • operation: The type of capture performed (screenshot, pdf, content, or metadata).
  • Additional fields depending on the operation, such as format, viewport dimensions, orientation, and full page flag.

If the output mode is set to return binary data (for screenshots or PDFs), the node attaches the captured file as binary data under the key data. This binary data contains the actual image or PDF file content, ready for saving or further processing.

For content and metadata operations, the node returns the extracted data directly in JSON form.
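
As an illustration of consuming this output, the sketch below pulls the file out of an exported item. It assumes the item follows the common n8n layout of a json object plus a binary object whose data entry holds a base64 string, a fileName, and a mimeType. That layout, the fallback file name, and the function name extract_file are assumptions, not details confirmed by this page.

```python
import base64
from typing import Optional, Tuple


def extract_file(item: dict) -> Optional[Tuple[str, bytes]]:
    """Return (file name, raw bytes) for a binary capture, or None.

    Assumed item shape (hypothetical):
      {"json": {...}, "binary": {"data": {"data": "<base64>", "fileName": "..."}}}
    """
    attachment = item.get("binary", {}).get("data")
    if attachment is None:
        # Content and metadata results carry their data in item["json"] only.
        return None
    name = attachment.get("fileName", "capture.bin")
    return name, base64.b64decode(attachment["data"])


if __name__ == "__main__":
    example = {
        "json": {"url": "https://example.com", "operation": "pdf"},
        "binary": {
            "data": {
                "data": base64.b64encode(b"%PDF-1.4").decode("ascii"),
                "fileName": "example.pdf",
                "mimeType": "application/pdf",
            }
        },
    }
    print(extract_file(example))
```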

Dependencies

  • Requires a credential (API key and secret) for the external Capture API service at https://cdn.capture.page.
  • The node uses the API key and secret to build authenticated requests.
  • No additional environment variables are needed beyond the configured API credentials.
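
To make "authenticated requests" more concrete, here is a rough sketch of one way a key and secret could be combined into a signed URL. The signing scheme (an MD5 digest of the secret followed by the query string), the path layout, and the endpoint name "image" are all assumptions made for illustration. This page does not describe the actual scheme, so treat the service's own documentation as authoritative.

```python
import hashlib
from urllib.parse import urlencode

BASE_URL = "https://cdn.capture.page"


def build_capture_url(key: str, secret: str, endpoint: str, options: dict) -> str:
    """Build a signed request URL.

    Assumptions (not confirmed by this page):
      - the token is md5(secret + query string), hex encoded
      - the path layout is /<key>/<token>/<endpoint>
    """
    query = urlencode(options)
    token = hashlib.md5((secret + query).encode("utf-8")).hexdigest()
    return f"{BASE_URL}/{key}/{token}/{endpoint}?{query}"


if __name__ == "__main__":
    # "my-key", "my-secret", and "image" are placeholder values.
    print(build_capture_url("my-key", "my-secret", "image", {"url": "https://example.com"}))
```

Because the token depends on the query string, changing any option yields a different URL, which is why the secret itself never needs to appear in the request.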

Troubleshooting

  • Invalid URL error: If the provided URL is malformed, the node throws an error stating that the URL is invalid. Ensure URLs are correctly formatted and reachable.
  • Unsupported operation error: If an unsupported operation is specified, the node will throw an error. Use only supported operations: screenshot, pdf, content, or metadata.
  • Authentication issues: If the API key or secret is incorrect or missing, the node will fail to authenticate with the capture service. Verify that the API credentials are properly configured.
  • Timeouts or delays: If the page takes too long to load, or the awaited selector or ID never appears, the capture may fail or time out. Adjust the delay settings or wait conditions accordingly.
  • Binary data handling: When outputting binary files, ensure subsequent nodes can handle binary data to avoid errors.
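
To catch invalid URL errors before a workflow runs, a pre-check along the following lines may help. It is a sketch only: the node's own validation rules are not documented here, so the criteria used (an http or https scheme plus a non-empty host) and the function name is_valid_capture_url are assumptions.

```python
from urllib.parse import urlparse


def is_valid_capture_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host.

    Assumption: this approximates, but may not match, the node's own check.
    """
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/page", "example.com", "ftp://example.com"]:
        print(candidate, is_valid_capture_url(candidate))
```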

Using the "Continue On Fail" option allows the workflow to proceed even if some captures fail, returning error details in the output JSON.
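
The behavior described above can be pictured with the following sketch. It does not reproduce the node's implementation. The function capture_all, the capture callback, and the shape of the error entry are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List


def capture_all(
    urls: List[str],
    capture: Callable[[str], Dict],
    continue_on_fail: bool = True,
) -> List[Dict]:
    """Run `capture` for each URL, optionally recording failures instead of aborting."""
    results: List[Dict] = []
    for url in urls:
        try:
            results.append({"url": url, **capture(url)})
        except Exception as exc:
            if not continue_on_fail:
                raise  # stop the whole run on the first failure
            # Keep going and report the failure in the output for this item.
            results.append({"url": url, "error": str(exc)})
    return results


if __name__ == "__main__":
    def fake_capture(url: str) -> Dict:
        # Stand-in for a real capture call; fails for one placeholder input.
        if url == "not-a-url":
            raise ValueError("Invalid URL")
        return {"operation": "screenshot"}

    print(capture_all(["https://example.com", "not-a-url"], fake_capture))
```

With continue_on_fail disabled, the first failing URL stops the run, which mirrors a workflow halting on error.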
