Capture

Capture website screenshots, generate PDFs, and extract content and metadata.

Overview

This node captures data from webpages through an external capture service. It supports four operations: taking screenshots, generating PDFs, extracting webpage content, and retrieving metadata. This is useful for automating web monitoring, archiving pages, or gathering structured information from websites.

For example:

  • Automatically take a screenshot of a product page to archive its current state.
  • Generate a PDF report of a webpage for offline review.
  • Extract the main content or metadata (like title, description) from a news article for further processing.

Properties

Name | Meaning
URL | The URL of the webpage to capture.
Additional Options | A collection of optional settings to customize the capture:
- Best Format: Whether to automatically select the optimal image format.
- Block Ads: Whether to block advertisements on the page.
- Block Cookie Banners: Whether to automatically dismiss cookie consent banners.
- Block Trackers: Whether to block tracking scripts.
- Bypass Bot Detection: Whether to bypass bot detection systems.
- Dark Mode: Whether to enable dark mode rendering.
- Emulate Device: Specify a device name (e.g., "iPhone X") to emulate for screenshots.
- File Name: Custom filename for the saved file.
- Fresh: Force a new capture ignoring any cached results.
- HTTP Authentication: HTTP Basic Authentication credentials encoded in base64url format.
- Mobile: Whether to emulate a mobile device.
- User Agent: Custom user agent string to use when accessing the webpage.
- Wait For ID: Element ID to wait for before capturing.
- Wait For Selector: CSS selector to wait for before capturing.
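
The HTTP Authentication option expects credentials in base64url form. The following is a minimal sketch of producing such a value in Node.js. It assumes the credentials are the usual "username:password" pair used by HTTP Basic Authentication; the function name encodeHttpAuth is hypothetical and not part of the node.

```typescript
import { Buffer } from "node:buffer";

// Encode "username:password" as unpadded base64url (RFC 4648, section 5).
// Assumption: the capture service expects the pair joined by a colon,
// as in HTTP Basic Authentication. Check the service documentation.
function encodeHttpAuth(username: string, password: string): string {
  return Buffer.from(`${username}:${password}`, "utf8").toString("base64url");
}

// Example credentials (placeholders only).
const httpAuthValue = encodeHttpAuth("user", "pass");
console.log(httpAuthValue);
```

Unlike standard base64, base64url avoids the characters "+", "/", and "=", so the value can be placed in a query string without further escaping.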

Output

The node outputs JSON data describing the result of the capture operation. The structure varies depending on the operation:

  • Metadata Operation: Outputs metadata extracted from the webpage, merged into the JSON output along with the original URL and operation type.
  • Screenshot Operation: Outputs either a URL to the captured image or binary image data (PNG, JPG, or WebP) if configured to download the file.
  • PDF Operation: Outputs either a URL to the generated PDF or binary PDF data if configured to download the file.
  • Content Operation: Outputs the extracted content of the webpage as JSON.

If binary data is included (for screenshots or PDFs), it is attached under a binary property named "data", with a MIME type that matches the file format.
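
As a rough illustration of the two output shapes, the sketch below inspects an item and reports whether it carries a file or JSON only. The interfaces are simplified stand-ins modeled on n8n's item layout (a json object plus an optional binary map with base64 contents), and the JSON field names in the examples (url, operation, title) are assumptions rather than the node's confirmed output.

```typescript
import { Buffer } from "node:buffer";

// Simplified, hypothetical shapes; the real n8n types carry more fields.
interface BinaryEntry {
  data: string; // base64-encoded file contents
  mimeType: string;
  fileName?: string;
}

interface CaptureItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Report what kind of result an item holds.
function summarizeItem(item: CaptureItem): string {
  const file = item.binary?.data; // the binary property is named "data"
  if (file) {
    const bytes = Buffer.from(file.data, "base64").length;
    return `binary ${file.mimeType} (${bytes} bytes)`;
  }
  return `json only: ${Object.keys(item.json).join(", ")}`;
}

// A metadata-style item: JSON fields only (field names are assumed).
const metadataItem: CaptureItem = {
  json: { url: "https://example.com", operation: "metadata", title: "Example" },
};

// A PDF-style item with a tiny placeholder payload.
const pdfItem: CaptureItem = {
  json: { url: "https://example.com", operation: "pdf" },
  binary: {
    data: {
      data: Buffer.from("%PDF").toString("base64"),
      mimeType: "application/pdf",
      fileName: "page.pdf",
    },
  },
};

console.log(summarizeItem(metadataItem));
console.log(summarizeItem(pdfItem));
```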

Dependencies

  • Requires an API credential (key and secret) for the external capture service.
  • Uses the capture service's API at https://cdn.capture.page.
  • Relies on helper functions for URL validation, building capture URLs, downloading binary content, and creating binary data within n8n.
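
The helper that builds capture URLs is not shown here, so the following is only a sketch of what such a helper might look like. The base URL comes from this page; everything else is an assumption to verify against the capture service's documentation: the path layout (key, hash, endpoint), signing the query string with an MD5 hash of the secret plus the query, the endpoint name "image", and the option names blockAds and fresh.

```typescript
import { createHash } from "node:crypto";

const CAPTURE_BASE_URL = "https://cdn.capture.page";

type CaptureOptions = Record<string, string | number | boolean>;

// Hypothetical helper: build a signed capture URL.
function buildCaptureUrl(
  apiKey: string,
  apiSecret: string,
  endpoint: string,
  targetUrl: string,
  options: CaptureOptions = {},
): string {
  // URLSearchParams percent-encodes the target URL and option values.
  const params = new URLSearchParams({ url: targetUrl });
  for (const [name, value] of Object.entries(options)) {
    params.set(name, String(value));
  }
  const query = params.toString();

  // Assumption: the request is signed with MD5(secret + query string).
  const hash = createHash("md5").update(apiSecret + query).digest("hex");

  // Assumption: path layout is /<key>/<hash>/<endpoint>?<query>.
  return `${CAPTURE_BASE_URL}/${apiKey}/${hash}/${endpoint}?${query}`;
}

// Placeholder credentials and assumed option names.
const captureUrl = buildCaptureUrl("my-key", "my-secret", "image", "https://example.com", {
  blockAds: true,
  fresh: true,
});
console.log(captureUrl);
```

Signing the query string means the secret never appears in the URL itself, only a hash derived from it.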

Troubleshooting

  • Invalid URL Error: If the provided URL is malformed or invalid, the node throws an error indicating "Invalid URL". Ensure the URL is correctly formatted and accessible.
  • Unsupported Operation Error: If an unsupported operation is specified, the node will throw an error. Verify that the operation is one of: screenshot, pdf, content, or metadata.
  • API Authentication Issues: If the capture service rejects the credentials, the request fails with an error. Confirm that the API key and secret are correctly configured.
  • Timeouts or Delays: If the webpage takes too long to load or required elements do not appear, use the "Wait For ID" or "Wait For Selector" option so the capture waits for a specific element, or increase any delay settings.
  • Binary Data Handling: When outputting binary files, ensure downstream nodes read the file from the binary property "data" and can handle binary data.
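
The first two checks above can be reproduced before a workflow runs. This is a minimal sketch, not the node's actual validation: the function names are hypothetical, the restriction to http and https URLs is an assumption, and only the "Invalid URL" message text is taken from this page.

```typescript
// The four operations named in this document.
const SUPPORTED_OPERATIONS = ["screenshot", "pdf", "content", "metadata"] as const;
type Operation = (typeof SUPPORTED_OPERATIONS)[number];

function isSupportedOperation(value: string): value is Operation {
  return (SUPPORTED_OPERATIONS as readonly string[]).includes(value);
}

// Hypothetical pre-flight check mirroring the errors described above.
function checkCaptureInput(url: string, operation: string): Operation {
  let parsed: URL;
  try {
    parsed = new URL(url); // throws on malformed input
  } catch {
    throw new Error("Invalid URL");
  }
  // Assumption: only http and https pages can be captured.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("Invalid URL");
  }
  if (!isSupportedOperation(operation)) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  return operation;
}

console.log(checkCaptureInput("https://example.com", "pdf"));
```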
