Exa Websets

Create, manage, and query structured datasets from web sources using Exa Websets API

Actions (28)

Webset Actions
Enrichment Actions
Import Actions
Item Actions
Monitor Actions
Search Actions

Overview

The Exa Websets node imports content from various sources into structured datasets called websets through the Exa Websets API. Its Import - Create operation ingests data from CSV files, JSON data, sitemaps, or URL lists into either a new or an existing webset.

This node is beneficial for scenarios such as:

Aggregating URLs or content from multiple sources into a centralized dataset.
Crawling and importing website data based on sitemap structures with configurable depth and filters.
Importing structured data (CSV/JSON) for further enrichment, search, or monitoring within the Exa Websets ecosystem.
Automating content ingestion workflows where content classification, duplicate detection, and language detection are required.

Practical examples:

Importing a list of blog post URLs to create a searchable webset of articles.
Crawling a sitemap to import all pages up to a certain depth while excluding images or PDFs.
Uploading CSV data containing product URLs and metadata for analysis and enrichment.
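
The sitemap example above relies on include and exclude patterns. The exact matching rules used by the node are not documented here, so the following is only a minimal sketch under stated assumptions: patterns containing a wildcard are treated as globs against the URL path, and all other patterns as plain substrings of the path. The function names `parse_patterns`, `matches`, and `filter_urls` are hypothetical and not part of the node.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def parse_patterns(raw: str) -> list[str]:
    """Split a comma-separated pattern string, dropping empty entries."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def matches(url: str, pattern: str) -> bool:
    """ASSUMED semantics: glob match on the path if the pattern contains
    a wildcard, otherwise a substring test on the path."""
    path = urlparse(url).path
    if "*" in pattern or "?" in pattern:
        return fnmatchcase(path, pattern)
    return pattern in path


def filter_urls(urls: list[str], include: str = "", exclude: str = "") -> list[str]:
    """Keep URLs that match at least one include pattern (if any are given)
    and match no exclude pattern."""
    inc = parse_patterns(include)
    exc = parse_patterns(exclude)
    kept = []
    for url in urls:
        if inc and not any(matches(url, p) for p in inc):
            continue
        if any(matches(url, p) for p in exc):
            continue
        kept.append(url)
    return kept


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/files/report.pdf",
    "https://example.com/admin/login",
    "https://example.com/img/logo.jpg",
]
remaining = filter_urls(urls, exclude=".pdf,.jpg,/admin/*")
```

Running a filter like this over a sample of the intended URLs before an import is a cheap way to spot pattern combinations that would leave nothing to import.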

Properties

Name	Meaning
Import Type	Type of import to perform. Options: `CSV`, `JSON`, `Sitemap`, `URL List`.
Data Source	Source data for the import. Depending on Import Type, this can be URLs (one per line), raw CSV content, JSON data, or a sitemap URL.
Target Webset ID	Optional ID of an existing webset to import into. Leave empty to create a new webset.
Import Configuration	Collection of settings controlling import behavior:
- Exclude Patterns: Comma-separated patterns to exclude (e.g., `.pdf,.jpg,/admin/*`).
- Follow Redirects: Whether HTTP redirects should be followed.
- Headers: Additional HTTP headers in JSON format.
- Include Content: Extract content from URLs.
- Include Metadata: Extract metadata.
- Include Patterns: Comma-separated patterns to include (only matching URLs are imported).
- Include Screenshots: Take screenshots of pages.
- Max Depth: Maximum crawl depth for sitemaps (1-10).
- Rate Limit: Requests per second limit (1-100).
- Timeout: Timeout in seconds for each URL (5-300).
- User Agent: Custom User-Agent string for requests.
Processing Options	Collection of options controlling automatic extraction and processing:
- Auto-Extract Forms: Extract form structures.
- Auto-Extract Images: Extract image info.
- Auto-Extract Links: Extract all links (default true).
- Auto-Extract Tables: Extract and structure table data.
- Content Classification: Automatically classify content type.
- Duplicate Detection: Skip duplicate content during import (default true).
- Language Detection: Detect content language automatically (default true).
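
The numeric ranges listed for Import Configuration can be checked before the node runs. The sketch below is illustrative only: the key names `maxDepth`, `rateLimit`, and `timeout` are assumptions (the internal parameter names are not given here), and `parse_url_list` simply reflects the "one per line" format described for URL List imports.

```python
# Documented ranges: Max Depth 1-10, Rate Limit 1-100, Timeout 5-300.
# The dictionary keys are HYPOTHETICAL names chosen for this example.
RANGES = {
    "maxDepth": (1, 10),
    "rateLimit": (1, 100),
    "timeout": (5, 300),
}


def validate_import_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for key, (low, high) in RANGES.items():
        if key not in config:
            continue  # unset options are left to the node's defaults
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{key} must be between {low} and {high}, got {value}")
    return errors


def parse_url_list(data_source: str) -> list[str]:
    """Split a 'one URL per line' data source, skipping blank lines."""
    return [line.strip() for line in data_source.splitlines() if line.strip()]


problems = validate_import_config({"maxDepth": 11, "rateLimit": 10, "timeout": 30})
url_list = parse_url_list("https://example.com/a\n\n  https://example.com/b \n")
```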

Output

The node outputs JSON data representing the result of the import operation. This typically includes information about the created or updated webset and details about the imported items such as URLs, extracted content, metadata, and any processing results like classifications or detected languages.

If Include Screenshots is enabled, the node may also return binary data such as screenshots of imported pages. How that binary data is structured in the output is not documented here.

Dependencies

Requires an active connection to the Exa Websets API via an API key credential.
The node uses the base URL https://api.exa.ai for API requests.
Proper configuration of authentication credentials in n8n is necessary.
Network access to external URLs or sitemaps specified in the import data source is required.
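
To make the dependencies above concrete, here is a rough sketch of how an authenticated request against the base URL might be assembled outside n8n. Only the base URL comes from this page. The header name `x-api-key`, the example path, and the payload fields are assumptions for illustration and should be checked against the official API reference. The request is built but never sent.

```python
import json
import urllib.request

BASE_URL = "https://api.exa.ai"  # base URL stated in the Dependencies section


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request.

    ASSUMPTION: the API key travels in an 'x-api-key' header.
    """
    if not api_key:
        raise ValueError("API key is required")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# HYPOTHETICAL path and payload fields, shown only to exercise the helper.
request = build_request(
    api_key="YOUR_API_KEY",
    path="/websets/v0/imports",
    payload={"importType": "urlList", "dataSource": "https://example.com/a"},
)
```

Inside n8n the credential system handles this step, so a helper like this is mainly useful for checking that a key and network path work independently of the workflow.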

Troubleshooting

Common Issues:
- Invalid or missing API credentials will cause authentication failures.
- Incorrectly formatted data source input (e.g., malformed JSON or CSV) may lead to import errors.
- Specifying an invalid or non-existent Target Webset ID will cause the import to fail.
- Overly restrictive include/exclude patterns might result in no URLs being imported.
- Rate limits set too high could trigger API throttling or network timeouts.
Error Messages:
- "Unknown resource": Occurs if the resource parameter is incorrect; ensure "imports" is selected.
- Errors related to HTTP requests (timeouts, redirects) can often be resolved by adjusting the Timeout, Follow Redirects, or User Agent settings.
- JSON parsing errors indicate malformed JSON in headers or data source fields.
Resolutions:
- Verify API credentials and permissions.
- Validate input data formats before running the node.
- Adjust import configuration parameters to match the target data.
- Use smaller batches or lower rate limits if facing timeouts.
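
The advice to validate input data formats can be sketched as a small pre-flight check for the Headers field and for JSON or CSV data sources. This is an illustrative example using only the Python standard library. The function names are hypothetical, and the CSV check assumes the first row defines the expected number of columns.

```python
import csv
import io
import json


def check_json(text: str, label: str) -> list[str]:
    """Report a JSON syntax error with its position, if any."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{label}: invalid JSON ({exc.msg} at line {exc.lineno}, column {exc.colno})"]
    return []


def check_headers(text: str) -> list[str]:
    """Headers must be valid JSON and, by assumption, a JSON object."""
    errors = check_json(text, "Headers")
    if not errors and not isinstance(json.loads(text), dict):
        errors.append("Headers: expected a JSON object")
    return errors


def check_csv(text: str) -> list[str]:
    """Flag rows whose field count differs from the first row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["CSV: no rows found"]
    width = len(rows[0])
    return [
        f"CSV: row {number} has {len(row)} fields, expected {width}"
        for number, row in enumerate(rows[1:], start=2)
        if row and len(row) != width  # blank lines are ignored
    ]


csv_problems = check_csv("url,title\nhttps://example.com/a,A\nhttps://example.com/b\n")
header_problems = check_headers('["Accept"]')
```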
