## Overview
The node provides web crawling and AI-powered smart search capabilities via the Eddie.surf service. Specifically, the "Crawl Batch" operation enables batch crawling of 200 or more URLs with optimized processing to extract structured data based on a user-defined JSON schema and contextual guidance.
This operation is useful when you need to gather structured information from many websites efficiently, for use cases such as market research, competitive analysis, or content aggregation. For example, a marketing team could batch crawl hundreds of competitor websites to automatically extract pricing, contact information, and product details.
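As a rough sketch of how the schema and context guide extraction in that scenario, the pair might look like the following (TypeScript object literals for readability; every field name and value here is an illustrative assumption, not a format mandated by Eddie.surf):

```typescript
// Hypothetical schema and context for the competitor-pricing example above.
// All field names and values are illustrative assumptions, not Eddie.surf requirements.
const jsonSchema = {
  type: "object",
  properties: {
    companyName: { type: "string" },
    pricingPlans: {
      type: "array",
      items: {
        type: "object",
        properties: {
          planName: { type: "string" },
          monthlyPrice: { type: "number" },
        },
      },
    },
    contactEmail: { type: "string" },
  },
};

const context = {
  goal: "Collect pricing tiers and contact details from competitor sites",
  audience: "B2B SaaS market research",
};
```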
## Properties
| Name | Meaning |
|---|---|
| URLs | Comma-separated list of URLs (minimum 200) to crawl in batch mode. Each URL must start with `http://` or `https://`. |
| Context | JSON object providing context to guide AI processing and data extraction during crawling. |
| JSON Schema | JSON schema defining the expected structure of the extracted data from the crawled pages. |
| Advanced Options | Collection of optional settings: |
| - Callback Mode | Notification callback mode: "Once" or "Multi". |
| - Callback URL | Optional webhook URL to receive job completion notifications. |
| - Include Technical Data | Whether to include technical data collection (costs additional credits per page). |
| - Max Depth | Maximum link depth to follow during crawling (1-10). |
| - Max Pages | Maximum number of pages to crawl per URL (minimum 1). |
| - Mock Mode | Enable test mode without consuming credits. |
| - Rules | Comma-separated custom processing instructions (e.g., "Extract pricing, Extract contact info"). |
| - Timeout Per Page | Timeout in seconds for loading each page (1-180 seconds). |
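To make these properties concrete, here is a hypothetical configuration sketch. The parameter keys are assumptions for illustration and may not match the node's internal names; the option values are examples, not service defaults:

```typescript
// Hypothetical "Crawl Batch" configuration mirroring the properties above.
// All keys and values are illustrative assumptions, not the node's actual
// parameter names or service defaults. A real batch needs at least 200 URLs;
// the list is truncated here for brevity.
const crawlBatchParameters = {
  urls: "https://example.com, https://example.org", // ...plus 198+ more in practice
  context: { industry: "SaaS", goal: "competitive pricing research" },
  jsonSchema: { type: "object", properties: { price: { type: "string" } } },
  advancedOptions: {
    callbackMode: "Once",                       // or "Multi"
    callbackUrl: "https://example.com/webhook", // optional completion webhook
    includeTechnicalData: false,                // true costs extra credits per page
    maxDepth: 2,                                // allowed range: 1-10
    maxPages: 10,                               // minimum: 1
    mockMode: true,                             // test mode; consumes no credits
    rules: "Extract pricing, Extract contact info",
    timeoutPerPage: 60,                         // allowed range: 1-180 seconds
  },
};
```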
## Output
The output is a JSON object representing the result of the batch crawl request. It typically contains structured data extracted according to the provided JSON schema and context; the exact structure depends on the schema and the crawled content.
If the operation succeeds, the output JSON includes the crawl results; if it fails, an error message is returned in the `json.error` field.
The node does not output binary data.
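Purely as an illustration of that shape, reusing the pricing schema from the Overview, a result might resemble the sketch below; none of these field names are guaranteed by the API:

```typescript
// Illustrative shape of a successful result for the pricing schema; the actual
// structure depends on your schema and the crawled content, so treat every
// field name below as an assumption.
const exampleOutput = {
  results: [
    {
      url: "https://example.com",
      data: {
        companyName: "Example Inc.",
        pricingPlans: [{ planName: "Pro", monthlyPrice: 49 }],
        contactEmail: "sales@example.com",
      },
    },
  ],
};

// On failure, the item carries an error message instead, e.g.:
const exampleError = { error: "Invalid URL format: example.com" };
```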
## Dependencies
- Requires an API key credential for authenticating requests to the Eddie.surf service.
- The node makes HTTP POST requests to the `/crawl-batch` endpoint of the Eddie.surf API (see the request sketch after this list).
- Proper configuration of the API authentication credential in n8n is necessary.
- Optional webhook URL can be configured for asynchronous job completion callbacks.
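For orientation, a minimal sketch of the request the node issues on your behalf might look like this; the base URL, authentication header, and body fields are assumptions, since the node manages the call internally:

```typescript
// Minimal sketch of the underlying HTTP call, assuming a bearer-style API key
// and a JSON body. The base URL, auth header, and body fields are assumptions;
// the node normally constructs this request for you.
async function startCrawlBatch(apiKey: string, urls: string[]): Promise<unknown> {
  const response = await fetch("https://api.eddie.surf/crawl-batch", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ urls }),
  });
  if (!response.ok) {
    throw new Error(`Crawl batch request failed: ${response.status}`);
  }
  return response.json();
}
```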
## Troubleshooting
- Invalid URL format: URLs must start with `http://` or `https://`. Ensure all URLs are correctly formatted (see the validation sketch after this list).
- Minimum URL count: The "Crawl Batch" operation requires at least 200 URLs. Use the "Crawl" operation for fewer URLs.
- Max Depth and Max Pages validation: Max Depth must be between 1 and 10; Max Pages must be at least 1.
- Timeout Per Page limits: Must be between 1 and 180 seconds.
- API errors: If the API returns errors, check your API key validity, network connectivity, and that the input parameters meet the requirements.
- Empty or blank entries: Empty or whitespace-only entries in the URLs list will cause errors; ensure every comma-separated entry is a valid URL.
- Mock Mode: When enabled, no credits are consumed but results may be simulated.
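As a sketch, a pre-flight check along these lines (a hypothetical helper, not part of the node) can catch the most common input problems before a job is submitted:

```typescript
// Hypothetical pre-flight check mirroring the validation rules above;
// it is not part of the node itself.
function validateBatchUrls(raw: string): string[] {
  const urls = raw
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0); // drop empty or whitespace-only entries
  const invalid = urls.filter((u) => !/^https?:\/\//.test(u));
  if (invalid.length > 0) {
    throw new Error(`Invalid URL format: ${invalid.join(", ")}`);
  }
  if (urls.length < 200) {
    throw new Error("Crawl Batch requires at least 200 URLs; use Crawl for fewer.");
  }
  return urls;
}
```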