## Overview
The node provides web crawling and AI-powered smart search capabilities via the Eddie.surf service. Specifically, the "Crawl Batch" operation enables batch crawling of 200 or more URLs with optimized processing to extract structured data based on a user-defined JSON schema and contextual guidance.
This operation is useful when you need to gather structured information from many websites efficiently, for use cases such as market research, competitive analysis, or content aggregation. For example, a marketing team could batch crawl hundreds of competitor websites to automatically extract pricing, contact information, and product details.
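As a rough sketch of how the schema and context guide extraction in that scenario, the pair might look like the following (TypeScript object literals for readability; every field name and value here is an illustrative assumption, not a format mandated by Eddie.surf):

```typescript
// Hypothetical schema and context for the competitor-pricing example above.
// All field names and values are illustrative assumptions, not Eddie.surf requirements.
const jsonSchema = {
  type: "object",
  properties: {
    companyName: { type: "string" },
    pricingPlans: {
      type: "array",
      items: {
        type: "object",
        properties: {
          planName: { type: "string" },
          monthlyPrice: { type: "number" },
        },
      },
    },
    contactEmail: { type: "string" },
  },
};

const context = {
  goal: "Collect pricing tiers and contact details from competitor sites",
  audience: "B2B SaaS market research",
};
```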
## Properties
| Name | Meaning |
|---|---|
| URLs | Comma-separated list of URLs (minimum 200) to crawl in batch mode. Each URL must start with `http://` or `https://`. |
| Context | JSON object providing context to guide AI processing and data extraction during crawling. |
| JSON Schema | JSON schema defining the expected structure of the extracted data from the crawled pages. |
| Advanced Options | Collection of optional settings: |
| - Callback Mode | Notification callback mode: "Once" or "Multi". |
| - Callback URL | Optional webhook URL to receive job completion notifications. |
| - Include Technical Data | Whether to include technical data collection (costs additional credits per page). |
| - Max Depth | Maximum link depth to follow during crawling (1-10). |
| - Max Pages | Maximum number of pages to crawl per URL (minimum 1). |
| - Mock Mode | Enable test mode without consuming credits. |
| - Rules | Comma-separated custom processing instructions (e.g., "Extract pricing, Extract contact info"). |
| - Timeout Per Page | Timeout in seconds for loading each page (1-180 seconds). |
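To make these properties concrete, here is a hypothetical configuration sketch. The parameter keys are assumptions for illustration and may not match the node's internal names; the option values are examples, not service defaults:

```typescript
// Hypothetical "Crawl Batch" configuration mirroring the properties above.
// All keys and values are illustrative assumptions, not the node's actual
// parameter names or service defaults. A real batch needs at least 200 URLs;
// the list is truncated here for brevity.
const crawlBatchParameters = {
  urls: "https://example.com, https://example.org", // ...plus 198+ more in practice
  context: { industry: "SaaS", goal: "competitive pricing research" },
  jsonSchema: { type: "object", properties: { price: { type: "string" } } },
  advancedOptions: {
    callbackMode: "Once",                       // or "Multi"
    callbackUrl: "https://example.com/webhook", // optional completion webhook
    includeTechnicalData: false,                // true costs extra credits per page
    maxDepth: 2,                                // allowed range: 1-10
    maxPages: 10,                               // minimum: 1
    mockMode: true,                             // test mode; consumes no credits
    rules: "Extract pricing, Extract contact info",
    timeoutPerPage: 60,                         // allowed range: 1-180 seconds
  },
};
```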
## Output
The output is a JSON object representing the result of the batch crawl request. It typically contains structured data extracted according to the provided JSON schema and context; the exact structure depends on the schema and the crawled content.
If the operation succeeds, the output JSON includes the crawl results; if it fails, an error message is returned in the `json.error` field.
The node does not output binary data.
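Purely as an illustration of that shape, reusing the pricing schema from the Overview, a result might resemble the sketch below; none of these field names are guaranteed by the API:

```typescript
// Illustrative shape of a successful result for the pricing schema; the actual
// structure depends on your schema and the crawled content, so treat every
// field name below as an assumption.
const exampleOutput = {
  results: [
    {
      url: "https://example.com",
      data: {
        companyName: "Example Inc.",
        pricingPlans: [{ planName: "Pro", monthlyPrice: 49 }],
        contactEmail: "sales@example.com",
      },
    },
  ],
};

// On failure, the item carries an error message instead, e.g.:
const exampleError = { error: "Invalid URL format: example.com" };
```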
## Dependencies
- Requires an API key credential for authenticating requests to the Eddie.surf service.
- The node makes HTTP POST requests to the `/crawl-batch` endpoint of the Eddie.surf API (see the request sketch after this list).
- Proper configuration of the API authentication credential in n8n is necessary.
- Optional webhook URL can be configured for asynchronous job completion callbacks.
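For orientation, a minimal sketch of the request the node issues on your behalf might look like this; the base URL, authentication header, and body fields are assumptions, since the node manages the call internally:

```typescript
// Minimal sketch of the underlying HTTP call, assuming a bearer-style API key
// and a JSON body. The base URL, auth header, and body fields are assumptions;
// the node normally constructs this request for you.
async function startCrawlBatch(apiKey: string, urls: string[]): Promise<unknown> {
  const response = await fetch("https://api.eddie.surf/crawl-batch", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ urls }),
  });
  if (!response.ok) {
    throw new Error(`Crawl batch request failed: ${response.status}`);
  }
  return response.json();
}
```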
## Troubleshooting
- Invalid URL format: URLs must start with `http://` or `https://`. Ensure all URLs are correctly formatted (see the validation sketch after this list).
- Minimum URL count: The "Crawl Batch" operation requires at least 200 URLs. Use the "Crawl" operation for fewer URLs.
- Max Depth and Max Pages validation: Max Depth must be between 1 and 10; Max Pages must be at least 1.
- Timeout Per Page limits: Must be between 1 and 180 seconds.
- API errors: If the API returns errors, check your API key validity, network connectivity, and that the input parameters meet the requirements.
- Empty or blank entries: Empty or whitespace-only entries in the URLs list will cause errors; ensure every comma-separated entry is a valid URL.
- Mock Mode: When enabled, no credits are consumed but results may be simulated.
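As a sketch, a pre-flight check along these lines (a hypothetical helper, not part of the node) can catch the most common input problems before a job is submitted:

```typescript
// Hypothetical pre-flight check mirroring the validation rules above;
// it is not part of the node itself.
function validateBatchUrls(raw: string): string[] {
  const urls = raw
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0); // drop empty or whitespace-only entries
  const invalid = urls.filter((u) => !/^https?:\/\//.test(u));
  if (invalid.length > 0) {
    throw new Error(`Invalid URL format: ${invalid.join(", ")}`);
  }
  if (urls.length < 200) {
    throw new Error("Crawl Batch requires at least 200 URLs; use Crawl for fewer.");
  }
  return urls;
}
```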