## Overview
The "Eddie.surf" node provides web crawling and AI-powered smart search. It can crawl individual URLs or large batches to extract structured data against a provided JSON schema, or perform AI-guided searches across specified websites.
Common scenarios include:
- Extracting structured information (e.g., pricing, contact info) from one or more websites.
- Performing large-scale batch crawls for extensive data collection.
- Conducting AI-driven searches to find relevant content across specified sites.
- Monitoring the status of ongoing crawl or search jobs.
Practical examples:
- Crawl a list of competitor websites to gather product details in a structured format.
- Use smart search to find mentions of a brand or topic across multiple domains.
- Batch crawl hundreds of URLs to build a comprehensive dataset for market research.
- Check the progress or results of a previously started crawl or search job.
## Properties
| Name | Meaning |
|---|---|
| URLs | Comma-separated list of URLs to crawl (for "crawl" and "crawlBatch" operations). |
| Context | JSON object guiding AI processing and data extraction; used in "crawl", "crawlBatch", and "smartSearch". |
| JSON Schema | JSON schema defining the structure of data to extract; required for "crawl" and "crawlBatch". |
| Search Query | Text query string to find relevant content; required for "smartSearch". |
| Advanced Options | Collection of optional settings affecting crawl/search behavior: |
| - Callback Mode | Notification mode for callbacks: "Once" or "Multi". |
| - Callback URL | Optional webhook URL to receive job completion notifications. |
| - Include Technical Data | Whether to collect technical data during crawl (costs extra credits per page). |
| - Max Depth | Maximum link depth to follow during crawl (1-10). |
| - Max Pages | Maximum number of pages to crawl. |
| - Max Results | Maximum number of search results to return (1-5000), applicable to "smartSearch". |
| - Mock Mode | Enables test mode without consuming credits. |
| - Rules | Comma-separated custom processing instructions (e.g., "Extract pricing, Extract contact info"). |
| - Skip Duplicate Domains | For "smartSearch": whether to skip results from duplicate domains. |
| - Timeout Per Page | Timeout in seconds per page during crawl (1-180). |
| - Website Only | For "smartSearch": restrict search only within specified websites. |
| Job Type | For "getStatus": type of job to check ("Crawl Job" or "Smart Search Job"). |
| Job ID | For "getStatus": identifier of the job to check status for. |
| Site ID | Optional for "getStatus" with crawl jobs: check status of individual site within the job. |
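The property table above can be illustrated with a hypothetical parameter set for a "crawl" operation. The key names below are assumptions for illustration, not the node's exact internal field names; the value constraints come from the table.

```python
# Hypothetical "crawl" parameter set mirroring the property table.
# Key names are illustrative assumptions, not the node's internal keys.
crawl_params = {
    "urls": "https://example.com,https://example.org",  # comma-separated list
    "context": {"purpose": "competitor pricing research"},
    "json_schema": {
        "type": "object",
        "properties": {
            "product": {"type": "string"},
            "price": {"type": "number"},
        },
    },
    "advanced_options": {
        "callback_mode": "Once",   # "Once" or "Multi"
        "max_depth": 3,            # 1-10
        "max_pages": 50,           # at least 1
        "timeout_per_page": 30,    # 1-180 seconds
        "mock_mode": True,         # test without consuming credits
    },
}

# The URLs property is a single comma-separated string, so split before use.
url_list = [u.strip() for u in crawl_params["urls"].split(",")]
```

Note that the "Crawl" operation accepts up to 199 URLs; larger lists belong to "Crawl Batch" (see Troubleshooting).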
## Output
The node outputs an array of items where each item contains a `json` field with the response data from the Eddie.surf API for the selected operation:
- Crawl / Crawl Batch: The output JSON includes extracted structured data according to the provided JSON schema, along with metadata about the crawl such as URLs processed, depth, pages crawled, and optionally technical data if enabled.
- Smart Search: The output JSON contains AI-generated search results matching the query, including relevant content snippets and metadata.
- Get Status: The output JSON provides the current status and details of a crawl or smart search job, including progress and any site-specific information if requested.
The node does not explicitly output binary data.
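A downstream node would consume this output by reading each item's `json` field. The sketch below assumes hypothetical response keys (`status`, `pages_crawled`, `data`) purely for illustration; only the item/`json` wrapping is documented above.

```python
# Sketch of consuming the node's output: each item wraps the API response
# in a "json" field. Response keys here are illustrative assumptions.
items = [
    {"json": {"status": "completed", "pages_crawled": 12,
              "data": {"product": "Widget", "price": 9.99}}},
]

# Pull the extracted structured data out of each item.
extracted = [item["json"].get("data", {}) for item in items]
```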
## Dependencies
- Requires an API key credential for authenticating requests to the Eddie.surf service.
- Uses HTTP POST or GET requests to Eddie.surf endpoints (`/crawl`, `/crawl-batch`, `/smart-search`, `/crawl/{jobId}`, `/smart-search/{jobId}`).
- No additional environment variables are indicated beyond the API authentication setup.
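As a rough sketch of the request the node issues, the snippet below builds a POST to the `/crawl` endpoint. The base URL and the `Authorization: Bearer` header name are assumptions; only the endpoint paths are documented above.

```python
import json
import urllib.request

# Hypothetical base URL; the real host may differ.
API_BASE = "https://api.eddie.surf"

def build_crawl_request(api_key, urls, schema):
    """Build an authenticated POST request to the /crawl endpoint.

    The auth scheme (Bearer token) is an assumption for illustration.
    """
    body = json.dumps({"urls": urls, "json_schema": schema}).encode()
    return urllib.request.Request(
        f"{API_BASE}/crawl",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A status check would issue a GET against `/crawl/{jobId}` or `/smart-search/{jobId}` with the same credential.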
## Troubleshooting
- Invalid URL Format: URLs must start with `http://` or `https://`. Ensure all URLs are correctly formatted.
- URL Count Limits:
- "Crawl" supports up to 199 URLs; exceeding this will cause an error suggesting to use "Crawl Batch".
- "Crawl Batch" requires at least 200 URLs; fewer URLs will trigger an error recommending "Crawl".
- Parameter Ranges:
- Max Depth must be between 1 and 10.
- Max Pages must be at least 1.
- Timeout Per Page must be between 1 and 180 seconds.
- Max Results (for smart search) must be between 1 and 5000.
- Missing Required Fields: Operations require certain parameters (e.g., URLs for crawl, query for smart search, job ID for status checks). Missing these will throw errors.
- API Errors: Network issues or invalid credentials will result in request failures. Verify API key validity and network connectivity.
- Mock Mode: When enabled, no credits are consumed but results may be simulated; useful for testing.
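The validation rules above (URL format, the 199/200 routing between "Crawl" and "Crawl Batch", and the numeric ranges) can be sketched as a pre-flight check. The function name is illustrative; the node performs equivalent checks internally.

```python
# Pre-flight validation mirroring the troubleshooting rules above.
def validate_crawl(urls, max_depth=1, max_pages=1, timeout_per_page=30):
    if not all(u.startswith(("http://", "https://")) for u in urls):
        raise ValueError("URLs must start with http:// or https://")
    if len(urls) > 199:
        raise ValueError("More than 199 URLs: use the Crawl Batch operation")
    if not 1 <= max_depth <= 10:
        raise ValueError("Max Depth must be between 1 and 10")
    if max_pages < 1:
        raise ValueError("Max Pages must be at least 1")
    if not 1 <= timeout_per_page <= 180:
        raise ValueError("Timeout Per Page must be between 1 and 180 seconds")
```

Running the checks locally before starting a job avoids burning a request on a parameter error.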
## Links and References
- Eddie.surf Official Website
- Documentation for JSON Schema: https://json-schema.org/
- General n8n documentation on creating and using nodes: https://docs.n8n.io/