## Overview
The "Eddie.surf" node provides web crawling and AI-powered smart search. It can crawl individual URLs or large batches to extract structured data against a provided JSON schema, or perform AI-guided searches across specified websites.
Common scenarios include:
- Extracting structured information (e.g., pricing, contact info) from one or more websites.
- Performing large-scale batch crawls for extensive data collection.
- Conducting AI-driven searches to find relevant content across specified sites.
- Monitoring the status of ongoing crawl or search jobs.
Practical examples:
- Crawl a list of competitor websites to gather product details in a structured format.
- Use smart search to find mentions of a brand or topic across multiple domains.
- Batch crawl hundreds of URLs to build a comprehensive dataset for market research.
- Check the progress or results of a previously started crawl or search job.
## Properties
| Name | Meaning |
|---|---|
| URLs | Comma-separated list of URLs to crawl (for "crawl" and "crawlBatch" operations). |
| Context | JSON object guiding AI processing and data extraction; used in "crawl", "crawlBatch", and "smartSearch". |
| JSON Schema | JSON schema defining the structure of data to extract; required for "crawl" and "crawlBatch". |
| Search Query | Text query string to find relevant content; required for "smartSearch". |
| Advanced Options | Collection of optional settings affecting crawl/search behavior: |
| - Callback Mode | Notification mode for callbacks: "Once" or "Multi". |
| - Callback URL | Optional webhook URL to receive job completion notifications. |
| - Include Technical Data | Whether to collect technical data during crawl (costs extra credits per page). |
| - Max Depth | Maximum link depth to follow during crawl (1-10). |
| - Max Pages | Maximum number of pages to crawl. |
| - Max Results | Maximum number of search results to return (1-5000), applicable to "smartSearch". |
| - Mock Mode | Enables test mode without consuming credits. |
| - Rules | Comma-separated custom processing instructions (e.g., "Extract pricing, Extract contact info"). |
| - Skip Duplicate Domains | For "smartSearch": whether to skip results from duplicate domains. |
| - Timeout Per Page | Timeout in seconds per page during crawl (1-180). |
| - Website Only | For "smartSearch": restrict search only within specified websites. |
| Job Type | For "getStatus": type of job to check ("Crawl Job" or "Smart Search Job"). |
| Job ID | For "getStatus": identifier of the job to check status for. |
| Site ID | Optional for "getStatus" with crawl jobs: check status of individual site within the job. |
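The property table above can be illustrated with a hypothetical parameter set for a "crawl" operation. The key names below are assumptions for illustration, not the node's exact internal field names; the value constraints come from the table.

```python
# Hypothetical "crawl" parameter set mirroring the property table.
# Key names are illustrative assumptions, not the node's internal keys.
crawl_params = {
    "urls": "https://example.com,https://example.org",  # comma-separated list
    "context": {"purpose": "competitor pricing research"},
    "json_schema": {
        "type": "object",
        "properties": {
            "product": {"type": "string"},
            "price": {"type": "number"},
        },
    },
    "advanced_options": {
        "callback_mode": "Once",   # "Once" or "Multi"
        "max_depth": 3,            # 1-10
        "max_pages": 50,           # at least 1
        "timeout_per_page": 30,    # 1-180 seconds
        "mock_mode": True,         # test without consuming credits
    },
}

# The URLs property is a single comma-separated string, so split before use.
url_list = [u.strip() for u in crawl_params["urls"].split(",")]
```

Note that the "Crawl" operation accepts up to 199 URLs; larger lists belong to "Crawl Batch" (see Troubleshooting).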
## Output
The node outputs an array of items where each item contains a `json` field with the response data from the Eddie.surf API for the selected operation:
- Crawl / Crawl Batch: The output JSON includes extracted structured data according to the provided JSON schema, along with metadata about the crawl such as URLs processed, depth, pages crawled, and optionally technical data if enabled.
- Smart Search: The output JSON contains AI-generated search results matching the query, including relevant content snippets and metadata.
- Get Status: The output JSON provides the current status and details of a crawl or smart search job, including progress and any site-specific information if requested.
The node does not explicitly output binary data.
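A downstream node would consume this output by reading each item's `json` field. The sketch below assumes hypothetical response keys (`status`, `pages_crawled`, `data`) purely for illustration; only the item/`json` wrapping is documented above.

```python
# Sketch of consuming the node's output: each item wraps the API response
# in a "json" field. Response keys here are illustrative assumptions.
items = [
    {"json": {"status": "completed", "pages_crawled": 12,
              "data": {"product": "Widget", "price": 9.99}}},
]

# Pull the extracted structured data out of each item.
extracted = [item["json"].get("data", {}) for item in items]
```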
## Dependencies
- Requires an API key credential for authenticating requests to the Eddie.surf service.
- Uses HTTP POST or GET requests to Eddie.surf endpoints (`/crawl`, `/crawl-batch`, `/smart-search`, `/crawl/{jobId}`, `/smart-search/{jobId}`).
- No additional environment variables are indicated beyond the API authentication setup.
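As a rough sketch of the request the node issues, the snippet below builds a POST to the `/crawl` endpoint. The base URL and the `Authorization: Bearer` header name are assumptions; only the endpoint paths are documented above.

```python
import json
import urllib.request

# Hypothetical base URL; the real host may differ.
API_BASE = "https://api.eddie.surf"

def build_crawl_request(api_key, urls, schema):
    """Build an authenticated POST request to the /crawl endpoint.

    The auth scheme (Bearer token) is an assumption for illustration.
    """
    body = json.dumps({"urls": urls, "json_schema": schema}).encode()
    return urllib.request.Request(
        f"{API_BASE}/crawl",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A status check would issue a GET against `/crawl/{jobId}` or `/smart-search/{jobId}` with the same credential.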
## Troubleshooting
- Invalid URL Format: URLs must start with `http://` or `https://`. Ensure all URLs are correctly formatted.
- URL Count Limits:
- "Crawl" supports up to 199 URLs; exceeding this will cause an error suggesting to use "Crawl Batch".
- "Crawl Batch" requires at least 200 URLs; fewer URLs will trigger an error recommending "Crawl".
- Parameter Ranges:
- Max Depth must be between 1 and 10.
- Max Pages must be at least 1.
- Timeout Per Page must be between 1 and 180 seconds.
- Max Results (for smart search) must be between 1 and 5000.
- Missing Required Fields: Operations require certain parameters (e.g., URLs for crawl, query for smart search, job ID for status checks). Missing these will throw errors.
- API Errors: Network issues or invalid credentials will result in request failures. Verify API key validity and network connectivity.
- Mock Mode: When enabled, no credits are consumed but results may be simulated; useful for testing.
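The validation rules above (URL format, the 199/200 routing between "Crawl" and "Crawl Batch", and the numeric ranges) can be sketched as a pre-flight check. The function name is illustrative; the node performs equivalent checks internally.

```python
# Pre-flight validation mirroring the troubleshooting rules above.
def validate_crawl(urls, max_depth=1, max_pages=1, timeout_per_page=30):
    if not all(u.startswith(("http://", "https://")) for u in urls):
        raise ValueError("URLs must start with http:// or https://")
    if len(urls) > 199:
        raise ValueError("More than 199 URLs: use the Crawl Batch operation")
    if not 1 <= max_depth <= 10:
        raise ValueError("Max Depth must be between 1 and 10")
    if max_pages < 1:
        raise ValueError("Max Pages must be at least 1")
    if not 1 <= timeout_per_page <= 180:
        raise ValueError("Timeout Per Page must be between 1 and 180 seconds")
```

Running the checks locally before starting a job avoids burning a request on a parameter error.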
## Links and References
- Eddie.surf Official Website
- Documentation for JSON Schema: https://json-schema.org/
- General n8n documentation on creating and using nodes: https://docs.n8n.io/