Actions
- Deep SERPAPI Actions
- Universal Scraping API Actions
- Crawler Actions
Overview
The node provides web crawling capabilities under the "Crawler" resource with a "Crawl" operation. It allows users to start from a specified URL and automatically traverse linked subpages up to a defined limit. This is useful for extracting data from multiple pages of a website without manually specifying each URL.
Common scenarios include:
- Collecting product listings spread across multiple pages on an e-commerce site.
- Gathering blog posts or articles linked through pagination.
- Indexing content from a website for analysis or monitoring changes.
For example, a user can enter the homepage URL of a news site and set the number of subpages to 10. The node then visits the homepage, follows links to up to 10 additional pages, and returns the aggregated data.
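The sketch below illustrates the kind of request the node issues behind the scenes. The endpoint URL, payload field names, and authentication header are placeholders assumed for illustration, not the documented Scrapeless API; see the official documentation for the actual contract.

```typescript
// Hedged sketch of a crawl request. The endpoint, payload keys, and
// auth header name below are assumptions, not the real Scrapeless API.
interface CrawlParams {
  url: string;           // maps to the "URL to Crawl" property
  limitSubpages: number; // maps to "Number Of Subpages" (max 100)
}

async function startCrawl(apiKey: string, params: CrawlParams): Promise<unknown[]> {
  const response = await fetch('https://api.scrapeless.example/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-token': apiKey, // header name assumed
    },
    body: JSON.stringify(params),
  });
  if (!response.ok) {
    throw new Error(`Crawl request failed with status ${response.status}`);
  }
  return (await response.json()) as unknown[];
}
```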
Properties
| Name | Meaning |
|---|---|
| URL to Crawl | The starting URL where the crawler begins its traversal. |
| Number Of Subpages | Maximum number of subpages to crawl and return results from. Limited to 100 subpages in this node. |
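Since the node caps a crawl at 100 subpages, it helps to clamp or reject out-of-range values before the request is sent. A minimal sketch, assuming the cap is enforced client-side:

```typescript
// The 100-subpage cap is this node's documented limit.
const MAX_SUBPAGES = 100;

function normalizeSubpages(requested: number): number {
  if (!Number.isInteger(requested) || requested < 1) {
    throw new Error('Number Of Subpages must be a positive integer');
  }
  // Clamp rather than fail so oversized requests still run at the cap.
  return Math.min(requested, MAX_SUBPAGES);
}
```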
Output
The output is an array of JSON objects representing the crawled pages and their extracted data. Each item corresponds to one page visited by the crawler; the array covers the starting URL and its subpages up to the specified limit.
If the node supports binary data (not explicitly shown here), it would typically represent downloaded files or media from the crawled pages.
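As an illustration only, each output item might carry fields like the following; apart from n8n's standard `json` wrapper, the field names are assumptions about the crawler's response, not a documented schema.

```typescript
// Illustrative shape of one output item. Only the 'json' wrapper is
// standard n8n; the inner field names are assumed for illustration.
interface CrawledPageItem {
  json: {
    url: string;                        // page the crawler visited
    content?: string;                   // extracted page content (assumed)
    metadata?: Record<string, unknown>; // e.g. title, status code (assumed)
  };
}
```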
Dependencies
- Requires an API key credential for the Scrapeless service to authenticate requests.
- The node depends on the Scrapeless API backend to perform crawling operations.
- No additional environment variables are indicated.
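For context, an n8n API-key credential for this service would typically be declared along the following lines; the class name, credential name, and field name are assumptions, not the node's published credential definition.

```typescript
import type { ICredentialType, INodeProperties } from 'n8n-workflow';

// Hedged sketch of an API-key credential; the names are assumptions.
export class ScrapelessApi implements ICredentialType {
  name = 'scrapelessApi';
  displayName = 'Scrapeless API';
  properties: INodeProperties[] = [
    {
      displayName: 'API Key',
      name: 'apiKey',
      type: 'string',
      typeOptions: { password: true },
      default: '',
    },
  ];
}
```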
Troubleshooting
Common issues:
- Requesting more than the maximum of 100 subpages may cause errors or truncated results.
- Invalid or unreachable URLs will result in errors during crawling.
- Network connectivity problems or invalid API credentials will prevent successful execution.
Error messages:
"Unsupported resource: crawler"— indicates the resource parameter was incorrectly set; ensure "crawler" is selected.- Errors related to API authentication failure suggest checking the configured API key credential.
- Timeout or network errors imply connectivity issues or that the target website is blocking requests.
To resolve these, verify the URL format, ensure the API key is valid and active, and confirm network access to the target site.
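The checks below mirror that advice by validating the start URL before a request is made; the function name and error messages are illustrative, not part of the node.

```typescript
// Pre-flight URL validation mirroring the troubleshooting advice above.
function assertValidCrawlUrl(url: string): void {
  let parsed: URL;
  try {
    parsed = new URL(url); // throws on malformed input
  } catch {
    throw new Error(`Invalid "URL to Crawl": ${url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('"URL to Crawl" must use http or https');
  }
}
```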
Links and References
- Scrapeless Official Documentation (for detailed SDK usage and advanced crawling options)
- n8n Documentation on Creating Custom Nodes