Overview
This node performs a web mapping operation starting from a specified root URL. It crawls the web pages linked from the root URL according to user-defined options such as maximum depth, breadth, and filtering rules based on URL paths and domains. This is useful for scenarios like site auditing, content discovery, or building a sitemap.
Use Case Examples
- Starting from https://docs.tavily.com, crawl up to 2 levels deep, following up to 10 links per page, and only include URLs matching the pattern '/docs/.*'.
- Crawl a website excluding private paths and external domains, with a timeout of 120 seconds.
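The depth, breadth, and pattern behavior described in these examples can be modeled as a breadth-first traversal. The sketch below runs over a small in-memory link graph with hypothetical URLs; it is an illustrative model of the mapping behavior, not the node's actual implementation.

```python
import re
from collections import deque

def map_site(root, links, max_depth=2, max_breadth=10, include=None):
    """BFS over a link graph, honoring max depth, a per-page breadth cap,
    and an optional regex that kept URLs must match (e.g. '/docs/.*')."""
    pattern = re.compile(include) if include else None
    seen, results = {root}, []
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if pattern is None or pattern.search(url):
            results.append(url)
        if depth >= max_depth:
            continue
        # Follow at most max_breadth links from each page.
        for child in links.get(url, [])[:max_breadth]:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return results

# Toy link graph standing in for live pages (hypothetical URLs).
graph = {
    "https://docs.tavily.com": ["https://docs.tavily.com/docs/intro",
                                "https://docs.tavily.com/blog/post"],
    "https://docs.tavily.com/docs/intro": ["https://docs.tavily.com/docs/api"],
}
mapped = map_site("https://docs.tavily.com", graph, max_depth=2,
                  max_breadth=10, include=r"/docs/.*")
# Only URLs matching '/docs/.*' are kept; the blog page is filtered out.
```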
Properties
| Name | Meaning |
|---|---|
| URL | The root URL to begin mapping the web pages from. |
| Options | A collection of settings to control the crawling behavior, including instructions, max depth, max breadth, result limit, path and domain filters, external link inclusion, and timeout duration. |
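As an illustration of the Options property, a payload might look like the dictionary below. The key names mirror the settings listed in the table but are assumptions, not the node's exact schema.

```python
# Illustrative options payload; key names are assumptions mirroring the
# Properties table above, not the node's exact schema.
options = {
    "instructions": "Map documentation pages only",
    "max_depth": 2,                   # levels to crawl below the root URL
    "max_breadth": 10,                # links to follow per page
    "limit": 50,                      # cap on total results returned
    "select_paths": [r"/docs/.*"],    # regex filters: paths to include
    "exclude_paths": [r"/private/.*"],
    "exclude_domains": [r".*\.example\.org"],
    "allow_external": False,          # stay on the root URL's domain
    "timeout": 120,                   # seconds before the crawl is aborted
}
```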
Output
JSON
- url - The root URL from which the mapping started.
- results - An array of mapped URLs and their metadata discovered during the crawl.
- metadata - Additional information about the mapping operation, such as duration, number of pages crawled, and any errors encountered.
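A sketch of the output shape described above; the field values are made up for illustration, and the exact metadata keys are assumptions.

```python
# Illustrative output matching the documented shape; values are made up.
output = {
    "url": "https://docs.tavily.com",
    "results": [
        {"url": "https://docs.tavily.com/docs/intro", "depth": 1},
        {"url": "https://docs.tavily.com/docs/api", "depth": 2},
    ],
    "metadata": {"duration_ms": 1840, "pages_crawled": 12, "errors": []},
}

# Downstream nodes would typically pull out the mapped URLs like this:
mapped_urls = [r["url"] for r in output["results"]]
```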
Dependencies
- This node likely depends on an external web crawling or mapping service or library to perform the URL mapping operation.
Troubleshooting
- Timeouts can occur if the crawl runs longer than the configured timeout, often because the max depth and breadth settings are too high.
- Errors may occur if the root URL is invalid or unreachable.
- Regex patterns for selecting or excluding paths/domains must be valid to avoid filtering errors.
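Since invalid regex patterns are a listed failure mode, compiling every filter pattern before the crawl starts surfaces errors early. A small sketch of such a pre-flight check:

```python
import re

def validate_patterns(patterns):
    """Compile each filter regex up front; return (compiled, errors)
    so bad patterns are reported before the crawl begins."""
    compiled, errors = [], []
    for pat in patterns:
        try:
            compiled.append(re.compile(pat))
        except re.error as exc:
            errors.append((pat, str(exc)))
    return compiled, errors

# The second pattern has an unclosed group and should be rejected.
compiled, errors = validate_patterns([r"/docs/.*", r"/private/(unclosed"])
```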