Tavily icon

Tavily

Tavily API

Overview

This node performs a web mapping operation starting from a specified root URL. It crawls the web pages linked from the root URL according to user-defined options such as maximum depth, breadth, and filtering rules based on URL paths and domains. This is useful for scenarios like site auditing, content discovery, or building a sitemap.

Use Case Examples

  1. Starting from https://docs.tavily.com, crawl up to 2 levels deep, following up to 10 links per page, and only include URLs matching the pattern '/docs/.*'.
  2. Crawl a website excluding private paths and external domains, with a timeout of 120 seconds.

Properties

Name Meaning
URL The root URL to begin mapping the web pages from.
Options A collection of settings to control the crawling behavior, including instructions, max depth, max breadth, result limit, path and domain filters, external link inclusion, and timeout duration.

Output

JSON

  • url - The root URL from which the mapping started.
  • results - An array of mapped URLs and their metadata discovered during the crawl.
  • metadata - Additional information about the mapping operation such as duration, number of pages crawled, and any errors encountered.

Dependencies

  • This node likely depends on an external web crawling or mapping service or library to perform the URL mapping operation.

Troubleshooting

  • Common issues include timeouts if the crawl takes too long or if the max depth and breadth settings are too high.
  • Errors may occur if the root URL is invalid or unreachable.
  • Regex patterns for selecting or excluding paths/domains must be valid to avoid filtering errors.

Discussion