Firecrawl

Get data from Firecrawl API

Overview

This node integrates with the Firecrawl API to map a website and retrieve URLs found during crawling. It is useful for scenarios where you want to analyze the structure of a website, gather all accessible links, or extract sitemap information programmatically. For example, digital marketers can use it to audit website link structures, SEO specialists to verify sitemap coverage, or developers to collect URLs for further automated processing.

The "Map a website and get urls" operation crawls the specified URL, optionally including or excluding the sitemap and subdomains, and returns a list of discovered URLs up to a configurable limit.

Properties

| Name | Meaning |
| --- | --- |
| Url | The starting URL of the website to crawl (e.g., https://firecrawl.dev). |
| Sitemap | How to handle the website sitemap during crawling: Include (default), Only (use sitemap only), or Skip (ignore sitemap). |
| Include Subdomains | Whether to include subdomains of the website in the crawl (true/false). |
| Limit | Maximum number of URLs to return from the crawl (1 to 5000). |
| Timeout (Ms) | Request timeout in milliseconds (e.g., 10000 ms = 10 seconds). |
| Use Custom Body | Whether to send a custom JSON body instead of the standard parameters (true/false). |
| Additional Fields | When using a custom body, allows adding extra JSON properties to the request body. |
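
As a rough illustration of how these properties might translate into a request body, here is a minimal Python sketch. The JSON field names (`url`, `sitemap`, `includeSubdomains`, `limit`, `timeout`) and the lowercase sitemap values are assumptions about the wire format, not something this page confirms, and `build_map_body` is a hypothetical helper rather than part of the node.

```python
from typing import Any, Dict, Optional

# Assumed wire values for the Include / Only / Skip options.
SITEMAP_MODES = ("include", "only", "skip")


def build_map_body(
    url: str,
    *,
    include_subdomains: bool,
    limit: int,
    sitemap: str = "include",
    timeout_ms: Optional[int] = None,
    custom_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a JSON body for a map request from the node's properties.

    Field names are assumptions made for illustration.
    """
    if custom_body is not None:
        # "Use Custom Body" enabled: send the caller's JSON instead of
        # the standard parameters.
        return dict(custom_body)
    if sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {SITEMAP_MODES}")
    if not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    body: Dict[str, Any] = {
        "url": url,
        "sitemap": sitemap,
        "includeSubdomains": include_subdomains,
        "limit": limit,
    }
    if timeout_ms is not None:
        body["timeout"] = timeout_ms
    return body
```

Calling `build_map_body("https://firecrawl.dev", include_subdomains=False, limit=50, timeout_ms=10000)` yields a dictionary with the five fields above. Passing `custom_body` bypasses the standard parameters entirely, mirroring the "Use Custom Body" toggle.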

Output

The node outputs JSON data containing the results of the website mapping operation, typically an array of URLs discovered during the crawl. Entries may carry metadata such as their source (e.g., sitemap or page links). The exact structure depends on the Firecrawl API response.

The node does not appear to produce binary output.
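
Because the exact response structure is not fixed by this page, downstream code may want to read the URL list defensively. The sketch below assumes the response is a JSON object with a `links` array whose entries are either plain URL strings or objects with a `url` key. Both the key names and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def extract_urls(response: Dict[str, Any]) -> List[str]:
    """Collect URL strings from a map response.

    Assumes a top-level "links" array (hypothetical shape); entries that
    are neither strings nor objects with a string "url" are skipped.
    """
    urls: List[str] = []
    for entry in response.get("links", []):
        if isinstance(entry, str):
            urls.append(entry)
        elif isinstance(entry, dict) and isinstance(entry.get("url"), str):
            urls.append(entry["url"])
    return urls


# Hypothetical payload used only to exercise the function.
sample = {
    "success": True,
    "links": [
        "https://firecrawl.dev",
        {"url": "https://firecrawl.dev/pricing", "title": "Pricing"},
        {"title": "entry without a url"},
    ],
}
urls = extract_urls(sample)
```

Here `urls` contains the two entries that carry a URL, and the malformed third entry is ignored rather than raising.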

Dependencies

  • Requires an API key credential for authenticating with the Firecrawl API.
  • The base URL defaults to https://api.firecrawl.dev/v2 but can be overridden via credentials.
  • Network access to the target websites and the Firecrawl API endpoint is necessary.
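
To make the credential and base URL relationship concrete, the following sketch builds the request target and headers. The default base URL comes from the list above. The `/map` path, the bearer-token `Authorization` header, and the helper name `build_request_target` are assumptions for illustration.

```python
from typing import Dict, Tuple

DEFAULT_BASE_URL = "https://api.firecrawl.dev/v2"


def build_request_target(
    api_key: str,
    base_url: str = DEFAULT_BASE_URL,
    path: str = "/map",  # assumed endpoint path for the map operation
) -> Tuple[str, Dict[str, str]]:
    """Return the full endpoint URL and the HTTP headers for a request."""
    if not api_key:
        raise ValueError("an API key is required")
    # Join without doubling or dropping the slash between base and path.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return url, headers
```

Overriding `base_url` models the credential-level override, for example when pointing at a self-hosted instance.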

Troubleshooting

  • Timeouts: If the crawl takes longer than the configured timeout, the request may fail. Increase the "Timeout (Ms)" value if needed.
  • Limit Exceeded: Setting "Limit" too high might cause slow responses or API rate limiting. Lower the value if either occurs; the accepted range is 1 to 5000.
  • Invalid URL: Ensure the "Url" property is a valid and reachable website address.
  • API Authentication Errors: Verify that the API key credential is correctly configured and has sufficient permissions.
  • Sitemap Handling: Choosing "Only" for sitemap without a valid sitemap may result in no URLs returned.
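
Some of the issues above can be caught before a request is sent. The sketch below is a hypothetical pre-flight check for the "Url" and "Timeout (Ms)" values. It only verifies that the URL is an absolute http(s) address, not that the site is reachable.

```python
from typing import List
from urllib.parse import urlparse


def preflight_problems(url: str, timeout_ms: int) -> List[str]:
    """Return human-readable problems found in the inputs (empty if none)."""
    problems: List[str] = []
    parsed = urlparse(url)
    # A bare host such as "firecrawl.dev" has no scheme or netloc here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Url must be an absolute http(s) address")
    if timeout_ms <= 0:
        problems.append("Timeout (Ms) must be a positive number of milliseconds")
    return problems
```

For instance, `preflight_problems("firecrawl.dev", 10000)` reports the missing scheme, while `preflight_problems("https://firecrawl.dev", 10000)` returns an empty list.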
