Firecrawl

Get data from Firecrawl API

Overview

This node calls the Firecrawl API to map a website and return the URLs found on it. It is useful for web scraping, SEO analysis, or gathering site-structure data without crawling the site manually. Mapping can optionally include subdomains, and the site's sitemap can be ignored or used as the only source of links.

Practical examples:

  • Extract all accessible URLs from a competitor’s website for market research.
  • Generate a list of pages for automated testing or monitoring.
  • Collect only the URLs listed in the sitemap to analyze site indexing status.

Properties

Name | Meaning
Url | The starting URL of the website to map and crawl.
Ignore Sitemap | Whether to ignore the website's sitemap during crawling (true = do not use sitemap).
Sitemap Only | Whether to return only links found in the sitemap, ignoring other crawled URLs.
Include Subdomains | Whether to include URLs from subdomains of the main website.
Limit | Maximum number of URLs to return (1 to 5000).
Timeout (Ms) | Request timeout in milliseconds.
Use Custom Body | Whether to send a custom request body instead of using the standard parameters.
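
As a rough illustration of how these properties might translate into a request body, the sketch below builds a dictionary from them and enforces the documented 1 to 5000 range for Limit. The function name `build_map_body` and the JSON field names (`ignoreSitemap`, `sitemapOnly`, `includeSubdomains`, `limit`, `timeout`) are assumptions made for this example; check the Firecrawl API reference for the exact names before relying on them.

```python
from typing import Any, Dict, Optional


def build_map_body(
    url: str,
    ignore_sitemap: bool = False,
    sitemap_only: bool = False,
    include_subdomains: bool = False,
    limit: Optional[int] = None,
    timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Translate the node's properties into a request body.

    The JSON field names below are assumptions, not taken from this page.
    """
    if not url:
        raise ValueError("url is required")
    # The Limit property is documented as accepting 1 to 5000.
    if limit is not None and not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    if timeout_ms is not None and timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")

    body: Dict[str, Any] = {
        "url": url,
        "ignoreSitemap": ignore_sitemap,          # assumed field name
        "sitemapOnly": sitemap_only,              # assumed field name
        "includeSubdomains": include_subdomains,  # assumed field name
    }
    # Optional values are left out entirely when not set.
    if limit is not None:
        body["limit"] = limit
    if timeout_ms is not None:
        body["timeout"] = timeout_ms
    return body
```

A dictionary of this kind is also roughly what one would supply by hand when "Use Custom Body" is enabled, although the exact format the node expects is not specified here.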

Output

The node outputs JSON data containing the mapped URLs from the target website. The exact structure depends on the Firecrawl API response but generally includes an array of URLs discovered during the crawl.

The node does not output binary data; its output is limited to JSON URL lists.
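
Because the exact response structure is not fixed by this page, downstream code is safer if it reads the URL array defensively. The sketch below assumes a response shaped like `{"success": true, "links": [...]}`; the `success`, `links`, and `error` keys and the helper name `extract_urls` are assumptions for illustration only.

```python
from typing import Any, List, Mapping


def extract_urls(response: Mapping[str, Any]) -> List[str]:
    """Return the unique URLs from a map response, preserving their order.

    Assumes the URLs live under a "links" key (not confirmed by this page).
    """
    if response.get("success") is False:
        message = response.get("error", "unknown error")
        raise RuntimeError(f"map request failed: {message}")

    links = response.get("links", [])
    if not isinstance(links, list):
        raise TypeError("expected 'links' to be a list")

    seen = set()
    urls: List[str] = []
    for link in links:
        # Skip anything that is not a plain string, and drop duplicates.
        if isinstance(link, str) and link not in seen:
            seen.add(link)
            urls.append(link)
    return urls
```

Returning an empty list when the key is missing lets a workflow distinguish "no URLs found" from a failed request, which raises instead.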

Dependencies

  • An active API key for the Firecrawl API.
  • Network access to https://api.firecrawl.dev/v1.
  • The API key configured as a credential within n8n.
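
To check the API key and network access outside n8n, a request can be assembled with the Python standard library. This is a minimal sketch: the base URL comes from the list above, while the `/map` path, the Bearer authorization scheme, and the helper name `build_map_request` are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.firecrawl.dev/v1"  # base URL from the Dependencies list
MAP_PATH = "/map"                          # assumed endpoint path


def build_map_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key."""
    if not api_key:
        raise ValueError("api_key is required")
    return urllib.request.Request(
        BASE_URL + MAP_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=30)` would perform the call; an HTTP 401 or 403 at that point usually indicates a problem with the key rather than with the network.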

Troubleshooting

  • Timeouts: If the crawl takes too long, increase the "Timeout (Ms)" property or reduce the "Limit".
  • Empty results: Check if "Ignore Sitemap" and "Sitemap Only" settings align with your expectations; some sites may have limited or no sitemap data.
  • Authentication errors: Ensure the API key credential is valid and correctly configured.
  • Rate limits: The Firecrawl API may enforce rate limits; retry failed requests after a delay or space out requests.
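
One common way to handle rate limits is to retry with exponential backoff. The sketch below is a generic helper rather than part of the node: `RateLimitError` is a hypothetical exception that the caller would raise on a rate-limit response (typically HTTP 429), and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller when the API reports a rate limit."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of 1x, 2x, 4x base_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the helper can be exercised without real waiting; in a workflow, a Wait node between requests serves a similar purpose.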
