Firecrawl Tool

Web scraping, crawling, and data extraction tool using the Firecrawl v2 API. It can extract content from websites, crawl entire domains, map site structures, search the web, and extract structured data using AI. It is suitable both for workflow automation and for use as an AI agent tool.

Overview

The Firecrawl Tool node provides web scraping, crawling, and data extraction through the Firecrawl v2 API. This page covers the Map operation, which quickly discovers the URLs of a specified website and returns them as a list. Mapping a site's accessible pages this way helps you understand its structure.

Common scenarios where this node is beneficial include:

  • Quickly generating a sitemap or URL list for SEO analysis.
  • Discovering all reachable pages on a website before performing further scraping or crawling.
  • Filtering URLs based on search terms or including subdomains to get a comprehensive site map.

Practical example:
Suppose you want to analyze all pages on https://example.com before running a batch scrape. With the Map operation, you enter the base URL and, optionally, cap the number of URLs returned or keep only URLs that contain certain keywords.

Properties

  • URL: The website URL to map. This is the starting point for discovering URLs on the site.
  • Map Options: A collection of options that customize the mapping process:
      - Limit: Maximum number of URLs to return (number).
      - Search: Filter URLs by a search term (string). Only URLs containing this term are included.
      - Include Subdomains: Whether to include URLs from subdomains of the main domain (boolean).
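As a rough illustration of how these properties might translate into a request, the sketch below builds a JSON body for the Map endpoint. The body field names (url, limit, search, includeSubdomains) are assumptions modeled on the Firecrawl REST API, not taken from the node's source, so check the Firecrawl API reference before relying on them. Options that are not set are left out so the API can apply its own defaults.

```python
import json
from typing import Optional


def build_map_body(
    url: str,
    limit: Optional[int] = None,
    search: Optional[str] = None,
    include_subdomains: Optional[bool] = None,
) -> dict:
    """Build a JSON body for a Map request (field names are assumed, not confirmed)."""
    body: dict = {"url": url}
    # Only include options that were actually set; omitted keys fall back to API defaults.
    if limit is not None:
        body["limit"] = limit
    if search is not None:
        body["search"] = search
    if include_subdomains is not None:
        body["includeSubdomains"] = include_subdomains
    return body


if __name__ == "__main__":
    # Prints: {"url": "https://example.com", "limit": 100, "search": "blog"}
    print(json.dumps(build_map_body("https://example.com", limit=100, search="blog")))
```

Leaving unset options out, rather than sending nulls, keeps the request minimal and avoids overriding server-side defaults by accident.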

Output

The output JSON contains the result of the URL mapping operation as returned by the Firecrawl API. It typically includes a list of discovered URLs from the target website, possibly filtered and limited according to the input options.

The Map operation does not produce binary data; its output is structured JSON describing the discovered URLs.
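Downstream steps often need a flat list of URL strings. The sketch below assumes the response carries a links array whose entries may be plain strings or objects with a url field (the shape can differ between API versions), and handles both. These field names and the sample response are hypothetical illustrations, not a documented contract of this node.

```python
from typing import Any


def extract_urls(response: dict[str, Any]) -> list[str]:
    """Return discovered URLs from a Map response, keeping order and dropping duplicates."""
    urls: list[str] = []
    seen: set[str] = set()
    for entry in response.get("links", []):
        # Entries may be bare strings or objects such as {"url": ..., "title": ...}.
        url = entry if isinstance(entry, str) else entry.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical response used only for illustration.
sample = {
    "success": True,
    "links": [
        {"url": "https://example.com/", "title": "Home"},
        "https://example.com/about",
        {"url": "https://example.com/"},
    ],
}
print(extract_urls(sample))  # ['https://example.com/', 'https://example.com/about']
```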

Dependencies

  • Requires a Firecrawl API key, configured as a credential in n8n for authentication.
  • The node sends HTTP POST requests to the Firecrawl API (default endpoint: https://api.firecrawl.dev/v2/map).
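For testing outside n8n, a direct call to the same endpoint can be prepared with Python's standard library. This sketch assumes the key is sent as a Bearer token in the Authorization header, which is how the Firecrawl REST API authenticates; the environment variable name FIRECRAWL_API_KEY is a hypothetical choice. The request is only constructed here; passing it to urllib.request.urlopen would perform the actual POST.

```python
import json
import os
import urllib.request

MAP_ENDPOINT = "https://api.firecrawl.dev/v2/map"


def build_map_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST request to the Map endpoint."""
    if not api_key:
        raise ValueError("A Firecrawl API key is required")
    return urllib.request.Request(
        MAP_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # FIRECRAWL_API_KEY is a hypothetical variable name used for this example.
    key = os.environ.get("FIRECRAWL_API_KEY", "fc-demo")
    req = build_map_request({"url": "https://example.com"}, key)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=60)
```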

Troubleshooting

  • Missing or Invalid API Key:
    If no API key is provided, the node throws an error stating that a Firecrawl API key is required; an invalid key is rejected by the API and the request fails. Ensure the API key credential is set up correctly in n8n.

  • Timeouts or No Data Returned:
    If the website is unreachable or the API fails to respond, the node may throw errors or return empty results. Verify network connectivity and the validity of the URL.

  • Invalid URL Format:
    An improperly formatted URL may cause the API request to fail. Always use a full URL that includes the protocol (e.g., https://example.com).

  • Limit Too High:
    A very high limit can slow the response or cause timeouts, depending on the API's restrictions. If requests time out, try a lower limit.
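Some of these failures can be caught before any request is sent. The sketch below is a hypothetical pre-flight check, not part of the node: it reports a missing API key and rejects URLs that lack an http/https scheme or a host, matching the Missing API Key and Invalid URL Format items above.

```python
from typing import Optional
from urllib.parse import urlparse


def preflight_errors(url: str, api_key: Optional[str]) -> list[str]:
    """Return problems that would likely make a Map request fail (hypothetical helper)."""
    errors: list[str] = []
    if not api_key:
        errors.append("Firecrawl API key is missing")
    parsed = urlparse(url)
    # A full URL needs both a scheme (protocol) and a host, e.g. https://example.com
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append(f"URL must include the protocol and host: {url!r}")
    return errors


print(preflight_errors("example.com", "fc-test"))
print(preflight_errors("https://example.com", None))
```

An empty list means neither check found a problem; it does not guarantee that the site is reachable or that the key is valid.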
