Firecrawl Tool

A web scraping, crawling, and data extraction tool built on the Firecrawl v2 API. It can extract content from websites, crawl entire domains, map site structures, search the web, and extract structured data using AI. It suits both workflow automation and use as an AI agent tool.

Overview

The Firecrawl Tool node provides web scraping, crawling, and data extraction capabilities through the Firecrawl v2 API. Specifically, the Scrape operation extracts content from a single webpage URL. It is suited to workflows that need structured or unstructured data from specific web pages, such as gathering product details, extracting article content, or capturing page snapshots.

Common scenarios include:

  • Extracting clean text content or markdown from a webpage for AI processing.
  • Generating summaries of web pages using AI.
  • Capturing screenshots of pages for visual records.
  • Collecting all links present on a page for further analysis.
  • Extracting structured data via JSON schemas for integration into databases or reports.

Practical example: You want to scrape a product page to get a clean markdown description, an AI-generated summary, and a screenshot for documentation purposes. You provide the URL, select these output formats, and optionally configure options like waiting for dynamic content or removing base64 images.
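
As a rough illustration of the practical example above, the following sketch assembles the settings into a request body. The function name `build_scrape_request` is hypothetical, and the field names (`formats`, `waitFor`, `removeBase64Images`) are assumptions modeled on Firecrawl's public scrape parameters; the node may map its options differently.

```python
# Hypothetical helper: turns node-style settings into a scrape request body.
# Field names are assumptions, not taken from the node's source code.
ALLOWED_FORMATS = {"markdown", "html", "summary", "screenshot", "links"}


def build_scrape_request(url, formats, wait_for_ms=0, remove_base64_images=True):
    """Assemble a request body for scraping a single page."""
    unknown = [fmt for fmt in formats if fmt not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported formats: {unknown}")
    body = {
        "url": url,
        "formats": list(formats),
        "removeBase64Images": remove_base64_images,
    }
    # Only send a wait time when dynamic content actually needs one.
    if wait_for_ms > 0:
        body["waitFor"] = wait_for_ms
    return body


request_body = build_scrape_request(
    "https://docs.firecrawl.dev",
    ["markdown", "summary", "screenshot"],
    wait_for_ms=2000,
)
```

Here `request_body` carries the three requested formats plus a two-second wait for dynamic content, and base64 images stay removed by default.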

Properties

| Name | Meaning |
| --- | --- |
| URL | The webpage URL to scrape. Example: https://docs.firecrawl.dev |
| Formats | Output formats to return. Options: Markdown (clean markdown format, ideal for language models), HTML (cleaned HTML content), Summary (AI-generated summary), Screenshot (visual screenshot), Links (all links on page) |
| Additional Options | Collection of advanced settings, listed in the rows below |
| - Only Main Content | Whether to extract only the main content, removing navigation, footers, etc. (boolean, default true) |
| - Max Age (seconds) | Maximum age of cached data, in seconds; a cached copy is used if one is available and younger than this. Default is 172800 (2 days) |
| - Wait For (ms) | Time in milliseconds to wait for dynamic content to load before scraping (default 0) |
| - JSON Extraction | JSON object describing the structured data to extract using AI: a prompt plus a schema. Example: {"type": "json", "prompt": "Extract product details", "schema": {"name": "string", "price": "number"}} |
| - Actions | JSON array of actions to perform before scraping, e.g., clicking buttons, scrolling, waiting. Example: [{"type": "wait", "milliseconds": 1000}, {"type": "click", "selector": "button"}] |
| - Remove Base64 Images | Whether to remove base64-encoded images from the output (boolean, default true) |
| - Mobile | Whether to use a mobile viewport for scraping (boolean, default false) |
| - Include Tags | Comma-separated list of HTML tags or selectors to include in the scrape (e.g., "article,main,div.content") |
| - Exclude Tags | Comma-separated list of HTML tags or selectors to exclude from the scrape (e.g., "nav,footer,aside") |
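
Several of these options are entered as plain text (comma-separated tags, a JSON array of actions). The sketch below shows one way such strings could be parsed before use. The helper names `split_tags` and `parse_actions` are hypothetical, and the rule that every action carries a `type` key is an assumption drawn from the example in the Actions row.

```python
import json


def split_tags(value):
    """Split a comma-separated tag list, dropping blanks and stray spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_actions(raw):
    """Parse the Actions option and check that it is an array of typed actions."""
    actions = json.loads(raw)
    if not isinstance(actions, list):
        raise ValueError("Actions must be a JSON array")
    for action in actions:
        # Assumption: each action is an object with a "type" key.
        if not isinstance(action, dict) or "type" not in action:
            raise ValueError(f"Action is missing a 'type': {action!r}")
    return actions


exclude_tags = split_tags("nav, footer,aside,")
actions = parse_actions(
    '[{"type": "wait", "milliseconds": 1000}, {"type": "click", "selector": "button"}]'
)
```

Parsing up front means a typo in the Actions field surfaces as a clear local error rather than a failed scrape.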

Output

The node outputs a JSON object containing the scraped data according to the requested formats:

  • Markdown: Clean markdown representation of the main content, suitable for AI consumption.
  • HTML: Cleaned HTML content of the page or selected parts.
  • Summary: AI-generated textual summary of the page content.
  • Screenshot: A visual screenshot image of the page (binary data or a link depending on implementation).
  • Links: An array of all hyperlinks extracted from the page.

If JSON extraction is used, the output will also include structured data matching the provided JSON schema.

The output is returned under the json field of the node's output item.
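
A downstream step might first check which formats actually came back. The sketch below assumes the keys under `json` are the lowercase format names (`markdown`, `html`, `summary`, `screenshot`, `links`); those key names and the sample item are illustrative assumptions, not confirmed output of the node.

```python
# Assumed key names for each requested format (not confirmed by the node docs).
FORMAT_KEYS = ["markdown", "html", "summary", "screenshot", "links"]


def describe_item(item):
    """Report which formats are present in one output item."""
    data = item.get("json", {})
    return {key: key in data for key in FORMAT_KEYS}


# Illustrative item shaped like the description above; the values are made up.
sample_item = {
    "json": {
        "markdown": "# Example page",
        "links": ["https://docs.firecrawl.dev"],
    }
}

present = describe_item(sample_item)
available = [key for key, found in present.items() if found]
```

For the sample item, `available` lists only `markdown` and `links`, since those are the only formats included.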

Dependencies

  • Requires an active Firecrawl API key credential with access to the Firecrawl v2 API.
  • The node makes HTTP requests to the Firecrawl API endpoint (default: https://api.firecrawl.dev).
  • Proper network connectivity to the API endpoint is necessary.
  • No additional environment variables are required beyond the API key credential.
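
To make the dependency on the API key and endpoint concrete, the sketch below builds (but does not send) an HTTP request. The `/v2/scrape` path and the Bearer authorization header are assumptions based on Firecrawl's public API conventions; `make_scrape_request` and the example key are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev"  # default endpoint named above


def make_scrape_request(api_key, body, base_url=API_BASE):
    """Build an authenticated POST request without sending it."""
    if not api_key:
        raise ValueError("Firecrawl API key is required")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/scrape",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key for illustration only.
req = make_scrape_request("fc-EXAMPLE-KEY", {"url": "https://docs.firecrawl.dev"})
```

Passing `req` to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access or a real key.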

Troubleshooting

  • Missing API Key Error: If the API key is not set in credentials, the node throws an error stating the Firecrawl API key is required. Solution: Add a valid API key credential.
  • Timeouts on Crawl Jobs: The Crawl operation waits up to 5 minutes for a job to complete and throws an error if it does not finish in time. Solution: Increase the timeout externally or review the crawl parameters.
  • Invalid JSON Schema: Malformed JSON in the JSON Extraction option causes an error. Solution: Validate the JSON syntax before entering it.
  • Network Errors: Connectivity problems between the node and the Firecrawl API cause requests to fail. Solution: Check the internet connection and the API host configuration.
  • Incorrect URL Format: Invalid or unreachable URLs may cause scraping to fail. Solution: Ensure each URL is valid and accessible.
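
Two of the checks above (JSON syntax and URL format) can be run before the values ever reach the node. This is a minimal sketch using only the Python standard library; the helper names `check_url` and `check_json` are hypothetical, and the checks cover syntax only, not whether a page is reachable.

```python
import json
from urllib.parse import urlparse


def check_url(url):
    """Return True when the URL has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_json(text):
    """Return None for valid JSON, or a short description of the syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


url_ok = check_url("https://docs.firecrawl.dev")
schema_error = check_json('{"type": "json", "prompt": }')
```

A non-None `schema_error` points at the line and column of the problem, which is usually enough to fix a malformed JSON Extraction value.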
