
FireCrawl

FireCrawl API

Overview

The FireCrawl node scrapes a given URL and retrieves its content in one or more formats. Use it to extract structured data, capture screenshots, or obtain the raw HTML of web pages. Common scenarios include collecting web data for analysis, monitoring websites for changes, archiving page content, and automating data-collection workflows.

For example, you can use this node to:

  • Extract the main article text from a news website.
  • Capture a full-page screenshot of a product page for visual records.
  • Retrieve all links on a webpage for crawling or validation.
  • Convert webpage content into Markdown format for documentation purposes.

The node supports advanced options such as waiting for dynamic content to load, interacting with page elements (clicking buttons, typing text), and customizing HTTP headers.
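The interactive steps described above are sent to the service as an ordered list of actions. The following is a minimal Python sketch, assuming the action object shapes used by Firecrawl's v1 scrape API (a `type` field plus `milliseconds`, `selector`, `text`, or `key`). The helper functions and the CSS selectors are hypothetical and only illustrate the ordering.

```python
import json


def wait(ms: int) -> dict:
    # Pause for the given number of milliseconds (assumed field: "milliseconds")
    return {"type": "wait", "milliseconds": ms}


def click(selector: str) -> dict:
    # Click the element matching a CSS selector (assumed field: "selector")
    return {"type": "click", "selector": selector}


def write(text: str) -> dict:
    # Type text into the currently focused input (assumed field: "text")
    return {"type": "write", "text": text}


def press(key: str) -> dict:
    # Simulate a single key press (assumed field: "key")
    return {"type": "press", "key": key}


def screenshot() -> dict:
    # Capture the page as it looks at this step
    return {"type": "screenshot"}


# Hypothetical flow: dismiss a cookie banner, run a search, then capture the result.
actions = [
    wait(2000),
    click("#accept-cookies"),
    click("input[name='q']"),  # focus the input before typing
    write("firecrawl"),
    press("ENTER"),
    wait(3000),  # give the results time to render
    screenshot(),
]

print(json.dumps(actions, indent=2))
```

Clicking the input before the `write` step matters: typing goes to whichever element has focus, so the order of actions is part of the logic, not just a list of independent steps.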

Properties

Name Meaning
URL The URL to scrape. This is the target webpage from which content will be extracted.
Formats Output format(s) for the scraped data. Options include: Extract (structured data extraction), Full Page Screenshot, HTML, Links, Markdown, Raw HTML, Screenshot. Multiple formats can be selected simultaneously.
Additional Options A collection of optional settings:
- Only Main Content: Return only the page's main content, excluding headers, footers, navigation bars, and similar elements.
- Include Tags: Comma-separated list of HTML tags to include.
- Exclude Tags: Comma-separated list of HTML tags to exclude.
- Headers: Custom HTTP headers (e.g., cookies, user-agent).
- Wait for (MS): Delay, in milliseconds, before scraping, giving the page time to load.
- Mobile: Emulate a mobile device when scraping.
- Skip TLS Verification: Ignore TLS certificate errors.
- Timeout (MS): Request timeout, in milliseconds.
- Remove Base64 Images: Remove base64-encoded images from the output to reduce its size.
Extract Structured data extraction parameters when using the "extract" format:
- Schema: Defines the structure for extracted data.
- System Prompt: Context that guides how the extraction is performed.
- Prompt: Natural-language extraction instructions; can be used without a schema.
Actions List of actions to interact with dynamic content before scraping. Each action can be:
- Wait: Pause for a specified number of milliseconds.
- Click: Click an element by CSS selector.
- Write: Type text into an input field.
- Press: Simulate a key press.
- Screenshot: Take a screenshot at that step.
These enable handling dynamic or interactive pages.
Use Custom Body Boolean flag. When enabled, the node sends the JSON object from Custom Body as the request body instead of building it from the properties above.
Custom Body JSON object defining the entire request body manually, including URL, formats, extract parameters, and actions. Useful for advanced users needing full control over the request payload.
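To show how these properties combine into a single request, here is a Python sketch that assembles a JSON body. The camelCase field names (`onlyMainContent`, `includeTags`, `waitFor`, `skipTlsVerification`, `removeBase64Images`, and so on) and format identifiers such as `rawHtml` and `screenshot@fullPage` are assumptions based on Firecrawl's v1 scrape API, not taken from this node's source. `build_body`, `split_tags`, and the snake_case option names are hypothetical. When a custom body is supplied, the sketch passes it through unchanged, matching the description of Use Custom Body.

```python
import json
from typing import Any, Optional

# Assumed mapping from the node's format labels to Firecrawl v1 format identifiers.
FORMAT_IDS = {
    "Extract": "extract",
    "Full Page Screenshot": "screenshot@fullPage",
    "HTML": "html",
    "Links": "links",
    "Markdown": "markdown",
    "Raw HTML": "rawHtml",
    "Screenshot": "screenshot",
}

# Assumed mapping from Additional Options (hypothetical snake_case names) to request fields.
OPTION_KEYS = {
    "only_main_content": "onlyMainContent",
    "include_tags": "includeTags",
    "exclude_tags": "excludeTags",
    "headers": "headers",
    "wait_for_ms": "waitFor",
    "mobile": "mobile",
    "skip_tls_verification": "skipTlsVerification",
    "timeout_ms": "timeout",
    "remove_base64_images": "removeBase64Images",
}


def split_tags(value: str) -> list[str]:
    # "nav, footer" -> ["nav", "footer"]; blank entries are dropped
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def build_body(
    url: str,
    formats: list[str],
    options: Optional[dict[str, Any]] = None,
    extract: Optional[dict[str, Any]] = None,
    actions: Optional[list[dict[str, Any]]] = None,
    custom_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    # Use Custom Body: the user's object replaces everything else, unchanged
    if custom_body is not None:
        return custom_body

    body: dict[str, Any] = {"url": url, "formats": [FORMAT_IDS[f] for f in formats]}
    for name, value in (options or {}).items():
        # Include/Exclude Tags arrive as comma-separated strings in the UI
        if name in ("include_tags", "exclude_tags") and isinstance(value, str):
            value = split_tags(value)
        body[OPTION_KEYS[name]] = value

    # Extraction parameters only matter when "Extract" is among the formats
    if extract and "extract" in body["formats"]:
        body["extract"] = extract
    if actions:
        body["actions"] = actions
    return body


# Hypothetical request: article text as Markdown plus structured metadata.
article = build_body(
    "https://example.com/news/story",
    ["Markdown", "Extract"],
    options={"only_main_content": True, "exclude_tags": "nav, footer", "timeout_ms": 30000},
    extract={
        "schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "author": {"type": "string"}},
            "required": ["title"],
        },
        "systemPrompt": "You extract article metadata.",
    },
)
print(json.dumps(article, indent=2))
```

Dropping `extract` when the Extract format is not selected is a design choice of this sketch; the real node may forward it regardless.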

Output

The node outputs JSON data containing the scraped content according to the requested formats. The structure varies depending on the selected formats:

  • Extract: Returns structured data based on the provided schema and prompts.
  • Full Page Screenshot / Screenshot: Contains image data representing the captured screenshot.
  • HTML / Raw HTML: Provides the HTML source of the page. HTML may be limited to specific parts of the page, while Raw HTML is the unmodified source.
  • Links: Lists all hyperlinks found on the page.
  • Markdown: Converts the page content into Markdown format.

When a screenshot format is selected, the output also includes image data capturing the rendered webpage.

Dependencies

  • Requires an API key credential for the FireCrawl API service.
  • Needs network access to the target URLs.
  • Supports configuration of HTTP headers and TLS verification options.
  • No other external dependencies are indicated.
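The API key credential is typically sent as a bearer token. The Python sketch below builds, but does not send, an authenticated request with the standard library. The endpoint `https://api.firecrawl.dev/v1/scrape` and the bearer-token scheme are assumptions based on Firecrawl's public v1 API, and `FIRECRAWL_API_KEY` is a hypothetical environment variable name.

```python
import json
import os
import urllib.request

# Assumed endpoint for Firecrawl's v1 scrape API.
SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def make_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# FIRECRAWL_API_KEY is a hypothetical variable; the fallback is a placeholder.
req = make_request(
    {"url": "https://example.com", "formats": ["markdown"]},
    os.environ.get("FIRECRAWL_API_KEY", "fc-placeholder"),
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect when debugging authentication issues.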

Troubleshooting

  • Timeouts: If the page takes too long to load, increase the "Timeout (MS)" or "Wait for (MS)" values.
  • TLS Errors: For sites with problematic certificates, enable "Skip TLS Verification."
  • Empty or Incomplete Data: Ensure the correct "Formats" are selected and that any required "Actions" to reveal dynamic content are configured.
  • Authentication Issues: Verify that the API key credential is set up correctly and has the necessary permissions.
  • Oversized Output: If the output is too large, enable (or keep enabled) "Remove Base64 Images" to strip embedded images and reduce its size.
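The timeout advice above can be automated outside the node, for example in a script that calls the API directly. This Python sketch retries a scrape and doubles `timeout` and `waitFor` after each failure. The field names, the 30000 ms default, the use of Python's `TimeoutError` as the failure signal, and the `fake_send` stand-in are all assumptions for illustration; a real client would map the service's timeout error to this exception.

```python
from typing import Any, Callable


def scrape_with_backoff(
    send: Callable[[dict[str, Any]], dict[str, Any]],
    body: dict[str, Any],
    attempts: int = 3,
    factor: int = 2,
) -> dict[str, Any]:
    """Retry a scrape, growing timeout and waitFor after each timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    current = dict(body)
    for attempt in range(attempts):
        try:
            return send(current)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            # Give the page more time on the next try (assumed field names/defaults)
            current = {
                **current,
                "timeout": current.get("timeout", 30000) * factor,
                "waitFor": max(current.get("waitFor", 0) * factor, 1000),
            }
    raise RuntimeError("unreachable")


def fake_send(body: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical stand-in: succeeds only once the timeout reaches 60 s
    if body["timeout"] < 60000:
        raise TimeoutError("page load exceeded timeout")
    return {"success": True, "data": {"markdown": "# Example", "sent": body}}


result = scrape_with_backoff(fake_send, {"url": "https://example.com", "timeout": 15000})
print(result["data"]["sent"]["timeout"])  # 60000
```

Capping the number of attempts keeps a page that never loads from retrying indefinitely.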

Links and References
