Firecrawl

Get data from Firecrawl API

Overview

The Firecrawl node scrapes content from a specified URL through the Firecrawl API. It is useful for extracting data from web pages, such as articles or product information. Common scenarios include gathering data for research, monitoring website changes, and aggregating content for analysis. For example, you might scrape product details from an e-commerce site to compare prices or collect reviews.

Properties

| Name | Meaning |
|------|---------|
| Url | The URL of the webpage to scrape. Default is http://localhost:3002. |
| Scrape Options | Options for customizing the scraping process, including output formats and content filters. |
| - Formats | Output format(s) for the scraped data (HTML, JSON, Links, Markdown, Raw HTML, Screenshot). |
| - Only Main Content | If true, only the main content of the page will be returned, excluding headers and footers. |
| - Include Tags | Specifies tags to include in the output. |
| - Exclude Tags | Specifies tags to exclude from the output. |
| - Headers | Custom headers to send with the request. |
| - Wait For (Ms) | Time to wait in milliseconds for the page to load before fetching content. |
| - Mobile | If true, emulates scraping from a mobile device. |
| - Skip TLS Verification | If true, skips TLS certificate verification when making requests. |
| - Timeout (Ms) | Maximum time in milliseconds to wait for the request to complete. |
| - Actions | List of actions to perform on dynamic content before scraping (e.g., click, scroll). |
| - Location | Settings for location-based scraping, including country and preferred languages. |
| - Remove Base64 Images | If true, removes base64 encoded images from the output. |
| - Block Ads | If true, enables ad-blocking and cookie popup blocking. |
| - Proxy | Type of proxy to use (Basic or Stealth). |
| Additional Fields | Allows sending additional custom fields in the request body. |
| Use Custom Body | If true, allows the use of a custom body for the request. |
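
To make the properties above more concrete, here is a minimal sketch of how they might translate into a request body. The camelCase key names (onlyMainContent, excludeTags, waitFor, and so on) are assumptions based on the Firecrawl v1 scrape endpoint; this page does not show how the node maps its properties, and build_scrape_body is a hypothetical helper, not part of the node.

```python
from typing import Any


def build_scrape_body(url: str, **options: Any) -> dict[str, Any]:
    """Build a JSON-serializable scrape body, omitting options left unset."""
    body: dict[str, Any] = {"url": url}
    for key, value in options.items():
        # Skip empty options so the API can apply its own defaults.
        if value is None or value == [] or value == {}:
            continue
        body[key] = value
    return body


# Key names below are assumed, not taken from this page.
body = build_scrape_body(
    "https://example.com/products/42",
    formats=["markdown", "links"],         # Formats
    onlyMainContent=True,                  # Only Main Content
    excludeTags=["nav", "footer"],         # Exclude Tags
    waitFor=1000,                          # Wait For (Ms)
    timeout=30000,                         # Timeout (Ms)
    location={"country": "US", "languages": ["en-US"]},  # Location
    includeTags=None,                      # unset, so dropped from the body
)
```

Dropping unset options keeps the body small and avoids overriding server-side defaults with empty values.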

Output

The Firecrawl node outputs JSON containing the scraped content. The exact structure depends on the selected output formats and on any included or excluded tags. Binary data, if present, typically represents images or files retrieved during scraping.
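
Because the structure varies with the selected formats, downstream code should check which formats are actually present. The sketch below assumes a response shaped like {"success": ..., "data": {...}} with one key per format; that shape is an assumption and is not documented on this page. extract_formats is a hypothetical helper.

```python
from typing import Any


def extract_formats(response: dict[str, Any], wanted: list[str]) -> dict[str, Any]:
    """Return only the requested formats that are present in the response."""
    data = response.get("data") or {}
    return {name: data[name] for name in wanted if name in data}


# Hypothetical sample response; the real shape may differ.
sample_response = {
    "success": True,
    "data": {
        "markdown": "# Example product\n\nPrice: 19.99",
        "links": ["https://example.com/reviews"],
    },
}

# "html" was not returned, so it is left out instead of raising KeyError.
result = extract_formats(sample_response, ["markdown", "html"])
```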

Dependencies

  • Requires an API key credential for authentication with the Firecrawl API.
  • The base URL for the API can be configured, defaulting to http://localhost:3002/v1.
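
As a rough illustration of these two dependencies, the sketch below builds an authenticated request against the default base URL without sending it. The /scrape path, the Bearer authorization scheme, and the placeholder key are assumptions; make_scrape_request is a hypothetical helper, not the node's implementation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3002/v1"  # default base URL stated on this page


def make_scrape_request(
    api_key: str, body: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/scrape",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request("YOUR-API-KEY", {"url": "https://example.com"})
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=30)
```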

Troubleshooting

  • Common Issues:
    • An invalid URL causes the request to fail.
    • Incorrectly configured headers or timeout settings may result in timeouts or errors.
  • Error Messages:
    • "Invalid URL" - Ensure the URL is correctly formatted and accessible.
    • "Request Timeout" - Increase the timeout setting if the target page takes longer to load.
    • "403 Forbidden" - Check if the required API key is valid and has the necessary permissions.
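
Some of these problems can be caught before a request is made. The following sketch checks a URL's format locally and maps HTTP status codes to the hints above. Both helpers are hypothetical, and pairing "Request Timeout" with status 408 is an assumption; the node may report timeouts differently.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Check that the URL has an http(s) scheme and a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hints mirror the error messages listed above; status codes are assumed.
HINTS = {
    403: "Check that the API key is valid and has the necessary permissions.",
    408: "Increase the timeout setting if the target page loads slowly.",
}


def hint_for_status(status: int) -> str:
    """Return a troubleshooting hint for an HTTP status code."""
    return HINTS.get(status, "Unexpected status; inspect the response body.")
```

This only validates the URL's format; whether the page is reachable can only be confirmed by the request itself.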
