Firecrawl

Get data from Firecrawl API

Actions: 7

Overview

The Firecrawl node calls the Firecrawl API so that users can crawl websites and extract their content. It is particularly useful for web scraping tasks that gather content from many pages while applying specific filters and options. Common scenarios include collecting product information from e-commerce sites, aggregating blog posts, and extracting data for research. For example, a user might configure the node to crawl a news website, including only articles from a specific section while excluding advertisements and irrelevant paths.
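
The include/exclude filtering in the news-site example can be approximated locally. This sketch assumes that each pattern is a regular expression searched against the URL path and that an exclude match always wins over an include match; Firecrawl's exact matching rules are not stated on this page, and the `is_crawlable` helper and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def is_crawlable(url: str, include: list[str], exclude: list[str]) -> bool:
    """Approximate include/exclude path filtering (assumed semantics, not Firecrawl's code)."""
    path = urlparse(url).path
    # Assumption: any matching exclude pattern rejects the URL outright.
    if any(re.search(p, path) for p in exclude):
        return False
    # Assumption: with no include patterns, every non-excluded path is allowed.
    if not include:
        return True
    return any(re.search(p, path) for p in include)


urls = [
    "https://news.example.com/world/election-results",
    "https://news.example.com/world/ads/banner",
    "https://news.example.com/sports/final-score",
]
kept = [u for u in urls if is_crawlable(u, include=["world/.*"], exclude=["ads/.*"])]
print(kept)  # ['https://news.example.com/world/election-results']
```

Only the article in the `world` section survives: the ad path is rejected by the exclude pattern, and the sports page never matches the include pattern.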

Properties

Name	Meaning
Url	The URL of the website to crawl (default: `http://localhost:3002`).
Exclude Paths	URL patterns to exclude from the crawl, written as regex (e.g., `blog/*` excludes paths such as `/blog/article-1`).
Include Paths	URL patterns to include in the crawl, written as regex (e.g., `blog/*` restricts the crawl to paths such as `/blog/article-1`).
Max Depth	Maximum depth to crawl relative to the entered URL (default: 2).
Limit	Maximum number of results to return (default: 50).
Crawl Options	Options that control crawling behavior, such as ignoring the sitemap or allowing external links.
Scrape Options	Options for scraping content during the crawl, including output formats and tag inclusion/exclusion.
Additional Fields	Custom fields to send in the request body, including custom JSON properties.
Use Custom Body	A flag indicating whether to use a custom body for the request.
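
The properties above end up as fields of one crawl request. The sketch below shows one plausible mapping. It assumes the camelCase field names of Firecrawl's v1 `/crawl` endpoint (`excludePaths`, `includePaths`, `maxDepth`, `limit`, `scrapeOptions`), that crawl options sit at the top level of the body, and that Additional Fields are merged last; the `build_crawl_body` helper is hypothetical, and Use Custom Body is not modeled.

```python
import json
from typing import Any, Optional


def build_crawl_body(
    url: str,
    exclude_paths: Optional[list[str]] = None,
    include_paths: Optional[list[str]] = None,
    max_depth: int = 2,
    limit: int = 50,
    crawl_options: Optional[dict[str, Any]] = None,
    scrape_options: Optional[dict[str, Any]] = None,
    additional_fields: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a crawl request body (field names assumed from Firecrawl's v1 /crawl API)."""
    body: dict[str, Any] = {"url": url, "maxDepth": max_depth, "limit": limit}
    if exclude_paths:
        body["excludePaths"] = exclude_paths
    if include_paths:
        body["includePaths"] = include_paths
    # Assumed: crawl options such as ignoreSitemap / allowExternalLinks are top-level fields.
    body.update(crawl_options or {})
    if scrape_options:
        # Formats and tag inclusion/exclusion are nested under scrapeOptions.
        body["scrapeOptions"] = scrape_options
    # Additional fields are merged last here, so they override anything set above.
    body.update(additional_fields or {})
    return body


body = build_crawl_body(
    "https://news.example.com",
    include_paths=["world/.*"],
    crawl_options={"ignoreSitemap": True, "allowExternalLinks": False},
    scrape_options={"formats": ["markdown"], "excludeTags": ["aside"]},
)
print(json.dumps(body, indent=2))
```

Letting Additional Fields win is a design choice in this sketch: it gives custom JSON properties the final say without special-casing each field.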

Output

The node outputs the scraped data for the pages crawled from the specified URL, formatted according to the selected scrape options. Depending on those options, the output may contain HTML content, JSON objects, or other formats. Binary data, if present, represents images or files extracted during the crawl.
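
Downstream steps usually need one format per page. This sketch assumes a response shaped like Firecrawl's v1 crawl status payload, with a `data` list whose items hold one key per requested format plus a `metadata.sourceURL` field; that shape and the `summarize_crawl` helper are assumptions, not confirmed by this page.

```python
from typing import Any


def summarize_crawl(response: dict[str, Any], fmt: str = "markdown") -> list[tuple[str, str]]:
    """Return (source URL, content) pairs for a single output format.

    Assumed input shape: {"status": ..., "data": [{"markdown": ..., "metadata": {"sourceURL": ...}}]}
    """
    pages = []
    for page in response.get("data", []):
        source = page.get("metadata", {}).get("sourceURL", "")
        # Pages lacking the requested format (e.g. not selected in Scrape Options) are skipped.
        if fmt in page:
            pages.append((source, page[fmt]))
    return pages


sample = {
    "status": "completed",
    "data": [
        {"markdown": "# Election results",
         "metadata": {"sourceURL": "https://news.example.com/world/election-results"}},
        {"html": "<p>No markdown here</p>",
         "metadata": {"sourceURL": "https://news.example.com/world/other"}},
    ],
}
print(summarize_crawl(sample))
# [('https://news.example.com/world/election-results', '# Election results')]
```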

Dependencies

Firecrawl API: An API key credential is required to authenticate requests.
Base URL Configuration: Users can set a base URL for the API, defaulting to `http://localhost:3002/v1`.
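
The API key and base URL combine into each request roughly as sketched below. The sketch assumes Bearer-token authentication in the `Authorization` header and a crawl endpoint at `<base URL>/crawl`; the `make_crawl_request` helper and the placeholder key are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://localhost:3002/v1"  # default base URL stated above


def make_crawl_request(api_key: str, body: dict,
                       base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the assumed crawl endpoint."""
    # Assumption: the crawl endpoint is <base URL>/crawl; tolerate a trailing slash.
    url = base_url.rstrip("/") + "/crawl"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = make_crawl_request("YOUR_API_KEY", {"url": "https://news.example.com", "limit": 50})
print(req.get_method(), req.full_url)  # POST http://localhost:3002/v1/crawl
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=30)`; the timeout value is an arbitrary example.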

Troubleshooting

Common Issues:
- Invalid URL: Ensure that the provided URL is correctly formatted and accessible.
- Authentication Errors: Verify that the API key is valid and has the necessary permissions.
- Timeouts: Adjust the timeout settings if requests are taking too long to respond.
Error Messages:
- "Failed to connect": Indicates issues with the network or incorrect URL. Check connectivity and URL validity.
- "Unauthorized": Suggests problems with API key authentication. Confirm that the correct credentials are being used.
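
The checks above can be partly automated in a script that calls the API directly. This sketch validates the URL before sending and maps Python's `urllib` exceptions to the advice in this section. Treating HTTP 401 and 403 as authentication problems is an assumption, and the `check_url` and `explain_error` helpers are hypothetical.

```python
import socket
import urllib.error
from urllib.parse import urlparse


def check_url(url: str) -> bool:
    """Catch the 'Invalid URL' case locally: require an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def explain_error(exc: Exception) -> str:
    """Map a urllib exception to the troubleshooting advice above."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):  # assumed to signal API key problems
            return "Unauthorized: confirm the API key credential."
        return f"HTTP {exc.code}: check the request body and options."
    if isinstance(exc, urllib.error.URLError):
        # Covers DNS failures, refused connections and similar network errors.
        return "Failed to connect: check connectivity and the base URL."
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "Timeout: adjust the timeout settings."
    return f"Unexpected error: {exc!r}"


print(check_url("news.example.com"))  # False
print(explain_error(urllib.error.URLError("connection refused")))
# Failed to connect: check connectivity and the base URL.
```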
