Overview
The Firecrawl node extracts data from specified URLs using the Firecrawl API. Users define a set of URLs, customize the extraction process with various options, and specify how the extracted data should be structured. The node is particularly useful for web scraping tasks that gather information from multiple web pages efficiently.
Practical Examples:
- E-commerce Data Collection: Extracting product prices and descriptions from an online store.
- News Aggregation: Collecting headlines and summaries from different news websites.
- Content Scraping: Gathering blog posts or articles based on specific keywords.
Properties
| Name | Meaning |
|---|---|
| URLs | The URLs to extract data from, specified in glob format. |
| Prompt | A prompt to guide the extraction process. |
| Schema | Defines the structure of the extracted data in JSON format. |
| Ignore Sitemap | Option to ignore the website sitemap when crawling. |
| Include Subdomains | Whether to include subdomains of the website in the extraction process. |
| Enable Web Search | Enables web search to find additional data during extraction. |
| Show Sources | Displays the sources used to extract the data. |
| Scrape Options | Various options for scraping content, including output formats and tag inclusion/exclusion. |
| Additional Fields | Allows sending custom properties in the request body. |
| Use Custom Body | Indicates whether to use a custom body for the request. |
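To make the property list concrete, here is a minimal sketch of how these properties could map onto an extract request body. The field names (`ignoreSitemap`, `enableWebSearch`, etc.) are assumptions inferred from the property names above; check the Firecrawl API reference for the exact names your version expects.

```python
def build_extract_body(urls, prompt=None, schema=None,
                       ignore_sitemap=False, include_subdomains=False,
                       enable_web_search=False, show_sources=False,
                       scrape_options=None):
    """Assemble a JSON body for an extract request from the node's properties.

    Optional flags are only included when enabled, keeping the body minimal.
    """
    body = {"urls": urls}
    if prompt:
        body["prompt"] = prompt
    if schema:
        body["schema"] = schema
    if ignore_sitemap:
        body["ignoreSitemap"] = True
    if include_subdomains:
        body["includeSubdomains"] = True
    if enable_web_search:
        body["enableWebSearch"] = True
    if show_sources:
        body["showSources"] = True
    if scrape_options:
        body["scrapeOptions"] = scrape_options
    return body


# Example: glob-style URL plus a prompt and a JSON schema for the result.
body = build_extract_body(
    urls=["https://example.com/products/*"],
    prompt="Extract the product name and price",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
        },
    },
)
```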
Output
The output of the Firecrawl node will typically be structured as JSON, containing the extracted data based on the defined schema. If binary data is involved, it may represent images or files scraped from the specified URLs.
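As an illustration only (the exact response envelope depends on your Firecrawl version), a successful extraction might be consumed like this; the `success`, `data`, and `sources` keys are assumptions, not guaranteed field names:

```python
# Hypothetical response shaped by the schema defined in the node.
result = {
    "success": True,
    "data": {"name": "Example Widget", "price": 19.99},
    "sources": ["https://example.com/products/widget"],
}

# Downstream nodes would read the structured fields from `data`.
product = result["data"] if result.get("success") else None
```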
Dependencies
- Firecrawl API: Requires an API key credential for authentication.
- Base URL Configuration: The base URL for the Firecrawl API can be configured; it defaults to `http://localhost:3002/v1` if not specified.
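The two dependencies above combine into each request as a target URL and an auth header. The following sketch assumes a `/extract` endpoint path and a bearer-token header scheme; both are assumptions to verify against the Firecrawl API reference.

```python
import os


def extract_endpoint(base_url=None, api_key=None):
    """Resolve the request URL and auth headers for an extract call.

    Falls back to an environment variable, then to the documented
    default base URL. `FIRECRAWL_BASE_URL` is a hypothetical variable
    name used here for illustration.
    """
    base = (base_url
            or os.environ.get("FIRECRAWL_BASE_URL")
            or "http://localhost:3002/v1")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed header scheme; confirm against your Firecrawl version.
        headers["Authorization"] = f"Bearer {api_key}"
    return base.rstrip("/") + "/extract", headers
```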
Troubleshooting
Common Issues:
- Invalid URL Format: Ensure that the URLs provided are in the correct glob format.
- API Authentication Errors: Verify that the API key is correctly configured and has the necessary permissions.
- Timeouts: Adjust the timeout settings if requests are taking too long to respond.
Error Messages:
- "Invalid URL": Check the format of the URLs being passed to the node.
- "Authentication Failed": Confirm that the API key is valid and properly set up in n8n.
- "Request Timeout": Increase the timeout duration in the scrape options.
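Beyond raising the timeout, a client-side retry with backoff is a common way to ride out transient timeouts. This is a generic sketch, not part of the node itself: `do_request` stands in for whatever HTTP call you make, and `TimeoutError` stands in for your HTTP client's timeout exception.

```python
import time


def request_with_retries(do_request, attempts=3, backoff=2.0):
    """Call `do_request`, retrying on timeouts with linear backoff.

    Re-raises the timeout if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return do_request()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```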