Overview
The Firecrawl node extracts data from specified URLs using the Firecrawl API. Users define a set of URLs, customize the extraction process with various options, and specify how the extracted data should be structured. The node is particularly useful for web scraping tasks that gather information from multiple web pages efficiently.
Practical Examples:
- E-commerce Data Collection: Extracting product prices and descriptions from an online store.
- News Aggregation: Collecting headlines and summaries from different news websites.
- Content Scraping: Gathering blog posts or articles based on specific keywords.
Properties
| Name | Meaning |
|---|---|
| URLs | The URLs to extract data from, specified in glob format. |
| Prompt | A prompt to guide the extraction process. |
| Schema | Defines the structure of the extracted data in JSON format. |
| Ignore Sitemap | Option to ignore the website sitemap when crawling. |
| Include Subdomains | Whether to include subdomains of the website in the extraction process. |
| Enable Web Search | Enables web search to find additional data during extraction. |
| Show Sources | Displays the sources used to extract the data. |
| Scrape Options | Various options for scraping content, including output formats and tag inclusion/exclusion. |
| Additional Fields | Allows sending custom properties in the request body. |
| Use Custom Body | Indicates whether to use a custom body for the request. |
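To make the property list concrete, here is a minimal sketch of how these properties could map onto an extract request body. The field names (`ignoreSitemap`, `enableWebSearch`, etc.) are assumptions inferred from the property names above; check the Firecrawl API reference for the exact names your version expects.

```python
def build_extract_body(urls, prompt=None, schema=None,
                       ignore_sitemap=False, include_subdomains=False,
                       enable_web_search=False, show_sources=False,
                       scrape_options=None):
    """Assemble a JSON body for an extract request from the node's properties.

    Optional flags are only included when enabled, keeping the body minimal.
    """
    body = {"urls": urls}
    if prompt:
        body["prompt"] = prompt
    if schema:
        body["schema"] = schema
    if ignore_sitemap:
        body["ignoreSitemap"] = True
    if include_subdomains:
        body["includeSubdomains"] = True
    if enable_web_search:
        body["enableWebSearch"] = True
    if show_sources:
        body["showSources"] = True
    if scrape_options:
        body["scrapeOptions"] = scrape_options
    return body


# Example: glob-style URL plus a prompt and a JSON schema for the result.
body = build_extract_body(
    urls=["https://example.com/products/*"],
    prompt="Extract the product name and price",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
        },
    },
)
```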
Output
The output of the Firecrawl node will typically be structured as JSON, containing the extracted data based on the defined schema. If binary data is involved, it may represent images or files scraped from the specified URLs.
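As an illustration only (the exact response envelope depends on your Firecrawl version), a successful extraction might be consumed like this; the `success`, `data`, and `sources` keys are assumptions, not guaranteed field names:

```python
# Hypothetical response shaped by the schema defined in the node.
result = {
    "success": True,
    "data": {"name": "Example Widget", "price": 19.99},
    "sources": ["https://example.com/products/widget"],
}

# Downstream nodes would read the structured fields from `data`.
product = result["data"] if result.get("success") else None
```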
Dependencies
- Firecrawl API: Requires an API key credential for authentication.
- Base URL Configuration: The base URL for the Firecrawl API can be configured; it defaults to `http://localhost:3002/v1` if not specified.
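The two dependencies above combine into each request as a target URL and an auth header. The following sketch assumes a `/extract` endpoint path and a bearer-token header scheme; both are assumptions to verify against the Firecrawl API reference.

```python
import os


def extract_endpoint(base_url=None, api_key=None):
    """Resolve the request URL and auth headers for an extract call.

    Falls back to an environment variable, then to the documented
    default base URL. `FIRECRAWL_BASE_URL` is a hypothetical variable
    name used here for illustration.
    """
    base = (base_url
            or os.environ.get("FIRECRAWL_BASE_URL")
            or "http://localhost:3002/v1")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed header scheme; confirm against your Firecrawl version.
        headers["Authorization"] = f"Bearer {api_key}"
    return base.rstrip("/") + "/extract", headers
```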
Troubleshooting
Common Issues:
- Invalid URL Format: Ensure that the URLs provided are in the correct glob format.
- API Authentication Errors: Verify that the API key is correctly configured and has the necessary permissions.
- Timeouts: Adjust the timeout settings if requests are taking too long to respond.
Error Messages:
- "Invalid URL": Check the format of the URLs being passed to the node.
- "Authentication Failed": Confirm that the API key is valid and properly set up in n8n.
- "Request Timeout": Increase the timeout duration in the scrape options.
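Beyond raising the timeout, a client-side retry with backoff is a common way to ride out transient timeouts. This is a generic sketch, not part of the node itself: `do_request` stands in for whatever HTTP call you make, and `TimeoutError` stands in for your HTTP client's timeout exception.

```python
import time


def request_with_retries(do_request, attempts=3, backoff=2.0):
    """Call `do_request`, retrying on timeouts with linear backoff.

    Re-raises the timeout if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return do_request()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```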