Overview
The Firecrawl node in n8n calls the Firecrawl API to map a website and return the URLs it discovers. This is useful for web scraping, SEO analysis, and data collection. Common scenarios include gathering links for content aggregation, monitoring website changes, and extracting data for research. For example, a user might collect all URLs from a competitor's site to analyze its structure or identify potential backlinks.
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the website to be crawled (default: http://localhost:3002). |
| Ignore Sitemap | A boolean indicating whether to ignore the website's sitemap when crawling (default: true). |
| Sitemap Only | A boolean that specifies if only links found in the website's sitemap should be returned (default: false). |
| Include Subdomains | A boolean that determines whether to include subdomains of the specified website (default: false). |
| Limit | The maximum number of results to return, with a minimum of 1 and a maximum of 5000 (default: 5000). |
| Timeout (Ms) | The timeout duration in milliseconds for the request (default: 10000 ms). |
| Additional Fields | A collection of additional fields to send in the request body, including custom JSON properties. |
| Use Custom Body | A boolean indicating whether to use a custom body for the request (default: false). |
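The properties above map naturally onto a JSON request body. The sketch below is a hypothetical illustration, not the node's actual source. It assumes the Firecrawl map endpoint accepts camelCase keys (`url`, `ignoreSitemap`, `sitemapOnly`, `includeSubdomains`, `limit`, `timeout`). It also assumes that Additional Fields are merged into the generated body and that Use Custom Body replaces the generated body entirely. The names `MapParams`, `customBody`, and `buildMapBody` are invented for illustration.

```typescript
// Hypothetical parameter shape; field names mirror the Properties table.
interface MapParams {
  url: string;
  ignoreSitemap?: boolean;
  sitemapOnly?: boolean;
  includeSubdomains?: boolean;
  limit?: number;
  timeout?: number; // milliseconds
  additionalFields?: Record<string, unknown>;
  useCustomBody?: boolean;
  customBody?: Record<string, unknown>; // hypothetical: the JSON used when Use Custom Body is on
}

// Builds the JSON body for a map request, applying the documented defaults.
// Assumption: the API accepts these camelCase keys as-is.
function buildMapBody(p: MapParams): Record<string, unknown> {
  // Assumption: with Use Custom Body enabled, the custom JSON is sent unchanged.
  if (p.useCustomBody) {
    return { ...(p.customBody ?? {}) };
  }
  const limit = p.limit ?? 5000;
  // Enforce the documented range of 1..5000.
  if (!Number.isInteger(limit) || limit < 1 || limit > 5000) {
    throw new RangeError(`limit must be an integer between 1 and 5000, got ${limit}`);
  }
  return {
    url: p.url,
    ignoreSitemap: p.ignoreSitemap ?? true,
    sitemapOnly: p.sitemapOnly ?? false,
    includeSubdomains: p.includeSubdomains ?? false,
    limit,
    timeout: p.timeout ?? 10000,
    // Additional Fields are spread last, so in this sketch they override the defaults above.
    ...(p.additionalFields ?? {}),
  };
}

// Usage: map at most 100 URLs from example.com, leaving every other option at its default.
const exampleBody = buildMapBody({ url: "https://example.com", limit: 100 });
console.log(JSON.stringify(exampleBody));
```

Merging Additional Fields last is a design choice of this sketch. It lets extra API options pass through without the node having to model each one, at the cost of letting them silently override the named properties.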
Output
The Firecrawl node typically outputs a JSON array of the URLs retrieved from the specified website, with one entry per unique URL found while mapping the site. Any binary output would generally represent files or resources associated with the collected URLs.
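Turning an API response into the URL list described above can be sketched as follows. This sketch assumes the map endpoint responds with `{ success: boolean, links: string[] }`. It also makes a hypothetical design choice: emitting one n8n-style item (`{ json: { url } }`) per unique URL. The real node may instead return a single item that holds the whole array. `MapResponse` and `toItems` are illustrative names.

```typescript
// Assumed response shape of the map endpoint.
interface MapResponse {
  success: boolean;
  links?: string[];
}

// n8n items carry their data under a `json` key.
type UrlItem = { json: { url: string } };

// Converts a map response into one item per unique URL, preserving first-seen order.
function toItems(res: MapResponse): UrlItem[] {
  if (!res.success) {
    throw new Error("Map request was not successful");
  }
  const seen = new Set<string>();
  const items: UrlItem[] = [];
  for (const link of res.links ?? []) {
    // Skip URLs already emitted so each entry is unique.
    if (!seen.has(link)) {
      seen.add(link);
      items.push({ json: { url: link } });
    }
  }
  return items;
}

// Usage: the duplicate "https://example.com/" is emitted only once.
const exampleItems = toItems({
  success: true,
  links: ["https://example.com/", "https://example.com/about", "https://example.com/"],
});
console.log(exampleItems.length);
```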
Dependencies
- An API key credential is required to authenticate with the Firecrawl API.
- The base URL for the Firecrawl API can be configured and defaults to http://localhost:3002/v1.
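The two dependencies combine into the outgoing HTTP request. The sketch below assumes the map endpoint lives at `<base URL>/map` and is called with `POST`. It also assumes the API key is sent as a bearer token in the `Authorization` header. `FirecrawlCredentials`, `MapRequest`, and `buildMapRequest` are hypothetical names, not part of the node or of any Firecrawl SDK.

```typescript
// Hypothetical credential shape: an API key plus an optional base URL.
interface FirecrawlCredentials {
  apiKey: string;
  baseUrl?: string;
}

interface MapRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the map request from the credential and a JSON body.
function buildMapRequest(creds: FirecrawlCredentials, body: Record<string, unknown>): MapRequest {
  // Fail early instead of sending an unauthenticated request.
  if (!creds.apiKey) {
    throw new Error("Missing Firecrawl API key");
  }
  // Use the documented default and strip trailing slashes to avoid "//map".
  const base = (creds.baseUrl ?? "http://localhost:3002/v1").replace(/\/+$/, "");
  return {
    url: `${base}/map`, // assumption: endpoint path is /map under the base URL
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${creds.apiKey}`, // assumption: bearer-token auth
    },
    body: JSON.stringify(body),
  };
}
```

On Node.js 18 or later, the result could be sent with the built-in `fetch`: `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.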
Troubleshooting
Common Issues:
- Authentication fails if the API key is missing or incorrect.
- Requests may time out if the target site responds slowly or not at all.
Error Messages:
- "Authentication failed": Ensure that the correct API key is provided.
- "Request timed out": Check the URL for accessibility and consider increasing the timeout value.
- "Invalid URL format": Verify that the URL entered is correctly formatted.
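Two of these errors can be caught or explained before the user has to dig into logs. The sketch below does this with a pre-flight check on the target URL and a mapping from HTTP status codes to the hints above. It assumes the API reports authentication and timeout failures with standard status codes (401/403 and 408/504). That mapping, and the names `validateTargetUrl` and `hintForStatus`, are hypothetical.

```typescript
// Pre-flight check for the "Invalid URL format" case: only absolute http(s) URLs pass.
function validateTargetUrl(input: string): string {
  let parsed: URL;
  try {
    // The WHATWG URL constructor throws on relative or malformed input.
    parsed = new URL(input.trim());
  } catch {
    throw new Error(`Invalid URL format: "${input}"`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Invalid URL format: unsupported protocol "${parsed.protocol}"`);
  }
  // Return the normalized form, e.g. with a trailing "/" on bare hosts.
  return parsed.href;
}

// Maps an HTTP status code to a troubleshooting hint (assumed status conventions).
function hintForStatus(status: number): string {
  if (status === 401 || status === 403) {
    return "Authentication failed: ensure the correct API key is provided.";
  }
  if (status === 408 || status === 504) {
    return "Request timed out: check that the URL is reachable or increase Timeout (Ms).";
  }
  return `Unexpected status ${status}: inspect the response body for details.`;
}
```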