Overview
The node integrates with the Firecrawl API to map a website and retrieve URLs found during the crawl. It is useful for scenarios where you want to analyze the structure of a website, gather all accessible links, or perform web scraping tasks that require a comprehensive list of URLs from a target domain.
Practical examples include:
- SEO analysis by extracting all pages of a website.
- Content auditing by collecting URLs to check for broken links or outdated content.
- Competitive research by mapping competitor websites.
- Preparing datasets for further automated crawling or data extraction workflows.
Properties
| Name | Meaning |
|---|---|
| Url | The starting URL of the website to be mapped and crawled. |
| Ignore Sitemap | Whether to ignore the website's sitemap when crawling (true/false). |
| Sitemap Only | Whether to only return links found in the website sitemap (true/false). |
| Include Subdomains | Whether to include subdomains of the website in the crawl results (true/false). |
| Limit | Maximum number of URLs to return from the crawl (1 to 5000). |
| Timeout (Ms) | Timeout duration in milliseconds for the crawl request. |
| Use Custom Body | Whether to use a custom request body instead of the standard parameters (true/false). |
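For orientation, the sketch below shows roughly how these properties could map onto a request to the Firecrawl map endpoint. This is a minimal illustration, not the node's actual implementation: the endpoint path, request field names, and error handling are assumptions based on the Firecrawl v1 API.

```typescript
// Hypothetical sketch: how the node's properties plausibly map onto a
// Firecrawl v1 map request. Field names are assumptions, not the node's code.
interface MapRequest {
  url: string;                 // "Url" property
  ignoreSitemap?: boolean;     // "Ignore Sitemap"
  sitemapOnly?: boolean;       // "Sitemap Only"
  includeSubdomains?: boolean; // "Include Subdomains"
  limit?: number;              // "Limit" (1 to 5000)
  timeout?: number;            // "Timeout (Ms)"
}

async function mapWebsite(apiKey: string, body: MapRequest): Promise<unknown> {
  const response = await fetch("https://api.firecrawl.dev/v1/map", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // API key credential (see Dependencies)
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Firecrawl request failed with status ${response.status}`);
  }
  return response.json();
}
```

When "Use Custom Body" is enabled, the node presumably sends your JSON payload verbatim in place of a body assembled from the individual properties above.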
Output
The node outputs JSON data containing the URLs discovered during the crawl. The exact structure typically includes an array of URLs or link objects representing the pages found on the website according to the specified options (e.g., including/excluding sitemap links, subdomains).
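As an illustration only, a successful response is commonly shaped like the following; the exact field names depend on the Firecrawl API version and are assumptions here, not guaranteed output of this node.

```typescript
// Illustrative response shape (assumed, not authoritative).
interface MapResponse {
  success: boolean;
  links: string[]; // URLs discovered during the crawl
}

const example: MapResponse = {
  success: true,
  links: [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/post-1",
  ],
};
```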
The node does not produce binary output; it returns URL data only.
Dependencies
- Requires an API key credential for authenticating with the Firecrawl API.
- The node depends on network access to the Firecrawl service endpoint (default: https://api.firecrawl.dev/v1).
- No additional external dependencies are indicated beyond the Firecrawl API.
Troubleshooting
- Timeouts: If the crawl takes longer than the specified timeout, the node may fail or return partial results. Increase the "Timeout (Ms)" property if necessary; a generic client-side timeout pattern is sketched after this list.
- Limit Exceeded: Setting the "Limit" too high might cause performance issues or API rate limiting. Adjust the limit according to your needs and API constraints.
- Invalid URL: Providing an invalid or unreachable URL will result in errors. Ensure the URL is correct and accessible.
- Sitemap Issues: Enabling both "Ignore Sitemap" and "Sitemap Only" is contradictory (one skips the sitemap, the other restricts results to it) and may produce empty or unexpected output. Review these settings carefully.
- Authentication Errors: Missing or incorrect API credentials will prevent the node from connecting to the Firecrawl API.
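If you need to reproduce the timeout behavior outside n8n, a standard pattern is an AbortController around the request. This is a generic sketch of that technique, not the node's internal logic.

```typescript
// Generic client-side timeout pattern using AbortController.
// Illustrates the idea behind "Timeout (Ms)"; not the node's internals.
async function fetchWithTimeout(
  url: string,
  init: RequestInit,
  timeoutMs: number,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The request is aborted if it exceeds timeoutMs milliseconds.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear the timer, on success or failure
  }
}
```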
Links and References
- Firecrawl API Documentation: https://firecrawl.dev/docs
- n8n Documentation on HTTP Request Nodes: https://docs.n8n.io/nodes/n8n-nodes-base.httpRequest/
- General Web Crawling Concepts: https://en.wikipedia.org/wiki/Web_crawler