Overview
This node submits web crawling jobs to the FireCrawl API. Users can specify a URL to crawl, limit the number of results, exclude certain paths from the crawl, and configure how data is scraped from the crawled pages. The node is useful for scenarios such as:
- Automatically gathering structured or unstructured data from websites.
- Monitoring website content changes by scheduling repeated crawls.
- Extracting specific information using custom scraping prompts or schemas.
- Integrating crawl results into workflows for further processing or analysis.
For example, a user might submit a crawl job to collect product details from an e-commerce site, excluding irrelevant sections like user reviews, and receive the scraped data in Markdown format via webhook notifications.
Properties
| Name | Meaning |
|---|---|
| Url | The URL at which the crawl starts. |
| Limit | Maximum number of crawl results (pages) to return. Must be at least 1. |
| Exclude Paths | List of URL paths to exclude from the crawl process. Multiple paths can be specified to prevent crawling unwanted sections of the site. |
| Allow Backward Links | Boolean flag indicating whether to allow crawling pages that are not direct descendants of the initial URL (i.e., links going "backwards" in the site structure). |
| Webhook | URL to which webhook events will be sent during the crawl process, enabling real-time updates or integration with other systems. |
| Scrape Options | Configuration for how data is scraped from crawled pages. Includes Formats, the output formats for scraped data (Markdown, HTML, Extract), and Extract, the extraction settings (schema, system prompt, and prompt text). |
| Use Custom Body | Whether to send a fully custom JSON body instead of using the standard properties. |
| Custom Body | A JSON object representing the entire request body to send when Use Custom Body is enabled. This allows full customization of the crawl job submission payload. |
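As a rough illustration of how the properties in the table could map onto a request payload, the sketch below assembles a crawl body and honors the Use Custom Body switch. The interface, the function name `buildCrawlBody`, the camelCase field names, and the lowercase format strings are assumptions made for this example; they are not confirmed by this document, so check them against the node source and the FireCrawl API reference before relying on them.

```typescript
// Hypothetical parameter shape; field names are assumed from the property table.
interface ScrapeOptions {
  formats?: string[]; // e.g. "markdown", "html", "extract" (assumed spellings)
  extract?: {
    schema?: Record<string, unknown>;
    systemPrompt?: string;
    prompt?: string;
  };
}

interface CrawlNodeParams {
  url: string;
  limit?: number;
  excludePaths?: string[];
  allowBackwardLinks?: boolean;
  webhook?: string;
  scrapeOptions?: ScrapeOptions;
  useCustomBody?: boolean;
  customBody?: Record<string, unknown>;
}

// Build the JSON body for a crawl submission.
// When useCustomBody is set, the custom body replaces every standard property.
function buildCrawlBody(params: CrawlNodeParams): Record<string, unknown> {
  if (params.useCustomBody) {
    return params.customBody ?? {};
  }
  return {
    url: params.url,
    // Optional properties are only included when the user actually set them.
    ...(params.limit !== undefined ? { limit: params.limit } : {}),
    ...(params.excludePaths && params.excludePaths.length > 0
      ? { excludePaths: params.excludePaths }
      : {}),
    ...(params.allowBackwardLinks !== undefined
      ? { allowBackwardLinks: params.allowBackwardLinks }
      : {}),
    ...(params.webhook ? { webhook: params.webhook } : {}),
    ...(params.scrapeOptions ? { scrapeOptions: params.scrapeOptions } : {}),
  };
}

// Usage: crawl a shop, skip review pages, ask for Markdown, notify a webhook.
const exampleBody = buildCrawlBody({
  url: "https://shop.example.com",
  limit: 10,
  excludePaths: ["reviews/*"],
  webhook: "https://hooks.example.com/crawl",
  scrapeOptions: { formats: ["markdown"] },
});
```

Leaving unset properties out of the payload, rather than sending `null` or empty values, keeps the request minimal and lets the API apply its own defaults.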
Output
The node outputs JSON data representing the response from the FireCrawl API after submitting the crawl job. This typically includes information about the submitted job such as job ID, status, and any metadata returned by the API.
If the crawl job triggers webhook events, those events are sent asynchronously to the specified webhook URL and are not part of the immediate node output.
The node does not output binary data.
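A downstream step will often want just the job ID and status from that JSON. The following is a minimal sketch of such a helper; the response field names `id` and `status`, along with the names `CrawlJobSummary` and `summarizeCrawlResponse`, are hypothetical and should be adjusted to whatever the API actually returns.

```typescript
// Hypothetical summary shape for a crawl submission response.
interface CrawlJobSummary {
  jobId: string | null;
  status: string | null;
  metadata: Record<string, unknown>; // every other field, passed through untouched
}

// Pull the job ID and status out of the response; keep the rest as metadata.
// The keys "id" and "status" are assumptions, not confirmed field names.
function summarizeCrawlResponse(response: unknown): CrawlJobSummary {
  if (typeof response !== "object" || response === null || Array.isArray(response)) {
    throw new Error("Expected a JSON object from the crawl submission");
  }
  const { id, status, ...metadata } = response as Record<string, unknown>;
  return {
    jobId: typeof id === "string" ? id : null,
    status: typeof status === "string" ? status : null,
    metadata,
  };
}
```

Returning `null` for a missing field, instead of throwing, lets a workflow branch on incomplete responses without failing the whole run.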
Dependencies
- An API key credential for authenticating with the FireCrawl API.
- The base URL of the FireCrawl API, configured in the credentials.
- Network access to the FireCrawl service endpoint.
- A publicly reachable webhook URL, if webhook events are used.
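To show how the two credential values might come together, here is a sketch that combines the base URL and API key into a request description. The `Bearer` authorization scheme and the `/v1/crawl` path are assumptions made for illustration, as are the names `FireCrawlCredentials` and `buildCrawlRequest`; confirm the real values in the FireCrawl API reference and the node's credential definition.

```typescript
// Hypothetical credential shape: the two values the dependencies list mentions.
interface FireCrawlCredentials {
  apiKey: string;
  baseUrl: string;
}

interface HttpRequestDescription {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Describe the HTTP request for a crawl submission without sending it.
// The default path and the Bearer scheme are assumptions, not confirmed here.
function buildCrawlRequest(
  credentials: FireCrawlCredentials,
  body: Record<string, unknown>,
  path: string = "/v1/crawl",
): HttpRequestDescription {
  // Strip trailing slashes so "https://host/" and "https://host" behave alike.
  const base = credentials.baseUrl.replace(/\/+$/, "");
  return {
    method: "POST",
    url: `${base}${path}`,
    headers: {
      Authorization: `Bearer ${credentials.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

Building a plain description first, and sending it in a separate step, makes the credential handling easy to inspect and test without network access.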
Troubleshooting
- Invalid URL or unreachable target: Ensure the URL provided is valid and accessible from the network where n8n runs.
- Limit value too low or missing: The limit must be at least 1; otherwise, the API may reject the request.
- Incorrect webhook URL: If webhook events are not received, verify the webhook URL is correct and publicly reachable.
- Malformed custom body JSON: When using a custom body, ensure the JSON syntax is valid to avoid request errors.
- Authentication errors: Confirm that the API key credential is correctly configured and has necessary permissions.
- API rate limits or quota exceeded: Check FireCrawl account limits if requests start failing unexpectedly.
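Several of these problems can be caught before the request is sent. The sketch below runs the syntactic checks only (URL form, limit, webhook URL form, and custom body JSON); it cannot tell whether a URL is actually reachable or whether the API key is valid. The names `CrawlInput` and `validateCrawlInput` are hypothetical, and the URL pattern is a deliberately loose assumption, not a full URL parser.

```typescript
// Hypothetical input shape for a pre-flight check.
interface CrawlInput {
  url: string;
  limit?: number;
  webhook?: string;
  customBodyJson?: string; // raw JSON text, as entered for Custom Body
}

// Loose check for an absolute http(s) URL with a non-empty host.
const HTTP_URL_PATTERN = /^https?:\/\/[^\s/?#]+/i;

// Return a list of human-readable problems; an empty list means no issue found.
function validateCrawlInput(input: CrawlInput): string[] {
  const problems: string[] = [];
  if (!HTTP_URL_PATTERN.test(input.url)) {
    problems.push("url must be an absolute http(s) URL");
  }
  if (input.limit === undefined || !Number.isInteger(input.limit) || input.limit < 1) {
    problems.push("limit must be an integer of at least 1");
  }
  if (input.webhook !== undefined && !HTTP_URL_PATTERN.test(input.webhook)) {
    problems.push("webhook must be an absolute http(s) URL");
  }
  if (input.customBodyJson !== undefined) {
    try {
      JSON.parse(input.customBodyJson);
    } catch {
      problems.push("custom body is not valid JSON");
    }
  }
  return problems;
}
```

Collecting every problem in one pass, rather than stopping at the first, gives the user a complete list to fix before retrying.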