Overview
This node integrates with the FireCrawl API to crawl websites with real-time monitoring over a WebSocket. The "Crawl Url With Websocket Monitoring" operation starts a crawl at a specified URL and delivers live updates over a WebSocket connection while the crawl runs. This is useful when you want to scrape or analyze website content and follow the crawl's progress or results as they arrive, rather than waiting for the whole crawl to finish.
Practical examples include:
- Extracting structured data from a website while watching the crawl progress.
- Monitoring large-scale crawls in real time to react immediately to findings.
- Filtering out unwanted paths during crawling to focus on relevant content.
Properties
| Name | Meaning |
|---|---|
| Url | The URL to start crawling from. |
| Exclude Paths | A list of URL paths to exclude from the crawl, allowing selective crawling by ignoring certain subpaths. |
| Limit | Maximum number of crawl results to return. |
| Scrape Options | Options controlling the scraping output format and extraction details. Formats: output formats such as Markdown, HTML, or raw extracted data. Extract: structured data extraction settings, including a schema, system prompt, and extraction prompt. |
| Use Custom Body | Boolean flag to indicate if a fully custom request body should be used instead of the standard parameters. |
| Custom Body | JSON object representing a fully custom request body to send to the API, overriding other parameters. |
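The properties above map onto a JSON request body. The sketch below shows one way to assemble it, including the rule that Use Custom Body replaces every other parameter. The field names (`excludePaths`, `limit`, `scrapeOptions`, `extract`, `systemPrompt`, `prompt`) and the `buildCrawlBody` helper are assumptions modeled on FireCrawl's crawl request and may not match what this node actually sends; check them against the FireCrawl API documentation.

```typescript
// Hypothetical parameter shapes; field names are assumptions, not the node's confirmed API.
interface ExtractOptions {
  schema?: Record<string, unknown>;
  systemPrompt?: string;
  prompt?: string;
}

interface ScrapeOptions {
  formats?: string[];
  extract?: ExtractOptions;
}

interface CrawlParams {
  url: string;
  excludePaths?: string[];
  limit?: number;
  scrapeOptions?: ScrapeOptions;
  useCustomBody?: boolean;
  customBody?: Record<string, unknown>;
}

// Builds the JSON body for the crawl request. When useCustomBody is set,
// the custom body is sent unchanged and all other parameters are ignored.
function buildCrawlBody(params: CrawlParams): Record<string, unknown> {
  if (params.useCustomBody) {
    return params.customBody ?? {};
  }
  const body: Record<string, unknown> = { url: params.url };
  // Only include optional fields that were actually set, so the API's
  // own defaults apply to everything else.
  if (params.excludePaths && params.excludePaths.length > 0) {
    body.excludePaths = params.excludePaths;
  }
  if (params.limit !== undefined) {
    body.limit = params.limit;
  }
  if (params.scrapeOptions) {
    body.scrapeOptions = params.scrapeOptions;
  }
  return body;
}

// Example: crawl example.com, skip /admin, return at most 10 results as Markdown.
const body = buildCrawlBody({
  url: "https://example.com",
  excludePaths: ["/admin"],
  limit: 10,
  scrapeOptions: { formats: ["markdown"] },
});
```

Leaving unset options out of the body, instead of sending `null` or empty values, keeps the request minimal and lets the service decide its own defaults.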
Output
The node outputs JSON data containing the results of the crawl and scraping operation. The structure typically includes the scraped content in the formats selected under Scrape Options (Markdown, HTML, or extracted data). The results are textual; the node does not produce binary data such as downloaded files or media.
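As an illustration of how crawl results could become n8n output items, the sketch below emits one item per crawled page, each carrying a `json` object as n8n items do. The `CrawlDocument` shape (`markdown`, `html`, `extract`, `metadata.sourceURL`) and the `toOutputItems` helper are assumptions; the node may instead return the whole crawl response as a single item.

```typescript
// Assumed shape of one crawled page; verify field names against the FireCrawl API docs.
interface CrawlDocument {
  markdown?: string;
  html?: string;
  extract?: Record<string, unknown>;
  metadata?: { sourceURL?: string; [key: string]: unknown };
}

// Minimal stand-in for an n8n output item: each item carries a `json` object.
interface OutputItem {
  json: Record<string, unknown>;
}

// Converts crawled pages into one output item per page. Formats that were
// not requested come back as null so every item has the same keys.
function toOutputItems(documents: CrawlDocument[]): OutputItem[] {
  return documents.map((doc) => ({
    json: {
      url: doc.metadata?.sourceURL ?? null,
      markdown: doc.markdown ?? null,
      html: doc.html ?? null,
      extract: doc.extract ?? null,
    },
  }));
}

const items = toOutputItems([
  { markdown: "# Home", metadata: { sourceURL: "https://example.com" } },
  { markdown: "# About", metadata: { sourceURL: "https://example.com/about" } },
]);
```

Giving every item the same keys makes downstream nodes (for example, a Set or Spreadsheet node) easier to configure, because fields do not appear and disappear from item to item.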
Dependencies
- Requires an active FireCrawl API credential with a base URL and authentication token configured in n8n.
- The node sends HTTP requests to the FireCrawl API endpoint and establishes a WebSocket connection for monitoring.
- No additional external dependencies beyond the FireCrawl service and its API.
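The monitoring flow above pairs an HTTP request that starts the crawl with a WebSocket that streams progress. The sketch below covers the two pieces that do not need a live connection: deriving a WebSocket URL from the configured base URL, and folding incoming messages into a running state. The `/v1/crawl/<id>` path, the message types (`document`, `done`, `error`), and the helper names are assumptions for illustration, not FireCrawl's confirmed protocol.

```typescript
// Assumed message shapes; the event names are hypothetical.
type MonitorMessage =
  | { type: "document"; data: Record<string, unknown> }
  | { type: "done" }
  | { type: "error"; error: string };

interface MonitorState {
  documents: Record<string, unknown>[];
  finished: boolean;
  error?: string;
}

// Derives a WebSocket URL from an HTTP(S) base URL: https -> wss, http -> ws.
// The "/v1/crawl/<id>" path is an assumption.
function toWebSocketUrl(baseUrl: string, crawlId: string): string {
  const wsBase = baseUrl.replace(/^http/, "ws").replace(/\/+$/, "");
  return `${wsBase}/v1/crawl/${encodeURIComponent(crawlId)}`;
}

// Applies one raw WebSocket message to the monitoring state without mutating it.
function applyMessage(state: MonitorState, raw: string): MonitorState {
  const msg = JSON.parse(raw) as MonitorMessage;
  switch (msg.type) {
    case "document":
      return { ...state, documents: [...state.documents, msg.data] };
    case "done":
      return { ...state, finished: true };
    case "error":
      return { ...state, finished: true, error: msg.error };
    default:
      // Ignore message types this sketch does not recognize.
      return state;
  }
}

// Replaying a short message sequence as if it arrived over the socket.
let state: MonitorState = { documents: [], finished: false };
for (const raw of [
  '{"type":"document","data":{"markdown":"# Home"}}',
  '{"type":"done"}',
]) {
  state = applyMessage(state, raw);
}
```

In a real node, `applyMessage` would be called from the socket's message handler (for example, `socket.on("message", ...)` with the `ws` package), and the node would resolve once `finished` becomes true.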
Troubleshooting
Common issues:
- Invalid or missing API credentials will cause authentication failures.
- Incorrect URL format or unreachable URLs may result in crawl errors.
- Misconfigured exclude paths might lead to unexpected crawl results.
- Using a custom body with invalid JSON syntax can cause request failures.
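Several of these issues can be caught before any request is sent. The sketch below checks that the start URL is an absolute http(s) URL, that exclude paths compile as regular expressions, and that a custom body parses as a JSON object. It assumes exclude paths are interpreted as regular expressions (check the FireCrawl documentation); `validateInputs` is a hypothetical helper, not part of the node.

```typescript
// Returns human-readable problems; an empty list means the inputs look usable.
// Assumes exclude paths are regular expressions, which should be confirmed in the API docs.
function validateInputs(
  url: string,
  excludePaths: string[],
  useCustomBody: boolean,
  customBodyText: string,
): string[] {
  const problems: string[] = [];
  if (useCustomBody) {
    // The custom body replaces all other parameters, so only it is checked.
    try {
      const parsed: unknown = JSON.parse(customBodyText);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        problems.push("Custom body must be a JSON object");
      }
    } catch (err) {
      problems.push(`Custom body is not valid JSON: ${(err as Error).message}`);
    }
    return problems;
  }
  try {
    const parsed = new URL(url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      problems.push(`Unsupported URL scheme: ${parsed.protocol}`);
    }
  } catch {
    problems.push(`Not a valid absolute URL: ${url}`);
  }
  for (const pattern of excludePaths) {
    try {
      new RegExp(pattern);
    } catch {
      problems.push(`Exclude path is not a valid regular expression: ${pattern}`);
    }
  }
  return problems;
}
```

For example, `validateInputs("example.com", [], false, "")` reports a missing scheme, which would otherwise surface only as a crawl error from the API.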
Error messages:
- Authentication errors: Check that the API key/token is correctly set up.
- Network errors: Verify the target URL is accessible and the FireCrawl API base URL is correct.
- Validation errors: Ensure all required properties are provided and properly formatted.
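To turn a failed request into one of the hints above, a node can classify it by HTTP status. The mapping below follows general HTTP conventions (401/403 for credentials, 400/422 for malformed input, no response at all for network problems); it is an illustrative assumption, not FireCrawl's documented error contract, and `classifyError` is a hypothetical helper.

```typescript
type ErrorCategory = "authentication" | "validation" | "network" | "other";

// status is undefined when no HTTP response was received
// (DNS failure, refused connection, timeout).
function classifyError(status: number | undefined): ErrorCategory {
  if (status === undefined) return "network";
  if (status === 401 || status === 403) return "authentication";
  if (status === 400 || status === 422) return "validation";
  return "other";
}

// Hints reuse the troubleshooting advice listed above.
const hints: Record<ErrorCategory, string> = {
  authentication: "Check that the API key/token is correctly set up.",
  network: "Verify the target URL is accessible and the FireCrawl API base URL is correct.",
  validation: "Ensure all required properties are provided and properly formatted.",
  other: "Inspect the response body for details.",
};
```

Server-side failures (5xx) fall into `other` here because they usually point to a problem on the service side rather than in the node's configuration.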
Links and References
- FireCrawl API Documentation
- n8n documentation on Creating Custom Nodes