FireCrawl

FireCrawl API


Overview

This node integrates with the FireCrawl API to perform web crawling with real-time monitoring via WebSocket. Specifically, the "Crawl Url With Websocket Monitoring" operation starts a crawl on a specified URL and delivers live updates over a WebSocket connection. This is useful when you want to scrape or analyze a website's content and follow the crawl's progress and results as they arrive, rather than waiting for the whole crawl to finish.

Practical examples include:

- Extracting structured data from a website while watching the crawl progress.
- Monitoring large-scale crawls in real time to react immediately to findings.
- Filtering out unwanted paths during crawling to focus on relevant content.
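As a rough illustration of path filtering, the sketch below checks candidate URLs against an exclude list. It assumes that each Exclude Paths entry is a regular expression tested against the URL's path, an assumption modeled on Firecrawl's v1 crawl API; confirm this for your API version. `isExcluded` and `filterCrawlTargets` are hypothetical helpers, not functions exported by the node.

```typescript
// Hypothetical helper: true if the URL's path matches any exclude pattern.
// Assumption: each exclude entry is a regular expression tested against the pathname.
function isExcluded(rawUrl: string, excludePaths: string[]): boolean {
  const pathname = new URL(rawUrl).pathname;
  return excludePaths.some((pattern) => new RegExp(pattern).test(pathname));
}

// Keep only the URLs that no exclude pattern matches.
function filterCrawlTargets(urls: string[], excludePaths: string[]): string[] {
  return urls.filter((u) => !isExcluded(u, excludePaths));
}

const candidates = [
  "https://example.com/docs/intro",
  "https://example.com/blog/launch-week",
  "https://example.com/tag/news",
];
console.log(filterCrawlTargets(candidates, ["blog/.*", "^/tag/"]));
// prints [ 'https://example.com/docs/intro' ]
```

Because these patterns are unanchored, an entry such as `blog` would also exclude `/myblog/post`; anchoring with `^` is one way to keep a filter from removing more pages than intended.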

Properties

Name	Meaning
Url	The URL to start crawling from.
Exclude Paths	A list of URL paths to exclude from the crawl, allowing selective crawling by ignoring certain subpaths.
Limit	Maximum number of crawl results to return.
Scrape Options	Options controlling the scraping output format and extraction details:
	- Formats: Output formats such as Markdown, HTML, or raw extracted data.
	- Extract: Structured data extraction settings including schema, system prompt, and extraction prompt.
Use Custom Body	Boolean flag; when enabled, the node sends Custom Body as the request body instead of building one from the standard parameters.
Custom Body	JSON object sent as the full request body when Use Custom Body is enabled, overriding all other parameters.
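The sketch below shows one way the properties above could map onto a request body, including how Use Custom Body replaces the standard parameters. The JSON field names (`url`, `excludePaths`, `limit`, `scrapeOptions`, and `extract` with `schema`, `systemPrompt`, `prompt`) are assumptions modeled on Firecrawl's v1 crawl endpoint, and `CrawlNodeParams` and `buildCrawlBody` are hypothetical names, not the node's actual source.

```typescript
// Hypothetical shape of the node's parameters (names mirror the Properties table).
interface ExtractOptions {
  schema?: Record<string, unknown>;
  systemPrompt?: string;
  prompt?: string;
}

interface CrawlNodeParams {
  url: string;
  excludePaths?: string[];
  limit?: number;
  scrapeOptions?: {
    formats?: string[];
    extract?: ExtractOptions;
  };
  useCustomBody?: boolean;
  customBody?: Record<string, unknown>;
}

// Build the JSON body for the crawl request.
// Assumption: field names follow Firecrawl's v1 crawl endpoint.
function buildCrawlBody(params: CrawlNodeParams): Record<string, unknown> {
  // A custom body replaces everything else, mirroring "Use Custom Body".
  if (params.useCustomBody) {
    if (!params.customBody) {
      throw new Error("Use Custom Body is enabled but Custom Body is empty");
    }
    return params.customBody;
  }
  const body: Record<string, unknown> = { url: params.url };
  // Only send optional fields the user actually set.
  if (params.excludePaths && params.excludePaths.length > 0) {
    body.excludePaths = params.excludePaths;
  }
  if (params.limit !== undefined) {
    body.limit = params.limit;
  }
  if (params.scrapeOptions) {
    body.scrapeOptions = params.scrapeOptions;
  }
  return body;
}

const body = buildCrawlBody({
  url: "https://example.com",
  excludePaths: ["blog/.*"],
  limit: 10,
  scrapeOptions: {
    formats: ["markdown", "extract"],
    extract: { prompt: "List the product names on each page." },
  },
});
console.log(JSON.stringify(body));
```

Omitting unset optional fields keeps the request minimal, so the API's own defaults apply instead of values the user never chose.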

Output

The node outputs JSON data containing the results of the crawl and scraping operation. The results typically include the scraped content in the formats selected under Scrape Options (Markdown, HTML, or extracted data). The results are textual; the node is not designed to return binary data such as downloaded files or media.
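To illustrate how live WebSocket updates could become output, this sketch folds a stream of monitoring messages into n8n-style items (`{ json: ... }`), one item per scraped page. The message types (`document`, `done`, `error`) and document fields are hypothetical, loosely modeled on Firecrawl's crawl-watch messages; the node's real message format may differ.

```typescript
// Hypothetical document and message shapes; assumptions, not a documented contract.
interface CrawlDocument {
  markdown?: string;
  html?: string;
  extract?: unknown;
  metadata?: { sourceURL?: string };
}

type MonitorMessage =
  | { type: "document"; data: CrawlDocument }
  | { type: "done" }
  | { type: "error"; error: string };

// n8n passes data between nodes as items shaped like { json: ... }.
interface OutputItem {
  json: CrawlDocument;
}

interface MonitorState {
  items: OutputItem[];
  finished: boolean;
  error?: string;
}

// Pure reducer: apply one message to the accumulated state without mutating it.
function applyMessage(state: MonitorState, msg: MonitorMessage): MonitorState {
  if (msg.type === "document") {
    // Each scraped page becomes one output item, in arrival order.
    return { ...state, items: [...state.items, { json: msg.data }] };
  }
  if (msg.type === "done") {
    return { ...state, finished: true };
  }
  // Remaining case: "error". Stop, but keep the items gathered so far.
  return { ...state, finished: true, error: msg.error };
}

const initial: MonitorState = { items: [], finished: false };
const messages: MonitorMessage[] = [
  { type: "document", data: { markdown: "# Home", metadata: { sourceURL: "https://example.com" } } },
  { type: "document", data: { markdown: "# About", metadata: { sourceURL: "https://example.com/about" } } },
  { type: "done" },
];
const finalState = messages.reduce(applyMessage, initial);
console.log(finalState.items.length, finalState.finished); // prints "2 true"
```

Keeping the reducer pure makes it easy to test without a live socket; a real implementation would call it from the WebSocket's message handler.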

Dependencies

- Requires an active FireCrawl API credential in n8n, configured with a base URL and an authentication token.
- The node sends HTTP requests to the FireCrawl API endpoint and opens a WebSocket connection for monitoring.
- There are no external dependencies beyond the FireCrawl service and its API.
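Because the node talks to the same service over both HTTP and WebSocket, the monitoring URL can be derived from the credential's base URL. This sketch assumes the socket is served at `<base URL>/v1/crawl/<crawl id>` over `wss://` (or `ws://` for a plain-HTTP base URL), and that the token is sent as a Bearer header on HTTP requests; both are assumptions to check against the FireCrawl API documentation, and `monitorSocketUrl` and `authHeaders` are hypothetical names.

```typescript
// Hypothetical helper: build the monitoring WebSocket URL for a crawl.
// Assumption: the socket path is <base>/v1/crawl/<crawlId> and the base URL has no version suffix.
function monitorSocketUrl(baseUrl: string, crawlId: string): string {
  const url = new URL(baseUrl);
  // Map the HTTP scheme to its WebSocket counterpart (TLS stays TLS).
  if (url.protocol === "https:") {
    url.protocol = "wss:";
  } else if (url.protocol === "http:") {
    url.protocol = "ws:";
  } else {
    throw new Error(`Unsupported base URL protocol: ${url.protocol}`);
  }
  // Trim trailing slashes so the path segments join cleanly.
  const basePath = url.pathname.replace(/\/+$/, "");
  url.pathname = `${basePath}/v1/crawl/${encodeURIComponent(crawlId)}`;
  return url.toString();
}

// Hypothetical: headers for the HTTP requests, assuming Bearer-token authentication.
function authHeaders(token: string): Record<string, string> {
  if (token.trim() === "") {
    throw new Error("FireCrawl API token is empty; check the credential");
  }
  return { Authorization: `Bearer ${token}` };
}

console.log(monitorSocketUrl("https://api.firecrawl.dev", "abc123"));
// prints wss://api.firecrawl.dev/v1/crawl/abc123
```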

Troubleshooting

Common issues:
- Invalid or missing API credentials will cause authentication failures.
- Incorrect URL format or unreachable URLs may result in crawl errors.
- Misconfigured exclude paths might lead to unexpected crawl results.
- Using a custom body with invalid JSON syntax can cause request failures.
Error messages:
- Authentication errors: Check that the API key/token is correctly set up.
- Network errors: Verify the target URL is accessible and the FireCrawl API base URL is correct.
- Validation errors: Ensure all required properties are provided and properly formatted.
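A pre-flight check can catch the invalid-JSON and missing-property errors listed above before any request is sent. This sketch parses the Custom Body text, requires a JSON object, and requires a non-empty string `url` field; that last rule is an assumption based on the Url property, and `parseCustomBody` is a hypothetical helper.

```typescript
// Hypothetical pre-flight validation for the Custom Body field.
function parseCustomBody(raw: string): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Surface the parser's message so the user can locate the syntax error.
    throw new Error(`Custom Body is not valid JSON: ${(err as Error).message}`);
  }
  // Reject null, arrays, and primitives: the request body must be an object.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("Custom Body must be a JSON object");
  }
  const body = parsed as Record<string, unknown>;
  // Assumption: the crawl request needs a starting URL.
  const url = body.url;
  if (typeof url !== "string" || url.length === 0) {
    throw new Error('Custom Body is missing a non-empty "url" string');
  }
  return body;
}

try {
  // The trailing comma makes this invalid JSON.
  parseCustomBody('{ "url": "https://example.com", "limit": 5, }');
} catch (e) {
  console.log((e as Error).message);
}
```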

Links and References

FireCrawl API documentation
n8n documentation on Creating Custom Nodes
