Firecrawl

Firecrawl is an LLM-friendly web crawling system.


Overview

The Firecrawl node integrates with the Firecrawl web crawling system, which performs web scraping and crawling tasks. For the V1 resource and the operation 查询获取整站任务状态 Crawl/{ID} (Query whole-site crawl task status), it looks up the status of a whole-site crawl task by its task ID. Use this operation to monitor or retrieve the current state of a large crawl job that collects data from an entire website.

Practical scenarios include:

Checking if a scheduled full-site crawl has completed.
Retrieving progress or results metadata about a crawling task.
Integrating crawl status checks into automated workflows to trigger subsequent processing steps once crawling finishes.
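
The last scenario can be sketched as a polling loop. This is a minimal Python sketch, not the node's implementation: `wait_for_crawl`, its parameters, and the set of terminal status strings are assumptions made for illustration, and `fetch_status` stands in for whatever call returns the task's status JSON.

```python
import time
from typing import Any, Callable, Dict

# Assumed status values that mean the crawl will not progress further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the task reaches a terminal status.

    Raises TimeoutError if max_checks polls pass without a terminal status.
    """
    for attempt in range(max_checks):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Skip the pause after the final check.
        if attempt < max_checks - 1:
            sleep(interval_s)
    raise TimeoutError(f"crawl still running after {max_checks} checks")
```

Passing the fetch function in keeps the loop independent of the HTTP client, so the next workflow step can be triggered from the returned result and the loop can be exercised without network access.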

Properties

Name	Meaning
任务ID (Task ID)	The unique identifier of the crawl task whose status you want to query. Required.
返回格式 (Return format)	Used by other operations, such as searching webpages. Specifies one or more formats in which to return scraped content. Options: Extract, HTML, Links, Markdown, 原始HTML (Raw HTML), 整个页面截图 (Full page screenshot), 网页截图 (Webpage screenshot).

Note: For the operation 查询获取整站任务状态 Crawl/{ID} (Query whole-site crawl task status), only the 任务ID (Task ID) property is relevant, and it is required.

Output

The node outputs JSON data representing the status and details of the requested whole-site crawling task. The exact structure depends on the Firecrawl API response but typically includes fields such as task progress, completion status, errors if any, and possibly summary data about the crawl.
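
To show how a downstream step might read such a response, here is a small Python sketch. The field names `status`, `total`, `completed`, `data`, and `error` are assumptions about the response shape, not something this page confirms, and `summarize_crawl_status` is a hypothetical helper; check the field names against the JSON your node actually returns.

```python
from typing import Any, Dict


def summarize_crawl_status(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a crawl-status payload to the few values a workflow branches on."""
    total = payload.get("total") or 0          # assumed: pages discovered so far
    completed = payload.get("completed") or 0  # assumed: pages already crawled
    percent = round(100 * completed / total, 1) if total else 0.0
    return {
        "status": payload.get("status", "unknown"),
        "done": payload.get("status") == "completed",
        "progress_percent": percent,
        "pages_returned": len(payload.get("data") or []),
        "error": payload.get("error"),
    }
```

Missing fields fall back to neutral values, so the helper does not raise if the response omits progress counters.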

If the node emits binary data (e.g., screenshots), it appears in the binary output field as images captured during crawling.

Dependencies

Requires an API key credential for authenticating with the Firecrawl service.
Needs the base URL of the Firecrawl API configured in the node credentials.
The node uses HTTP requests to communicate with the Firecrawl API endpoints.
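
The three dependencies come together in a single request. The Python sketch below builds that request without sending it. The `/v1/crawl/{id}` path and the `Bearer` authorization header are assumptions inferred from the operation name, and `build_status_request` is a hypothetical helper; confirm both against your Firecrawl API docs.

```python
from urllib.parse import quote
from urllib.request import Request


def build_status_request(base_url: str, api_key: str, task_id: str) -> Request:
    """Build (but do not send) a GET request for a crawl task's status."""
    task_id = task_id.strip()
    if not task_id:
        raise ValueError("task ID is required")
    # Assumed endpoint layout: <base URL>/v1/crawl/<task ID>
    url = f"{base_url.rstrip('/')}/v1/crawl/{quote(task_id, safe='')}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` with a `timeout` argument to perform the call.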

Troubleshooting

Common issues:
- An invalid or missing task ID causes the API to return an error or no data.
- Network connectivity problems or incorrect base URL configuration can lead to request failures.
- Insufficient permissions or expired API keys may result in authentication errors.
Error messages:
- Errors related to "task not found" indicate the provided task ID does not exist or is mistyped.
- HTTP 401 or 403 errors suggest authentication or authorization issues; verify API credentials.
- Timeout or connection errors require checking network access and API endpoint availability.
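
The checks above can be sketched as a small lookup that turns a failed request into a next step. This is an illustrative Python sketch: `diagnose_failure` is a hypothetical helper, and treating HTTP 404 as "task not found" is an assumption rather than documented behavior.

```python
from typing import Optional


def diagnose_failure(status_code: Optional[int], message: str = "") -> str:
    """Map a failed status request to a troubleshooting hint.

    Pass status_code=None when no HTTP response arrived (timeout, DNS, refused).
    """
    if status_code is None:
        return "Check network access, the base URL, and endpoint availability."
    if status_code in (401, 403):
        return "Verify the API key and its permissions."
    # Assumption: a missing task surfaces as 404 or a "not found" message.
    if status_code == 404 or "not found" in message.lower():
        return "Check that the task ID exists and is typed correctly."
    return f"Unexpected HTTP {status_code}; inspect the response body."
```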

Links and References

Firecrawl official documentation (refer to your Firecrawl API docs for detailed endpoint descriptions)
n8n documentation on creating and using custom nodes with API integrations
