Firecrawl

Firecrawl is an LLM-friendly web crawling system.

Overview

The "Firecrawl" node connects to Firecrawl, an LLM-friendly web crawling system. For the resource V1 and the operation 查询批量获取任务状态 (Query batch scrape task status), Batch/Scrape/{ID}, the node looks up the status of a batch scrape task by its task ID. This is useful when you have submitted a batch scraping job and want to monitor its progress or retrieve its current state.

Practical examples include:

  • Monitoring the progress of a batch scraping job that collects data from multiple web pages.
  • Checking if a previously started scraping task has completed or encountered errors.
  • Integrating task status checks into automated workflows to trigger subsequent actions based on completion.
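
The third scenario, waiting for a task to finish before triggering the next step, can be sketched as a small polling loop. This is a minimal illustration, not the node's implementation: `wait_for_task`, the injected `fetch_status` callable, and the terminal state names `completed` and `failed` are assumptions that do not appear in the node description.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; the node description does not list the actual values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until the task reaches a terminal state.

    fetch_status is a hypothetical callable that performs the status query
    and returns the parsed JSON response as a dict.
    """
    for attempt in range(1, max_attempts + 1):
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        # Pause between checks, but not after the final attempt.
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"Task {task_id} not finished after {max_attempts} checks")
```

Passing the status query in as a callable keeps the loop independent of any HTTP client, so the same logic can wrap whatever performs the actual request.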

Properties

Name | Meaning
任务ID (Task ID) | The unique identifier of the scraping task whose status is being queried. Required string.
返回格式 (Return formats) | Applies only to the 搜索网页 (Search web pages) operation. Specifies the formats in which scraped data is returned. Options: Extract, HTML, Links, Markdown, 原始HTML (Raw HTML), 整个页面截图 (Full page screenshot), 网页截图 (Webpage screenshot). Multiple selections are allowed.

Note: For the selected operation 查询批量获取任务状态 (Query batch scrape task status), Batch/Scrape/{ID}, only the 任务ID (Task ID) property is relevant, and it is required.

Output

The node outputs JSON describing the status of the requested batch scrape task. The exact structure is whatever the Firecrawl API returns; it typically includes the task's progress, its success or failure state, and possibly partial results or error messages.

If the operation involves returning scraped content (not the case here), the output may also contain binary data representing screenshots or other media.
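
As an illustration of consuming this output downstream, the sketch below condenses a status payload into one line of text. The field names `status`, `completed`, and `total` are assumptions about the response shape, not something the node description guarantees, so the function falls back gracefully when they are absent.

```python
from typing import Any, Dict


def summarize_status(payload: Dict[str, Any]) -> str:
    """Return a one-line summary of a status payload.

    The keys "status", "completed", and "total" are assumed field names.
    """
    status = payload.get("status", "unknown")
    completed = payload.get("completed")
    total = payload.get("total")
    if isinstance(completed, int) and isinstance(total, int) and total > 0:
        percent = 100 * completed // total  # integer percentage, rounded down
        return f"{status}: {completed}/{total} pages ({percent}%)"
    return f"{status}: progress not reported"


print(summarize_status({"status": "scraping", "completed": 3, "total": 12}))
# prints "scraping: 3/12 pages (25%)"
```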

Dependencies

  • Requires an API key credential for authenticating with the Firecrawl service.
  • The base URL for API requests is configured via credentials.
  • The node uses HTTP requests to communicate with the Firecrawl API endpoints.
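
To make these dependencies concrete, here is a rough sketch of how a status request could be assembled from the credential values. The path `/v1/batch/scrape/{id}` is inferred from the resource and operation names, and the Bearer authorization header is an assumption about how the API key is sent; `build_status_request` itself is a hypothetical helper, not part of the node.

```python
from urllib.parse import quote
from urllib.request import Request


def build_status_request(base_url: str, api_key: str, task_id: str) -> Request:
    """Build (but do not send) a GET request for a batch scrape task's status."""
    cleaned_id = task_id.strip()
    if not cleaned_id:
        raise ValueError("task_id must be a non-empty string")
    # Assumed path layout; percent-encode the ID so it cannot alter the path.
    url = f"{base_url.rstrip('/')}/v1/batch/scrape/{quote(cleaned_id, safe='')}"
    # Assumed auth scheme: the API key sent as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")
```

The request object can then be passed to `urllib.request.urlopen` or translated to another HTTP client; building it separately makes the URL and headers easy to inspect when debugging credentials.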

Troubleshooting

  • Common issues:

    • An invalid or missing task ID results in an error or an empty response.
    • Incorrect API credentials or an incorrect base URL cause authentication or network failures.
    • A "task not found" response means the provided task ID does not exist or has expired.
  • Error messages:

    • Authentication errors indicate invalid or missing API keys; verify credentials.
    • HTTP errors (e.g., 404) suggest the task ID is incorrect or the resource is unavailable.
    • Timeout or network errors require checking connectivity and API endpoint accessibility.
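
The checks above can be condensed into a small lookup that turns an HTTP status code into a troubleshooting hint. This is only a sketch: mapping authentication errors to 401 and 403 follows general HTTP conventions rather than anything stated about the Firecrawl API, and `troubleshooting_hint` is a hypothetical helper.

```python
from typing import Optional


def troubleshooting_hint(status_code: Optional[int]) -> str:
    """Map an HTTP status code (None if no response arrived) to a hint."""
    if status_code is None:
        return "No response: check connectivity and the configured base URL."
    if status_code in (401, 403):  # assumed codes for authentication errors
        return "Authentication failed: verify the API key credential."
    if status_code == 404:
        return "Task not found: the task ID may be incorrect or expired."
    if 200 <= status_code < 300:
        return "Request succeeded."
    return f"Unexpected HTTP {status_code}: the resource may be unavailable."
```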

Links and References

  • Firecrawl official documentation (if available) for API details and usage guidelines.
  • n8n documentation on creating and using custom nodes with API integrations.
