Firecrawl

Firecrawl is an LLM-friendly web crawling system.

Overview

The "Firecrawl" node is a web crawling and scraping tool that uses an LLM-friendly system to extract data from web pages. For the resource "V1" and the operation "获取站点地图 Map" (Get Sitemap), it crawls a site starting from a given webpage URL and collects pages up to a specified limit. Use this operation when you need a sitemap or an overview of a website's structure and content.

Practical examples include:

  • Generating a sitemap for SEO analysis.
  • Collecting URLs and metadata from a website for content auditing.
  • Preparing datasets for machine learning by extracting structured information from multiple pages.

Properties

Name | Meaning
网页链接 (Webpage URL) | The URL of the page where the crawl starts.
最大页面数量 (Maximum Page Count) | The maximum number of pages to crawl and retrieve while generating the sitemap.

Note: The property "返回格式" (Return Format) does not apply to the "获取站点地图 Map" operation; the node's display options do not show it for this operation.
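
As a minimal sketch of how the two properties above might be checked before a request is sent, the function below validates a starting URL and a page limit and returns a request body. The function name `build_map_payload` and the field names `url` and `limit` are assumptions made for illustration; the node's actual parameter names are not documented here.

```python
from urllib.parse import urlparse


def build_map_payload(url: str, limit: int) -> dict:
    """Validate the two node properties and return a request body.

    "url" and "limit" are assumed field names, not confirmed by the node.
    """
    parsed = urlparse(url)
    # "网页链接" must be an absolute http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # "最大页面数量" must be a positive integer (bool is rejected explicitly).
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    return {"url": url, "limit": limit}
```

For example, `build_map_payload("https://example.com", 50)` returns a dictionary with both values, while a URL without a scheme or a limit of zero raises `ValueError` before any request is made.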

Output

The json output field contains the results of the site crawl. These typically include the URLs of the collected pages and, depending on the API response, their content or metadata. Because this operation generates a sitemap, the output is most likely a structured list or map of the site's pages, capped at the specified limit.

No binary data output is indicated for this operation.
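
The following sketch shows one way a downstream step could pull page URLs out of the output. It assumes the response is a JSON object with a `links` array of URL strings; that shape and the helper name `extract_links` are assumptions, so adjust the key to match what the node actually returns.

```python
def extract_links(response: dict, limit: int) -> list[str]:
    """Return up to `limit` unique URLs from an assumed {"links": [...]} response."""
    links = response.get("links") or []
    seen = set()
    unique = []
    for link in links:
        # Skip non-string entries and duplicates while preserving order.
        if isinstance(link, str) and link not in seen:
            seen.add(link)
            unique.append(link)
    return unique[:limit]
```

A missing or empty `links` key yields an empty list, which matches the "empty results" case described under Troubleshooting.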

Dependencies

  • Requires an API key credential for authentication with the Firecrawl service.
  • Needs the base URL of the Firecrawl API configured in the node credentials.
  • The node sends HTTP requests to the Firecrawl API endpoint corresponding to the V1 resource.
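
To illustrate how these three dependencies fit together, the sketch below builds (but does not send) an HTTP request from a base URL, an API key, and a payload. The `/v1/map` path, the Bearer authorization scheme, and the name `build_map_request` are assumptions for illustration; the node's real endpoint and header format are not stated in this document.

```python
import json
import urllib.request


def build_map_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Assemble a POST request for an assumed V1 map endpoint."""
    # Assumed path; a trailing slash on the configured base URL is tolerated.
    endpoint = base_url.rstrip("/") + "/v1/map"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the base URL and credential handling easy to inspect without network access.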

Troubleshooting

  • Common issues:

    • Invalid or missing API credentials will cause authentication failures.
    • Providing an invalid or unreachable URL in "网页链接" may result in errors or empty results.
    • Setting "最大页面数量" too high might lead to timeouts or rate limiting by the API.
  • Error messages:

    • Authentication errors indicate problems with the API key or credential setup.
    • Network errors suggest connectivity issues or incorrect base URL configuration.
    • Validation errors occur if required properties like "网页链接" or "最大页面数量" are missing or malformed.

To resolve these issues, verify the credentials, check that the URL is correct and reachable, and lower the page limit to a reasonable value.
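
The error categories above can be sketched as a small lookup from HTTP status code to a suggested fix. The status codes follow general HTTP conventions; whether the Firecrawl API returns exactly these codes is an assumption, and `diagnose_status` is a hypothetical helper.

```python
def diagnose_status(status: int) -> str:
    """Map an HTTP status code to a troubleshooting category."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "authentication"  # check the API key and credential setup
    if status in (400, 422):
        return "validation"      # check "网页链接" and "最大页面数量"
    if status == 429:
        return "rate_limit"      # lower "最大页面数量" or retry later
    if status in (408, 504):
        return "timeout"         # lower "最大页面数量"
    return "other"
```

Connection failures that never produce a status code, such as an incorrect base URL, fall outside this mapping and surface as network errors instead.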
