Firecrawl

Firecrawl is an LLM-friendly web crawler system.

Overview

The Firecrawl node provides web scraping and search capabilities backed by Firecrawl, an LLM-friendly web crawler system. The V1 resource's "搜索网页" (Search Webpage) operation runs a web search from a keyword plus optional search parameters. It is useful when you want to gather search results programmatically, extract content in different formats, or capture webpage screenshots.

Practical examples include:

  • Automating market research by searching for product reviews or news articles.
  • Gathering data for SEO analysis by extracting links or HTML snippets from search results.
  • Archiving webpages by capturing full-page screenshots or raw HTML content.

Properties

Name: Meaning
搜索关键字 (Search Keyword): The search keyword or query string to look up on the web.
搜索页面最大数量 (Max Search Pages): Maximum number of search result pages to retrieve.
返回格式 (Return Formats): Output formats to return. Options: Extract (parsed content), HTML, Links, Markdown, 原始HTML (raw HTML), 整个页面截图 (full-page screenshot), 网页截图 (webpage screenshot).
选项 (Options): Additional optional parameters:
  • TBS: Time-based search parameter that filters results by time.
  • 语言代码 (Language Code): Language code for the search, e.g., "zh" for Chinese, "en" for English.
  • 国家代码 (Country Code): Country code used to localize search results, e.g., "cn" for China, "us" for the United States.
  • 搜索位置 (Search Location): Location parameter that influences search result relevance.
  • 超时时间 (Timeout): Timeout for the search request, in milliseconds.
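
As a rough illustration of how these properties might translate into a search request body, here is a minimal Python sketch. The helper `build_search_payload` is hypothetical, and the field names (`query`, `limit`, `scrapeOptions.formats`, `tbs`, `lang`, `country`, `location`, `timeout`) are assumed to mirror the Firecrawl v1 search API; the node's actual internal mapping is not documented here and may differ.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    limit: int = 5,
    formats: Optional[list[str]] = None,
    options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a search request body from node-style properties.

    The mapping from node properties to field names is an assumption.
    """
    if not query.strip():
        raise ValueError("search keyword must not be empty")

    payload: dict[str, Any] = {"query": query, "limit": limit}

    if formats:
        # Assumed: requested output formats are nested under scrapeOptions.
        payload["scrapeOptions"] = {"formats": list(formats)}

    # Copy only the optional parameters that were actually set.
    for key in ("tbs", "lang", "country", "location", "timeout"):
        value = (options or {}).get(key)
        if value is not None:
            payload[key] = value

    return payload


payload = build_search_payload(
    "firecrawl review",
    limit=3,
    formats=["markdown", "links"],
    options={"lang": "en", "country": "us", "timeout": 60000},
)
```

Leaving unset options out of the payload, rather than sending empty values, lets the service apply its own defaults.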

Output

The node outputs JSON data containing the search results according to the requested formats. The structure varies depending on the selected output formats but generally includes:

  • Extracted textual content parsed from the search results.
  • HTML snippets or raw HTML of the pages.
  • Lists of links found within the search results.
  • Markdown-formatted content.
  • Base64-encoded images representing screenshots of the entire page or visible webpage area.

If screenshot formats are selected, binary data representing the image(s) will be included accordingly.
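
To make the format-dependent output more concrete, the sketch below inspects one result item and decodes a Base64 screenshot into image bytes. The key names in `FORMAT_KEYS`, the `sample` item, and both helper functions are assumptions for illustration; check the node's real output to see which keys it emits.

```python
import base64
from typing import Any

# Assumed key names for one result item; the node's real output keys may differ.
FORMAT_KEYS = ("markdown", "html", "rawHtml", "links", "extract", "screenshot")


def present_formats(item: dict[str, Any]) -> list[str]:
    """Return the format keys that carry non-empty data in a result item."""
    return [key for key in FORMAT_KEYS if item.get(key)]


def decode_screenshot(item: dict[str, Any]) -> bytes:
    """Decode a Base64 screenshot string into raw image bytes."""
    data = item.get("screenshot")
    if not isinstance(data, str) or not data:
        raise KeyError("item has no screenshot data")
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    if data.startswith("data:"):
        data = data.split(",", 1)[1]
    return base64.b64decode(data)


# Hypothetical result item, built by hand for this example.
sample = {
    "markdown": "# Title",
    "links": ["https://example.com"],
    "screenshot": base64.b64encode(b"\x89PNG").decode("ascii"),
}

formats_found = present_formats(sample)
image_bytes = decode_screenshot(sample)
```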

Dependencies

  • Requires an API key credential for the Firecrawl service.
  • Needs configuration of the base URL for the Firecrawl API endpoint.
  • Network access to the Firecrawl API service.
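
The following sketch shows how the API key and base URL might combine into an authenticated request, using only the Python standard library. The `/v1/search` path, the Bearer-token header, the example base URL, and the placeholder key are assumptions about the Firecrawl API rather than details taken from this node; a self-hosted instance would use its own base URL.

```python
import json
import urllib.request
from typing import Any


def build_search_request(
    base_url: str, api_key: str, payload: dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request against an assumed search endpoint."""
    # Tolerate a trailing slash in the configured base URL.
    url = base_url.rstrip("/") + "/v1/search"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(
    "https://api.firecrawl.dev/",  # example base URL; adjust for your deployment
    "fc-YOUR-API-KEY",             # placeholder, not a real key
    {"query": "example"},
)
```

Passing the resulting object to `urllib.request.urlopen(request, timeout=60)` would send it; that step is left out here so the example runs without network access.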

Troubleshooting

  • Timeouts: If searches take too long or fail, consider increasing the "超时时间" (timeout) property.
  • Empty Results: Ensure the search keyword is valid and that language/country/location parameters match expected values.
  • Invalid Format Errors: Verify that the selected output formats are supported and correctly spelled.
  • Authentication Failures: Confirm that the API key credential is correctly configured and has necessary permissions.
  • Network Issues: Check connectivity to the Firecrawl API base URL.
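
The timeout advice above can be sketched as a retry loop that raises the timeout after each failed attempt. This is one possible approach, not something the node does on its own; `search_with_growing_timeout` and `fake_search` are hypothetical names, and `run_search` stands for any callable that accepts a timeout in milliseconds and raises `TimeoutError` when the request takes too long.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def search_with_growing_timeout(
    run_search: Callable[[int], T],
    timeout_ms: int = 30000,
    attempts: int = 3,
    factor: int = 2,
) -> T:
    """Call run_search(timeout_ms), multiplying the timeout after each TimeoutError."""
    last_error: Optional[TimeoutError] = None
    for _ in range(attempts):
        try:
            return run_search(timeout_ms)
        except TimeoutError as error:
            last_error = error
            timeout_ms *= factor  # give the next attempt more time
    raise TimeoutError(
        f"search still timed out after {attempts} attempts"
    ) from last_error


# Stand-in for a real search call: succeeds only with a generous timeout.
seen_timeouts: list[int] = []


def fake_search(timeout_ms: int) -> dict[str, bool]:
    seen_timeouts.append(timeout_ms)
    if timeout_ms < 100000:
        raise TimeoutError("too slow")
    return {"success": True}


result = search_with_growing_timeout(fake_search)
```

Capping the number of attempts keeps a persistently slow query from retrying indefinitely.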

Links and References

  • Firecrawl official documentation (if available)
  • Web scraping best practices and legal considerations
  • Language and country codes reference lists (ISO standards)
