Overview
This node, named "Firecrawl," is designed to scrape and extract content from web pages using a large language model (LLM)-friendly web crawling system. Specifically, the "获取单个网页" ("Get Single Webpage") operation under the "V0" resource fetches a single webpage's content with various customizable options.
Common scenarios for this node include:
- Extracting the main textual content of an article or blog post.
- Capturing screenshots of webpages for visual records.
- Retrieving HTML or raw HTML for further processing or analysis.
- Filtering specific HTML tags to include or exclude certain parts of the page.
- Adding custom HTTP headers for authenticated or specialized requests.
- Waiting for dynamic content to load before scraping.
Practical examples:
- Automatically fetching news articles' main content without ads or sidebars.
- Taking full-page screenshots of product pages for monitoring changes.
- Collecting all links from a webpage for link analysis.
- Scraping markdown-formatted content for integration into documentation systems.
Properties
| Name | Meaning |
|---|---|
| Webpage URL (v0Url) | The URL of the webpage to fetch. This is required. |
| Only Fetch Main Content (v0OnlyMainContent) | Whether to extract only the main content of the webpage (true/false). Defaults to true. |
| Return HTML (v0IncludeHtml) | Whether to include the processed HTML content in the response (true/false). Defaults to false. |
| Return Raw HTML (v0IncludeRawHtml) | Whether to include the raw HTML source of the webpage in the response (true/false). Defaults to false. |
| Screenshot (v0Screenshot) | Whether to capture a screenshot of the visible part of the webpage (true/false). Defaults to false. |
| Full-Page Screenshot (v0FullPageScreenshot) | Whether to capture a full-page screenshot (the entire scrollable area) of the webpage (true/false). Defaults to false. |
| Wait Time (ms) (v0WaitFor) | Time in milliseconds to wait for the webpage to render before scraping. Useful for pages with dynamic content. Defaults to 0. |
| HTTP Headers (v0Headers) | Custom HTTP headers to send with the request, entered as key-value pairs. Useful for authentication or setting user-agent strings. |
| Tag Whitelist (v0OnlyIncludeTags) | List of HTML tag names to include in the extracted content. If specified, only these tags are kept. |
| Excluded Tags (v0RemoveTags) | List of HTML tag names to exclude/remove from the extracted content. |
| Timeout (ms) (v0Timeout) | Maximum time in milliseconds to wait for page rendering and scraping before the request is aborted. Defaults to 60,000 ms (60 seconds). |
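
As an illustration, the properties above map onto a Firecrawl v0 scrape request body roughly as sketched below. The field names (`pageOptions`, `onlyMainContent`, `waitFor`, and so on) follow Firecrawl's v0 schema as an assumption; verify them against the API version you are calling.

```python
# Sketch: mapping the node's properties onto a Firecrawl v0 /scrape
# request body. Field names are assumed from the v0 pageOptions schema.

def build_scrape_payload(url, only_main_content=True, include_html=False,
                         include_raw_html=False, screenshot=False,
                         full_page_screenshot=False, wait_for=0,
                         headers=None, only_include_tags=None,
                         remove_tags=None, timeout=60000):
    page_options = {
        "onlyMainContent": only_main_content,
        "includeHtml": include_html,
        "includeRawHtml": include_raw_html,
        "screenshot": screenshot,
        "fullPageScreenshot": full_page_screenshot,
        "waitFor": wait_for,
    }
    # Optional collections are only sent when the user supplies them.
    if headers:
        page_options["headers"] = headers
    if only_include_tags:
        page_options["onlyIncludeTags"] = only_include_tags
    if remove_tags:
        page_options["removeTags"] = remove_tags
    return {"url": url, "pageOptions": page_options, "timeout": timeout}

payload = build_scrape_payload("https://example.com", wait_for=2000,
                               remove_tags=["nav", "footer"])
```

Omitting empty optional fields keeps the payload minimal, which mirrors how the node only forwards properties the user has filled in.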
Output
The node outputs JSON data containing the scraped webpage content according to the selected options. The output structure may include:
- Main content text extracted from the webpage.
- Processed HTML content if requested.
- Raw HTML source if requested.
- Screenshots as binary data (image files), either visible viewport or full page.
- Lists of links or other extracted elements depending on options.
If screenshots are enabled, the node outputs binary image data representing the captured webpage screenshot(s).
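
A downstream consumer of the node's JSON output might extract the commonly used fields as sketched below. The field names (`data`, `markdown`, `html`, `rawHtml`, `screenshot`) are assumptions based on the v0 response shape; adjust them to match the output you actually observe.

```python
# Sketch: picking the commonly used pieces out of a scrape response.
# Field names are assumed from Firecrawl's v0 response shape.
import base64

def summarize_scrape_result(result):
    # v0 responses typically wrap the payload under a "data" key;
    # fall back to the top level if that wrapper is absent.
    data = result.get("data", result)
    return {
        "markdown": data.get("markdown"),
        "html": data.get("html"),
        "raw_html": data.get("rawHtml"),
        "has_screenshot": bool(data.get("screenshot")),
    }

# Hypothetical response: main content as markdown plus a base64 screenshot.
sample = {
    "data": {
        "markdown": "# Title",
        "screenshot": base64.b64encode(b"\x89PNG").decode(),
    }
}
info = summarize_scrape_result(sample)
```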
Dependencies
- Requires an API key credential for the Firecrawl service.
- The node sends requests to the Firecrawl API base URL configured in credentials.
- No other external dependencies are indicated.
Troubleshooting
- Timeouts: If the webpage takes too long to load, increase the timeout (v0Timeout) or wait time (v0WaitFor) properties.
- Empty content: Ensure the URL is correct and accessible. Check if the page requires authentication or special headers; use the HTTP Headers property accordingly.
- Screenshots not generated: Verify that the screenshot options are enabled and that the API supports capturing screenshots for the target page.
- Invalid tags in the whitelist or exclusion list: Use valid HTML tag names; incorrect tags may cause unexpected results.
- API errors: Confirm that the API key credential is valid and has sufficient permissions.
Links and References
- Firecrawl official website or API documentation (not provided in source).
- General web scraping best practices.
- n8n documentation on creating and using custom nodes.