Overview
This node integrates with the Firecrawl API to scrape web pages and extract their content in various formats. It is useful for automating data extraction from websites, including dynamic content interaction, PDF processing, and capturing screenshots. Typical use cases include monitoring website changes, extracting structured data, archiving page snapshots, or converting PDFs to markdown.
For example, you can provide a URL to scrape its main content as markdown, capture a screenshot of the page, or extract links and JSON data based on custom schemas or prompts. The node supports advanced options like emulating mobile devices, blocking ads, waiting for page load, and interacting with page elements before scraping.
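As a rough illustration of the kind of request described above, the sketch below builds a minimal scrape body in Python. The field names (`url`, `formats`, `onlyMainContent`) and the helper `build_scrape_body` are assumptions made for illustration, not taken from the node's source; check the Firecrawl API documentation for the exact schema.

```python
import json


def build_scrape_body(url, formats=("markdown",), only_main_content=True):
    """Build a JSON-serializable body for a scrape request.

    The key names are assumed for illustration, not confirmed by this document.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    return {
        "url": url,
        "formats": list(formats),
        "onlyMainContent": only_main_content,
    }


# Ask for the main content as markdown plus the links found on the page.
body = build_scrape_body("https://example.com", formats=("markdown", "links"))
print(json.dumps(body, indent=2))
```

The helper only assembles data, so it can be checked without network access before being sent with any HTTP client.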
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the webpage to scrape. |
| Parsers | Controls PDF processing during scraping. Options: PDF (extracts PDF content as markdown, billed per page) or disabled (returns the PDF as base64 at a flat rate). |
| Scrape Options | Output formats and related settings. Formats: markdown, HTML, raw HTML, JSON, links, screenshot, summary, change tracking. Also covers modes and schema for change tracking and JSON extraction, and screenshot settings such as full page, quality, and viewport size. |
| Only Main Content | Whether to return only the main content of the page, excluding headers, navigation bars, footers, etc. |
| Include Tags | List of HTML tags to explicitly include in the output. |
| Exclude Tags | List of HTML tags to exclude from the output. |
| Headers | Custom HTTP headers to send with the request. |
| Wait For (Ms) | Milliseconds to wait after page load before fetching content, useful for dynamic pages. |
| Mobile | Emulate scraping from a mobile device. |
| Skip TLS Verification | Whether to skip TLS certificate verification when making requests. |
| Timeout (Ms) | Request timeout in milliseconds. |
| Actions | Sequence of actions to interact with the page before scraping, such as clicking elements, pressing keys, scrolling, taking screenshots, waiting, or writing text. |
| Location | Location settings for the request, including country code (e.g., US, AU, DE, JP) and preferred languages/locales. |
| Remove Base64 Images | Whether to remove base64-encoded images from the output. |
| Block Ads | Whether to block ads and cookie popups during scraping. |
| Store In Cache | Whether to store the scraped page in Firecrawl's index and cache. Disable for privacy or sensitive data concerns. |
| Proxy | Type of proxy to use for the request. Options: Basic, Stealth. |
| Additional Fields | Custom JSON properties to add to the request body when using a custom request body. |
| Use Custom Body | Whether to use a fully custom request body instead of the standard parameters. |
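
The properties above can be pictured as optional keys layered onto a base request. The following is a minimal sketch, assuming request keys such as `waitFor`, `mobile`, `location`, and `actions`, and action objects with a `type` field; these names and the helper `add_page_options` are illustrative assumptions rather than the node's confirmed implementation.

```python
def add_page_options(body, wait_for_ms=None, mobile=None,
                     country=None, languages=None, actions=None):
    """Return a copy of `body` with optional page settings added.

    Settings left as None are omitted so service defaults still apply.
    All key names here are assumed for illustration.
    """
    result = dict(body)
    if wait_for_ms is not None:
        result["waitFor"] = wait_for_ms
    if mobile is not None:
        result["mobile"] = mobile
    if country or languages:
        location = {}
        if country:
            location["country"] = country
        if languages:
            location["languages"] = list(languages)
        result["location"] = location
    if actions:
        # Copy each action so later edits do not mutate the caller's list.
        result["actions"] = [dict(action) for action in actions]
    return result


# Hypothetical interaction sequence: wait, click a button, then scroll.
actions = [
    {"type": "wait", "milliseconds": 1000},
    {"type": "click", "selector": "#load-more"},
    {"type": "scroll", "direction": "down"},
]

request_body = add_page_options(
    {"url": "https://example.com", "formats": ["markdown"]},
    wait_for_ms=500,
    mobile=True,
    country="DE",
    languages=["de-DE", "en"],
    actions=actions,
)
```

Actions are kept in list order because the node describes them as a sequence that runs before scraping.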
Output
The node outputs JSON data representing the scraped content according to the selected formats. This may include:
- Extracted main content in markdown, HTML, or raw HTML.
- Structured JSON data extracted via custom schemas or prompts.
- Lists of links found on the page.
- Change tracking information showing differences between scrapes.
- Screenshots as image data (binary).
- Summaries of the page content.
If PDF parsing is enabled, PDF content is converted to markdown; otherwise, PDFs are returned as base64-encoded files.
Binary data output (such as screenshots) is provided in the binary field of the output item.
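When a PDF comes back base64-encoded, a downstream step has to decode it before saving or inspecting it. The sketch below shows one way to do that in Python; where the encoded string sits in the output item is not specified here, so the sample value and the helper names (`decode_base64_payload`, `looks_like_pdf`) are hypothetical.

```python
import base64


def decode_base64_payload(encoded):
    """Decode base64 text, tolerating an optional data-URI prefix."""
    if encoded.startswith("data:") and "," in encoded:
        # Drop a prefix such as "data:application/pdf;base64,".
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded, validate=True)


def looks_like_pdf(data):
    """PDF files begin with the marker bytes %PDF-."""
    return data.startswith(b"%PDF-")


# Stand-in for a base64 string taken from the node output.
sample = base64.b64encode(b"%PDF-1.7 example").decode("ascii")
pdf_bytes = decode_base64_payload(sample)
print(looks_like_pdf(pdf_bytes))  # True
```

Checking the leading bytes is a cheap sanity test before writing the decoded data to disk.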
Dependencies
- Requires an API key credential for authenticating with the Firecrawl API.
- Network access to the Firecrawl API endpoint (default https://api.firecrawl.dev/v2).
- Optional proxy configuration depending on user selection.
- No other external dependencies are required.
Troubleshooting
- Timeouts: If scraping large or slow-loading pages, increase "Timeout (Ms)". Raise "Wait For (Ms)" when content appears only after the initial page load, and keep the wait shorter than the timeout.
- TLS Errors: Enable "Skip TLS Verification" if you encounter TLS/SSL certificate errors. This disables certificate checks, so use it only for sites you trust.
- Dynamic Content Not Loaded: Use "Actions" to interact with the page (e.g., click buttons, scroll) before scraping.
- Incorrect Content Extraction: Adjust "Include Tags" and "Exclude Tags" to fine-tune which parts of the page are included.
- PDF Parsing Issues: Make sure the "Parsers" option includes PDF if you want markdown extraction; otherwise, PDFs are returned base64-encoded.
- Ad Blocking Not Working: Verify "Block Ads" is enabled to reduce interference from ads and popups.
- Proxy Problems: Switch between "Basic" and "Stealth" proxy types if requests fail due to network restrictions.
Common error messages typically relate to invalid URLs, authentication failures, or request timeouts. Check credentials and network connectivity first.
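For the timeout case, one possible approach is to retry with progressively longer timeouts instead of setting a single large value. The sketch below is an illustration only: `scrape` is a hypothetical callable supplied by the caller, the timeout steps are arbitrary, and the node itself is not known to retry this way.

```python
def scrape_with_escalating_timeout(scrape, url, timeouts_ms=(30000, 60000, 120000)):
    """Call `scrape(url, timeout_ms)` with growing timeouts until one succeeds.

    Only TimeoutError triggers a retry; other errors (bad URL, failed
    authentication) propagate immediately because a longer wait cannot fix them.
    """
    last_error = None
    for timeout_ms in timeouts_ms:
        try:
            return scrape(url, timeout_ms)
        except TimeoutError as exc:
            last_error = exc
    raise TimeoutError(f"all attempts timed out for {url}") from last_error


# Stand-in scraper that only succeeds once the timeout reaches 60 seconds.
calls = []


def fake_scrape(url, timeout_ms):
    calls.append(timeout_ms)
    if timeout_ms < 60000:
        raise TimeoutError("simulated timeout")
    return {"markdown": "# Example"}


result = scrape_with_escalating_timeout(fake_scrape, "https://example.com")
print(calls)  # [30000, 60000]
```

Passing the scraper in as a function keeps the retry logic testable without network access.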
Links and References
- Firecrawl API Documentation: https://firecrawl.dev/docs/api
- MDN Web Docs on Accept-Language header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language
- n8n Documentation on HTTP Request Node (for understanding headers and proxies): https://docs.n8n.io/nodes/n8n-nodes-base.httpRequest/