Overview
This node integrates with the Firecrawl API to scrape web pages and extract their content in various formats. It is useful for scenarios where you need to programmatically gather data from websites, such as content aggregation, monitoring changes on web pages, or extracting structured information for further processing.
For example, you can use this node to:
- Scrape the main article content of a news website in Markdown format.
- Extract all links from a product page to analyze outbound URLs.
- Capture screenshots of dynamic web pages after interacting with elements like buttons or forms.
- Retrieve raw HTML or JSON representations of a page's content for custom parsing.
The node supports advanced options like waiting for page load, emulating mobile devices, blocking ads, and performing actions (click, scroll, write) before scraping, enabling it to handle complex, dynamic web pages.
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the web page to scrape. |
| Scrape Options | A collection of settings controlling how the scraping is performed: |
| - Formats | Output format(s) for the scraped data. Options include: HTML, JSON, Links, Markdown, Raw HTML, Screenshot. |
| - Only Main Content | Whether to return only the main content of the page, excluding headers, navigation bars, footers, etc. |
| - Include Tags | List of HTML tags to explicitly include in the output. |
| - Exclude Tags | List of HTML tags to exclude from the output. |
| - Headers | Custom HTTP headers to send with the request, specified as key-value pairs. |
| - Wait For (Ms) | Number of milliseconds to wait for the page to load before fetching content. |
| - Mobile | Whether to emulate scraping from a mobile device. |
| - Skip TLS Verification | Whether to skip TLS certificate verification when making requests. |
| - Timeout (Ms) | Request timeout in milliseconds. |
| - Actions | List of actions to interact with dynamic content before scraping. Supported action types: Click, Press, Screenshot, Scroll, Wait, Write. Each action has specific parameters like selector, text, key, direction, etc. |
| - Location | Settings for geolocation of the request, including country (ISO 3166-1 alpha-2 code) and preferred languages/locales. |
| - Remove Base64 Images | Whether to remove base64 encoded images from the output. |
| - Block Ads | Enables ad-blocking and cookie popup blocking during scraping. |
| - Proxy | Type of proxy to use for the request. Options are Basic or Stealth. |
| Use Custom Body | Whether to use a custom request body instead of the default scraping options. |
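As a rough illustration of how the properties above might translate into a request body, here is a minimal sketch. The camelCase field names, the lowercase format values, and the shape of each action object are assumptions made for this example; they are not taken from the node's source, so check them against the Firecrawl API reference before relying on them. The helper `build_scrape_body` is hypothetical.

```python
from typing import Any

# Assumed mapping from the node's display names to request-body field names.
# These camelCase keys are an assumption, not taken from the node's source.
FIELD_NAMES = {
    "Formats": "formats",
    "Only Main Content": "onlyMainContent",
    "Include Tags": "includeTags",
    "Exclude Tags": "excludeTags",
    "Headers": "headers",
    "Wait For (Ms)": "waitFor",
    "Mobile": "mobile",
    "Skip TLS Verification": "skipTlsVerification",
    "Timeout (Ms)": "timeout",
    "Actions": "actions",
    "Location": "location",
    "Remove Base64 Images": "removeBase64Images",
    "Block Ads": "blockAds",
    "Proxy": "proxy",
}


def build_scrape_body(url: str, scrape_options: dict[str, Any]) -> dict[str, Any]:
    """Translate display-name options into a request body, skipping unset ones."""
    body: dict[str, Any] = {"url": url}
    for display_name, value in scrape_options.items():
        if display_name not in FIELD_NAMES:
            raise KeyError(f"Unknown scrape option: {display_name}")
        if value is not None:
            body[FIELD_NAMES[display_name]] = value
    return body


example_body = build_scrape_body(
    "https://example.com/news/article",
    {
        "Formats": ["markdown", "links"],  # assumed lowercase format values
        "Only Main Content": True,
        "Wait For (Ms)": 2000,
        "Timeout (Ms)": 30000,
        # Actions run in order before the page is scraped (assumed object shape).
        "Actions": [
            {"type": "click", "selector": "#load-more"},
            {"type": "wait", "milliseconds": 1000},
            {"type": "scroll", "direction": "down"},
        ],
        "Location": {"country": "US", "languages": ["en-US"]},
        "Proxy": "basic",
    },
)
```

Options left unset are omitted from the body, so the API's own defaults would apply to them.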
Output
The node outputs a JSON object containing the scraped content according to the selected formats. Depending on the chosen formats, the output may include:
- HTML: The cleaned and processed HTML content of the page.
- JSON: Structured data extracted from the page.
- Links: An array of URLs found on the page.
- Markdown: The main content converted into Markdown format.
- Raw HTML: The unprocessed raw HTML source of the page.
- Screenshot: Binary data representing a screenshot image of the page (if requested).
When a screenshot is requested, the captured image is available in the node's binary output field.
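A downstream step often needs to know which of the requested formats actually came back. The sketch below checks an output item for each format. The key names in `FORMAT_KEYS` and the contents of `sample_item` are hypothetical and only illustrate the idea; inspect a real execution to confirm the actual field names.

```python
from typing import Any

# Assumed output keys, one per selectable format; the real key names may differ.
FORMAT_KEYS = {
    "HTML": "html",
    "JSON": "json",
    "Links": "links",
    "Markdown": "markdown",
    "Raw HTML": "rawHtml",
    "Screenshot": "screenshot",
}


def present_formats(item: dict[str, Any]) -> list[str]:
    """Return the format names whose (assumed) key holds a non-empty value."""
    return [name for name, key in FORMAT_KEYS.items() if item.get(key)]


# Hypothetical output item for a run that requested Markdown and Links.
sample_item = {
    "markdown": "# Example Domain\n\nThis domain is for use in examples.",
    "links": ["https://www.iana.org/domains/example"],
}

formats_found = present_formats(sample_item)
```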
Dependencies
- Requires an active Firecrawl API key credential configured in n8n.
- The node sends requests to the Firecrawl API endpoint (default: https://api.firecrawl.dev/v1).
- No additional external dependencies are required beyond the API access.
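To show how the API key and base URL fit together, the following sketch builds (but does not send) an authenticated request using only the Python standard library. The `/scrape` path and the bearer-token `Authorization` header are assumptions for this example, and `fc-YOUR-API-KEY` is a placeholder, not a real key.

```python
import json
import urllib.request

# Default base URL quoted above; the "/scrape" path is an assumption.
BASE_URL = "https://api.firecrawl.dev/v1"


def build_scrape_request(
    api_key: str, body: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scrape_request("fc-YOUR-API-KEY", {"url": "https://example.com"})
# To send it: urllib.request.urlopen(request, timeout=30)
```

Passing a different `base_url` would cover a non-default endpoint, such as a self-hosted instance.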
Troubleshooting
- Timeouts: If the page takes too long to load, increase the "Timeout (Ms)" property or the "Wait For (Ms)" delay to allow more time for dynamic content to render.
- TLS Errors: If scraping HTTPS sites with invalid certificates fails, enable "Skip TLS Verification" to bypass certificate checks (use cautiously).
- Missing Content: If expected content is not returned, verify that "Only Main Content" is set appropriately and consider adjusting "Include Tags" or "Exclude Tags" to fine-tune the output.
- Dynamic Content Issues: For pages requiring interaction (e.g., clicking buttons), ensure appropriate "Actions" are configured to simulate user behavior before scraping.
- Ads and Popups: If ads or cookie popups hide or displace the content you expect, enable "Block Ads" to improve scraping results.
- Proxy Configuration: If requests fail due to network restrictions, try switching the "Proxy" type between Basic and Stealth modes.
Most error messages relate to network connectivity, invalid URLs, or authentication failures with the API key. Confirm that the API key is valid and that the URL is reachable.
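Some of these problems can be caught before a run. The sketch below is a hypothetical pre-flight check, not part of the node: it flags a URL that is not absolute, and a "Wait For (Ms)" value that is not shorter than "Timeout (Ms)". The second rule rests on the assumption that the wait counts against the request timeout.

```python
from urllib.parse import urlparse


def preflight_warnings(url: str, wait_for_ms: int, timeout_ms: int) -> list[str]:
    """Return human-readable warnings for settings likely to cause a failed run."""
    warnings = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        warnings.append("URL must be absolute and start with http:// or https://")
    if wait_for_ms < 0 or timeout_ms <= 0:
        warnings.append("Wait For must be >= 0 and Timeout must be > 0")
    elif wait_for_ms >= timeout_ms:
        # Assumption: the wait counts against the overall request timeout.
        warnings.append("Wait For is not shorter than Timeout; the request may time out")
    return warnings


ok = preflight_warnings("https://example.com", wait_for_ms=2000, timeout_ms=30000)
bad = preflight_warnings("example.com/page", wait_for_ms=40000, timeout_ms=30000)
```

Here `ok` is an empty list, while `bad` carries one warning for the missing scheme and one for the wait exceeding the timeout.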