Firecrawl

Get data from Firecrawl API

Overview

The Firecrawl node scrapes a web page by fetching a specified URL through the Firecrawl API and extracting its content. Users can customize the scrape by selecting output formats, filtering HTML tags, setting request headers, emulating a mobile device, interacting with dynamic page elements before extraction, and more.

This node is beneficial in scenarios such as:

Extracting main content or specific parts of a webpage for data analysis.
Collecting links or structured data (JSON) from websites.
Taking screenshots of webpages programmatically.
Automating interaction with dynamic content before scraping (e.g., clicking buttons, scrolling).
Bypassing ads and cookie popups during scraping.

Practical examples:

Scraping blog posts in Markdown format excluding navigation bars and footers.
Gathering all links from a news website homepage.
Capturing a screenshot of a product page on an e-commerce site.
Extracting JSON data embedded in a webpage after interacting with a dropdown menu.
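
The last example above (interacting with a dropdown before scraping) can be sketched as a list of actions attached to the request. This is a minimal illustration, not the node's actual implementation: the field names (`type`, `selector`, `text`, `key`, `milliseconds`) are assumed from the Firecrawl v1 action schema, and the URL, selector, and helper name `dropdown_then_scrape` are hypothetical.

```python
import json


def dropdown_then_scrape(url: str, dropdown_selector: str, option_text: str) -> dict:
    """Build a scrape payload that picks a dropdown entry before extraction.

    Field names are assumptions based on the Firecrawl v1 action schema.
    """
    return {
        "url": url,
        "formats": ["html"],
        "actions": [
            {"type": "click", "selector": dropdown_selector},  # open the dropdown
            {"type": "write", "text": option_text},            # type the option name
            {"type": "press", "key": "ENTER"},                 # confirm the choice
            {"type": "wait", "milliseconds": 1500},            # let the page update
        ],
    }


# Hypothetical page and selector, for illustration only.
payload = dropdown_then_scrape("https://example.com/catalog", "#category", "Laptops")
print(json.dumps(payload, indent=2))
```

Actions run in the order listed, so the wait step is placed last to give the page time to re-render before content is captured.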

Properties

Name	Meaning
Url	The URL of the webpage to scrape.
Scrape Options	A collection of options controlling the scraping behavior:
- Formats	Output format(s) for the scraped data. Options include: HTML, JSON, Links, Markdown, Raw HTML, Screenshot.
- Only Main Content	Whether to return only the main content of the page, excluding headers, navigation bars, footers, etc.
- Include Tags	List of HTML tags to explicitly include in the output.
- Exclude Tags	List of HTML tags to exclude from the output.
- Headers	Custom HTTP headers to send with the scraping request, specified as key-value pairs.
- Wait For (Ms)	Number of milliseconds to wait for the page to load before fetching content.
- Mobile	Whether to emulate scraping from a mobile device.
- Skip TLS Verification	Whether to skip TLS certificate verification when making requests.
- Timeout (Ms)	Timeout duration in milliseconds for the scraping request.
- Actions	List of actions to interact with dynamic content before scraping. Actions can be click, press key, take screenshot, scroll, wait, or write text, each with relevant parameters like selector, text, direction, etc.
- Location	Settings for geolocation of the request, including country (ISO 3166-1 alpha-2 code) and preferred languages/locales.
- Remove Base64 Images	Whether to remove base64 encoded images from the output.
- Block Ads	Enables ad-blocking and cookie popup blocking during scraping.
- Proxy	Type of proxy to use for the request. Options are Basic or Stealth.
Use Custom Body	Whether to use a custom request body instead of the default scraping options.
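
To make the relationship between these options and the underlying request more concrete, the sketch below maps the option labels from the table to request body fields. The camelCase field names are an assumption based on the Firecrawl v1 scrape endpoint and should be checked against the API documentation; `OPTION_TO_FIELD` and `build_scrape_body` are hypothetical names used only for this example.

```python
from typing import Any

# Assumed mapping from the node's option labels to Firecrawl v1 body fields.
OPTION_TO_FIELD = {
    "Formats": "formats",
    "Only Main Content": "onlyMainContent",
    "Include Tags": "includeTags",
    "Exclude Tags": "excludeTags",
    "Headers": "headers",
    "Wait For (Ms)": "waitFor",
    "Mobile": "mobile",
    "Skip TLS Verification": "skipTlsVerification",
    "Timeout (Ms)": "timeout",
    "Actions": "actions",
    "Location": "location",
    "Remove Base64 Images": "removeBase64Images",
    "Block Ads": "blockAds",
    "Proxy": "proxy",
}


def build_scrape_body(url: str, options: dict[str, Any]) -> dict[str, Any]:
    """Translate option labels into a request body, rejecting unknown labels."""
    body: dict[str, Any] = {"url": url}
    for label, value in options.items():
        if label not in OPTION_TO_FIELD:
            raise KeyError(f"Unknown scrape option: {label}")
        body[OPTION_TO_FIELD[label]] = value
    return body


# Example: a blog post as Markdown, without navigation bars and footers.
body = build_scrape_body(
    "https://example.com/blog/post",
    {
        "Formats": ["markdown"],
        "Only Main Content": True,
        "Exclude Tags": ["nav", "footer"],
        "Timeout (Ms)": 30000,
    },
)
print(body)
```

Only the options that are explicitly set end up in the body, which leaves every other setting at the API's default.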

Output

The node outputs a JSON object containing the scraped content according to the selected formats and options. The structure varies depending on the chosen output formats but generally includes:

Extracted content in HTML, Markdown, or raw HTML form.
JSON data if requested.
An array of links if the "Links" format is selected.
Screenshot data if the "Screenshot" format is selected (likely as binary or base64-encoded image data).

If binary data (such as screenshots) is included, it represents visual captures of the webpage either as full-page or viewport-sized images based on user settings.
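
Because the output shape depends on the selected formats, downstream steps usually need to check which keys are present. The sketch below assumes a response of the form `{"success": ..., "data": {...}}` with keys such as `markdown`, `links`, and `screenshot`; that shape and the sample values are assumptions for illustration, and `summarize_output` is a hypothetical helper.

```python
KNOWN_FORMAT_KEYS = ("html", "markdown", "rawHtml", "json", "links", "screenshot")


def summarize_output(item: dict) -> dict:
    """Report which format keys an output item contains.

    Accepts either the full response or just its "data" object.
    """
    data = item.get("data", item)
    return {
        "formats_present": sorted(k for k in KNOWN_FORMAT_KEYS if k in data),
        "link_count": len(data.get("links", [])),
        "has_screenshot": "screenshot" in data,
    }


# Made-up sample output for a request with the Markdown and Links formats.
sample = {
    "success": True,
    "data": {
        "markdown": "# Title",
        "links": ["https://example.com/a", "https://example.com/b"],
    },
}
summary = summarize_output(sample)
print(summary)
```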

Dependencies

Requires access to the Firecrawl API endpoint at https://api.firecrawl.dev/v1.
Needs an API authentication token credential configured in n8n to authorize requests to the Firecrawl service.
No other external dependencies are indicated.
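
For reference, an equivalent authorized request outside n8n could be prepared as in the sketch below. It assumes the scrape path is `/scrape` under the base URL above and that the token is sent as a Bearer token in the `Authorization` header; the key `fc-YOUR-KEY` is a placeholder and `make_scrape_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev/v1"


def make_scrape_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the assumed /scrape path."""
    return urllib.request.Request(
        url=f"{API_BASE}/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer-token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request(
    "fc-YOUR-KEY",  # placeholder, not a real key
    {"url": "https://example.com", "formats": ["markdown"]},
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request, timeout=60)
```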

Troubleshooting

Timeouts: If the request times out, consider increasing the "Timeout (Ms)" property or checking network connectivity.
TLS Errors: If TLS certificate errors occur, enabling "Skip TLS Verification" may help but should be used cautiously.
Incorrect Content Extraction: Adjust "Only Main Content", "Include Tags", and "Exclude Tags" to fine-tune what parts of the page are scraped.
Dynamic Content Not Loaded: Use "Actions" to interact with the page (e.g., clicking buttons, waiting) before scraping to ensure dynamic content is loaded.
Proxy Issues: If scraping fails due to IP restrictions, try switching between "Basic" and "Stealth" proxy types.
Invalid URL: Ensure the URL is correctly formatted and accessible.
API Authentication Failures: Verify that the API key credential is correctly set up and has necessary permissions.
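
For the "Invalid URL" case, a quick format check before the node runs can rule out the most common mistakes, such as a missing scheme. The helper below is a hypothetical sketch using only the Python standard library; it checks the URL's format, not whether the page is actually reachable.

```python
from urllib.parse import urlparse


def is_scrapable_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://example.com/page", "example.com/page", "ftp://example.com"):
    print(candidate, is_scrapable_url(candidate))
```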

Links and References

Firecrawl API Documentation: https://firecrawl.dev/docs
MDN Web Docs on Accept-Language Header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language