Firecrawl

Get data from the Firecrawl API

Actions: 7

Overview

The Firecrawl node scrapes content from a specified URL using the Firecrawl API. It is useful for extracting data from web pages, such as articles, product information, or any other content reachable via a URL. Common scenarios include gathering data for research, monitoring website changes, and aggregating content for analysis. For example, you might scrape product details from an e-commerce site to compare prices or collect reviews.

Properties

Name	Meaning
Url	The URL of the webpage to scrape. Default is `http://localhost:3002`.
Scrape Options	Options for customizing the scraping process, including output formats and content filters.
- Formats	Output format(s) for the scraped data (HTML, JSON, Links, Markdown, Raw HTML, Screenshot).
- Only Main Content	If true, only the main content of the page will be returned, excluding headers and footers.
- Include Tags	Specifies tags to include in the output.
- Exclude Tags	Specifies tags to exclude from the output.
- Headers	Custom headers to send with the request.
- Wait For (Ms)	Time to wait in milliseconds for the page to load before fetching content.
- Mobile	If true, emulates scraping from a mobile device.
- Skip TLS Verification	If true, skips TLS certificate verification when making requests.
- Timeout (Ms)	Maximum time in milliseconds to wait for the request to complete.
- Actions	List of actions to perform on dynamic content before scraping (e.g., click, scroll).
- Location	Settings for location-based scraping, including country and preferred languages.
- Remove Base64 Images	If true, removes base64 encoded images from the output.
- Block Ads	If true, enables ad-blocking and cookie popup blocking.
- Proxy	Type of proxy to use (Basic or Stealth).
Additional Fields	Allows sending additional custom fields in the request body.
Use Custom Body	If true, allows the use of a custom body for the request.
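
To make the relationship between these properties and the request concrete, the following is a minimal sketch of how the options might be assembled into a request body. The `OPTION_KEYS` mapping, the camelCase key names, and the `build_scrape_body` helper are assumptions introduced for illustration; this page does not confirm the exact field names the node sends.

```python
from typing import Any, Optional

# Hypothetical mapping from the property labels in the table above to
# request-body keys. The key names are assumptions, not taken from this page.
OPTION_KEYS = {
    "Formats": "formats",
    "Only Main Content": "onlyMainContent",
    "Include Tags": "includeTags",
    "Exclude Tags": "excludeTags",
    "Headers": "headers",
    "Wait For (Ms)": "waitFor",
    "Mobile": "mobile",
    "Skip TLS Verification": "skipTlsVerification",
    "Timeout (Ms)": "timeout",
    "Actions": "actions",
    "Location": "location",
    "Remove Base64 Images": "removeBase64Images",
    "Block Ads": "blockAds",
    "Proxy": "proxy",
}


def build_scrape_body(
    url: str,
    options: Optional[dict[str, Any]] = None,
    additional_fields: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a request body from the URL, Scrape Options, and Additional Fields."""
    body: dict[str, Any] = {"url": url}
    for label, value in (options or {}).items():
        if label not in OPTION_KEYS:
            raise KeyError(f"Unknown scrape option: {label}")
        body[OPTION_KEYS[label]] = value
    # Additional Fields are merged last, so they can add or override keys.
    body.update(additional_fields or {})
    return body


if __name__ == "__main__":
    print(build_scrape_body(
        "https://example.com",
        {"Formats": ["markdown"], "Only Main Content": True},
    ))
```

Rejecting unknown labels early is a design choice in this sketch: a typo in an option name fails loudly instead of being silently dropped from the request.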

Output

The node outputs JSON containing the scraped content. Which fields are present depends on the selected output formats and on the include/exclude tag filters. Binary data, when present, typically represents images or files retrieved during scraping.
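
Because the output shape varies with the selected formats, downstream code is safer when it reads fields defensively. The sketch below assumes a hypothetical item in which each format appears under a key of the same name, optionally nested under a `data` key; neither the nesting nor the key names are confirmed by this page.

```python
from typing import Any


def pick_formats(item: dict[str, Any], wanted: list[str]) -> dict[str, Any]:
    """Return only the requested format fields that are present and non-empty."""
    # Assumption: content may sit at the top level or under a "data" key.
    data = item.get("data", item)
    return {name: data[name] for name in wanted if data.get(name)}


if __name__ == "__main__":
    # Hypothetical output item, for illustration only.
    sample = {"data": {"markdown": "# Title", "html": ""}}
    print(pick_formats(sample, ["markdown", "html", "screenshot"]))
```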

Dependencies

- Requires an API key credential for authentication with the Firecrawl API.
- The base URL for the API can be configured, defaulting to `http://localhost:3002/v1`.
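
As an illustration of how the two settings above combine, here is a minimal sketch that builds (but does not send) an authenticated request with Python's standard library. The `/scrape` path, the `Bearer` authorization scheme, and the `make_scrape_request` helper are assumptions made for this example; only the default base URL comes from this page.

```python
import json
import urllib.request

# Default base URL as stated above; override it for a hosted instance.
DEFAULT_BASE_URL = "http://localhost:3002/v1"


def make_scrape_request(
    api_key: str,
    body: dict,
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build a POST request carrying the API key; the caller decides when to send it."""
    return urllib.request.Request(
        # Assumption: the scrape endpoint lives at "<base URL>/scrape".
        url=f"{base_url.rstrip('/')}/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = make_scrape_request("YOUR_API_KEY", {"url": "https://example.com"})
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen(req, timeout=...)` would send it; the sketch stops short of that so it can be inspected without a running Firecrawl instance.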

Troubleshooting

Common Issues:
- Invalid or malformed URLs cause requests to fail.
- Incorrectly configured headers or timeout settings may result in timeouts or errors.

Error Messages:
- "Invalid URL" - Ensure the URL is correctly formatted and accessible.
- "Request Timeout" - Increase the timeout setting if the target page takes longer to load.
- "403 Forbidden" - Check that the API key is valid and has the necessary permissions.
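
Many "Invalid URL" failures can be caught before the request is sent. The following sketch uses Python's standard `urllib.parse` to check that a URL has an `http` or `https` scheme and a host; the `check_url` helper is hypothetical, and the node's own validation rules may differ.

```python
from urllib.parse import urlsplit


def check_url(url: str) -> str:
    """Return the trimmed URL, or raise ValueError if it is not a usable HTTP(S) URL."""
    cleaned = url.strip()
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://: {cleaned!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {cleaned!r}")
    return cleaned


if __name__ == "__main__":
    for candidate in ("https://example.com/products?id=1", "example.com/page"):
        try:
            print("ok:", check_url(candidate))
        except ValueError as err:
            print("rejected:", err)
```

This only checks the URL's format. Whether the page is reachable can still only be determined by making the request.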

Links and References

Firecrawl API Documentation
n8n Documentation for general usage and troubleshooting tips.