Firecrawl

Get data from Firecrawl API


Overview

This node integrates with the Firecrawl API to scrape web pages and extract their content in various formats. It is useful for automating data extraction from websites, including dynamic content interaction, PDF processing, and capturing screenshots. Typical use cases include monitoring website changes, extracting structured data, archiving page snapshots, or converting PDFs to markdown.

For example, you can provide a URL to scrape its main content as markdown, capture a screenshot of the page, or extract links and JSON data based on custom schemas or prompts. The node supports advanced options like emulating mobile devices, blocking ads, waiting for page load, and interacting with page elements before scraping.

Properties

Name	Meaning
Url	The URL of the webpage to scrape.
Parsers	Controls PDF processing during scraping. Options: `PDF` (extracts PDF content as markdown, billed per page) or disabled (returns the PDF as base64 at a flat rate).
Scrape Options	Options controlling output formats and scraping behavior: output formats (markdown, HTML, JSON, links, raw HTML, screenshot, summary, change tracking); modes and schema for change tracking and JSON extraction; screenshot settings such as full page, quality, and viewport size.
Only Main Content	Whether to return only the main content of the page, excluding headers, navigation bars, footers, etc.
Include Tags	List of HTML tags to explicitly include in the output.
Exclude Tags	List of HTML tags to exclude from the output.
Headers	Custom HTTP headers to send with the request.
Wait For (Ms)	Milliseconds to wait after page load before fetching content, useful for dynamic pages.
Mobile	Emulate scraping from a mobile device.
Skip TLS Verification	Whether to skip TLS certificate verification when making requests.
Timeout (Ms)	Request timeout in milliseconds.
Actions	Sequence of actions to interact with the page before scraping, such as clicking elements, pressing keys, scrolling, taking screenshots, waiting, or writing text.
Location	Location settings for the request, including country code (e.g., US, AU, DE, JP) and preferred languages/locales.
Remove Base64 Images	Whether to remove base64 encoded images from the output.
Block Ads	Enables ad-blocking and cookie popup blocking during scraping.
Store In Cache	Whether to store the scraped page in Firecrawl's index and cache. Disable for privacy or sensitive data concerns.
Proxy	Type of proxy to use for the request. Options: `Basic`, `Stealth`.
Additional Fields	Custom JSON properties to add to the request body when using a custom request body.
Use Custom Body	Whether to use a fully custom request body instead of the standard parameters.
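
As a rough illustration of how these properties might map onto a request body, the sketch below assembles a plain dictionary. The camelCase keys (`onlyMainContent`, `excludeTags`, `waitFor`, `timeout`, `mobile`) are assumptions modeled on the property names above and should be checked against the Firecrawl API documentation; `build_scrape_body` is a hypothetical helper, not part of the node.

```python
import json


def build_scrape_body(url, formats=("markdown",), **options):
    """Assemble a scrape request body, dropping options left unset (None).

    Hypothetical helper: the key names are assumed, not taken from the node.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    body = {"url": url, "formats": list(formats)}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


body = build_scrape_body(
    "https://example.com/pricing",
    formats=("markdown", "links"),
    onlyMainContent=True,           # assumed key for "Only Main Content"
    excludeTags=["nav", "footer"],  # assumed key for "Exclude Tags"
    waitFor=2000,                   # assumed key for "Wait For (Ms)"
    timeout=30000,                  # assumed key for "Timeout (Ms)"
    mobile=None,                    # unset, so omitted from the body
)
print(json.dumps(body, indent=2))
```

Dropping unset options keeps the body minimal, so the API's own defaults apply to anything not configured explicitly.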

Output

The node outputs JSON data representing the scraped content according to the selected formats. This may include:

Extracted main content in markdown, HTML, or raw HTML.
Structured JSON data extracted via custom schemas or prompts.
Lists of links found on the page.
Change tracking information showing differences between scrapes.
Screenshots as image data (binary).
Summaries of the page content.

If PDF parsing is enabled, the content is converted to markdown; otherwise, PDFs are returned as base64 encoded files.

Binary data output (such as screenshots) is provided in the binary field of the output item.
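
To make the split between JSON and binary output concrete, here is a minimal sketch of reading a screenshot from an output item. It assumes the usual n8n item layout with `json` and `binary` keys, a binary property named `screenshot`, and base64 content in a `data` field next to a `mimeType`; the property name and the sample item are made up for illustration.

```python
import base64


def extract_screenshot(item, property_name="screenshot"):
    """Return (mime_type, raw_bytes) for a binary property, or None if absent."""
    entry = item.get("binary", {}).get(property_name)
    if entry is None:
        return None
    mime_type = entry.get("mimeType", "application/octet-stream")
    return mime_type, base64.b64decode(entry["data"])


# Made-up item for illustration; real output depends on the selected formats.
sample_item = {
    "json": {"markdown": "# Example Domain", "links": ["https://www.iana.org/domains/example"]},
    "binary": {
        "screenshot": {
            "mimeType": "image/png",
            "data": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
        }
    },
}

mime_type, image_bytes = extract_screenshot(sample_item)
print(mime_type, len(image_bytes))
```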

Dependencies

Requires an API key credential for authenticating with the Firecrawl API.
Network access to the Firecrawl API endpoint (default https://api.firecrawl.dev/v2).
Optional proxy configuration depending on user selection.
No other external dependencies are required.
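
For readers who want to see what the credential and endpoint amount to outside n8n, the sketch below builds (but does not send) an equivalent HTTP request with Python's standard library. The `/scrape` path, the `Bearer` authorization scheme, and the placeholder key are assumptions to verify against the Firecrawl API documentation; `make_scrape_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.firecrawl.dev/v2"  # default endpoint named above


def make_scrape_request(api_key, body, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying the API key."""
    return urllib.request.Request(
        url=f"{base_url}/scrape",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request("fc-YOUR-API-KEY", {"url": "https://example.com"})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request, timeout=30)` would send it; that step is left out here so the example runs without network access or a real key.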

Troubleshooting

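Timeouts: For large or slow-loading pages, increase "Timeout (Ms)". If you also raise "Wait For (Ms)", keep the timeout larger than the wait, since the wait adds to the total request time.
TLS Errors: Enable "Skip TLS Verification" if the target site fails TLS certificate validation (for example, a self-signed or expired certificate).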
Dynamic Content Not Loaded: Use "Actions" to interact with the page (e.g., click buttons, scroll) before scraping.
Incorrect Content Extraction: Adjust "Include Tags" and "Exclude Tags" to fine-tune which parts of the page are included.
PDF Parsing Issues: Ensure the "Parsers" option includes PDF if you want markdown extraction; otherwise, PDFs will be base64 encoded.
Ads or Cookie Popups in Output: Verify that "Block Ads" is enabled; it blocks both ads and cookie popups during scraping.
Proxy Problems: Switch between "Basic" and "Stealth" proxy types if requests fail due to network restrictions.

Common error messages typically relate to invalid URLs, authentication failures, or request timeouts. Check credentials and network connectivity first.
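
For the "Dynamic Content Not Loaded" case, a sequence of actions might look like the sketch below. The action type names and their fields (`selector`, `milliseconds`, `direction`) are assumptions that mirror the kinds of actions listed under Properties, and the selector is invented; `validate_actions` is a hypothetical helper that catches misspelled types before a request is made.

```python
# Assumed action types, mirroring the kinds of actions the node describes.
KNOWN_ACTION_TYPES = {"wait", "click", "write", "press", "scroll", "screenshot"}


def validate_actions(actions):
    """Raise ValueError on the first action with a missing or unknown type."""
    for index, action in enumerate(actions):
        action_type = action.get("type")
        if action_type not in KNOWN_ACTION_TYPES:
            raise ValueError(f"action {index}: unknown type {action_type!r}")
    return actions


# Hypothetical sequence: dismiss a banner, let content load, scroll, capture.
actions = validate_actions([
    {"type": "click", "selector": "#accept-cookies"},
    {"type": "wait", "milliseconds": 1500},
    {"type": "scroll", "direction": "down"},
    {"type": "screenshot"},
])
print([action["type"] for action in actions])
```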

Links and References

Firecrawl API Documentation: https://firecrawl.dev/docs/api
MDN Web Docs on Accept-Language header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language
n8n Documentation on HTTP Request Node (for understanding headers and proxies): https://docs.n8n.io/nodes/n8n-nodes-base.httpRequest/
