FireCrawl

FireCrawl API

Overview

The FireCrawl node for n8n scrapes a specified URL and returns its content as Markdown, HTML, or structured data. It can also interact with a dynamic page before scraping. Typical scenarios include:

  • Extracting readable content from articles or blogs.
  • Gathering structured data from web pages using custom extraction schemas.
  • Interacting with JavaScript-heavy sites (e.g., clicking buttons, typing text, waiting a fixed time) before scraping.
  • Automating data collection workflows for research, monitoring, or reporting.

Example use cases:

  • Scraping product details from e-commerce sites after simulating user interactions.
  • Collecting news articles in Markdown format for further processing.
  • Extracting specific fields from a webpage using a custom schema.

Properties

Below are the supported input properties for the "Scrape A Url And Get Its Content" operation:

Display Name | Type | Description
Url | String | The URL to scrape.
Formats | Multi-Options | Output format(s) for the scraped data. Options: Markdown, Html, Extract.
Extract | Fixed Collection | Structured data extraction settings. Includes Schema (extraction schema), Systemprompt (system prompt for extraction), and Prompt (extraction prompt without schema). Visible only if "Formats" includes "Extract".
Actions | Fixed Collection (multiple) | List of actions to perform on the page before scraping. Each action can include Type (Wait, Click, Write, Press, Screenshot), Selector (CSS selector for Click/Write), Milliseconds (wait time for Wait), Text (text for Write), and Key (key for Press).
Use Custom Body | Boolean | Whether to use a custom JSON body for the request.
Custom Body | JSON | Custom body to send. Allows full control over the request payload. Visible only if "Use Custom Body" is enabled.
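
When "Use Custom Body" is enabled, the payload has to be assembled by hand. The following is a minimal sketch of how such a body might be built from the properties listed above. The lowercase field names (`url`, `formats`, `extract`, `actions`), the helper `build_scrape_body`, and the per-action required fields are assumptions made for illustration; this page does not document the exact payload the node sends, so check the FireCrawl API documentation before relying on them.

```python
import json

# Assumption: which fields each action type needs. The page only says that
# Selector is for click/write, Milliseconds for wait, Text for write, Key for press.
ACTION_FIELDS = {
    "wait": ("milliseconds",),
    "click": ("selector",),
    "write": ("text",),
    "press": ("key",),
    "screenshot": (),
}


def build_scrape_body(url, formats, extract=None, actions=None):
    """Assemble a hypothetical custom body from the documented properties."""
    body = {"url": url, "formats": list(formats)}
    # Extract settings only make sense when the "extract" format is requested.
    if extract and "extract" in body["formats"]:
        body["extract"] = dict(extract)
    checked = []
    for action in actions or []:
        kind = action.get("type")
        if kind not in ACTION_FIELDS:
            raise ValueError(f"unsupported action type: {kind!r}")
        missing = [name for name in ACTION_FIELDS[kind] if name not in action]
        if missing:
            raise ValueError(f"action {kind!r} is missing {missing}")
        checked.append(dict(action))
    if checked:
        body["actions"] = checked
    return body


example_body = build_scrape_body(
    "https://example.com/products",
    ["markdown", "extract"],
    extract={"prompt": "List the product names and prices."},
    actions=[
        {"type": "wait", "milliseconds": 1000},
        {"type": "click", "selector": "#load-more"},
    ],
)
# Paste the printed JSON into the "Custom Body" field.
print(json.dumps(example_body, indent=2))
```

Validating actions before sending keeps mistakes such as a Click without a selector from surfacing only as a failed scrape.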

Output

The node outputs a JSON object containing the results of the scraping operation. Its structure depends on the selected formats and extraction options, but it typically includes:

  • markdown: If "Markdown" format is selected, contains the page content in Markdown.
  • html: If "Html" format is selected, contains the raw HTML of the page.
  • extract: If "Extract" format is selected, contains structured data based on the provided schema and prompts.
  • actions: May include information about actions performed prior to scraping.

Example output (when all formats are selected):

{
  "markdown": "...",
  "html": "<html>...</html>",
  "extract": {
    "field1": "value1",
    "field2": "value2"
  }
}

If the scrape produces binary data (e.g., screenshots), the node includes it as binary output representing the images or other files captured during scraping.
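
Because the output keys depend on the selected formats, a downstream step may want to confirm that every requested format actually came back. The sketch below shows one way to do that in a script, assuming the output item has the shape of the example above; the helper name `missing_formats` is hypothetical and not part of the node.

```python
def missing_formats(result, requested):
    """Return the requested formats that are absent or empty in the output."""
    return [fmt for fmt in requested if not result.get(fmt)]


# Sample item shaped like the example output, with an empty "html" value.
sample = {
    "markdown": "# Title",
    "html": "",
    "extract": {"field1": "value1", "field2": "value2"},
}

gaps = missing_formats(sample, ["markdown", "html", "extract"])
print(gaps)
```

An empty `extract` value is treated the same as a missing one here, which matches the troubleshooting note that a faulty schema or prompt can leave the "extract" output empty.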


Dependencies

  • External Service: Requires access to the FireCrawl API.
  • API Credentials: You must configure the "FireCrawl API" credentials in n8n, including the base URL and any required authentication tokens.
  • n8n Configuration: Ensure that the FireCrawl node is properly installed and that your n8n instance can reach the FireCrawl API endpoint.

Troubleshooting

Common Issues:

  • Invalid URL: If the provided URL is malformed or unreachable, the node may return an error indicating a failed request.
  • Missing Credentials: If the FireCrawl API credentials are not set up, the node will fail to authenticate.
  • Incorrect Selectors/Actions: If you specify invalid CSS selectors or unsupported actions, the scraping may not behave as expected or could fail.
  • Extraction Errors: If the extraction schema or prompts are incorrect, the "extract" output may be empty or incomplete.
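
Malformed URLs can be caught before they reach the node. The following sketch uses Python's standard `urllib.parse` to check that a value has an `http` or `https` scheme and a host; the helper name `is_scrapable_url` is hypothetical. It checks syntax only and cannot tell whether the page is reachable.

```python
from urllib.parse import urlparse


def is_scrapable_url(url):
    """Return True if url is syntactically an absolute http(s) URL."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed hosts (e.g. bad IPv6).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://example.com/blog", "example.com/blog", "ftp://example.com"):
    print(candidate, is_scrapable_url(candidate))
```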

Common Error Messages & Resolutions:

  • "401 Unauthorized": Check your FireCrawl API credentials in n8n.
  • "404 Not Found": Verify the target URL is correct and accessible.
  • "Invalid selector": Double-check the CSS selectors used in actions.
  • "Extraction failed": Review your extraction schema and prompts for correctness.
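
The message-to-resolution pairs above can be turned into a small lookup for an error-handling branch of a workflow. This is only a sketch: it assumes the error text contains one of the quoted messages as a substring, which this page does not guarantee, and the names `RESOLUTIONS` and `suggest_resolution` are hypothetical.

```python
# Messages and advice copied from the list above; keys are lowercased
# so that matching is case-insensitive.
RESOLUTIONS = {
    "401 unauthorized": "Check your FireCrawl API credentials in n8n.",
    "404 not found": "Verify the target URL is correct and accessible.",
    "invalid selector": "Double-check the CSS selectors used in actions.",
    "extraction failed": "Review your extraction schema and prompts for correctness.",
}


def suggest_resolution(error_message):
    """Return the documented advice for a known error message, else None."""
    lowered = error_message.lower()
    for needle, advice in RESOLUTIONS.items():
        if needle in lowered:
            return advice
    return None


print(suggest_resolution("Request failed: 401 Unauthorized"))
```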
