FireCrawl

FireCrawl API

Overview

The node integrates with the FireCrawl API to extract structured data from web pages. It is designed for scenarios where users want to scrape or crawl websites to gather specific information, such as product details, company info, or any custom data defined by the user. The node supports two extraction modes: a simple natural language prompt for quick and flexible data extraction, and a detailed JSON schema for precise control over the extracted data structure.

Common use cases include:

  • Extracting product names, prices, features, and customer ratings from e-commerce sites.
  • Gathering company information like mission statements or open-source status.
  • Crawling entire domains or sections using URL wildcards to collect comprehensive datasets.
  • Enabling web search to follow relevant links for additional context.

Example: Extract product details from an online store by specifying a URL pattern and describing the desired fields in a prompt or schema.

Properties

| Name | Meaning |
| --- | --- |
| URL | The target URL to extract data from. Supports wildcards (e.g., example.com/*) to crawl multiple pages within a domain or section. |
| Extract Type | Choose between "Simple Prompt (Recommended)" to describe extraction needs in natural language, or "JSON Schema" to define the exact data structure to extract. |
| Prompt | A natural language description of what information to extract. Useful for flexible, human-readable instructions. Example: listing product name, price, features, ratings, and contact info. |
| Schema | A JSON schema defining the exact structure and types of data to extract, including strings, numbers, booleans, and arrays. Enables precise and consistent data extraction. |
| Enable Web Search | Boolean flag to allow the extraction process to follow relevant links on the page(s) to gather more context and related information beyond the initial URL. |
| Use Custom Body | Boolean flag indicating whether to send a fully custom request body instead of using the standard parameters. |
| Custom Body | A JSON object representing a fully customized request payload, including URL, prompt, schema, and web search option. Overrides other properties when enabled. Useful for advanced or non-standard extraction requests. |
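
How these properties might combine into a single request payload can be sketched as follows. This is an illustration only, not the node's actual implementation: the function name `build_extract_body` and the payload keys `urls`, `prompt`, `schema`, and `enableWebSearch` are assumptions chosen for the example, so check the FireCrawl API reference for the real field names.

```python
from typing import Optional


def build_extract_body(
    url: str,
    extract_type: str = "prompt",
    prompt: Optional[str] = None,
    schema: Optional[dict] = None,
    enable_web_search: bool = False,
    use_custom_body: bool = False,
    custom_body: Optional[dict] = None,
) -> dict:
    """Assemble a request payload from the node properties.

    All payload keys used here are assumed names for illustration.
    """
    # Custom Body overrides every other property when enabled.
    if use_custom_body:
        if custom_body is None:
            raise ValueError("Use Custom Body is enabled but Custom Body is empty")
        return dict(custom_body)

    body = {"urls": [url], "enableWebSearch": enable_web_search}

    if extract_type == "prompt":
        if not prompt:
            raise ValueError("Extract Type 'prompt' requires a Prompt")
        body["prompt"] = prompt
    elif extract_type == "schema":
        if schema is None:
            raise ValueError("Extract Type 'schema' requires a Schema")
        body["schema"] = schema
    else:
        raise ValueError(f"Unknown Extract Type: {extract_type!r}")

    return body


# Prompt mode: crawl a section of a site and describe the fields in plain language.
prompt_body = build_extract_body(
    "example.com/products/*",
    prompt="Extract the product name, price, and customer rating.",
)

# Schema mode: the same target, but with an explicit data structure.
schema_body = build_extract_body(
    "example.com/products/*",
    extract_type="schema",
    schema={"type": "object", "properties": {"name": {"type": "string"}}},
    enable_web_search=True,
)
```

Keeping the custom-body branch first mirrors the rule stated in the table: once Use Custom Body is enabled, none of the standard parameters contribute to the request.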

Output

The node outputs JSON data containing the extracted information according to the specified prompt or schema. The structure of the output matches the requested data format:

  • If using a prompt, the output will be a flexible JSON object reflecting the described fields.
  • If using a schema, the output strictly follows the defined JSON structure with typed fields (strings, numbers, booleans, arrays).
  • The output may include nested objects and arrays depending on the schema complexity.
  • The node does not appear to produce binary data; its output is JSON only.
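
To make the schema case concrete, the sketch below pairs a small schema with a matching output object and checks the typed fields. The product schema, the sample data, and the `conforms` helper are all hypothetical; the helper covers only the types named above (strings, numbers, booleans, arrays, nested objects) and is not a full JSON Schema validator.

```python
def conforms(value, schema: dict) -> bool:
    """Check a value against a small subset of JSON Schema (illustrative only)."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Each declared property that is present must match its sub-schema.
        return all(
            conforms(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    if kind == "array":
        if not isinstance(value, list):
            return False
        return all(conforms(item, schema.get("items", {})) for item in value)
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # no type given, or a type outside this sketch: accept


# Hypothetical schema for a product page.
product_schema = {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

# Hypothetical output that follows the schema.
good_output = {
    "name": "Trail Backpack",
    "price": 79.5,
    "in_stock": True,
    "features": ["waterproof", "30 L"],
}

# Same data, but the price arrived as a string instead of a number.
bad_output = dict(good_output, price="79.50")
```

A check like this can run in a downstream step to catch pages where the extracted types drifted from what the schema requested.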

Dependencies

  • Requires access to the FireCrawl API service.
  • An API key credential must be configured in n8n to authenticate requests to FireCrawl.
  • Network access to the target URLs for crawling/extraction.
  • No other external dependencies are indicated.

Troubleshooting

  • Common issues:

    • Invalid or unreachable URLs can cause extraction failures.
    • Incorrectly formatted JSON schema may lead to errors or incomplete data extraction.
    • Using wildcards improperly might result in excessive crawling or timeouts.
    • A missing or invalid API key causes authentication errors.
  • Error messages:

    • Authentication errors typically indicate missing or incorrect API credentials; verify and update the API key.
    • Schema validation errors suggest malformed JSON or unsupported types; check the schema syntax carefully.
    • Network errors may occur if the target site blocks crawlers or is down; ensure accessibility and permissions.
  • Resolutions:

    • Double-check URL formats and test them independently.
    • Validate JSON schemas with a JSON validator before use.
    • Limit wildcard usage to manageable scopes.
    • Confirm API credentials and network connectivity.
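
Several of these checks can run before the node is executed. The following is a minimal sketch using only the Python standard library; the `preflight` function and its `max_wildcards` limit are assumptions made for illustration, not rules enforced by FireCrawl.

```python
import json
from urllib.parse import urlparse


def preflight(url: str, schema_text=None, max_wildcards: int = 1) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    # Accept scheme-less inputs such as "example.com/*" by assuming https.
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL is not a valid http(s) address")

    # Keep wildcard scope manageable (the limits here are arbitrary).
    if "*" in parsed.netloc:
        problems.append("Wildcard in the host name may crawl far more than intended")
    if url.count("*") > max_wildcards:
        problems.append(f"More than {max_wildcards} wildcard(s) in the URL")

    # The schema must at least be syntactically valid JSON and an object.
    if schema_text is not None:
        try:
            schema = json.loads(schema_text)
        except json.JSONDecodeError as exc:
            problems.append(f"Schema is not valid JSON: {exc.msg}")
        else:
            if not isinstance(schema, dict):
                problems.append("Schema must be a JSON object")

    return problems
```

For example, `preflight("example.com/products/*")` returns an empty list, while a schema string with a missing closing brace yields a single "Schema is not valid JSON" entry. Credential and connectivity problems still need to be confirmed against the live service.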
