FireCrawl

FireCrawl API

Overview

This node integrates with the FireCrawl API to submit web crawling jobs whose progress and completion are reported to a webhook URL you specify. It automates crawling web pages, extracting structured data, and receiving asynchronous updates through webhooks.

Common scenarios where this node is beneficial include:

  • Monitoring website content changes by crawling URLs regularly.
  • Extracting structured data from web pages for further processing or analysis.
  • Automating workflows that depend on web-scraped data delivered asynchronously.
  • Integrating web crawling results into other systems without polling, using webhook callbacks.

For example, you can configure the node to crawl a product page URL, limit the number of results returned, exclude certain paths from crawling, specify output formats like Markdown or HTML, and receive the crawl results via a webhook URL you provide.
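The example configuration above might translate into a request body like the one below. This is a minimal Python sketch, not the node's actual implementation: the camelCase keys (`excludePaths`, `scrapeOptions`, `formats`, `extract`) and the lowercase format names are assumptions modeled on the property names, and `build_crawl_body`, the product URL, and the webhook URL are hypothetical.

```python
import json
from typing import Any, Optional


def build_crawl_body(
    url: str,
    limit: int,
    webhook: str,
    exclude_paths: Optional[list[str]] = None,
    formats: Optional[list[str]] = None,
    extract: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a crawl request body from the node's standard properties.

    Key names are assumed to mirror the FireCrawl crawl endpoint;
    confirm them against the API reference before relying on them.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    body: dict[str, Any] = {"url": url, "limit": limit, "webhook": webhook}
    if exclude_paths:
        body["excludePaths"] = list(exclude_paths)
    # Only include scrapeOptions when at least one option is set.
    scrape_options: dict[str, Any] = {}
    if formats:
        scrape_options["formats"] = list(formats)
    if extract:
        scrape_options["extract"] = extract  # e.g. schema, systemPrompt, prompt
    if scrape_options:
        body["scrapeOptions"] = scrape_options
    return body


if __name__ == "__main__":
    body = build_crawl_body(
        url="https://example.com/products/widget",  # hypothetical product page
        limit=10,
        webhook="https://hooks.example.com/firecrawl",  # hypothetical receiver
        exclude_paths=["blog/.*"],  # hypothetical exclusion pattern
        formats=["markdown", "html"],
    )
    print(json.dumps(body, indent=2))
```

Optional sections are omitted rather than sent empty, which keeps the payload close to what a user actually configured.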

Properties

| Name | Meaning |
| --- | --- |
| Url | The URL to crawl. This is the starting point for the web crawler. |
| Limit | Maximum number of results to return from the crawl. Must be at least 1. |
| Webhook | The URL where webhook events will be sent. This allows asynchronous notification of crawl job status and results. |
| Exclude Paths | A list of URL paths to exclude from the crawl. Useful to avoid crawling irrelevant or unwanted sections of the site. |
| Scrape Options | Options controlling how the scraped data is formatted and extracted: Formats (output format(s) such as Markdown, HTML, or Extract for structured extraction) and Extract (schema, system prompt, and prompt for structured data extraction). |
| Use Custom Body | Boolean flag indicating whether to send a fully custom JSON body instead of the standard parameters. |
| Custom Body | A JSON object representing a fully custom request body to send to the API. Used only if "Use Custom Body" is true. |
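The interplay between "Use Custom Body" and "Custom Body" can be sketched as a simple switch. This is a hedged illustration: `resolve_request_body` is a hypothetical helper, and it assumes the custom body may arrive either as a JSON string or as an already-parsed object.

```python
import json
from typing import Any, Union


def resolve_request_body(
    use_custom_body: bool,
    custom_body: Union[str, dict[str, Any], None],
    standard_body: dict[str, Any],
) -> dict[str, Any]:
    """Pick the body to send, mirroring the "Use Custom Body" flag."""
    if not use_custom_body:
        # Flag off: the standard parameters are used and Custom Body is ignored.
        return standard_body
    if isinstance(custom_body, str):
        # json.JSONDecodeError is a subclass of ValueError.
        parsed = json.loads(custom_body)
    else:
        parsed = custom_body
    if not isinstance(parsed, dict):
        raise ValueError("custom body must be a JSON object")
    return parsed
```

When the flag is on, the custom body replaces the standard parameters entirely rather than being merged with them, matching the "fully custom" wording in the table.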

Output

The node outputs JSON data representing the response from the FireCrawl API after submitting the crawl job. This typically includes information about the submitted job, such as job ID, status, and any immediate metadata returned by the API.
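A downstream step usually needs only the job ID from this response. The sketch below assumes a response shaped like `{"success": true, "id": "..."}`; those field names are an assumption to verify against the FireCrawl API reference, and `extract_job_id` is a hypothetical helper.

```python
import json
from typing import Any


def extract_job_id(response_text: str) -> str:
    """Return the crawl job ID from a submission response.

    Assumes {"success": bool, "id": str, ...}; adjust if the API differs.
    """
    payload: dict[str, Any] = json.loads(response_text)
    if not payload.get("success", False):
        raise RuntimeError(f"crawl submission failed: {payload.get('error', payload)}")
    job_id = payload.get("id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("response did not contain a job id")
    return job_id
```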

Since the crawl results are delivered asynchronously via the webhook URL provided, the node itself does not output the crawl results directly in this operation.
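Because results arrive at the webhook rather than in the node output, something must listen for them. Below is a minimal standard-library receiver sketch. The event type names (`crawl.page`, `crawl.completed`, `crawl.failed`) and the payload fields (`type`, `id`, `data`, `error`) are assumptions about FireCrawl's webhook format, and `handle_event`, `WebhookHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any


def handle_event(event: dict[str, Any]) -> str:
    """Summarize one webhook event. Event type names are assumptions."""
    kind = event.get("type", "unknown")
    job_id = event.get("id", "?")
    if kind == "crawl.page":
        pages = event.get("data") or []
        return f"{job_id}: received {len(pages)} page(s)"
    if kind == "crawl.completed":
        return f"{job_id}: crawl completed"
    if kind == "crawl.failed":
        return f"{job_id}: crawl failed ({event.get('error', 'no error message')})"
    return f"{job_id}: {kind}"


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON events and prints a one-line summary of each."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            event = None
        if not isinstance(event, dict):
            self.send_response(400)  # reject bodies that are not a JSON object
            self.end_headers()
            return
        print(handle_event(event))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, serving webhook events on all interfaces."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the chosen port must be reachable from FireCrawl's side (for example, behind a public reverse proxy).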

No binary data output is indicated for this operation.

Dependencies

  • Requires an API key credential for authenticating with the FireCrawl API.
  • The base URL for the FireCrawl API must be configured in the node credentials.
  • The webhook URL provided must be accessible and able to receive HTTP POST requests from FireCrawl.
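The credential requirements above can be illustrated with a request builder that combines the configured base URL and API key. This sketch assumes a Bearer authorization scheme, a `/v1/crawl` endpoint path, and a base URL without a version segment. All three are assumptions to check against the FireCrawl documentation, and `build_crawl_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any


def build_crawl_request(base_url: str, api_key: str, body: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated crawl-submission request.

    The endpoint path and Bearer scheme are assumptions.
    """
    endpoint = base_url.rstrip("/") + "/v1/crawl"  # assumed path
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` would submit the job. Keeping construction separate from sending makes the headers easy to inspect when debugging authentication errors.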

Troubleshooting

  • Invalid URL or unreachable target: Ensure the URL to crawl is valid and publicly accessible. The API may reject invalid URLs or fail silently if the target cannot be reached.
  • Webhook delivery failures: If webhook events are not received, verify that the webhook URL is correct, publicly reachable, and properly handles incoming POST requests.
  • Limit value errors: The limit must be a positive integer (minimum 1). Providing zero or negative values may cause errors.
  • Malformed custom body: When using a custom JSON body, ensure it is valid JSON and matches the expected schema of the API. Invalid JSON will cause request failures.
  • Missing or invalid API credentials: The node requires a valid API key credential. Authentication errors usually indicate misconfigured credentials.
  • Exclude paths format: The exclude paths must be provided as an array of strings. Incorrect formatting may lead to unexpected crawl behavior.
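Several of these problems can be caught before submitting a job. The following pre-flight check is a hedged sketch covering the URL, limit, and exclude-paths items. `validate_crawl_inputs` is a hypothetical helper, and it checks syntax only: it cannot tell whether the target site or webhook is actually reachable.

```python
from typing import Any
from urllib.parse import urlparse


def validate_crawl_inputs(url: str, limit: Any, exclude_paths: Any) -> list[str]:
    """Return a list of problems found; an empty list means the inputs look sane."""
    problems: list[str] = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"url is not an absolute http(s) URL: {url!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        problems.append(f"limit must be a positive integer, got {limit!r}")
    if not isinstance(exclude_paths, list) or not all(isinstance(p, str) for p in exclude_paths):
        problems.append("exclude paths must be a list of strings")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a workflow report all misconfigured fields in a single pass.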


Note: All property names and descriptions are based on static analysis of the provided source code and property definitions.
