Firecrawl Tool
Overview
The Firecrawl Tool node enables advanced web data extraction by interfacing with the Firecrawl v2 API. Its Extract operation pulls structured data from multiple webpages using AI-driven prompts and optional JSON schemas. This is useful when you need specific fields or information from a set of URLs without manually parsing HTML.
Common scenarios include:
- Extracting product details (name, price, availability) from e-commerce pages.
- Gathering event information (date, location, description) from event listing sites.
- Pulling article metadata (title, author, publish date) from news websites.
For example, you can provide a list of URLs and a natural language prompt like "Extract the product name, price, and description from each page," optionally supplying a JSON schema to enforce structure. The node then returns the extracted data in a structured JSON format.
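To make this concrete, the sketch below assembles the kind of JSON body such a request might carry. The helper `build_extract_payload` and the field names `urls`, `prompt`, and `schema` are illustrative assumptions, not taken from the node's source. The schema is written in standard JSON Schema form, which is also an assumption about what the node accepts. Check the Firecrawl API reference for the exact request shape.

```python
import json


def build_extract_payload(urls, prompt, schema=None):
    """Assemble a request body for an extract call (hypothetical helper).

    The keys "urls", "prompt", and "schema" are assumed field names.
    """
    payload = {"urls": list(urls), "prompt": prompt}
    if schema is not None:
        # Only include the schema when the caller supplies one.
        payload["schema"] = schema
    return payload


# A JSON Schema describing the fields named in the prompt.
product_schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["productName", "price"],
}

payload = build_extract_payload(
    ["https://example.com/page1", "https://example.com/page2"],
    "Extract the product name, price, and description from each page",
    schema=product_schema,
)
print(json.dumps(payload, indent=2))
```

Leaving the schema out entirely keeps the payload to the URLs and the prompt, which mirrors the "optional" wording above.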
Properties
| Name | Meaning |
|---|---|
| URLs | Comma-separated list of webpage URLs to extract data from. Example: "https://example.com/page1,https://example.com/page2" |
| Extraction Prompt | Natural language instruction describing what data to extract from each page. Example: "Extract the product name, price, and description from each page" |
| Extract Options | Collection of additional options: • Schema: JSON schema defining expected data structure (e.g., {"productName": "string", "price": "number"}) • Allow External Links: Boolean flag to follow and extract data from external links found on the pages • Enable Web Search: Boolean flag to enable web search for additional context during extraction |
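Because URLs is a single comma-separated string, it can help to see how such a value might be split and checked before use. The following is a minimal sketch, assuming a hypothetical helper `parse_url_list`; it does not reflect how the node itself parses the field. Note that a URL containing a literal comma would be split incorrectly by this approach.

```python
from urllib.parse import urlparse


def parse_url_list(raw):
    """Split a comma-separated URL string into a clean list (hypothetical helper)."""
    urls = []
    for part in raw.split(","):
        candidate = part.strip()
        if not candidate:
            continue  # skip empty entries such as a trailing comma
        parsed = urlparse(candidate)
        # Accept only absolute http(s) URLs with a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a valid http(s) URL: {candidate!r}")
        urls.append(candidate)
    return urls


urls = parse_url_list("https://example.com/page1, https://example.com/page2,")
print(urls)
```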
Output
The node outputs an array of JSON objects, each corresponding to one input URL. Each JSON object contains the extracted data fields as specified by the prompt and optional schema. The structure depends on the extraction results but generally matches the requested fields.
If Allow External Links or Enable Web Search is enabled, the output may also include data gathered from external links or web searches to enrich the extraction.
No binary data output is produced by this operation.
Example output snippet (simplified):
[
  {
    "productName": "Example Product",
    "price": 19.99,
    "inStock": true,
    "description": "A great product for your needs."
  },
  {
    "productName": "Another Product",
    "price": 29.99,
    "inStock": false,
    "description": "Currently out of stock."
  }
]
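A downstream step would typically parse this array and pick out the fields it needs. The sketch below is one possible way to do that in Python, using the sample values from the example output above. Since the structure depends on the extraction results, the code uses `.get()` for the `inStock` field rather than assuming it is always present.

```python
import json

# Sample output copied from the example above.
raw_output = """
[
  {"productName": "Example Product", "price": 19.99, "inStock": true,
   "description": "A great product for your needs."},
  {"productName": "Another Product", "price": 29.99, "inStock": false,
   "description": "Currently out of stock."}
]
"""

items = json.loads(raw_output)

# Keep only items that are in stock; .get() tolerates a missing field.
in_stock = [item for item in items if item.get("inStock")]

for item in in_stock:
    print(f'{item["productName"]}: {item["price"]:.2f}')
```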
Dependencies
- Requires an active Firecrawl API key credential configured in n8n.
- The node makes HTTP requests to the Firecrawl API endpoint (https://api.firecrawl.dev by default).
- Internet access is required for the node to communicate with the Firecrawl service.
Troubleshooting
- Missing API Key Error: If the Firecrawl API key is missing or invalid, the node throws an error stating that the API key is required. Ensure the credential is properly set up.
- Invalid JSON Schema: Providing malformed JSON in the schema option will cause an error. Validate the JSON schema syntax before use.
- Timeouts or Slow Responses: Large numbers of URLs or complex extraction prompts may increase processing time. Consider limiting URLs or simplifying prompts.
- Extraction Errors: If the extraction fails for certain URLs, the node can be configured to continue on failure, returning error messages per item instead of stopping the workflow.
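Two of the items above lend themselves to a quick pre-flight check: confirming that the schema text is well-formed JSON, and splitting a long URL list into smaller batches. The sketch below shows one way to do both. The helpers `parse_schema` and `batch_urls` are hypothetical, and the batch size of 2 is an arbitrary choice for illustration. The check covers JSON syntax only; it does not verify that the document is a valid JSON Schema.

```python
import json


def parse_schema(schema_text):
    """Parse the schema option, raising a readable error on malformed JSON."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Schema is not valid JSON (line {exc.lineno}, column {exc.colno}): {exc.msg}"
        ) from exc
    if not isinstance(schema, dict):
        raise ValueError("Schema must be a JSON object")
    return schema


def batch_urls(urls, batch_size):
    """Split a URL list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


schema = parse_schema('{"type": "object", "properties": {"price": {"type": "number"}}}')
batches = batch_urls([f"https://example.com/page{n}" for n in range(1, 6)], 2)
print(len(batches))
```

Each batch could then be sent through the node as a separate item, which keeps any single request small.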
Links and References
- Firecrawl Documentation – Official docs for the Firecrawl API and usage guidelines.
- JSON Schema Specification – For defining structured extraction schemas.
- n8n Documentation – General info on creating and using custom nodes and credentials.