Package Information
Released: 8/24/2025
Downloads: 3 weekly / 27 monthly
Latest Version: 1.8.6
Author: Heictor Hsiao
Documentation
Crawl4AI for n8n (Maintained Fork)
Note: This is a maintained fork of the original n8n-nodes-crawl4ai project by Heictor Hsiao. The original author is no longer maintaining the project, so this fork continues development and bug fixes. All original credits remain with the original author.
This project provides n8n integration for Crawl4AI, a powerful web crawling and data extraction tool. It consists of two main nodes:
- Crawl4AI: Basic Crawler - For general web crawling and content extraction
- Crawl4AI: Content Extractor - For extracting structured data using CSS selectors, LLM, or JSON
Features
Basic Crawler Node
- Crawl Single URL - Extract content from a single web page
- Crawl Multiple URLs - Process multiple web pages in one operation
- Process Raw HTML - Extract content from raw HTML without crawling
Content Extractor Node
- CSS Selector Extractor - Extract structured data using CSS selectors
- LLM Extractor - Use AI to extract structured data from webpages
- JSON Extractor - Extract and process JSON data from web pages
Installation
- Clone this repository into your n8n custom nodes directory
- Run `npm install` to install dependencies
- Restart your n8n instance
Usage
Setting up credentials
Before using the nodes, you need to set up Crawl4AI API credentials:
- Go to Settings > Credentials > New
- Select Crawl4AI API
- Configure connection settings:
  - Connection Mode: Direct or Docker
  - Authentication: Configure as needed
  - LLM Settings: Enable and configure if needed for AI extraction
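As a rough mental model, the credential settings listed above could be represented like this. It is only a sketch: the field names (`connectionMode`, `baseUrl`, `apiToken`, `llm`) and the rule that Docker mode needs a base URL are assumptions made for illustration, not the actual contents of `Crawl4aiApi.credentials.ts`.

```typescript
// Hypothetical shape of the credential settings; every field name is illustrative.
type ConnectionMode = "direct" | "docker";

interface Crawl4aiCredentialSketch {
  connectionMode: ConnectionMode;
  baseUrl?: string; // assumed: address of the Crawl4AI server in Docker mode
  apiToken?: string; // assumed: optional authentication token
  llm?: { enabled: boolean; provider?: string; apiKey?: string };
}

// Returns human-readable problems; an empty array means the settings look usable.
function validateCredentials(c: Crawl4aiCredentialSketch): string[] {
  const problems: string[] = [];
  if (c.connectionMode === "docker" && !c.baseUrl) {
    problems.push("Docker mode needs a base URL for the Crawl4AI server");
  }
  if (c.llm?.enabled && !c.llm.provider) {
    problems.push("LLM settings are enabled but no provider is selected");
  }
  return problems;
}
```

Checking settings up front like this would let a workflow fail with a clear message before any page is requested.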
Basic Crawler Usage
The Basic Crawler node allows you to crawl web pages and extract their content:
- Add the "Crawl4AI: Basic Crawler" node to your workflow
- Select an operation (Crawl Single URL, Crawl Multiple URLs, or Process Raw HTML)
- Configure the required parameters
- Run the workflow to extract content
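When feeding the Crawl Multiple URLs operation from an upstream node, it can help to clean the URL list first. The helper below is a hypothetical pre-processing step (for example in an n8n Code node), not part of this package; it assumes the URLs arrive as one string separated by whitespace or commas.

```typescript
// Hypothetical helper: split free-form input into de-duplicated http(s) URLs
// and collect everything that could not be parsed.
function parseUrlList(input: string): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const raw of input.split(/[\s,]+/)) {
    if (raw === "") continue;
    try {
      const url = new URL(raw); // throws on malformed input
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        invalid.push(raw);
      } else if (!valid.includes(url.href)) {
        valid.push(url.href); // href is the normalized form
      }
    } catch {
      invalid.push(raw);
    }
  }
  return { valid, invalid };
}
```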
Content Extractor Usage
The Content Extractor node allows you to extract structured data from web pages:
- Add the "Crawl4AI: Content Extractor" node to your workflow
- Select an extraction method (CSS Selector, LLM, or JSON)
- Configure the extraction parameters
- Run the workflow to extract structured data
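To give a feel for what "extracting and processing JSON" can involve, here is a small sketch of pulling one value out of a parsed JSON document by a dot-separated path. The function name `getByPath` and the path syntax are assumptions for illustration; the node's own JSON Extractor parameters may work differently.

```typescript
// Hypothetical helper: walk a parsed JSON value along a dot-separated path
// such as "data.items.0.title". Returns undefined when the path does not exist.
function getByPath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    // Arrays are objects too, so numeric keys like "0" index into them.
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}
```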
Configuration Options
Browser Options
- Headless Mode: Run browser in headless mode
- Enable JavaScript: Enable JavaScript execution
- Viewport Size: Set browser viewport dimensions
- Timeout: Maximum time to wait for page load
- User Agent: Override browser user agent
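The browser options above can be thought of as a small settings object with defaults. The sketch below is illustrative only: the property names and the default values (headless on, 1280x800 viewport, 30 second timeout) are assumptions, not the node's documented defaults.

```typescript
// Hypothetical representation of the browser options; names and defaults are assumed.
interface BrowserOptionsSketch {
  headless: boolean;
  javaScriptEnabled: boolean;
  viewport: { width: number; height: number };
  timeoutMs: number;
  userAgent?: string;
}

const DEFAULT_BROWSER_OPTIONS: BrowserOptionsSketch = {
  headless: true,
  javaScriptEnabled: true,
  viewport: { width: 1280, height: 800 },
  timeoutMs: 30_000,
};

// Merge user overrides onto the defaults and reject an unusable timeout.
function resolveBrowserOptions(
  overrides: Partial<BrowserOptionsSketch> = {},
): BrowserOptionsSketch {
  const merged: BrowserOptionsSketch = {
    ...DEFAULT_BROWSER_OPTIONS,
    ...overrides,
    viewport: { ...DEFAULT_BROWSER_OPTIONS.viewport, ...overrides.viewport },
  };
  if (merged.timeoutMs <= 0) {
    throw new RangeError("timeoutMs must be positive");
  }
  return merged;
}
```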
Crawler Options
- Cache Mode: Control caching behavior
- JavaScript Code: Execute custom JS on the page
- CSS Selector: Focus crawling on specific elements
- Excluded Tags: Skip specific HTML tags
- Check Robots.txt: Respect robots.txt rules
- Word Count Threshold: Filter content by word count
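As an illustration of the Word Count Threshold option, the sketch below drops text blocks that contain fewer words than a threshold. It assumes that blocks at or above the threshold are kept and that words are separated by whitespace; the function is hypothetical and does not reproduce Crawl4AI's internal logic.

```typescript
// Hypothetical illustration: keep only text blocks with at least `threshold` words.
function filterByWordCount(blocks: string[], threshold: number): string[] {
  return blocks.filter((block) => {
    const words = block.trim().split(/\s+/).filter(Boolean);
    return words.length >= threshold;
  });
}
```

A low threshold keeps short fragments such as menu labels, while a higher one tends to leave only paragraph-like content.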
LLM Extraction Options
- Extraction Instructions: Instructions for the AI
- Schema Fields: Define structured data schema
- LLM Provider: Choose AI model provider
- Temperature: Control randomness of AI responses
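One way to picture Schema Fields is as a list of named, typed fields that gets turned into a JSON-Schema-like object for the LLM. The sketch below makes that assumption explicit; `SchemaFieldSketch` and `buildJsonSchema` are hypothetical names, and the node may assemble its schema differently.

```typescript
// Hypothetical field definition, as a user might enter it in the node UI.
interface SchemaFieldSketch {
  name: string;
  type: "string" | "number" | "boolean";
  description?: string;
  required?: boolean;
}

// Build a JSON-Schema-like object from the field list.
function buildJsonSchema(fields: SchemaFieldSketch[]) {
  const properties: Record<string, { type: string; description?: string }> = {};
  const required: string[] = [];
  for (const field of fields) {
    properties[field.name] = field.description
      ? { type: field.type, description: field.description }
      : { type: field.type };
    if (field.required) required.push(field.name);
  }
  return { type: "object" as const, properties, required };
}

// Example field list: a product page with a mandatory title and an optional price.
const productSchema = buildJsonSchema([
  { name: "title", type: "string", description: "Product title", required: true },
  { name: "price", type: "number" },
]);
```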
Project Structure
nodes/
├── Crawl4aiBasicCrawler/
│   ├── Crawl4aiBasicCrawler.node.ts         # Main node file
│   ├── crawl4ai.svg                         # Icon
│   ├── actions/
│   │   ├── operations.ts                    # Operations definition
│   │   ├── router.ts                        # Router handler
│   │   ├── crawlSingleUrl.operation.ts      # Single URL crawl operation
│   │   ├── crawlMultipleUrls.operation.ts   # Multiple URL crawl operation
│   │   └── processRawHtml.operation.ts      # Raw HTML processing operation
│   └── helpers/
│       ├── interfaces.ts                    # Interface definitions
│       ├── utils.ts                         # Common utilities
│       ├── apiClient.ts                     # API client
│       └── formatters.ts                    # Formatting tools
│
└── Crawl4aiContentExtractor/
    ├── Crawl4aiContentExtractor.node.ts     # Main node file
    ├── crawl4ai.svg                         # Icon
    ├── actions/
    │   ├── operations.ts                    # Operations definition
    │   ├── router.ts                        # Router handler
    │   ├── cssExtractor.operation.ts        # CSS selector extraction operation
    │   ├── llmExtractor.operation.ts        # LLM extraction operation
    │   └── jsonExtractor.operation.ts       # JSON extraction operation
    └── helpers/
        ├── interfaces.ts                    # Interface definitions
        ├── utils.ts                         # Common utilities
        ├── apiClient.ts                     # API client
        └── formatters.ts                    # Formatting tools
credentials/
└── Crawl4aiApi.credentials.ts               # Credentials definition
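The split between `operations.ts`, `router.ts`, and the `*.operation.ts` files suggests a dispatch pattern in which the router picks a handler by operation name. The sketch below shows that general pattern with stub handlers; the operation names are taken from the file names above, but the types and function bodies are assumptions and do not reproduce the actual source.

```typescript
// Operation names mirror the *.operation.ts file names of the Basic Crawler node.
type OperationName = "crawlSingleUrl" | "crawlMultipleUrls" | "processRawHtml";
type OperationHandler = (params: Record<string, unknown>) => Promise<unknown>;

// Stub handlers; the real operations would call the Crawl4AI API client instead.
const handlers: Record<OperationName, OperationHandler> = {
  crawlSingleUrl: async (params) => ({ operation: "crawlSingleUrl", params }),
  crawlMultipleUrls: async (params) => ({ operation: "crawlMultipleUrls", params }),
  processRawHtml: async (params) => ({ operation: "processRawHtml", params }),
};

// Look up the handler for the selected operation and run it.
// Unknown names are rejected synchronously, before any work starts.
function route(operation: string, params: Record<string, unknown>): Promise<unknown> {
  if (!Object.prototype.hasOwnProperty.call(handlers, operation)) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  return handlers[operation as OperationName](params);
}
```

Keeping one handler per file and a single lookup table makes adding a new operation a matter of writing the handler and registering its name.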
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.