crawl4ai_naf

Advanced web crawling, data extraction, and interaction nodes for n8n with LLM capabilities

Package Information

Downloads: 0 weekly / 15 monthly
Latest Version: 1.4.0
Author: NAF

Documentation

Crawl4AI n8n Nodes

Installation

npm install n8n-nodes-crawl4ai_naf

Features

Main Crawl4ai Node

  • Basic Crawling: Simple web page crawling with markdown/HTML extraction
  • CSS Extraction: Extract structured data using CSS selectors
  • LLM Extraction: Use an LLM for complex data extraction
  • Batch Processing: Process multiple URLs concurrently
  • Anti-Detection: Undetected browser mode, stealth mode, CAPTCHA bypass

Crawl4ai Interaction Node

  • Element Interaction: Click buttons, fill forms, handle dropdowns
  • Authentication: Login form handling and session management
  • LLM Prompts: Automate interactions using natural language prompts
  • Multi-Step Workflows: Complex interaction sequences

Usage Examples

Basic Crawling

{
  "nodes": [
    {
      "parameters": {
        "operation": "basic_crawl",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true,
            "viewportWidth": 1920,
            "viewportHeight": 1080
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}
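
When the URL list comes from code instead of being typed by hand, the same node object can be generated. The following is a minimal sketch, not part of the package: the field names mirror the JSON above, while `CrawlNode` and `buildBasicCrawlNode` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape covering only the fields used in the JSON example above.
interface CrawlNode {
  parameters: {
    operation: string;
    urlConfig: { urls: { url: string }[] };
    browserConfig: {
      settings: { headless: boolean; viewportWidth: number; viewportHeight: number };
    };
  };
  name: string;
  type: string;
  typeVersion: number;
  position: [number, number];
}

// Hypothetical helper: builds a "basic_crawl" node for the given URLs.
function buildBasicCrawlNode(urls: string[], nodeName = "Crawl4ai"): CrawlNode {
  return {
    parameters: {
      operation: "basic_crawl",
      // Each URL is wrapped in an object, as in the example above.
      urlConfig: { urls: urls.map((url) => ({ url })) },
      browserConfig: {
        settings: { headless: true, viewportWidth: 1920, viewportHeight: 1080 },
      },
    },
    name: nodeName,
    type: "n8n-nodes-crawl4ai_naf.crawl4ai",
    typeVersion: 1,
    position: [250, 300],
  };
}

const basicWorkflow = { nodes: [buildBasicCrawlNode(["https://example.com"])] };
console.log(JSON.stringify(basicWorkflow, null, 2));
```

The printed JSON has the same structure as the example above and can be adapted before importing it into n8n.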

Advanced Crawling with Authentication

{
  "nodes": [
    {
      "parameters": {
        "operation": "css_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://protected.example.com/dashboard"
            },
            {
              "url": "https://protected.example.com/reports"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true,
            "viewportWidth": 1920,
            "viewportHeight": 1080
          }
        },
        "antiDetection": {
          "settings": {
            "undetected": true,
            "stealth": true,
            "captchaBypass": "2captcha"
          }
        },
        "authConfig": {
          "authSettings": {
            "enableAuth": true,
            "authType": "form",
            "username": "your_username",
            "password": "your_password",
            "loginUrl": "https://protected.example.com/login"
          }
        },
        "advancedConfig": {
          "advancedSettings": {
            "maxRetries": 3,
            "timeout": 30000,
            "concurrentRequests": 2,
            "debugMode": true
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}

LLM Extraction

{
  "nodes": [
    {
      "parameters": {
        "operation": "llm_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://complex-data.example.com"
            }
          ]
        },
        "browserConfig": {
          "settings": {
            "headless": true
          }
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}

LLM Prompt Interaction

{
  "nodes": [
    {
      "parameters": {
        "interactionType": "llm_prompt",
        "llmPromptConfig": {
          "promptSettings": {
            "promptText": "Find the login form, fill username with 'testuser' and password with 'testpass', then click the submit button",
            "provider": "openai/gpt-4",
            "maxTokens": 1000
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [250, 300]
    }
  ]
}

Element Click Interaction

{
  "nodes": [
    {
      "parameters": {
        "interactionType": "element_click",
        "elementConfig": {
          "clickSettings": {
            "selector": "#submit-button",
            "waitAfterClick": 2000
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [450, 300]
    }
  ]
}

Complete Workflow Example

{
  "nodes": [
    {
      "parameters": {
        "operation": "basic_crawl",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com/login"
            }
          ]
        }
      },
      "name": "Crawl4ai",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "interactionType": "authentication",
        "authConfig": {
          "authSettings": {
            "username": "user@example.com",
            "password": "password123",
            "loginUrl": "https://example.com/login"
          }
        }
      },
      "name": "Crawl4aiInteraction",
      "type": "n8n-nodes-crawl4ai_naf.crawl4aiInteraction",
      "typeVersion": 1,
      "position": [450, 300]
    },
    {
      "parameters": {
        "operation": "css_extraction",
        "urlConfig": {
          "urls": [
            {
              "url": "https://example.com/dashboard"
            }
          ]
        }
      },
      "name": "Crawl4ai2",
      "type": "n8n-nodes-crawl4ai_naf.crawl4ai",
      "typeVersion": 1,
      "position": [650, 300]
    }
  ],
  "connections": {
    "Crawl4ai": {
      "main": [
        [
          {
            "node": "Crawl4aiInteraction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Crawl4aiInteraction": {
      "main": [
        [
          {
            "node": "Crawl4ai2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
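
In the workflow above, each key under "connections" and each "node" value must match the "name" of an entry in "nodes"; a typo such as "Crawl4ai 2" would leave a step disconnected. The following is a small sketch of a pre-import check, assuming only the structure shown in the example. The `Workflow` type and `findBrokenConnections` function are hypothetical and are not provided by the package or by n8n.

```typescript
// Minimal shapes covering only the fields this check reads.
interface WorkflowNode { name: string }
interface ConnectionTarget { node: string; type: string; index: number }
interface Workflow {
  nodes: WorkflowNode[];
  connections: Record<string, { main: ConnectionTarget[][] }>;
}

// Returns a list of problems; an empty list means every connection resolves.
function findBrokenConnections(wf: Workflow): string[] {
  const names = new Set(wf.nodes.map((n) => n.name));
  const problems: string[] = [];
  for (const [source, outputs] of Object.entries(wf.connections)) {
    if (!names.has(source)) problems.push(`unknown source node: ${source}`);
    for (const branch of outputs.main) {
      for (const target of branch) {
        if (!names.has(target.node)) {
          problems.push(`${source} -> unknown target node: ${target.node}`);
        }
      }
    }
  }
  return problems;
}

// Node names and connections copied from the complete workflow example.
const exampleWorkflow: Workflow = {
  nodes: [{ name: "Crawl4ai" }, { name: "Crawl4aiInteraction" }, { name: "Crawl4ai2" }],
  connections: {
    Crawl4ai: { main: [[{ node: "Crawl4aiInteraction", type: "main", index: 0 }]] },
    Crawl4aiInteraction: { main: [[{ node: "Crawl4ai2", type: "main", index: 0 }]] },
  },
};
console.log(findBrokenConnections(exampleWorkflow)); // prints []
```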

Configuration

Browser Configuration

  • Headless Mode: Run browser in headless mode (default: true)
  • Viewport: Set browser viewport dimensions (default: 1920x1080)
  • User Agent: Custom user agent string
  • Proxy Support: Configure proxy settings

Anti-Detection Settings

  • Undetected Mode: Enable undetected browser mode
  • Stealth Mode: Enable stealth mode with fingerprint masking
  • CAPTCHA Bypass: Configure CAPTCHA bypass strategies (2Captcha, Anti-Captcha, Custom)
  • Behavioral Simulation: Simulate human-like interactions

Authentication Options

  • Basic Auth: Username/password authentication
  • Form Auth: Form-based authentication with login URL
  • OAuth2: OAuth2 token-based authentication
  • API Key: API key authentication
  • Session Cookie: Session cookie authentication

Advanced Configuration

  • Max Retries: Maximum number of retry attempts (default: 3)
  • Timeout: Request timeout in milliseconds (default: 30000)
  • Concurrent Requests: Number of concurrent requests (default: 5)
  • Debug Mode: Enable debug logging (default: false)
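
To illustrate how the defaults listed above combine with values set in a workflow, here is a minimal sketch. The property names follow the "advancedSettings" block in the authentication example; the `resolveAdvancedSettings` helper and its range checks are assumptions made for illustration, not the package's actual validation logic.

```typescript
interface AdvancedSettings {
  maxRetries: number;
  timeout: number; // milliseconds
  concurrentRequests: number;
  debugMode: boolean;
}

// Defaults as documented in the list above.
const ADVANCED_DEFAULTS: AdvancedSettings = {
  maxRetries: 3,
  timeout: 30000,
  concurrentRequests: 5,
  debugMode: false,
};

// Hypothetical helper: fills in defaults and rejects obviously invalid values.
// The specific limits checked here are assumptions, not documented behavior.
function resolveAdvancedSettings(overrides: Partial<AdvancedSettings> = {}): AdvancedSettings {
  const merged: AdvancedSettings = { ...ADVANCED_DEFAULTS, ...overrides };
  if (!Number.isInteger(merged.maxRetries) || merged.maxRetries < 0) {
    throw new RangeError("maxRetries must be a non-negative integer");
  }
  if (!(merged.timeout > 0)) {
    throw new RangeError("timeout must be a positive number of milliseconds");
  }
  if (!Number.isInteger(merged.concurrentRequests) || merged.concurrentRequests < 1) {
    throw new RangeError("concurrentRequests must be an integer of at least 1");
  }
  return merged;
}

// Only the overridden fields change; the rest keep their defaults.
console.log(resolveAdvancedSettings({ concurrentRequests: 2, debugMode: true }));
```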

Development

Prerequisites

  • Node.js 18+
  • npm 9+
  • TypeScript 5+

Building

npm install
npm run build

Testing

npm run test

Publishing

npm publish

Error Handling

Both nodes validate their input before running and report failures explicitly:

  • Input data validation
  • URL format validation
  • Configuration parameter validation
  • Authentication credential validation
  • Descriptive error messages with timestamps
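
As an illustration of what URL format validation with a timestamped error might look like, here is a minimal sketch built on the standard `URL` constructor. The `validateUrl` and `makeError` functions, the `ValidationError` shape, and the restriction to http and https are assumptions for this example; the package's real checks and error format may differ.

```typescript
interface ValidationError {
  message: string;
  timestamp: string; // ISO 8601
}

// Hypothetical helper: attaches the current time to an error message.
function makeError(message: string): ValidationError {
  return { message, timestamp: new Date().toISOString() };
}

// Hypothetical check: accepts only absolute http(s) URLs.
// Returns null when the URL is acceptable, otherwise a timestamped error.
function validateUrl(input: string): ValidationError | null {
  let parsed: URL;
  try {
    parsed = new URL(input); // throws on relative or malformed input
  } catch {
    return makeError(`Invalid URL: ${input}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return makeError(`Unsupported protocol "${parsed.protocol}" in ${input}`);
  }
  return null;
}

console.log(validateUrl("https://example.com")); // prints null
console.log(validateUrl("example.com")?.message); // prints "Invalid URL: example.com"
```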

Support

For issues, questions, or contributions, please contact: contact@nafer.ru

License

MIT
