n8ntools-web-scraper

N8N Tools - Web Scraper: Extract data from websites with AI-powered content recognition and anti-bot detection

Package Information

Released: 8/22/2025
Downloads: 199 weekly / 6,328 monthly
Latest Version: 4.4.1
Author: paulobraga.bots

Documentation

N8N Tools - Web Scraper

License: MIT

Extract data from websites with AI-powered content recognition and anti-bot detection bypass. This N8N community node provides intelligent web scraping capabilities through the N8N Tools platform.

✨ Features

  • 🕷️ Smart Scraping: AI-powered content recognition and extraction
  • 🔄 Multiple Operations: Single page, multiple pages, and monitoring
  • 🎯 CSS Selectors: Flexible data extraction with attribute support
  • 🤖 JavaScript Support: Handle dynamic content and SPAs
  • 📸 Screenshots: Optional page screenshots for verification
  • 🛡️ Anti-Bot Protection: Built-in detection bypass mechanisms
  • 💰 Cost Tracking: Usage monitoring and budget controls

🚀 Quick Start

Installation

Install this node in your N8N instance:

Via Community Nodes (Recommended)

  1. Go to Settings > Community Nodes in your N8N interface
  2. Click Install a community node
  3. Enter n8n-nodes-n8ntools-web-scraper
  4. Click Install

Via npm

npm install n8n-nodes-n8ntools-web-scraper

Setup Credentials

  1. Sign up at N8N Tools and get your API key
  2. In N8N, create a new N8N Tools API credential
  3. Enter your API URL: https://api.n8ntools.io
  4. Enter your API key

📖 Usage

Supported Operations

| Operation | Description | Use Case |
|---|---|---|
| Scrape Single Page | Extract data from one webpage | Product details, contact info |
| Scrape Multiple Pages | Batch process multiple URLs | Catalog scraping, bulk data |
| Monitor Page Changes | Track website changes | Price monitoring, content updates |

Example Workflow

[Schedule Trigger] → [N8N Tools Web Scraper] → [Process Data] → [Database]

Configuration Example

E-commerce Product Scraping:

{
  "operation": "scrapePage",
  "url": "https://example-store.com/products/laptop",
  "selectors": [
    {
      "name": "title",
      "selector": "h1.product-title",
      "attribute": "text"
    },
    {
      "name": "price",
      "selector": ".price-current",
      "attribute": "text"
    },
    {
      "name": "images",
      "selector": ".product-gallery img",
      "attribute": "src",
      "multiple": true
    },
    {
      "name": "availability",
      "selector": ".stock-status",
      "attribute": "text"
    }
  ],
  "options": {
    "waitForSelector": ".price-current",
    "waitTime": 3,
    "screenshot": true
  }
}

⚙️ Node Parameters

URL Configuration

  • URL: Target webpage URL (for single page and monitoring)
  • URLs: Multiple URLs, one per line (for batch processing)
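
Before passing a block of text to the URLs field, it can help to clean it up in a Code node. The following is a minimal sketch in plain JavaScript; `parseUrlList` is a hypothetical helper, not part of this package, and restricting input to http/https is an assumption about what the scraper accepts.

```javascript
// Hypothetical helper: turn "one URL per line" text into a clean list.
function parseUrlList(text) {
  const urls = [];
  const invalid = [];
  for (const line of text.split(/\r?\n/)) {
    const candidate = line.trim();
    if (candidate === "") continue; // skip blank lines
    try {
      const parsed = new URL(candidate); // throws on malformed input
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        urls.push(parsed.href);
      } else {
        invalid.push(candidate); // assumption: only web URLs are useful here
      }
    } catch {
      invalid.push(candidate);
    }
  }
  return { urls, invalid };
}
```

Rejected lines are returned separately so a workflow can log them instead of spending credits on requests that cannot succeed.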

Selector Configuration

  • Name: Field name in the output
  • CSS Selector: CSS selector to target elements
  • Attribute: Element attribute to extract (text, href, src, title, etc.)
  • Multiple: Extract multiple elements (returns array)
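
Because each selector's Name becomes a field in the output, two selectors sharing a name would presumably collide. How the node resolves such a collision is not documented here, so a quick check before running may be worthwhile. A sketch, with `findDuplicateNames` as a hypothetical helper:

```javascript
// Hypothetical helper: report selector names that appear more than once.
function findDuplicateNames(selectors) {
  const counts = new Map();
  for (const { name } of selectors) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Keep only the names seen at least twice.
  return [...counts].filter(([, n]) => n > 1).map(([name]) => name);
}
```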

Advanced Options

  • Wait for Selector: CSS selector to wait for before scraping
  • Wait Time: Seconds to wait before extraction (default: 5)
  • User Agent: Custom user agent string
  • Enable JavaScript: Execute JavaScript on page (default: true)
  • Screenshot: Capture page screenshot (default: false)
  • Follow Redirects: Handle HTTP redirects (default: true)

📤 Output Data

Single Page Result

{
  "url": "https://example-store.com/products/laptop",
  "title": "Gaming Laptop Pro 15\"",
  "price": "$1,299.99",
  "images": [
    "https://example-store.com/img/laptop-1.jpg",
    "https://example-store.com/img/laptop-2.jpg"
  ],
  "availability": "In Stock",
  "success": true,
  "operation": "scrapePage",
  "creditsUsed": 1,
  "creditsRemaining": 99,
  "timestamp": "2024-01-15T10:30:00Z"
}
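
Extracted values such as `price` arrive as text, so a downstream step usually has to convert them before comparing or storing them. A minimal sketch, assuming US-style formatting like the `"$1,299.99"` shown above; `parsePrice` is a hypothetical helper and would need adjusting for locales that use a decimal comma.

```javascript
// Hypothetical helper: convert a scraped price string to a number.
function parsePrice(text) {
  // Keep digits, the decimal point and a minus sign; drop "$", "," and spaces.
  const cleaned = String(text).replace(/[^0-9.-]/g, "");
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value; // null when nothing numeric was found
}

parsePrice("$1,299.99"); // 1299.99
```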

Multiple Pages Result

Returns an array with one object per URL processed.
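
With batch results it is often useful to separate pages that scraped cleanly from those that did not. The sketch below assumes each object in the array carries the same `success` flag shown in the single page result; `partitionResults` is a hypothetical helper.

```javascript
// Hypothetical helper: split batch results by their success flag.
function partitionResults(results) {
  const succeeded = [];
  const failed = [];
  for (const result of results) {
    // Anything without an explicit success: true is treated as failed.
    (result.success === true ? succeeded : failed).push(result);
  }
  return { succeeded, failed };
}
```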

🔧 Selector Guide

Basic Selectors

// Text content
{ "selector": "h1", "attribute": "text" }

// Links
{ "selector": "a.product-link", "attribute": "href" }

// Images
{ "selector": "img.thumbnail", "attribute": "src" }

// Data attributes
{ "selector": "[data-price]", "attribute": "data-price" }

Advanced Selectors

// Multiple items
{
  "selector": ".product-item",
  "attribute": "text",
  "multiple": true
}

// Nested selection
{
  "selector": ".product-card .title",
  "attribute": "text"
}

// Attribute extraction
{
  "selector": "meta[property='og:image']",
  "attribute": "content"
}

🤖 JavaScript Support

Handle dynamic content and single-page applications:

{
  "options": {
    "enableJavaScript": true,
    "waitForSelector": ".dynamic-content",
    "waitTime": 5
  }
}

Perfect for:

  • React/Vue/Angular applications
  • AJAX-loaded content
  • Lazy-loaded images
  • Dynamic pricing

📸 Screenshot Feature

Capture page screenshots for verification:

{
  "options": {
    "screenshot": true
  }
}

Screenshots are returned as base64-encoded PNG images in the response.
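
To save or inspect a screenshot, the base64 string has to be decoded to bytes first. The sketch below does that in Node.js and checks for the standard 8-byte PNG signature. The name of the response field holding the screenshot is not documented here, so the function takes the string directly; `decodeScreenshot` is a hypothetical helper.

```javascript
// The fixed 8-byte signature that begins every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: decode a base64 screenshot and sanity-check it.
function decodeScreenshot(base64) {
  // Tolerate an optional data-URI prefix (an assumption, not documented).
  const raw = base64.replace(/^data:image\/png;base64,/, "");
  const bytes = Buffer.from(raw, "base64");
  const isPng = bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
  return { bytes, isPng };
}
```

If `isPng` is false, the string was probably truncated or is not a screenshot at all, and writing it to disk would produce an unreadable file.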

🛡️ Anti-Bot Features

Built-in techniques for getting past common anti-bot measures:

  • Rotating User Agents: Automatic user agent rotation
  • Request Delays: Human-like request timing
  • Header Spoofing: Realistic browser headers
  • Proxy Support: Optional proxy rotation (premium feature)

💸 Pricing & Limits

  • Single Page: 1 credit per page
  • Multiple Pages: 1 credit per URL
  • Page Monitoring: 1 credit per check
  • Screenshot: +0.5 credits when enabled
  • Rate Limit: Based on your N8N Tools subscription
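
A rough budget estimate can be worked out from these figures before running a large batch. The sketch assumes the +0.5 screenshot surcharge applies to every page scraped, which the list above does not state explicitly; `estimateCredits` is a hypothetical helper.

```javascript
// Hypothetical helper: estimate credits for a run.
// Assumption: 1 credit per page or check, plus 0.5 per page when screenshots are on.
function estimateCredits(pageCount, screenshot = false) {
  const perPage = screenshot ? 1.5 : 1;
  return pageCount * perPage;
}

estimateCredits(10, false); // 10
estimateCredits(10, true);  // 15
```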

🔄 Monitoring Workflows

Price Monitoring Example

[Cron Trigger: Daily] → [Web Scraper] → [Compare Previous] → [Send Alert]
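
The Compare Previous step could look something like the sketch below. It assumes the previous price has been stored somewhere by the workflow and that both values are already numbers; `comparePrice` is a hypothetical helper, not something this node provides.

```javascript
// Hypothetical helper: compare today's price with the stored one.
function comparePrice(previous, current) {
  // No stored value yet: nothing to compare against on the first run.
  if (previous === null || previous === undefined) {
    return { firstRun: true, changed: false, delta: 0, percent: null };
  }
  const delta = current - previous;
  // A percentage change is undefined when the previous price was zero.
  const percent = previous === 0 ? null : (delta / previous) * 100;
  return { firstRun: false, changed: delta !== 0, delta, percent };
}
```

The Send Alert step would then run only when `changed` is true.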

Content Change Detection

[Schedule: Hourly] → [Web Scraper] → [Hash Content] → [Detect Changes] → [Notify]
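
The Hash Content and Detect Changes steps can be sketched in Node.js with the built-in `crypto` module. Collapsing whitespace before hashing is an assumption made here to avoid false alarms from formatting-only changes; `hashContent` and `detectChange` are hypothetical helpers.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: stable SHA-256 fingerprint of scraped text.
function hashContent(text) {
  // Collapse runs of whitespace so layout-only changes do not alter the hash.
  const normalized = text.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Hypothetical helper: compare against the hash stored from the last run.
function detectChange(previousHash, text) {
  const currentHash = hashContent(text);
  return { changed: previousHash !== currentHash, currentHash };
}
```

Storing only the hash between runs keeps the workflow's state small, at the cost of not being able to show what changed.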

🛠️ Advanced Use Cases

Product Catalog Scraping

// Scrape product listings
{
  "operation": "scrapePage",
  "url": "https://store.com/category/laptops",
  "selectors": [
    {
      "name": "products",
      "selector": ".product-item a",
      "attribute": "href",
      "multiple": true
    }
  ]
}
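
The `href` values collected this way may be relative or repeated, so they usually need cleaning before being fed to Scrape Multiple Pages. A minimal sketch using the standard `URL` class; `normalizeLinks` is a hypothetical helper, and dropping fragments is an assumption about what counts as the same page.

```javascript
// Hypothetical helper: resolve scraped hrefs against the listing page and dedupe.
function normalizeLinks(hrefs, baseUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    try {
      const url = new URL(href, baseUrl); // resolves relative paths
      url.hash = ""; // "#reviews" and similar point at the same page
      seen.add(url.href);
    } catch {
      // Skip values that cannot be parsed as URLs.
    }
  }
  return [...seen];
}
```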

Lead Generation

// Extract contact information
{
  "selectors": [
    { "name": "email", "selector": "a[href^='mailto:']", "attribute": "href" },
    { "name": "phone", "selector": ".contact-phone", "attribute": "text" },
    { "name": "address", "selector": ".address", "attribute": "text" }
  ]
}
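
Note that the `email` selector returns the full `href`, including the `mailto:` scheme and any query string. A sketch of stripping that down to the address follows; `emailFromMailto` is a hypothetical helper.

```javascript
// Hypothetical helper: pull the address out of a mailto: link.
function emailFromMailto(href) {
  if (typeof href !== "string" || !href.toLowerCase().startsWith("mailto:")) {
    return null;
  }
  // Drop the scheme and anything after "?" (subject, body, ...).
  const address = href.slice("mailto:".length).split("?")[0];
  try {
    const decoded = decodeURIComponent(address).trim();
    return decoded === "" ? null : decoded;
  } catch {
    return null; // malformed percent-encoding
  }
}

emailFromMailto("mailto:sales@example.com?subject=Hi"); // "sales@example.com"
```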

🚨 Error Handling

Common errors and solutions:

// Timeout error
{
  "error": "Page load timeout",
  "success": false,
  "suggestion": "Increase waitTime or check URL accessibility"
}

// Selector not found
{
  "error": "Selector not found: .missing-element",
  "success": false,
  "suggestion": "Verify CSS selector or wait for dynamic content"
}
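
One way to act on these errors in a workflow is to retry timeouts with a longer wait and leave selector errors for manual review. The sketch below assumes the error text contains the word "timeout" as in the example above; the doubling strategy, the 30-second cap, and `planRetry` itself are illustrative choices, not behavior of this node.

```javascript
// Hypothetical helper: decide whether to retry a failed scrape.
function planRetry(result, options, maxWaitTime = 30) {
  if (result.success) return { retry: false, options };

  const message = String(result.error || "");
  const currentWait = options.waitTime ?? 5; // documented default wait time

  // Timeouts: retry with a doubled wait, up to the cap.
  if (/timeout/i.test(message) && currentWait < maxWaitTime) {
    const waitTime = Math.min(currentWait * 2, maxWaitTime);
    return { retry: true, options: { ...options, waitTime } };
  }

  // Anything else (such as a missing selector) needs a human to look at it.
  return { retry: false, options, reason: message };
}
```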

📋 Requirements

  • N8N version 0.174.0 or higher
  • N8N Tools account and API key
  • Node.js 18+ (for development)

📄 License

MIT License - see LICENSE file for details.


Part of the N8N Tools ecosystem.
