n8ntools-web-scraper

N8N Tools - Web Scraper: Extract data from websites with AI-powered content recognition and anti-bot detection bypass

Package Information

Downloads: 199 weekly / 6,328 monthly

Latest Version: 4.4.1

Author: N8N Tools

Available Nodes

N8N Tools - Web Scraper

Scrape data from websites using the N8N Tools platform

Documentation

N8N Tools - Web Scraper

Extract data from websites with AI-powered content recognition and anti-bot detection bypass. This N8N community node provides intelligent web scraping capabilities through the N8N Tools platform.

✨ Features

🕷️ Smart Scraping: AI-powered content recognition and extraction
🔄 Multiple Operations: Single page, multiple pages, and monitoring
🎯 CSS Selectors: Flexible data extraction with attribute support
🤖 JavaScript Support: Handle dynamic content and SPAs
📸 Screenshots: Optional page screenshots for verification
🛡️ Anti-Bot Protection: Built-in detection bypass mechanisms
💰 Cost Tracking: Usage monitoring and budget controls

🚀 Quick Start

Installation

Install this node in your N8N instance:

Via Community Nodes (Recommended)

Go to Settings > Community Nodes in your N8N interface
Click Install a community node
Enter n8n-nodes-n8ntools-web-scraper
Click Install

Via npm

npm install n8n-nodes-n8ntools-web-scraper

Setup Credentials

Sign up at N8N Tools and get your API key
In N8N, create new N8N Tools API credentials
Enter your API URL: https://api.n8ntools.io
Enter your API key

📖 Usage

Supported Operations

Operation	Description	Use Case
Scrape Single Page	Extract data from one webpage	Product details, contact info
Scrape Multiple Pages	Batch process multiple URLs	Catalog scraping, bulk data
Monitor Page Changes	Track website changes	Price monitoring, content updates

Example Workflow

[Schedule Trigger] → [N8N Tools Web Scraper] → [Process Data] → [Database]

Configuration Example

E-commerce Product Scraping:

{
  "operation": "scrapePage",
  "url": "https://example-store.com/products/laptop",
  "selectors": [
    {
      "name": "title",
      "selector": "h1.product-title",
      "attribute": "text"
    },
    {
      "name": "price",
      "selector": ".price-current",
      "attribute": "text"
    },
    {
      "name": "images",
      "selector": ".product-gallery img",
      "attribute": "src",
      "multiple": true
    },
    {
      "name": "availability",
      "selector": ".stock-status",
      "attribute": "text"
    }
  ],
  "options": {
    "waitForSelector": ".price-current",
    "waitTime": 3,
    "screenshot": true
  }
}
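The shape of this configuration can be written down as TypeScript types, which helps catch typos before a workflow runs. The sketch below is illustrative only: the field names are copied from the example above, but the types and the `validateConfig` helper are assumptions, not part of the published node.

```typescript
// Field names mirror the example configuration; the types are assumptions.
interface SelectorConfig {
  name: string;
  selector: string;
  attribute: string; // e.g. "text", "href", "src"
  multiple?: boolean;
}

interface ScrapeOptions {
  waitForSelector?: string;
  waitTime?: number;
  screenshot?: boolean;
}

interface ScrapePageConfig {
  operation: "scrapePage";
  url: string;
  selectors: SelectorConfig[];
  options?: ScrapeOptions;
}

// Hypothetical helper: returns a list of problems found in a configuration.
function validateConfig(config: ScrapePageConfig): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(config.url)) {
    problems.push(`url must start with http:// or https://: ${config.url}`);
  }
  const seen = new Set<string>();
  for (const s of config.selectors) {
    if (s.selector.trim() === "") {
      problems.push(`selector "${s.name}" has an empty CSS selector`);
    }
    if (seen.has(s.name)) {
      problems.push(`duplicate selector name: ${s.name}`);
    }
    seen.add(s.name);
  }
  return problems;
}
```

An empty array from `validateConfig` means none of these basic checks failed; it does not guarantee that the selectors match anything on the target page.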

⚙️ Node Parameters

URL Configuration

URL: Target webpage URL (for single page and monitoring)
URLs: Multiple URLs, one per line (for batch processing)
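When the URL list is built by an earlier workflow step, it can help to normalize it before passing it to the URLs field. A minimal sketch, assuming the one-URL-per-line format described above; `parseUrlList` is a hypothetical helper name, and the http(s)-only check is an assumption about what the node accepts.

```typescript
// Hypothetical helper: turns newline-separated text into a clean URL list.
function parseUrlList(text: string): string[] {
  const seen = new Set<string>();
  const urls: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const url = line.trim();
    // Skip blank lines and exact duplicates.
    if (url === "" || seen.has(url)) continue;
    if (!/^https?:\/\//i.test(url)) {
      throw new Error(`Not an http(s) URL: ${url}`);
    }
    seen.add(url);
    urls.push(url);
  }
  return urls;
}

const urls = parseUrlList("https://example-store.com/a\n\nhttps://example-store.com/b");
```

Deduplicating up front also avoids spending a credit twice on the same page.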

Selector Configuration

Name: Field name in the output
CSS Selector: CSS selector to target elements
Attribute: Element attribute to extract (text, href, src, title, etc.)
Multiple: Extract multiple elements (returns array)

Advanced Options

Wait for Selector: CSS selector to wait for before scraping
Wait Time: Seconds to wait before extraction (default: 5)
User Agent: Custom user agent string
Enable JavaScript: Execute JavaScript on page (default: true)
Screenshot: Capture page screenshot (default: false)
Follow Redirects: Handle HTTP redirects (default: true)
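The documented defaults can be captured in one place so that a workflow only specifies what it changes. This is a sketch under stated assumptions: `waitForSelector`, `waitTime`, `enableJavaScript`, and `screenshot` appear in the examples in this README, while the `userAgent` and `followRedirects` key names are guesses based on the parameter labels above.

```typescript
interface AdvancedOptions {
  waitForSelector?: string;
  waitTime: number;          // seconds
  userAgent?: string;        // assumed key name
  enableJavaScript: boolean;
  screenshot: boolean;
  followRedirects: boolean;  // assumed key name
}

// Defaults as listed under "Advanced Options".
const DEFAULT_OPTIONS: AdvancedOptions = {
  waitTime: 5,
  enableJavaScript: true,
  screenshot: false,
  followRedirects: true,
};

// Hypothetical helper: overrides win over defaults.
function withDefaults(overrides: Partial<AdvancedOptions> = {}): AdvancedOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const options = withDefaults({ waitTime: 3, screenshot: true });
```

Here `options` keeps JavaScript execution and redirect following enabled while shortening the wait and turning screenshots on.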

📤 Output Data

Single Page Result

{
  "url": "https://example-store.com/products/laptop",
  "title": "Gaming Laptop Pro 15\"",
  "price": "$1,299.99",
  "images": [
    "https://example-store.com/img/laptop-1.jpg",
    "https://example-store.com/img/laptop-2.jpg"
  ],
  "availability": "In Stock",
  "success": true,
  "operation": "scrapePage",
  "creditsUsed": 1,
  "creditsRemaining": 99,
  "timestamp": "2024-01-15T10:30:00Z"
}
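Extracted values such as `price` arrive as text, so a downstream "Process Data" step usually has to convert them. A minimal sketch, assuming US-style formatting like the `"$1,299.99"` shown above; `parsePrice` is a hypothetical helper, and prices written with a decimal comma would need different handling.

```typescript
// Hypothetical helper: converts a US-style price string to a number.
function parsePrice(raw: string): number {
  // Drop currency symbols, thousands separators, and whitespace.
  const cleaned = raw.replace(/[^0-9.\-]/g, "");
  const value = Number(cleaned);
  if (cleaned === "" || Number.isNaN(value)) {
    throw new Error(`Cannot parse price: ${raw}`);
  }
  return value;
}

const price = parsePrice("$1,299.99"); // 1299.99
```

Throwing on unparseable text (for example "Call for price") makes bad data visible instead of silently storing `NaN`.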

Multiple Pages Result

Returns an array with one object per URL processed.
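With batch results it is often useful to separate pages that scraped cleanly from those that did not. The sketch below assumes each array element carries the same `url` and `success` fields as the single page result, and an `error` field on failure as in the error examples in this README; `partitionResults` is a hypothetical helper.

```typescript
// Assumed minimal shape of one element in the batch result.
interface PageResult {
  url: string;
  success: boolean;
  error?: string;
}

// Hypothetical helper: splits results into successes and failures.
function partitionResults<T extends PageResult>(results: T[]): { ok: T[]; failed: T[] } {
  const ok: T[] = [];
  const failed: T[] = [];
  for (const result of results) {
    (result.success ? ok : failed).push(result);
  }
  return { ok, failed };
}
```

The `failed` list can then be routed to a retry branch or a notification, while `ok` continues to storage.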

🔧 Selector Guide

Basic Selectors

// Text content
{ "selector": "h1", "attribute": "text" }

// Links
{ "selector": "a.product-link", "attribute": "href" }

// Images
{ "selector": "img.thumbnail", "attribute": "src" }

// Data attributes
{ "selector": "[data-price]", "attribute": "data-price" }

Advanced Selectors

// Multiple items
{
  "selector": ".product-item",
  "attribute": "text",
  "multiple": true
}

// Nested selection
{
  "selector": ".product-card .title",
  "attribute": "text"
}

// Attribute extraction
{
  "selector": "meta[property='og:image']",
  "attribute": "content"
}

🤖 JavaScript Support

Handle dynamic content and single-page applications:

{
  "options": {
    "enableJavaScript": true,
    "waitForSelector": ".dynamic-content",
    "waitTime": 5
  }
}

Perfect for:

React/Vue/Angular applications
AJAX-loaded content
Lazy-loaded images
Dynamic pricing

📸 Screenshot Feature

Capture page screenshots for verification:

{
  "options": {
    "screenshot": true
  }
}

Screenshots are returned as base64-encoded PNG images in the response.
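To display or attach a screenshot, the base64 string can be wrapped in a data URI. The sketch below is an illustration, not the node's API: this README does not name the response field that holds the image, so the helper simply takes the string. The prefix check relies on the fact that every PNG file starts with the same signature bytes, which always encode to `iVBORw0KGgo` in base64.

```typescript
// Base64 encoding of the fixed PNG file signature.
const PNG_BASE64_PREFIX = "iVBORw0KGgo";

// Hypothetical helper: wraps a base64 PNG in a data URI after a sanity check.
function toPngDataUri(base64: string): string {
  const trimmed = base64.trim();
  if (!trimmed.startsWith(PNG_BASE64_PREFIX)) {
    throw new Error("Value does not look like a base64-encoded PNG");
  }
  return `data:image/png;base64,${trimmed}`;
}
```

The resulting string can be used directly as an image source in an HTML email or report.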

🛡️ Anti-Bot Features

Built-in measures for getting past common anti-bot defenses:

Rotating User Agents: Automatic user agent rotation
Request Delays: Human-like request timing
Header Spoofing: Realistic browser headers
Proxy Support: Optional proxy rotation (premium feature)

💸 Pricing & Limits

Single Page: 1 credit per page
Multiple Pages: 1 credit per URL
Page Monitoring: 1 credit per check
Screenshot: +0.5 credits when enabled
Rate Limit: Based on your N8N Tools subscription
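The credit rules above make it possible to estimate the cost of a run before starting it. A rough sketch, assuming the 0.5-credit screenshot surcharge applies to each page; the README does not say whether it is charged per page or per request, and `estimateCredits` is a hypothetical helper.

```typescript
// Hypothetical helper: estimates credits for a batch of pages or checks.
function estimateCredits(pageCount: number, screenshot = false): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError(`pageCount must be a non-negative integer: ${pageCount}`);
  }
  // 1 credit per page, plus 0.5 when screenshots are enabled (assumed per page).
  const perPage = 1 + (screenshot ? 0.5 : 0);
  return pageCount * perPage;
}

const withoutShots = estimateCredits(10);     // 10
const withShots = estimateCredits(10, true);  // 15
```

Comparing the estimate with `creditsRemaining` from a previous result gives a simple budget check before a large batch.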

🔄 Monitoring Workflows

Price Monitoring Example

[Cron Trigger: Daily] → [Web Scraper] → [Compare Previous] → [Send Alert]

Content Change Detection

[Schedule: Hourly] → [Web Scraper] → [Hash Content] → [Detect Changes] → [Notify]
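The "Hash Content" and "Detect Changes" steps could look roughly like the sketch below. It is not the node's own monitoring logic: it uses a small FNV-1a-style hash over UTF-16 code units, chosen here only because it needs no imports, and `detectChange` is a hypothetical helper. A cryptographic hash such as SHA-256 would be the safer choice where collisions matter.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: compares the new content hash with the stored one.
function detectChange(
  previousHash: string | undefined,
  content: string,
): { changed: boolean; hash: string } {
  const hash = fnv1a32(content);
  // The first run has nothing to compare against, so it reports no change.
  return { changed: previousHash !== undefined && previousHash !== hash, hash };
}
```

Each run stores the returned `hash` (for example in workflow static data or a database) and passes it back in as `previousHash` on the next run.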

🛠️ Advanced Use Cases

Product Catalog Scraping

// Scrape product listings
{
  "operation": "scrapePage",
  "url": "https://store.com/category/laptops",
  "selectors": [
    {
      "name": "products",
      "selector": ".product-item a",
      "attribute": "href",
      "multiple": true
    }
  ]
}

Lead Generation

// Extract contact information
{
  "selectors": [
    { "name": "email", "selector": "a[href^='mailto:']", "attribute": "href" },
    { "name": "phone", "selector": ".contact-phone", "attribute": "text" },
    { "name": "address", "selector": ".address", "attribute": "text" }
  ]
}
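Because the `email` selector reads the `href` attribute, the extracted value still carries the `mailto:` scheme and possibly a query string. A small cleanup sketch; `emailFromMailto` is a hypothetical helper, not something the node provides.

```typescript
// Hypothetical helper: extracts the address part of a mailto: link.
function emailFromMailto(href: string): string | null {
  // Capture everything after "mailto:" up to an optional "?subject=..." part.
  const match = /^mailto:([^?]*)/i.exec(href.trim());
  if (match === null) return null;
  const address = decodeURIComponent(match[1] ?? "").trim();
  return address === "" ? null : address;
}

const email = emailFromMailto("mailto:sales@example.com?subject=Hello");
```

Returning `null` for anything that is not a `mailto:` link lets the caller drop rows without contact details.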

🚨 Error Handling

Common errors and solutions:

// Timeout error
{
  "error": "Page load timeout",
  "success": false,
  "suggestion": "Increase waitTime or check URL accessibility"
}

// Selector not found
{
  "error": "Selector not found: .missing-element",
  "success": false,
  "suggestion": "Verify CSS selector or wait for dynamic content"
}
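A workflow can branch on these error objects instead of failing outright. The sketch below is one possible policy, not documented behavior: it retries timeouts with a doubled wait time up to a cap and gives up on everything else. The `planRecovery` helper, the 30-second cap, and matching on the word "timeout" are all assumptions.

```typescript
// Shape taken from the error examples above.
interface ScrapeFailure {
  success: false;
  error: string;
  suggestion?: string;
}

type RecoveryAction =
  | { kind: "retry"; waitTime: number }
  | { kind: "fail"; reason: string };

// Hypothetical helper: decides what to do after a failed scrape.
function planRecovery(
  failure: ScrapeFailure,
  currentWaitTime: number,
  maxWaitTime = 30,
): RecoveryAction {
  const isTimeout = /timeout/i.test(failure.error);
  if (isTimeout && currentWaitTime < maxWaitTime) {
    // Double the wait (at least 2 seconds), but never exceed the cap.
    const next = Math.min(Math.max(currentWaitTime, 1) * 2, maxWaitTime);
    return { kind: "retry", waitTime: next };
  }
  return { kind: "fail", reason: failure.suggestion ?? failure.error };
}
```

Selector errors are not retried here because waiting longer only helps when the element is loaded dynamically; in that case setting Wait for Selector is usually the better fix.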

🔗 Related Packages

PDF Generator: Generate reports from scraped data
Document Processor: Process downloaded documents

📋 Requirements

N8N version 0.174.0 or higher
N8N Tools account and API key
Node.js 18+ (for development)

🆘 Support

📧 Email: support@n8ntools.io
📖 Documentation: docs.n8ntools.io
💬 Community: Discord
🐛 Issues: GitHub

📄 License

MIT License - see LICENSE file for details.

Part of the N8N Tools ecosystem