Package Information
Downloads: 48 weekly / 974 monthly
Latest Version: 1.1.7
Author: Centicore Technologies
Documentation
n8n-nodes-datagait
n8n community node for DataGait — scrape, extract, and crawl data from JavaScript-heavy websites with the world's fastest headless DOM engine.
npm | Documentation | Report Issue
Features
- Full JavaScript Rendering — Renders SPAs, React, Angular, Vue, and dynamic content before extraction
- Structured Data Extraction — JSON-LD, Open Graph, Twitter Cards, microdata in a single call
- Multi-Page Crawling — Crawl entire sites with intelligent link following and SSE streaming
- Proxy Support — Smart proxy routing with
X-Proxy-Mode(off / smart / always) - Blazing Fast — Powered by a Rust-based headless DOM engine, 10x faster than traditional headless browsers
- Clean Text Output — CSS-aware text extraction ideal for LLM and AI data pipelines
- Real-Time Credit Tracking — Per-page credit usage and remaining balance in crawl events
Installation
From n8n Community Nodes (recommended)
- Open your n8n instance
- Go to Settings → Community Nodes
- Click Install a community node
- Search for
n8n-nodes-datagait - Click Install
Manual Installation
cd ~/.n8n
npm install n8n-nodes-datagait
# Restart n8n
Setup
- Sign up at datagait.com
- Go to Dashboard → Settings → API Keys → Create API Key
- In n8n, add DataGait API credentials with your
dg_live_xxxtoken - Click Test to verify the connection
Actions (6)
Scraping Actions (GET /extract)
| Action | Description |
|---|---|
| Extract Page | Extract content from a webpage with full JavaScript rendering. Choose which fields to return: HTML, text, links, media, meta tags, and structured data. |
| Scrape Text | Extract clean, readable text content from a URL. Ideal for feeding into LLMs, AI agents, and NLP pipelines. |
| Scrape Links | Extract all links from a webpage with resolved absolute URLs and link count. |
| Scrape Metadata | Extract meta tags, Open Graph, Twitter Cards, and JSON-LD structured data from any page. |
Crawling Actions (POST /crawl — SSE streaming)
| Action | Description |
|---|---|
| Crawl Site | Crawl multiple pages from a site with intelligent extraction. Configure output fields (text, html, links, media, meta, structured_data), max pages (up to 100), and parallel workers. Results streamed via Server-Sent Events with real-time credit tracking. |
Account Actions
| Action | Description |
|---|---|
| Health Check | Check DataGait API availability and measure response latency. |
Output Fields
Extract Page
| Field | Description |
|---|---|
url |
The final URL after redirects |
title |
Page title |
html |
Full rendered HTML after JavaScript execution |
text |
CSS-aware innerText — clean, readable content |
links |
All <a> and <link> elements with resolved absolute URLs |
media |
Images, videos, and audio elements with src and alt |
meta |
All <meta> tags (name, property, content) |
structured_data |
JSON-LD, microdata, Open Graph, Twitter Cards |
timing_ms |
Extraction time in milliseconds |
proxy_used |
Whether a proxy was used for the request |
proxy_provider |
Proxy provider: datagait, byop, or none |
Crawl Site
| Field | Description |
|---|---|
url |
Starting URL |
pages |
Array of page events (each with url, page_num, text, title, links, credits_used, credits_remaining) |
page_count |
Number of pages crawled |
total_ms |
Total crawl time in milliseconds |
total_credits_used |
Total credits consumed |
credit_exhausted |
true if crawl stopped due to credit exhaustion |
Crawl SSE Events
| Event Type | Description |
|---|---|
started |
First event — confirms crawl started with max_pages and credits_available |
page |
One per page — includes success, content fields, credits_used, credits_remaining |
done |
Final event — total_pages, total_content, total_links, total_ms, total_credits_used |
credit_exhausted |
Only if credits run out mid-crawl — pages_completed, credits_used |
API Details
- Extract:
GET /extract?url=<url>&html=true&text=true&... - Crawl:
POST /crawlwith JSON body{ url, config: { max_pages, text, html, links, workers, ... } } - Auth:
X-API-Key: dg_live_xxxheader - Base URL:
https://ingest.datagait.com(configurable for self-hosted) - Proxy: Optional
X-Proxy-Modeheader (off/smart/always) - Crawl Timeout: Up to 300s — SSE stream stays open until
doneorcredit_exhausted
Why DataGait?
| Feature | DataGait | Firecrawl | Apify |
|---|---|---|---|
| JS Rendering | Rust-based headless DOM (fastest) | Chrome-based | Chrome-based |
| Latency | Sub-second for most pages | 2-10s typical | 5-30s typical |
| Structured Data | JSON-LD, OG, Twitter, microdata | Limited | Varies by actor |
| Crawl Streaming | Real-time SSE with per-page credits | Webhook/polling | Polling |
| Proxy Support | Built-in smart proxy routing | Separate config | Separate config |
| Pricing | 1 credit/page (10 via proxy) | Pay per extraction | Pay per compute |
Development
Build
cd integrations/n8n
npm install
npm run build
Test Locally with n8n
cd integrations/n8n
npm link
cd ~/.n8n/custom
npm link n8n-nodes-datagait
# Restart n8n — the DataGait node will appear in the editor
Publish to npm
# 1. Bump version in package.json
# 2. Build and publish
npm publish
Once published to npm with the n8n-community-node-package keyword, it will automatically appear in the n8n Community Nodes marketplace.
Example Workflows
See the examples/ directory for importable n8n workflow JSON files:
- price-monitoring.json — Monitor product prices on a schedule, alert on drops
- job-board-scraper.json — Scrape job listings and save to Airtable
Resources
License
MIT