bee2bee

n8n nodes to index GitHub repositories, extract metadata, and generate embeddings for RAG

Package Information

Downloads: 0 weekly / 28 monthly

Latest Version: 0.3.9

Author: Bee2Bee Team

Available Nodes

Bee2Bee Indexer

Index GitHub repositories and generate embeddings for RAG

Bee2Bee Metadata

Extract rich metadata from GitHub repositories

Documentation

n8n-nodes-bee2bee-indexer

This is an n8n community node that lets you index GitHub repositories and generate embeddings for RAG (Retrieval-Augmented Generation) systems.

Features

🐝 Multi-language support: Python, JavaScript, TypeScript, Rust, Go, Java, C, C++
🔍 Smart code parsing: Uses tree-sitter for accurate AST-based parsing
🧠 Dual embeddings: Generates both NLP and code-specific embeddings
⚡ Flexible output: Choose between full data, chunks only, or metadata only
🔐 Multiple providers: Local embeddings (free) or OpenAI (paid)
🎯 Customizable chunking: Function-level, class-level, or file-level strategies

Installation

Follow the installation guide in the n8n community nodes documentation.

Community Node Installation

1. Go to Settings > Community Nodes.
2. Select Install.
3. Enter n8n-nodes-bee2bee-indexer in the "Enter npm package name" field.
4. Agree to the risks of using community nodes.
5. Select Install.

Manual Installation

To get started locally, install the dependencies:

cd n8n-node
npm install

Build the node:

npm run build

Link it to your local n8n installation:

npm link

Then, from your n8n custom directory (~/.n8n/custom/), link the package:

npm link n8n-nodes-bee2bee-indexer

Credentials

This node uses the following credential fields (only the GitHub token and the provider choice are always needed):

GitHub Token: Personal Access Token for downloading repositories
OpenAI API Key (optional): Only needed if using OpenAI embeddings
Embedding Provider: Choose between local (free) or openai (paid)
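
The relationship between these fields can be sketched as a small check: the OpenAI key only matters when the provider is openai. This is an illustrative sketch, not the node's actual code; the interface and field names (`Bee2BeeCredentials`, `githubToken`, `openAiApiKey`, `embeddingProvider`) are assumptions made for the example.

```typescript
// Hypothetical shape of the credential fields listed above.
// The real field names inside the node may differ.
type EmbeddingProvider = "local" | "openai";

interface Bee2BeeCredentials {
  githubToken: string;
  embeddingProvider: EmbeddingProvider;
  openAiApiKey?: string;
}

// Returns a list of human-readable problems.
// An empty list means the combination of fields is consistent.
function validateCredentials(creds: Bee2BeeCredentials): string[] {
  const problems: string[] = [];
  if (creds.githubToken.trim() === "") {
    problems.push("GitHub Token is required");
  }
  if (creds.embeddingProvider === "openai" && !creds.openAiApiKey?.trim()) {
    problems.push("OpenAI API Key is required when the provider is openai");
  }
  return problems;
}

// Local embeddings need no OpenAI key, so this reports no problems.
const localSetup = validateCredentials({
  githubToken: "example-token",
  embeddingProvider: "local",
});

// Choosing openai without a key reports exactly one problem.
const openAiWithoutKey = validateCredentials({
  githubToken: "example-token",
  embeddingProvider: "openai",
});
```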

Operations

Index Repository

Downloads a GitHub repository and generates embeddings for all code files.

Parameters:

Repository Owner (required): GitHub username or organization
Repository Name (required): Repository name
Branch (required): Git branch to index (default: main)
Output Format:
- Full: Metadata + Chunks + Embeddings
- Chunks + Embeddings: Only code chunks with embeddings
- Chunks Only: Code chunks without embeddings
- Metadata Only: Repository statistics only

Additional Options:

Max Files: Limit number of files to process (0 = no limit)
File Extensions: Comma-separated list of extensions to include
Exclude Patterns: Directories to exclude (e.g., node_modules,dist)
Include Docstrings: Extract and include documentation
Chunk Strategy: function, class, or file level chunking
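
To make the interaction between Max Files, File Extensions, and Exclude Patterns concrete, here is a minimal sketch of how such options could be applied to a list of file paths. It is not the node's implementation: the `IndexOptions` type and the helper names are hypothetical, and the sketch assumes exclude patterns match whole directory names, whereas the real node may use a different matching rule.

```typescript
type ChunkStrategy = "function" | "class" | "file";

// Hypothetical typed view of the Additional Options listed above.
interface IndexOptions {
  maxFiles: number; // 0 = no limit
  fileExtensions: string[]; // normalized, e.g. [".py", ".ts"]
  excludePatterns: string[]; // directory names to skip
  includeDocstrings: boolean;
  chunkStrategy: ChunkStrategy;
}

// Split a comma-separated field into trimmed, non-empty entries.
function splitList(raw: string): string[] {
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Lowercase an extension and make sure it starts with a dot.
function normalizeExtension(ext: string): string {
  const lower = ext.toLowerCase();
  return lower.charAt(0) === "." ? lower : "." + lower;
}

// Apply exclusions first, then the extension filter, then the file limit.
function selectFiles(paths: string[], options: IndexOptions): string[] {
  const selected = paths.filter((path) => {
    const segments = path.split("/");
    const excluded = segments.some(
      (segment) => options.excludePatterns.indexOf(segment) !== -1,
    );
    if (excluded) return false;
    if (options.fileExtensions.length === 0) return true;
    const lowerPath = path.toLowerCase();
    return options.fileExtensions.some(
      (ext) => lowerPath.slice(-ext.length) === ext,
    );
  });
  return options.maxFiles > 0 ? selected.slice(0, options.maxFiles) : selected;
}

const exampleOptions: IndexOptions = {
  maxFiles: 2,
  fileExtensions: splitList("py, .TS").map(normalizeExtension),
  excludePatterns: splitList("node_modules,dist"),
  includeDocstrings: true,
  chunkStrategy: "function",
};

// selectedFiles is ["src/index.ts", "tools/run.py"]
const selectedFiles = selectFiles(
  ["src/index.ts", "node_modules/lib/a.ts", "README.md", "tools/run.py", "src/util.ts"],
  exampleOptions,
);
```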

Output

The node outputs a JSON object with the following structure (shown here for the Full output format):

{
  "success": true,
  "repository": {
    "owner": "facebook",
    "name": "react",
    "branch": "main",
    "fullName": "facebook/react"
  },
  "statistics": {
    "totalFiles": 150,
    "processedFiles": 145,
    "totalChunks": 1234,
    "languageBreakdown": {
      "javascript": 80,
      "typescript": 65
    }
  },
  "chunks": [
    {
      "id": "unique_id",
      "code": "function example() {...}",
      "metadata": {
        "file_path": "src/index.js",
        "language": "javascript",
        "chunk_type": "function",
        "name": "example",
        "lines": [10, 25]
      },
      "embeddings": {
        "nlp": [0.1, 0.2, ...],
        "code": [0.3, 0.4, ...]
      }
    }
  ]
}
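
When consuming this output in a downstream Code node or script, it can help to give it a type. The following sketch mirrors the JSON above; the type names are made up for illustration, `embeddings` is marked optional on the assumption that the Chunks Only format omits it, and reading `totalFiles - processedFiles` as "skipped" files is likewise an assumption.

```typescript
interface ChunkMetadata {
  file_path: string;
  language: string;
  chunk_type: string;
  name: string;
  lines: [number, number]; // [start, end]
}

interface Chunk {
  id: string;
  code: string;
  metadata: ChunkMetadata;
  embeddings?: { nlp: number[]; code: number[] };
}

interface IndexerOutput {
  success: boolean;
  repository: { owner: string; name: string; branch: string; fullName: string };
  statistics: {
    totalFiles: number;
    processedFiles: number;
    totalChunks: number;
    languageBreakdown: { [language: string]: number };
  };
  chunks: Chunk[];
}

// One-line summary suitable for a log message or notification.
function summarize(output: IndexerOutput): string {
  const { repository, statistics } = output;
  const skipped = statistics.totalFiles - statistics.processedFiles;
  return (
    `${repository.fullName}@${repository.branch}: ` +
    `${statistics.processedFiles}/${statistics.totalFiles} files ` +
    `(${skipped} skipped), ${statistics.totalChunks} chunks`
  );
}

// True when a chunk carries embeddings and can be sent to a vector store.
function hasEmbeddings(chunk: Chunk): boolean {
  return chunk.embeddings !== undefined;
}

// Sample mirroring the documented structure; the vectors are short
// placeholder values, not real embeddings.
const sampleOutput: IndexerOutput = {
  success: true,
  repository: { owner: "facebook", name: "react", branch: "main", fullName: "facebook/react" },
  statistics: {
    totalFiles: 150,
    processedFiles: 145,
    totalChunks: 1234,
    languageBreakdown: { javascript: 80, typescript: 65 },
  },
  chunks: [
    {
      id: "unique_id",
      code: "function example() {}",
      metadata: {
        file_path: "src/index.js",
        language: "javascript",
        chunk_type: "function",
        name: "example",
        lines: [10, 25],
      },
      embeddings: { nlp: [0.1, 0.2], code: [0.3, 0.4] },
    },
  ],
};

// "facebook/react@main: 145/150 files (5 skipped), 1234 chunks"
const summary = summarize(sampleOutput);
```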

Usage in n8n Workflows

Example: Index → Store in Vector DB

[Schedule Trigger] → [Bee2Bee Indexer] → [Pinecone] → [Webhook]

1. The Bee2Bee Indexer node processes the repository.
2. The output is sent to Pinecone (or ChromaDB/Qdrant/Weaviate).
3. The final webhook confirms that indexing is complete.
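
Between the indexer and the vector database, the chunks usually need to be reshaped into whatever record format the store expects and sent in batches. Below is a rough sketch of that step, for example inside an n8n Code node. The `VectorRecord` shape (`id`, `values`, `metadata`) is a generic assumption for illustration, not any particular store's API, so adapt it to the node or client you actually use.

```typescript
interface EmbeddedChunk {
  id: string;
  code: string;
  metadata: {
    file_path: string;
    language: string;
    chunk_type: string;
    name: string;
    lines: [number, number];
  };
  embeddings: { nlp: number[]; code: number[] };
}

// Generic record shape (assumed); rename fields to match your vector store.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: {
    file_path: string;
    language: string;
    name: string;
    start_line: number;
    end_line: number;
  };
}

// Pick one of the two embeddings per chunk and flatten the metadata.
function toVectorRecords(chunks: EmbeddedChunk[], kind: "nlp" | "code"): VectorRecord[] {
  return chunks.map((chunk) => ({
    id: `${chunk.id}:${kind}`,
    values: chunk.embeddings[kind],
    metadata: {
      file_path: chunk.metadata.file_path,
      language: chunk.metadata.language,
      name: chunk.metadata.name,
      start_line: chunk.metadata.lines[0],
      end_line: chunk.metadata.lines[1],
    },
  }));
}

// Split records into fixed-size batches so each upsert request stays small.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("Batch size must be at least 1");
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Five placeholder chunks with made-up two-dimensional vectors.
const sampleChunks: EmbeddedChunk[] = [];
for (let i = 0; i < 5; i++) {
  sampleChunks.push({
    id: `chunk_${i}`,
    code: `function example${i}() {}`,
    metadata: {
      file_path: "src/index.js",
      language: "javascript",
      chunk_type: "function",
      name: `example${i}`,
      lines: [i * 10, i * 10 + 5],
    },
    embeddings: { nlp: [0.1, 0.2], code: [0.3, 0.4] },
  });
}

// Five records in batches of two: batch sizes are 2, 2, and 1.
const recordBatches = toBatches(toVectorRecords(sampleChunks, "code"), 2);
```

Storing the NLP and code embeddings under distinct IDs (or in separate indexes) keeps the two vector spaces from being mixed in a single similarity search.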

Example: Search Flow

[Webhook] → [Pinecone Search] → [OpenAI] → [Response]

1. The user sends a search query via webhook.
2. Pinecone searches the indexed embeddings.
3. OpenAI uses the retrieved chunks as context.
4. The response is sent back with the answer.
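
The retrieval and prompt-building steps can be illustrated without any external service. The sketch below uses an in-memory cosine-similarity ranking as a stand-in for the Pinecone search (a real vector database does this at scale with its own index), then assembles the retrieved chunks into a prompt. All names here (`StoredChunk`, `topK`, `buildPrompt`) and the prompt wording are assumptions for the example.

```typescript
interface StoredChunk {
  id: string;
  code: string;
  filePath: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to the query vector and keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Join the retrieved chunks into a context block followed by the question.
function buildPrompt(question: string, retrieved: StoredChunk[]): string {
  const context = retrieved
    .map((chunk) => `// ${chunk.filePath}\n${chunk.code}`)
    .join("\n\n");
  return `Answer using only this code context:\n\n${context}\n\nQuestion: ${question}`;
}

// Tiny two-dimensional vectors, chosen by hand for the example.
const store: StoredChunk[] = [
  { id: "a", code: "function parse() {}", filePath: "src/a.js", vector: [1, 0] },
  { id: "b", code: "function render() {}", filePath: "src/b.js", vector: [0, 1] },
  { id: "c", code: "function parseAll() {}", filePath: "src/c.js", vector: [1, 1] },
];

// With query [1, 0], chunk "a" scores 1, "c" about 0.707, and "b" 0.
const retrieved = topK([1, 0], store, 2);
const prompt = buildPrompt("How is input parsed?", retrieved);
```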

Compatibility

Tested with n8n version 1.0.0+

License

MIT
