
Agentic RAG Supabase

Handle RAG operations with Supabase pgvector for PDF, DOCX, and TXT files

Overview

The Agentic RAG node's Ingest Document operation processes documents and ingests them into a vector database for retrieval-augmented generation (RAG) workflows. It reads a document from a file path, extracts its textual content, generates embeddings for chunks of that text, and stores the embeddings in a Supabase pgvector table, enabling efficient semantic search and question answering over the ingested documents.

Typical use cases include:

  • Ingesting PDFs, DOCX, or TXT files into a vector store to enable semantic search.
  • Preparing document data for AI-powered question answering systems that rely on context retrieval.
  • Automating document processing pipelines where documents are parsed, embedded, and indexed for later querying.

For example, a user can provide a PDF file path, and the node will parse the text, split it into chunks, generate vector embeddings for each chunk using a Hugging Face model, and upsert these vectors into a Supabase database. Later, these vectors can be searched to answer queries based on the document content.
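The sketch below illustrates that flow in TypeScript. It is a minimal approximation, not the node's actual source: the chunk size, the table name (document_vectors), and the helper names are assumptions, and the Hugging Face feature-extraction endpoint is called directly with axios.

```typescript
import axios from "axios";
import { createClient } from "@supabase/supabase-js";

// Credentials come from the environment in this sketch; the node itself
// reads them from its configured n8n credentials.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);
const HF_URL =
  "https://api-inference.huggingface.co/pipeline/feature-extraction/thenlper/gte-small";

// Split extracted text into fixed-size chunks with a small overlap
// (the sizes here are illustrative, not the node's actual settings).
function chunkText(text: string, size = 1000, overlap = 100): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Embed one chunk with the gte-small model via the HF Inference API.
async function embedChunk(chunk: string): Promise<number[]> {
  const res = await axios.post(
    HF_URL,
    { inputs: chunk },
    { headers: { Authorization: `Bearer ${process.env.HF_API_KEY}` } },
  );
  return res.data as number[];
}

// Embed every chunk and upsert it into a pgvector-backed table
// (the table name "document_vectors" is an assumption).
async function ingest(fileName: string, text: string): Promise<void> {
  const chunks = chunkText(text);
  for (const [i, chunk] of chunks.entries()) {
    const embedding = await embedChunk(chunk);
    const { error } = await supabase
      .from("document_vectors")
      .upsert({ id: `${fileName}-${i}`, content: chunk, embedding });
    if (error) throw error;
  }
}
```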

Properties

  • Document Path: The full file system path to the document to ingest. Supported formats: PDF, DOCX, TXT.
  • OpenAI API Key: A required API key for OpenAI services, used internally for query processing and evaluation.

Output

The output JSON object includes:

  • fileName: The base name of the ingested document file.
  • chunksProcessed: Number of text chunks generated from the document.
  • upsertStatus: Status message indicating the result of upserting vectors into the database.
  • successCount: Number of chunks successfully inserted/updated in the vector store.
  • errorCount: Number of chunks that failed during upsert.
  • ingestionComplete: Boolean flag indicating successful completion of ingestion.

This output provides detailed feedback on the ingestion process, including how many chunks were processed and stored.
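As a quick reference, the output shape can be written as a TypeScript interface. The types are inferred from the field descriptions above, and the example values (including the exact wording of upsertStatus) are made up:

```typescript
interface IngestDocumentOutput {
  fileName: string;           // base name of the ingested file
  chunksProcessed: number;    // chunks generated from the document
  upsertStatus: string;       // summary message for the upsert result
  successCount: number;       // chunks inserted/updated successfully
  errorCount: number;         // chunks that failed during upsert
  ingestionComplete: boolean; // true when ingestion finished successfully
}

// Hypothetical example item:
const example: IngestDocumentOutput = {
  fileName: "handbook.pdf",
  chunksProcessed: 42,
  upsertStatus: "42/42 vectors upserted",
  successCount: 42,
  errorCount: 0,
  ingestionComplete: true,
};
```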

Dependencies

  • Supabase: Used as the vector database backend with pgvector extension for storing embeddings.
  • Hugging Face Inference API: For generating vector embeddings of text chunks using the "thenlper/gte-small" model.
  • OpenAI API: Utilized for advanced query processing, answer generation, and evaluation within the RAG pipeline.
  • File System Access: Reads local files (PDF, DOCX, TXT) for ingestion.
  • Node.js libraries: Includes pdf-parse, mammoth, pdf2json, csv-parser, axios, and others for file parsing and HTTP requests.

The node requires Supabase credentials (project URL and API key), a Hugging Face API key, and an OpenAI API key.
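Once credentials are in place, the stored vectors can be queried for semantic search. The sketch below assumes a Postgres similarity function named match_documents, as in the common Supabase pgvector setup; that function name and its parameters are assumptions, not part of this node.

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Retrieve the top-K chunks most similar to a query embedding via an RPC
// to a pgvector similarity function (assumed to be named match_documents).
async function searchChunks(queryEmbedding: number[], topK = 5) {
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: queryEmbedding,
    match_count: topK,
  });
  if (error) throw error;
  return data; // e.g. rows of { id, content, similarity }
}
```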

Troubleshooting

  • Unsupported File Type Error: If the document path points to a file type other than PDF, DOCX, or TXT, the node will throw an error. Ensure the file format is supported.
  • Embedding Generation Failures: Errors during embedding generation may occur if the Hugging Face API key is invalid or rate-limited. Verify the API key and your usage limits; a retry sketch for transient rate-limit errors follows this list.
  • Upsert Vector Errors: Some chunks might fail to insert into Supabase due to malformed data or connectivity issues. Check network access and database permissions.
  • OpenAI API Errors: Answer generation and evaluation steps depend on a valid OpenAI API key. Invalid keys or quota exhaustion will cause errors.
  • File Path Issues: Ensure the provided document path is accessible by the n8n runtime environment and correctly specified.
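For transient Hugging Face rate limits (HTTP 429) or server errors, wrapping the embedding call in a retry with exponential backoff usually helps. This wrapper is a generic sketch, not part of the node; the attempt count and delays are illustrative, and embedChunk refers to the earlier ingestion sketch:

```typescript
import { AxiosError } from "axios";

// Retry a request on 429/5xx responses with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as AxiosError).response?.status;
      const retryable = status === 429 || (status !== undefined && status >= 500);
      if (!retryable || i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // 1s, 2s, 4s...
    }
  }
  throw new Error("unreachable"); // the loop always returns or throws
}

// Usage: const embedding = await withRetry(() => embedChunk(chunk));
```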

Common error messages include:

  • "Unsupported file type: .xyz" — Use a supported file format.
  • "Embedding error: ..." — Check Hugging Face API key and service status.
  • "Answer generation error: ..." — Validate OpenAI API key and network connectivity.
  • "Chunk processing error: ..." — Inspect individual chunk data and database state.
