Actions
- File Actions
- Vector Actions
- Agentic RAG Actions
Overview
The Agentic RAG node with the Ingest Document operation is designed to process and ingest documents into a vector database for retrieval-augmented generation (RAG) workflows. It supports reading documents from file paths, extracting their textual content, generating embeddings for chunks of text, and storing these embeddings in a Supabase pgvector table. This enables efficient semantic search and question answering over ingested documents.
Typical use cases include:
- Ingesting PDFs, DOCX, or TXT files into a vector store to enable semantic search.
- Preparing document data for AI-powered question answering systems that rely on context retrieval.
- Automating document processing pipelines where documents are parsed, embedded, and indexed for later querying.
For example, a user can provide a PDF file path, and the node will parse the text, split it into chunks, generate vector embeddings for each chunk using a Hugging Face model, and upsert these vectors into a Supabase database. Later, these vectors can be searched to answer queries based on the document content.
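The chunk-splitting step described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name, chunk size, and overlap values are assumptions.

```javascript
// Hypothetical sketch of the text-chunking step: split extracted text into
// overlapping, fixed-size character windows before embedding.
// Chunk size and overlap defaults are illustrative, not the node's real values.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward by chunkSize minus overlap so adjacent chunks share context.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Overlapping chunks help preserve context that would otherwise be cut at chunk boundaries, at the cost of some duplicated text in the vector store.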
Properties
| Name | Meaning |
|---|---|
| Document Path | The full file system path to the document to ingest. Supported formats: PDF, DOCX, TXT. |
| OpenAI API Key | A required API key for OpenAI services used internally for query processing and evaluation. |
Output
The output JSON object includes:
- fileName: The base name of the ingested document file.
- chunksProcessed: Number of text chunks generated from the document.
- upsertStatus: Status message indicating the result of upserting vectors into the database.
- successCount: Number of chunks successfully inserted or updated in the vector store.
- errorCount: Number of chunks that failed during upsert.
- ingestionComplete: Boolean flag indicating successful completion of ingestion.
This output provides detailed feedback on the ingestion process, including how many chunks were processed and stored.
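A hypothetical output object for a successfully ingested PDF might look like this (field values are illustrative only):

```json
{
  "fileName": "report.pdf",
  "chunksProcessed": 42,
  "upsertStatus": "42 vectors upserted",
  "successCount": 42,
  "errorCount": 0,
  "ingestionComplete": true
}
```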
Dependencies
- Supabase: Used as the vector database backend with pgvector extension for storing embeddings.
- Hugging Face Inference API: For generating vector embeddings of text chunks using the "thenlper/gte-small" model.
- OpenAI API: Utilized for advanced query processing, answer generation, and evaluation within the RAG pipeline.
- File System Access: Reads local files (PDF, DOCX, TXT) for ingestion.
- Node.js libraries: Includes pdf-parse, mammoth, pdf2json, csv-parser, axios, and others for file parsing and HTTP requests.
The node requires credentials for Supabase (project URL and API key), Hugging Face API key, and an OpenAI API key.
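As a sketch of how embedded chunks might be shaped into rows for a Supabase pgvector table, something like the following could apply. The column names (content, embedding, metadata) and metadata fields are assumptions; the node's actual table schema may differ.

```javascript
// Hypothetical sketch: map text chunks and their embeddings into row objects
// suitable for upserting into a Supabase table with a pgvector column.
// Column names ("content", "embedding", "metadata") are assumed, not confirmed.
function buildUpsertRows(fileName, chunks, embeddings) {
  if (chunks.length !== embeddings.length) {
    throw new Error("Each chunk needs exactly one embedding");
  }
  return chunks.map((content, i) => ({
    content,
    embedding: embeddings[i], // e.g. a 384-dimensional vector from gte-small
    metadata: { fileName, chunkIndex: i },
  }));
}
```

With the Supabase JavaScript client, rows shaped like this could then be passed to a call such as `supabase.from('documents').upsert(rows)`; verify your own table name and schema before adopting this shape.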
Troubleshooting
- Unsupported File Type Error: If the document path points to a file type other than PDF, DOCX, or TXT, the node will throw an error. Ensure the file format is supported.
- Embedding Generation Failures: Errors during embedding generation may occur if the Hugging Face API key is invalid or rate-limited. Verify API key validity and usage limits.
- Upsert Vector Errors: Some chunks might fail to insert into Supabase due to malformed data or connectivity issues. Check network access and database permissions.
- OpenAI API Errors: Answer generation and evaluation steps depend on a valid OpenAI API key. Invalid keys or quota exhaustion will cause errors.
- File Path Issues: Ensure the provided document path is accessible by the n8n runtime environment and correctly specified.
Common error messages include:
- "Unsupported file type: .xyz" – use a supported file format.
- "Embedding error: ..." – check the Hugging Face API key and service status.
- "Answer generation error: ..." – validate the OpenAI API key and network connectivity.
- "Chunk processing error: ..." – inspect individual chunk data and database state.
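For transient failures such as rate-limited embedding requests, wrapping the failing call in a simple retry helper can reduce errors. This is an illustrative sketch; the node itself may or may not retry internally, and the delay schedule here is an arbitrary choice.

```javascript
// Hypothetical retry helper for transient failures (e.g. HTTP 429 responses
// from the Hugging Face Inference API). The backoff schedule is illustrative.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Persistent errors (invalid keys, unsupported file types) should not be retried; a wrapper like this only masks failures that resolve on their own.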
Links and References
- Supabase pgvector documentation
- Hugging Face Inference API
- OpenAI Chat Completion API
- pdf-parse npm package
- mammoth npm package (DOCX parser)