Agentic RAG Supabase

Handle RAG operations with Supabase pgvector for PDF/TXT files

Overview

The Agentic RAG Supabase node provides operations for working with files, vectors, and agentic retrieval-augmented generation (RAG) workflows on a Supabase backend with pgvector support. This page covers the Extract Structured operation of the File resource, which extracts structured data from a supported file (PDF, TXT, or CSV) and returns it in a chosen output format: JSON, CSV, or Markdown table.

This operation is useful when you want to convert unstructured or semi-structured documents into structured tabular data for further processing, analysis, or integration. For example, it can extract tables or text blocks from PDF or CSV files into JSON arrays or Markdown tables that other nodes or systems can consume directly.

Practical examples:

  • Extracting tables from PDF reports into JSON for database ingestion.
  • Converting CSV files into markdown tables for documentation or display.
  • Parsing TXT files with delimited content into structured arrays.
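To make the last example concrete, here is a minimal sketch of turning delimited text into an array of rows. The candidate delimiters (tab, comma, pipe) come from the Troubleshooting section; the function names and the "most frequent in the first non-empty line" rule are assumptions for illustration, not the node's actual implementation.

```typescript
// Candidate delimiters mentioned in the Troubleshooting section.
const CANDIDATE_DELIMITERS = ["\t", ",", "|"];

// Hypothetical helper: choose the candidate that occurs most often
// in the first non-empty line, or null if none occurs at all.
function detectDelimiter(text: string): string | null {
  const firstLine = text.split(/\r?\n/).find((line) => line.trim() !== "");
  if (firstLine === undefined) return null;
  let best: string | null = null;
  let bestCount = 0;
  for (const candidate of CANDIDATE_DELIMITERS) {
    const count = firstLine.split(candidate).length - 1;
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }
  return best;
}

// Hypothetical helper: split each non-empty line into trimmed cells.
// With no detectable delimiter, every line becomes a single-cell row.
function parseDelimited(text: string): string[][] {
  const delimiter = detectDelimiter(text);
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (delimiter === null) return lines.map((line) => [line.trim()]);
  return lines.map((line) => line.split(delimiter).map((cell) => cell.trim()));
}

const exampleRows = parseDelimited("name|qty\nwidget|3\n");
console.log(exampleRows);
```

A heuristic like this fails quietly on mixed or unusual delimiters, which is consistent with the incorrect or empty output described under Troubleshooting.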

Properties

Name            Meaning
File Path       The path to the input file to extract structured data from. Supported types: PDF, TXT, CSV.
Output Format   The desired output format of the extracted structured data. Options: JSON, CSV, Markdown Table.
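As an illustration of the File Path constraint, the sketch below checks a path's extension against the three supported types before any parsing is attempted. The helper names, the case-insensitive comparison, and the error wording are assumptions; the node's real check may differ.

```typescript
// Extensions accepted by the Extract Structured operation, per the table above.
const SUPPORTED_EXTENSIONS = [".pdf", ".txt", ".csv"];

// Hypothetical helper: return the lowercased extension of the file's base name,
// or "" when there is none (a leading dot alone does not count as an extension).
function getExtension(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// Hypothetical helper: throw for unsupported types, otherwise return the extension.
function assertSupported(filePath: string): string {
  const ext = getExtension(filePath);
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    throw new Error(`Unsupported file type for structured extraction: "${ext || "(none)"}"`);
  }
  return ext;
}

console.log(assertSupported("/data/reports/q3.pdf"));
```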

Output

The output JSON object contains:

  • structuredData: The extracted structured content from the file. Its type depends on the selected output format:
    • If JSON is selected, this is an array of arrays representing rows and columns.
    • If CSV is selected, this is a string with comma-separated values.
    • If Markdown Table is selected, this is a string formatted as a markdown table.
  • format: The output format used (json, csv, or markdown).
  • fileName: The base name of the processed file.

No binary data output is produced by this operation.
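The output shape described above can be sketched as a TypeScript type together with formatters for the two string formats. The field names follow this section; the type names, the CSV quoting rules, and the choice to treat the first row as the Markdown header are assumptions made for illustration.

```typescript
type OutputFormat = "json" | "csv" | "markdown";

// Shape of the output object as described in this section.
interface ExtractResult {
  structuredData: string[][] | string;
  format: OutputFormat;
  fileName: string;
}

// Assumed CSV rules: quote cells containing commas, quotes, or line breaks.
function toCsv(rows: string[][]): string {
  const escapeCell = (cell: string): string =>
    /[",\r\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
  return rows.map((row) => row.map(escapeCell).join(",")).join("\n");
}

// Assumed Markdown layout: first row is the header, pipes inside cells are escaped.
function toMarkdownTable(rows: string[][]): string {
  const header = rows[0];
  if (header === undefined) return "";
  const body = rows.slice(1);
  const renderRow = (cells: string[]): string =>
    `| ${cells.map((cell) => cell.replace(/\|/g, "\\|")).join(" | ")} |`;
  const separator = `| ${header.map(() => "---").join(" | ")} |`;
  return [renderRow(header), separator].concat(body.map((row) => renderRow(row))).join("\n");
}

// Hypothetical helper: assemble the output object for a given format.
function buildResult(rows: string[][], format: OutputFormat, fileName: string): ExtractResult {
  const structuredData =
    format === "json" ? rows : format === "csv" ? toCsv(rows) : toMarkdownTable(rows);
  return { structuredData, format, fileName };
}

console.log(buildResult([["item", "qty"], ["widget", "3"]], "markdown", "orders.csv"));
```

Downstream nodes can branch on `format` to decide whether `structuredData` should be treated as an array of rows or as a string.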

Dependencies

  • Requires access to the local filesystem to read the specified file path.
  • Uses several npm packages internally for parsing:
    • pdf2json for PDF structure extraction.
    • csv-parser for CSV reading.
    • Node.js fs module for file reading.
  • No external API calls are made during structured extraction itself.
  • The node as a whole requires credentials for Supabase and Hugging Face, but the Extract Structured operation does not use them directly.

Troubleshooting

  • Unsupported file type error:
    If the file extension is not .pdf, .txt, or .csv, the node will throw an error indicating unsupported file type for structured extraction. Ensure your file is one of the supported formats.

  • File not found or inaccessible:
    If the file path is incorrect or the node does not have permission to read the file, it will fail. Verify the file path and permissions.

  • Malformed CSV or TXT delimiters:
    The node attempts to auto-detect delimiters for TXT files among tab, comma, and pipe characters. If the file uses unusual delimiters or inconsistent formatting, the output may be incorrect or empty.

  • PDF parsing issues:
    Complex PDFs with non-tabular layouts might produce unexpected results since the extraction relies on text positioning heuristics.
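To show why non-tabular PDFs can produce unexpected results, here is a minimal sketch of one common positioning heuristic: text fragments whose vertical positions are close together are treated as one row, and each row is then ordered left to right. The `TextItem` shape, the `groupIntoRows` name, and the tolerance value are hypothetical; this is not the node's code and not the pdf2json output format.

```typescript
// Hypothetical positioned text fragment; real parser output has a different shape.
interface TextItem {
  x: number;
  y: number;
  text: string;
}

// Group fragments into rows by vertical position, then order each row by x.
function groupIntoRows(items: TextItem[], yTolerance: number = 0.5): string[][] {
  const sorted = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const rows: { y: number; items: TextItem[] }[] = [];
  for (const item of sorted) {
    const last = rows[rows.length - 1];
    if (last !== undefined && Math.abs(item.y - last.y) <= yTolerance) {
      // Close enough vertically to the row's first fragment: same row.
      last.items.push(item);
    } else {
      rows.push({ y: item.y, items: [item] });
    }
  }
  return rows.map((row) =>
    row.items.sort((a, b) => a.x - b.x).map((item) => item.text)
  );
}

const grouped = groupIntoRows([
  { x: 5, y: 1.0, text: "Qty" },
  { x: 1, y: 1.1, text: "Item" },
  { x: 1, y: 3.0, text: "Widget" },
  { x: 5, y: 3.0, text: "3" },
]);
console.log(grouped);
```

Under a heuristic of this kind, multi-column prose, rotated text, or cells that wrap onto several lines end up in the wrong row or column, which matches the unexpected results described above.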
