h2oGPTe

h2oGPTe is an AI-powered search assistant for your internal teams to answer questions gleaned from large volumes of documents, websites and workplace content.

Overview

This node operation creates a job that adds files from the local file system to a specified collection in h2oGPTe. It is designed for batch ingestion of locally stored documents, so that large volumes of files can be organized into collections for AI-powered analysis, search, or processing.

Typical use cases include:

  • Automating the import of local document repositories into an AI-driven knowledge base.
  • Preparing datasets for retrieval-augmented generation (RAG) workflows.
  • Organizing files for internal team search assistants or document summarization tasks.

For example, a user might specify a root directory containing PDF reports and a glob pattern to match all .pdf files, then create an ingestion job that uploads these files into a collection for semantic search and question answering.
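Before creating a job, it can help to preview which files a root directory and glob pattern would select. The following is a minimal sketch using Python's standard library only. The function name `preview_matches` is hypothetical, and the ingestion backend's glob semantics are assumed (not confirmed) to resemble Python's, where `**` matches the root directory and all of its subdirectories.

```python
from pathlib import Path


def preview_matches(root_dir: str, pattern: str) -> list[str]:
    """Return paths, relative to root_dir, of regular files matching pattern.

    Illustrative helper only: it mimics the Root Dir + Glob pairing locally
    and does not call the h2oGPTe API.
    """
    root = Path(root_dir)
    # Path.glob treats "**" as "this directory and every subdirectory".
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()  # skip directories that happen to match
    )
```

Calling `preview_matches("/data/reports", "**/*.pdf")` on a hypothetical reports directory would list every PDF beneath it, which is a quick way to confirm the pattern before the job is created.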

Properties

  • Collection ID: String ID of the target collection where ingested documents will be added.
  • Root Dir: Path on the local file system where the node should look for files to ingest.
  • Glob: Glob pattern string used to match files within the root directory (e.g., **/*.pdf to match all PDFs recursively).
  • Additional Options: A set of optional parameters to customize ingestion behavior:
      - Audio Input Language: Language code for audio files; default is "auto" for automatic detection.
      - Chunk By Page: Boolean flag indicating whether each page of a document should be treated as a separate chunk. If true, the option to keep tables as one chunk is ignored.
      - Gen Doc Questions: Boolean flag to auto-generate sample questions for each document using a large language model (LLM).
      - Gen Doc Summaries: Boolean flag to auto-generate summaries for each document using an LLM.
      - Handwriting Check: Boolean flag to enable handwriting detection on pages, which triggers specialized models if handwriting is found.
      - Ingest Mode: Mode of ingestion: standard (default) for normal ingestion suitable for RAG, or agent_only, which bypasses standard ingestion.
      - Keep Tables As One Chunk: Boolean flag indicating whether tables detected by the table parser should be kept as a single chunk.
      - Ocr Model: Specifies the OCR method used to extract text from images; default is "auto".
      - Tesseract Lang: Language code to use when the OCR model is set to "tesseract".
      - Timeout: Timeout in seconds for the ingestion job; default is 0 (no timeout).
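The interactions between these options can be sketched as a small helper that assembles an options dictionary. This is an illustration under stated assumptions, not the node's actual implementation: the function name and the snake_case keys are hypothetical, and the real API parameter names may differ. It encodes three rules from the list above: Chunk By Page overrides Keep Tables As One Chunk, Tesseract Lang only applies when the OCR model is "tesseract", and Ingest Mode accepts only standard or agent_only.

```python
from typing import Optional


def build_ingest_options(
    chunk_by_page: bool = False,
    keep_tables_as_one_chunk: bool = False,
    ingest_mode: str = "standard",
    ocr_model: str = "auto",
    tesseract_lang: Optional[str] = None,
    timeout: int = 0,
) -> dict:
    """Assemble ingestion options (hypothetical key names)."""
    if ingest_mode not in ("standard", "agent_only"):
        raise ValueError(f"unknown ingest mode: {ingest_mode!r}")
    if timeout < 0:
        raise ValueError("timeout must be 0 (no timeout) or a positive number of seconds")

    options = {
        "chunk_by_page": chunk_by_page,
        "ingest_mode": ingest_mode,
        "ocr_model": ocr_model,
        "timeout": timeout,
    }
    # Keep Tables As One Chunk is ignored when chunking by page,
    # so only send it when page chunking is off.
    if not chunk_by_page:
        options["keep_tables_as_one_chunk"] = keep_tables_as_one_chunk
    # Tesseract Lang is only meaningful for the "tesseract" OCR model.
    if ocr_model == "tesseract" and tesseract_lang:
        options["tesseract_lang"] = tesseract_lang
    return options
```

Dropping the ignored keys, rather than sending them anyway, keeps the request consistent with what the backend would actually honor and makes misconfiguration easier to spot.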

Output

The node outputs the full HTTP response from the API call that creates the ingestion job. The main output field is json, which contains details about the created job, such as its unique identifier, status, and any metadata returned by the server.

If the ingestion job involves binary data (e.g., files), this node handles the upload via the API but does not output binary data itself. Instead, it returns job metadata for tracking progress.

Dependencies

  • Requires an API key credential configured in n8n to authenticate with the h2oGPTe API.
  • The node sends requests to the h2oGPTe instance's API endpoint, which must be reachable from the n8n environment.
  • Local file system access is required to read files matching the specified root directory and glob pattern.
  • Optional dependencies include AI models for handwriting detection, OCR, and LLM-based question/summary generation, which are managed by the backend service.

Troubleshooting

  • Invalid Collection ID: Ensure the collection ID exists and is accessible with your API credentials.
  • File Access Issues: Verify that the specified root directory and glob pattern correctly point to existing files and that n8n has permission to read them.
  • Timeouts: If the ingestion job times out, consider increasing the timeout value or checking network connectivity.
  • Unsupported File Types: Some file types may not be supported by the ingestion backend; check documentation for supported formats.
  • Handwriting/OCR Errors: Enabling handwriting check or OCR requires appropriate backend support; errors here may indicate missing models or misconfiguration.
  • API Authentication Failures: Confirm that the API key credential is valid and has necessary permissions.
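The file access checks above can be run as a quick preflight before creating a job. The sketch below is an assumption-laden illustration: `preflight` is a hypothetical helper, it uses only Python's standard library, and it reports on the machine and user account it runs under, so it is only meaningful when executed in the same environment that will read the files.

```python
import os
from pathlib import Path


def preflight(root_dir: str, pattern: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    root = Path(root_dir)
    if not root.is_dir():
        return [f"root directory not found: {root_dir}"]
    # Listing a directory needs both read and execute (search) permission.
    if not os.access(root, os.R_OK | os.X_OK):
        return [f"root directory not readable: {root_dir}"]

    files = [p for p in root.glob(pattern) if p.is_file()]
    if not files:
        return [f"pattern {pattern!r} matched no files under {root_dir}"]

    # Report each matched file the current user cannot read.
    return [f"file not readable: {p}" for p in files if not os.access(p, os.R_OK)]
```

An empty result does not guarantee that ingestion will succeed (unsupported file types and backend errors are outside its scope), but a non-empty one points directly at the directory, pattern, or permission that needs fixing.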

This summary covers the "Creates a Job to Add Files From the Local System Into a Collection" operation under the "Document Ingestion" resource, based on the provided source code and property definitions.
