h2oGPTe

h2oGPTe is an AI-powered search assistant for your internal teams to answer questions gleaned from large volumes of documents, websites and workplace content.

Overview

This node operation creates a job that adds files from the local file system to a specified collection in h2oGPTe. It is designed for batch ingestion of locally stored documents, so that large numbers of files can be added to collections for AI-powered analysis, search, or processing.

Typical use cases include:

Automating the import of local document repositories into an AI-driven knowledge base.
Preparing datasets for retrieval-augmented generation (RAG) workflows.
Organizing files for internal team search assistants or document summarization tasks.

For example, a user might specify a root directory containing PDF reports and a glob pattern to match all .pdf files, then create an ingestion job that uploads these files into a collection for semantic search and question answering.
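Before creating a job, it can help to preview which files a given Root Dir and Glob would select. The sketch below uses Python's `pathlib` as a stand-in for the node's matcher, which may differ in details such as hidden files or case sensitivity. `preview_matches` is a hypothetical helper, not part of the node.

```python
from pathlib import Path


def preview_matches(root_dir: str, pattern: str) -> list[str]:
    """Return sorted, root-relative paths of files under root_dir matching pattern."""
    root = Path(root_dir)
    return sorted(
        p.relative_to(root).as_posix()  # forward slashes on every platform
        for p in root.glob(pattern)
        if p.is_file()                  # skip directories that happen to match
    )
```

With `pathlib`, `preview_matches(root, "**/*.pdf")` includes PDFs in the root directory and in every subdirectory, while `"*.pdf"` only looks at the top level.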

Properties

Name	Meaning
Collection ID	String ID of the target collection where ingested documents will be added.
Root Dir	Path on the local file system where the node should look for files to ingest.
Glob	Glob pattern string used to match files within the root directory (e.g., `**/*.pdf` to match all PDFs recursively).
Additional Options	A set of optional parameters to customize ingestion behavior:
- Audio Input Language	Language code for audio files; default is `"auto"` for automatic detection.
- Chunk By Page	Boolean flag indicating whether each page of a document should be treated as a separate chunk. If true, the option to keep tables as one chunk is ignored.
- Gen Doc Questions	Boolean flag to auto-generate sample questions for each document using a large language model (LLM).
- Gen Doc Summaries	Boolean flag to auto-generate summaries for each document using an LLM.
- Handwriting Check	Boolean flag to enable handwriting detection on pages, which triggers specialized models if handwriting is found.
- Ingest Mode	Mode of ingestion: `standard` (default) for normal ingestion suitable for RAG, or `agent_only` which bypasses standard ingestion.
- Keep Tables As One Chunk	Boolean flag indicating whether tables detected by the table parser should be kept as a single chunk.
- Ocr Model	Specifies the OCR method to extract text from images; default is `"auto"`.
- Tesseract Lang	Language code to use when the OCR model is set to `"tesseract"`.
- Timeout	Timeout in seconds for the ingestion job; default is 0 (no timeout).
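The interactions between these options can be sketched as a small model. The snake_case field names, the `IngestOptions` class, and `effective_options` are all assumptions made for illustration and are not taken from the node's source. Only the defaults marked "documented" come from the table above; the boolean defaults are assumed to be off, and dropping Tesseract Lang when another OCR model is selected is likewise an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class IngestOptions:
    audio_input_language: str = "auto"     # documented default
    chunk_by_page: bool = False            # assumed default
    gen_doc_questions: bool = False        # assumed default
    gen_doc_summaries: bool = False        # assumed default
    handwriting_check: bool = False        # assumed default
    ingest_mode: str = "standard"          # documented default
    keep_tables_as_one_chunk: bool = False # assumed default
    ocr_model: str = "auto"                # documented default
    tesseract_lang: Optional[str] = None   # only relevant for ocr_model="tesseract"
    timeout: int = 0                       # documented default: 0 means no timeout


def effective_options(opts: IngestOptions) -> dict:
    """Validate the options and return only the ones that would take effect."""
    if opts.ingest_mode not in ("standard", "agent_only"):
        raise ValueError(f"unknown ingest mode: {opts.ingest_mode!r}")
    if opts.timeout < 0:
        raise ValueError("timeout must be 0 (no timeout) or a positive number of seconds")
    result = asdict(opts)
    if opts.chunk_by_page:
        # Chunk By Page takes precedence, so the table option is ignored.
        result.pop("keep_tables_as_one_chunk")
    if opts.ocr_model != "tesseract":
        # Tesseract Lang applies only when the OCR model is "tesseract".
        result.pop("tesseract_lang")
    return result
```

For instance, `effective_options(IngestOptions(chunk_by_page=True, keep_tables_as_one_chunk=True))` returns a dictionary without the `keep_tables_as_one_chunk` key, mirroring the precedence rule described for Chunk By Page.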

Output

The node outputs the full HTTP response from the API call that creates the ingestion job. The main output field is `json`, which contains details about the created job, such as its unique identifier, status, and any metadata returned by the server.

If the ingestion job involves binary data (e.g., files), this node handles the upload via the API but does not output binary data itself. Instead, it returns job metadata for tracking progress.
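A downstream step that consumes this output might look roughly like the sketch below. It assumes the usual n8n item shape, with a `json` object and an optional `binary` object. The key names `id` and `status` are hypothetical placeholders for the identifier and status mentioned above, so check an actual response before relying on them. `job_summary` is an illustrative helper, not part of the node.

```python
from typing import Any


def job_summary(item: dict[str, Any]) -> dict[str, Any]:
    """Pick out the fields needed to track a job from one n8n-style output item."""
    body = item.get("json", {})
    return {
        "job_id": body.get("id"),          # hypothetical key name
        "status": body.get("status"),      # hypothetical key name
        "has_binary": bool(item.get("binary")),  # expected to be False for this node
    }
```

Missing keys come back as `None` rather than raising, which keeps the sketch usable while the real field names are still being confirmed.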

Dependencies

Requires an API key credential configured in n8n to authenticate with the h2oGPTe API.
The node sends requests to the h2oGPTe instance's API endpoint, which must be reachable from the n8n environment.
Local file system access is required to read files matching the specified root directory and glob pattern.
Optional dependencies include AI models for handwriting detection, OCR, and LLM-based question/summary generation, which are managed by the backend service.

Troubleshooting

Invalid Collection ID: Ensure the collection ID exists and is accessible with your API credentials.
File Access Issues: Verify that the specified root directory and glob pattern correctly point to existing files and that n8n has permission to read them.
Timeouts: If the ingestion job times out, consider increasing the timeout value or checking network connectivity.
Unsupported File Types: Some file types may not be supported by the ingestion backend; check documentation for supported formats.
Handwriting/OCR Errors: Enabling handwriting check or OCR requires appropriate backend support; errors here may indicate missing models or misconfiguration.
API Authentication Failures: Confirm that the API key credential is valid and has necessary permissions.
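Several of the local checks above (file access, empty matches, timeout value) can be run before the job is created. The following is a minimal sketch, assuming it runs as the same user and on the same machine as n8n. `preflight` is a hypothetical helper, and Python's `pathlib` globbing may not match the node's matcher exactly.

```python
import os
from pathlib import Path


def preflight(root_dir: str, pattern: str, timeout: int = 0) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if timeout < 0:
        problems.append("Timeout must be 0 (no timeout) or a positive number of seconds")
    root = Path(root_dir)
    if not root.is_dir():
        problems.append(f"Root Dir does not exist or is not a directory: {root_dir}")
        return problems  # nothing else can be checked without a directory
    files = [p for p in root.glob(pattern) if p.is_file()]
    if not files:
        problems.append(f"Glob {pattern!r} matched no files under {root_dir}")
    for path in files:
        if not os.access(path, os.R_OK):
            problems.append(f"File is not readable by the current user: {path}")
    return problems
```

An empty result only rules out the local causes; an invalid collection ID, unsupported file types, or authentication failures are still reported by the server.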

Links and References

This summary covers the "Creates a Job to Add Files From the Local System Into a Collection" operation under the "Document Ingestion" resource, based on the provided source code and property definitions.
