h2oGPTe
Overview
This node operation allows users to add files from their local file system into a specified document collection within the system. It is designed for ingesting documents stored locally by specifying a root directory and a glob pattern to match files. The node supports various options to customize the ingestion process, such as language settings for audio files, chunking behavior, OCR model selection, handwriting detection, and timeout settings.
This functionality is beneficial in scenarios where organizations want to bulk import local documents into a centralized collection for further processing, searching, or analysis. For example, a user might have a folder of PDFs, images, or audio files on their computer that they want to ingest into a knowledge base or document management system.
Practical examples:
- Importing scanned contracts or reports stored locally into a searchable document collection.
- Adding audio recordings from a local directory with automatic language detection for transcription.
- Ingesting a set of research papers matched by a glob pattern for semantic search and question answering.
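To see which files a given root directory and glob pattern would select before running the node, the matching can be previewed locally. This is a minimal sketch using Python's standard `pathlib`; the helper name `preview_matches` is hypothetical, and the backend's own glob semantics may differ in edge cases (for example, hidden files or symlinks).

```python
from pathlib import Path
import tempfile


def preview_matches(root_dir: str, pattern: str) -> list[str]:
    """Return sorted paths (relative to root_dir) of regular files matching pattern.

    Hypothetical helper for local inspection only; it does not call the node or the API.
    """
    root = Path(root_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()  # skip directories that happen to match
    )


# Build a small throwaway directory tree to demonstrate the two common pattern styles.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    (root / "a.pdf").write_text("x")
    (root / "reports" / "q1.pdf").write_text("x")
    (root / "notes.txt").write_text("x")

    top_level = preview_matches(tmp, "*.pdf")     # only files directly in the root
    recursive = preview_matches(tmp, "**/*.pdf")  # root plus all subdirectories

    print(top_level)
    print(recursive)
```

A pattern such as `*.pdf` only looks at the root directory itself, while `**/*.pdf` also descends into subdirectories, so the choice of pattern determines how much of the tree is ingested.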
Properties
| Name | Meaning |
|---|---|
| Collection ID | String ID of the collection to add the ingested documents into. This identifies the target collection where files will be added. |
| Root Dir | String path of the root directory on the local file system where the node will look for files to ingest. |
| Glob | String glob pattern used to match files within the root directory. Only files matching this pattern will be ingested. |
| Additional Options | A collection of optional parameters to customize ingestion: |
| - Audio Input Language | Language code for audio files; default is "auto" for automatic detection. |
| - Chunk By Page | Boolean indicating whether each page should be treated as a separate chunk. If true, the Keep Tables As One Chunk option (keep_tables_as_one_chunk) is ignored. |
| - Gen Doc Questions | Boolean to enable auto-generation of sample questions for each document using a large language model (LLM). |
| - Gen Doc Summaries | Boolean to enable auto-generation of document summaries using an LLM. |
| - Handwriting Check | Boolean to enable checking pages for handwriting, which triggers specialized models if handwriting is detected. |
| - Ingest Mode | Option to select the ingest mode: "standard" (default) for regular ingestion suitable for retrieval-augmented generation (RAG), or "agent_only" to bypass standard ingestion. |
| - Keep Tables As One Chunk | Boolean indicating whether tables identified by the table parser should be kept as a single chunk. |
| - Ocr Model | String specifying the OCR method to extract text from images. Default is "auto". Supported methods include docTR, tesseract, etc. |
| - Tesseract Lang | Language code to use when OCR model is set to "tesseract". |
| - Timeout | Number specifying the timeout in seconds for the ingestion request. Default is 0 (no timeout). |
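The defaults and interactions in the table above can be sketched as a small options builder. This is an illustration only: apart from `keep_tables_as_one_chunk`, which the table names, the snake_case keys and the `build_options` helper are assumptions and not confirmed parameter names of the node or the backend API.

```python
# Defaults taken from the properties table; key names are assumed snake_case forms.
DEFAULTS = {
    "audio_input_language": "auto",
    "ingest_mode": "standard",
    "ocr_model": "auto",
    "timeout": 0,  # 0 means no timeout
}

# Hypothetical set of accepted option keys, mirroring the rows of the table.
ALLOWED_KEYS = set(DEFAULTS) | {
    "chunk_by_page",
    "gen_doc_questions",
    "gen_doc_summaries",
    "handwriting_check",
    "keep_tables_as_one_chunk",
    "tesseract_lang",
}


def build_options(**overrides):
    """Merge user overrides onto the documented defaults and validate them."""
    unknown = set(overrides) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")

    opts = {**DEFAULTS, **overrides}

    if opts["ingest_mode"] not in ("standard", "agent_only"):
        raise ValueError("ingest_mode must be 'standard' or 'agent_only'")
    if opts["timeout"] < 0:
        raise ValueError("timeout must be 0 (no timeout) or a positive number of seconds")

    # Per the table, keep_tables_as_one_chunk is ignored when chunk_by_page is true,
    # so this sketch drops it to make the effective settings explicit.
    if opts.get("chunk_by_page"):
        opts.pop("keep_tables_as_one_chunk", None)

    return opts


options = build_options(chunk_by_page=True, keep_tables_as_one_chunk=True, timeout=120)
print(options)
```

Dropping the ignored key is a design choice for readability; sending both values would, according to the table, have the same effect.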
Output
The node outputs JSON data representing the response from the ingestion API endpoint. This typically includes metadata about the ingestion job or confirmation of successful ingestion. The exact structure depends on the backend API but generally contains status information and identifiers related to the ingested documents or job.
The node does not output binary data.
Dependencies
- Requires access to the backend API service that manages document collections and ingestion.
- Requires an API authentication token or API key credential configured in n8n to authorize requests.
- The local file system must be accessible to the environment running the node to read files from the specified root directory.
- No additional external services are required unless specific ingestion options (like OCR) depend on them via the backend.
Troubleshooting
Common Issues:
- Incorrect Collection ID: Ensure the collection ID exists and is accessible with the provided credentials.
- Invalid Root Dir or Glob pattern: Verify the path exists and the glob pattern correctly matches intended files.
- Timeout errors: Increase the timeout value if ingestion takes longer than expected.
- Permission errors: Confirm the API key has sufficient permissions to ingest documents into the collection.
- Unsupported OCR model or language: Use supported values for OCR model and language options.
Error Messages:
- "Collection not found": The specified collection ID does not exist or is inaccessible.
- "No files matched the glob pattern": The glob pattern did not match any files in the root directory.
- "Timeout exceeded": The ingestion process took longer than the allowed timeout.
- "Unauthorized" or "Forbidden": Authentication failed or insufficient permissions.
Resolving these usually involves verifying input parameters, checking API credentials, and adjusting timeout or option settings.
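When the node runs inside an automated workflow, the messages listed above can be mapped to a suggested next step. The following is a minimal sketch; the `suggest_fix` helper and the case-insensitive substring matching are assumptions, and the exact error strings returned by the backend may differ from those quoted here.

```python
# Substring-to-hint pairs based on the error messages listed above.
# The matching strategy is an assumption, not documented behavior.
HINTS = [
    ("collection not found", "Check that the Collection ID exists and is accessible with these credentials."),
    ("no files matched", "Verify the Root Dir path and the Glob pattern."),
    ("timeout exceeded", "Increase the Timeout value, or set it to 0 for no timeout."),
    ("unauthorized", "Check the API key configured in the n8n credentials."),
    ("forbidden", "Confirm the API key has permission to ingest into the collection."),
]

FALLBACK = "Re-check input parameters, API credentials, and option settings."


def suggest_fix(message: str) -> str:
    """Return a troubleshooting hint for an error message, or a generic fallback."""
    lowered = message.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return FALLBACK


print(suggest_fix("Timeout exceeded"))
```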
Links and References
- Glob Pattern Syntax
- Optical Character Recognition (OCR)
- Retrieval-Augmented Generation (RAG)
- Documentation for the backend API managing document ingestion (refer to your platform's API docs).