h2oGPTe

h2oGPTe is an AI-powered search assistant that helps internal teams answer questions from large volumes of documents, websites, and workplace content.

Overview

This node operation creates a job to add files from an Azure Blob Storage container into a specified document collection. It is designed for scenarios where you want to ingest documents stored in Azure Blob Storage into a collection for further processing, searching, or analysis within the system.

Typical use cases include:

Automating the ingestion of large volumes of documents stored in Azure Blob Storage into a centralized collection.
Preparing documents for AI-powered search, summarization, or question-answering workflows.
Managing document collections by importing external data sources seamlessly.

For example, a user might have a set of PDF reports stored in an Azure Blob Storage container and want to ingest them into a collection to enable semantic search and automatic summary generation.

Properties

Name	Meaning
Collection ID	String ID of the collection to add the ingested documents into (required).
Container	Name of the Azure Blob Storage container where the files are located (required).
Paths	Path or list of paths to files or directories within the Azure Blob Storage container (required).
Account Name	Name of the Azure storage account (required).
Additional Options	A collection of optional parameters to customize the ingestion job:
- Audio Input Language	Language of audio files; default is "auto" for automatic detection.
- Chunk By Page	Boolean indicating whether each page should be treated as a separate chunk. If true, `keep_tables_as_one_chunk` is ignored.
- Credentials	JSON object containing Azure credentials; if the container is private, either an account key or SAS token must be provided here.
- Gen Doc Questions	Boolean to auto-generate sample questions for each document using a large language model (LLM).
- Gen Doc Summaries	Boolean to auto-generate document summaries using LLM.
- Handwriting Check	Boolean to check pages for handwriting; specialized models will be used if handwriting is detected.
- Ingest Mode	Option to select the ingest mode: "standard" (files ingested for retrieval-augmented generation) or "agent_only" (bypasses standard ingestion).
- Keep Tables As One Chunk	Boolean indicating whether tables identified by the parser should be kept as a single chunk.
- Metadata	JSON object with metadata to associate with the ingested documents.
- Ocr Model	Method to extract text from images using AI-enabled OCR models; default is "auto".
- Tesseract Lang	Language code to use when OCR model is set to "tesseract".
- Timeout	Timeout in seconds for the ingestion job; default is 0 (no timeout).
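
The properties above can be pictured as a single request body. The following is a minimal sketch, not the node's actual implementation: the helper `build_ingest_payload` is hypothetical, and all snake_case field names except `keep_tables_as_one_chunk` are assumed from the property labels. It illustrates two rules from the table: Paths accepts one path or a list, and Keep Tables As One Chunk is ignored when Chunk By Page is true.

```python
from typing import Any, Dict, List, Optional, Union


def build_ingest_payload(
    collection_id: str,
    container: str,
    paths: Union[str, List[str]],
    account_name: str,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body for the ingestion job (field names are assumed)."""
    payload: Dict[str, Any] = {
        "collection_id": collection_id,
        "container": container,
        # Paths may be a single path or a list; normalize to a list.
        "paths": [paths] if isinstance(paths, str) else list(paths),
        "account_name": account_name,
    }
    opts = dict(options or {})
    # When chunking by page, keep_tables_as_one_chunk is ignored, so drop it.
    if opts.get("chunk_by_page"):
        opts.pop("keep_tables_as_one_chunk", None)
    payload.update(opts)
    return payload


# Hypothetical values, for illustration only.
example = build_ingest_payload(
    "my-collection-id",
    "reports",
    "2024/q1/",
    "mystorageaccount",
    {"chunk_by_page": True, "keep_tables_as_one_chunk": True, "gen_doc_summaries": True},
)
```

Dropping the ignored option on the client side is a design choice made here for clarity; the table only states that the option is ignored.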

Output

The node outputs the response from the API call that creates the ingestion job. The output includes details about the created job such as its unique identifier, status, and any relevant metadata returned by the server.

The main output field is:

json: Contains the full response data from the job creation request, including job ID and status.

There is no binary data output for this operation.
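
A downstream step might reduce each output item to the job ID and status. The sketch below assumes the response stores them under the keys `id` and `status`, which the source does not confirm; the helper `summarize_jobs` is hypothetical. Only the `json` wrapper field comes from the description above.

```python
from typing import Any, Dict, List, Optional


def summarize_jobs(items: List[Dict[str, Any]]) -> List[Dict[str, Optional[str]]]:
    """Extract job ID and status from node output items."""
    summaries = []
    for item in items:
        # Each item carries the API response under its "json" field.
        data = item.get("json") or {}
        # "id" and "status" are assumed key names, not confirmed by the source.
        summaries.append({"job_id": data.get("id"), "status": data.get("status")})
    return summaries


# Hypothetical output item, for illustration only.
sample_items = [{"json": {"id": "job-123", "status": "queued"}}]
summary = summarize_jobs(sample_items)
```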

Dependencies

Requires access to an API endpoint that manages document ingestion jobs.
Requires an API authentication token or API key credential configured in n8n to authorize requests.
For private Azure Blob Storage containers, valid Azure credentials (account key or SAS token) must be provided in the Credentials option under Additional Options.
Network connectivity to Azure Blob Storage and the ingestion API service is necessary.

Troubleshooting

Authentication errors: Ensure that the API key credential is correctly configured and has permissions to create ingestion jobs.
Azure credentials issues: If the container is private, verify that the provided Azure credentials (account_key or sas_token) are valid and have access to the specified container.
Invalid paths or container names: Double-check the container name and file paths; incorrect values will cause the ingestion job to fail.
Timeouts: If the job exceeds the configured Timeout, increase the value (the default of 0 means no timeout) and check network stability.
Unsupported file formats: Ensure that the files in the specified paths are supported by the ingestion system.
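
Several of the issues above can be caught before the job is created. The following is a minimal pre-flight sketch under stated assumptions: the helper `preflight_problems` and its `private` flag are hypothetical, and it checks only the shape of the inputs. It cannot confirm that credentials are valid or that the paths exist in the container.

```python
from typing import Any, Dict, List, Optional, Union


def preflight_problems(
    container: str,
    paths: Union[str, List[str]],
    credentials: Optional[Dict[str, Any]] = None,
    timeout: float = 0,
    private: bool = False,
) -> List[str]:
    """Return human-readable problems found in the node inputs."""
    problems: List[str] = []
    if not container.strip():
        problems.append("container name is empty")
    path_list = [paths] if isinstance(paths, str) else list(paths)
    if not path_list or any(not p.strip() for p in path_list):
        problems.append("paths must contain at least one non-empty path")
    if private:
        creds = credentials or {}
        # A private container needs either an account key or a SAS token.
        if not (creds.get("account_key") or creds.get("sas_token")):
            problems.append("private container needs account_key or sas_token")
    if timeout < 0:
        problems.append("timeout must be 0 (no timeout) or a positive number of seconds")
    return problems


# Hypothetical inputs, for illustration only.
ok = preflight_problems("reports", ["2024/q1/report.pdf"], {"sas_token": "placeholder"}, 0, True)
bad = preflight_problems("", [], None, -1, True)
```

An empty result means no shape problems were found; authentication failures and unsupported file formats can still surface once the job runs.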

Links and References

This summary covers the logic and configuration of the "Creates a Job to Add Files From the Azure Blob Storage Into a Collection" operation under the Document Ingestion resource.
