h2oGPTe

h2oGPTe is an AI-powered search assistant for your internal teams to answer questions gleaned from large volumes of documents, websites and workplace content.

Overview

This node operation creates a job to parse files that were previously uploaded in the "Agent_only" ingest mode and converts them into a standard parsed format within a specified collection. This is useful when documents have been ingested in a restricted or preliminary mode ("Agent_only") and need to be processed further for full parsing and indexing.

Typical use cases include:

Post-processing documents uploaded by an agent system before making them available for search or analysis.
Automating the transition of documents from a raw upload state to a fully parsed and searchable state.
Managing document ingestion workflows where initial upload and parsing are decoupled.

For example, after uploading confidential documents via an agent-only channel, you can use this operation to create a job that parses these documents into your main collection for downstream AI-powered search or summarization.

Properties

Name	Meaning
Collection ID	String ID of the collection to add the ingested documents into.
Document ID	String ID of the document to be parsed.
Additional Options	Optional settings to customize the parsing job:
- Audio Input Language	Language of audio files (default: "auto").
- Chunk By Page	Whether each page will be treated as a separate chunk (boolean).
- Gen Doc Questions	Whether to auto-generate sample questions for each document using a large language model (LLM) (boolean).
- Gen Doc Summaries	Whether to auto-generate document summaries using an LLM (boolean).
- Handwriting Check	Whether to check pages for handwriting and use specialized models if found (boolean).
- Keep Tables As One Chunk	Whether tables identified by the parser should be kept as a single chunk (boolean).
- Ocr Model	Method to extract text from images using AI-enabled OCR models (default: "auto").
- Permissions	List of usernames that have permission to access the document (string).
- Restricted	Whether the document should be restricted only to certain users (boolean).
- Tesseract Lang	Language to use when OCR model is set to "tesseract".
- Timeout	Timeout for the job in seconds (number).
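
As a rough illustration of how these properties could translate into request parameters, the sketch below maps the display names above to snake_case keys. Only collection_id and document_id are named in this document; every other key in OPTION_NAMES is an assumed spelling, and build_params is a hypothetical helper, not part of the node.

```python
from typing import Any, Dict, Optional

# Assumed snake_case names for the "Additional Options" labels.
# Only collection_id and document_id are confirmed by this page.
OPTION_NAMES = {
    "Audio Input Language": "audio_input_language",
    "Chunk By Page": "chunk_by_page",
    "Gen Doc Questions": "gen_doc_questions",
    "Gen Doc Summaries": "gen_doc_summaries",
    "Handwriting Check": "handwriting_check",
    "Keep Tables As One Chunk": "keep_tables_as_one_chunk",
    "Ocr Model": "ocr_model",
    "Permissions": "permissions",
    "Restricted": "restricted",
    "Tesseract Lang": "tesseract_lang",
    "Timeout": "timeout",
}


def build_params(
    collection_id: str,
    document_id: str,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine the two required IDs with any options that were actually set."""
    params: Dict[str, Any] = {
        "collection_id": collection_id,
        "document_id": document_id,
    }
    for label, value in (options or {}).items():
        if label not in OPTION_NAMES:
            raise KeyError(f"Unknown option: {label}")
        if value is None:
            continue  # unset options are left out so server defaults apply
        params[OPTION_NAMES[label]] = value
    return params


if __name__ == "__main__":
    print(build_params("my-collection", "my-document", {"Chunk By Page": True}))
```

Leaving unset options out of the result keeps defaults such as "auto" under the server's control rather than hard-coding them on the client side.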

Output

The output of this operation is the response from the API indicating the creation of the parsing job. The json output field typically contains details about the created job such as its unique identifier, status, and metadata related to the parsing task.

This operation does not return binary data. It creates a job rather than uploading or downloading file content, so the output is JSON metadata about that job.
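
A downstream step might pull the job's identifier and status out of the item, as in the minimal sketch below. The field names "id" and "status" are assumptions, since this page does not list the exact response schema, and summarize_job is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_job(item: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the job identifier and status out of an n8n-style item.

    The keys "id" and "status" are assumed; adjust them to the real response.
    """
    job = item.get("json", {})
    return {"id": job.get("id"), "status": job.get("status")}


if __name__ == "__main__":
    # Illustrative item only, not a captured API response.
    sample = {"json": {"id": "job-123", "status": "queued"}}
    print(summarize_job(sample))
```

Using .get() throughout means a missing field yields None instead of raising, which is convenient while the actual response shape is still being confirmed.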

Dependencies

Requires an API key credential for authentication with the h2oGPTe API.
The node sends HTTP POST requests to the endpoint /ingest/agent_only_to_standard/job.
Proper configuration of the API base URL and credentials in n8n is necessary.
The target collection and document must already exist and be accessible.
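
For readers who want to see roughly what the node sends, the sketch below builds (but does not send) a POST request to the endpoint named above using Python's standard library. The base URL, the Bearer authorization scheme, and passing the IDs as query parameters are all assumptions; check the API reference for your deployment before relying on any of them.

```python
import urllib.parse
import urllib.request

# Endpoint path taken from this page; everything else here is assumed.
ENDPOINT = "/ingest/agent_only_to_standard/job"


def build_job_request(
    base_url: str,
    api_key: str,
    collection_id: str,
    document_id: str,
) -> urllib.request.Request:
    """Build a POST request for the parsing job without sending it."""
    # Assumption: the IDs travel as query parameters rather than a JSON body.
    query = urllib.parse.urlencode(
        {"collection_id": collection_id, "document_id": document_id}
    )
    url = base_url.rstrip("/") + ENDPOINT + "?" + query
    # Assumption: the API key is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="POST", headers=headers)


if __name__ == "__main__":
    # Hypothetical base URL and key, for illustration only.
    req = build_job_request(
        "https://h2ogpte.example.com/api/v1", "my-api-key", "col-1", "doc-1"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; the sketch stops short of that so it can be inspected safely.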

Troubleshooting

Missing Required Fields: Ensure both collection_id and document_id are provided; otherwise, the API will reject the request.
Permission Errors: If the user or API key lacks permission to access the collection or document, the job creation will fail.
Timeouts: Long-running jobs may require increasing the timeout property to avoid premature termination.
Invalid OCR Model or Language: Specifying unsupported OCR models or languages may cause parsing errors.
Restricted Documents: If the document is marked as restricted, ensure the correct permissions are set to allow processing.
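
Several of these problems can be caught before the request is sent. The sketch below is one possible pre-flight check. The option keys (timeout, ocr_model, tesseract_lang, restricted, permissions) are assumed snake_case spellings of the properties listed earlier, and the rules are heuristics drawn from the list above, not validation performed by the node or the API.

```python
from typing import Any, Dict, List, Optional


def check_job_input(
    collection_id: str,
    document_id: str,
    options: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    opts = options or {}
    problems: List[str] = []

    # Missing required fields.
    if not collection_id or not collection_id.strip():
        problems.append("collection_id is required")
    if not document_id or not document_id.strip():
        problems.append("document_id is required")

    # Timeout must be a positive number of seconds (booleans are rejected).
    timeout = opts.get("timeout")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append("timeout must be a positive number of seconds")

    # Tesseract language is only used with the "tesseract" OCR model.
    if opts.get("tesseract_lang") and opts.get("ocr_model") != "tesseract":
        problems.append("tesseract_lang only applies when ocr_model is 'tesseract'")

    # A restricted document without any permitted users is likely a mistake.
    if opts.get("restricted") and not opts.get("permissions"):
        problems.append("restricted documents need at least one username in permissions")

    return problems


if __name__ == "__main__":
    print(check_job_input("", "doc-1", {"timeout": -5}))
```

Returning a list rather than raising on the first failure lets a workflow report every problem in one pass.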

Links and References

This summary is based on static analysis of the node's properties and routing configuration for the "Document Ingestion" resource and the "Creates a Job to Parse Files Uploaded in 'Agent_only' Ingest Mode" operation.
