
h2oGPTe

h2oGPTe is an AI-powered search assistant for your internal teams to answer questions gleaned from large volumes of documents, websites and workplace content.


Overview

This node operation creates a job to parse files that were previously uploaded in the "Agent_only" ingest mode and converts them into a standard parsed format within a specified collection. This is useful when documents have been ingested in a restricted or preliminary mode ("Agent_only") and need to be processed further for full parsing and indexing.

Typical use cases include:

  • Post-processing documents uploaded by an agent system before making them available for search or analysis.
  • Automating the transition of documents from a raw upload state to a fully parsed and searchable state.
  • Managing document ingestion workflows where initial upload and parsing are decoupled.

For example, after uploading confidential documents via an agent-only channel, you can use this operation to create a job that parses these documents into your main collection for downstream AI-powered search or summarization.

Properties

Name | Meaning
Collection ID | String ID of the collection to add the ingested documents into.
Document ID | String ID of the document to be parsed.
Additional Options | Optional settings to customize the parsing job:
  - Audio Input Language: Language of audio files (default: "auto").
  - Chunk By Page: Whether each page will be treated as a separate chunk (boolean).
  - Gen Doc Questions: Whether to auto-generate sample questions for each document using a large language model (LLM).
  - Gen Doc Summaries: Whether to auto-generate document summaries using an LLM.
  - Handwriting Check: Whether to check pages for handwriting and use specialized models if found (boolean).
  - Keep Tables As One Chunk: Whether tables identified by the parser should be kept as a single chunk (boolean).
  - Ocr Model: Method to extract text from images using AI-enabled OCR models (default: "auto").
  - Permissions: List of usernames having permissions to the document (string).
  - Restricted: Whether the document should be restricted only to certain users (boolean).
  - Tesseract Lang: Language to use when Ocr Model is set to "tesseract".
  - Timeout: Timeout for the job in seconds (number).
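
As a rough illustration of how the properties above might be collected into one set of request parameters, here is a minimal Python sketch. Only collection_id and document_id are named on this page; every other snake_case key in OPTION_KEYS, the helper build_job_params, and the placeholder IDs are assumptions made for illustration and should be checked against the API reference.

```python
from typing import Any, Dict, Optional

# Assumed mapping from the node's display names to snake_case API keys.
# Only collection_id and document_id appear in this document; the rest
# are hypothetical names.
OPTION_KEYS = {
    "Audio Input Language": "audio_input_language",
    "Chunk By Page": "chunk_by_page",
    "Gen Doc Questions": "gen_doc_questions",
    "Gen Doc Summaries": "gen_doc_summaries",
    "Handwriting Check": "handwriting_check",
    "Keep Tables As One Chunk": "keep_tables_as_one_chunk",
    "Ocr Model": "ocr_model",
    "Permissions": "permissions",
    "Restricted": "restricted",
    "Tesseract Lang": "tesseract_lang",
    "Timeout": "timeout",
}


def build_job_params(
    collection_id: str,
    document_id: str,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine the two required IDs with any options that were set."""
    params: Dict[str, Any] = {
        "collection_id": collection_id,
        "document_id": document_id,
    }
    for display_name, value in (options or {}).items():
        if display_name not in OPTION_KEYS:
            raise KeyError(f"Unknown option: {display_name}")
        if value is not None:  # skip options that were left unset
            params[OPTION_KEYS[display_name]] = value
    return params


# Placeholder IDs; "Timeout" is left unset and is therefore omitted.
params = build_job_params(
    "my-collection-id",
    "my-document-id",
    {"Chunk By Page": True, "Ocr Model": "auto", "Timeout": None},
)
print(params)
```

Leaving unset options out of the parameters, rather than sending empty values, lets the server apply its own defaults (such as "auto").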

Output

The output of this operation is the response from the API indicating the creation of the parsing job. The json output field typically contains details about the created job such as its unique identifier, status, and metadata related to the parsing task.

This operation does not return binary data. It only creates a job, so the output is JSON metadata about that job rather than the content of the uploaded document.
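
To illustrate how a downstream step might read the job metadata, the sketch below pulls a few fields out of an n8n-style item. The key names "id" and "status", the helper summarize_job, and the sample values are all hypothetical, since this page does not list the exact response fields.

```python
from typing import Any, Dict


def summarize_job(item: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out a few job fields from an n8n-style item, if present."""
    job = item.get("json", {})
    # "id" and "status" are hypothetical key names; adjust them to
    # whatever the real API response contains.
    return {key: job[key] for key in ("id", "status") if key in job}


# Made-up sample item, for illustration only.
example_item = {"json": {"id": "job-123", "status": "queued", "extra": 1}}
print(summarize_job(example_item))
```

Keys that are absent from the response are skipped instead of raising an error, which keeps the helper usable while the real field names are being confirmed.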

Dependencies

  • Requires an API key credential for authentication with the H2O GPT Enterprise API.
  • The node sends HTTP POST requests to the endpoint /ingest/agent_only_to_standard/job.
  • Proper configuration of the API base URL and credentials in n8n is necessary.
  • The target collection and document must already exist and be accessible.
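
For orientation, the request described above can be sketched with Python's standard library. The endpoint path comes from this page; the base URL, the bearer-token header, and the choice to pass the IDs as query parameters are assumptions, and build_job_request is a hypothetical helper. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "/ingest/agent_only_to_standard/job"  # path given on this page


def build_job_request(
    base_url: str,
    api_key: str,
    collection_id: str,
    document_id: str,
) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates the job."""
    # Assumption: the IDs are passed as query parameters.
    query = urllib.parse.urlencode(
        {"collection_id": collection_id, "document_id": document_id}
    )
    url = base_url.rstrip("/") + ENDPOINT + "?" + query
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


# Hypothetical base URL and placeholder credentials and IDs.
request = build_job_request(
    "https://h2ogpte.example.com/api/v1", "YOUR_API_KEY", "col-123", "doc-456"
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; in n8n the node performs this step itself using the configured credentials.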

Troubleshooting

  • Missing Required Fields: Ensure both collection_id and document_id are provided; otherwise, the API will reject the request.
  • Permission Errors: If the user or API key lacks permission to access the collection or document, the job creation will fail.
  • Timeouts: Long-running jobs may require a larger Timeout value (in seconds) to avoid premature termination.
  • Invalid OCR Model or Language: Specifying unsupported OCR models or languages may cause parsing errors.
  • Restricted Documents: If the document is marked as restricted, ensure the correct permissions are set to allow processing.
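
Several of the problems above can be caught before the request is sent. The following is a minimal pre-flight check, assuming the parameters are held in a dict with snake_case keys; apart from collection_id and document_id, those key names and the helper check_job_params are hypothetical.

```python
from typing import Any, Dict, List


def check_job_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none found."""
    problems: List[str] = []

    # Missing required fields.
    for required in ("collection_id", "document_id"):
        if not params.get(required):
            problems.append(f"missing required field: {required}")

    # The timeout, when given, should be a positive number of seconds.
    timeout = params.get("timeout")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append("timeout must be a positive number of seconds")

    # The Tesseract language only matters for the "tesseract" OCR model.
    if params.get("tesseract_lang") and params.get("ocr_model") != "tesseract":
        problems.append('tesseract_lang only applies when ocr_model is "tesseract"')

    # A restricted document needs someone who is allowed to access it.
    if params.get("restricted") and not params.get("permissions"):
        problems.append("restricted is set but permissions is empty")

    return problems


print(check_job_params({"collection_id": "col-123", "timeout": 0}))
```

This check cannot detect server-side conditions such as a missing collection, an unsupported OCR model, or an API key without access; those still surface as errors from the API.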


This summary is based on static analysis of the node's properties and routing configuration for the "Document Ingestion" resource and the "Creates a Job to Parse Files Uploaded in 'Agent_only' Ingest Mode" operation.
