h2oGPTe

h2oGPTe is an AI-powered search assistant that helps internal teams answer questions using large volumes of documents, websites, and workplace content.

Overview

This node operation creates a job to insert an existing document into a specified collection within the system. It is useful when you want to add documents that are already stored or processed elsewhere into a particular collection for further use, such as search, analysis, or AI-powered querying.

Typical scenarios include:

  • Organizing documents by grouping them into collections.
  • Adding new content to an existing collection without re-uploading or re-processing the document.
  • Automating workflows where documents are ingested and then assigned to collections asynchronously via jobs.

For example, if you have a document ID from a previous upload or ingestion step, you can create a job to insert that document into a target collection. Optional settings control how the document is chunked, whether it is copied, and whether summaries or sample questions are generated automatically.

Properties

Name | Meaning
Collection ID | The unique identifier of the collection where the document will be inserted. This is required to specify the target collection.
Document ID | The unique identifier of the document to be inserted into the collection. Required to specify which document to add.
Additional Options | A set of optional parameters to customize the insertion job:

  • Chunk By Page: Boolean flag indicating whether each page of the document should be treated as a separate chunk. If true, the option to keep tables as one chunk is ignored.
  • Copy Document: Boolean flag indicating whether to save a new copy of the document in the collection.
  • Gen Doc Questions: Boolean flag to auto-generate sample questions for the document using a large language model (LLM).
  • Gen Doc Summaries: Boolean flag to auto-generate document summaries using an LLM.
  • Handwriting Check: Boolean flag to check pages for handwriting and use specialized models if handwriting is detected.
  • Ingest Mode: Option to select the ingest mode: standard (files ingested for retrieval-augmented generation) or agent_only (bypasses standard ingestion).
  • Keep Tables As One Chunk: Boolean flag indicating whether tables identified by the table parser should be kept as a single chunk.
  • Ocr Model: String specifying which OCR model to use for extracting text from images. Default is "auto".
  • Tesseract Lang: String specifying the language to use when the OCR model is set to "tesseract".
  • Timeout: Number specifying the timeout in seconds for the job execution. Default is 0 (no timeout).
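
The display names above correspond to request parameters. As a minimal sketch of that mapping: only chunk_by_page, keep_tables_as_one_chunk, ocr_model, and tesseract_lang are named on this page; the other parameter names are assumed snake_case equivalents, and the helper to_api_options is hypothetical.

```python
from typing import Any, Dict

# Display name -> request parameter name.
# Only chunk_by_page, keep_tables_as_one_chunk, ocr_model and tesseract_lang
# appear on this page; the remaining names are ASSUMED snake_case forms.
DISPLAY_TO_PARAM: Dict[str, str] = {
    "Chunk By Page": "chunk_by_page",
    "Copy Document": "copy_document",          # assumed
    "Gen Doc Questions": "gen_doc_questions",  # assumed
    "Gen Doc Summaries": "gen_doc_summaries",  # assumed
    "Handwriting Check": "handwriting_check",  # assumed
    "Ingest Mode": "ingest_mode",              # assumed
    "Keep Tables As One Chunk": "keep_tables_as_one_chunk",
    "Ocr Model": "ocr_model",
    "Tesseract Lang": "tesseract_lang",
    "Timeout": "timeout",                      # assumed
}


def to_api_options(node_options: Dict[str, Any]) -> Dict[str, Any]:
    """Translate display-name options to parameter names, dropping unset values."""
    unknown = sorted(set(node_options) - set(DISPLAY_TO_PARAM))
    if unknown:
        raise KeyError(f"Unknown option(s): {', '.join(unknown)}")
    return {
        DISPLAY_TO_PARAM[name]: value
        for name, value in node_options.items()
        if value is not None  # leave unset options to the server defaults
    }
```

Dropping unset values keeps the request small and lets the service apply its own defaults, such as "auto" for the OCR model.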

Output

The output of this operation is the full HTTP response from the API call that creates the job to insert the document into the collection. Typically, this includes details about the created job such as its ID, status, and any metadata returned by the server.

The json output field contains the job information object. This allows downstream nodes or workflows to monitor the job status or retrieve results once the job completes.

No binary data is produced by this operation.
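
A downstream step that monitors the job could follow a polling loop along these lines. This is a sketch under stated assumptions: the "status" key, the terminal status values, and the fetch_job callable (whatever looks up a job by ID) are hypothetical and not documented on this page.

```python
import time
from typing import Any, Callable, Dict, Sequence


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical: returns the job object for an ID
    job_id: str,
    done_statuses: Sequence[str] = ("completed", "failed", "canceled"),  # assumed values
    interval: float = 2.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_job until the job reports a terminal status or max_wait elapses."""
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        if job.get("status") in done_statuses:  # "status" key is an assumption
            return job
        if waited >= max_wait:
            raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")
        sleep(interval)
        waited += interval
```

Passing fetch_job and sleep as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without a network.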

Dependencies

  • Requires an API key credential for authentication with the external service.
  • The node sends a POST request to the endpoint /collections/{collection_id}/documents/insert_job.
  • The user must have appropriate permissions to create jobs and modify collections/documents on the connected system.
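
For reference, the request the node sends might be assembled roughly as follows. Only the POST method and the endpoint path come from this page. The base URL, the Bearer authorization header, and sending document_id and the options as query parameters are assumptions, and build_insert_job_request is a hypothetical helper. The sketch only builds the request object; it does not send it.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_insert_job_request(
    base_url: str,        # hypothetical, e.g. your deployment's API root
    api_key: str,
    collection_id: str,
    document_id: str,
    options: Optional[Dict[str, Any]] = None,
) -> Request:
    """Build (but do not send) the POST request that creates the insert job."""
    path = f"/collections/{quote(collection_id, safe='')}/documents/insert_job"
    # ASSUMPTION: document_id and the options travel as query parameters.
    params: Dict[str, Any] = {"document_id": document_id, **(options or {})}
    # Render booleans as lowercase true/false (assumed server convention).
    encoded = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return Request(
        f"{base_url.rstrip('/')}{path}?{encoded}",
        method="POST",
        # ASSUMPTION: the API key is sent as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

The returned object can be passed to urllib.request.urlopen once the assumptions above have been checked against the service's API documentation.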

Troubleshooting

  • Missing or invalid Collection ID or Document ID: Ensure both IDs are provided and valid; otherwise, the API will reject the request.
  • Permission errors: The authenticated user must have rights to insert documents into the specified collection.
  • Timeouts: If the job does not finish within the configured timeout, increase the timeout value (0, the default, disables the timeout) or check network connectivity.
  • Conflicting options: When chunk_by_page is true, keep_tables_as_one_chunk is ignored. Set only the option that matches the chunking behavior you need.
  • API errors: Review the error message returned by the API for specific issues like invalid document state or collection restrictions.
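
Several of the issues above can be caught before the request is sent. The following is a minimal sketch of such a pre-flight check, based only on the rules stated on this page. The function check_insert_inputs is hypothetical, and the parameter names ingest_mode and timeout are assumed.

```python
from typing import Any, Dict, List


def check_insert_inputs(
    collection_id: str, document_id: str, options: Dict[str, Any]
) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if not collection_id.strip():
        problems.append("Collection ID is required.")
    if not document_id.strip():
        problems.append("Document ID is required.")
    if options.get("chunk_by_page") and options.get("keep_tables_as_one_chunk"):
        problems.append("keep_tables_as_one_chunk is ignored when chunk_by_page is true.")
    if options.get("tesseract_lang") and options.get("ocr_model") != "tesseract":
        problems.append("tesseract_lang only applies when ocr_model is 'tesseract'.")
    mode = options.get("ingest_mode")  # parameter name is an assumption
    if mode is not None and mode not in ("standard", "agent_only"):
        problems.append("ingest_mode must be 'standard' or 'agent_only'.")
    timeout = options.get("timeout", 0)  # parameter name is an assumption
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout < 0:
        problems.append("timeout must be a non-negative number of seconds.")
    return problems
```

The check reports conflicts rather than silently rewriting the options, so the caller decides whether a warning such as the ignored table setting should block the request.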

Links and References

  • Refer to the API documentation of the external service for detailed information on job creation and document management.
  • Consult the OCR model documentation for supported values and usage of the ocr_model and tesseract_lang options.
  • For best practices on document chunking and ingestion modes, see the service's ingestion guidelines.
