h2oGPTe

h2oGPTe is an AI-powered search assistant that helps your internal teams answer questions using large volumes of documents, websites, and workplace content.

Overview

This node operation imports an already stored document into an existing collection. It is useful when documents have already been uploaded or processed and you want to add them to a specific collection for easier management, retrieval, or further processing.

Common scenarios include:

  • Organizing documents into thematic or project-based collections after initial upload.
  • Adding documents to collections without re-uploading or re-processing them.
  • Managing document grouping in knowledge bases or content management systems.

For example, if you have a document representing a research paper already stored in the system, you can use this operation to insert it into a "Research Papers" collection to keep your documents organized.

Properties

Name: Meaning

  • Collection ID: The unique identifier of the collection where the document will be inserted.
  • Document ID: The unique identifier of the document to be inserted into the collection.
  • Additional Options: Optional settings to customize how the document is inserted. These include:
      - Chunk By Page: Whether to split the document into chunks by page (boolean).
      - Copy Document: Whether to save a new copy of the document (boolean).
      - Gen Doc Questions: Auto-generate sample questions for the document using a language model (boolean).
      - Gen Doc Summaries: Auto-generate summaries for the document using a language model (boolean).
      - Handwriting Check: Enable handwriting detection on pages (boolean).
      - Ingest Mode: Mode of ingestion, either standard or agent_only.
      - Keep Tables As One Chunk: Keep table tokens as a single chunk (boolean).
      - Ocr Model: OCR model to use for text extraction from images (string, default "auto").
      - Tesseract Lang: Language for Tesseract OCR if used (string).
      - Timeout: Timeout for the operation in seconds (number).
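
As a rough illustration of how the additional options above might end up as query parameters, here is a minimal Python sketch. The snake_case parameter names (chunk_by_page, ocr_model, and so on) and the lowercase "true"/"false" rendering of booleans are assumptions inferred from the display labels, not confirmed by this page; build_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_query(options):
    """Serialize additional options into a query string, skipping unset values.

    Assumption: the API accepts snake_case parameter names and lowercase
    "true"/"false" for boolean flags. Check the API documentation.
    """
    params = {}
    for key, value in options.items():
        if value is None:
            continue  # option not set in the node, so leave it out
        if isinstance(value, bool):
            params[key] = "true" if value else "false"
        else:
            params[key] = str(value)
    return urlencode(params)


# Hypothetical option values for illustration only.
query = build_query({
    "chunk_by_page": True,
    "ocr_model": "auto",
    "timeout": 300,
    "tesseract_lang": None,
})
print(query)
```

Options left unset are omitted rather than sent as empty strings, so the server can apply its own defaults.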

Output

The output of this operation is the full HTTP response returned by the API call that inserts the document into the collection. This typically includes metadata about the insertion job or confirmation of success.

The main output field is:

  • json: Contains the response data from the server regarding the document insertion operation.

This operation does not appear to output binary data.

Dependencies

  • Requires an API key credential for authentication with the backend service.
  • The node sends a PUT request to the endpoint /collections/{collection_id}/documents/{document_id} with query parameters based on additional options.
  • Proper configuration of the base URL and authentication credentials in n8n is necessary.

Troubleshooting

  • Invalid Collection ID or Document ID: Ensure that both IDs are correct and exist in the system; otherwise, the API will return an error.
  • Timeouts: If the operation times out, consider increasing the timeout value in the additional options.
  • Permission Issues: Make sure the API key has sufficient permissions to modify collections and documents.
  • Incorrect Option Values: Verify boolean flags and option values conform to expected types and allowed values.
  • OCR Model Errors: If specifying an OCR model, ensure the model name is valid and supported.
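
Some of the checks above can be run before the request is sent. The sketch below validates option types and the ingest mode locally. The snake_case option names and the validate_options helper are hypothetical; the allowed ingest modes (standard, agent_only) are taken from the Properties section.

```python
# Assumed snake_case names for the boolean options listed under Properties.
BOOLEAN_OPTIONS = {
    "chunk_by_page",
    "copy_document",
    "gen_doc_questions",
    "gen_doc_summaries",
    "handwriting_check",
    "keep_tables_as_one_chunk",
}
INGEST_MODES = {"standard", "agent_only"}


def validate_options(options):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, value in options.items():
        if name in BOOLEAN_OPTIONS and not isinstance(value, bool):
            problems.append(f"{name} must be a boolean, got {value!r}")
    mode = options.get("ingest_mode")
    if mode is not None and mode not in INGEST_MODES:
        problems.append(f"ingest_mode must be one of {sorted(INGEST_MODES)}, got {mode!r}")
    timeout = options.get("timeout")
    if timeout is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append(f"timeout must be a positive number of seconds, got {timeout!r}")
    return problems


for problem in validate_options({"chunk_by_page": "yes", "ingest_mode": "fast"}):
    print(problem)
```

This does not check whether an OCR model name or a collection ID exists; only the server can confirm those.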

Links and References

  • Refer to the API documentation of the backend service for details on the /collections/{collection_id}/documents/{document_id} endpoint.
  • For OCR model options and usage, consult the service's OCR capabilities documentation.
  • For more information on document chunking and ingestion modes, see the service's ingestion guidelines.
