
h2oGPTe

h2oGPTe is an AI-powered search assistant that helps internal teams answer questions using information drawn from large volumes of documents, websites, and workplace content.


Overview

The "Updates Collection Settings" operation in the Collection resource allows users to completely replace or recreate the settings of a specified collection. This is useful when you want to modify how documents within a collection are processed, ingested, or handled by the system. For example, you might update chunking behavior, language detection for audio files, OCR model preferences, or enable features like auto-generating document summaries and questions.

Practical scenarios include:

  • Adjusting document chunking strategies to optimize search or retrieval.
  • Enabling handwriting detection on scanned documents.
  • Setting up automatic generation of document summaries or sample questions using large language models (LLMs).
  • Configuring link-following behavior when ingesting web content into the collection.
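Each scenario above comes down to a few collection settings. As a rough illustration, the Python sketch below expresses them as presets that can be merged into one settings object. The snake_case keys (such as chunk_by_page and follow_links) are assumed request-body names, and the preset names and values are hypothetical examples rather than h2oGPTe recommendations.

```python
from typing import Any, Dict

# Hypothetical presets for the scenarios listed above. Key names are assumed
# snake_case forms of the node's option names; values are examples only.
PRESETS: Dict[str, Dict[str, Any]] = {
    "retrieval_tuning": {"max_tokens_per_chunk": 320, "chunk_overlap_tokens": 32},
    "scanned_handwriting": {"handwriting_check": True},
    "llm_enrichment": {"gen_doc_summaries": True, "gen_doc_questions": True},
    "web_crawl": {"follow_links": True, "max_depth": 1, "max_documents": 50},
}


def combine_presets(*names: str) -> Dict[str, Any]:
    """Merge the named presets into one settings dict; later presets win on conflicts."""
    settings: Dict[str, Any] = {}
    for name in names:
        settings.update(PRESETS[name])
    return settings


print(combine_presets("scanned_handwriting", "llm_enrichment"))
```

Merging small presets keeps each scenario readable on its own while still producing the single settings object that the update operation expects.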

Properties

Collection ID: The unique identifier of the collection whose settings you want to update.
Additional Options: A set of optional parameters that customize the collection settings:
  - Audio Input Language: Language code for audio files; defaults to automatic language detection ("auto"). Passing an empty string shows the available choices.
  - Chunk By Page: Whether each page should be treated as a separate chunk. If true, the keep_tables_as_one_chunk option is ignored.
  - Chunk Overlap Tokens: Approximate number of tokens that overlap between successive chunks, to maintain context continuity.
  - Copy Document: Whether to copy the document when importing an existing one.
  - Follow Links: Whether to import all web pages linked from a given URL (external links are ignored). Useful for crawling related content.
  - Gen Doc Questions: Enables automatic generation of sample questions for each document using LLMs.
  - Gen Doc Summaries: Enables automatic generation of document summaries using LLMs.
  - Guardrails Settings: JSON object specifying the guardrails or privacy settings to apply during processing.
  - Handwriting Check: Checks pages for handwriting and uses specialized models if handwriting is detected.
  - Keep Tables As One Chunk: When the table parser identifies a table, keeps all of the table's tokens in a single chunk.
  - Max Depth: Maximum recursion depth when following links (applies only if follow_links is true). A value of 0 means no links are followed.
  - Max Documents: Maximum number of documents to import when following links (applies only if follow_links is true). Use 0 for the automatic system default.
  - Max Tokens Per Chunk: Approximate maximum number of tokens per chunk for text-heavy document pages. Image chunks can be larger.
  - Ocr Model: The AI-enabled OCR model to use for extracting text from images. Passing an empty string shows the available options.
  - Root Dir: Root directory path for document storage.
  - Tesseract Lang: Language code used when the OCR model is set to "tesseract". Passing an empty string shows the available choices.
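The node turns these labelled options into the JSON body of the update request. The sketch below shows one way that translation could work. The mapping is an assumption: only keep_tables_as_one_chunk, chunk_by_page, follow_links, audio_input_language, and ocr_model are spelled in snake_case on this page, and the remaining keys are inferred from the labels. The helper name build_settings_body is hypothetical.

```python
from typing import Any, Dict, Optional

# Assumed mapping from the node's option labels to request-body keys.
# Only a few of these spellings appear on this page; the rest are inferred.
OPTION_KEYS: Dict[str, str] = {
    "Audio Input Language": "audio_input_language",
    "Chunk By Page": "chunk_by_page",
    "Chunk Overlap Tokens": "chunk_overlap_tokens",
    "Copy Document": "copy_document",
    "Follow Links": "follow_links",
    "Gen Doc Questions": "gen_doc_questions",
    "Gen Doc Summaries": "gen_doc_summaries",
    "Guardrails Settings": "guardrails_settings",
    "Handwriting Check": "handwriting_check",
    "Keep Tables As One Chunk": "keep_tables_as_one_chunk",
    "Max Depth": "max_depth",
    "Max Documents": "max_documents",
    "Max Tokens Per Chunk": "max_tokens_per_chunk",
    "Ocr Model": "ocr_model",
    "Root Dir": "root_dir",
    "Tesseract Lang": "tesseract_lang",
}


def build_settings_body(options: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Translate option labels to request keys, skipping options left unset (None)."""
    body: Dict[str, Any] = {}
    for label, value in options.items():
        if label not in OPTION_KEYS:
            # Fail early on typos instead of sending an unrecognized key.
            raise KeyError(f"unknown option: {label!r}")
        if value is not None:
            body[OPTION_KEYS[label]] = value
    return body


print(build_settings_body({"Chunk By Page": True, "Max Tokens Per Chunk": 320, "Max Depth": None}))
```

Because the operation replaces the settings rather than patching them, leaving an option out of the body may reset it to its default. That is an inference from the "replace" wording above, not behavior this page confirms, so it is worth checking against a test collection.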

Output

The node outputs the full HTTP response from the API call that updates the collection settings. The main output field is json, which contains the updated collection settings data returned by the server after the update operation.

No binary data output is involved in this operation.
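When calling the same endpoint outside the node, you have to unpack the response yourself. A minimal sketch, assuming the server replies with a JSON object of collection settings (the exact fields are not documented here) and using a hypothetical helper name, parse_settings_response:

```python
import json
from typing import Any, Dict


def parse_settings_response(status: int, raw_body: bytes) -> Dict[str, Any]:
    """Return the decoded JSON settings, or raise if the status is not 2xx."""
    if not 200 <= status < 300:
        # Include the server's message so authentication or validation errors are visible.
        message = raw_body.decode("utf-8", "replace")
        raise RuntimeError(f"settings update failed with HTTP {status}: {message}")
    # Assumption: an empty body (for example HTTP 204) means there is nothing to report.
    data = json.loads(raw_body) if raw_body else {}
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of collection settings")
    return data
```

This mirrors what the node exposes in its json field: a parsed object on success and an error otherwise.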

Dependencies

  • Requires an API key credential for authentication with the external service.
  • The base URL for API requests is derived from the configured credentials.
  • The node sends a PUT request to the endpoint /collections/{collection_id}/settings with the provided settings in the request body.
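The request described above can be sketched with Python's standard library. This builds the PUT request without sending it. The base URL is a placeholder, and sending the API key as a bearer token in the Authorization header is an assumption; check your credential configuration for the actual scheme. The helper name build_update_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_update_request(base_url: str, api_key: str, collection_id: str,
                         settings: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the PUT request that replaces a collection's settings."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    path = f"/collections/{urllib.parse.quote(collection_id, safe='')}/settings"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_update_request("https://h2ogpte.example.com/api/v1", "YOUR_API_KEY",
                           "my-collection", {"chunk_by_page": True})
print(req.get_method(), req.full_url)
```

The resulting request could then be passed to urllib.request.urlopen(req, timeout=60); setting an explicit timeout relates to the timeout advice under Troubleshooting.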

Troubleshooting

  • Missing or invalid Collection ID: Ensure the Collection ID property is correctly set and corresponds to an existing collection.
  • Invalid option values: Some properties expect specific formats or enumerated values (e.g., audio_input_language, ocr_model). Invalid values may cause the request to fail; for options that accept an empty string, pass one to list the valid choices.
  • API authentication errors: Verify that the API key credential is valid and has sufficient permissions to update collection settings.
  • Timeouts or network issues: Large updates or slow network connections may cause timeouts; consider adjusting timeout settings if available.
  • Conflicting options: For example, setting chunk_by_page to true ignores keep_tables_as_one_chunk. Be aware of such interactions to avoid unexpected behavior.
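Several of these pitfalls can be caught before the request is sent. The sketch below is a hypothetical client-side pre-flight check (check_settings is not part of the node) that encodes only the interactions stated on this page: chunk_by_page overriding keep_tables_as_one_chunk, max_depth and max_documents applying only with follow_links, and tesseract_lang applying only with the "tesseract" OCR model. It does not replace the server's own validation.

```python
from typing import Any, Dict, List


def check_settings(collection_id: str, settings: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for the option conflicts described above."""
    warnings: List[str] = []
    if not collection_id.strip():
        warnings.append("collection ID is empty")
    if settings.get("chunk_by_page") and "keep_tables_as_one_chunk" in settings:
        warnings.append("keep_tables_as_one_chunk is ignored when chunk_by_page is true")
    if not settings.get("follow_links"):
        # Link-crawling limits have no effect unless link following is enabled.
        for key in ("max_depth", "max_documents"):
            if key in settings:
                warnings.append(f"{key} only applies when follow_links is true")
    if settings.get("ocr_model") != "tesseract" and "tesseract_lang" in settings:
        warnings.append("tesseract_lang only applies when ocr_model is 'tesseract'")
    return warnings


for warning in check_settings("", {"chunk_by_page": True, "keep_tables_as_one_chunk": True}):
    print("warning:", warning)
```

Returning warnings instead of raising leaves the decision to the caller, since some of these combinations are harmless, merely ineffective.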

This summary is based on static analysis of the node's source code and input property definitions for the "Updates Collection Settings" operation under the Collection resource.
