h2oGPTe

h2oGPTe is an AI-powered search assistant that helps your internal teams answer questions using large volumes of documents, websites, and workplace content.

Overview

This node operation performs a semantic search within a specified collection to find chunks of documents that are related to a given message. It uses vectorized representations (embeddings) of the message to run the semantic search, allowing it to identify relevant document chunks based on meaning rather than just keyword matching.

This is particularly useful in scenarios where you want to retrieve contextually relevant pieces of information from large document collections, such as:

Enhancing chatbot or virtual assistant responses by fetching relevant document excerpts.
Building knowledge bases or FAQ systems that return semantically similar content.
Performing research or data analysis by finding related document segments based on conceptual similarity.

For example, if you have a collection of technical manuals and a user query about "error handling," this operation can find document chunks that semantically relate to error handling concepts, even if the exact keywords do not appear.
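
To make the idea of matching by meaning concrete, here is a minimal sketch of ranking chunks by vector distance. It assumes cosine distance as the metric and uses invented three-dimensional embeddings; the source does not state which metric or embedding model the backend actually uses, and the names `cosine_distance` and `rank_chunks` are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_chunks(query_vector, chunks):
    """Sort (text, vector) pairs by ascending distance to the query."""
    scored = [(cosine_distance(query_vector, vec), text) for text, vec in chunks]
    return sorted(scored)


# Toy embeddings, invented for illustration only (assumption, not real model output).
chunks = [
    ("Retry the request when an exception is raised.", [0.9, 0.1, 0.0]),
    ("Install the package with the setup wizard.", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]  # stands in for the embedding of "error handling"
ranked = rank_chunks(query, chunks)
```

The first entry in `ranked` is the chunk about retries and exceptions, even though it never contains the words "error handling"; only the closeness of the vectors decides the order.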

Properties

Name	Meaning
Collection ID	The unique identifier of the collection within which to perform the semantic search.
Vectors	A JSON list of vectorized messages (embeddings) used as the query input for semantic search.
Topics	A string listing document IDs to filter which documents in the collection should be searched.
Additional Options	Optional parameters including: • Offset: Number of chunks to skip before returning results. • Limit: Maximum number of results to return. • Cut Off: Distance threshold to exclude matches with higher distances.
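
The interaction of Offset, Limit, and Cut Off can be illustrated locally. The sketch below assumes the matches are already sorted by ascending distance and that Cut Off is applied before paging; the order in which the real service applies these options is not documented here, and `apply_options` is a hypothetical helper, not part of the node.

```python
def apply_options(matches, offset=0, limit=None, cut_off=None):
    """Filter and page a list of (distance, chunk) pairs sorted by distance."""
    if cut_off is not None:
        # Drop matches whose distance is higher than the threshold.
        matches = [m for m in matches if m[0] <= cut_off]
    end = None if limit is None else offset + limit
    # Skip `offset` matches, then return at most `limit` of the remainder.
    return matches[offset:end]


# Invented distances for illustration only.
matches = [
    (0.05, "chunk A"),
    (0.20, "chunk B"),
    (0.45, "chunk C"),
    (0.80, "chunk D"),
]
page = apply_options(matches, offset=1, limit=2, cut_off=0.5)
```

With these values, chunk D is excluded by the Cut Off, chunk A is skipped by the Offset, and `page` holds chunks B and C.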

Output

The output contains a JSON array of chunks from the collection that are semantically related to the input vectors. Each chunk includes metadata such as relevance and similarity scores indicating how closely it matches the query vectors.

This operation primarily returns JSON data representing text chunks and their metadata. No binary output is indicated for it; if binary data were present, it would represent files or media associated with the chunks.

Dependencies

Requires an API key credential for authentication to the external service providing the collection and semantic search capabilities.
The node sends HTTP POST requests to the endpoint /collections/{collection_id}/chunks/match of the configured API base URL.
The vectors property must contain properly formatted embeddings compatible with the backend semantic search engine.
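
For readers calling the API outside the node, the request described above might be assembled roughly as follows. Only the endpoint path comes from this page. The bearer-token header, the JSON body field names (`vectors`, `topics`, `offset`, `limit`, `cut_off`), the choice to send options in the body, and the base URL are all assumptions; check the API documentation before relying on them. The sketch only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_match_request(base_url, api_key, collection_id, vectors,
                        topics=None, offset=0, limit=10, cut_off=None):
    """Build (but do not send) a POST request for the chunk match endpoint."""
    safe_id = urllib.parse.quote(collection_id, safe="")
    url = f"{base_url.rstrip('/')}/collections/{safe_id}/chunks/match"

    # Body field names are assumptions, not confirmed by the source.
    body = {"vectors": vectors, "offset": offset, "limit": limit}
    if topics is not None:
        body["topics"] = topics
    if cut_off is not None:
        body["cut_off"] = cut_off

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer scheme is an assumption about how the API key is sent.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_match_request(
    "https://example.invalid/api/v1",  # placeholder base URL
    "PLACEHOLDER_KEY",                 # placeholder credential
    "my-collection-id",
    vectors=[[0.1, 0.2, 0.3]],
    cut_off=0.5,
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without network access or a real credential.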

Troubleshooting

Invalid Collection ID: If the collection ID does not exist or is incorrect, the API will likely return an error. Verify the collection ID is correct.
Malformed Vectors JSON: Ensure the vectors input is a valid JSON array of embeddings; otherwise, the request may fail.
Empty Results: If no chunks match, consider raising the Cut Off value to allow more distant matches, or verify that the Topics filter is not overly restrictive.
Authentication Errors: Confirm that the API key credential is correctly configured and has permissions to access the collection.
Timeouts or Rate Limits: Large queries or high limits may cause timeouts or rate limiting; lower the Limit and use the Offset to page through results in smaller batches.
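
A malformed Vectors value can be caught before the request is sent. The following sketch assumes that each embedding is itself a JSON array of numbers, so the whole value is an array of arrays; the exact shape the backend expects is not specified here, and `parse_vectors` is a hypothetical helper.

```python
import json


def parse_vectors(raw):
    """Parse a JSON string and check it is a non-empty list of numeric vectors."""
    try:
        vectors = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Vectors is not valid JSON: {exc}") from exc

    if not isinstance(vectors, list) or not vectors:
        raise ValueError("Vectors must be a non-empty JSON array")

    for index, vector in enumerate(vectors):
        if not isinstance(vector, list) or not vector:
            raise ValueError(f"Vector {index} must be a non-empty array")
        for value in vector:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"Vector {index} contains a non-numeric value")
    return vectors


vectors = parse_vectors("[[0.12, -0.03, 0.88]]")
```

Running this check first turns a vague request failure into a specific message about which vector is at fault.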

Links and References

Semantic Search Concepts
Vector Embeddings Overview
Documentation for the external API (not provided here) should be consulted for detailed parameter and response schema information.
