Overview
This node generates text embeddings or calculates similarity scores between texts using a custom embedding API. It supports processing either a single text or multiple texts at once and offers two main operations: generating embeddings for input texts or computing similarity between exactly two texts.
Common scenarios where this node is useful include:
- Creating vector representations of texts for downstream machine learning tasks such as classification, clustering, or search.
- Measuring semantic similarity between two pieces of text, useful in applications like duplicate detection, paraphrase identification, or recommendation systems.
Practical examples:
- Generate embeddings for product descriptions to enable thematic clustering or search.
- Calculate similarity between user queries and document snippets to improve search relevance.
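
The search-relevance example can also be approximated on the client side once embeddings are available. The sketch below ranks snippets against a query by cosine similarity. It assumes cosine similarity as the metric, which this documentation does not confirm for the API's own similarity score, and `cosine_similarity` and `rank_snippets` are hypothetical helper names, not part of the node.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippets):
    """Sort (text, vector) pairs by similarity to the query, best match first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in snippets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here `query_vec` would come from an embedding generated for the user query, and each snippet vector from an embedding generated for a document snippet.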
Properties
| Name | Meaning |
|---|---|
| Text | Single text string to generate an embedding for (used when Input Type is "Single Text"). |
| Input Type | Choose whether to process a single text ("Single Text") or multiple texts ("Multiple Texts"). |
| Texts | Multiple texts separated by newlines to generate embeddings for (used when Input Type is "Multiple Texts"). |
| Endpoint | Select the API endpoint to use: "Embed" (generate embeddings for texts) or "Similarity" (calculate similarity between two texts). |
| Task Type | Specify the task type for embedding generation: Classification, Clustering, Search Query, Search Document, or Default (no prefix). |
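
How these properties might translate into the list of texts and the task value sent to the API is sketched below. This is an illustration, not the node's actual code: `collect_texts` is a hypothetical helper, trimming and dropping blank lines is an assumption, and the task values are taken from the `task` field shown in the Output section.

```python
# Display names from the Task Type property mapped to the values that
# appear in the node's output "task" field.
TASK_VALUES = {
    "Classification": "classification",
    "Clustering": "clustering",
    "Search Query": "search_query",
    "Search Document": "search_document",
    "Default": "default",
}


def collect_texts(input_type: str, text: str = "", texts: str = "") -> list[str]:
    """Return the texts to process for the chosen Input Type (hypothetical helper)."""
    if input_type == "Single Text":
        candidates = [text]
    elif input_type == "Multiple Texts":
        # The Texts property holds one text per line.
        candidates = texts.splitlines()
    else:
        raise ValueError(f"Unknown input type: {input_type!r}")
    # Assumption: surrounding whitespace is trimmed and blank lines are skipped.
    cleaned = [t.strip() for t in candidates if t.strip()]
    if not cleaned:
        raise ValueError("No valid text provided")
    return cleaned
```

For example, `collect_texts("Multiple Texts", texts="first\nsecond")` yields two texts, which is also the count the Similarity endpoint requires.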
Output
The output JSON structure depends on the selected endpoint:
Embed endpoint:

    {
      "texts": [/* array of input texts */],
      "embeddings": [/* array of embedding vectors, each an array of numbers */],
      "endpoint": "embed",
      "task": "classification" | "clustering" | "search_query" | "search_document" | "default",
      "model": "string",      // model name or "unknown"
      "count": number,        // number of embeddings returned
      "dimensions": number,   // dimensionality of each embedding vector
      "metadata": {
        "baseUrl": "host:port",
        "timestamp": "ISO timestamp"
      }
    }

Similarity endpoint:

    {
      "texts": [text1, text2],
      "similarity": number,   // similarity score between the two texts
      "endpoint": "similarity",
      "task": "classification" | "clustering" | "search_query" | "search_document" | "default",
      "text1": "string",      // first text (may be echoed from response)
      "text2": "string",      // second text (may be echoed from response)
      "metadata": {
        "baseUrl": "host:port",
        "timestamp": "ISO timestamp"
      }
    }

No binary data output is produced by this node.
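
A downstream step that consumes the embed output could sanity-check it before use. The sketch below uses only the field names from the structure above. `pair_texts_with_embeddings` is a hypothetical helper, and it assumes that `embeddings` is returned in the same order as `texts`, which this documentation does not state explicitly.

```python
def pair_texts_with_embeddings(item: dict) -> list[tuple[str, list[float]]]:
    """Check an embed result for internal consistency and pair texts with vectors."""
    if item.get("endpoint") != "embed":
        raise ValueError("expected an embed result")
    texts = item["texts"]
    embeddings = item["embeddings"]
    # "count" should match the number of vectors, one vector per input text.
    if item["count"] != len(embeddings) or len(texts) != len(embeddings):
        raise ValueError("count, texts, and embeddings disagree")
    # Every vector should have the advertised dimensionality.
    if any(len(vec) != item["dimensions"] for vec in embeddings):
        raise ValueError("embedding length does not match dimensions")
    # Assumption: embeddings are in the same order as the input texts.
    return list(zip(texts, embeddings))
```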
Dependencies
- Requires access to a custom embedding API server specified by host and port.
- Requires an API key credential for authentication with the embedding API.
- The node makes HTTP POST requests to either the `/embed` or `/similarity` endpoint on the configured server.
- Proper configuration of the API host, port, and API key is necessary in the node credentials.
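
For orientation, a request to such a server might be assembled as in the sketch below. Only the POST method, the host and port, and the two endpoint paths come from this documentation. The `http` scheme, the `Authorization: Bearer` header, and the payload field names (`texts`, `text1`, `text2`, `task`) are assumptions chosen for illustration, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(host, port, api_key, endpoint, texts, task="default"):
    """Build (but do not send) a POST request for the embedding server."""
    if endpoint == "embed":
        payload = {"texts": texts, "task": task}  # assumed field names
    elif endpoint == "similarity":
        payload = {"text1": texts[0], "text2": texts[1], "task": task}  # assumed
    else:
        raise ValueError(f"Unsupported endpoint: {endpoint}")
    url = f"http://{host}:{port}/{endpoint}"  # assumed scheme
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The server's real request schema should be checked before relying on these field names.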
Troubleshooting
- Missing Host or Port: The node will throw an error if the API host or port is not provided in credentials.
- Invalid Text Input: If no valid text is provided (empty or whitespace-only), the node throws an error.
- Similarity Endpoint Requires Exactly Two Texts: When using the similarity endpoint, exactly two texts must be provided; otherwise, an error is thrown.
- Unsupported Endpoint: Selecting an endpoint other than "embed" or "similarity" results in an error.
- Invalid API Response: If the API response does not contain the expected field (`embeddings` for embed or `similarity` for similarity), an error is raised.
- Network or Authentication Errors: Failures in HTTP requests due to network issues or invalid API keys will cause errors.
- Use the "Continue On Fail" option to handle errors gracefully and continue processing remaining items.
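
The checks listed above, together with the effect of "Continue On Fail", can be pictured with the following sketch. It is not the node's implementation: the function names are hypothetical, the error messages are illustrative, and recording failures as an `error` field on the output item is an assumption about how skipped items are reported.

```python
def check_request(endpoint: str, texts: list[str]) -> None:
    """Reject unsupported endpoints and wrong text counts before calling the API."""
    if endpoint not in ("embed", "similarity"):
        raise ValueError(f"Unsupported endpoint: {endpoint}")
    if endpoint == "similarity" and len(texts) != 2:
        raise ValueError("Similarity endpoint requires exactly two texts")


def check_response(endpoint: str, body: dict) -> None:
    """Require the field each endpoint is expected to return."""
    expected = "embeddings" if endpoint == "embed" else "similarity"
    if expected not in body:
        raise ValueError(f"Invalid API response: missing '{expected}'")


def process_items(items, handler, continue_on_fail=False):
    """Run handler on each item; optionally record errors instead of stopping."""
    results = []
    for item in items:
        try:
            results.append(handler(item))
        except Exception as exc:
            if not continue_on_fail:
                raise
            # Assumption: a failed item is reported as an error entry.
            results.append({"error": str(exc)})
    return results
```

With `continue_on_fail=False` the first failing item stops the run. With `True`, later items are still processed and each failure is reported alongside the successful results.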