Custom Embedding (Standalone)

Generate text embeddings using a custom embedding API (Standalone Node)

Overview

This node generates text embeddings, or calculates similarity scores between texts, using a custom embedding API. It can process a single text or multiple texts at once, and it offers two operations: generating embeddings for the input texts, or computing the similarity between exactly two texts.

Common scenarios where this node is useful include:

  • Creating vector representations of texts for downstream machine learning tasks such as classification, clustering, or search.
  • Measuring semantic similarity between two pieces of text, useful in applications like duplicate detection, paraphrase identification, or recommendation systems.

Practical examples:

  • Generate embeddings for product descriptions to enable thematic clustering or search.
  • Calculate similarity between user queries and document snippets to improve search relevance.
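The Similarity endpoint computes its score on the server, and this page does not say which metric the server uses. When you already have vectors from the Embed endpoint, cosine similarity is a common way to rank document snippets against a query, as in the search-relevance example above. The Python sketch below shows that approach; the helper names and the toy 3-dimensional vectors are illustrative assumptions, not part of the node.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec: list[float], snippet_vecs: list[list[float]]):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(snippet_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the "embeddings" array returned by the Embed endpoint.
query = [1.0, 0.0, 1.0]
snippets = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(rank_snippets(query, snippets))
```

Real embedding vectors have far more dimensions than this example, but the ranking logic is the same.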

Properties

Name         Meaning
Text         Single text string to generate an embedding for (used when Input Type is "Single Text").
Input Type   Whether to process a single text ("Single Text") or multiple texts ("Multiple Texts").
Texts        Multiple texts, one per line, to generate embeddings for (used when Input Type is "Multiple Texts").
Endpoint     The API endpoint to call:
               - "Embed": generate embeddings for the texts.
               - "Similarity": calculate the similarity between two texts (exactly two texts are required).
Task Type    The task type used for embedding generation:
               - Classification
               - Clustering
               - Search Query
               - Search Document
               - Default (no prefix)
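The interplay between Input Type, Text, Texts, and Endpoint can be sketched as follows. This is a hypothetical Python illustration, not the node's source: `resolve_texts` and `check_endpoint` are invented names, and trimming each line and skipping blank lines are assumptions about how the node cleans its input. The two error checks mirror rules stated under Troubleshooting.

```python
def resolve_texts(input_type: str, text: str = "", texts: str = "") -> list[str]:
    """Turn the Text / Texts properties into a list of non-empty strings."""
    if input_type == "Single Text":
        candidates = [text]
    elif input_type == "Multiple Texts":
        # Assumption: one text per line; blank lines are skipped below.
        candidates = texts.splitlines()
    else:
        raise ValueError(f"unknown input type: {input_type!r}")
    # Assumption: surrounding whitespace is trimmed from each text.
    cleaned = [t.strip() for t in candidates if t.strip()]
    if not cleaned:
        raise ValueError("no valid text provided")
    return cleaned


def check_endpoint(endpoint: str, items: list[str]) -> None:
    """Reject unsupported endpoints and enforce the two-text similarity rule."""
    if endpoint not in ("embed", "similarity"):
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    if endpoint == "similarity" and len(items) != 2:
        raise ValueError("similarity endpoint requires exactly two texts")


items = resolve_texts("Multiple Texts", texts="red running shoes\n\n  trail shoes  \n")
check_endpoint("similarity", items)  # passes: exactly two texts remain
print(items)  # ['red running shoes', 'trail shoes']
```

Because the Similarity endpoint needs exactly two texts, it only makes sense together with "Multiple Texts"; a single text would fail the check.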

Output

The output JSON structure depends on the selected endpoint:

  • Embed endpoint:

    {
      "texts": [/* array of input texts */],
      "embeddings": [/* array of embedding vectors, each an array of numbers */],
      "endpoint": "embed",
      "task": "classification" | "clustering" | "search_query" | "search_document" | "default",
      "model": "string", // model name or "unknown"
      "count": number, // number of embeddings returned
      "dimensions": number, // dimensionality of each embedding vector
      "metadata": {
        "baseUrl": "host:port",
        "timestamp": "ISO timestamp"
      }
    }
    
  • Similarity endpoint:

    {
      "texts": [/* the two input texts */],
      "similarity": number, // similarity score between the two texts
      "endpoint": "similarity",
      "task": "classification" | "clustering" | "search_query" | "search_document" | "default",
      "text1": "string", // first text (may be echoed from response)
      "text2": "string", // second text (may be echoed from response)
      "metadata": {
        "baseUrl": "host:port",
        "timestamp": "ISO timestamp"
      }
    }
    

No binary data output is produced by this node.
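To make the Embed output fields concrete, here is a hypothetical Python sketch that shapes a raw `/embed` response into the structure shown above. It assumes the server replies with an `embeddings` array and an optional `model` name; the function name `build_embed_output` and that response shape are assumptions, not the node's actual code.

```python
from datetime import datetime, timezone


def build_embed_output(texts, response, task, host, port):
    """Shape a raw /embed response like the Embed endpoint output above."""
    embeddings = response.get("embeddings")
    if not isinstance(embeddings, list):
        raise ValueError("invalid API response: missing 'embeddings'")
    return {
        "texts": texts,
        "embeddings": embeddings,
        "endpoint": "embed",
        "task": task,
        # Falls back to "unknown" when the server does not report a model name.
        "model": response.get("model", "unknown"),
        "count": len(embeddings),
        # Assumption: every vector has the same length as the first one.
        "dimensions": len(embeddings[0]) if embeddings else 0,
        "metadata": {
            "baseUrl": f"{host}:{port}",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }


fake_response = {"embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}
item = build_embed_output(["a", "b"], fake_response, "clustering", "localhost", 8080)
print(item["count"], item["dimensions"], item["model"])  # 2 3 unknown
```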

Dependencies

  • Requires access to a custom embedding API server, specified by host and port.
  • Requires an API key credential for authenticating with the embedding API.
  • The node sends HTTP POST requests to either the /embed or the /similarity endpoint of the configured server.
  • The API host, port, and API key must all be configured in the node credentials.
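The dependencies above say only that the node POSTs to /embed or /similarity using an API key. The Python sketch below builds such a request with the standard library's `urllib.request` without sending it. The JSON body fields (`texts`, `task`), the `http://` scheme, and the `Authorization: Bearer` header are all assumptions for illustration; check what your embedding server actually expects.

```python
import json
import urllib.request


def build_request(host: str, port: int, api_key: str, endpoint: str, payload: dict):
    """Build (but do not send) a POST request to the embedding server."""
    if not host or not port:
        raise ValueError("API host and port are required")
    if endpoint not in ("embed", "similarity"):
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    url = f"http://{host}:{port}/{endpoint}"  # assumed scheme and path layout
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; your server may use a different header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request("localhost", 8080, "my-key", "embed",
                    {"texts": ["hello"], "task": "search_query"})
print(req.get_method(), req.full_url)  # POST http://localhost:8080/embed
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```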

Troubleshooting

  • Missing Host or Port: The node throws an error if the API host or port is not provided in the credentials.
  • Invalid Text Input: If no valid text is provided (empty or whitespace-only), the node throws an error.
  • Similarity Endpoint Requires Exactly Two Texts: When using the similarity endpoint, exactly two texts must be provided; otherwise, an error is thrown.
  • Unsupported Endpoint: Selecting an endpoint other than "embed" or "similarity" results in an error.
  • Invalid API Response: If the API response lacks the expected field (embeddings for the embed endpoint, or similarity for the similarity endpoint), an error is raised.
  • Network or Authentication Errors: Failures in HTTP requests due to network issues or invalid API keys will cause errors.
  • Use the "Continue On Fail" option to handle errors gracefully and continue processing remaining items.
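The response check and the "Continue On Fail" behavior described above can be sketched as a per-item loop. This Python sketch is hypothetical: `validate_response`, `process_items`, and the stub `fake_api` are invented names, and recording a failed item as `{"error": message}` is an assumption modeled on how n8n nodes commonly report errors when Continue On Fail is enabled, not a confirmed detail of this node.

```python
def validate_response(endpoint: str, body: dict) -> None:
    """Raise if the API response lacks the field the endpoint should return."""
    expected = {"embed": "embeddings", "similarity": "similarity"}
    if endpoint not in expected:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    if expected[endpoint] not in body:
        raise ValueError(f"invalid API response: missing '{expected[endpoint]}'")


def process_items(items, call_api, continue_on_fail=False):
    """Call the API per item; with continue_on_fail, record errors instead of stopping."""
    results = []
    for item in items:
        try:
            body = call_api(item)
            validate_response(item["endpoint"], body)
            results.append(body)
        except Exception as exc:  # network, auth, and validation errors alike
            if not continue_on_fail:
                raise
            results.append({"error": str(exc)})
    return results


# Stubbed API: the embed item receives a malformed (empty) response.
def fake_api(item):
    return {"similarity": 0.9} if item["endpoint"] == "similarity" else {}


out = process_items([{"endpoint": "similarity"}, {"endpoint": "embed"}],
                    fake_api, continue_on_fail=True)
print(out)
```

Without `continue_on_fail`, the first failing item stops the whole run, which matches the default behavior the troubleshooting list implies.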
