Text Embeddings

Convert text to embeddings locally with Transformers.js, without calling external APIs

Overview

This node converts input text into numerical embeddings by running Transformers.js models locally, without relying on external API calls. It supports two pre-trained embedding models that generate vector representations of text, which are useful for tasks like semantic search, clustering, recommendation systems, or similarity comparisons.

Typical use cases include:

  • Generating embeddings for short texts or documents to enable semantic similarity searches.
  • Creating vector representations for downstream machine learning or data analysis workflows.
  • Normalizing embeddings to improve the quality of similarity calculations.
  • Optionally including metadata about the generated embeddings for auditing or debugging.

For example, you can input a product description and get its embedding vector to find similar products based on semantic content.
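
As a rough illustration of the product example, the sketch below ranks candidates by cosine similarity to a query vector. The names `cosineSimilarity`, `rankBySimilarity`, and `Product` are hypothetical helpers and not part of the node, and the toy 3-dimensional vectors stand in for real embeddings.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // A zero vector has no direction; this sketch defines its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape: a product paired with the embedding produced for its description.
interface Product {
  name: string;
  embedding: number[];
}

// Returns products sorted from most to least similar to the query vector.
function rankBySimilarity(
  query: number[],
  products: Product[],
): { name: string; score: number }[] {
  return products
    .map((p) => ({ name: p.name, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors; real embeddings would have 384 or 768 dimensions.
const ranked = rankBySimilarity([1, 0, 0], [
  { name: "coffee maker", embedding: [0, 1, 0] },
  { name: "trail shoes", embedding: [0.9, 0.1, 0] },
]);
```

Here `ranked` lists "trail shoes" first, because its vector points in nearly the same direction as the query.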

Properties

  • Text Input: The raw text string to convert into an embedding.
  • Model: The embedding model to use:
    • all-MiniLM-L6-v2 (Recommended): Lightweight, 384 dimensions, fast and efficient.
    • all-mpnet-base-v2: Higher quality, 768 dimensions, slower but more accurate.
  • Output Field: The name of the output JSON field where the embedding is stored.
  • Normalize Embeddings: Whether to normalize the resulting embedding vectors (recommended for similarity calculations).
  • Include Metadata: Whether to add metadata about the embedding, such as the model used, vector dimensions, normalization status, input text length, and generation timestamp.
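
To make the Normalize Embeddings option more concrete, here is a minimal sketch that assumes normalization means scaling each vector to unit L2 length; the source does not state which norm the node applies. `l2Normalize` and `dotProduct` are hypothetical helper names.

```typescript
// Scales a vector to unit length (L2 norm of 1) and returns a new array.
function l2Normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return vector.slice(); // an all-zero vector is returned unchanged
  return vector.map((x) => x / norm);
}

// Dot product of two vectors; for unit-length vectors it equals cosine similarity.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

const unitA = l2Normalize([3, 4]);
const unitB = l2Normalize([6, 8]);
// Both inputs point in the same direction, so their normalized dot product is about 1.
const similarity = dotProduct(unitA, unitB);
```

Under this assumption, normalized vectors can be compared with a plain dot product, which is one reason normalization is commonly recommended for similarity calculations.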

Output

The node outputs an array of items, each containing the original input JSON extended with:

  • A new field (name configurable via Output Field) containing the embedding vector as an array of numbers.
  • If Include Metadata is enabled, a metadata field named <Output Field>_metadata with these details:
    • model: The embedding model used.
    • dimensions: Number of elements in the embedding vector.
    • normalized: Boolean indicating if embeddings were normalized.
    • text_length: Length of the input text.
    • generated_at: ISO timestamp when embeddings were created.
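
The output shape described above might be assembled roughly as follows. This is a sketch, not the node's actual source: `buildOutput`, `EmbeddingOptions`, and `Json` are hypothetical names, and measuring `text_length` with `String.length` (UTF-16 code units) is an assumption.

```typescript
type Json = Record<string, unknown>;

// Hypothetical bundle of the node's relevant settings.
interface EmbeddingOptions {
  model: string;
  outputField: string;
  normalize: boolean;
  includeMetadata: boolean;
}

// Extends the input JSON with the embedding and, optionally, its metadata.
function buildOutput(
  input: Json,
  text: string,
  embedding: number[],
  options: EmbeddingOptions,
): Json {
  const output: Json = { ...input, [options.outputField]: embedding };
  if (options.includeMetadata) {
    output[`${options.outputField}_metadata`] = {
      model: options.model,
      dimensions: embedding.length,
      normalized: options.normalize,
      text_length: text.length, // assumption: counted in UTF-16 code units
      generated_at: new Date().toISOString(),
    };
  }
  return output;
}

const item = buildOutput(
  { id: 42 },
  "Waterproof hiking boots",
  [0.12, -0.03, 0.48], // shortened; a real vector would have 384 or 768 entries
  { model: "all-MiniLM-L6-v2", outputField: "embedding", normalize: true, includeMetadata: true },
);
```

With Output Field set to `embedding`, the resulting item carries `id`, `embedding`, and `embedding_metadata`.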

If an error occurs for an item (e.g., empty text) and "Continue On Fail" is enabled, the output for that item includes an error field describing the issue. Otherwise, the node throws the error.

The node does not output binary data.

Dependencies

  • Uses the feature-extraction pipeline of the @huggingface/transformers library (Transformers.js), running the models locally.
  • No external API calls or internet connection required once models are cached.
  • Requires sufficient local resources to load and run the selected transformer model.
  • No special n8n credentials or environment variables needed.

Troubleshooting

  • Common issues:

    • Empty input text: The node throws an error unless "Continue On Fail" is enabled, in which case it marks the item with an error message.
    • Model loading failure: If the specified model cannot be loaded (e.g., due to missing files or incompatible environment), the node throws an error indicating failure to load the embedding model.
    • Performance: Larger models (like all-mpnet-base-v2) require more memory and CPU time; ensure your environment can handle them.
  • Error messages:

    • "Text is empty": Input text was blank or whitespace only. Provide valid text or enable "Continue On Fail".
    • "Failed to load embedding model: ...": Indicates problems loading the chosen model. Verify model availability and environment compatibility.
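
The empty-text check and the "Continue On Fail" behavior can be sketched as below. The helpers `assertNonEmptyText` and `validateItems`, and the idea of reading the text from a named JSON field, are assumptions made for illustration; only the "Text is empty" message and the error field come from the description above.

```typescript
type Json = Record<string, unknown>;

// Throws the documented message when the text is blank or whitespace only.
function assertNonEmptyText(text: string): void {
  if (text.trim().length === 0) {
    throw new Error("Text is empty");
  }
}

// Validates each item's text before embedding. With continueOnFail, a failing
// item is returned with an `error` field instead of aborting the whole run.
function validateItems(items: Json[], textField: string, continueOnFail: boolean): Json[] {
  return items.map((json) => {
    try {
      assertNonEmptyText(String(json[textField] ?? ""));
      return json;
    } catch (err) {
      if (!continueOnFail) throw err;
      const message = err instanceof Error ? err.message : String(err);
      return { ...json, error: message };
    }
  });
}

// The second item is whitespace only, so it is marked with an error field.
const checked = validateItems([{ text: "hello" }, { text: "   " }], "text", true);
```

With `continueOnFail` set to `false`, the same call would throw on the second item and stop processing.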
