Cosine Similarity

Calculates cosine similarity between two arrays of vectors

Overview

This node calculates the cosine similarity between the vectors of two input arrays. Cosine similarity is the cosine of the angle between two non-zero vectors in an inner product space. Because it depends only on the vectors' directions and not on their magnitudes, it is commonly used to measure how similar two vectors are.
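For reference, the standard definition for two n-dimensional vectors a and b is written out below, followed by a small worked example. The vectors in the example are illustrative and are not taken from the node:

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}},
  \qquad -1 \le \cos\theta \le 1

% Worked example: a = (3, 4), b = (4, 3)
\cos\theta = \frac{3\cdot 4 + 4\cdot 3}{\sqrt{3^2 + 4^2}\,\sqrt{4^2 + 3^2}}
           = \frac{24}{5 \cdot 5} = 0.96
```

Because only direction matters, scaling either vector (for example, doubling every component) leaves the result unchanged.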

Typical use cases include:

  • Comparing text embeddings or feature vectors in machine learning workflows.
  • Finding similar items or documents based on vector representations.
  • Filtering pairs of vectors by a minimum similarity threshold to identify relevant matches.

For example, given two sets of vectors representing different items, the node outputs every pair whose cosine similarity meets or exceeds the specified threshold, which helps identify closely related items.
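A minimal sketch of the pairing step, assuming every vector in Array A is compared with every vector in Array B and a pair is kept when its similarity meets or exceeds the threshold. The names findMatches, cosineSimilarity, and Match are hypothetical, and scoring a zero vector as 0 is an assumption (the cosine is undefined there):

```typescript
// Hypothetical shape of one output entry.
interface Match {
  vectorA: number[];
  vectorB: number[];
  similarity: number;
}

// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector scores 0 instead of producing NaN.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Compare every vector in A with every vector in B and keep pairs
// whose similarity meets or exceeds the threshold.
function findMatches(vectorsA: number[][], vectorsB: number[][], threshold: number): Match[] {
  const matches: Match[] = [];
  for (const vectorA of vectorsA) {
    for (const vectorB of vectorsB) {
      const similarity = cosineSimilarity(vectorA, vectorB);
      if (similarity >= threshold) {
        matches.push({ vectorA, vectorB, similarity });
      }
    }
  }
  return matches;
}

const output = { matches: findMatches([[1, 0], [0, 1]], [[1, 0], [1, 1]], 0.9) };
console.log(JSON.stringify(output));
// {"matches":[{"vectorA":[1,0],"vectorB":[1,0],"similarity":1}]}
```

Comparing every vector in A with every vector in B costs |A| × |B| similarity computations, so the work grows quickly with larger inputs.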

Properties

Name                 | Meaning
Array of Vectors A   | The first array of vectors to compare, e.g., [[1,2,3],[4,5,6]].
Array of Vectors B   | The second array of vectors to compare, e.g., [[7,8,9],[10,11,12]].
Similarity Threshold | Minimum cosine similarity (between 0 and 1) a pair must reach to be included in the output.
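The vector properties take JSON arrays, and they may also arrive as strings that must be parsed first. A hypothetical sketch of normalizing the three properties before use: parseVectorsParam and parseThreshold are illustrative names, and the range check assumes the documented 0 to 1 bounds are enforced. The error text for unparseable input reuses the troubleshooting entry title; the node's exact message may differ:

```typescript
// Hypothetical: vector inputs may be JSON strings such as "[[1,2,3],[4,5,6]]"
// or already-parsed arrays. Shape checks happen in a separate step.
function parseVectorsParam(value: unknown): unknown {
  if (typeof value !== "string") {
    return value; // already structured data, returned unchanged
  }
  try {
    return JSON.parse(value);
  } catch {
    throw new Error("Invalid JSON format for vector arrays");
  }
}

// Hypothetical check that the threshold is a number in the documented range.
function parseThreshold(value: unknown): number {
  const threshold = Number(value);
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error("Similarity Threshold must be a number between 0 and 1");
  }
  return threshold;
}

const vectorsA = parseVectorsParam("[[1,2,3],[4,5,6]]"); // string input
const vectorsB = parseVectorsParam([[7, 8, 9], [10, 11, 12]]); // array input
const threshold = parseThreshold(0.8);
console.log(vectorsA, vectorsB, threshold);
```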

Output

The node outputs a JSON object with a single field, "matches", which is an array of objects. Each object describes one pair, a vector from Array A and a vector from Array B, whose similarity meets or exceeds the threshold, structured as:

{
  "matches": [
    {
      "vectorA": [/* vector from Array A */],
      "vectorB": [/* vector from Array B */],
      "similarity": /* cosine similarity value */
    },
    ...
  ]
}
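For typed downstream code, the output shape can be mirrored with TypeScript interfaces. This is a sketch: the type names and the strongestMatch helper are hypothetical, it assumes "matches" may be empty when no pair reaches the threshold, and the sample similarity values are computed from the listed vectors rather than taken from a real run:

```typescript
// Types mirroring the documented output (names are hypothetical).
interface CosineMatch {
  vectorA: number[]; // vector taken from Array of Vectors A
  vectorB: number[]; // vector taken from Array of Vectors B
  similarity: number; // cosine similarity, at or above the threshold
}

interface CosineSimilarityOutput {
  matches: CosineMatch[];
}

// Hypothetical downstream helper: return the highest-similarity pair,
// or undefined when no pair matched.
function strongestMatch(output: CosineSimilarityOutput): CosineMatch | undefined {
  return output.matches.reduce<CosineMatch | undefined>(
    (best, m) => (best === undefined || m.similarity > best.similarity ? m : best),
    undefined,
  );
}

const example: CosineSimilarityOutput = {
  matches: [
    { vectorA: [1, 0], vectorB: [1, 0], similarity: 1 },
    { vectorA: [3, 4], vectorB: [4, 3], similarity: 0.96 },
  ],
};
console.log(strongestMatch(example)?.similarity); // 1
```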

No binary data is produced by this node.

Dependencies

  • No external services or APIs are required.
  • The node depends on standard JavaScript math functions and n8n's workflow utilities for error handling.
  • No special environment variables or credentials are needed.

Troubleshooting

  • Invalid JSON format for vector arrays: This error occurs if the input vectors are provided as strings but cannot be parsed into valid JSON arrays. Ensure the input is correctly formatted JSON.
  • Array A/B must be a valid array of vectors: The inputs must be arrays containing only arrays (vectors). Check that each element is itself an array.
  • All vectors must have the same dimension: Every vector, in both arrays, must have the same length, because the dot product used in cosine similarity is only defined for vectors of equal dimension. Mismatched vector sizes cause this error.
  • Empty arrays: Providing empty arrays will trigger validation errors; ensure both arrays contain at least one vector.

To resolve these issues, verify the input format and consistency of vector dimensions before running the node.
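The structural checks in this list can be sketched as a single validation function. validateVectorArrays and its error texts are illustrative, modeled on the entry titles above; the node's actual checks and messages may differ. The sketch assumes any string input has already been parsed into JSON:

```typescript
// Hypothetical validator for already-parsed inputs.
function validateVectorArrays(a: unknown, b: unknown): { a: number[][]; b: number[][] } {
  const check = (value: unknown, label: string): number[][] => {
    // Each input must be an array whose elements are themselves arrays.
    if (!Array.isArray(value) || !value.every((v) => Array.isArray(v))) {
      throw new Error(`Array ${label} must be a valid array of vectors`);
    }
    // Empty inputs are rejected.
    if (value.length === 0) {
      throw new Error(`Array ${label} must contain at least one vector`);
    }
    return value as number[][];
  };

  const vectorsA = check(a, "A");
  const vectorsB = check(b, "B");

  // Every vector in both arrays must share one dimension.
  const dimension = vectorsA[0].length;
  if (![...vectorsA, ...vectorsB].every((v) => v.length === dimension)) {
    throw new Error("All vectors must have the same dimension");
  }
  return { a: vectorsA, b: vectorsB };
}
```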
