embedding

stable

Vector math utilities for working with embedding vectors. Compute similarity, distance, and arithmetic operations on float arrays returned by AI embedding models.

use plugin embedding::{cosine_similarity, euclidean_distance, dot_product, …}

14 functions AI & ML

Functions (14)

cosine_similarity Cosine similarity between two vectors
euclidean_distance Euclidean (L2) distance between two vectors
dot_product Dot product of two vectors
normalize Scale vector to unit length
magnitude Euclidean length of a vector
batch_cosine Cosine similarity of query against many vectors
topk_similar Top-k most similar vectors by cosine score
average_vectors Element-wise average of a set of vectors
add_vectors Element-wise addition of two vectors
scale_vector Multiply every element by a scalar
dimension Number of elements in a vector
zeros Create a zero vector of length n
manhattan_distance Manhattan (L1) distance between two vectors
random_unit Generate a pseudo-random unit vector of length n

Overview

embedding is a dependency-free toolkit of vector math primitives for working with the dense float arrays that AI embedding models produce. There are no handles or stateful objects: a vector is just an ordinary array of numbers, and every function takes and returns plain arrays (or scalars), so you can store, compare, and pass embeddings around freely. Reach for it whenever you need to score similarity, measure distance, do element-wise arithmetic, or build small helpers like centroids and zero/unit vectors on top of model output.

The mental model is simple: get embedding vectors from your model, then use the similarity and distance functions (cosine_similarity, euclidean_distance, dot_product, manhattan_distance) to rank them, the arithmetic helpers (add_vectors, subtract_vectors, scale_vector, average_vectors, normalize) to transform them, and the batch helpers (batch_cosine, topk_similar) to run a query against a whole corpus at once.

Common patterns

Run a query embedding against a corpus and pull back the closest matches:

use plugin embedding::{topk_similar}

let query  = [0.12, 0.83, 0.04, 0.55]
let corpus = [
  [0.10, 0.80, 0.05, 0.50],
  [0.90, 0.10, 0.20, 0.05],
  [0.15, 0.78, 0.02, 0.60]
]
let hits = topk_similar(query, corpus, 2)
for hit in hits {
  print("doc {hit["index"]} scored {hit["score"]}")
}

Build a centroid embedding for a cluster, then score how well a new vector fits:

use plugin embedding::{average_vectors, normalize, cosine_similarity}

let cluster  = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0], [1.1, 0.1, 0.2]]
let centroid = normalize(average_vectors(cluster))

let candidate = normalize([1.0, 0.25, 0.1])
print("fit: {cosine_similarity(centroid, candidate)}")

Do classic vector arithmetic (the "king - man + woman ~ queen" analogy):

use plugin embedding::{add_vectors, subtract_vectors, normalize, cosine_similarity}

let king  = [0.90, 0.10, 0.80]
let man   = [0.80, 0.00, 0.70]
let woman = [0.10, 0.90, 0.10]
let queen = [0.20, 0.95, 0.20]

let guess = normalize(add_vectors(subtract_vectors(king, man), woman))
print("analogy match: {cosine_similarity(guess, normalize(queen))}")

cosine_similarity(v1, v2) → number

Cosine similarity between two vectors

Returns a value in [-1, 1] that measures how closely the directions of two vectors align: 1.0 means identical direction, 0.0 means orthogonal, and -1.0 means opposite directions. Commonly used to compare AI embedding outputs.

use plugin embedding::{cosine_similarity}

let a = [1.0, 0.0, 0.0]
let b = [1.0, 0.0, 0.0]
let c = [0.0, 1.0, 0.0]

print(cosine_similarity(a, b))  // 1.0
print(cosine_similarity(a, c))  // 0.0

Cosine similarity ignores magnitude, so a vector and its scaled copy are still perfectly similar:

use plugin embedding::{cosine_similarity, scale_vector}

let v = [0.3, 0.7, 0.1]
print(cosine_similarity(v, scale_vector(v, 10.0)))  // 1.0
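
For readers who want to see the arithmetic behind the score, the definition can be sketched in Python. This is an illustrative reference and not the plugin's source: the function name only mirrors the plugin's, and returning 0.0 for a zero-magnitude input is an assumption, since the documentation does not say how that case is handled.

```python
import math

def cosine_similarity(v1, v2):
    """Reference sketch: dot(v1, v2) / (|v1| * |v2|)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: the plugin's zero-vector behaviour is undocumented.
        return 0.0
    return dot / (norm1 * norm2)

print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))           # -1.0
```

Dividing by both magnitudes is what makes the score independent of vector length, which is why a vector and its scaled copy score 1.0.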

euclidean_distance(v1, v2) → number

Euclidean (L2) distance between two vectors

Computes the straight-line distance between two points in n-dimensional space (L2 norm of the difference vector).

use plugin embedding::{euclidean_distance}

let a = [0.0, 0.0]
let b = [3.0, 4.0]
print(euclidean_distance(a, b))  // 5.0
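
The definition can be written out in a few lines. The Python below is an illustrative sketch rather than the plugin's implementation; the function name simply mirrors the plugin's. It also checks a well-known identity that may help when choosing a metric: for unit-length vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity, so both metrics rank normalized embeddings in the same order.

```python
import math

def euclidean_distance(v1, v2):
    """Square root of the summed squared element-wise differences."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0

# Two unit-length vectors: squared distance should match 2 - 2 * cosine.
a = [0.6, 0.8]
b = [0.8, 0.6]
cosine = sum(x * y for x, y in zip(a, b))  # unit vectors, so dot == cosine
squared = euclidean_distance(a, b) ** 2
print(math.isclose(squared, 2.0 - 2.0 * cosine))  # True
```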

dot_product(v1, v2) → number

Dot product of two vectors

Returns the sum of element-wise products. Equivalent to cosine_similarity(v1, v2) * |v1| * |v2|, so when both vectors are already normalized to unit length the dot product equals their cosine similarity.

use plugin embedding::{dot_product, normalize}

let a = normalize([3.0, 4.0])
let b = normalize([4.0, 3.0])
print(dot_product(a, b))  // ~0.96
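
The relationship between the dot product, the magnitudes, and the cosine score can be checked numerically. The Python below is a hedged sketch, not the plugin's code; dot_product and magnitude are re-implemented here only so the example is self-contained.

```python
import math

def dot_product(v1, v2):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(v1, v2))

def magnitude(vec):
    """Euclidean length, computed as sqrt(vec . vec)."""
    return math.sqrt(dot_product(vec, vec))

v1 = [3.0, 4.0]
v2 = [4.0, 3.0]

raw = dot_product(v1, v2)
cosine = raw / (magnitude(v1) * magnitude(v2))

print(raw)     # 24.0
print(cosine)  # 0.96
```

Both inputs have magnitude 5.0, so the raw dot product (24.0) is the cosine score (0.96) multiplied by 25.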

normalize(vec) → table

Scale vector to unit length

Returns a new vector scaled to unit length (magnitude 1.0). Returns the zero vector unchanged if the input has zero magnitude.

use plugin embedding::{normalize, magnitude}

let v = normalize([3.0, 0.0, 4.0])
print(magnitude(v))  // 1.0

Normalizing both operands first turns dot_product into cosine similarity, which is handy when you want to precompute unit vectors once and reuse them:

use plugin embedding::{normalize, dot_product}

let a = normalize([2.0, 1.0])
let b = normalize([1.0, 2.0])
print(dot_product(a, b))  // ~0.8, same as cosine_similarity of the originals
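
A minimal Python sketch of the behaviour described above, including the zero-vector case. It is an illustration under the stated documentation, not the plugin's source, and the function name only mirrors the plugin's.

```python
import math

def normalize(vec):
    """Return a new vector scaled to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0.0:
        # Zero vector is returned unchanged, as the documentation states.
        return list(vec)
    return [x / length for x in vec]

print(normalize([3.0, 0.0, 4.0]))  # [0.6, 0.0, 0.8]
print(normalize([0.0, 0.0]))       # [0.0, 0.0]
```

The guard matters because dividing by a zero length would otherwise fail or produce non-finite values.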

magnitude(vec) → number

Euclidean length of a vector

Returns the Euclidean length (L2 norm) of a vector.

use plugin embedding::{magnitude}

print(magnitude([3.0, 4.0]))  // 5.0

batch_cosine(query, vectors) → table

Cosine similarity of query against many vectors

Computes cosine similarity between query and every vector in vectors. Returns a table of floats, one per input vector, in the same order.

use plugin embedding::{batch_cosine}

let query = [1.0, 0.0, 0.0]
let corpus = [
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.707, 0.707, 0.0]
]
let scores = batch_cosine(query, corpus)
print(scores)  // [1.0, 0.0, ~0.707]

Use it to find the single best match by scanning the returned scores yourself:

use plugin embedding::{batch_cosine, dimension}

let query  = [1.0, 0.0, 0.0]
let corpus = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
let scores = batch_cosine(query, corpus)

let best = 1
for i in 1..=dimension(scores) {
  if scores[i] > scores[best] { best = i }
}
print("best vector is #{best}")

topk_similar(query, vectors, k) → table

Top-k most similar vectors by cosine score

Scores query against every vector in vectors and returns the k highest-scoring matches. Each result row contains {index, score} where index is the 0-based position in vectors.

use plugin embedding::{topk_similar}

let query  = [1.0, 0.0]
let corpus = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.0, 1.0]]
let top2   = topk_similar(query, corpus, 2)
print(top2)
// [{index: 0, score: ~0.99}, {index: 2, score: ~0.97}]
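
One way to picture the ranking is score everything, sort, and keep the first k rows. The Python below is a sketch under assumptions, not the plugin's implementation: the descending order is inferred from the example output, and the tie-breaking rule (earlier index first, via a stable sort) is a guess the documentation does not confirm.

```python
import math

def cosine(v1, v2):
    """Cosine similarity; assumes neither vector has zero magnitude."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) *
                  math.sqrt(sum(b * b for b in v2)))

def topk_similar(query, vectors, k):
    """Return the k best rows as {"index", "score"}, highest score first."""
    scored = [{"index": i, "score": cosine(query, v)}
              for i, v in enumerate(vectors)]  # index is 0-based
    scored.sort(key=lambda row: row["score"], reverse=True)
    return scored[:k]

corpus = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.0, 1.0]]
for row in topk_similar([1.0, 0.0], corpus, 2):
    print(row["index"], round(row["score"], 2))
# 0 0.99
# 2 0.97
```

For a large corpus a partial selection (for example a heap) avoids sorting every score, but the result is the same.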

average_vectors(vectors) → table

Element-wise average of a set of vectors

Returns a new vector whose elements are the element-wise mean across all input vectors. Useful for computing a centroid embedding.

use plugin embedding::{average_vectors}

let vs  = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
let avg = average_vectors(vs)
print(avg)  // [0.667, 0.667]
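
The element-wise mean can be sketched in Python by transposing the input and averaging each column. This is an illustration, not the plugin's code; raising an error on an empty input is an assumption, since the documentation does not describe that case.

```python
def average_vectors(vectors):
    """Element-wise mean across same-length vectors (a centroid)."""
    if not vectors:
        # Assumption: the plugin's behaviour for an empty list is undocumented.
        raise ValueError("need at least one vector")
    count = len(vectors)
    # zip(*vectors) groups the i-th element of every vector together.
    return [sum(column) / count for column in zip(*vectors)]

avg = average_vectors([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(x, 3) for x in avg])  # [0.667, 0.667]
```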

add_vectors(v1, v2) → table / subtract_vectors(v1, v2) → table

Element-wise addition of two vectors

Element-wise addition or subtraction of two same-length vectors. Useful for vector arithmetic (e.g., word analogy: king - man + woman).

use plugin embedding::{add_vectors, subtract_vectors}

let king  = [0.9, 0.1, 0.8]
let man   = [0.8, 0.0, 0.7]
let woman = [0.1, 0.9, 0.1]
let queen = add_vectors(subtract_vectors(king, man), woman)
print(queen)  // approximately [0.2, 1.0, 0.2]

scale_vector(vec, scalar) → table

Multiply every element by a scalar

Multiplies every element of vec by scalar. Returns a new vector.

use plugin embedding::{scale_vector}

let v      = [1.0, 2.0, 3.0]
let halved = scale_vector(v, 0.5)
print(halved)  // [0.5, 1.0, 1.5]

dimension(vec) → int

Number of elements in a vector

Returns the number of elements in the vector.

use plugin embedding::{dimension, zeros}

let v = zeros(128)
print(dimension(v))  // 128

zeros(n) → table / ones(n) → table

Create a zero vector of length n

Create a vector of n zeros or ones. Useful as baselines or for initializing accumulators.

use plugin embedding::{zeros, ones}

let z = zeros(4)
let o = ones(4)
print(z)  // [0.0, 0.0, 0.0, 0.0]
print(o)  // [1.0, 1.0, 1.0, 1.0]

manhattan_distance(v1, v2) → number

Manhattan (L1) distance between two vectors

Returns the L1 (taxicab) distance: the sum of absolute element-wise differences.

use plugin embedding::{manhattan_distance}

let a = [1.0, 2.0, 3.0]
let b = [4.0, 0.0, 1.0]
print(manhattan_distance(a, b))  // 7.0
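
To see how the L1 distance differs from the L2 distance on the same pair, both can be sketched side by side in Python. This is an illustrative comparison, not the plugin's source; the function names mirror the plugin's only for readability.

```python
import math

def manhattan_distance(v1, v2):
    """L1 distance: sum of absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

def euclidean_distance(v1, v2):
    """L2 distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, 1.0]
print(manhattan_distance(a, b))            # 7.0   (3 + 2 + 2)
print(round(euclidean_distance(a, b), 3))  # 4.123 (sqrt of 9 + 4 + 4)
```

For any pair of vectors the L1 distance is at least as large as the L2 distance, because it sums the per-element gaps without squaring them.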

random_unit(n) → table

Generate a pseudo-random unit vector of length n

Generates a pseudo-random unit vector of length n using a deterministic linear congruential generator (LCG) seeded from n, so the same n always yields the same vector. The result has magnitude 1.0.

use plugin embedding::{random_unit, magnitude}

let v = random_unit(3)
print(magnitude(v))  // 1.0
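
The general idea of an LCG-backed unit vector can be sketched in Python. Everything specific here is an assumption made for illustration: the multiplier, increment, and modulus are the widely published Numerical Recipes constants, and the mapping of each draw into [-1, 1) is a guess. The plugin's actual constants and mapping are not documented, so this sketch will not reproduce its output.

```python
import math

# Hypothetical LCG parameters (Numerical Recipes); not taken from the plugin.
A, C, M = 1664525, 1013904223, 2 ** 32

def random_unit(n):
    """Deterministic pseudo-random unit vector of length n, seeded from n."""
    state = n
    raw = []
    for _ in range(n):
        state = (A * state + C) % M          # advance the generator
        raw.append(state / M * 2.0 - 1.0)    # map the draw into [-1, 1)
    length = math.sqrt(sum(x * x for x in raw))
    if length == 0.0:
        return raw  # empty or all-zero draw: nothing to scale
    return [x / length for x in raw]

v = random_unit(3)
print(math.isclose(math.sqrt(sum(x * x for x in v)), 1.0))  # True
print(random_unit(3) == v)                                  # True
```

Because the seed depends only on n, repeated calls are reproducible, which suits tests and fixtures but not cases that need fresh randomness on each call.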
