Skip to content

Sparse Vector: svector since v0.3.0 ‚Äč

Unlike dense vectors, sparse vectors are very high-dimensional but contain few non-zero values.

Typically, sparse vectors can be created from:

  • Word co-occurrence matrices
  • Term frequency-inverse document frequency (TF-IDF) vectors
  • User-item interaction matrices
  • Network adjacency matrices

Sparse vectors in pgvecto.rs are called svector.

Here's an example of creating a table with a svector column and inserting values:

sql
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding svector(10) NOT NULL
);

INSERT INTO items (embedding) VALUES ('[0.1,0,0,0,0,0,0,0,0,0]'), ('[0,0,0,0,0,0,0,0,0,0.5]');

Index can be created on svector type as well.

sql
CREATE INDEX your_index_name ON items USING vectors (embedding svector_l2_ops);

SELECT * FROM items ORDER BY embedding <-> '[0.3,0,0,0,0,0,0,0,0,0]' LIMIT 1;

We support three operators to calculate the distance between two svector values.

  • <-> (svector_l2_ops): squared Euclidean distance, defined as .
  • <#> (svector_dot_ops): negative dot product, defined as .
  • <=> (svector_cos_ops): cosine distance, defined as .

There is also a function to_svector to create a svector. It will set the value at the specified position.

sql
-- to_svector(dim: INTEGER, position: ARRAY, value: ARRAY) -> svector
SELECT to_svector(5, '{0, 4}', '{0.3, 0.5}');
-- [0.3, 0, 0, 0, 0.5]

Sparse vectors are