pgvector adds vector similarity search to PostgreSQL, making it a viable vector database for AI embeddings, semantic search, and recommendation systems — without adding a separate service.
Install pgvector
# Ubuntu/Debian
apt-get install postgresql-15-pgvector
# Enable in database
CREATE EXTENSION vector;Store Embeddings
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
embedding VECTOR(1536) -- OpenAI text-embedding-3-small dimension
);
-- Insert with embedding
INSERT INTO documents (content, embedding)
VALUES ('PostgreSQL is a powerful database', '[0.1, 0.2, ...]');Similarity Search
-- Cosine similarity (best for normalized embeddings)
SELECT id, content,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- Operators:
-- <=> cosine distance
-- <-> L2 (Euclidean) distance
-- <#> negative inner productHNSW Index (pgvector 0.5+)
-- HNSW: faster queries, higher memory, best for most use cases
CREATE INDEX idx_docs_embedding_hnsw ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat: lower memory, good for large datasets
CREATE INDEX idx_docs_embedding_ivf ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);Python Integration
import openai, psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect('postgresql://localhost/mydb')
register_vector(conn)
# Generate embedding
resp = openai.embeddings.create(model='text-embedding-3-small', input='search query')
query_vec = resp.data[0].embedding
# Semantic search
cur = conn.cursor()
cur.execute(
'SELECT id, content FROM documents ORDER BY embedding <=> %s LIMIT 5',
(query_vec,)
)
results = cur.fetchall()Hybrid Search: Vector + Full-Text
-- Combine vector similarity with keyword filtering
SELECT id, content,
embedding <=> $1 AS vec_dist
FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgres & performance')
ORDER BY embedding <=> $1
LIMIT 10;Key Takeaways
- HNSW index is faster at query time; IVFFlat uses less memory — use HNSW for most cases
- Index must be created after inserting data — empty-table indexes are inaccurate for IVFFlat
- Use
vector_cosine_opsfor normalized embeddings (OpenAI, Cohere);vector_l2_opsfor unnormalized - Hybrid search combining pgvector with pg_tsvector gives better results than either alone
JusDB Can Help
pgvector lets you add AI-powered search to your existing PostgreSQL database. JusDB can design your vector search schema and tune index parameters.
