
Vector search in Python from scratch: embeddings and similarity search

Building a search system based on vector similarity is easier than it seems. Learn how to create a vector search engine in Python using embeddings, cosine similarity, and top-K retrieval.

Source: KDnuggets.

Vector search is a method of searching by semantic meaning rather than by exact keyword match. Instead of matching terms, we convert data into vector representations (embeddings) and look for the most similar vectors in a high-dimensional space. This is the foundation of RAG systems, semantic search, recommendation systems, and many modern AI applications.

What are embeddings and similarity

Embeddings are numeric representations of text in the form of an array of roughly 300 to 1,500+ numbers. The phrases "cat sits on a couch" and "cat rests on an armchair" will have embeddings located close to each other in vector space, even though the words used are completely different. To measure how close two vectors are, cosine similarity is used: it reflects the angle between them. The value ranges from -1 (opposite) to 1 (identical), and for typical text embeddings it usually falls between 0 and 1, which makes it easy to rank search results.
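The cosine similarity described above can be computed with a few lines of standard-library Python; this is a minimal sketch of the formula cos(θ) = (a · b) / (|a| · |b|), not a library implementation:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors -> 0.0
```

In practice, embeddings are often pre-normalized to unit length, in which case cosine similarity reduces to a plain dot product.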

Key components of vector search

A vector search engine consists of several critical parts, each important for correct operation:

  • Embedding generator — converts text into a vector (e.g., OpenAI, Hugging Face, or SentenceTransformers models)
  • Vector storage — saves embeddings (in memory, SQLite, or specialized databases like Pinecone, Weaviate)
  • Similarity function — calculates distance between vectors (cosine, Euclidean, dot product)
  • Retrieval logic — finds top-K similar results and ranks by relevance
  • Indexing — speeds up search through hierarchical data structures
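These components can be sketched as a single class with pluggable parts; all names here are illustrative, and the demo embedder is a deliberately trivial stand-in for a real model:

```python
from typing import Callable, List, Tuple

class VectorSearchEngine:
    """Minimal sketch mapping the components above onto code."""

    def __init__(self, embedder: Callable[[str], List[float]],
                 similarity: Callable[[List[float], List[float]], float]):
        self.embedder = embedder      # embedding generator
        self.similarity = similarity  # similarity function
        # vector storage: in-memory list of (text, embedding) pairs
        self.store: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self.store.append((text, self.embedder(text)))

    def top_k(self, query: str, k: int = 5) -> List[Tuple[str, float]]:
        # retrieval logic: brute-force scan; real engines add an index (HNSW, IVF)
        q = self.embedder(query)
        scored = [(text, self.similarity(q, vec)) for text, vec in self.store]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# demo with a toy embedder: (length, vowel count) as a 2-dimensional "vector"
def toy_embed(text: str) -> List[float]:
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

engine = VectorSearchEngine(toy_embed, dot)
engine.add("hello")
engine.add("hi")
print(engine.top_k("hey", k=1))
```

Swapping `toy_embed` for a real model and `dot` for cosine similarity gives a working engine without changing the structure.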

Step-by-step implementation in Python

In Python this is relatively simple to implement. First, choose a model for generating embeddings, such as SentenceTransformers from Hugging Face, which runs locally without API keys. Then, for each document in the collection, generate an embedding and save it in a structure (for example, a dictionary or DataFrame). When a user makes a query, generate an embedding of their question and compare it with all stored embeddings: compute cosine similarity for each document, sort the results in descending order of similarity, and return the top 5 or top 10 most relevant documents. The entire process takes milliseconds on small datasets.
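The steps above can be sketched end to end. To keep the example dependency-free and runnable, a hashed bag-of-words function stands in for a real embedding model (in practice you would call something like SentenceTransformers, which captures semantic rather than purely lexical similarity):

```python
import math
from collections import Counter

def embed(text, dim=64):
    # Toy stand-in for a real embedding model: hash word counts
    # into a fixed-size vector. A real model would map synonyms
    # to nearby vectors; this one only matches shared words.
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dim] += count
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "the cat sits on the couch",
    "dogs love to play fetch",
    "a cat rests on an armchair",
]
# step 1-2: embed every document once and store the pairs
index = [(doc, embed(doc)) for doc in docs]

def search(query, k=2):
    # step 3-5: embed the query, score every document, return top-k
    q = embed(query)
    scored = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

print(search("cat on the couch"))
```

The structure is identical with a real model; only the `embed` function changes.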

"Vector search is not magic, but the application of simple linear algebra with optimizations for speed."

Optimization and scaling

On small datasets (up to 10K documents), you can store embeddings directly in memory or in a regular database. But when scaling to millions of documents, you need specialized vector databases (Pinecone, Weaviate, Milvus, Qdrant) with built-in indexing for fast search. Indexing lets you search through hierarchical structures (HNSW, IVF-PQ) instead of linearly scanning all vectors, which gives a 100–1000x speedup. The choice of embedding model is also critical: more powerful models (for example, OpenAI text-embedding-3-large) provide more accurate semantic understanding but are slower and more expensive. For production, compact, optimized models that run locally are usually chosen.
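The idea behind index-based search can be illustrated with a toy IVF-style sketch: partition the vectors into cells around centroids, then scan only the few cells nearest the query instead of the whole collection. This is a simplified illustration with NumPy, not what production databases actually run; the `n_probe` parameter trades recall for speed:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, n_cells = 32, 2000, 16

vectors = rng.normal(size=(n_docs, dim)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit vectors: dot == cosine

# IVF sketch: assign every vector to its nearest "centroid"
# (here just randomly sampled vectors, instead of trained k-means centroids)
centroids = vectors[rng.choice(n_docs, n_cells, replace=False)]
assignments = np.argmax(vectors @ centroids.T, axis=1)

def ivf_search(query, n_probe=4, k=5):
    # probe only the n_probe nearest cells instead of scanning all n_docs vectors
    q = query / np.linalg.norm(query)
    cells = np.argsort(centroids @ q)[-n_probe:]
    candidates = np.where(np.isin(assignments, cells))[0]
    scores = vectors[candidates] @ q
    return candidates[np.argsort(scores)[-k:][::-1]]  # ids, best first

query = rng.normal(size=dim).astype(np.float32)
hits = ivf_search(query)
```

With 16 cells and 4 probes, each query scans roughly a quarter of the vectors; real systems (Faiss, Milvus) combine trained centroids, product quantization, and graph indexes like HNSW to push this much further.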

What this means

Vector search has become an industry standard for AI applications. If you're building a chatbot with memory, a smart search engine, a recommendation system, a RAG application, or a plagiarism detector, you need to understand vector search inside out. Building it from scratch in Python gives you a fundamental understanding of how modern vector databases work and why they're so powerful.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.