
Embedding Model

An embedding model converts text, images, or other data into fixed-length numerical vectors in a high-dimensional space, where semantically similar items are geometrically close to each other.

Technically, an embedding model is a neural network trained to map inputs (most commonly text, but also images, audio, or structured records) to dense, fixed-size vectors called embeddings. These vectors encode semantic and syntactic properties of the input, so items with similar meaning occupy nearby regions of the vector space while dissimilar items lie far apart.

Training typically uses contrastive learning objectives, which pull the embeddings of related pairs together in vector space while pushing unrelated ones apart. Sentence-embedding models such as Sentence-BERT (2019) are trained on sentence pairs so that semantically similar sentences land close together. Cross-modal models such as CLIP (OpenAI, 2021) align text and image representations by training on hundreds of millions of image-caption pairs. Output vectors commonly range from 384 to 3,072 dimensions. At inference time, comparing two embeddings takes a single dot product or cosine similarity, which is cheap to compute. Approximate nearest-neighbor (ANN) indexes also avoid comparing a query against every stored vector, so a search over millions of candidates can finish in milliseconds.
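The contrastive idea can be sketched in a few lines. Below is a minimal, pure-Python sketch of an in-batch, InfoNCE-style loss. It assumes each query is paired with exactly one positive, and the other positives in the same batch act as negatives. The function names, the fixed temperature of 0.05, and the 2-D toy vectors are illustrative assumptions, not details taken from Sentence-BERT or CLIP. CLIP's own objective is symmetric: it averages this kind of loss over both the text-to-image and image-to-text directions, and it learns the temperature instead of fixing it.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(queries: list[list[float]], positives: list[list[float]],
                  temperature: float = 0.05) -> float:
    """In-batch contrastive (InfoNCE-style) loss; the temperature is an assumed value.

    queries[i] and positives[i] are a matching pair; every positives[j] with
    j != i serves as a negative for queries[i]. Lower is better.
    """
    total = 0.0
    for i, q in enumerate(queries):
        # Temperature-scaled similarity of query i to every candidate in the batch.
        logits = [cosine(q, p) / temperature for p in positives]
        # Cross-entropy with the matching pair as the target class,
        # computed with a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Toy 2-D "embeddings": each query points roughly the same way as its positive.
queries = [[1.0, 0.1], [0.1, 1.0]]
matched = [[0.9, 0.2], [0.2, 0.9]]
mismatched = [matched[1], matched[0]]  # pairs deliberately swapped

print(info_nce_loss(queries, matched))     # near zero: pairs already aligned
print(info_nce_loss(queries, mismatched))  # large: training would push these apart
```

A lower temperature sharpens the softmax, so the loss concentrates on negatives that are almost as similar as the true positive.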

Embeddings underpin most modern semantic retrieval and search systems. Because semantic similarity becomes geometric distance, finding relevant items reduces to nearest-neighbor search, which approximate methods scale to very large collections. This makes embeddings the foundation of retrieval-augmented generation (RAG) pipelines, semantic search engines, recommendation systems, duplicate detection, and document clustering.
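Duplicate detection, one of the applications listed above, reduces to a similarity threshold. This minimal sketch assumes some model has already produced the embeddings; the three-dimensional vectors below are made-up toy values. It also assumes 0.95 is a sensible cutoff. In practice the threshold is tuned per model and dataset, and for large collections the exhaustive pairwise loop would be replaced by queries against an approximate nearest-neighbor index.

```python
import math
from itertools import combinations


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def find_near_duplicates(embeddings: dict[str, list[float]],
                         threshold: float = 0.95) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above `threshold`.

    Exhaustive O(n^2) comparison: fine for small collections; large ones
    would query an approximate nearest-neighbor index instead.
    """
    unit = {doc_id: normalize(vec) for doc_id, vec in embeddings.items()}
    pairs = []
    for (id_a, a), (id_b, b) in combinations(unit.items(), 2):
        # For unit-length vectors, the dot product equals cosine similarity.
        score = sum(x * y for x, y in zip(a, b))
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Made-up embeddings for three support tickets; "t1" and "t2" are near-identical.
tickets = {
    "t1": [1.0, 0.0, 0.1],
    "t2": [0.98, 0.02, 0.12],
    "t3": [0.0, 1.0, 0.0],
}
print([(a, b) for a, b, _ in find_near_duplicates(tickets)])  # [('t1', 't2')]
```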

As of mid-2025, leading text embedding models included OpenAI text-embedding-3-large, Cohere Embed v3, Google text-embedding-004, and open-source alternatives such as the BGE family (BAAI), E5-mistral (Microsoft), and GTE-Qwen (Alibaba). The Massive Text Embedding Benchmark (MTEB) leaderboard tracks model quality across dozens of retrieval, classification, and clustering tasks, with top models achieving strong zero-shot multilingual performance across more than 50 languages.

Example

A customer support team encodes its entire knowledge base of 50,000 articles into embeddings stored in a vector database; when a user submits a question, the query is embedded and the five nearest articles are retrieved in milliseconds and passed to a language model to draft a contextual answer.
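The retrieval step in this example can be sketched as follows. `toy_embed` is a hypothetical stand-in for a real embedding model. It is a hashed bag of words, so it captures shared vocabulary rather than meaning. The in-memory list stands in for a vector database, the article titles are invented, and the search is exact rather than approximate. Only the overall flow mirrors the scenario described: embed the articles once, embed the query, keep the five best matches, and assemble a prompt.

```python
import heapq
import math
import re
import zlib

DIM = 256  # toy dimensionality (an assumption); real models use hundreds to thousands


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: an L2-normalized
    hashed bag of words. It captures shared vocabulary, not meaning."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty text
    return [x / norm for x in vec]


def build_index(articles: list[str]) -> list[tuple[str, list[float]]]:
    """Embed every article once, ahead of query time (the 'vector database')."""
    return [(text, toy_embed(text)) for text in articles]


def retrieve(query: str, index: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Exact top-k search: embed the query, score every article, keep the k best."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = ((sum(a * b for a, b in zip(q, vec)), text) for text, vec in index)
    return [text for _, text in heapq.nlargest(k, scored)]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the retrieved articles into a prompt for a language model."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these articles:\n\n{context}\n\nQuestion: {question}"


articles = [
    "How to reset your account password",
    "Updating billing details and payment methods",
    "Exporting your data to CSV",
    "Enabling two-factor authentication",
    "Closing or deleting your account",
    "Changing the language of the dashboard",
]
index = build_index(articles)
question = "I forgot my password, how do I reset it?"
print(build_prompt(question, retrieve(question, index, k=5)))
```

To use a real model, replace `toy_embed` with a call to that model's embedding API. The scoring and prompt-assembly steps stay the same as long as its vectors are L2-normalized; otherwise, compute cosine similarity instead of the raw dot product.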

