Embeddings
A practical introduction to embeddings, cosine similarity, and how they compare with BM25
Embeddings are one of the core ideas behind semantic search and modern RAG. Instead of treating text as plain strings, embeddings convert text into vectors so systems can compare meaning mathematically.
In this series, embeddings come after BM25. BM25 explains lexical retrieval, while embeddings explain semantic retrieval. Together, they form the most common first-stage retrieval pair in modern RAG systems.
In short:
- keyword search asks: do the same words appear?
- embeddings ask: do the meanings match?
For example, these two sentences may be semantically close even if they do not share the same wording:
Cats like sleeping
Kittens often rest during the day
With embeddings, a model can place them near each other in vector space.
What Are Embeddings?
An embedding is a vector representation of data such as a word, sentence, paragraph, or document.
[0.12, -0.83, 0.44, 0.91, ...]
The individual dimensions are usually not interpretable by humans, but the overall position captures semantic information.
Common properties:
- semantically similar text tends to have nearby vectors
- semantically different text tends to have distant vectors
- embeddings can be used for search, clustering, recommendation, and classification
Why Embeddings Matter
Traditional keyword search works well when the query and the document share the same terms. But real users often search with:
- synonyms
- different phrasing
- natural language questions
- vague descriptions
Example:
Query:
How can I improve React initial page load performance?
Document:
Next.js initial rendering optimization strategies
The wording is different, but the meaning is similar. This is where embeddings are useful.
Typical Embedding Search Flow
The basic flow looks like this:
- split documents into chunks
- convert each chunk into a vector with an embedding model
- store those vectors in a vector index
- embed the user query
- compare the query vector with stored vectors and rank the closest matches
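The five steps above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: `toy_embed` is a hypothetical stand-in that only counts words from a small vocabulary, so it captures word overlap rather than meaning, and a plain Python list plays the role of the vector index. In a real system, an embedding model and one of the vector stores would replace both.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(document: str) -> list[str]:
    # Step 1: naive chunking, one sentence per chunk.
    return [s.strip() for s in document.split(".") if s.strip()]


def toy_embed(text: str, vocab: dict[str, int]) -> list[float]:
    # Steps 2 and 4: hypothetical stand-in for an embedding model.
    # It counts known words, so it has no notion of semantics.
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, chunks, vectors, vocab, top_k=2):
    # Step 4: embed the query. Step 5: score every chunk and rank.
    q = toy_embed(query, vocab)
    scored = [(cosine(q, v), c) for c, v in zip(chunks, vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


document = (
    "Cats like sleeping in warm places. "
    "Vector indexes store embeddings for fast search. "
    "BM25 ranks documents by term overlap."
)
chunks = split_into_chunks(document)
vocab = {tok: i for i, tok in enumerate(sorted({t for c in chunks for t in tokenize(c)}))}
vectors = [toy_embed(c, vocab) for c in chunks]  # Step 3: in-memory "index"

results = search("how do vector indexes store embeddings", chunks, vectors, vocab)
for score, chunk_text in results:
    print(f"{score:.3f}  {chunk_text}")
```

The shape of the pipeline is the part that carries over: chunks and queries go through the same embedding function, and ranking is a nearest-neighbor comparison over the stored vectors.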
Common vector stores include:
- pgvector
- Pinecone
- Qdrant
- Weaviate
- Milvus
A very common similarity metric here is cosine similarity.
What Is Cosine Similarity?
Cosine similarity measures how similar two vectors are by comparing their direction rather than their magnitude.
Intuition:
- close to 1: very similar direction
- close to 0: unrelated or orthogonal
- close to -1: opposite direction
For text embeddings, similar direction often means similar meaning.
Cosine Similarity Formula
Given two vectors $A$ and $B$:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

- $A \cdot B$ is the dot product
- $\|A\|$ is the magnitude of $A$
- $\|B\|$ is the magnitude of $B$
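Cosine similarity divides the dot product of two vectors by the product of their magnitudes. A minimal sketch in plain Python follows; the function name `cosine_similarity` and the sample vectors are illustrative choices, and production code would more likely rely on a numerical library or the vector store's built-in metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Three cases matching the intuition: same, perpendicular, opposite direction.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(same, orthogonal, opposite)
```

Up to floating-point rounding, the three values come out near 1, 0, and -1.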
Simple Example
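Let, for illustration:

$$A = (1, 2, 3), \quad B = (2, 4, 6)$$

First, compute the dot product:

$$A \cdot B = 1 \cdot 2 + 2 \cdot 4 + 3 \cdot 6 = 28$$

Then compute magnitudes:

$$\|A\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}, \quad \|B\| = \sqrt{2^2 + 4^2 + 6^2} = \sqrt{56}$$

So:

$$\cos(\theta) = \frac{28}{\sqrt{14} \cdot \sqrt{56}} = \frac{28}{28} = 1$$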
This means the two vectors point in exactly the same direction. In practice, text vectors are higher-dimensional, but the intuition is the same: closer direction usually means closer meaning.
Why Cosine Similarity Is Common
Cosine similarity is popular for embeddings because in many cases we care more about semantic direction than vector length.
Benefits:
- good for semantic comparison
- less sensitive to magnitude
- widely supported in vector search systems
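In practice, some systems also use:
- dot product (also called inner product)
- Euclidean distance
The best choice depends on the model and whether vectors are normalized. When every vector has unit length, cosine similarity and dot product give identical scores, and Euclidean distance produces the same ranking.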
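To see why normalization matters for the choice of metric, here is a small sketch with hypothetical 2-D vectors (real embeddings have far more dimensions). The vector named `short` points the same way as the query, while `long` points elsewhere but has a much larger magnitude.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale a vector to unit length (L2 norm of 1).
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical vectors chosen only to make the effect visible.
query = [3.0, 4.0]
docs = {"short": [0.6, 0.8], "long": [30.0, 5.0]}

by_raw_dot = sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)
by_cosine = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)

unit_query = normalize(query)
unit_docs = {name: normalize(vec) for name, vec in docs.items()}
by_unit_dot = sorted(unit_docs, key=lambda name: dot(unit_query, unit_docs[name]), reverse=True)
by_unit_euclidean = sorted(unit_docs, key=lambda name: euclidean(unit_query, unit_docs[name]))

print(by_raw_dot)         # raw dot product rewards the larger magnitude
print(by_cosine)          # cosine ignores magnitude
print(by_unit_dot)        # matches cosine once vectors are normalized
print(by_unit_euclidean)  # smallest distance first, same order as cosine
```

On raw vectors the dot product ranks `long` first because of its size, while cosine ranks `short` first. After normalization all three metrics agree, which is why many setups normalize once at indexing time and then use whichever metric the store computes fastest.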
Embeddings vs BM25
BM25 is a classic lexical search algorithm used in systems like Elasticsearch and OpenSearch. It ranks documents based on term matching, term frequency, inverse document frequency, and document length normalization.
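As a reminder of how those four ingredients fit together, here is a compact sketch of one common BM25 formulation. The parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the whitespace tokenization are assumptions for illustration; search engines differ in their defaults and analyzers.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        # Document length normalization: longer-than-average docs are penalized.
        length_norm = 1.0 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # term matching: only shared terms contribute
            # Inverse document frequency: rare terms weigh more.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "err_connection_reset when loading the page".split(),
    "how to speed up initial page load".split(),
    "kittens often rest during the day".split(),
]
scores = bm25_scores("err_connection_reset page".split(), docs)
print(scores)
```

The third document scores exactly zero because it shares no terms with the query, which is the behavior that embeddings are meant to complement.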
The key difference is:
- BM25 focuses on word overlap
- embeddings focus on semantic similarity
Strengths of Embeddings
- better for synonyms and paraphrases
- better for natural language queries
- useful for knowledge bases, FAQs, and RAG
- stronger for semantic retrieval
Weaknesses of Embeddings
- not always reliable for exact keyword matches
- harder to explain ranking
- higher infrastructure and model cost
- requires tuning such as chunking, top-k, and reranking
Strengths of BM25
- excellent for exact matches
- easy to explain
- mature and relatively cheap to deploy
- strong for error codes, model numbers, API names, and identifiers
Weaknesses of BM25
- poor at handling paraphrases
- may miss relevant content with different wording
- less effective for semantic search and question-style queries
Which One Should You Use?
Use BM25 first when queries are mostly precise keywords, such as:
ERR_CONNECTION_RESET
useEffect cleanup
SKU-12345
Use embeddings when users ask broader questions, such as:
How do I reduce Next.js initial load time?
How can I avoid unnecessary React rerenders?
In many real systems, the best answer is not to choose one over the other but to combine both in a hybrid search, often merging the two result lists with a method such as Reciprocal Rank Fusion (RRF):
- BM25 for exact keyword matching
- embeddings for semantic matching
This usually gives more stable retrieval quality across both keyword-style and question-style queries.
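As a rough preview of how two ranked lists can be merged, here is a minimal sketch of RRF (Reciprocal Rank Fusion). Each document earns `1 / (k + rank)` from every list it appears in, and the sums decide the final order. The constant `k=60` is a commonly used value, and the document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs (best first) into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from the two first-stage retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that both retrievers rank highly rise to the top, and because only ranks are used, the incompatible score scales of BM25 and cosine similarity never need to be reconciled.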
Conclusion
Embeddings turn text into vectors that capture meaning, making semantic search possible. Cosine similarity is one of the most common ways to compare these vectors by measuring how closely their directions align.
Compared with BM25:
- BM25 is stronger for exact term matching
- embeddings are stronger for semantic matching
- hybrid search often works best in practice
If you are building search or RAG, the practical question is usually not "Embeddings or BM25?", but "How should I combine them for my data and query patterns?"
That is exactly why the next step in this series is RRF: once you understand lexical and semantic retrieval separately, the natural question becomes how to merge both ranked lists into one strong candidate set.