Eric TechBlog

A practical introduction to embeddings, cosine similarity, and how they compare with BM25

Embeddings are one of the core ideas behind semantic search and modern retrieval-augmented generation (RAG). Instead of treating text as plain strings, embeddings convert text into vectors so systems can compare meaning mathematically.

In this series, embeddings come after BM25. BM25 explains lexical retrieval, while embeddings explain semantic retrieval. Together, they form the most common first-stage retrieval pair in modern RAG systems.

In short:

keyword search asks: do the same words appear?
embeddings ask: do the meanings match?

For example, these two sentences may be semantically close even if they do not share the same wording:

Cats like sleeping
Kittens often rest during the day

An embedding model can place these two sentences near each other in vector space, even though they share no words.

What Are Embeddings?

An embedding is a vector representation of data such as a word, sentence, paragraph, or document.

[0.12, -0.83, 0.44, 0.91, ...]

The individual dimensions are usually not interpretable by humans, but the overall position captures semantic information.

Common properties:

semantically similar text tends to have nearby vectors
semantically different text tends to have distant vectors
embeddings can be used for search, clustering, recommendation, and classification

Why Embeddings Matter

Traditional keyword search works well when the query and the document share the same terms. But real users often search with:

synonyms
different phrasing
natural language questions
vague descriptions

Example:

Query:

How can I improve React initial page load performance?

Document:

Next.js initial rendering optimization strategies

The wording is different, but the meaning is similar. This is where embeddings are useful.

Typical Embedding Search Flow

The basic flow looks like this:

split documents into chunks
convert each chunk into a vector with an embedding model
store those vectors in a vector index
embed the user query
compare the query vector with stored vectors and rank the closest matches

Common vector stores include:

pgvector
Pinecone
Qdrant
Weaviate
Milvus

A very common similarity metric here is cosine similarity.
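The flow above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `embed` is a hypothetical stand-in that looks up hand-written 3-dimensional vectors instead of calling a real embedding model, the "vector index" is a plain list rather than one of the stores listed above, and each document is treated as a single chunk.

```python
import math

# ASSUMPTION: made-up vectors standing in for real embedding model output.
FAKE_MODEL_OUTPUT = {
    "Next.js initial rendering optimization strategies": [0.9, 0.1, 0.2],
    "Configuring PostgreSQL connection pooling": [0.1, 0.9, 0.1],
    "Writing unit tests with Jest": [0.2, 0.2, 0.9],
    "How can I improve React initial page load performance?": [0.8, 0.2, 0.3],
}


def embed(text):
    """Hypothetical embedding call; a real system would invoke a model here."""
    return FAKE_MODEL_OUTPUT[text]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, index, top_k=2):
    """Embed the query, score every stored vector, return the closest chunks."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]


# Steps 1-3: chunk (here: one chunk per document), embed, and store.
chunks = [
    "Next.js initial rendering optimization strategies",
    "Configuring PostgreSQL connection pooling",
    "Writing unit tests with Jest",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Steps 4-5: embed the query and rank stored vectors by similarity.
results = search("How can I improve React initial page load performance?", index)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

With these made-up vectors, the Next.js document ranks first even though it shares almost no wording with the query. A production system would replace the linear scan with an approximate nearest-neighbor index.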

What Is Cosine Similarity?

Cosine similarity measures how similar two vectors are by comparing their direction rather than their magnitude.

Intuition:

close to 1: very similar direction
close to 0: unrelated or orthogonal
close to -1: opposite direction

For text embeddings, similar direction often means similar meaning.

Cosine Similarity Formula

Given two vectors $A$ and $B$:

$$\text{cosineSimilarity}(A, B) = \frac{A \cdot B}{\|A\| \cdot \|B\|}$$

$A \cdot B$ is the dot product
$\|A\|$ is the magnitude of $A$
$\|B\|$ is the magnitude of $B$

Simple Example

Let:

A = [1, 2, 3]

B = [2, 4, 6]

First, compute the dot product:

$$A \cdot B = 1 \times 2 + 2 \times 4 + 3 \times 6 = 28$$

Then compute magnitudes:

$$\|A\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$$

$$\|B\| = \sqrt{2^2 + 4^2 + 6^2} = \sqrt{56}$$

So:

$$\text{cosineSimilarity}(A, B) = \frac{28}{\sqrt{14} \cdot \sqrt{56}} = \frac{28}{\sqrt{784}} = \frac{28}{28} = 1$$

This means the two vectors point in exactly the same direction: $B$ is just $A$ scaled by 2, and cosine similarity ignores that difference in magnitude. In practice, text vectors are higher-dimensional, but the intuition is the same: closer direction usually means closer meaning.
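The worked example can be checked with a small function. This is a minimal sketch in plain Python; the function name `cosine_similarity` and the zero-vector check are illustrative choices, not part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])    # the example above
orthogonal = cosine_similarity([1, 0], [0, 1])              # no shared direction
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])       # reversed direction

print(same_direction, orthogonal, opposite)
```

The three results land at approximately 1, 0, and -1. Floating-point rounding can leave the first and last a hair away from the exact value, so comparisons are safer with a tolerance such as `math.isclose`.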

Why Cosine Similarity Is Common

Cosine similarity is popular for embeddings because in many cases we care more about semantic direction than vector length.

Benefits:

good for semantic comparison
insensitive to vector magnitude
widely supported in vector search systems

In practice, some systems also use:

dot product (also called inner product)
Euclidean distance

The best choice depends on the model and whether vectors are normalized.
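To see why normalization matters, here is a small sketch with two arbitrary example vectors. Once both are scaled to unit length, the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 - 2 × cosine, so all three metrics produce the same ranking. The helper names are illustrative.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine similarity on the raw vectors.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# After normalization, a plain dot product gives the same number.
a_unit, b_unit = normalize(a), normalize(b)
dot_of_units = dot(a_unit, b_unit)

# Squared Euclidean distance between unit vectors is 2 - 2 * cosine.
squared_distance = math.dist(a_unit, b_unit) ** 2

print(cosine, dot_of_units, squared_distance)
```

On unnormalized vectors the raw dot product (24 here) also reflects magnitude, which is why the metric should match how the embedding model was trained and whether its outputs are normalized.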

Embeddings vs BM25

BM25 is a classic lexical search algorithm used in systems like Elasticsearch and OpenSearch. It ranks documents based on term matching, term frequency, inverse document frequency, and document length normalization.
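For comparison with the vector approach, here is a minimal BM25 scoring sketch. It assumes whitespace tokenization, the commonly used defaults `k1=1.2` and `b=0.75`, and one common IDF variant; real search engines add analyzers, stemming, and other refinements, so treat this as an illustration rather than how Elasticsearch or OpenSearch implement it.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in docs against query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Longer-than-average documents are penalized, shorter ones boosted.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # no overlap on this term, no contribution
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "fix ERR_CONNECTION_RESET in chrome",
    "react useEffect cleanup guide",
    "improve page load performance",
]
exact_scores = bm25_scores("ERR_CONNECTION_RESET", docs)
paraphrase_scores = bm25_scores("speed up rendering", docs)
print(exact_scores)
print(paraphrase_scores)
```

The exact identifier scores only on the first document, while the paraphrased query scores zero everywhere because it shares no terms with any document.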

The key difference is:

BM25 focuses on word overlap
embeddings focus on semantic similarity

Strengths of Embeddings

better for synonyms and paraphrases
better for natural language queries
useful for knowledge bases, FAQs, and RAG
stronger for semantic retrieval

Weaknesses of Embeddings

not always reliable for exact keyword matches
harder to explain ranking
higher infrastructure and model cost
requires tuning of choices such as chunk size, top-k, and reranking

Strengths of BM25

excellent for exact matches
easy to explain
mature and relatively cheap to deploy
strong for error codes, model numbers, API names, and identifiers

Weaknesses of BM25

poor at handling paraphrases
may miss relevant content with different wording
less effective for semantic search and question-style queries

Which One Should You Use?

Use BM25 first when queries are mostly precise keywords, such as:

ERR_CONNECTION_RESET
useEffect cleanup
SKU-12345

Use embeddings when users ask broader questions, such as:

How do I reduce Next.js initial load time?
How can I avoid unnecessary React rerenders?

In many real systems, the best answer is not choosing one over the other, but combining both with hybrid search, often using rank-fusion methods such as Reciprocal Rank Fusion (RRF):

BM25 for exact keyword matching
embeddings for semantic matching

This usually gives more stable retrieval quality, because each method covers cases where the other is weak.
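As a rough preview of how two ranked lists can be merged, here is a minimal Reciprocal Rank Fusion sketch. Each document receives 1 / (k + rank) from every list it appears in, and the sums decide the final order. The document IDs and both input rankings are made up, and `k=60` is a commonly used constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# ASSUMPTION: hypothetical results from a BM25 query and an embedding query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that both retrievers agree on (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever still remain in the candidate set. Because RRF uses only ranks, it does not need BM25 scores and cosine similarities to be on the same scale.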

Conclusion

Embeddings turn text into vectors that capture meaning, making semantic search possible. Cosine similarity is one of the most common ways to compare these vectors by measuring how closely their directions align.

Compared with BM25:

BM25 is stronger for exact term matching
embeddings are stronger for semantic matching
hybrid search often works best in practice

If you are building search or RAG, the practical question is usually not "Embeddings or BM25?", but "How should I combine them for my data and query patterns?"

That is exactly why the next step in this series is RRF: once you understand lexical and semantic retrieval separately, the natural question becomes how to merge both ranked lists into one strong candidate set.
