Eric TechBlog

Chunking is the design of the retrieval unit. It determines what a system can match, return, and use as evidence.

What chunking is

Chunking means splitting source content into smaller units. In retrieval systems, that is not just a preprocessing step. It is the decision of what the system is allowed to retrieve.

That unit might be:

a fixed-size window
a paragraph
a section
a FAQ item
an API endpoint with its explanation
a transcript segment
a small child passage that points to a larger parent block

So the main question is not "What chunk size should I use?" It is "What retrieval unit gives my queries the best chance of finding usable evidence?"

Core idea

Chunking is not mainly a token-count problem. It is a retrieval-unit design problem.

How chunking fits retrieval

Most retrieval systems do not search whole documents directly. They search the chunks you created ahead of time. That means chunking defines the candidate set, and therefore the search space itself.

In practice:

split documents into chunks
index each chunk with text, metadata, and often embeddings
match a query against those chunks
rank the matched chunks
pass the best ones to the UI or the LLM
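
The steps above can be sketched in a few lines. This is a toy illustration, not a production retriever: the function names are hypothetical, chunks are simply blank-line-separated paragraphs, and ranking is plain term overlap standing in for BM25 or embeddings.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; a stand-in for a real analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())

def split(doc: str) -> list[str]:
    # Step 1: split a document into paragraph chunks.
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def build_index(docs: dict[str, str]) -> list[dict]:
    # Step 2: index each chunk with its text, metadata, and term counts.
    index = []
    for source, doc in docs.items():
        for position, chunk in enumerate(split(doc)):
            index.append({
                "source": source,
                "position": position,
                "text": chunk,
                "terms": Counter(tokenize(chunk)),
            })
    return index

def search(index: list[dict], query: str, k: int = 2) -> list[dict]:
    # Steps 3 and 4: match the query against chunks, then rank by overlap.
    query_terms = set(tokenize(query))
    scored = []
    for entry in index:
        score = sum(1 for term in query_terms if term in entry["terms"])
        if score > 0:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 5: hand the best chunks to the UI or the LLM.
    return [entry for _, entry in scored[:k]]

docs = {"guide": "Refunds are issued within 14 days.\n\nShipping takes 3 to 5 business days."}
index = build_index(docs)
results = search(index, "how long does shipping take")
```

Note that the query can only ever return one of the two paragraphs. Whatever `split` produces is the entire search space.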

Retrieval does not really retrieve "documents." It retrieves units. That is why chunking directly affects what can be matched, ranked, and used as evidence.

One useful mental model:

embedding quality affects how well chunks are represented
ranking quality affects how well chunks are ordered
chunking quality affects what kind of units are available to retrieve

Why chunking matters

Good chunking usually improves three things at once:

recall: relevant ideas stay intact, so the retriever is less likely to miss them
precision: each chunk stays focused, so ranking is less likely to reward noise
usability: retrieved chunks include enough surrounding context to actually support an answer

This matters even more in RAG, where retrieved chunks become the model's working context.

Common chunking failures

Chunks are too large

Large chunks often mix multiple topics together. That can cause:

weaker semantic matching
more irrelevant text in the result
lower ranking quality
wasted context budget downstream

Chunks are too small

Small chunks often lose the context needed to be useful. That can cause:

incomplete evidence
broken references
missing definitions
poor support for answer generation

Boundaries are arbitrary

Fixed-size chunking is simple, but it often cuts through natural boundaries. That can cause:

one idea split across chunks
headings separated from their content
examples detached from explanations
retrieval results that are technically matched but hard to interpret

Overlap is used without intention

Overlap can help preserve context, but too much overlap creates duplication. That can cause:

repeated results
noisier retrieval sets
higher storage cost
less diversity in top-k results
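
To make the duplication cost concrete, here is a minimal sketch that estimates how many chunks a sliding window produces and how many tokens end up stored. It assumes a window that advances by `size - overlap` tokens and stops once a window reaches the end of the document; `overlap_cost` is a hypothetical helper, not part of any library.

```python
import math

def overlap_cost(n_tokens: int, size: int, overlap: int) -> tuple[int, float]:
    """Return (number of chunks, stored tokens / original tokens)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if n_tokens <= 0:
        return 0, 0.0
    stride = size - overlap
    # One chunk covers a short document; otherwise count window starts.
    n_chunks = max(1, math.ceil((n_tokens - overlap) / stride))
    last_start = (n_chunks - 1) * stride
    # All chunks are full-size except possibly the last one.
    stored = (n_chunks - 1) * size + (n_tokens - last_start)
    return n_chunks, stored / n_tokens

no_overlap = overlap_cost(1000, 200, 0)
some_overlap = overlap_cost(1000, 200, 50)
heavy_overlap = overlap_cost(1000, 200, 100)
```

Under these assumptions, a 1,000-token document with 200-token chunks yields 5 chunks with no overlap, 7 chunks (about 1.3 times the original tokens) with an overlap of 50, and 9 chunks (about 1.8 times) with an overlap of 100.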

Practical chunking principles

1. Split by meaning before you split by size

Natural boundaries are usually better than arbitrary ones. Token limits still matter, but they should follow the retrieval design instead of driving it.

Useful boundaries often include:

titles
sections
paragraphs
bullet groups
table blocks
speaker turns
code examples with their explanation

2. Design chunks around likely queries

Think about what users will actually ask:

a FAQ assistant may work well with small, focused chunks
a policy assistant may need clause-level context
API docs may work better when split by endpoint or method
transcripts may need speaker-aware or topic-aware chunks

3. Keep related evidence together

Some evidence only makes sense when read together:

a policy rule and its exception
an API parameter and its constraint
a claim and its supporting explanation
a metric and its definition

4. Do not optimize only for retrieval match

A chunk that is easy to retrieve is not always easy to use. Optimize for both:

retrievability
usefulness after retrieval

5. Use overlap deliberately

Use overlap only when you know what context would otherwise be lost.

6. Attach metadata

Metadata often helps retrieval more than teams expect:

title
section path
source
page number
timestamp
speaker
document type
product area
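
One way to carry these fields is to store them next to the chunk text. The sketch below is illustrative only: the `Chunk` class, `filter_chunks`, and `text_for_indexing` are hypothetical names, and prefixing the indexed text with the title and section path is just one option for making a chunk's context visible to the matcher.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def filter_chunks(chunks: list[Chunk], **required) -> list[Chunk]:
    """Keep chunks whose metadata matches every required key/value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]

def text_for_indexing(chunk: Chunk) -> str:
    """Prefix the chunk with its title and section path, when present."""
    parts = (chunk.metadata.get("title"), chunk.metadata.get("section_path"))
    header = " > ".join(part for part in parts if part)
    return f"{header}\n{chunk.text}" if header else chunk.text

api_chunk = Chunk(
    "Tokens expire after 24 hours.",
    {"title": "API Guide", "section_path": "Auth > Tokens", "document_type": "api"},
)
policy_chunk = Chunk(
    "Refunds take 14 days.",
    {"title": "Policies", "document_type": "policy"},
)
api_only = filter_chunks([api_chunk, policy_chunk], document_type="api")
```

Filtering on metadata before ranking narrows the candidate set, and the prefixed text lets a short passage such as "Tokens expire after 24 hours." match queries that mention authentication.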

Common chunking strategies

There is no universal best strategy, but these patterns are common:

Fixed-size chunking

Simple and fast. You split text by a constant token or character length, sometimes with overlap.

Use it when:

you need a quick baseline
documents are messy or inconsistent
implementation speed matters more than precision at first
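
A fixed-size baseline takes only a few lines. This sketch assumes the text is already tokenized (whitespace tokens here, as a stand-in for a model tokenizer); `fixed_size_chunks` is a hypothetical name.

```python
def fixed_size_chunks(tokens: list, size: int, overlap: int = 0) -> list[list]:
    """Slide a window of `size` tokens, stepping by `size - overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks

tokens = "one two three four five six seven eight nine ten".split()
chunks = fixed_size_chunks(tokens, size=4, overlap=1)
```

With 10 tokens, a size of 4, and an overlap of 1, this produces three chunks, and each chunk repeats the last token of the previous one. The splitter knows nothing about sentences or headings, which is why it is only a baseline.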

Structure-aware chunking

This uses document structure such as:

titles
sections
paragraphs
lists
tables
code blocks with nearby explanation

Use it when:

documents have clear formatting
users ask section-level questions
preserving human-readable boundaries matters
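
A structure-aware splitter can be sketched as follows. It assumes the source uses Markdown-style `#` headings, which will not hold for every corpus; `structure_chunks` is a hypothetical name. Each chunk keeps the heading trail it came from, so a heading is never separated from its content.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def structure_chunks(text: str) -> list[dict]:
    """Emit one chunk per section body, tagged with its heading path."""
    chunks = []
    path = []   # current heading trail, e.g. ["API", "Auth"]
    body = []   # lines collected for the current section

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({"section_path": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its old path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# API\n\n## Auth\nUse a token.\n\n## Limits\n100 requests per minute.\n"
sections = structure_chunks(doc)
```

Sections that exceed the size budget can still be split further, but the split then happens inside a known section rather than across one.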

Semantic chunking

This tries to split content based on topic shifts or semantic transitions. It can produce coherent chunks, but it is usually more complex and less predictable than structure-aware splitting.

Use it when:

documents contain long flowing text
structure alone is not enough
semantic coherence matters more than strict formatting
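
The idea can be illustrated with a deliberately simple stand-in. Real semantic chunkers typically compare sentence embeddings; the sketch below swaps in word-overlap (Jaccard) similarity between adjacent sentences so that it runs without a model. The function names and the 0.2 threshold are assumptions chosen for illustration.

```python
import re

def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk whenever similarity to the previous sentence drops."""
    chunks = []
    for sentence in sentences:
        previous = chunks[-1][-1] if chunks else None
        if previous is not None and jaccard(words(previous), words(sentence)) >= threshold:
            chunks[-1].append(sentence)   # same topic: extend the current chunk
        else:
            chunks.append([sentence])     # topic shift: open a new chunk
    return chunks

sentences = [
    "The cache stores recent query results.",
    "The cache evicts old results after an hour.",
    "Billing runs on the first day of each month.",
]
groups = semantic_chunks(sentences)
```

The two cache sentences land in one chunk and the billing sentence starts another. The output depends heavily on the similarity measure and threshold, which is the unpredictability mentioned above.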

Parent-child chunking

In this pattern, retrieval matches smaller child units but returns a larger parent unit for context. This is often a strong compromise:

small units improve matching precision
larger parent units improve usability after retrieval
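
A minimal sketch of this pattern, assuming parents are whole passages and children are their sentences. Scoring is simple term overlap rather than BM25 or embeddings, and all names are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_children(parents: dict[str, str]) -> list[tuple[str, str]]:
    """Split each parent into sentences, keeping a pointer back to the parent."""
    children = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                children.append((parent_id, sentence))
    return children

def retrieve_parents(parents, children, query: str, k: int = 1) -> list[str]:
    """Match on child sentences, but return the deduplicated parent passages."""
    query_terms = tokenize(query)
    ranked = sorted(
        children,
        key=lambda child: len(query_terms & tokenize(child[1])),
        reverse=True,
    )
    parent_ids = []
    for parent_id, sentence in ranked:
        if not query_terms & tokenize(sentence):
            break  # remaining children have no overlap with the query
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == k:
            break
    return [parents[parent_id] for parent_id in parent_ids]

parents = {
    "refunds": "Refunds are issued within 14 days. Items must be unused.",
    "shipping": "Shipping takes 3 to 5 business days. Express delivery costs extra.",
}
children = build_children(parents)
hits = retrieve_parents(parents, children, "express delivery")
```

The query matches a single sentence, but the caller receives the full shipping passage, so the standard delivery time comes along as context.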

How to evaluate chunking

Judge chunking by retrieval outcomes, not by elegance alone:

do top results become more relevant?
does the system miss fewer obviously relevant passages?
are retrieved chunks easier to use as evidence?
do downstream answers become more grounded and complete?

Useful metrics include:

Recall@k
Precision@k
MRR (mean reciprocal rank)
nDCG (normalized discounted cumulative gain)
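
For reference, these metrics can be computed by hand on a small labeled set. The sketch below assumes each query has a ranked list of chunk IDs and a set of IDs judged relevant (or a gain per ID for nDCG); the function names and example IDs are made up for illustration.

```python
import math

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    return sum(1 for c in ranked[:k] if c in relevant) / k

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return sum(1 for c in ranked[:k] if c in relevant) / len(relevant)

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant chunk, or 0 if none was retrieved.
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    # MRR averages the reciprocal rank over all queries.
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    dcg = sum(
        gains.get(c, 0.0) / math.log2(i + 1)
        for i, c in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
p2 = precision_at_k(ranked, relevant, 2)
r2 = recall_at_k(ranked, relevant, 2)
mrr = mean_reciprocal_rank([(ranked, relevant)])
ndcg = ndcg_at_k(ranked, {"c1": 1.0, "c2": 1.0}, 4)
```

Running the same queries against two chunking strategies and comparing these numbers gives a more reliable signal than inspecting chunk boundaries by eye.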

Even simple qualitative testing is valuable if it uses real user queries and checks both retrieval quality and downstream answer quality.

The main takeaway

Chunking is not just text splitting. It is the design of the unit that retrieval can understand, match, and return.

If your goal is retrieval quality, do not start with "What chunk size should I use?" Start with "What unit gives my queries the best chance of finding the right evidence?"

Once the retrieval unit is designed, the next question is how to retrieve it well. That is where BM25 and embeddings enter the picture.
