Eric TechBlog

Chunking

Chunking is the design of the retrieval unit. It determines what a system can match, return, and use as evidence.

What chunking is

Chunking means splitting source content into smaller units. In retrieval systems, that is not just a preprocessing step. It is the decision of what the system is allowed to retrieve.

That unit might be:

  • a fixed-size window
  • a paragraph
  • a section
  • a FAQ item
  • an API endpoint with its explanation
  • a transcript segment
  • a small child passage that points to a larger parent block

So the main question is not "What chunk size should I use?" It is "What retrieval unit gives my queries the best chance of finding usable evidence?"

Core idea

Chunking is not mainly a token-count problem. It is a retrieval-unit design problem.

How chunking fits retrieval

Most retrieval systems do not search whole documents directly. They search the chunks you created ahead of time. That means chunking defines the candidate set, and therefore the search space itself.

In practice:

  1. split documents into chunks
  2. index each chunk with text, metadata, and often embeddings
  3. match a query against those chunks
  4. rank the matched chunks
  5. pass the best ones to the UI or the LLM

Retrieval does not really retrieve "documents." It retrieves units. That is why chunking directly affects what can be matched, ranked, and used as evidence.
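The five steps above can be sketched in a few lines. This is a minimal illustration, not a production retriever: the function names are hypothetical, paragraphs are assumed to be the retrieval unit, and plain term counts stand in for embeddings and a real ranking function.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(doc_id: str, text: str) -> list[dict]:
    """Step 1: split on blank lines so each paragraph becomes one unit."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        {"doc_id": doc_id, "chunk_id": f"{doc_id}#{i}", "text": p}
        for i, p in enumerate(paragraphs)
    ]


def build_index(chunks: list[dict]) -> list[tuple[dict, Counter]]:
    """Step 2: index each chunk (term counts stand in for embeddings)."""
    return [(chunk, Counter(tokenize(chunk["text"]))) for chunk in chunks]


def search(index: list[tuple[dict, Counter]], query: str, k: int = 3) -> list[dict]:
    """Steps 3 and 4: match the query against chunks, then rank by term overlap."""
    query_terms = set(tokenize(query))
    scored = []
    for chunk, counts in index:
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]  # step 5: hand these to the UI or LLM


docs = {
    "refunds": "Refunds are issued within 14 days.\n\nShipping fees are not refundable.",
    "login": "Reset your password from the login page.",
}
chunks = [c for doc_id, text in docs.items() for c in split_into_chunks(doc_id, text)]
top = search(build_index(chunks), "are shipping fees refundable")
```

Note that `search` never sees the `refunds` document as a whole. It can only return the two paragraphs that the splitter produced, which is the sense in which chunking defines the search space.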

One useful mental model:

  • embedding quality affects how well chunks are represented
  • ranking quality affects how well chunks are ordered
  • chunking quality affects what kind of units are available to retrieve

Why chunking matters

Good chunking usually improves three things at once:

  • recall: relevant ideas stay intact, so the retriever is less likely to miss them
  • precision: each chunk stays focused, so ranking is less likely to reward noise
  • usability: retrieved chunks include enough surrounding context to actually support an answer

This matters even more in RAG, where retrieved chunks become the model's working context.

Common chunking failures

Chunks are too large

Large chunks often mix multiple topics together. That can cause:

  • weaker semantic matching
  • more irrelevant text in the result
  • lower ranking quality
  • wasted context budget downstream

Chunks are too small

Small chunks often lose the context needed to be useful. That can cause:

  • incomplete evidence
  • broken references
  • missing definitions
  • poor support for answer generation

Boundaries are arbitrary

Fixed-size chunking is simple, but it often cuts through natural boundaries. That can cause:

  • one idea split across chunks
  • headings separated from their content
  • examples detached from explanations
  • results that technically match the query but are hard to interpret on their own

Overlap is used without intention

Overlap can help preserve context, but too much overlap creates duplication. That can cause:

  • repeated results
  • noisier retrieval sets
  • higher storage cost
  • less diversity in top-k results
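To make the duplication cost concrete, the sketch below counts how many chunks and stored tokens a sliding window produces for a given overlap. The function name and the example sizes are illustrative assumptions, not recommended settings.

```python
def overlap_cost(n_tokens: int, window: int, overlap: int) -> dict:
    """Count chunks and stored tokens for a sliding window with overlap."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    stride = window - overlap
    chunks = 0
    stored = 0
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        stored += end - start
        chunks += 1
        if end == n_tokens:  # the last window reached the end of the text
            break
        start += stride
    blowup = stored / n_tokens if n_tokens else 0.0
    return {"chunks": chunks, "stored_tokens": stored, "blowup": blowup}


no_overlap = overlap_cost(1000, window=200, overlap=0)
half_overlap = overlap_cost(1000, window=200, overlap=100)
```

For a 1000-token document and a 200-token window, no overlap gives 5 chunks and 1000 stored tokens, while a 100-token overlap gives 9 chunks and 1800 stored tokens. For long documents the blow-up approaches window / (window - overlap), so the overlap has to earn that cost by preserving context that would otherwise be lost.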

Practical chunking principles

1. Split by meaning before you split by size

Natural boundaries are usually better than arbitrary ones. Token limits still matter, but they should follow the retrieval design instead of driving it.

Useful boundaries often include:

  • titles
  • sections
  • paragraphs
  • bullet groups
  • table blocks
  • speaker turns
  • code examples with their explanation
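One way to read "meaning first, size second" is sketched below: split on paragraph boundaries, pack whole paragraphs into a chunk until a size budget is reached, and fall back to a hard cut only for a paragraph that exceeds the budget by itself. The function name and the character-based budget are assumptions for illustration; a real system would more likely count tokens.

```python
import re


def split_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # the paragraph fits, keep packing
            continue
        if current:
            chunks.append(current)  # close the chunk at a paragraph boundary
            current = ""
        if len(para) <= max_chars:
            current = para
        else:
            # Fallback: only an oversized paragraph gets an arbitrary cut.
            for i in range(0, len(para), max_chars):
                chunks.append(para[i:i + max_chars])
    if current:
        chunks.append(current)
    return chunks


sample = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 30
packed = split_by_paragraph(sample, max_chars=70)
```

In the sample, the first two paragraphs fit the 70-character budget together and the third starts a new chunk, so no paragraph is cut in the middle.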

2. Design chunks around likely queries

Think about what users will actually ask:

  • a FAQ assistant may work well with small, focused chunks
  • a policy assistant may need clause-level context
  • API docs may work better when split by endpoint or method
  • transcripts may need speaker-aware or topic-aware chunks

3. Keep related evidence together

Some evidence only makes sense when read together:

  • a policy rule and its exception
  • an API parameter and its constraint
  • a claim and its supporting explanation
  • a metric and its definition
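A crude way to keep such pairs together is to attach a paragraph to the previous unit when it opens with a cue that signals dependence. The cue list and the function name below are hypothetical, and a heuristic like this will miss many real cases; it is only meant to show the idea of merging before indexing.

```python
# Hypothetical cue phrases that suggest a paragraph depends on the previous one.
DEPENDENT_CUES = ("exception:", "except ", "however,", "note:", "unless ")


def attach_dependents(paragraphs: list[str], cues: tuple[str, ...] = DEPENDENT_CUES) -> list[str]:
    """Merge a paragraph into the previous unit when it starts with a cue."""
    units: list[str] = []
    for para in paragraphs:
        text = para.strip()
        if units and text.lower().startswith(cues):
            units[-1] = units[-1] + "\n" + text  # keep the rule and its exception together
        else:
            units.append(text)
    return units


policy = [
    "Refunds are available within 30 days of purchase.",
    "Exception: digital goods are final sale.",
    "Standard shipping takes five business days.",
]
units = attach_dependents(policy)
```

With this input the refund rule and its exception end up in one unit, so a query about refunds on digital goods cannot retrieve the rule without the exception.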

4. Do not optimize only for retrieval match

A chunk that is easy to retrieve is not always easy to use. Optimize for both:

  • retrievability
  • usefulness after retrieval

5. Use overlap deliberately

Use overlap only when you know what context would otherwise be lost.

6. Attach metadata

Metadata often helps retrieval more than teams expect, because it supports filtering, boosting, and source attribution. Useful fields include:

  • title
  • section path
  • source
  • page number
  • timestamp
  • speaker
  • document type
  • product area
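A minimal sketch of how these fields might travel with a chunk is shown below. The `Chunk` class, its field names, and both helper functions are assumptions for illustration; the point is only that metadata can narrow the candidate set before ranking and can be prepended to the text that gets indexed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    title: str
    section_path: tuple[str, ...]
    source: str
    doc_type: str
    extra: dict = field(default_factory=dict)  # page number, timestamp, speaker, ...


def filter_chunks(chunks: list[Chunk], **required) -> list[Chunk]:
    """Keep only chunks whose metadata equals every required value."""
    return [
        c for c in chunks
        if all(getattr(c, key, None) == value for key, value in required.items())
    ]


def text_for_indexing(chunk: Chunk) -> str:
    """Prepend title and section path so the indexed text carries its context."""
    path = " > ".join(chunk.section_path)
    return f"{chunk.title} | {path}\n{chunk.text}"


corpus = [
    Chunk("POST /v1/orders creates an order.", "Orders API",
          ("API", "Orders"), "api.md", "api"),
    Chunk("Orders can be cancelled within 24 hours.", "Cancellation policy",
          ("Policies", "Orders"), "policy.md", "policy", {"page": 3}),
]
api_only = filter_chunks(corpus, doc_type="api")
```

Filtering by `doc_type` before ranking keeps the policy chunk from competing with the API chunk, even though both mention orders.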

Common chunking strategies

There is no universal best strategy, but these patterns are common:

Fixed-size chunking

Simple and fast. You split text by a constant token or character length, sometimes with overlap.

Use it when:

  • you need a quick baseline
  • documents are messy or inconsistent
  • implementation speed matters more than precision at first
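As a baseline, a fixed-size splitter can be as small as the sketch below. It assumes the text is already tokenized (whitespace tokens here, which is a simplification) and the function name is illustrative.

```python
def fixed_size_chunks(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Slide a window of `size` tokens, stepping by `size - overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reached the end
            break
    return chunks


tokens = "one two three four five six seven eight nine ten".split()
windows = fixed_size_chunks(tokens, size=4, overlap=1)
```

With ten tokens, a window of 4, and an overlap of 1, this yields three chunks, and the last token of each chunk is repeated as the first token of the next. The splitter knows nothing about sentences or headings, which is why it works on messy input and also why it cuts through natural boundaries.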

Structure-aware chunking

This uses document structure such as:

  • titles
  • sections
  • paragraphs
  • lists
  • tables
  • code blocks with nearby explanation

Use it when:

  • documents have clear formatting
  • users ask section-level questions
  • preserving human-readable boundaries matters
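The sketch below shows one form of structure-aware splitting. It assumes the source is Markdown with `#` headings, which is only one possible structure signal, and the function name is hypothetical. Each section keeps its heading path so the chunk stays interpretable after retrieval. It does not handle `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headings(markdown: str) -> list[dict]:
    """Return one chunk per heading section, tagged with its heading path."""
    sections: list[dict] = []
    path: list[str] = []  # current stack of headings, outermost first
    body: list[str] = []  # lines collected under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append({"section_path": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level and deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return sections


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nCall the API.\n# FAQ\nAsk here."
sections = split_by_headings(doc)
```

Here the chunk for "Run the installer." carries the path `["Guide", "Install"]`, so a section-level question can be matched against the heading context as well as the body.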

Semantic chunking

This tries to split content based on topic shifts or semantic transitions. It can produce coherent chunks, but it is usually more complex and less predictable than structure-aware splitting.

Use it when:

  • documents contain long flowing text
  • structure alone is not enough
  • semantic coherence matters more than strict formatting
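A toy version of topic-shift detection is sketched below: compare each sentence with the previous one and start a new chunk when their similarity drops below a threshold. Bag-of-words cosine similarity is used here as a stand-in for sentence embeddings, and the threshold of 0.1 is an arbitrary assumption. Real implementations typically use embedding models and tune the threshold on their own data.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.1) -> list[str]:
    """Start a new chunk whenever adjacent sentences look dissimilar."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    previous = bag_of_words(sentences[0])
    for sentence in sentences[1:]:
        vector = bag_of_words(sentence)
        if cosine(previous, vector) < threshold:
            groups.append([sentence])  # similarity dropped: treat as a topic shift
        else:
            groups[-1].append(sentence)
        previous = vector
    return [" ".join(group) for group in groups]


sentences = [
    "The cache stores recent query results.",
    "Cache entries expire after ten minutes.",
    "Billing runs on the first day of each month.",
    "Billing invoices are sent by email each month.",
]
topics = semantic_chunks(sentences)
```

The two cache sentences share a term and stay together, the first billing sentence shares nothing with the sentence before it and opens a new chunk, and the result is two chunks. The unpredictability mentioned above shows up quickly: a shared stopword or a slightly different threshold can move a boundary.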

Parent-child chunking

In this pattern, retrieval matches smaller child units but returns a larger parent unit for context. This is often a strong compromise:

  • small units improve matching precision
  • larger parent units improve usability after retrieval
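The pattern can be sketched as follows, assuming sentences are the child units and whole sections are the parents. All names are hypothetical, and term overlap stands in for a real scoring function.

```python
import re


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_children(parents: dict[str, str]) -> list[dict]:
    """Split each parent into sentence-level children that remember their parent."""
    children = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if sentence.strip():
                children.append({"parent_id": parent_id, "text": sentence.strip()})
    return children


def retrieve_parents(query: str, children: list[dict], parents: dict[str, str], k: int = 2) -> list[str]:
    """Match on children, then return each matching parent once."""
    query_terms = terms(query)
    scored = [(len(query_terms & terms(c["text"])), c) for c in children]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    seen: set[str] = set()
    results: list[str] = []
    for score, child in scored:
        if score == 0 or len(results) == k:
            break
        if child["parent_id"] not in seen:  # several children may share a parent
            seen.add(child["parent_id"])
            results.append(parents[child["parent_id"]])
    return results


parents = {
    "rate-limits": "Each API key is limited to 100 requests per minute. "
                   "Exceeding the limit returns HTTP 429. Limits reset every 60 seconds.",
    "auth": "Requests must include an API key header. Keys can be rotated from the dashboard.",
}
children = build_children(parents)
hits = retrieve_parents("exceeding limit 429", children, parents)
```

The query matches only the single sentence about HTTP 429, but the caller receives the whole rate-limits section, including the actual limit and the reset interval.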

How to evaluate chunking

Judge chunking by retrieval outcomes, not by elegance alone:

  • do top results become more relevant?
  • does the system miss fewer obviously relevant passages?
  • are retrieved chunks easier to use as evidence?
  • do downstream answers become more grounded and complete?

Useful metrics include:

  • Recall@k
  • Precision@k
  • MRR (mean reciprocal rank)
  • nDCG (normalized discounted cumulative gain)
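For reference, the metrics listed above can be computed per query as in the sketch below. It assumes binary relevance labels and a ranked list of chunk IDs without duplicates; the function names are illustrative. MRR is the mean of the reciprocal rank over all test queries.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the top k results that are relevant (k must be positive)."""
    return sum(1 for chunk_id in ranked[:k] if chunk_id in relevant) / k


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(ranked[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
```

Because these functions only need chunk IDs and relevance labels, the same labeled query set can be rerun after every chunking change. One caveat: if chunk boundaries change, the relevance labels have to be mapped to the new chunks before the scores are comparable.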

Even simple qualitative testing is valuable if it uses real user queries and checks both retrieval quality and downstream answer quality.

The main takeaway

Chunking is not just text splitting. It is the design of the unit that retrieval can understand, match, and return.

If your goal is retrieval quality, do not start with "What chunk size should I use?" Start with "What unit gives my queries the best chance of finding the right evidence?"

Once the retrieval unit is designed, the next question is how to retrieve it well. That is where BM25 and embeddings enter the picture.
