RAG Chunk Calculator

Paste a document and preview exactly how it chunks at different sizes and overlap settings. See token counts per chunk, estimated embedding costs, and retrieval quality tips.

Document (275 tokens)

The key components of a RAG system include: a document store or vector database, an embedding model to convert text into vectors, a retrieval mechanism (usually approximate nearest neighbor search), and a language model that uses the retrieved context to generate accurate responses.

Chunk size is one of the most important hyperparameters in RAG. Smaller chunks (128-256 tokens) produce more precise retrieval but may miss important context. Larger chunks (512-1024 tokens) provide more context per retrieved passage but may include irrelevant information and cost more to embed.

Overlap between chunks helps prevent important information from being split across chunk boundaries. A good rule of thumb is to use 10-20% overlap relative to the chunk size. For example, with a 512-token chunk size, use 50-100 tokens of overlap.

Chunk size: 256 words

Overlap: 50 words

Embedding model

Chunks: 1

Avg tokens/chunk: 274

Embed cost: $0.0055

Vector storage: 0.006 MB
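As a rough sketch of where figures like these can come from, the helpers below estimate vector storage and embedding cost. The function names are hypothetical, and the sketch assumes each vector is stored as 32-bit floats and that pricing is quoted per million tokens; the page does not state how it computes its own numbers.

```python
def vector_storage_mb(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Estimate raw vector storage in MB (1 MB = 1,000,000 bytes).

    Assumes one vector per chunk, stored as float32 (4 bytes per value),
    with no index or metadata overhead.
    """
    return num_chunks * dimensions * bytes_per_value / 1_000_000


def embedding_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate embedding cost from a per-million-token price.

    The price is a parameter on purpose: take it from your provider's
    current price list rather than hard-coding it.
    """
    return total_tokens / 1_000_000 * usd_per_million_tokens


# One chunk embedded as a 1536-dimensional float32 vector (an assumed size):
print(round(vector_storage_mb(num_chunks=1, dimensions=1536), 3))  # 0.006
```

Under those assumptions, a single 1536-dimensional vector takes 6,144 bytes, which rounds to the 0.006 MB shown above.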

Recommended Strategies

Chunk Preview (1 chunk)

Chunk 1 (~274 tokens)

Retrieval-Augmented Generation (RAG) is a technique that enhances large language models by retrieving relevant information from external knowledge sources before generating responses. Instead of relying solely on the model's training data, RAG systems first query a vector database or knowledge base to find the most relevant documents or passages, then provide these as context to the language model. The key components of a RAG system include: a document store or vector database, an embedding model to convert text into vectors, a retrieval mechanism (usually approximate nearest neighbor search), and a language model that uses the retrieved context to generate accurate responses. Chunk size is one of the most important hyperparameters in RAG. Smaller chunks (128-256 tokens) produce more precise retrieval but may miss important context. Larger chunks (512-1024 tokens) provide more context per retrieved passage but may include irrelevant information and cost more to embed. Overlap between chunks helps prevent important information from being split across chunk boundaries. A good rule of thumb is to use 10-20% overlap relative to the chunk size. For example, with a 512-token chunk size, use 50-100 tokens of overlap.

Frequently Asked Questions

What chunk size should I use?

It depends on your use case. For Q&A over short facts, use 128–256 tokens for precision. For document summarization or long-form content, use 512–1024 tokens for better context. Always test multiple chunk sizes with your specific data and retrieval task.
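Before running retrieval tests, it can help to see how many chunks each candidate size would produce. The sketch below is a minimal estimate, assuming a fixed-size sliding window with a constant overlap; `estimate_chunk_count` is a hypothetical helper, not part of this tool.

```python
import math


def estimate_chunk_count(total_units: int, chunk_size: int, overlap: int) -> int:
    """Estimate how many chunks a sliding window produces.

    `total_units` is the document length in whatever unit you chunk by
    (tokens or words). Each new chunk advances by chunk_size - overlap.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if total_units <= 0:
        return 0
    if total_units <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One full first chunk, then one more chunk per step needed to cover the rest.
    return 1 + math.ceil((total_units - chunk_size) / step)


# Compare candidate sizes for a hypothetical 10,000-token document at 15% overlap.
for size in (128, 256, 512, 1024):
    overlap = int(size * 0.15)
    print(size, overlap, estimate_chunk_count(10_000, size, overlap))
```

Chunk count only drives embedding cost and storage; retrieval quality still has to be measured on your own queries.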

What is chunk overlap and why does it matter?

Overlap prevents important information from being split across chunk boundaries. Without overlap, a passage that straddles a boundary is cut in two: the sentence ending one chunk and the sentence starting the next never appear together in a single retrieval result. An overlap of 10–20% of the chunk size is a good starting point.
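A minimal sketch of overlapping chunking is shown below. It assumes word-based splitting on whitespace, and `chunk_words` is a hypothetical helper; the tool's actual splitting logic is not shown on this page.

```python
def chunk_words(text: str, chunk_size: int = 256, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk begins with the last word of the previous one, so text near a boundary appears in both chunks. A document shorter than `chunk_size` words comes back as a single chunk.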

What embedding model should I use?

For production, text-embedding-3-small offers the best value (low cost, high quality). For the highest quality, use text-embedding-3-large. For free, local embedding, use nomic-embed-text or bge-large-en-v1.5 via Ollama or Hugging Face. For multilingual text, use multilingual-e5-large.