The Art of Chunking: Boosting AI Performance in RAG Architectures

Smart people are lazy. They find the most efficient ways to solve complex problems, minimizing effort while maximizing results.

In Generative AI applications, this efficiency is achieved through chunking. Just like breaking a book into chapters makes it easier to read, chunking divides large texts into smaller, manageable parts, making them easier to process and understand.

Before exploring the mechanics of chunking, it's essential to understand the broader framework in which this technique operates: Retrieval-Augmented Generation or RAG.

What is RAG?

[Figure: What is Retrieval-Augmented Generation]

Retrieval-Augmented Generation (RAG) is an approach that integrates retrieval mechanisms with large language models (LLMs). It enhances the model's capabilities by using retrieved documents to generate more accurate and contextually enriched responses.

Introducing Chunking

[Figure: What is chunking]

Chunking is the process of breaking down large pieces of text into smaller, more manageable chunks. This process has two main phases:

  • Data Preparation: Reliable data sources are segmented into chunked documents and stored in a database. The database can be a vector store if you generate embeddings for the chunks.
  • Retrieval: When a user asks a question, the system searches through the document chunks using vector search, full-text search, or a combination of both. This process identifies and retrieves the chunks most relevant to the user's query.
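
To make the two phases concrete, here's a minimal, self-contained sketch. TF-IDF vectors stand in for learned embeddings so that the example runs without an API key; in a real system, you would store embedding vectors in a vector database:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Data preparation: segment a source document into chunks and index them
document = ("RAG combines retrieval with generation. "
            "Chunks are embedded and stored in a vector store. "
            "At query time the most similar chunks are retrieved.")
chunks = document.split(". ")

vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)  # our toy "vector store"

# Retrieval: embed the user query and rank chunks by similarity
query = "Where are the chunks stored?"
query_vector = vectorizer.transform([query])
scores = cosine_similarity(query_vector, chunk_vectors)[0]
print(chunks[scores.argmax()])  # -> the chunk about the vector store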

Why Chunking is Crucial in RAG Architectures

Chunking is essential in RAG architectures because it is one of the first factors that determines the accuracy of your Gen AI application.

  1. Chunks should be small to increase retrieval accuracy: Chunking enables the system to index and search through smaller text segments, increasing the accuracy of finding relevant documents. When a query is made, the system can quickly pinpoint the most pertinent chunks, improving the precision of the retrieval process.
  2. Chunks should be big enough to enhance contextual generation: Chunks should not be too small, either. Each chunk must carry enough surrounding context for the generative model to understand and use it. This results in more coherent and contextually accurate responses, as the model can draw on specific, relevant information rather than on fragments stripped of their meaning.
  3. Scalability and performance: Chunking allows for more scalable and efficient processing of large datasets. It reduces the computational load by breaking the data into manageable pieces that can be processed in parallel, improving the overall performance of the RAG system. Keep in mind, however, that smaller chunks mean more items to embed, store, and index, so scalability has to be balanced against chunk granularity.

Chunking is a technological necessity and a strategic approach to ensuring robust, efficient, and scalable RAG systems. It enhances retrieval accuracy, processing efficiency, and resource utilization, playing a crucial role in the success of RAG applications.

Techniques to Improve Chunking

Several techniques can improve chunking, ranging from basic to advanced methods:

  • Fixed Character Sizes: Simple and straightforward, splitting text into chunks of a fixed number of characters.
  • Recursive Character Text Splitting: Using separators like spaces or punctuation to create more contextually meaningful chunks.
  • Document-Specific Splitting: Tailoring the chunking method to the document type, such as PDFs or Markdown files.
  • Semantic Splitting: Using embeddings to chunk text based on semantic content.
  • Agentic Splitting: Employing large language models to determine optimal chunking based on content and context.

By adopting these techniques, RAG systems can achieve higher performance and more accurate results, solidifying their role as essential tools in AI.

Fixed Character Sizes

Fixed character size chunking is the most basic method of splitting text. This approach involves dividing the text into chunks of a predetermined number of characters, regardless of the content. This method is straightforward but lacks consideration for the text's structure and context, which can lead to less meaningful chunks.

[Figure: Fixed-size chunking]

Pros:

  • Simplicity: Easy to implement and requires minimal computational resources.
  • Consistency: Produces uniform chunks, simplifying downstream processing.

Cons:

  • Context Ignorance: Ignores the structure and meaning of the text, resulting in fragmented information.
  • Inefficiency: May cut off important context, requiring additional processing to reassemble meaningful information.

Here's an example of how to implement fixed character size chunking in plain Python:

# Sample text to chunk
text = "This is the text I would like to chunk up. It is the example text for this exercise."

# Set the chunk size
chunk_size = 35
# Initialize a list to hold the chunks
chunks = []
# Iterate over the text to create chunks
for i in range(0, len(text), chunk_size):
    chunk = text[i:i + chunk_size]
    chunks.append(chunk)
# Display the chunks
print(chunks)
# Output: ['This is the text I would like to ch', 'unk up. It is the example text for ', 'this exercise.']

Using LangChain's CharacterTextSplitter to achieve the same result:

from langchain.text_splitter import CharacterTextSplitter

# Initialize the text splitter with specified chunk size
text_splitter = CharacterTextSplitter(chunk_size=35, chunk_overlap=0, separator='', strip_whitespace=False)
# Create documents using the text splitter
documents = text_splitter.create_documents([text])
# Display the created documents
for doc in documents:
    print(doc.page_content)
# Output: 
# This is the text I would like to ch
# unk up. It is the example text for 
# this exercise.

Fixed character size chunking is a simple yet foundational technique, often used as a baseline before moving to more sophisticated methods.
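
One common way to soften the context cut-off listed under the cons is to let consecutive chunks overlap. Here's a minimal variation of the LangChain example above, with chunk_overlap raised from 0 to 10 (the exact value is an arbitrary choice for illustration):

from langchain.text_splitter import CharacterTextSplitter

text = "This is the text I would like to chunk up. It is the example text for this exercise."
# Each chunk repeats the last 10 characters of the previous one,
# so text cut at a boundary keeps some surrounding context
text_splitter = CharacterTextSplitter(chunk_size=35, chunk_overlap=10, separator='', strip_whitespace=False)
for doc in text_splitter.create_documents([text]):
    print(doc.page_content)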

Recursive Character Text Splitting

Recursive character text splitting is a more advanced technique that considers the structure of the text. It uses a series of separators to recursively divide the text into chunks, ensuring that the chunks are more meaningful and contextually relevant.

[Figure: Recursive Character Chunking]

In the example illustrated above, with a chunk size of 30 characters and an overlap of 20 characters, the RecursiveCharacterTextSplitter attempts to split the text while maintaining logical boundaries. It also shows, however, that with such a small chunk size, the splitter may still cut in the middle of words or sentences, which is far from optimal.

Pros:

  • Improved Context: This method preserves the text's natural structure using separators like paragraphs or sentences.
  • Flexibility: Allows for varying chunk sizes and overlaps, providing better control over the chunking process.

Cons:

  • Chunk size still matters: Each chunk should stay manageable but contain at least a full phrase or sentence; otherwise, retrieval precision suffers.
  • Performance Overhead: Requires more computational resources due to the recursive splitting and the handling of multiple separators, and it generates more chunks than fixed-size chunking does.

Here's an example of how to implement recursive character text splitting in LangChain. First, install the langchain-text-splitters library if you haven't already:

%pip install -qU langchain-text-splitters

from langchain_text_splitters import RecursiveCharacterTextSplitter
# Sample text to chunk
text = """
The Olympic Games, originally held in ancient Greece, were revived in 1896 and
have since become the world's foremost sports competition, bringing together 
athletes from around the globe.
"""
# Initialize the recursive character text splitter with specified chunk size
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=30,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)

# Create documents using the text splitter
documents = text_splitter.create_documents([text])
# Display the created documents
for doc in documents:
    print(doc.page_content)
# Output:
# "The Olympic Games, originally"
# "held in ancient Greece, were"
# "revived in 1896 and have"
# "have since become the world's"
# "world's foremost sports"
# "competition, bringing together"
# "together athletes from around"
# "around the globe."

In this method, the text is first split by larger structures like paragraphs, and if the chunks are still too large, it further splits them using smaller structures like sentences. Each chunk maintains meaningful context and avoids cutting off vital information.

Recursive character text splitting strikes a balance between simplicity and sophistication, providing a robust method for chunking that respects the text's inherent structure.
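
The separator hierarchy itself is configurable via the separators parameter. As a small sketch (the separator list below is an illustrative choice, not a recommendation), you can tell the splitter to prefer paragraph breaks, then line breaks, then sentence ends, then spaces, falling back to single characters only as a last resort:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text = "The Olympic Games, originally held in ancient Greece, were revived in 1896 and have since become the world's foremost sports competition."
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarsest first
    chunk_size=80,
    chunk_overlap=0,
)
for doc in text_splitter.create_documents([text]):
    print(doc.page_content)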

Document-Specific Splitting

Document-specific splitting tailors the chunking process to different document types, such as Markdown files, Python scripts, JSON documents, or HTML, ensuring each type is split in a way that best suits its content and structure.

For example, Markdown is widely used across platforms like GitHub, Medium, and Confluence, making it a natural choice for ingestion in RAG systems, where clean, structured data is essential for generating accurate responses.

Additionally, language-specific splitters are available for various programming languages, including C++, Go, Java, Python, and more, ensuring that code is chunked effectively for analysis and retrieval.
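
In langchain-text-splitters, these language-aware splitters are exposed through a single factory method. Here's a brief sketch for Java (the code sample is made up for illustration; the same pattern applies to the other supported languages):

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

java_code = """
public class Person {
    private String name;
    public Person(String name) { this.name = name; }
}
"""
# from_language preloads separators suited to Java syntax (class and method boundaries, etc.)
java_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.JAVA, chunk_size=100, chunk_overlap=0
)
for doc in java_splitter.create_documents([java_code]):
    print(doc.page_content)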

[Figure: Document-specific splitting – Markdown file]

Pros:

  • Relevance: Different document types are split using the most appropriate method, preserving their logical structure.
  • Precision: Tailors the chunking process to the unique characteristics of each document type.

Cons:

  • Complex Implementation: Requires different chunking strategies and libraries for various document types.
  • Maintenance: More complex to maintain due to the diversity of methods.

Here's an example of how to implement document-specific splitting for Markdown and Python files:

Markdown Splitting

from langchain.text_splitter import MarkdownTextSplitter
# Sample Markdown text
markdown_text = """
# Fun in California
## Driving
Try driving on the 1 down to San Diego
### Food
Make sure to eat a burrito while you're there
## Hiking
Go to Yosemite
"""
# Initialize the Markdown text splitter
splitter = MarkdownTextSplitter(chunk_size=40, chunk_overlap=0)
# Create documents using the text splitter
documents = splitter.create_documents([markdown_text])
# Display the created documents
for doc in documents:
    print(doc.page_content)
# Output:
# # Fun in California\n## Driving
# Try driving on the 1 down to San Diego
# ### Food
# Make sure to eat a burrito while you're
# there
# ## Hiking\nGo to Yosemite

Python Code Splitting

from langchain.text_splitter import PythonCodeTextSplitter
# Sample Python code
python_text = """
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
p1 = Person("John", 36)
for i in range(10):
    print(i)
"""
# Initialize the Python code text splitter
python_splitter = PythonCodeTextSplitter(chunk_size=100, chunk_overlap=0)
# Create documents using the text splitter
documents = python_splitter.create_documents([python_text])
# Display the created documents
for doc in documents:
    print(doc.page_content)
# Output:
# class Person:\n    def __init__(self, name, age):\n        self.name = name\n        self.age = age
# p1 = Person("John", 36)\nfor i in range(10):\n    print(i)

Document-specific splitting preserves the document's logical structure, making the chunks more meaningful and contextually accurate. For example, Markdown files are split at headers and sections, while Python code is split along classes and functions.

This method enhances the system's ability to retrieve and generate relevant responses by maintaining the integrity of different document types, thereby improving the overall performance and accuracy of the RAG system.

Semantic Splitting

Unlike previous splitting methods with arbitrary lengths or syntactic rules, semantic splitting takes chunking to the next level by using the meaning of the text to determine chunk boundaries.

This method leverages embeddings to group semantically similar content, ensuring each chunk contains contextually coherent information.

[Figure: Semantic chunking workflow]

The above diagram illustrates the workflow for semantic chunking, starting with sentence splitting, then generating embeddings, and finally grouping sentences based on similarity. The process ensures that chunks are semantically coherent, enhancing the relevance and accuracy of information retrieval.

Let's see the output of this method through an example.

[Figure: Semantic chunking example]

This diagram provides a practical example of how sentences are grouped into chunks using cosine similarity. Thematically related sentences are grouped, while those that differ in meaning are kept separate. The visual explanation clarifies how semantic chunking is applied to maintain context and coherence in the text.

Pros:

  • Contextual Relevance: Ensures that chunks contain semantically similar content, enhancing the accuracy of information retrieval and generation.
  • Dynamic Adaptability: Can adapt to various text structures and content types based on meaning rather than rigid rules.

Cons:

  • Computational Overhead: Requires additional computational resources to generate and compare embeddings.
  • Complexity: More complex to implement compared to simpler splitting methods.

Here's an example of how to implement semantic splitting using embeddings. This code is adapted from Greg Kamradt's notebook, 5_Levels_Of_Text_Splitting:

import re

import numpy as np
from langchain.embeddings import OpenAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity
# Sample text
text = """
One of the most important things I didn't understand about the world when I was a child is the degree to which the returns for performance are superlinear.
Teachers and coaches implicitly told us the returns were linear. "You get out," I heard a thousand times, "what you put in." They meant well, but this is rarely true. If your product is only half as good as your competitor's, you don't get half as many customers. You get no customers, and you go out of business.
It's obviously true that the returns for performance are superlinear in business. Some think this is a flaw of capitalism, and that if we changed the rules it would stop being true. But superlinear returns for performance are a feature of the world, not an artifact of rules we've invented. We see the same pattern in fame, power, military victories, knowledge, and even benefit to humanity. In all of these, the rich get richer.
"""
# Splitting the text into sentences
sentences = re.split(r'(?<=[.?!])\s+', text)
sentences = [{'sentence': x, 'index' : i} for i, x in enumerate(sentences)]
# Combine sentences for context
def combine_sentences(sentences, buffer_size=1):
    for i in range(len(sentences)):
        combined_sentence = ''
        for j in range(i - buffer_size, i):
            if j >= 0:
                combined_sentence += sentences[j]['sentence'] + ' '
        combined_sentence += sentences[i]['sentence']
        for j in range(i + 1, i + 1 + buffer_size):
            if j < len(sentences):
                combined_sentence += ' ' + sentences[j]['sentence']
        sentences[i]['combined_sentence'] = combined_sentence
    return sentences
sentences = combine_sentences(sentences)
# Generate embeddings
oai_embeds = OpenAIEmbeddings()
embeddings = oai_embeds.embed_documents([x['combined_sentence'] for x in sentences])
# Add embeddings to sentences
for i, sentence in enumerate(sentences):
    sentence['combined_sentence_embedding'] = embeddings[i]
# Calculate cosine distances
def calculate_cosine_distances(sentences):
    distances = []
    for i in range(len(sentences) - 1):
        embedding_current = sentences[i]['combined_sentence_embedding']
        embedding_next = sentences[i + 1]['combined_sentence_embedding']
        similarity = cosine_similarity([embedding_current], [embedding_next])[0][0]
        distance = 1 - similarity
        distances.append(distance)
        sentences[i]['distance_to_next'] = distance
    return distances, sentences
distances, sentences = calculate_cosine_distances(sentences)
# Determine breakpoints: split where the distance to the next sentence
# spikes above the 95th percentile of all distances
breakpoint_distance_threshold = np.percentile(distances, 95)
indices_above_thresh = [i for i, x in enumerate(distances) if x > breakpoint_distance_threshold]
# Combine sentences into chunks
chunks = []
start_index = 0
for index in indices_above_thresh:
    end_index = index
    group = sentences[start_index:end_index + 1]
    combined_text = ' '.join([d['sentence'] for d in group])
    chunks.append(combined_text)
    start_index = index + 1
if start_index < len(sentences):
    combined_text = ' '.join([d['sentence'] for d in sentences[start_index:]])
    chunks.append(combined_text)
# Display the created chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i+1}:n{chunk}n")

Semantic splitting uses embeddings to create semantically similar chunks, improving retrieval accuracy and contextual generation in RAG systems. Focusing on the meaning of the text ensures that each chunk contains coherent and relevant information, enhancing the performance and reliability of the RAG application.
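
If you'd rather not maintain this logic yourself, LangChain ships an experimental splitter built on the same idea. Here is a minimal sketch, assuming the langchain-experimental package is installed and an OpenAI API key is configured; it reuses the sample text from the example above:

from langchain_experimental.text_splitter import SemanticChunker
from langchain.embeddings import OpenAIEmbeddings

# Breakpoints are derived from the distribution of embedding distances,
# much like the manual percentile threshold used above
text_splitter = SemanticChunker(OpenAIEmbeddings(), breakpoint_threshold_type="percentile")
for doc in text_splitter.create_documents([text]):
    print(doc.page_content)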

Agentic Splitting

Agentic splitting leverages the power of large language models to dynamically create chunks based on the semantic understanding of the text.

This advanced method mimics the human approach to chunking by evaluating the content and context to determine the optimal chunk boundaries.

Instead of relying on predefined rules or purely statistical methods, an agentic splitter assesses the content dynamically, much as a person would read through a document and decide where to split based on the flow of ideas and the context of the sentences. This approach enhances the coherence and relevance of the resulting chunks.

[Figure: Agentic chunking]

Pros:

  • High Precision: Provides highly relevant and contextually accurate chunks by using sophisticated language models.
  • Adaptability: Can handle diverse types of text and adjust chunking strategies on the fly.

Cons:

  • Resource Intensive and Additional LLM cost: Requires significant computational resources to run large language models.
  • Complex Implementation: Involves setting up and fine-tuning language models for optimal performance.

How to Implement an Agentic Splitter in LangGraph

In LangGraph, you assemble a small state machine: nodes are functions that read and update a shared state, and edges (including conditional ones) define the control flow between them. The code below is a minimal sketch of one possible implementation, not a canonical recipe: a single LLM decision node loops over the document's sentences and asks the model whether each one still belongs to the chunk being built. It assumes the langgraph and langchain-openai packages; the model name, prompt wording, and naive sentence splitting are illustrative choices.

from typing import List, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph

# Shared state passed between nodes
class ChunkState(TypedDict):
    sentences: List[str]      # sentences not yet assigned
    current_chunk: List[str]  # sentences in the chunk being built
    chunks: List[str]         # completed chunks

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model choice is an assumption

def decide(state: ChunkState) -> ChunkState:
    """Ask the LLM whether the next sentence belongs to the current chunk."""
    sentence = state["sentences"].pop(0)
    if state["current_chunk"]:
        prompt = (f"Does the sentence '{sentence}' belong to the same chunk as "
                  f"'{' '.join(state['current_chunk'])}'? Answer yes or no.")
        if llm.invoke(prompt).content.strip().lower().startswith("no"):
            state["chunks"].append(" ".join(state["current_chunk"]))
            state["current_chunk"] = []
    state["current_chunk"].append(sentence)
    return state

def flush(state: ChunkState) -> ChunkState:
    """Close the last open chunk."""
    state["chunks"].append(" ".join(state["current_chunk"]))
    return state

graph = StateGraph(ChunkState)
graph.add_node("decide", decide)
graph.add_node("flush", flush)
graph.set_entry_point("decide")
# Keep looping over "decide" until every sentence has been placed
graph.add_conditional_edges("decide", lambda s: "decide" if s["sentences"] else "flush")
graph.add_edge("flush", END)
app = graph.compile()

document = "Your document text here..."
sentences = [s.strip() for s in document.split(". ") if s.strip()]
# Raise the recursion limit: the graph takes one step per sentence
result = app.invoke({"sentences": sentences, "current_chunk": [], "chunks": []},
                    config={"recursion_limit": 200})
print(result["chunks"])

Conclusion

In conclusion, chunking is a vital strategy in optimizing Retrieval-Augmented Generation (RAG) systems, enabling more accurate, contextually relevant, and scalable responses.

By breaking down large texts into manageable pieces, we enhance retrieval accuracy and improve the overall efficiency of AI applications.

Embracing advanced chunking techniques is essential for the continued success and advancement of AI-driven solutions.

Further Readings:

  1. Credit: This article was inspired by Greg Kamradt's insightful notebook and YouTube video exploring five levels of text splitting: RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
  2. A great article by Apoorva Joshi, a must-read for anyone interested in optimizing AI performance in RAG systems: How to Choose the Right Chunking Strategy for Your LLM Application | MongoDB

