Vector Search and RAG: Retrieval Strategies
The Library Analogy 📚
Imagine you’re a super-smart librarian in a magical library. People come asking questions, and your job is to find the BEST books to answer them. But here’s the twist: there are millions of books!
RAG (Retrieval-Augmented Generation) is like giving an AI its own magical librarian. Instead of guessing answers, the AI first retrieves the best information, then uses it to give you a perfect answer.
Today, we’ll learn all the different ways our magical librarian can find the right books!
1. Retriever Fundamentals
What Is a Retriever?
A retriever is your search helper. When you ask a question, it runs through all your documents and brings back the most relevant ones.
Think of it like this:
You: “Tell me about dinosaurs!”
Retriever: runs to shelves → comes back with 5 best dinosaur books
How It Works (Simple Version)
graph TD A["Your Question"] --> B["Retriever"] B --> C["Searches Documents"] C --> D["Returns Top Matches"] D --> E["AI Uses These to Answer"]
The Basic Pattern
# Create a retriever from your documents
retriever = vectorstore.as_retriever()
# Ask it to find relevant docs
docs = retriever.get_relevant_documents(
"What is photosynthesis?"
)
Key Point: Every retriever follows this pattern - you give it a question, it gives you documents!
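One version note: in newer LangChain releases, retrievers are also Runnables, so depending on your version the same pattern is written with invoke (the older get_relevant_documents call still works but is being phased out):
# Same pattern, newer-style API
docs = retriever.invoke("What is photosynthesis?")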
2. BM25Retriever
The Word-Counting Champion
BM25 is like a librarian who counts words really carefully.
Imagine you ask: “Tell me about red apples”
BM25 thinks:
- 📖 Book A mentions “apple” 50 times → High score!
- 📖 Book B mentions “apple” 2 times → Low score
- 📖 Book C mentions “red” AND “apple” → Even higher!
Why It’s Special
BM25 is keyword-based. It doesn’t understand meaning - it just counts words smartly.
Good for: Finding exact matches
Bad for: Understanding “automobile” = “car”
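Curious what “counts words smartly” actually means? Here’s a toy sketch of the BM25 scoring idea (heavily simplified - in practice the rank_bm25 library that powers LangChain’s BM25Retriever does this for you):
import math

def bm25_score(query_words, doc_words, all_docs, k1=1.5, b=0.75):
    avg_len = sum(len(d) for d in all_docs) / len(all_docs)
    score = 0.0
    for word in query_words:
        tf = doc_words.count(word)  # how often the word appears in this doc
        n_with_word = sum(word in d for d in all_docs)
        # Rare words are worth more (inverse document frequency)
        idf = math.log((len(all_docs) - n_with_word + 0.5) / (n_with_word + 0.5) + 1)
        # Term frequency saturates: the 50th "apple" adds very little
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_words) / avg_len))
    return score

# Each document is just a list of lowercase words here
docs = [["red", "apple", "pie"], ["green", "pear"], ["apple", "apple", "juice"]]
print(bm25_score(["red", "apple"], docs[0], docs))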
Simple Example
from langchain_community.retrievers import (
BM25Retriever
)
# Your documents
docs = ["Cats love milk",
"Dogs chase balls",
"Cats are fluffy"]
# Create BM25 retriever
retriever = BM25Retriever.from_texts(docs)
# Search!
results = retriever.get_relevant_documents(
"What do cats like?"
)
# The cat sentences come back first (highest scores)
When to Use BM25
| Use BM25 When… | Don’t Use When… |
|---|---|
| Exact words matter | Meaning matters more |
| Technical terms | Synonym-heavy queries |
| Product codes | Conversational questions |
3. Self-Query Retriever
The Smart Filter
Imagine asking: “Find me comedy movies from 2020 with rating above 8”
A normal retriever would search for all those words. But a Self-Query Retriever is smarter - it understands your filters!
graph TD A["Find comedy movies from 2020, rating > 8"] A --> B["Self-Query Analyzes"] B --> C["Query: comedy movies"] B --> D["Filter: year=2020, rating>8"] C --> E["Search by Meaning"] D --> F["Apply Filters"] E --> G["Combined Results"] F --> G
The Magic Inside
from langchain.retrievers import (
SelfQueryRetriever
)
# Define what filters exist
metadata_field_info = [
{
"name": "genre",
"type": "string",
"description": "Movie genre"
},
{
"name": "year",
"type": "integer",
"description": "Release year"
}
]
# Create the smart retriever
retriever = SelfQueryRetriever.from_llm(
llm=llm,
vectorstore=vectorstore,
document_contents="Movie descriptions",
metadata_field_info=metadata_field_info
)
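With that in place, one plain-English question carries both the search and the filters (a rough sketch - the exact structured query depends on your LLM and vector store):
# Ask in plain English
results = retriever.get_relevant_documents(
    "comedy movies from 2020"
)
# Internally the LLM produces something like:
#   query:  "comedy movies"
#   filter: genre == "comedy" AND year == 2020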
Real-World Use
User asks: “Red dresses under $50”
Self-Query splits into:
- Search: “dresses” (meaning-based)
- Filter: color=red, price<50
Super powerful for e-commerce and databases!
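For this to work, the filterable attributes have to live in each document’s metadata when you index it. For example (illustrative values):
from langchain_core.documents import Document

doc = Document(
    page_content="A flowy red summer dress with pockets",
    metadata={"color": "red", "price": 39.99}
)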
4. Parent Document Retriever
The Context Keeper
Here’s a problem: AI works best with small chunks of text. But small chunks lose context!
Parent Document Retriever solves this brilliantly:
- Store small chunks (for accurate searching)
- Return full documents (for complete context)
The Clever Trick
graph TD A["Big Document"] --> B["Split into Chunks"] B --> C["Chunk 1"] B --> D["Chunk 2"] B --> E["Chunk 3"] C --> F["Search finds Chunk 2"] F --> G["Return FULL Document!"]
Example
from langchain.retrievers import (
    ParentDocumentRetriever
)
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Storage for the full documents
docstore = InMemoryStore()
# Small chunks are what actually gets embedded and searched
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter
)
# Add documents (auto-splits internally)
retriever.add_documents(documents)
# Search returns FULL parent docs
results = retriever.get_relevant_documents(
    "specific detail question"
)
Why This Matters
| Without Parent Retriever | With Parent Retriever |
|---|---|
| “…the cat sat on…” | “Once upon a time, in a cozy house, there lived a fluffy cat. The cat sat on the warm windowsill…” |
You get the complete story, not just a snippet!
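One extra trick: if whole documents are too long to hand to the model, ParentDocumentRetriever also accepts a parent_splitter, so the “parents” become medium-sized chunks instead of entire files. A sketch, reusing the setup above:
# Bigger chunks for context (child_splitter stays small for search)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)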
5. Ensemble Retriever
The Team Approach
Why use one search method when you can use MANY?
Ensemble Retriever combines multiple retrievers and merges their results. It’s like asking 3 librarians and combining their recommendations!
graph TD A["Your Question"] --> B["BM25 Retriever"] A --> C["Vector Retriever"] A --> D["Other Retriever"] B --> E["Combine Results"] C --> E D --> E E --> F["Best of All Worlds!"]
How to Build One
from langchain.retrievers import (
EnsembleRetriever
)
# Create two different retrievers
bm25_retriever = BM25Retriever.from_texts(docs)
vector_retriever = vectorstore.as_retriever()
# Combine them!
ensemble = EnsembleRetriever(
retrievers=[bm25_retriever, vector_retriever],
weights=[0.5, 0.5] # Equal importance
)
# Now searches use BOTH methods
results = ensemble.get_relevant_documents(
"machine learning basics"
)
The Power of Weights
# Trust BM25 more (for exact matches)
weights=[0.7, 0.3]
# Trust vectors more (for meaning)
weights=[0.3, 0.7]
You can tune it for your specific use case!
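How are the two result lists actually merged? LangChain’s EnsembleRetriever uses Reciprocal Rank Fusion (RRF): each document earns points based on its rank in each list, scaled by that retriever’s weight. A simplified sketch of the idea:
def rrf_merge(result_lists, weights, k=60):
    # Toy Reciprocal Rank Fusion: a high rank in any list earns points
    scores = {}
    for docs, weight in zip(result_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank + 1)
    # Best combined score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: a BM25 list and a vector list, equal weights
print(rrf_merge([["doc_a", "doc_b"], ["doc_b", "doc_c"]], weights=[0.5, 0.5]))
# "doc_b" wins because both retrievers liked it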
6. Multi-Vector Retriever
Multiple Views, One Document
Some documents are complex. A research paper has:
- A title
- An abstract (summary)
- Full content
- Key findings
Multi-Vector Retriever stores multiple “views” of each document!
graph TD A["Research Paper"] --> B["Generate Summary"] A --> C["Extract Questions"] A --> D["Key Points"] B --> E["Vector Store"] C --> E D --> E E --> F["Search finds any view"] F --> G["Return Original Paper"]
Example Setup
import uuid

from langchain.retrievers import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document

# Store original docs
docstore = InMemoryStore()
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key="doc_id"
)
# For each document, create multiple "views"
for doc in documents:
    doc_id = str(uuid.uuid4())
    # Summary view (assumes llm is a chat model)
    summary = llm.invoke(
        "Summarize this document:\n" + doc.page_content
    ).content
    # Hypothetical-questions view
    questions = llm.invoke(
        "Write 3 questions this document answers:\n" + doc.page_content
    ).content
    # Each view is a small Document whose metadata points
    # back to the parent via the id_key
    views = [
        Document(page_content=summary, metadata={"doc_id": doc_id}),
        Document(page_content=questions, metadata={"doc_id": doc_id}),
    ]
    retriever.vectorstore.add_documents(views)
    # Keep the full original in the docstore under the same id
    retriever.docstore.mset([(doc_id, doc)])
Why Multiple Vectors?
- User asks broad question → Summary matches!
- User asks specific question → Content matches!
- User asks “what does X explain?” → Question matches!
More ways to find the right document!
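Whichever view matches, the retriever follows the doc_id in that view’s metadata back to the docstore and hands you the original:
results = retriever.get_relevant_documents(
    "broad question about the paper"
)
# results contain the full original documents, not the summaries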
7. Contextual Compression
Squeeze Out the Noise
Sometimes retrievers return documents that are mostly irrelevant, with just one useful sentence buried inside.
Contextual Compression extracts ONLY the relevant parts!
graph TD A["Question"] --> B["Retriever"] B --> C["Document 1: 500 words"] B --> D["Document 2: 300 words"] C --> E["Compressor"] D --> E E --> F["Relevant sentence 1"] E --> G["Relevant sentence 2"]
How It Works
from langchain.retrievers import (
ContextualCompressionRetriever
)
from langchain.retrievers.document_compressors import (
LLMChainExtractor
)
# Create a compressor (uses LLM to extract)
compressor = LLMChainExtractor.from_llm(llm)
# Wrap any retriever with compression
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor,
base_retriever=base_retriever
)
# Results are now compressed!
results = compression_retriever.get_relevant_documents(
"What is the capital of France?"
)
Before vs After Compression
| Before (Raw) | After (Compressed) |
|---|---|
| “France is a beautiful country in Europe. It has many tourist attractions. The capital of France is Paris. Paris has the Eiffel Tower…” | “The capital of France is Paris.” |
Only the answer, no fluff!
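Calling an LLM on every retrieved document is accurate but slow and costly. A cheaper alternative is an embeddings-based filter that simply drops chunks whose similarity to the question is too low (a sketch - embeddings is assumed to be your embedding model):
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only chunks similar enough to the question, no LLM call needed
compressor = EmbeddingsFilter(
    embeddings=embeddings,
    similarity_threshold=0.76
)
# Then wrap it in ContextualCompressionRetriever exactly as above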
8. Document Reranking
The Quality Judge
Retrievers return documents, but are they in the best order?
Reranking is like having a judge review the librarian’s picks and say: “Actually, book #3 should be first!”
graph TD A["Question"] --> B["Retriever"] B --> C["Doc 1: Score 0.8"] B --> D["Doc 2: Score 0.7"] B --> E["Doc 3: Score 0.6"] C --> F["Reranker"] D --> F E --> F F --> G["Doc 3: Now #1!"] F --> H["Doc 1: Now #2"] F --> I["Doc 2: Now #3"]
Why Rerank?
Initial retrieval is fast but rough. Reranking is slower but precise.
Think of it like:
- Google Search → shows 100 results quickly
- You, reading them → pick the best 3 carefully
Example with Cohere Reranker
from langchain.retrievers import (
ContextualCompressionRetriever
)
from langchain_cohere import CohereRerank
# Create reranker
reranker = CohereRerank(
model="rerank-english-v2.0"
)
# Wrap retriever with reranking
reranking_retriever = ContextualCompressionRetriever(
base_compressor=reranker,
base_retriever=base_retriever
)
# Get better-ordered results
results = reranking_retriever.get_relevant_documents(
"How does photosynthesis work?"
)
Cross-Encoder Magic
Rerankers often use cross-encoders - they look at query AND document together, not separately. This gives much better relevance scores!
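You don’t need a hosted service to try this. Here’s a small sketch using the sentence-transformers library, which ships open cross-encoder models (the model name is one common example):
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "How does photosynthesis work?"
docs = [
    "Plants turn sunlight into sugar.",
    "The stock market closed higher today."
]
# The cross-encoder reads query + document together and scores each pair
scores = model.predict([(query, doc) for doc in docs])
# Higher score = more relevant, so sort by it
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)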
Putting It All Together 🎯
Here’s how real applications combine these strategies:
graph TD A["User Query"] --> B["Self-Query: Extract Filters"] B --> C["Ensemble: BM25 + Vector"] C --> D["Parent Docs: Get Full Context"] D --> E["Compression: Remove Noise"] E --> F["Rerank: Best Order"] F --> G["Top 3 Perfect Documents!"]
Choosing Your Strategy
| Your Need | Best Strategy |
|---|---|
| Exact keyword matches | BM25 |
| Filter by attributes | Self-Query |
| Full document context | Parent Document |
| Best of multiple methods | Ensemble |
| Complex documents | Multi-Vector |
| Remove irrelevant parts | Compression |
| Perfect ordering | Reranking |
Quick Wins for Your Project 🚀
- Start simple: Use basic vector retriever first
- Add BM25: Ensemble with keywords helps a lot
- Enable compression: Cleaner results, better answers
- Consider reranking: When quality matters most
Remember: The best retrieval strategy depends on YOUR data and YOUR users. Experiment and measure!
Summary
You’ve learned how to be a master librarian for AI! Each retrieval strategy is a tool in your toolkit:
- 🔍 BM25 - Word counting expert
- 🧠 Self-Query - Smart filter extractor
- 📄 Parent Document - Context keeper
- 🤝 Ensemble - Team combiner
- 🎠 Multi-Vector - Multiple perspectives
- ✂️ Compression - Noise remover
- 🏆 Reranking - Quality judge
Now go build amazing search systems! Your AI will thank you for finding the perfect information every time.
