
Vector Search and RAG: Retrieval Strategies

The Library Analogy 📚

Imagine you’re a super-smart librarian in a magical library. People come asking questions, and your job is to find the BEST books to answer them. But here’s the twist: there are millions of books!

RAG (Retrieval-Augmented Generation) is like giving an AI its own magical librarian. Instead of guessing answers, the AI first retrieves the best information, then uses it to give you a perfect answer.

Today, we’ll learn all the different ways our magical librarian can find the right books!


1. Retriever Fundamentals

What Is a Retriever?

A retriever is your search helper. When you ask a question, it runs through all your documents and brings back the most relevant ones.

Think of it like this:

You: “Tell me about dinosaurs!”
Retriever: *runs to the shelves* → comes back with the 5 best dinosaur books

How It Works (Simple Version)

graph TD A["Your Question"] --> B["Retriever"] B --> C["Searches Documents"] C --> D["Returns Top Matches"] D --> E["AI Uses These to Answer"]

The Basic Pattern

# Create a retriever from your documents
retriever = vectorstore.as_retriever()

# Ask it to find relevant docs
docs = retriever.get_relevant_documents(
    "What is photosynthesis?"
)

Key Point: Every retriever follows this pattern - you give it a question, it gives you documents!
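One handy tweak: you can control how many documents come back. A minimal sketch using LangChain's standard `search_kwargs` option (the value 5 is just an example):

# Return the top 5 matches instead of the default
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}
)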


2. BM25Retriever

The Word-Counting Champion

BM25 is like a librarian who counts words really carefully.

Imagine you ask: “Tell me about red apples”

BM25 thinks:

  • 📖 Book A mentions “apple” 50 times → High score!
  • 📖 Book B mentions “apple” 2 times → Low score
  • 📖 Book C mentions “red” AND “apple” → Even higher!
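Under the hood, each matching word contributes a score that grows with how often it appears (with diminishing returns) and shrinks if the word is common across all books. A toy sketch of the classic Okapi BM25 formula for one term (parameter values are the usual defaults; the numbers in the example are made up):

import math

def bm25_term_score(tf, doc_len, avg_doc_len,
                    num_docs, docs_with_term,
                    k1=1.5, b=0.75):
    # Words that are rare across the corpus score higher (IDF)
    idf = math.log(
        (num_docs - docs_with_term + 0.5)
        / (docs_with_term + 0.5) + 1
    )
    # Repeated words help, but with diminishing returns,
    # normalized by document length
    tf_part = (tf * (k1 + 1)) / (
        tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    )
    return idf * tf_part

# "apple" appears 50 times in a 1000-word book
print(bm25_term_score(50, 1000, 800, 10_000, 120))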

Why It’s Special

BM25 is keyword-based. It doesn’t understand meaning - it just counts words smartly.

Good for: Finding exact matches
Bad for: Understanding “automobile” = “car”

Simple Example

from langchain_community.retrievers import (
    BM25Retriever
)

# Your documents
docs = ["Cats love milk",
        "Dogs chase balls",
        "Cats are fluffy"]

# Create BM25 retriever
retriever = BM25Retriever.from_texts(docs)

# Search!
results = retriever.get_relevant_documents(
    "What do cats like?"
)
# Returns: "Cats love milk" (highest score)

When to Use BM25

| Use BM25 When… | Don’t Use When… |
|---|---|
| Exact words matter | Meaning matters more |
| Technical terms | Synonym-heavy queries |
| Product codes | Conversational questions |

3. Self-Query Retriever

The Smart Filter

Imagine asking: “Find me comedy movies from 2020 with rating above 8”

A normal retriever would search for all those words. But a Self-Query Retriever is smarter - it understands your filters!

graph TD A["Find comedy movies from 2020, rating > 8"] A --> B["Self-Query Analyzes"] B --> C["Query: comedy movies"] B --> D["Filter: year=2020, rating>8"] C --> E["Search by Meaning"] D --> F["Apply Filters"] E --> G["Combined Results"] F --> G

The Magic Inside

from langchain.retrievers import (
    SelfQueryRetriever
)
from langchain.chains.query_constructor.base import (
    AttributeInfo
)

# Describe the filterable fields so the LLM knows them
metadata_field_info = [
    AttributeInfo(
        name="genre",
        type="string",
        description="Movie genre"
    ),
    AttributeInfo(
        name="year",
        type="integer",
        description="Release year"
    )
]

# Create the smart retriever
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Movie descriptions",
    metadata_field_info=metadata_field_info
)

Real-World Use

User asks: “Red dresses under $50”

Self-Query splits into:

  • Search: “dresses” (meaning-based)
  • Filter: color=red, price<50

Super powerful for e-commerce and databases!
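Asking the retriever stays the same one-liner as always; the query decomposition happens internally. A sketch (the `color` and `price` fields are assumptions for this hypothetical store, and the comment shows roughly what the LLM produces):

# Internally the LLM turns the question into roughly:
#   query:  "dresses"
#   filter: and(eq("color", "red"), lt("price", 50))
results = retriever.get_relevant_documents(
    "Red dresses under $50"
)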


4. Parent Document Retriever

The Context Keeper

Here’s a problem: AI works best with small chunks of text. But small chunks lose context!

Parent Document Retriever solves this brilliantly:

  1. Store small chunks (for accurate searching)
  2. Return full documents (for complete context)

The Clever Trick

graph TD A["Big Document"] --> B["Split into Chunks"] B --> C["Chunk 1"] B --> D["Chunk 2"] B --> E["Chunk 3"] C --> F["Search finds Chunk 2"] F --> G["Return FULL Document!"]

Example

from langchain.retrievers import (
    ParentDocumentRetriever
)
from langchain.storage import InMemoryStore
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter
)

# Small chunks for precise searching
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400
)

# Storage for full documents
docstore = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter
)

# Add documents (auto-splits internally)
retriever.add_documents(documents)

# Search returns FULL parent docs
results = retriever.get_relevant_documents(
    "specific detail question"
)

Why This Matters

| Without Parent Retriever | With Parent Retriever |
|---|---|
| “…the cat sat on…” | “Once upon a time, in a cozy house, there lived a fluffy cat. The cat sat on the warm windowsill…” |

You get the complete story, not just a snippet!
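If whole documents are too big for your model’s context window, `ParentDocumentRetriever` also accepts a `parent_splitter`, so it returns large chunks instead of entire documents. A sketch (chunk sizes are example values):

from langchain_text_splitters import (
    RecursiveCharacterTextSplitter
)

# Large chunks to return, small chunks to search
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000
)
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400
)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)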


5. Ensemble Retriever

The Team Approach

Why use one search method when you can use MANY?

Ensemble Retriever combines multiple retrievers and merges their results. It’s like asking 3 librarians and combining their recommendations!

graph TD A["Your Question"] --> B["BM25 Retriever"] A --> C["Vector Retriever"] A --> D["Other Retriever"] B --> E["Combine Results"] C --> E D --> E E --> F["Best of All Worlds!"]

How to Build One

from langchain.retrievers import (
    EnsembleRetriever
)

# Create two different retrievers
bm25_retriever = BM25Retriever.from_texts(docs)
vector_retriever = vectorstore.as_retriever()

# Combine them!
ensemble = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5]  # Equal importance
)

# Now searches use BOTH methods
results = ensemble.get_relevant_documents(
    "machine learning basics"
)

The Power of Weights

# Trust BM25 more (for exact matches)
weights=[0.7, 0.3]

# Trust vectors more (for meaning)
weights=[0.3, 0.7]

You can tune it for your specific use case!
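Under the hood, `EnsembleRetriever` merges the ranked lists with weighted Reciprocal Rank Fusion: a document scores higher the closer to the top it sits in each list, scaled by that retriever’s weight. A simplified sketch of the idea (not LangChain’s exact code):

def rrf_merge(ranked_lists, weights, c=60):
    scores = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = (scores.get(doc, 0.0)
                           + weight / (c + rank + 1))
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# "doc_b" wins: both retrievers returned it
print(rrf_merge(
    [["doc_a", "doc_b"], ["doc_b", "doc_c"]],
    weights=[0.5, 0.5]
))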


6. Multi-Vector Retriever

Multiple Views, One Document

Some documents are complex. A research paper has:

  • A title
  • An abstract (summary)
  • Full content
  • Key findings

Multi-Vector Retriever stores multiple “views” of each document!

graph TD A["Research Paper"] --> B["Generate Summary"] A --> C["Extract Questions"] A --> D["Key Points"] B --> E["Vector Store"] C --> E D --> E E --> F["Search finds any view"] F --> G["Return Original Paper"]

Example Setup

import uuid

from langchain.retrievers import (
    MultiVectorRetriever
)
from langchain.storage import InMemoryStore
from langchain_core.documents import Document

# Store original docs
docstore = InMemoryStore()
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key=id_key
)

# For each document, index multiple "views" that all
# carry the parent's ID in their metadata
for doc in documents:
    doc_id = str(uuid.uuid4())

    # Generate alternative views with your LLM
    # (e.g., a summary; questions work the same way)
    summary_text = llm.invoke(
        "Summarize: " + doc.page_content
    ).content
    summary = Document(
        page_content=summary_text,
        metadata={id_key: doc_id}
    )

    # Index the view; store the original separately
    retriever.vectorstore.add_documents([summary])
    retriever.docstore.mset([(doc_id, doc)])

Why Multiple Vectors?

  • User asks broad question → Summary matches!
  • User asks specific question → Content matches!
  • User asks “what does X explain?” → Question matches!

More ways to find the right document!
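A quick sanity check that the mapping works: searching any view hands back the original document (using the `retriever` built above):

# A summary vector may match, but you get the full paper
results = retriever.get_relevant_documents(
    "What were the key findings?"
)
print(results[0].page_content[:100])  # original text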


7. Contextual Compression

Squeeze Out the Noise

Sometimes retrievers return documents that are mostly irrelevant, with just one useful sentence buried inside.

Contextual Compression extracts ONLY the relevant parts!

graph TD A["Question"] --> B["Retriever"] B --> C["Document 1: 500 words"] B --> D["Document 2: 300 words"] C --> E["Compressor"] D --> E E --> F["Relevant sentence 1"] E --> G["Relevant sentence 2"]

How It Works

from langchain.retrievers import (
    ContextualCompressionRetriever
)
from langchain.retrievers.document_compressors import (
    LLMChainExtractor
)

# Create a compressor (uses an LLM to extract)
compressor = LLMChainExtractor.from_llm(llm)

# Wrap any retriever with compression
base_retriever = vectorstore.as_retriever()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)

# Results are now compressed!
results = compression_retriever.get_relevant_documents(
    "What is the capital of France?"
)

Before vs After Compression

| Before (Raw) | After (Compressed) |
|---|---|
| “France is a beautiful country in Europe. It has many tourist attractions. The capital of France is Paris. Paris has the Eiffel Tower…” | “The capital of France is Paris.” |

Only the answer, no fluff!
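Calling an LLM per document can get expensive. LangChain also ships cheaper compressors such as `EmbeddingsFilter`, which drops low-similarity documents without any LLM call. A sketch (assumes you already have an `embeddings` model object; the threshold is an example value):

from langchain.retrievers.document_compressors import (
    EmbeddingsFilter
)

# Keep only docs whose embedding is similar enough
# to the query - no LLM call needed
embeddings_filter = EmbeddingsFilter(
    embeddings=embeddings,
    similarity_threshold=0.76
)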


8. Document Reranking

The Quality Judge

Retrievers return documents, but are they in the best order?

Reranking is like having a judge review the librarian’s picks and say: “Actually, book #3 should be first!”

graph TD A["Question"] --> B["Retriever"] B --> C["Doc 1: Score 0.8"] B --> D["Doc 2: Score 0.7"] B --> E["Doc 3: Score 0.6"] C --> F["Reranker"] D --> F E --> F F --> G["Doc 3: Now &#35;1!"] F --> H["Doc 1: Now &#35;2"] F --> I["Doc 2: Now &#35;3"]

Why Rerank?

Initial retrieval is fast but rough. Reranking is slower but precise.

Think of it like:

  1. Google Search → shows 100 results quickly
  2. You reading them → picking the best 3 carefully

Example with Cohere Reranker

from langchain.retrievers import (
    ContextualCompressionRetriever
)
from langchain_cohere import CohereRerank

# Create reranker
reranker = CohereRerank(
    model="rerank-english-v2.0"
)

# Wrap retriever with reranking
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

# Get better-ordered results
results = reranking_retriever.get_relevant_documents(
    "How does photosynthesis work?"
)

Cross-Encoder Magic

Rerankers often use cross-encoders - they look at query AND document together, not separately. This gives much better relevance scores!
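To see the cross-encoder idea outside of LangChain, here is a sketch using the sentence-transformers library (the model name is a popular public checkpoint; `candidate_texts` is a hypothetical list of retrieved passages):

from sentence_transformers import CrossEncoder

model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)

# Score each (query, document) pair TOGETHER
pairs = [
    ("How does photosynthesis work?", text)
    for text in candidate_texts
]
scores = model.predict(pairs)  # higher = more relevant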


Putting It All Together 🎯

Here’s how real applications combine these strategies:

graph TD A["User Query"] --> B["Self-Query: Extract Filters"] B --> C["Ensemble: BM25 + Vector"] C --> D["Parent Docs: Get Full Context"] D --> E["Compression: Remove Noise"] E --> F["Rerank: Best Order"] F --> G["Top 3 Perfect Documents!"]

Choosing Your Strategy

| Your Need | Best Strategy |
|---|---|
| Exact keyword matches | BM25 |
| Filter by attributes | Self-Query |
| Full document context | Parent Document |
| Best of multiple methods | Ensemble |
| Complex documents | Multi-Vector |
| Remove irrelevant parts | Compression |
| Perfect ordering | Reranking |
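In code, layering strategies mostly means nesting retrievers. A sketch that stacks pieces from earlier sections (all variable names reuse the earlier examples):

# Hybrid search: keywords + meaning
ensemble = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]
)

# Rerank the merged results for the final order
pipeline = ContextualCompressionRetriever(
    base_compressor=reranker,  # CohereRerank from earlier
    base_retriever=ensemble
)

results = pipeline.get_relevant_documents(
    "your question here"
)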

Quick Wins for Your Project 🚀

  1. Start simple: Use basic vector retriever first
  2. Add BM25: Ensemble with keywords helps a lot
  3. Enable compression: Cleaner results, better answers
  4. Consider reranking: When quality matters most

Remember: The best retrieval strategy depends on YOUR data and YOUR users. Experiment and measure!


Summary

You’ve learned how to be a master librarian for AI! Each retrieval strategy is a tool in your toolkit:

  • 🔍 BM25 - Word counting expert
  • 🧠 Self-Query - Smart filter extractor
  • 📄 Parent Document - Context keeper
  • 🤝 Ensemble - Team combiner
  • 🎭 Multi-Vector - Multiple perspectives
  • ✂️ Compression - Noise remover
  • 🏆 Reranking - Quality judge

Now go build amazing search systems! Your AI will thank you for finding the perfect information every time.
