Vector Search and RAG: Retrieval Strategies
The Library Analogy 📚
Imagine you’re a super-smart librarian in a magical library. People come asking questions, and your job is to find the BEST books to answer them. But here’s the twist: there are millions of books!
RAG (Retrieval-Augmented Generation) is like giving an AI its own magical librarian. Instead of guessing answers, the AI first retrieves the best information, then uses it to give you a perfect answer.
Today, we’ll learn all the different ways our magical librarian can find the right books!
1. Retriever Fundamentals
What Is a Retriever?
A retriever is your search helper. When you ask a question, it runs through all your documents and brings back the most relevant ones.
Think of it like this:
You: “Tell me about dinosaurs!”
Retriever: runs to shelves → comes back with 5 best dinosaur books
How It Works (Simple Version)
graph TD A["Your Question"] --> B["Retriever"] B --> C["Searches Documents"] C --> D["Returns Top Matches"] D --> E["AI Uses These to Answer"]
The Basic Pattern
# Create a retriever from your documents
retriever = vectorstore.as_retriever()
# Ask it to find relevant docs
docs = retriever.get_relevant_documents(
"What is photosynthesis?"
)
Key Point: Every retriever follows this pattern - you give it a question, it gives you documents!
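One version note: in newer LangChain releases, retrievers are also Runnables, so depending on your version the same pattern is written with invoke (the older get_relevant_documents call still works but is being phased out):
# Same pattern, newer-style API
docs = retriever.invoke("What is photosynthesis?")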
2. BM25Retriever
The Word-Counting Champion
BM25 is like a librarian who counts words really carefully.
Imagine you ask: “Tell me about red apples”
BM25 thinks:
- 📖 Book A mentions “apple” 50 times → High score!
- 📖 Book B mentions “apple” 2 times → Low score
- 📖 Book C mentions “red” AND “apple” → Even higher!
Why It’s Special
BM25 is keyword-based. It doesn’t understand meaning - it just counts words smartly.
Good for: Finding exact matches
Bad for: Understanding “automobile” = “car”
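Curious what “counts words smartly” actually means? Here’s a toy sketch of the BM25 scoring idea (heavily simplified - in practice the rank_bm25 library that powers LangChain’s BM25Retriever does this for you):
import math

def bm25_score(query_words, doc_words, all_docs, k1=1.5, b=0.75):
    avg_len = sum(len(d) for d in all_docs) / len(all_docs)
    score = 0.0
    for word in query_words:
        tf = doc_words.count(word)  # how often the word appears in this doc
        n_with_word = sum(word in d for d in all_docs)
        # Rare words are worth more (inverse document frequency)
        idf = math.log((len(all_docs) - n_with_word + 0.5) / (n_with_word + 0.5) + 1)
        # Term frequency saturates: the 50th "apple" adds very little
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_words) / avg_len))
    return score

# Each document is just a list of lowercase words here
docs = [["red", "apple", "pie"], ["green", "pear"], ["apple", "apple", "juice"]]
print(bm25_score(["red", "apple"], docs[0], docs))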
Simple Example
from langchain_community.retrievers import (
BM25Retriever
)
# Your documents
docs = ["Cats love milk",
"Dogs chase balls",
"Cats are fluffy"]
# Create BM25 retriever
retriever = BM25Retriever.from_texts(docs)
# Search!
results = retriever.get_relevant_documents(
"What do cats like?"
)
# The cat sentences come back first (highest scores)
When to Use BM25
| Use BM25 When… | Don’t Use When… |
|---|---|
| Exact words matter | Meaning matters more |
| Technical terms | Synonym-heavy queries |
| Product codes | Conversational questions |
3. Self-Query Retriever
The Smart Filter
Imagine asking: “Find me comedy movies from 2020 with rating above 8”
A normal retriever would search for all those words. But a Self-Query Retriever is smarter - it understands your filters!
graph TD A["Find comedy movies from 2020, rating > 8"] A --> B["Self-Query Analyzes"] B --> C["Query: comedy movies"] B --> D["Filter: year=2020, rating>8"] C --> E["Search by Meaning"] D --> F["Apply Filters"] E --> G["Combined Results"] F --> G
The Magic Inside
from langchain.retrievers import (
SelfQueryRetriever
)
# Define what filters exist
metadata_field_info = [
{
"name": "genre",
"type": "string",
"description": "Movie genre"
},
{
"name": "year",
"type": "integer",
"description": "Release year"
}
]
# Create the smart retriever
retriever = SelfQueryRetriever.from_llm(
llm=llm,
vectorstore=vectorstore,
document_contents="Movie descriptions",
metadata_field_info=metadata_field_info
)
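With that in place, one plain-English question carries both the search and the filters (a rough sketch - the exact structured query depends on your LLM and vector store):
# Ask in plain English
results = retriever.get_relevant_documents(
    "comedy movies from 2020"
)
# Internally the LLM produces something like:
#   query:  "comedy movies"
#   filter: genre == "comedy" AND year == 2020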
Real-World Use
User asks: “Red dresses under $50”
Self-Query splits into:
- Search: “dresses” (meaning-based)
- Filter: color=red, price<50
Super powerful for e-commerce and databases!
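For this to work, the filterable attributes have to live in each document’s metadata when you index it. For example (illustrative values):
from langchain_core.documents import Document

doc = Document(
    page_content="A flowy red summer dress with pockets",
    metadata={"color": "red", "price": 39.99}
)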
4. Parent Document Retriever
The Context Keeper
Here’s a problem: AI works best with small chunks of text. But small chunks lose context!
Parent Document Retriever solves this brilliantly:
- Store small chunks (for accurate searching)
- Return full documents (for complete context)
The Clever Trick
graph TD A["Big Document"] --> B["Split into Chunks"] B --> C["Chunk 1"] B --> D["Chunk 2"] B --> E["Chunk 3"] C --> F["Search finds Chunk 2"] F --> G["Return FULL Document!"]
Example
from langchain.retrievers import (
    ParentDocumentRetriever
)
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Storage for the full documents
docstore = InMemoryStore()
# Small chunks are what actually gets embedded and searched
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter
)
# Add documents (auto-splits internally)
retriever.add_documents(documents)
# Search returns FULL parent docs
results = retriever.get_relevant_documents(
    "specific detail question"
)
Why This Matters
| Without Parent Retriever | With Parent Retriever |
|---|---|
| “…the cat sat on…” | “Once upon a time, in a cozy house, there lived a fluffy cat. The cat sat on the warm windowsill…” |
You get the complete story, not just a snippet!
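One extra trick: if whole documents are too long to hand to the model, ParentDocumentRetriever also accepts a parent_splitter, so the “parents” become medium-sized chunks instead of entire files. A sketch, reusing the setup above:
# Bigger chunks for context (child_splitter stays small for search)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)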
5. Ensemble Retriever
The Team Approach
Why use one search method when you can use MANY?
Ensemble Retriever combines multiple retrievers and merges their results. It’s like asking 3 librarians and combining their recommendations!
graph TD A["Your Question"] --> B["BM25 Retriever"] A --> C["Vector Retriever"] A --> D["Other Retriever"] B --> E["Combine Results"] C --> E D --> E E --> F["Best of All Worlds!"]
How to Build One
from langchain.retrievers import (
EnsembleRetriever
)
# Create two different retrievers
bm25_retriever = BM25Retriever.from_texts(docs)
vector_retriever = vectorstore.as_retriever()
# Combine them!
ensemble = EnsembleRetriever(
retrievers=[bm25_retriever, vector_retriever],
weights=[0.5, 0.5] # Equal importance
)
# Now searches use BOTH methods
results = ensemble.get_relevant_documents(
"machine learning basics"
)
The Power of Weights
# Trust BM25 more (for exact matches)
weights=[0.7, 0.3]
# Trust vectors more (for meaning)
weights=[0.3, 0.7]
You can tune it for your specific use case!
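How are the two result lists actually merged? LangChain’s EnsembleRetriever uses Reciprocal Rank Fusion (RRF): each document earns points based on its rank in each list, scaled by that retriever’s weight. A simplified sketch of the idea:
def rrf_merge(result_lists, weights, k=60):
    # Toy Reciprocal Rank Fusion: a high rank in any list earns points
    scores = {}
    for docs, weight in zip(result_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank + 1)
    # Best combined score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: a BM25 list and a vector list, equal weights
print(rrf_merge([["doc_a", "doc_b"], ["doc_b", "doc_c"]], weights=[0.5, 0.5]))
# "doc_b" wins because both retrievers liked it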
6. Multi-Vector Retriever
Multiple Views, One Document
Some documents are complex. A research paper has:
- A title
- An abstract (summary)
- Full content
- Key findings
Multi-Vector Retriever stores multiple “views” of each document!
graph TD A["Research Paper"] --> B["Generate Summary"] A --> C["Extract Questions"] A --> D["Key Points"] B --> E["Vector Store"] C --> E D --> E E --> F["Search finds any view"] F --> G["Return Original Paper"]
Example Setup
import uuid

from langchain.retrievers import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document

# Store original docs
docstore = InMemoryStore()
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key="doc_id"
)
# For each document, create multiple "views"
for doc in documents:
    doc_id = str(uuid.uuid4())
    # Summary view (assumes llm is a chat model)
    summary = llm.invoke(
        "Summarize this document:\n" + doc.page_content
    ).content
    # Hypothetical-questions view
    questions = llm.invoke(
        "Write 3 questions this document answers:\n" + doc.page_content
    ).content
    # Each view is a small Document whose metadata points
    # back to the parent via the id_key
    views = [
        Document(page_content=summary, metadata={"doc_id": doc_id}),
        Document(page_content=questions, metadata={"doc_id": doc_id}),
    ]
    retriever.vectorstore.add_documents(views)
    # Keep the full original in the docstore under the same id
    retriever.docstore.mset([(doc_id, doc)])
Why Multiple Vectors?
- User asks broad question → Summary matches!
- User asks specific question → Content matches!
- User asks “what does X explain?” → Question matches!
More ways to find the right document!
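Whichever view matches, the retriever follows the doc_id in that view’s metadata back to the docstore and hands you the original:
results = retriever.get_relevant_documents(
    "broad question about the paper"
)
# results contain the full original documents, not the summaries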
7. Contextual Compression
Squeeze Out the Noise
Sometimes retrievers return documents that are mostly irrelevant, with just one useful sentence buried inside.
Contextual Compression extracts ONLY the relevant parts!
graph TD A["Question"] --> B["Retriever"] B --> C["Document 1: 500 words"] B --> D["Document 2: 300 words"] C --> E["Compressor"] D --> E E --> F["Relevant sentence 1"] E --> G["Relevant sentence 2"]
How It Works
from langchain.retrievers import (
ContextualCompressionRetriever
)
from langchain.retrievers.document_compressors import (
LLMChainExtractor
)
# Create a compressor (uses LLM to extract)
compressor = LLMChainExtractor.from_llm(llm)
# Wrap any retriever with compression
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor,
base_retriever=base_retriever
)
# Results are now compressed!
results = compression_retriever.get_relevant_documents(
"What is the capital of France?"
)
Before vs After Compression
| Before (Raw) | After (Compressed) |
|---|---|
| “France is a beautiful country in Europe. It has many tourist attractions. The capital of France is Paris. Paris has the Eiffel Tower…” | “The capital of France is Paris.” |
Only the answer, no fluff!
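Calling an LLM on every retrieved document is accurate but slow and costly. A cheaper alternative is an embeddings-based filter that simply drops chunks whose similarity to the question is too low (a sketch - embeddings is assumed to be your embedding model):
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only chunks similar enough to the question, no LLM call needed
compressor = EmbeddingsFilter(
    embeddings=embeddings,
    similarity_threshold=0.76
)
# Then wrap it in ContextualCompressionRetriever exactly as above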
8. Document Reranking
The Quality Judge
Retrievers return documents, but are they in the best order?
Reranking is like having a judge review the librarian’s picks and say: “Actually, book #3 should be first!”
graph TD A["Question"] --> B["Retriever"] B --> C["Doc 1: Score 0.8"] B --> D["Doc 2: Score 0.7"] B --> E["Doc 3: Score 0.6"] C --> F["Reranker"] D --> F E --> F F --> G["Doc 3: Now #1!"] F --> H["Doc 1: Now #2"] F --> I["Doc 2: Now #3"]
Why Rerank?
Initial retrieval is fast but rough. Reranking is slower but precise.
Think of it like:
- Google Search → shows 100 results quickly
- You, reading them → pick the best 3 carefully
Example with Cohere Reranker
from langchain.retrievers import (
ContextualCompressionRetriever
)
from langchain_cohere import CohereRerank
# Create reranker
reranker = CohereRerank(
model="rerank-english-v2.0"
)
# Wrap retriever with reranking
reranking_retriever = ContextualCompressionRetriever(
base_compressor=reranker,
base_retriever=base_retriever
)
# Get better-ordered results
results = reranking_retriever.get_relevant_documents(
"How does photosynthesis work?"
)
Cross-Encoder Magic
Rerankers often use cross-encoders - they look at query AND document together, not separately. This gives much better relevance scores!
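You don’t need a hosted service to try this. Here’s a small sketch using the sentence-transformers library, which ships open cross-encoder models (the model name is one common example):
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "How does photosynthesis work?"
docs = [
    "Plants turn sunlight into sugar.",
    "The stock market closed higher today."
]
# The cross-encoder reads query + document together and scores each pair
scores = model.predict([(query, doc) for doc in docs])
# Higher score = more relevant, so sort by it
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)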
Putting It All Together 🎯
Here’s how real applications combine these strategies:
graph TD A["User Query"] --> B["Self-Query: Extract Filters"] B --> C["Ensemble: BM25 + Vector"] C --> D["Parent Docs: Get Full Context"] D --> E["Compression: Remove Noise"] E --> F["Rerank: Best Order"] F --> G["Top 3 Perfect Documents!"]
Choosing Your Strategy
| Your Need | Best Strategy |
|---|---|
| Exact keyword matches | BM25 |
| Filter by attributes | Self-Query |
| Full document context | Parent Document |
| Best of multiple methods | Ensemble |
| Complex documents | Multi-Vector |
| Remove irrelevant parts | Compression |
| Perfect ordering | Reranking |
Quick Wins for Your Project 🚀
- Start simple: Use basic vector retriever first
- Add BM25: Ensemble with keywords helps a lot
- Enable compression: Cleaner results, better answers
- Consider reranking: When quality matters most
Remember: The best retrieval strategy depends on YOUR data and YOUR users. Experiment and measure!
Summary
You’ve learned how to be a master librarian for AI! Each retrieval strategy is a tool in your toolkit:
- 🔍 BM25 - Word counting expert
- 🧠 Self-Query - Smart filter extractor
- 📄 Parent Document - Context keeper
- 🤝 Ensemble - Team combiner
- 🎠 Multi-Vector - Multiple perspectives
- ✂️ Compression - Noise remover
- 🏆 Reranking - Quality judge
Now go build amazing search systems! Your AI will thank you for finding the perfect information every time.
