🔍 Vector Search and RAG: Finding Your Perfect Match
The Library Analogy
Imagine you have a magical library with millions of books. But this library is special. Instead of organizing books by title or author, it organizes them by what they mean.
When you ask a question, the librarian doesn't just look for your exact words. She finds books that match what you mean, even if they use completely different words!
This is Vector Search. And when we combine it with AI to generate answers? That’s RAG (Retrieval-Augmented Generation).
🎯 What is a Vector Store?
Think of it like a memory palace for your AI.
Every piece of text (document, sentence, paragraph) gets converted into a list of numbers called a vector. These numbers capture the meaning of the text.
# A sentence becomes numbers!
"I love pizza" → [0.2, 0.8, 0.1, ...]
"Pizza is delicious" → [0.3, 0.7, 0.2, ...]
# Similar meanings = similar numbers!
Why does this matter?
- Similar meanings = vectors that are close together
- Different meanings = vectors that are far apart
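What does "close" mean for lists of numbers? The usual measure is cosine similarity, which compares the direction of two vectors. Here's a minimal sketch in plain Python, using made-up toy vectors rather than real embeddings:

import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

pizza_love  = [0.2, 0.8, 0.1]  # "I love pizza"
pizza_tasty = [0.3, 0.7, 0.2]  # "Pizza is delicious"
weather     = [0.9, 0.1, 0.8]  # "It might rain tomorrow"

print(cosine_similarity(pizza_love, pizza_tasty))  # ~0.98: very similar
print(cosine_similarity(pizza_love, weather))      # ~0.34: not so much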
🔎 Similarity Search
The Story
You’re in that magical library. You whisper: “Tell me about healthy breakfast ideas.”
The librarian runs through the library and brings back:
- “Oatmeal recipes for morning energy”
- “Nutritious ways to start your day”
- “Breakfast smoothie guide”
She didn’t search for your exact words. She found books with similar meaning.
How It Works
# Your question becomes a vector
query = "healthy breakfast ideas"
# Find the 3 most similar documents
results = vectorstore.similarity_search(
    query,
    k=3  # Return top 3 matches
)
The Magic Behind It
graph TD
    A["Your Question"] --> B["Convert to Vector"]
    B --> C["Compare with All Stored Vectors"]
    C --> D["Find Closest Matches"]
    D --> E["Return Top Results"]
Key Point: The closer two vectors are, the more similar their meaning!
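Stripped to its simplest form, that pipeline is a brute-force scan: embed the question, score it against every stored vector, and keep the best k. A toy sketch reusing the cosine_similarity helper from earlier (real vector stores swap the linear scan for fast approximate indexes):

# store: list of (document_text, vector) pairs
# embed_text: stand-in for a real embedding model (hypothetical)
def top_k_matches(query, store, embed_text, k=3):
    query_vec = embed_text(query)                        # 1. convert to vector
    scored = [
        (cosine_similarity(query_vec, vec), doc)         # 2. compare with all stored vectors
        for doc, vec in store
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # 3. closest matches first
    return [doc for _, doc in scored[:k]]                # 4. return top results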
🎪 Maximum Marginal Relevance (MMR)
The Problem
Imagine asking for “pizza recipes” and getting:
- Pizza dough recipe
- Pizza dough recipe (slightly different)
- Pizza dough recipe (from another book)
All relevant, but boring and repetitive!
The Solution: MMR
MMR is like a smart librarian who says:
“I’ll give you relevant results, but I’ll also make sure they’re different from each other!”
# Get diverse results, not just similar ones
results = vectorstore.max_marginal_relevance_search(
    query="pizza recipes",
    k=3,             # Return 3 results
    fetch_k=10,      # Consider the top 10 first
    lambda_mult=0.5  # Balance: relevance vs. diversity
)
What You Get Instead
- Pizza dough recipe
- Pizza sauce techniques
- Pizza topping combinations
Same topic, different angles!
Understanding lambda_mult
| Value | What It Means |
|---|---|
| 1.0 | 100% relevance (might be repetitive) |
| 0.5 | Balanced mix |
| 0.0 | 100% diversity (might miss relevant stuff) |
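Under the hood, MMR builds its answer one pick at a time: each round, every remaining candidate is scored by its relevance to the query minus its similarity to the documents already picked, with lambda_mult setting the balance. A toy sketch of that loop, again reusing cosine_similarity from earlier:

def mmr_select(query_vec, candidates, k=3, lambda_mult=0.5):
    """candidates: list of (doc, vector) pairs.
    Picks k docs, trading relevance against redundancy."""
    picked, remaining = [], list(candidates)
    while remaining and len(picked) < k:
        def score(item):
            _, vec = item
            relevance = cosine_similarity(query_vec, vec)
            # How close is this candidate to anything already picked?
            redundancy = max(
                (cosine_similarity(vec, pvec) for _, pvec in picked),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        picked.append(best)
        remaining.remove(best)
    return [doc for doc, _ in picked]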
🏷️ Metadata Filtering
The Story
Back to our magical library. You say:
“I want pizza recipes, but ONLY from Italian cookbooks published after 2020.”
The librarian now has two jobs:
- Find books about pizza (similarity search)
- Filter by your rules (metadata filtering)
What is Metadata?
Extra information attached to each document:
# A document with metadata
{
    "content": "How to make perfect pizza dough",
    "metadata": {
        "author": "Mario Rossi",
        "year": 2022,
        "cuisine": "Italian",
        "difficulty": "easy"
    }
}
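In LangChain, that shape maps directly onto the Document class: the text goes in page_content and the extras in metadata:

from langchain_core.documents import Document

doc = Document(
    page_content="How to make perfect pizza dough",
    metadata={
        "author": "Mario Rossi",
        "year": 2022,
        "cuisine": "Italian",
        "difficulty": "easy",
    },
)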
Using Filters
# Search with filters
results = vectorstore.similarity_search(
    query="pizza recipes",
    k=3,
    filter={
        "cuisine": "Italian",
        "year": {"$gte": 2020}  # >= 2020
    }
)
Common Filter Operations
| Symbol | Meaning | Example |
|---|---|---|
| `$eq` | Equals | `{"year": {"$eq": 2022}}` |
| `$ne` | Not equals | `{"type": {"$ne": "draft"}}` |
| `$gt` | Greater than | `{"rating": {"$gt": 4}}` |
| `$gte` | Greater or equal | `{"year": {"$gte": 2020}}` |
| `$lt` | Less than | `{"price": {"$lt": 50}}` |
| `$in` | In list | `{"tag": {"$in": ["easy", "quick"]}}` |
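One caveat: filter syntax is store-specific, and some stores (Chroma, for example) want an explicit $and when you combine several conditions. A Chroma-style sketch:

# Combining conditions, Chroma-style; other stores use different syntax
results = vectorstore.similarity_search(
    query="pizza recipes",
    k=3,
    filter={
        "$and": [
            {"cuisine": {"$eq": "Italian"}},
            {"year": {"$gte": 2020}},
        ]
    },
)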
🔀 Hybrid Search
The Best of Both Worlds
Remember our two search methods?
- Keyword search (traditional): Finds exact words
- Vector search (semantic): Finds similar meanings
Hybrid search combines both!
Why Do We Need Both?
| Scenario | Best Method |
|---|---|
| Looking for “Python 3.11” | Keyword (exact match) |
| “How do I loop in Python?” | Vector (meaning) |
| “Python error TypeError” | Hybrid (both!) |
How It Works
graph TD
    A["Your Query"] --> B["Keyword Search"]
    A --> C["Vector Search"]
    B --> D["Merge Results"]
    C --> D
    D --> E["Rank & Return Best"]
Code Example
# Not every store exposes this method; this sketch assumes a
# Weaviate-style hybrid_search API with an alpha weight
results = vectorstore.hybrid_search(
    query="Python TypeError fix",
    k=5,
    alpha=0.5  # 0 = keyword only, 1 = vector only
)
When to Use What?
| Alpha Value | Use When |
|---|---|
| 0.0 - 0.3 | Searching for exact terms, codes, names |
| 0.4 - 0.6 | General questions (balanced) |
| 0.7 - 1.0 | Conceptual, meaning-based queries |
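If your vector store has no built-in hybrid method, you can assemble the same behavior in LangChain by pairing a keyword retriever (BM25) with your vector retriever. A sketch using EnsembleRetriever, where the weights list plays roughly the role of alpha (BM25Retriever needs the rank_bm25 package):

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword side: classic BM25 scoring over the raw documents
keyword_retriever = BM25Retriever.from_documents(my_docs)
keyword_retriever.k = 5

# Semantic side: the vector store as a retriever
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Merge both ranked lists, weighting keyword vs. vector results
hybrid_retriever = EnsembleRetriever(
    retrievers=[keyword_retriever, vector_retriever],
    weights=[0.5, 0.5],  # shift toward keyword or vector, like alpha
)
docs = hybrid_retriever.invoke("Python TypeError fix")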
🔄 Vector Store as Retriever
The Big Picture
In LangChain, a Retriever is like a helper that fetches documents for your AI. And guess what? Your vector store can become one!
Why Make It a Retriever?
Think of it like this:
- Vector Store = The library
- Retriever = The librarian who knows how to use the library
When you turn a vector store into a retriever, you can:
- Plug it into LangChain chains
- Use it in RAG pipelines
- Combine it with other tools
Simple Conversion
# Turn your vector store into a retriever
retriever = vectorstore.as_retriever()
# Now use it!
docs = retriever.invoke("What is pizza?")
Customizing Your Retriever
# With search parameters
retriever = vectorstore.as_retriever(
    search_type="similarity",  # or "mmr"
    search_kwargs={
        "k": 5,
        "filter": {"category": "recipes"}
    }
)
Different Search Types
| Search Type | What It Does |
|---|---|
| `similarity` | Standard vector search |
| `mmr` | Diverse results (MMR) |
| `similarity_score_threshold` | Only returns results scoring above a threshold |
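The threshold variant is handy when returning nothing beats returning something irrelevant:

# Only return documents whose similarity score clears 0.8
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8, "k": 5},
)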
Using in a RAG Chain
from langchain.chains import RetrievalQA

# Create a Q&A chain with your retriever
qa_chain = RetrievalQA.from_chain_type(
    llm=my_llm,
    retriever=retriever,
    chain_type="stuff"  # "stuff" packs all retrieved docs into one prompt
)

# Ask questions!
answer = qa_chain.invoke(
    "What's the best pizza dough recipe?"
)
graph TD
    A["User Question"] --> B["Retriever"]
    B --> C["Fetch Relevant Docs"]
    C --> D["Pass to LLM"]
    D --> E["Generate Answer"]
    E --> F["Return to User"]
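RetrievalQA is the classic helper; newer LangChain releases favor composing the same pipeline yourself with LCEL. A minimal sketch of the equivalent chain (my_llm is a chat model, as before):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Squash retrieved documents into one context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | my_llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What's the best pizza dough recipe?")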
🎯 Putting It All Together
Here’s how all these pieces work in a real RAG system:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# 1. Create vector store
vectorstore = Chroma.from_documents(
    documents=my_docs,
    embedding=OpenAIEmbeddings()
)

# 2. Convert to retriever with MMR
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,
        "fetch_k": 10,
        "filter": {"verified": True}
    }
)

# 3. Use in RAG pipeline
relevant_docs = retriever.invoke(
    "How do I make pizza dough?"
)
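Each result is a Document, so you can check both the text and the metadata you filtered on:

# 4. Inspect what came back
for doc in relevant_docs:
    print(doc.metadata, "->", doc.page_content[:60])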
📝 Quick Summary
| Concept | What It Does | When to Use |
|---|---|---|
| Similarity Search | Finds similar meanings | Default choice |
| MMR | Diverse + relevant results | Avoid repetition |
| Metadata Filtering | Filter by attributes | Narrow down results |
| Hybrid Search | Keywords + meaning | Best of both worlds |
| Retriever | LangChain integration | Building RAG pipelines |
🚀 You’ve Got This!
Vector search isn’t magic — it’s just smart math that understands meaning. Now you know:
- ✅ How to find similar documents
- ✅ How to get diverse results with MMR
- ✅ How to filter by metadata
- ✅ How to combine keyword and semantic search
- ✅ How to use vector stores in LangChain
Go build something amazing! 🎉
