🧭 Vector Search and RAG: The Magic of Embeddings
The Story of the Librarian Who Understood Meaning
Imagine you’re a librarian in a magical library. But this isn’t an ordinary library — books here don’t have titles or authors on the cover. Instead, each book is stored as a special “location code” that tells you what the book is about, not just what it’s called.
When someone asks for “a story about brave heroes fighting dragons,” you don’t search for those exact words. Instead, you find books whose location codes are close together in your magical map — because similar stories live near each other!
That’s exactly what embeddings do for AI! 🎯
🎪 What Are Embeddings? (Overview)
The Simple Idea
An embedding is like a secret address for words, sentences, or documents.
Instead of storing text as letters (“Hello”), we convert it into a list of numbers:
"Hello" → [0.23, -0.45, 0.89, 0.12, ...]
These numbers capture the meaning — not just the spelling.
Why Numbers?
Think of it like GPS coordinates! 🗺️
| Your Address | GPS Coordinates |
|---|---|
| “My House” | (40.7128, -74.0060) |
| “Neighbor’s House” | (40.7130, -74.0058) |
Close coordinates = close locations!
With embeddings:
- “Happy” and “Joyful” → nearby numbers
- “Happy” and “Refrigerator” → far apart numbers
Real Example
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Turn text into numbers!
result = embeddings.embed_query("I love pizza")
print(len(result)) # 1536 numbers!
The computer now “understands” that “I love pizza” is similar to “Pizza is my favorite food” — because their number lists look alike!
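A minimal sketch to check that for yourself (it assumes an OpenAI API key is configured; the comparison uses cosine similarity, which we cover properly later in this section):
from langchain_openai import OpenAIEmbeddings
from numpy import dot
from numpy.linalg import norm

embeddings = OpenAIEmbeddings()

# Embed both sentences into number lists
v1 = embeddings.embed_query("I love pizza")
v2 = embeddings.embed_query("Pizza is my favorite food")

# Cosine similarity: closer to 1.0 = closer in meaning
print(round(dot(v1, v2) / (norm(v1) * norm(v2)), 2))
# Expect a high score (roughly 0.8+), since both sentences are about loving pizza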
⚙️ Embedding Model Configuration
Picking Your Translator
Different embedding models are like different translators. Some are fast, some are accurate, some are cheap!
graph TD A["Choose Model"] --> B{What matters most?} B -->|Speed| C["text-embedding-3-small"] B -->|Accuracy| D["text-embedding-3-large"] B -->|Free/Local| E["HuggingFace Models"] B -->|Privacy| F["Ollama Local"]
OpenAI Configuration
from langchain_openai import OpenAIEmbeddings
# Basic setup
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
# With more options
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024  # Fewer dimensions: smaller vectors, faster search, slight quality trade-off
)
HuggingFace (Free!)
from langchain_huggingface import (
HuggingFaceEmbeddings
)
embeddings = HuggingFaceEmbeddings(
model_name="all-MiniLM-L6-v2"
)
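The diagram above also lists Ollama for fully private, local embeddings. A rough sketch, assuming you have Ollama running locally, an embedding model such as nomic-embed-text already pulled, and the langchain-ollama package installed:
from langchain_ollama import OllamaEmbeddings

# Runs entirely on your machine: no API key, no data leaves your computer
embeddings = OllamaEmbeddings(
    model="nomic-embed-text"  # any embedding model you have pulled with `ollama pull`
)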
Configuration Tips
| Setting | What It Does |
|---|---|
| `model` | Which brain (embedding model) to use |
| `dimensions` | Size of the number list (vector length) |
| `chunk_size` | How many texts to embed per API call |
Remember: Bigger isn’t always better! A smaller, faster model often works great.
🛠️ Creating Embeddings
Two Types of Creation
Think of embeddings like making IDs:
- Query Embeddings — for questions you ask
- Document Embeddings — for information you store
graph TD A["Your Text"] --> B{What is it?} B -->|A Question| C["embed_query"] B -->|Info to Store| D["embed_documents"] C --> E["One Vector"] D --> F["List of Vectors"]
Single Query
When someone asks a question:
# User asks: "What's the weather?"
question = "What's the weather?"
# Turn it into numbers
query_vector = embeddings.embed_query(
question
)
print(type(query_vector)) # list
print(len(query_vector)) # 1536
Multiple Documents
When storing information:
# Your knowledge base
docs = [
"The sun is a star",
"Water boils at 100°C",
"Python is a programming language"
]
# Turn ALL into numbers
doc_vectors = embeddings.embed_documents(
docs
)
print(len(doc_vectors)) # 3 vectors
print(len(doc_vectors[0])) # 1536 each
Quick Example: Finding Similar Text
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Store some facts
facts = [
"Dogs are loyal pets",
"Cats are independent",
"Fish live in water"
]
fact_vectors = embeddings.embed_documents(facts)
# Ask a question
question = "Which pet loves its owner?"
q_vector = embeddings.embed_query(question)
# Now compare! (we'll learn how soon)
💾 CacheBackedEmbeddings
The Problem
Creating embeddings costs:
- ⏱️ Time — every API call adds network latency
- 💰 Money — Each call = more charges
- 🌐 Bandwidth — Network requests
What if you embed the same text twice? Wasteful!
The Solution: Caching!
Imagine a notebook 📓 where you write down every translation you’ve ever done. Next time someone asks for the same translation — just look it up!
graph TD A["Text to Embed"] --> B{In Cache?} B -->|Yes!| C["Return Saved Result"] B -->|No| D["Call API"] D --> E["Save to Cache"] E --> C
How to Use It
from langchain.embeddings import (
CacheBackedEmbeddings
)
from langchain.storage import (
LocalFileStore
)
from langchain_openai import OpenAIEmbeddings
# 1. Create your base embeddings
base = OpenAIEmbeddings()
# 2. Create a place to store cache
store = LocalFileStore("./cache/")
# 3. Wrap with caching!
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
base,
store,
namespace="openai"
)
Why Namespace Matters
# Different models = different caches
cache_openai = CacheBackedEmbeddings.from_bytes_store(
OpenAIEmbeddings(),
store,
namespace="openai" # Separate!
)
cache_huggingface = CacheBackedEmbeddings.from_bytes_store(
HuggingFaceEmbeddings(),
store,
namespace="huggingface" # Separate!
)
Cache Storage Options
| Storage | Best For |
|---|---|
| `LocalFileStore` | Simple projects |
| `RedisStore` | Production apps |
| `InMemoryStore` | Testing only |
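Swapping backends is a one-line change. For example, a small sketch using an in-memory store (the InMemoryByteStore variant, which stores raw bytes as from_bytes_store expects; assumed importable from langchain.storage like LocalFileStore). Handy for tests, but the cache is lost when the program exits:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore
from langchain_openai import OpenAIEmbeddings

# Cache lives only in RAM: perfect for quick tests, gone on restart
test_embeddings = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(),
    InMemoryByteStore(),
    namespace="test"
)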
Real Savings
import time
# First call - hits API
start = time.time()
cached_embeddings.embed_documents(texts)
print(f"First: {time.time()-start:.2f}s")
# Second call - from cache!
start = time.time()
cached_embeddings.embed_documents(texts)
print(f"Cached: {time.time()-start:.4f}s")
# Output:
# First: 0.45s
# Cached: 0.0012s ← 375x faster!
📏 Embedding Similarity Metrics
How Do We Compare Embeddings?
Remember our magical library? We need to know: How close are two books?
There are different ways to measure “closeness”:
1. Cosine Similarity (Most Popular!)
Think of two arrows pointing from the center of a circle:
- Same direction = 1.0 (identical!)
- Opposite direction = -1.0 (opposites)
- Perpendicular = 0.0 (unrelated)
graph LR
    A((Center)) --> B["Happy 😊"]
    A --> C["Joyful 🎉"]
    A --> D["Sad 😢"]
    B -.->|0.95| C
    B -.->|-0.3| D
from numpy import dot
from numpy.linalg import norm
def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Compare two embeddings (reuses the `embeddings` object from earlier)
score = cosine_similarity(
    embeddings.embed_query("happy"),
    embeddings.embed_query("joyful")
)
print(score)  # ~0.92, very similar!
2. Euclidean Distance
Like measuring with a ruler on a map:
- Smaller = more similar
- Bigger = less similar
import numpy as np
from numpy.linalg import norm

def euclidean_distance(a, b):
    # Convert to arrays so subtraction also works on plain Python lists
    return norm(np.array(a) - np.array(b))

dist = euclidean_distance(vec1, vec2)
# 0.0 = identical
# Higher = more different
3. Dot Product
Simple multiplication and sum:
- Higher = more similar
- Works best with normalized vectors
from numpy import dot
similarity = dot(vec1, vec2)
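Why do normalized vectors matter? If every vector is scaled to length 1, the dot product equals cosine similarity exactly, so you get the same ranking with less arithmetic. A small sketch of the idea, reusing the embeddings object from earlier (OpenAI embeddings already come back normalized to length 1, which is why a plain dot product often works for them):
import numpy as np

vec1 = np.array(embeddings.embed_query("happy"))
vec2 = np.array(embeddings.embed_query("joyful"))

# Scale each vector to length 1
unit1 = vec1 / np.linalg.norm(vec1)
unit2 = vec2 / np.linalg.norm(vec2)

# For unit-length vectors, dot product == cosine similarity
print(np.dot(unit1, unit2))
print(np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))  # same number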
Which Should You Use?
| Metric | When to Use |
|---|---|
| Cosine | Most cases! Default choice |
| Euclidean | When magnitude matters |
| Dot Product | Normalized vectors, speed |
Quick Comparison Tool
from langchain_openai import OpenAIEmbeddings
from numpy import dot
from numpy.linalg import norm
embeddings = OpenAIEmbeddings()
def compare(text1, text2):
    v1 = embeddings.embed_query(text1)
    v2 = embeddings.embed_query(text2)
    cos = dot(v1, v2) / (norm(v1) * norm(v2))
    return round(cos, 3)
# Try it!
print(compare("I love cats", "Cats are great"))
# ~0.89
print(compare("I love cats", "The stock market"))
# ~0.23
🎯 Putting It All Together
Here’s a complete mini-example combining everything:
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import (
CacheBackedEmbeddings
)
from langchain.storage import LocalFileStore
from numpy import dot
from numpy.linalg import norm
# 1. Configure model
base_embeddings = OpenAIEmbeddings(
model="text-embedding-3-small"
)
# 2. Add caching
store = LocalFileStore("./my_cache/")
embeddings = CacheBackedEmbeddings.from_bytes_store(
base_embeddings,
store,
namespace="demo"
)
# 3. Create embeddings
docs = [
"Python is great for AI",
"JavaScript runs in browsers",
"Machine learning needs data"
]
doc_vectors = embeddings.embed_documents(docs)
# 4. Search with similarity
query = "Which language for AI?"
q_vector = embeddings.embed_query(query)
# 5. Find best match
for i, doc_vec in enumerate(doc_vectors):
    score = dot(q_vector, doc_vec) / (
        norm(q_vector) * norm(doc_vec)
    )
    print(f"{docs[i]}: {score:.3f}")
# Output:
# Python is great for AI: 0.891 ← Winner!
# JavaScript runs in browsers: 0.654
# Machine learning needs data: 0.823
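And if you just want the single best match instead of eyeballing the printed scores, a tiny follow-up sketch (reusing docs, doc_vectors, and q_vector from above):
import numpy as np

# Score every document against the query, then pick the highest
scores = [
    np.dot(q_vector, d) / (np.linalg.norm(q_vector) * np.linalg.norm(d))
    for d in doc_vectors
]
best = int(np.argmax(scores))
print(f"Best match: {docs[best]} ({scores[best]:.3f})")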
🚀 Key Takeaways
| Concept | Remember This |
|---|---|
| Embeddings | Turn text into meaning-numbers |
| Configuration | Pick model by speed/cost/accuracy |
| Creating | embed_query for questions, embed_documents for data |
| Caching | Save time and money — don’t repeat! |
| Similarity | Cosine similarity = your best friend |
You did it! 🎉 You now understand how AI “reads” text by converting it into numbers that capture meaning. This is the foundation of every modern search engine, chatbot, and recommendation system!
