Embeddings


🧭 Vector Search and RAG: The Magic of Embeddings

The Story of the Librarian Who Understood Meaning

Imagine you’re a librarian in a magical library. But this isn’t an ordinary library — books here don’t have titles or authors on the cover. Instead, each book is stored as a special “location code” that tells you what the book is about, not just what it’s called.

When someone asks for “a story about brave heroes fighting dragons,” you don’t search for those exact words. Instead, you find books whose location codes are close together in your magical map — because similar stories live near each other!

That’s exactly what embeddings do for AI! 🎯


🎪 What Are Embeddings? (Overview)

The Simple Idea

An embedding is like a secret address for words, sentences, or documents.

Instead of storing text as letters (“Hello”), we convert it into a list of numbers:

"Hello" → [0.23, -0.45, 0.89, 0.12, ...]

These numbers capture the meaning — not just the spelling.

Why Numbers?

Think of it like GPS coordinates! 🗺️

| Your Address | GPS Coordinates |
| --- | --- |
| “My House” | (40.7128, -74.0060) |
| “Neighbor’s House” | (40.7130, -74.0058) |

Close coordinates = close locations!

With embeddings:

  • “Happy” and “Joyful” → nearby numbers
  • “Happy” and “Refrigerator” → far apart numbers

Real Example

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Turn text into numbers!
result = embeddings.embed_query("I love pizza")
print(len(result))  # 1536 numbers!

The computer now “understands” that “I love pizza” is similar to “Pizza is my favorite food” — because their number lists look alike!
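
Want to see that “understanding” with real numbers? Here is a minimal sketch (cosine similarity is explained properly later in this lesson; it appears here only as a teaser):

from langchain_openai import OpenAIEmbeddings
from numpy import dot
from numpy.linalg import norm

embeddings = OpenAIEmbeddings()

# Two sentences that mean roughly the same thing
v1 = embeddings.embed_query("I love pizza")
v2 = embeddings.embed_query("Pizza is my favorite food")

# Cosine similarity: closer to 1.0 = more alike in meaning
print(dot(v1, v2) / (norm(v1) * norm(v2)))  # a high score, e.g. ~0.8+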


⚙️ Embedding Model Configuration

Picking Your Translator

Different embedding models are like different translators. Some are fast, some are accurate, some are cheap!

graph TD A["Choose Model"] --> B{What matters most?} B -->|Speed| C["text-embedding-3-small"] B -->|Accuracy| D["text-embedding-3-large"] B -->|Free/Local| E["HuggingFace Models"] B -->|Privacy| F["Ollama Local"]

OpenAI Configuration

from langchain_openai import OpenAIEmbeddings

# Basic setup
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

# With more options
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024  # Fewer dimensions = smaller, faster vectors
)

HuggingFace (Free!)

from langchain_huggingface import (
    HuggingFaceEmbeddings
)

embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)
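
The diagram above also mentions the privacy option: running the model locally with Ollama. A minimal sketch, assuming you have Ollama installed and the nomic-embed-text model already pulled:

from langchain_ollama import OllamaEmbeddings

# Everything runs on your machine - no API key, no data leaves it
embeddings = OllamaEmbeddings(
    model="nomic-embed-text"  # assumes: ollama pull nomic-embed-text
)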

Configuration Tips

| Setting | What It Does |
| --- | --- |
| model | Which brain to use |
| dimensions | Size of the number list |
| chunk_size | How many texts are sent per API batch |

Remember: Bigger isn’t always better! A smaller, faster model often works great.
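
Putting those settings together in one place (a minimal sketch; the exact numbers are illustrative, not recommendations):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",  # fast and cheap
    dimensions=512,   # shrink the vectors; supported by the -3 models
    chunk_size=500,   # how many texts go into each API batch
)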


🛠️ Creating Embeddings

Two Types of Creation

Think of embeddings like making IDs:

  1. Query Embeddings — for questions you ask
  2. Document Embeddings — for information you store
graph TD A["Your Text"] --> B{What is it?} B -->|A Question| C["embed_query"] B -->|Info to Store| D["embed_documents"] C --> E["One Vector"] D --> F["List of Vectors"]

Single Query

When someone asks a question:

# User asks: "What's the weather?"
question = "What's the weather?"

# Turn it into numbers
query_vector = embeddings.embed_query(
    question
)

print(type(query_vector))  # list
print(len(query_vector))   # 1536

Multiple Documents

When storing information:

# Your knowledge base
docs = [
    "The sun is a star",
    "Water boils at 100°C",
    "Python is a programming language"
]

# Turn ALL into numbers
doc_vectors = embeddings.embed_documents(
    docs
)

print(len(doc_vectors))     # 3 vectors
print(len(doc_vectors[0]))  # 1536 each

Quick Example: Finding Similar Text

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Store some facts
facts = [
    "Dogs are loyal pets",
    "Cats are independent",
    "Fish live in water"
]
fact_vectors = embeddings.embed_documents(facts)

# Ask a question
question = "Which pet loves its owner?"
q_vector = embeddings.embed_query(question)

# Now compare! (we'll learn how soon)

💾 CacheBackedEmbeddings

The Problem

Creating embeddings costs:

  • ⏱️ Time — API calls take milliseconds
  • 💰 Money — Each call = more charges
  • 🌐 Bandwidth — Network requests

What if you embed the same text twice? Wasteful!

The Solution: Caching!

Imagine a notebook 📓 where you write down every translation you’ve ever done. Next time someone asks for the same translation — just look it up!

graph TD A["Text to Embed"] --> B{In Cache?} B -->|Yes!| C["Return Saved Result"] B -->|No| D["Call API"] D --> E["Save to Cache"] E --> C

How to Use It

from langchain.embeddings import (
    CacheBackedEmbeddings
)
from langchain.storage import (
    LocalFileStore
)
from langchain_openai import OpenAIEmbeddings

# 1. Create your base embeddings
base = OpenAIEmbeddings()

# 2. Create a place to store cache
store = LocalFileStore("./cache/")

# 3. Wrap with caching!
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    base,
    store,
    namespace="openai"
)
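
From here, cached_embeddings behaves exactly like the base embeddings object (a minimal sketch). One caveat worth knowing: by default from_bytes_store caches embed_documents results; depending on your LangChain version, caching embed_query too requires the query_embedding_cache argument.

# Use it exactly like a normal embeddings object
vectors = cached_embeddings.embed_documents(
    ["Dogs are loyal pets", "Cats are independent"]
)

# Same texts again: served from ./cache/ with no API call
vectors_again = cached_embeddings.embed_documents(
    ["Dogs are loyal pets", "Cats are independent"]
)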

Why Namespace Matters

# Different models = different caches
cache_openai = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(),
    store,
    namespace="openai"  # Separate!
)

cache_huggingface = CacheBackedEmbeddings.from_bytes_store(
    HuggingFaceEmbeddings(),
    store,
    namespace="huggingface"  # Separate!
)

Cache Storage Options

| Storage | Best For |
| --- | --- |
| LocalFileStore | Simple projects |
| RedisStore | Production apps |
| InMemoryStore | Testing only |
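
For example, the testing option (a minimal sketch; in recent LangChain versions the byte-oriented store is named InMemoryByteStore):

from langchain.storage import InMemoryByteStore

# Nothing touches disk - the cache vanishes when the process exits
store = InMemoryByteStore()

cached = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(), store, namespace="test"
)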

Real Savings

import time

texts = ["Dogs are loyal pets", "Cats are independent"]

# First call - hits API
start = time.time()
cached_embeddings.embed_documents(texts)
print(f"First: {time.time()-start:.2f}s")

# Second call - from cache!
start = time.time()
cached_embeddings.embed_documents(texts)
print(f"Cached: {time.time()-start:.4f}s")

# Output:
# First: 0.45s
# Cached: 0.0012s  ← 375x faster!

📏 Embedding Similarity Metrics

How Do We Compare Embeddings?

Remember our magical library? We need to know: How close are two books?

There are different ways to measure “closeness”:

1. Cosine Similarity (Most Popular!)

Think of two arrows pointing from the center of a circle:

  • Same direction = 1.0 (identical!)
  • Opposite direction = -1.0 (opposites)
  • Perpendicular = 0.0 (unrelated)

graph LR
    A((Center)) --> B["Happy 😊"]
    A --> C["Joyful 🎉"]
    A --> D["Sad 😢"]
    B -.->|0.95| C
    B -.->|-0.3| D

from langchain_openai import OpenAIEmbeddings
from numpy import dot
from numpy.linalg import norm

embeddings = OpenAIEmbeddings()

def cosine_similarity(a, b):
    # Scale-independent: only the angle between the vectors matters
    return dot(a, b) / (norm(a) * norm(b))

# Compare two embeddings
score = cosine_similarity(
    embeddings.embed_query("happy"),
    embeddings.embed_query("joyful")
)
print(score)  # ~0.92 Very similar!

2. Euclidean Distance

Like measuring with a ruler on a map:

  • Smaller = more similar
  • Bigger = less similar

import numpy as np

def euclidean_distance(a, b):
    # Embeddings come back as plain Python lists, so convert
    # to arrays before subtracting
    return np.linalg.norm(np.array(a) - np.array(b))

dist = euclidean_distance(vec1, vec2)
# 0.0 = identical
# Higher = more different

3. Dot Product

Simple multiplication and sum:

  • Higher = more similar
  • Works best with normalized vectors

from numpy import dot

similarity = dot(vec1, vec2)
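
Why does normalization matter? On unit-length vectors, the dot product and cosine similarity give the same number, which is why vector databases often normalize once and then use fast plain dot products. A minimal sketch, reusing the embeddings object from above:

import numpy as np

v1 = np.array(embeddings.embed_query("happy"))
v2 = np.array(embeddings.embed_query("joyful"))

# Scale each vector to length 1
u1 = v1 / np.linalg.norm(v1)
u2 = v2 / np.linalg.norm(v2)

# On unit vectors, dot product == cosine similarity
print(np.dot(u1, u2))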

Which Should You Use?

| Metric | When to Use |
| --- | --- |
| Cosine | Most cases! Default choice |
| Euclidean | When magnitude matters |
| Dot Product | Normalized vectors, speed |

Quick Comparison Tool

from langchain_openai import OpenAIEmbeddings
from numpy import dot
from numpy.linalg import norm

embeddings = OpenAIEmbeddings()

def compare(text1, text2):
    v1 = embeddings.embed_query(text1)
    v2 = embeddings.embed_query(text2)

    cos = dot(v1, v2) / (norm(v1) * norm(v2))
    return round(cos, 3)

# Try it!
print(compare("I love cats", "Cats are great"))
# ~0.89

print(compare("I love cats", "The stock market"))
# ~0.23

🎯 Putting It All Together

Here’s a complete mini-example combining everything:

from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import (
    CacheBackedEmbeddings
)
from langchain.storage import LocalFileStore
from numpy import dot
from numpy.linalg import norm

# 1. Configure model
base_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

# 2. Add caching
store = LocalFileStore("./my_cache/")
embeddings = CacheBackedEmbeddings.from_bytes_store(
    base_embeddings,
    store,
    namespace="demo"
)

# 3. Create embeddings
docs = [
    "Python is great for AI",
    "JavaScript runs in browsers",
    "Machine learning needs data"
]
doc_vectors = embeddings.embed_documents(docs)

# 4. Search with similarity
query = "Which language for AI?"
q_vector = embeddings.embed_query(query)

# 5. Find best match
for i, doc_vec in enumerate(doc_vectors):
    score = dot(q_vector, doc_vec) / (
        norm(q_vector) * norm(doc_vec)
    )
    print(f"{docs[i]}: {score:.3f}")

# Output:
# Python is great for AI: 0.891  ← Winner!
# JavaScript runs in browsers: 0.654
# Machine learning needs data: 0.823

🚀 Key Takeaways

| Concept | Remember This |
| --- | --- |
| Embeddings | Turn text into meaning-numbers |
| Configuration | Pick model by speed/cost/accuracy |
| Creating | embed_query for questions, embed_documents for data |
| Caching | Save time and money — don’t repeat! |
| Similarity | Cosine similarity = your best friend |

You did it! 🎉 You now understand how AI “reads” text by converting it into numbers that capture meaning. This is the foundation of every modern search engine, chatbot, and recommendation system!
