🏛️ Vector Store Basics: Your AI’s Super Library

Imagine you’re building the world’s smartest library. Not one where books sit on dusty shelves—but one where the librarian instantly knows which books are most similar to your thoughts, even if you don’t know the exact title!


🎯 The Big Picture

You want your AI to remember things and find related information fast. That’s what Vector Stores do. They’re like magical filing cabinets that understand meaning, not just keywords.


📚 What is a Vector Store?

The Simple Story

Think of a vector as a secret code that captures what something means.

Example:

  • The word “puppy” might become [0.8, 0.2, 0.9]
  • The word “dog” might become [0.7, 0.3, 0.85]
  • The word “car” might become [0.1, 0.9, 0.1]

Notice how “puppy” and “dog” have similar numbers? That’s because they mean similar things!

A Vector Store is a special database that:

  1. Stores these secret codes (vectors)
  2. Finds similar codes super fast
  3. Returns the original text you stored
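"Similar codes" are usually compared with cosine similarity: vectors pointing in nearly the same direction score close to 1.0. Here's a minimal sketch using the toy vectors above (the numbers are made up for illustration, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

puppy = [0.8, 0.2, 0.9]
dog   = [0.7, 0.3, 0.85]
car   = [0.1, 0.9, 0.1]

print(cosine_similarity(puppy, dog))  # close to 1.0
print(cosine_similarity(puppy, car))  # much lower
```

Real embeddings have hundreds or thousands of dimensions, but the math is exactly this.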

Real Life Example

You ask: “What’s a good pet for kids?”

The vector store thinks:

“Hmm, this question is similar to documents about puppies, cats, and hamsters… NOT similar to documents about cars or computers!”

Then it returns the most relevant documents!


🧠 Vector Store Fundamentals

The Three Magic Steps

graph TD
    A["📄 Your Text"] --> B["🔢 Convert to Vector"]
    B --> C["💾 Store in Database"]
    C --> D["🔍 Search by Similarity"]

Step 1: Embedding (Making the Secret Code)

Your text becomes a list of numbers called an embedding.

# LangChain makes this easy!
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query("I love pizza")
# Returns: [0.01, -0.02, 0.05, ...]

Step 2: Storing (Putting It in the Library)

The vector + original text go into the store together.

Step 3: Searching (Finding Similar Things)

When you search, your question becomes a vector too. The store finds vectors that are “close” to yours!

Think of it like this:

  • Your question is a point on a map
  • Stored documents are other points
  • The store finds the nearest neighbors!
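The "nearest neighbors on a map" idea can be sketched in a few lines of plain Python: a toy in-memory store that ranks documents by distance to the query point (the vectors here are hand-made stand-ins for real embeddings):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy store: each document is a (vector, text) pair
store = [
    ([0.8, 0.2], "Puppies make playful pets."),
    ([0.7, 0.3], "Dogs are loyal companions."),
    ([0.1, 0.9], "Cars need regular oil changes."),
]

def similarity_search(query_vector, k=2):
    """Return the k texts whose vectors are nearest the query."""
    ranked = sorted(store, key=lambda item: euclidean(item[0], query_vector))
    return [text for _, text in ranked[:k]]

print(similarity_search([0.75, 0.25], k=2))
# Both dog-related texts come back; the car text does not
```

Real vector stores do the same thing, just with approximate-nearest-neighbor indexes so it stays fast over millions of vectors.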

🛠️ Vector Store Options in LangChain

LangChain works with MANY vector stores. Here are the popular ones:

Quick Comparison

| Store | Best For | Setup |
| --- | --- | --- |
| Chroma | Getting started | Easy |
| FAISS | Fast local search | Easy |
| Pinecone | Production apps | Medium |
| Weaviate | Complex queries | Medium |
| Qdrant | Large scale | Medium |

🥇 Chroma (Perfect for Learning!)

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create a vector store
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings()
)

Why Chroma?

  • Runs locally (no internet needed!)
  • Zero setup headaches
  • Great for prototyping

🚀 FAISS (Super Fast!)

from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(
    documents,
    OpenAIEmbeddings()
)

Why FAISS?

  • Made by Facebook AI
  • Blazingly fast searches
  • Works on your laptop

☁️ Pinecone (For Real Apps)

from langchain_pinecone import PineconeVectorStore

vectorstore = PineconeVectorStore(
    index_name="my-index",
    embedding=OpenAIEmbeddings()
)

Why Pinecone?

  • Cloud-hosted (always available)
  • Scales to millions of vectors
  • Production-ready

📥 Adding and Indexing Documents

The Journey of a Document

graph TD
    A["📄 Raw Document"] --> B["✂️ Split into Chunks"]
    B --> C["🔢 Create Embeddings"]
    C --> D["💾 Store with Metadata"]
    D --> E["✅ Ready to Search!"]

Method 1: Add Texts Directly

texts = [
    "Dogs are loyal pets.",
    "Cats are independent.",
    "Fish need aquariums."
]

# Add to vector store
vectorstore.add_texts(texts)

Method 2: Add Documents with Metadata

from langchain_core.documents import Document

docs = [
    Document(
        page_content="Dogs are loyal pets.",
        metadata={"animal": "dog", "type": "pet"}
    ),
    Document(
        page_content="Cats are independent.",
        metadata={"animal": "cat", "type": "pet"}
    )
]

vectorstore.add_documents(docs)

Why Metadata Matters:

  • Filter searches: “Only show me dog articles!”
  • Track sources: “Where did this info come from?”
  • Add context: “When was this written?”

Method 3: Create from Documents (All at Once!)

# Load, split, and index in one go!
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings()
)

🎯 Chunking: Why Size Matters

Big documents need to be split into smaller pieces:

from langchain_text_splitters import (
    RecursiveCharacterTextSplitter
)

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Characters per chunk
    chunk_overlap=50     # Overlap between chunks
)

chunks = splitter.split_documents(docs)

The Goldilocks Rule:

  • Too big → Loses focus, wastes tokens
  • Too small → Loses context, misses meaning
  • Just right → 500-1000 characters usually works!
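The chunk_size/chunk_overlap idea can be sketched without any library, assuming the crudest possible strategy (fixed-size character windows):

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    """Fixed-size character chunks; each chunk re-includes the last
    `chunk_overlap` characters of the previous one, so a sentence
    cut at a boundary still appears whole somewhere."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Vector stores save embeddings and search them by meaning."
for chunk in split_text(text):
    print(repr(chunk))
```

RecursiveCharacterTextSplitter is smarter than this sketch: it prefers to break on paragraphs, then sentences, then words, and only falls back to raw characters as a last resort.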

🔌 The Indexing API

LangChain’s Indexing API is your smart assistant that:

  • ✅ Avoids duplicates (no wasted storage!)
  • ✅ Tracks what’s been indexed
  • ✅ Updates only what changed
  • ✅ Deletes outdated content

The Problem It Solves

Without the Indexing API:

  • Re-run your script → Duplicates everywhere!
  • Update a document → Old version still there!
  • Delete a source → Ghost data haunts you!
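The core trick behind this bookkeeping is simple: hash each document's content, remember which hashes have already been indexed, and skip repeats. A minimal sketch of the idea (not LangChain's actual internals):

```python
import hashlib

seen_hashes = set()   # what the record manager remembers
indexed_docs = []     # stand-in for the vector store

def index_once(text):
    """Add `text` only if identical content wasn't indexed before."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False   # duplicate: skipped
    seen_hashes.add(digest)
    indexed_docs.append(text)
    return True

index_once("Dogs are loyal pets.")
index_once("Dogs are loyal pets.")   # re-running the script: skipped
print(len(indexed_docs))             # 1, not 2
```

The Indexing API layers updates and deletions on top of this hash tracking, persisting the records in a database (the SQLRecordManager below) so they survive between runs.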

Setting Up the Indexing API

from langchain.indexes import SQLRecordManager
from langchain.indexes import index

# Create a record manager
record_manager = SQLRecordManager(
    namespace="my_docs",
    db_url="sqlite:///records.db"
)

# Initialize it
record_manager.create_schema()

Indexing Modes Explained

# Mode 1: None - adds new content, skips exact
# duplicates, never deletes old versions
index(docs, record_manager, vectorstore,
      cleanup=None)

# Mode 2: "incremental" - smart updates; requires
# source_id_key so changed versions can be cleaned up
index(docs, record_manager, vectorstore,
      cleanup="incremental",
      source_id_key="source")

# Mode 3: "full" - complete sync with the batch
index(docs, record_manager, vectorstore,
      cleanup="full")

| Mode | What It Does |
| --- | --- |
| `None` | Adds new content, skips exact duplicates, never deletes |
| `"incremental"` | Adds new, skips unchanged, deletes changed versions of each source |
| `"full"` | Adds new, deletes anything missing from the current batch |
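"Full" mode boils down to a set comparison between what's already indexed and what's in the current batch. A sketch of that bookkeeping (a simplification of what the real API does):

```python
def full_sync(existing_ids, current_ids):
    """What 'full' cleanup computes: which docs to add,
    and which previously indexed docs to delete."""
    to_add = current_ids - existing_ids
    to_delete = existing_ids - current_ids
    return to_add, to_delete

existing = {"doc1", "doc2", "doc3"}
current = {"doc1", "doc2", "doc4"}

to_add, to_delete = full_sync(existing, current)
print(to_add)     # {'doc4'}
print(to_delete)  # {'doc3'}
```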

Real-World Example

# First run: indexes 3 documents
docs = [doc1, doc2, doc3]
index(docs, record_manager, vectorstore,
      cleanup="full")
# Result: 3 docs in store

# Second run: doc3 removed, doc4 added
docs = [doc1, doc2, doc4]
index(docs, record_manager, vectorstore,
      cleanup="full")
# Result: doc3 deleted, doc4 added!
# Only doc1, doc2, doc4 remain

Source IDs: Track Your Documents

# Add source tracking
index(
    docs,
    record_manager,
    vectorstore,
    cleanup="full",
    source_id_key="source"  # Uses metadata
)

Now each document knows where it came from!


🔍 Searching Your Vector Store

Once indexed, searching is magical:

# Simple search
results = vectorstore.similarity_search(
    "What pet is best for kids?",
    k=3  # Return top 3 matches
)

# Search with scores
results = vectorstore.similarity_search_with_score(
    "What pet is best for kids?",
    k=3
)

Filter by Metadata

# Only search dog documents
results = vectorstore.similarity_search(
    "training tips",
    k=3,
    filter={"animal": "dog"}
)

🎉 Putting It All Together

Here’s a complete mini-project:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

# 1. Create documents
docs = [
    Document(
        page_content="Golden Retrievers are friendly.",
        metadata={"animal": "dog"}
    ),
    Document(
        page_content="Siamese cats are vocal.",
        metadata={"animal": "cat"}
    ),
    Document(
        page_content="Goldfish are easy to care for.",
        metadata={"animal": "fish"}
    )
]

# 2. Create vector store
vectorstore = Chroma.from_documents(
    docs,
    OpenAIEmbeddings()
)

# 3. Search!
results = vectorstore.similarity_search(
    "I want a friendly pet",
    k=2
)

# The Golden Retriever doc should come back first!

🧩 Key Takeaways

  1. Vectors are number-lists that capture meaning
  2. Vector Stores save and search these vectors
  3. Many options: Chroma (easy), FAISS (fast), Pinecone (production)
  4. Add documents with add_texts() or add_documents()
  5. Indexing API prevents duplicates and keeps data fresh
  6. Search finds similar content by meaning, not keywords!

🚀 You’re Ready!

You now understand how to:

  • ✅ Pick the right vector store
  • ✅ Add and index documents properly
  • ✅ Use the Indexing API like a pro
  • ✅ Search by meaning, not just keywords

Next up: Use this with RAG to build AI that answers questions from YOUR data!

Remember: Vector stores are just smart filing cabinets. You put stuff in, they remember the meaning, and find similar things lightning-fast!
