🚀 LangChain Production: Extending and Testing
The Master Chef’s Kitchen 👨‍🍳
Imagine you’ve learned to cook amazing dishes. Now you want to open a restaurant! But serving food at home and running a real restaurant are very different things.
- Your home kitchen = Development (experimenting, testing recipes)
- A real restaurant = Production (fast service, consistent quality, handling many customers)
This guide teaches you how to turn your LangChain experiments into a real restaurant that serves thousands of customers perfectly every time!
🏎️ Performance Optimization
What is Performance Optimization?
Think of it like making your bicycle go faster:
- 🚲 A slow bike = Your AI takes too long to answer
- 🏎️ A fast race car = Your AI answers quickly!
Why does speed matter?
- Nobody likes waiting
- Faster = happier users
- Faster = lower costs (you pay for AI time!)
The Three Speed Secrets
1. Caching: Remember What You Already Know
Imagine your teacher asks: “What’s 2 + 2?”
The first time, you think hard. But the 100th time? You just remember: 4!
That’s caching! Store answers you’ve seen before.
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Turn on your "memory"
set_llm_cache(InMemoryCache())

# First time: thinks hard (slow, calls the API)
response = llm.invoke("What is the capital of France?")

# Second time: remembers! (fast, served from the cache)
response = llm.invoke("What is the capital of France?")
```
2. Batching: Do Many Things at Once
Instead of making 10 trips to the store (one for milk, one for bread, one for eggs…), go once and get everything!
```python
# ❌ Slow: one at a time
answers = []
for question in questions:
    answers.append(llm.invoke(question))

# ✅ Fast: all at once!
answers = llm.batch(questions)
```
3. Streaming: Show Progress
When downloading a movie, you don’t wait for the whole thing. You watch while it loads!
```python
# Stream chunks as they arrive
for chunk in llm.stream("Tell me a story"):
    print(chunk, end="", flush=True)  # shows each piece as it's generated
```
Performance Flow
```mermaid
graph TD
    A["User Question"] --> B{"In Cache?"}
    B -->|Yes| C["Return Fast!"]
    B -->|No| D["Ask AI"]
    D --> E["Save to Cache"]
    E --> F["Return Answer"]
```
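The flow above can be sketched in plain Python, with an ordinary dictionary standing in for the cache (`ask_ai` is a hypothetical stand-in for a real, slow model call):

```python
# A minimal sketch of the cache-first flow.
# `ask_ai` is a placeholder for a real (slow, costly) model call.
cache = {}

def ask_ai(question):
    return f"Answer to: {question}"

def answer(question):
    if question in cache:          # In cache? -> return fast
        return cache[question]
    result = ask_ai(question)      # Not cached -> ask the AI
    cache[question] = result       # Save to cache for next time
    return result

first = answer("What is the capital of France?")   # slow path
second = answer("What is the capital of France?")  # cache hit, same answer
```

Real caches add details like size limits and expiry times, but the check-then-save loop is the same.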
🎨 Multimodal Capabilities
What is Multimodal?
Modal = A way to communicate
You talk in many ways:
- 👄 Words (text)
- 👁️ Pictures (images)
- 👂 Sounds (audio)
Multimodal AI understands ALL of these together!
Real Life Examples
| Input | What AI Can Do |
|---|---|
| 📷 Photo of homework | Help solve math problems |
| 🎵 Voice message | Convert to text and respond |
| 📄 Document photo | Read and summarize it |
Using Images with LangChain
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Create an AI that can see images
llm = ChatOpenAI(model="gpt-4o")

# Ask about a picture
message = HumanMessage(
    content=[
        {"type": "text", "text": "What's in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
    ]
)

response = llm.invoke([message])
```
Multimodal Magic
```mermaid
graph TD
    A["🖼️ Image"] --> D["Multimodal AI"]
    B["📝 Text"] --> D
    C["🎵 Audio"] --> D
    D --> E["Smart Answer!"]
```
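A web URL is not the only option: images can also be passed inline as base64 data URLs, which is handy for local files. A minimal sketch using only the standard library (the image bytes here are fake placeholders, and the `image/jpeg` type is an assumption about your file):

```python
import base64

# Read your local image bytes; fake bytes are used here for illustration.
image_bytes = b"\xff\xd8\xff fake jpeg bytes"

# Base64-encode and wrap as a data URL
encoded = base64.b64encode(image_bytes).decode("utf-8")
data_url = f"data:image/jpeg;base64,{encoded}"
```

The resulting string can generally be used wherever a regular image URL is expected in the `image_url` field.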
🧩 Custom Components
What are Custom Components?
Think of LEGO blocks. LangChain gives you standard blocks, but sometimes you need a special piece for your spaceship!
Custom components = Your own LEGO pieces
Types of Custom Components
1. Custom LLM (Your Own Brain)
Maybe you have a special AI model at your company:
```python
from typing import List, Optional
from langchain_core.language_models.llms import LLM

class MyCompanyLLM(LLM):
    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        # Your special AI logic here
        return "My custom response!"

    @property
    def _llm_type(self) -> str:
        return "my_company_llm"
```
2. Custom Tools (Your Own Superpowers)
Give your AI new abilities:
```python
from langchain_core.tools import BaseTool

class WeatherTool(BaseTool):
    name: str = "weather"
    description: str = "Gets the current weather for a city"

    def _run(self, city: str) -> str:
        # Call your weather API here
        return f"It's sunny in {city}!"
```
3. Custom Retrievers (Your Own Library)
Teach AI where to find information:
```python
from typing import List
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class MyRetriever(BaseRetriever):
    def _get_relevant_documents(self, query: str) -> List[Document]:
        # Search your own database here
        return [Document(page_content="...")]
```
Custom Components Flow
```mermaid
graph TD
    A["Standard LangChain"] --> B["+ Custom LLM"]
    B --> C["+ Custom Tools"]
    C --> D["+ Custom Retriever"]
    D --> E["Your Unique App!"]
```
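The plug-in idea behind custom components can be sketched without LangChain at all: anything that exposes the right method can be swapped in. A toy example (the class names are made up for illustration):

```python
class EchoLLM:
    # Any object with an `invoke` method can stand in for a model
    def invoke(self, prompt):
        return f"Echo: {prompt}"

class Pipeline:
    def __init__(self, llm):
        # Swap in any component that speaks the same interface
        self.llm = llm

    def run(self, text):
        return self.llm.invoke(text)

print(Pipeline(EchoLLM()).run("hello"))  # Echo: hello
```

This is exactly why the base classes above define methods like `_call` and `_run`: implement the interface, and the rest of the framework doesn't care where the answer comes from.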
🧪 Testing LangChain Applications
Why Test?
Would you eat at a restaurant that never tastes its food? 🤢
Testing = Tasting your food before serving it!
The Three Levels of Testing
1. Unit Tests: Test Each Ingredient
Test small pieces one at a time:
```python
def test_my_prompt():
    prompt = MyPrompt()
    result = prompt.format(name="Alice")
    # Check the result
    assert "Alice" in result
    assert "Hello" in result
```
2. Integration Tests: Test the Recipe
Test how pieces work together:
```python
def test_chain_works():
    chain = create_my_chain()
    result = chain.invoke({"question": "Hi!"})
    # Check it responded
    assert result is not None
    assert len(result) > 0
```
3. End-to-End Tests: Test the Whole Meal
Test everything like a real customer:
```python
def test_full_conversation():
    app = MyLangChainApp()
    # Simulate a real user
    response = app.chat("What's the weather?")
    assert "weather" in response.lower()
```
Mocking: Fake Ingredients
Sometimes you don’t want to use real AI (it costs money!):
```python
from unittest.mock import Mock

# Create a fake AI
fake_llm = Mock()
fake_llm.invoke.return_value = "Fake response"

# Test with the fake: no API calls, no cost
chain = MyChain(llm=fake_llm)
result = chain.invoke("Test")
```
Testing Pyramid
```mermaid
graph TD
    A["🔺 Few E2E Tests"] --> B["🔶 Some Integration"]
    B --> C["🟢 Many Unit Tests"]
    C --> D["Strong Foundation!"]
```
📊 Evaluation Strategies
What is Evaluation?
You wrote a book report. How good is it?
- ❌ “It’s probably good” (guessing)
- ✅ “I got 95/100!” (measuring)
Evaluation = Giving your AI a grade!
Five Ways to Evaluate
1. Correctness: Is the Answer Right?
```python
def evaluate_correctness(response, expected):
    # Simple substring check
    return expected.lower() in response.lower()

# Test it
result = evaluate_correctness(
    response="Paris is the capital",
    expected="Paris",
)
print(f"Correct: {result}")  # True
```
2. Relevance: Does It Answer the Question?
If someone asks about cats, don’t talk about dogs!
```python
from langchain.evaluation import load_evaluator

# Uses an LLM as the judge (requires an OpenAI key by default)
evaluator = load_evaluator("criteria", criteria="relevance")

result = evaluator.evaluate_strings(
    input="What color is the sky?",
    prediction="The sky is blue.",
)
print(f"Relevance: {result['score']}")
```
3. Helpfulness: Is It Actually Useful?
An answer can be correct but not helpful:
- Q: “How do I bake a cake?”
- A: “Use ingredients” ❌ (True but useless!)
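There is no official formula for helpfulness, but even a toy heuristic can flag obviously vague answers. A sketch (the 8-word threshold is an arbitrary assumption, not a LangChain feature):

```python
def looks_helpful(question, answer, min_words=8):
    # Toy heuristic: a helpful answer should have some substance
    # and not just echo the question back.
    words = answer.split()
    if len(words) < min_words:
        return False
    return answer.strip().lower() != question.strip().lower()

print(looks_helpful("How do I bake a cake?", "Use ingredients"))  # False: too vague
print(looks_helpful(
    "How do I bake a cake?",
    "Mix flour, sugar, eggs and butter, pour into a pan, and bake at 180°C for 30 minutes.",
))  # True
```

In practice, helpfulness is usually graded by a second LLM acting as a judge, but cheap checks like this catch the worst answers for free.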
4. Harmlessness: Is It Safe?
Make sure AI doesn’t say bad things:
```python
def check_safety(response):
    # Toy example: a real system would use a moderation API
    bad_words = ["dangerous", "illegal", "harmful"]
    return not any(word in response.lower()
                   for word in bad_words)
```
5. Consistency: Same Answer Every Time?
Ask the same question 3 times. You should get similar answers!
```python
def evaluate_consistency(chain, question, n=3):
    answers = [chain.invoke(question) for _ in range(n)]
    # Naive check: passes only if every answer is exactly identical
    # (real answers usually vary in wording)
    return len(set(answers)) == 1
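Exact equality is usually too strict for LLM output, since wording varies between runs. A fuzzier check using the standard library's `difflib` (the 0.8 threshold is an arbitrary assumption):

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the strings are identical
    return SequenceMatcher(None, a, b).ratio()

def consistent(answers, threshold=0.8):
    # Every pair of answers must be at least `threshold` similar
    return all(similarity(a, b) >= threshold
               for a, b in combinations(answers, 2))

print(consistent(["Paris is the capital.", "Paris is the capital!"]))  # True
print(consistent(["Paris is the capital.", "I like turtles."]))        # False
```

For longer answers, embedding-based similarity works better than character matching, but the pairwise-comparison idea is the same.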
Using LangSmith for Evaluation
LangSmith is like a report card system:
```python
from langsmith import Client, evaluate

client = Client()

# Create a test dataset
dataset = client.create_dataset(dataset_name="my_tests")

# Add test cases
client.create_example(
    dataset_id=dataset.id,
    inputs={"question": "What is 2+2?"},
    outputs={"answer": "4"},
)

# Run your chain against every example in the dataset
results = evaluate(
    lambda inputs: my_chain.invoke(inputs),
    data="my_tests",
)
```
Evaluation Flow
```mermaid
graph TD
    A["AI Response"] --> B["Correctness Check"]
    A --> C["Relevance Check"]
    A --> D["Safety Check"]
    B --> E["📊 Final Score"]
    C --> E
    D --> E
    E --> F["Improve AI!"]
```
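The flow above, where several independent checks feed into one final score, can be sketched in plain Python (the check names and the pass/fail scoring scheme are illustrative assumptions):

```python
def final_score(checks):
    # `checks` maps a check name to whether it passed.
    # The score is the fraction of checks passed, as a percentage.
    passed = sum(1 for ok in checks.values() if ok)
    return 100 * passed / len(checks)

score = final_score({
    "correctness": True,
    "relevance": True,
    "safety": False,
})
print(f"Final score: {score:.0f}/100")  # Final score: 67/100
```

Weighting the checks differently (e.g. safety failures count double) is a natural next step once you know which failures hurt your users most.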
🎯 Putting It All Together
Now you know how to:
| Skill | Restaurant Analogy |
|---|---|
| Optimize Performance | Fast kitchen service |
| Use Multimodal | Accept any order format |
| Build Custom Components | Special recipes |
| Test Applications | Taste before serving |
| Evaluate Quality | Customer ratings |
Your Production Checklist
```mermaid
graph TD
    A["Build App"] --> B["Add Caching"]
    B --> C["Handle Images/Audio"]
    C --> D["Create Custom Parts"]
    D --> E["Write Tests"]
    E --> F["Set Up Evaluation"]
    F --> G["🚀 Launch!"]
```
🌟 Remember
“A great restaurant doesn’t just make good food. It serves it fast, handles special requests, and makes sure every dish is perfect before it leaves the kitchen.”
Your LangChain app is the same! Build it right, test it well, and measure everything.
Now go build something amazing! 🚀
