Deploying with LangServe

🚀 LangServe: Your AI’s Delivery Service

Imagine you baked the world’s most amazing cake. Now you need to deliver it to everyone who wants a slice. LangServe is your delivery truck!


🎯 The Big Picture

You’ve built an awesome AI chain with LangChain. It works perfectly on your computer. But how do you share it with the world?

LangServe is like turning your local bakery into a delivery service. Anyone can order your AI “cakes” through the internet!

graph TD A["๐Ÿง  Your LangChain App"] --> B["๐Ÿ“ฆ LangServe Wraps It"] B --> C["๐ŸŒ REST API Created"] C --> D["๐Ÿ“ฑ Anyone Can Use It!"]

๐Ÿ  LangServe Deployment

What is Deployment?

Think of it like this:

  • Development = Cooking in your kitchen
  • Deployment = Opening a restaurant for customers

LangServe helps you open that restaurant!

The Simple Setup

# Install LangServe with the server extras, plus the OpenAI integration:
# pip install "langserve[all]" langchain-openai

from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Create your AI chain
prompt = ChatPromptTemplate.from_template(
    "Tell me a joke about {topic}"
)
model = ChatOpenAI()
chain = prompt | model

# Create the restaurant (FastAPI app)
app = FastAPI(
    title="Joke Server",
    description="Get AI jokes!"
)

# Add the menu item (your chain)
add_routes(app, chain, path="/jokes")

What add_routes Does

It’s like adding a dish to your menu. This one line gives you:

| Endpoint | What It Does |
| --- | --- |
| /jokes/invoke | Get one response |
| /jokes/batch | Get many responses |
| /jokes/stream | Get the response piece by piece |
| /jokes/playground | Test it in a browser! |

The Playground is like a tasting booth where anyone can try your AI before using it in their app!


🔗 REST API Creation

What’s a REST API?

Imagine a waiter at a restaurant:

  • You ask for something (request)
  • The kitchen makes it (processing)
  • The waiter brings it back (response)

A REST API is your digital waiter!

How LangServe Creates APIs

# Your chain becomes these endpoints:

# POST /jokes/invoke
# Send: {"input": {"topic": "cats"}}
# Get:  {"output": {"content": "Why did the cat
#        sit on the computer? To keep an eye on
#        the mouse!", ...}}
# (the chat model's message, serialized to JSON)

# POST /jokes/batch
# Send: {"inputs": [
#         {"topic": "cats"},
#         {"topic": "dogs"}
#       ]}
# Get:  {"output": [<cat joke message>, <dog joke message>]}

Running Your Server

# At the end of your file:
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Now visit http://localhost:8000/docs to see your API documentation!

graph TD
    A["📱 User's App"] -->|POST request| B["🚪 /jokes/invoke"]
    B --> C["⚙️ Your Chain Runs"]
    C --> D["📤 JSON Response"]
    D --> A

📞 RemoteRunnable Client

What’s RemoteRunnable?

Remember when we talked about the restaurant?

  • LangServe = The restaurant
  • RemoteRunnable = The phone for delivery orders!

It lets any Python app call your LangServe API like it’s a local chain.

Magic Simplicity

from langserve import RemoteRunnable

# Connect to the restaurant
joke_chain = RemoteRunnable(
    "http://localhost:8000/jokes"
)

# Use it like a normal chain!
result = joke_chain.invoke({"topic": "pizza"})
print(result.content)  # the result is the model's message; .content is the text

Why This is Amazing

| Without RemoteRunnable | With RemoteRunnable |
| --- | --- |
| Write HTTP code | Just .invoke() |
| Handle JSON yourself | Automatic! |
| Parse responses | Already done |
| Complex! | Simple! |
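For comparison, here’s roughly what the left column costs you: the manual HTTP version and the RemoteRunnable version side by side (a sketch, assuming the /jokes server from earlier):

import requests
from langserve import RemoteRunnable

# Without RemoteRunnable: build the request, check errors, unpack JSON
response = requests.post(
    "http://localhost:8000/jokes/invoke",
    json={"input": {"topic": "pizza"}},
)
response.raise_for_status()
result = response.json()["output"]

# With RemoteRunnable: one line, same result
result = RemoteRunnable("http://localhost:8000/jokes").invoke({"topic": "pizza"})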

The Full Picture

# On the SERVER (Restaurant)
from langserve import add_routes
add_routes(app, my_chain, path="/ai")

# On the CLIENT (Customer's house)
from langserve import RemoteRunnable
my_chain = RemoteRunnable("http://server:8000/ai")

# Both use the SAME interface!
result = my_chain.invoke({"input": "hello"})

It’s like magic! Your remote AI feels like it’s running locally.


🌊 Streaming in Production

Why Streaming Matters

Imagine ordering pizza:

  • Without streaming: Wait 30 minutes. Get the whole pizza at once.
  • With streaming: Watch them make it. Get slices as they’re ready!

For AI, streaming means seeing words appear one by one, like someone typing!

Server-Side Streaming

# LangServe supports streaming automatically!
# Just use the /stream endpoint

# POST /jokes/stream
# Response comes in chunks:
# {"content": "Why"}
# {"content": " did"}
# {"content": " the"}
# {"content": " chicken..."}

Client-Side Streaming

from langserve import RemoteRunnable

chain = RemoteRunnable("http://localhost:8000/jokes")

# Stream the response!
for chunk in chain.stream({"topic": "robots"}):
    print(chunk.content, end="", flush=True)
    # Words appear one by one!

Async Streaming (For Web Apps)

# Inside an async function:
async for chunk in chain.astream({"topic": "space"}):
    print(chunk.content, end="", flush=True)
    # Non-blocking streaming!

graph LR
    A["🤖 AI Generates"] -->|chunk 1| B["📱 User Sees"]
    A -->|chunk 2| B
    A -->|chunk 3| B
    A -->|chunk 4| B
    style A fill:#e1f5fe
    style B fill:#c8e6c9

Why Users Love Streaming

  • Feels faster (even if same total time)
  • More engaging (watch the AI “think”)
  • Better UX (no staring at blank screen)

๐Ÿ† Production Best Practices

1. Always Add Authentication

Don’t let strangers eat your cake for free!

from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

API_KEY = "your-secret-key"  # in real code, load this from an env variable
api_key_header = APIKeyHeader(name="X-API-Key")

def verify_key(api_key: str = Depends(api_key_header)):
    if api_key != API_KEY:
        raise HTTPException(status_code=403)
    return api_key

# Protected route
add_routes(
    app,
    chain,
    path="/jokes",
    dependencies=[Depends(verify_key)]
)
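On the client side, RemoteRunnable can attach the key to every request via its headers argument (a sketch; replace the key with whatever the server expects):

from langserve import RemoteRunnable

# Send the API key along with every request
chain = RemoteRunnable(
    "http://localhost:8000/jokes",
    headers={"X-API-Key": "your-secret-key"},
)
result = chain.invoke({"topic": "security"})  # authenticated call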

2. Set Rate Limits

Don’t let one customer order 1000 pizzas!

from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# 10 requests per minute (slowapi needs the Request parameter)
@app.post("/custom-endpoint")
@limiter.limit("10/minute")
async def custom(request: Request):
    # Your code here
    pass

3. Add Health Checks

Like checking if the restaurant is open:

@app.get("/health")
def health():
    return {"status": "healthy"}
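A monitoring tool, or you during a deploy, can poll it. A minimal sketch with requests:

import requests

# 200 + {"status": "healthy"} means the restaurant is open
r = requests.get("http://localhost:8000/health", timeout=5)
print(r.status_code, r.json())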

4. Use Environment Variables

Never put secrets in code!

import os

# Bad! Don't do this!
# api_key = "sk-secret123"

# Good! Do this!
api_key = os.getenv("OPENAI_API_KEY")
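For local development, a common companion pattern is a .env file loaded with the python-dotenv package (a sketch, assuming pip install python-dotenv; never commit the .env file):

import os
from dotenv import load_dotenv

load_dotenv()  # copies variables from a local .env file into the environment
api_key = os.getenv("OPENAI_API_KEY")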

5. Enable CORS (for web apps)

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourapp.com"],
    allow_methods=["POST"],
    allow_headers=["*"],
)

Production Checklist

| Item | Why It Matters |
| --- | --- |
| ✅ Authentication | Protect your API |
| ✅ Rate Limiting | Prevent abuse |
| ✅ Health Checks | Monitor uptime |
| ✅ Env Variables | Keep secrets safe |
| ✅ CORS Setup | Enable web access |
| ✅ Logging | Debug problems |
| ✅ Error Handling | Graceful failures |
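The last two rows aren’t covered by the snippets above. One lightweight way to handle both is a single FastAPI middleware that logs each request and turns unexpected exceptions into a clean JSON error (a sketch; the logger name and error message are illustrative):

import logging
from fastapi import Request
from fastapi.responses import JSONResponse

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai-api")

@app.middleware("http")
async def log_and_catch(request: Request, call_next):
    logger.info("%s %s", request.method, request.url.path)
    try:
        return await call_next(request)
    except Exception:
        logger.exception("Unhandled error")  # full traceback in the logs
        return JSONResponse(status_code=500, content={"error": "internal error"})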

🎬 The Complete Example

Here’s a production-ready LangServe app:

import os
from fastapi import FastAPI, Depends, HTTPException
from fastapi.security import APIKeyHeader
from fastapi.middleware.cors import CORSMiddleware
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Setup
app = FastAPI(title="My AI API")

# CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # tighten this to your real domains in production
    allow_methods=["*"],
    allow_headers=["*"],
)

# Auth
api_key = APIKeyHeader(name="X-API-Key")
def check_key(key: str = Depends(api_key)):
    if key != os.getenv("MY_API_KEY"):
        raise HTTPException(403)

# Chain
prompt = ChatPromptTemplate.from_template(
    "You are helpful. Answer: {question}"
)
model = ChatOpenAI()
chain = prompt | model

# Routes
add_routes(
    app, chain,
    path="/chat",
    dependencies=[Depends(check_key)]
)

# Health
@app.get("/health")
def health():
    return {"status": "ok"}

# Run
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

🌟 Key Takeaways

  1. LangServe turns your chain into a web service
  2. add_routes gives you invoke, batch, stream, and playground
  3. RemoteRunnable lets clients use your API like a local chain
  4. Streaming makes AI feel responsive and engaging
  5. Production needs auth, rate limits, and proper config
graph TD A["Build Chain"] --> B["Wrap with LangServe"] B --> C["Deploy to Server"] C --> D["Clients Use RemoteRunnable"] D --> E["Stream Responses"] E --> F["Happy Users! ๐ŸŽ‰"]

🚀 You’re Ready!

You now know how to:

  • Deploy LangChain apps with LangServe
  • Create REST APIs automatically
  • Connect clients with RemoteRunnable
  • Stream responses for better UX
  • Follow production best practices

Your AI is ready to serve the world! 🌍
