Deploying with LangServe

🚀 LangServe: Your AI’s Delivery Service

Imagine you baked the world’s most amazing cake. Now you need to deliver it to everyone who wants a slice. LangServe is your delivery truck!


🎯 The Big Picture

You’ve built an awesome AI chain with LangChain. It works perfectly on your computer. But how do you share it with the world?

LangServe is like turning your local bakery into a delivery service. Anyone can order your AI “cakes” through the internet!

graph TD A["๐Ÿง  Your LangChain App"] --> B["๐Ÿ“ฆ LangServe Wraps It"] B --> C["๐ŸŒ REST API Created"] C --> D["๐Ÿ“ฑ Anyone Can Use It!"]

๐Ÿ  LangServe Deployment

What is Deployment?

Think of it like this:

  • Development = Cooking in your kitchen
  • Deployment = Opening a restaurant for customers

LangServe helps you open that restaurant!

The Simple Setup

# Install LangServe with the server extras, plus the OpenAI integration:
# pip install "langserve[all]" langchain-openai

from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Create your AI chain
prompt = ChatPromptTemplate.from_template(
    "Tell me a joke about {topic}"
)
model = ChatOpenAI()
chain = prompt | model

# Create the restaurant (FastAPI app)
app = FastAPI(
    title="Joke Server",
    description="Get AI jokes!"
)

# Add the menu item (your chain)
add_routes(app, chain, path="/jokes")

What add_routes Does

It’s like adding a dish to your menu. This one line gives you:

| Endpoint | What It Does |
| --- | --- |
| /jokes/invoke | Get one response |
| /jokes/batch | Get many responses |
| /jokes/stream | Get the response piece by piece |
| /jokes/playground | Test it in a browser! |

The Playground is like a tasting booth where anyone can try your AI before using it in their app!


🔗 REST API Creation

What’s a REST API?

Imagine a waiter at a restaurant:

  • You ask for something (request)
  • The kitchen makes it (processing)
  • The waiter brings it back (response)

A REST API is your digital waiter!

How LangServe Creates APIs

# Your chain becomes these endpoints:

# POST /jokes/invoke
# Send: {"input": {"topic": "cats"}}
# Get:  {"output": {"content": "Why did the cat
#        sit on the computer? To keep an eye on
#        the mouse!", ...}}
# (the chat model's message, serialized to JSON)

# POST /jokes/batch
# Send: {"inputs": [
#         {"topic": "cats"},
#         {"topic": "dogs"}
#       ]}
# Get:  {"output": [<cat joke message>, <dog joke message>]}

Running Your Server

# At the end of your file:
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Now visit http://localhost:8000/docs to see your API documentation!

graph TD
    A["📱 User's App"] -->|POST request| B["🚪 /jokes/invoke"]
    B --> C["⚙️ Your Chain Runs"]
    C --> D["📤 JSON Response"]
    D --> A

📞 RemoteRunnable Client

What’s RemoteRunnable?

Remember when we talked about the restaurant?

  • LangServe = The restaurant
  • RemoteRunnable = The phone for delivery orders!

It lets any Python app call your LangServe API like it’s a local chain.

Magic Simplicity

from langserve import RemoteRunnable

# Connect to the restaurant
joke_chain = RemoteRunnable(
    "http://localhost:8000/jokes"
)

# Use it like a normal chain!
result = joke_chain.invoke({"topic": "pizza"})
print(result.content)  # the result is the model's message; .content is the text

Why This is Amazing

| Without RemoteRunnable | With RemoteRunnable |
| --- | --- |
| Write HTTP code | Just .invoke() |
| Handle JSON yourself | Automatic! |
| Parse responses | Already done |
| Complex! | Simple! |
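For comparison, here’s roughly what the left column costs you: the manual HTTP version and the RemoteRunnable version side by side (a sketch, assuming the /jokes server from earlier):

import requests
from langserve import RemoteRunnable

# Without RemoteRunnable: build the request, check errors, unpack JSON
response = requests.post(
    "http://localhost:8000/jokes/invoke",
    json={"input": {"topic": "pizza"}},
)
response.raise_for_status()
result = response.json()["output"]

# With RemoteRunnable: one line, same result
result = RemoteRunnable("http://localhost:8000/jokes").invoke({"topic": "pizza"})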

The Full Picture

# On the SERVER (Restaurant)
from langserve import add_routes
add_routes(app, my_chain, path="/ai")

# On the CLIENT (Customer's house)
from langserve import RemoteRunnable
my_chain = RemoteRunnable("http://server:8000/ai")

# Both use the SAME interface!
result = my_chain.invoke({"input": "hello"})

It’s like magic! Your remote AI feels like it’s running locally.


🌊 Streaming in Production

Why Streaming Matters

Imagine ordering pizza:

  • Without streaming: Wait 30 minutes. Get the whole pizza at once.
  • With streaming: Watch them make it. Get slices as they’re ready!

For AI, streaming means seeing words appear one by one, like someone typing!

Server-Side Streaming

# LangServe supports streaming automatically!
# Just use the /stream endpoint

# POST /jokes/stream
# Response comes in chunks:
# {"content": "Why"}
# {"content": " did"}
# {"content": " the"}
# {"content": " chicken..."}

Client-Side Streaming

from langserve import RemoteRunnable

chain = RemoteRunnable("http://localhost:8000/jokes")

# Stream the response!
for chunk in chain.stream({"topic": "robots"}):
    print(chunk.content, end="", flush=True)
    # Words appear one by one!

Async Streaming (For Web Apps)

# Inside an async function:
async for chunk in chain.astream({"topic": "space"}):
    print(chunk.content, end="", flush=True)
    # Non-blocking streaming!

graph LR
    A["🤖 AI Generates"] -->|chunk 1| B["📱 User Sees"]
    A -->|chunk 2| B
    A -->|chunk 3| B
    A -->|chunk 4| B
    style A fill:#e1f5fe
    style B fill:#c8e6c9

Why Users Love Streaming

  • Feels faster (even if same total time)
  • More engaging (watch the AI “think”)
  • Better UX (no staring at blank screen)

๐Ÿ† Production Best Practices

1. Always Add Authentication

Don’t let strangers eat your cake for free!

from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

API_KEY = "your-secret-key"  # in real code, load this from an env variable
api_key_header = APIKeyHeader(name="X-API-Key")

def verify_key(api_key: str = Depends(api_key_header)):
    if api_key != API_KEY:
        raise HTTPException(status_code=403)
    return api_key

# Protected route
add_routes(
    app,
    chain,
    path="/jokes",
    dependencies=[Depends(verify_key)]
)
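On the client side, RemoteRunnable can attach the key to every request via its headers argument (a sketch; replace the key with whatever the server expects):

from langserve import RemoteRunnable

# Send the API key along with every request
chain = RemoteRunnable(
    "http://localhost:8000/jokes",
    headers={"X-API-Key": "your-secret-key"},
)
result = chain.invoke({"topic": "security"})  # authenticated call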

2. Set Rate Limits

Don’t let one customer order 1000 pizzas!

from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# 10 requests per minute (slowapi needs the Request parameter)
@app.post("/custom-endpoint")
@limiter.limit("10/minute")
async def custom(request: Request):
    # Your code here
    pass

3. Add Health Checks

Like checking if the restaurant is open:

@app.get("/health")
def health():
    return {"status": "healthy"}
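A monitoring tool, or you during a deploy, can poll it. A minimal sketch with requests:

import requests

# 200 + {"status": "healthy"} means the restaurant is open
r = requests.get("http://localhost:8000/health", timeout=5)
print(r.status_code, r.json())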

4. Use Environment Variables

Never put secrets in code!

import os

# Bad! Don't do this!
# api_key = "sk-secret123"

# Good! Do this!
api_key = os.getenv("OPENAI_API_KEY")
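For local development, a common companion pattern is a .env file loaded with the python-dotenv package (a sketch, assuming pip install python-dotenv; never commit the .env file):

import os
from dotenv import load_dotenv

load_dotenv()  # copies variables from a local .env file into the environment
api_key = os.getenv("OPENAI_API_KEY")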

5. Enable CORS (for web apps)

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourapp.com"],
    allow_methods=["POST"],
    allow_headers=["*"],
)

Production Checklist

| Item | Why It Matters |
| --- | --- |
| ✅ Authentication | Protect your API |
| ✅ Rate Limiting | Prevent abuse |
| ✅ Health Checks | Monitor uptime |
| ✅ Env Variables | Keep secrets safe |
| ✅ CORS Setup | Enable web access |
| ✅ Logging | Debug problems |
| ✅ Error Handling | Graceful failures |
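The last two rows aren’t covered by the snippets above. One lightweight way to handle both is a single FastAPI middleware that logs each request and turns unexpected exceptions into a clean JSON error (a sketch; the logger name and error message are illustrative):

import logging
from fastapi import Request
from fastapi.responses import JSONResponse

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai-api")

@app.middleware("http")
async def log_and_catch(request: Request, call_next):
    logger.info("%s %s", request.method, request.url.path)
    try:
        return await call_next(request)
    except Exception:
        logger.exception("Unhandled error")  # full traceback in the logs
        return JSONResponse(status_code=500, content={"error": "internal error"})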

🎬 The Complete Example

Here’s a production-ready LangServe app:

import os
from fastapi import FastAPI, Depends, HTTPException
from fastapi.security import APIKeyHeader
from fastapi.middleware.cors import CORSMiddleware
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Setup
app = FastAPI(title="My AI API")

# CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # tighten this to your real domains in production
    allow_methods=["*"],
    allow_headers=["*"],
)

# Auth
api_key = APIKeyHeader(name="X-API-Key")
def check_key(key: str = Depends(api_key)):
    if key != os.getenv("MY_API_KEY"):
        raise HTTPException(403)

# Chain
prompt = ChatPromptTemplate.from_template(
    "You are helpful. Answer: {question}"
)
model = ChatOpenAI()
chain = prompt | model

# Routes
add_routes(
    app, chain,
    path="/chat",
    dependencies=[Depends(check_key)]
)

# Health
@app.get("/health")
def health():
    return {"status": "ok"}

# Run
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

🌟 Key Takeaways

  1. LangServe turns your chain into a web service
  2. add_routes gives you invoke, batch, stream, and playground
  3. RemoteRunnable lets clients use your API like a local chain
  4. Streaming makes AI feel responsive and engaging
  5. Production needs auth, rate limits, and proper config
graph TD A["Build Chain"] --> B["Wrap with LangServe"] B --> C["Deploy to Server"] C --> D["Clients Use RemoteRunnable"] D --> E["Stream Responses"] E --> F["Happy Users! ๐ŸŽ‰"]

🚀 You’re Ready!

You now know how to:

  • Deploy LangChain apps with LangServe
  • Create REST APIs automatically
  • Connect clients with RemoteRunnable
  • Stream responses for better UX
  • Follow production best practices

Your AI is ready to serve the world! 🌍
