TensorFlow Serving: Your AI Restaurant Kitchen 🍳
Imagine you’ve baked the most delicious cake ever. Now, hundreds of people want to taste it. You can’t bake a new cake for each person—that would take forever! Instead, you need a super-fast kitchen that can serve slices to everyone quickly.
TensorFlow Serving is that super-fast kitchen for your AI models. Let’s learn how to serve your “AI cake” to the world!
🏠 The Big Picture: From Training to Serving
```mermaid
graph TD
    A["🧠 Train Your Model"] --> B["📦 Save as SavedModel"]
    B --> C["🍽️ Load into TF Serving"]
    C --> D["🌐 People Request via REST/gRPC"]
    D --> E["⚡ Get Predictions Back!"]
```
Think of it like this:
- Training = Learning the recipe
- SavedModel = Writing down the recipe in a special cookbook
- TF Serving = The professional kitchen
- REST/gRPC = Different ways customers can order
📚 Part 1: TF Serving Overview
What Is TensorFlow Serving?
TF Serving is like a super-efficient waiter for your AI models.
Without TF Serving:
- Your model sits on your laptop
- Only YOU can use it
- One person at a time
With TF Serving:
- Your model runs on a server
- ANYONE can use it
- Thousands of people at once!
Real-World Example
Imagine Netflix. When you open the app, it suggests movies for you. Behind the scenes:
- Your request goes to Netflix’s servers
- TF Serving runs a recommendation model
- You get personalized suggestions in milliseconds!
Key Features (Why It’s Special)
| Feature | What It Means |
|---|---|
| Fast | Handles millions of requests |
| Flexible | Serves many models at once |
| Reliable | Never takes a break |
| Smart Updates | Swap models without downtime |
📦 Part 2: SavedModel Signatures
What Is a SavedModel?
A SavedModel is your model packed in a special suitcase. It contains:
- The model’s “brain” (weights)
- The instructions (graph)
- The signature (how to use it)
What Are Signatures?
A signature is like a menu at a restaurant. It tells TF Serving:
- What inputs to expect (your order)
- What outputs to give back (your food)
```mermaid
graph TD
    A["📥 Input: Image"] --> B["🧠 Model"]
    B --> C["📤 Output: Is it a Cat?"]
```
Creating a Signature
When you save your model, you define what goes in and what comes out:
```python
import tensorflow as tf

# Wrap the model's forward pass in a tf.function with a fixed input spec
# (the [None, 3] shape matches the 3-feature example used later in this guide)
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def serve_fn(features):
    return {"predictions": model(features)}

# Save the model with an explicit serving signature
tf.saved_model.save(
    model,
    "my_model",
    signatures={"serving_default": serve_fn},
)
```
Common Signature Types
| Signature Name | What It Does |
|---|---|
| `serving_default` | Main prediction function |
| `classify` | Returns class labels |
| `regress` | Returns numbers |
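Not sure which signatures your SavedModel actually exposes? You can load it back in Python and look. A quick sketch, using the `my_model` directory saved above:

```python
import tensorflow as tf

# Load the SavedModel and list its available signatures
loaded = tf.saved_model.load("my_model")
print(list(loaded.signatures.keys()))       # e.g. ['serving_default']

# Peek at the inputs and outputs of the default signature
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)
```

The `saved_model_cli show` tool that ships with TensorFlow prints the same information from the command line.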
Example: Image Classifier Signature
- Input: a picture (like a photo of a cat)
- Output: probabilities for each class
```python
# Your signature tells the server:
# "Give me an image, and I'll tell you what's in it!"
inputs = {"image": image_tensor}       # a batch of input images
outputs = {"predictions": probs}       # one probability per class
```
⚙️ Part 3: Serving Configuration
What Is Serving Configuration?
Configuration is like giving your waiter instructions:
- Which dishes to serve (models)
- Where to find recipes (model paths)
- How to handle special requests
The Config File
TF Serving uses a simple text file to know what to do:
```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
  }
}
```
Let’s break it down:
| Part | What It Means |
|---|---|
| `name` | What to call your model |
| `base_path` | Where the model lives |
| `model_platform` | Always "tensorflow" |
Serving Multiple Models
Want to serve TWO models? Easy!
```
model_config_list {
  config {
    name: "cat_detector"
    base_path: "/models/cats"
    model_platform: "tensorflow"
  }
  config {
    name: "dog_detector"
    base_path: "/models/dogs"
    model_platform: "tensorflow"
  }
}
```
Now your server can detect cats AND dogs!
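Each model is served under its own URL, named after the `name` field in the config. Here is a quick sketch of calling both; REST requests are covered properly in Part 4, and the payload is hypothetical since real cat/dog detectors would take image data:

```python
import requests

# Each served model gets its own URL under /v1/models/<name>
payload = {"instances": [[0.1, 0.2, 0.3]]}   # placeholder input

cats = requests.post("http://localhost:8501/v1/models/cat_detector:predict", json=payload)
dogs = requests.post("http://localhost:8501/v1/models/dog_detector:predict", json=payload)
print(cats.json())
print(dogs.json())
```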
Version Control
Your model improves over time. TF Serving can serve different versions:
```
/models/my_model/
├── 1/   ← Version 1
├── 2/   ← Version 2 (newer!)
└── 3/   ← Version 3 (latest)
```
By default, TF Serving uses the latest version (highest number).
Starting TF Serving
```bash
tensorflow_model_server \
  --model_config_file=/config.txt \
  --rest_api_port=8501
```
This starts your “AI kitchen” on port 8501!
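Once the server is running, you can check that the model loaded correctly by asking for its status. A minimal sketch using the `requests` package:

```python
import requests

# A healthy model reports at least one version in the "AVAILABLE" state
resp = requests.get("http://localhost:8501/v1/models/my_model")
print(resp.json())
```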
🌐 Part 4: REST and gRPC Serving
Two Ways to Order
Think of REST and gRPC as two different ordering systems:
| REST | gRPC |
|---|---|
| Like ordering by phone | Like using a walkie-talkie |
| Uses JSON (text) | Uses binary (faster) |
| Works everywhere | Needs special setup |
| Easier to debug | Harder to read |
| Slower but simpler | Faster but complex |
REST API: The Simple Way
REST is like texting your order to the restaurant.
Making a Prediction:
```bash
curl -X POST \
  http://localhost:8501/v1/models/my_model:predict \
  -d '{"instances": [[1.0, 2.0, 3.0]]}'
```
What You Get Back:
```json
{
  "predictions": [[0.9, 0.1]]
}
```
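Here is the same prediction made from Python with the `requests` package (a sketch; the payload matches the curl example above):

```python
import requests

# One instance with three features, sent to the predict endpoint
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    json={"instances": [[1.0, 2.0, 3.0]]},
)
print(resp.json()["predictions"])   # e.g. [[0.9, 0.1]]
```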
REST Endpoints (URLs)
TF Serving gives you automatic URLs:
| URL | What It Does |
|---|---|
| `/v1/models/{name}` | Model status |
| `/v1/models/{name}:predict` | Make prediction |
| `/v1/models/{name}:classify` | Classify data |
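You can also pin a request to a specific model version by adding `/versions/{number}` to the URL. A sketch, assuming version 2 from the directory layout in Part 3:

```python
import requests

# Ask version 2 explicitly instead of the latest version
resp = requests.post(
    "http://localhost:8501/v1/models/my_model/versions/2:predict",
    json={"instances": [[1.0, 2.0, 3.0]]},
)
print(resp.json())
```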
gRPC: The Fast Way
gRPC is like having a direct phone line to the kitchen. Faster, but you need special equipment.
```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import (
    predict_pb2,
    prediction_service_pb2_grpc,
)

# Connect to the server's gRPC port
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the request: which model, which signature, and the input tensor
data = [[1.0, 2.0, 3.0]]   # same 3-feature example as the REST call
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
# 'input' must match the input name defined in your model's signature
request.inputs['input'].CopyFrom(
    tf.make_tensor_proto(data, dtype=tf.float32)
)

# Get the prediction!
result = stub.Predict(request)
```
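The response is a `PredictResponse` protocol buffer; `tf.make_ndarray` converts an output tensor into a NumPy array (the `'predictions'` key is assumed to match your signature's output name):

```python
# Turn the returned TensorProto into a NumPy array
predictions = tf.make_ndarray(result.outputs['predictions'])
print(predictions)
```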
When to Use Which?
```mermaid
graph TD
    A{"What do you need?"} --> B["Easy setup?"]
    A --> C["Maximum speed?"]
    B --> D["Use REST"]
    C --> E["Use gRPC"]
```
Use REST when:
- Building a website
- Testing quickly
- Working with JavaScript
Use gRPC when:
- Speed is critical
- Building backend services
- Handling millions of requests
🎯 Putting It All Together
Let’s trace a complete example:
Step 1: Save Your Model
```python
# Train your model first...
model.fit(x_train, y_train)

# Save it for serving
tf.saved_model.save(
    model,
    "/models/my_model/1"
)
```
Step 2: Create Config
```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
  }
}
```
Step 3: Start Server
```bash
tensorflow_model_server \
  --model_config_file=config.txt \
  --rest_api_port=8501 \
  --port=8500
```
Step 4: Make Predictions!
Via REST:
```bash
curl localhost:8501/v1/models/my_model:predict \
  -d '{"instances": [[1, 2, 3]]}'
```
Via gRPC:
```python
result = stub.Predict(request)
print(result.outputs['predictions'])
```
🌟 Key Takeaways
- TF Serving = Professional kitchen for your AI models
- SavedModel = Your model in a special package with clear instructions
- Signatures = The “menu” telling what inputs/outputs your model uses
- Config = Instructions for what models to serve and where to find them
- REST = Simple, text-based API (like texting)
- gRPC = Fast, binary API (like a direct phone line)
🎉 You Did It!
You now understand how to take an AI model from your laptop to serving the whole world!
Think about it: Every time you ask Siri a question, get a Netflix recommendation, or see a spam filter catch bad emails—TF Serving (or something like it) is working behind the scenes.
Now YOU know how to build that same magic! 🚀
