Docker Monitoring & Health: Keeping Your Containers Alive and Happy 🏥
The Story of the Container Hospital
Imagine you’re running a hospital for tiny robot workers (containers). Each robot does a specific job—one serves food, another cleans, another answers phones. But how do you know if they’re healthy? What if one falls sick? How do you find out what went wrong?
That’s exactly what monitoring and health checks are about in Docker! We’re going to learn how to:
- Listen to what our containers are saying (logging)
- Watch them from a control room (monitoring)
- Give them regular health checkups (healthchecks)
1. Logging Best Practices 📝
What is Logging?
Think of logs like a diary your container writes. Every time something happens—a visitor arrives, a task completes, an error occurs—the container writes it down.
The Golden Rules of Container Logging
Rule 1: Write to STDOUT and STDERR
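Your container should talk to the screen, not write to hidden files! Docker captures both of those streams, and that captured output is exactly what the docker logs command shows.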
# ✅ Good: App prints to console
CMD ["node", "app.js"]
# The app inside does:
# console.log("User logged in")
# console.error("Database connection failed")
Rule 2: Use JSON Format
Write logs like a neat list, not messy paragraphs:
{
  "time": "2024-01-15T10:30:00",
  "level": "info",
  "message": "User logged in",
  "userId": 123
}
Rule 3: Include Context
Always answer: Who? What? When? Where?
{
  "timestamp": "2024-01-15T10:30:00Z",
  "service": "auth-service",
  "container_id": "abc123",
  "message": "Login successful",
  "user": "alice"
}
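The three rules above can be combined in a few lines of code. Here is a minimal sketch, assuming a Python service (the original app is Node, so treat this as an illustration of the idea rather than the app's real logger). The function names format_log and log are made up for this example, and the field names simply mirror the JSON shown above.

```python
import json
import sys
from datetime import datetime, timezone


def format_log(level, message, service="auth-service", now=None, **context):
    """Build one JSON log line with the who/what/when/where fields."""
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "service": service,
        "message": message,
    }
    record.update(context)  # extra context such as user="alice"
    return json.dumps(record)


def log(level, message, **context):
    """Errors go to STDERR, everything else goes to STDOUT (Rule 1)."""
    stream = sys.stderr if level == "error" else sys.stdout
    print(format_log(level, message, **context), file=stream, flush=True)


log("info", "Login successful", user="alice")
log("error", "Database connection failed")
```

Each call prints exactly one JSON object per line, which is the shape most log collectors expect.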
Quick Commands to See Logs
# See all logs from a container
docker logs my-container
# Follow logs live (like watching TV)
docker logs -f my-container
# See last 50 lines
docker logs --tail 50 my-container
# See logs with timestamps
docker logs -t my-container
2. Centralized Logging Setup 🎯
The Problem
Imagine having 100 robot workers, each writing their own diary in different rooms. Finding what went wrong becomes a nightmare!
The Solution: One Central Library
graph TD
    A["Container 1"] -->|sends logs| D["Central Log Server"]
    B["Container 2"] -->|sends logs| D
    C["Container 3"] -->|sends logs| D
    D -->|search & analyze| E["You!"]
Setting Up with Docker Logging Drivers
Tell Docker to send logs somewhere central:
# Send logs to a syslog server
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=tcp://logs.example.com:514 \
  my-app
Popular Centralized Solutions
| Tool | Think of it as… |
|---|---|
| ELK Stack | A giant searchable library |
| Fluentd | A mail carrier collecting logs |
| Loki | A simple notebook system |
Docker Compose Example
version: '3.8'
services:
  web:
    image: nginx
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "localhost:24224"
        tag: "web.nginx"
3. Container Monitoring Overview 📊
What is Monitoring?
Logging tells you what happened. Monitoring tells you how things are right now.
It’s like the difference between:
- Logging: Reading yesterday’s diary
- Monitoring: Looking at a live dashboard
What Do We Monitor?
graph TD
    A["Container Metrics"] --> B["CPU Usage"]
    A --> C["Memory Usage"]
    A --> D["Network Traffic"]
    A --> E["Disk I/O"]
    A --> F["Container Count"]
Quick Monitoring with Docker
# See live stats (like a health monitor)
docker stats
# Output shows:
# CONTAINER   CPU %   MEM USAGE   NET I/O      BLOCK I/O
# web-app     2.5%    150MB       10KB/5KB     0B/0B
# database    15.0%   512MB       1MB/500KB    10MB/5MB
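The live table is great for humans, but a script needs something easier to read. One option is docker stats --no-stream --format '{{json .}}', which prints one JSON object per container. Here is a small sketch that flags busy containers from that output. It assumes the Name and CPUPerc fields Docker emits in that format; the sample values are made up, and busy_containers is a hypothetical helper name.

```python
import json


def busy_containers(stats_lines, cpu_limit=10.0):
    """Return names of containers whose CPU % is above cpu_limit."""
    busy = []
    for line in stats_lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        cpu = float(row["CPUPerc"].rstrip("%"))  # "15.00%" -> 15.0
        if cpu > cpu_limit:
            busy.append(row["Name"])
    return busy


# Sample lines shaped like the docker stats JSON output (made-up values).
sample = [
    '{"Name": "web-app", "CPUPerc": "2.50%", "MemUsage": "150MiB / 1GiB"}',
    '{"Name": "database", "CPUPerc": "15.00%", "MemUsage": "512MiB / 1GiB"}',
]
print(busy_containers(sample))
```

In a real script you could pass sys.stdin as stats_lines and pipe the docker stats command into it.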
The Monitoring Stack
| Component | Job |
|---|---|
| Prometheus | Collects and stores metrics |
| cAdvisor | Watches containers specifically |
| Grafana | Shows pretty dashboards |
| Alertmanager | Sends you warnings |
4. Prometheus Integration 🔥
What is Prometheus?
Prometheus is like a reporter that visits your containers regularly, asks “How are you?”, and writes down the answers.
How Prometheus Works
graph TD
    A["Prometheus"] -->|scrapes every 15s| B["Container 1"]
    A -->|scrapes every 15s| C["Container 2"]
    A -->|scrapes every 15s| D["cAdvisor"]
    A -->|stores| E["Time Series Database"]
    E -->|displays| F["Grafana Dashboard"]
Setting Up Prometheus
prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app:9090']
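The my-app target only works if the app answers the reporter's question. By default Prometheus requests the /metrics path and expects plain text in its exposition format. Here is a rough sketch of what that text looks like for simple counters. The metric names are hypothetical, and a real app would normally use an official Prometheus client library instead of building the text by hand.

```python
def render_metrics(counters):
    """Render {name: (help_text, value)} as Prometheus text exposition lines."""
    lines = []
    for name, (help_text, value) in sorted(counters.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counters for the my-app job.
print(render_metrics({
    "myapp_logins_total": ("Successful logins.", 42),
    "myapp_errors_total": ("Errors returned to users.", 3),
}), end="")
```

Serving this string from an HTTP handler on port 9090 would give Prometheus something to scrape every 15 seconds.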
Running Prometheus in Docker
# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
Basic Prometheus Queries
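# How much CPU is my container using? (CPU cores, averaged over 5 minutes)
# The raw metric is a counter that only grows, so wrap it in rate()
rate(container_cpu_usage_seconds_total{name="web"}[5m])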
# How much memory?
container_memory_usage_bytes{name="web"}
# How many containers is cAdvisor tracking?
# (the name filter skips entries that are not named containers)
count(container_last_seen{name!=""})
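One detail is worth a worked example: container_cpu_usage_seconds_total is a counter of CPU seconds consumed since the container started, so it only ever grows. A usage figure comes from how fast it grows between two scrapes, which is the calculation the PromQL rate() function performs. The sketch below is a simplified version of that arithmetic (it ignores counter resets and the extrapolation Prometheus does), and the sample numbers are made up.

```python
def cpu_cores_used(sample_old, sample_new):
    """Average CPU cores used between two (unix_time, cpu_seconds_total) samples."""
    (t0, v0), (t1, v1) = sample_old, sample_new
    if t1 <= t0:
        raise ValueError("samples must be in time order")
    return (v1 - v0) / (t1 - t0)


# Two made-up scrapes 15 seconds apart: the counter grew by 3 CPU-seconds.
usage = cpu_cores_used((1000.0, 120.0), (1015.0, 123.0))
print(f"{usage:.2f} cores, {usage * 100:.0f}% of one core")
```

Three CPU-seconds spent over fifteen wall-clock seconds means the container kept one core busy about a fifth of the time.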
5. cAdvisor (Container Advisor) 🔍
What is cAdvisor?
cAdvisor is like a fitness tracker for containers. It watches each container and reports:
- How hard is it working? (CPU)
- How much memory is it using?
- Is it talking to the network?
Running cAdvisor
docker run -d \
  --name=cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor
What cAdvisor Shows
Visit http://localhost:8080 to see:
| Metric | What It Means |
|---|---|
| CPU | How busy the container is |
| Memory | RAM being used |
| Network | Data in and out |
| Filesystem | Disk usage |
cAdvisor + Prometheus
cAdvisor automatically exposes metrics in Prometheus format at /metrics on its web port (8080 here), so Prometheus only needs a scrape target:
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
6. Docker Healthchecks 💓
What is a Healthcheck?
A healthcheck is like a doctor’s visit for your container. Docker asks: “Are you okay?” and the container answers yes or no.
Adding Healthcheck to Dockerfile
FROM nginx:alpine
# Check every 30 seconds if nginx is responding
HEALTHCHECK --interval=30s \
    --timeout=10s \
    --start-period=5s \
    --retries=3 \
    CMD curl -f http://localhost/ || exit 1
Healthcheck in Docker Compose
version: '3.8'
services:
  web:
    image: nginx
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s
Types of Health Tests
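# A Dockerfile only uses its last HEALTHCHECK, so pick one of these.
# HTTP check (web servers)
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
# TCP check (databases)
HEALTHCHECK CMD nc -z localhost 5432
# Custom script
HEALTHCHECK CMD /app/health-check.sh
# Command check
HEALTHCHECK CMD ["pg_isready", "-U", "postgres"]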
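Not every image ships with curl or nc, so the custom script route is sometimes the practical one. Here is a minimal sketch of a TCP check, assuming an image that has Python available. It does roughly what nc -z does: try to open a connection and report success or failure. The names tcp_port_open and health_exit_code are made up for this example.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def health_exit_code(host="localhost", port=5432):
    """Exit code 0 tells Docker "healthy", 1 tells it "unhealthy"."""
    return 0 if tcp_port_open(host, port) else 1


print("exit code would be:", health_exit_code())
```

A real check script would finish with sys.exit(health_exit_code()) so Docker sees the result, and the Dockerfile would call it with something like HEALTHCHECK CMD python /app/health_check.py (a hypothetical path).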
7. Health Status and Intervals ⏱️
The Three Health States
graph LR
    A["starting"] -->|passes check| B["healthy"]
    A -->|fails check| C["unhealthy"]
    B -->|fails check| C
    C -->|passes check| B
| Status | What It Means |
|---|---|
| starting | Container just started and has not passed a check yet |
| healthy | All good! Container is working |
| unhealthy | Something’s wrong! |
Understanding Intervals
HEALTHCHECK --interval=30s \
    --timeout=10s \
    --start-period=5s \
    --retries=3 \
    CMD curl -f http://localhost/
| Setting | Meaning | Example |
|---|---|---|
| interval | How often to check | Every 30 seconds |
| timeout | How long to wait for answer | Give up after 10 seconds |
| start-period | Grace period after start | Failures in the first 5 seconds don’t count toward retries |
| retries | Consecutive failures before “unhealthy” | 3 fails in a row = unhealthy |
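How these settings interact is easier to see by replaying a few checks. The sketch below is a simplified model of the rules, not Docker's actual code: it assumes each check starts exactly one interval after the previous one, ignores how long a check takes, and ignores the timeout. The function name simulate_health is made up.

```python
def simulate_health(results, interval=30, start_period=5, retries=3):
    """Replay check results and return the status after each check.

    results: list of booleans, True = check passed. Check i is assumed to
    start (i + 1) * interval seconds after the container starts.
    """
    status, streak, history = "starting", 0, []
    for i, passed in enumerate(results):
        elapsed = (i + 1) * interval
        if passed:
            status, streak = "healthy", 0  # any pass resets the streak
        elif status == "starting" and elapsed < start_period:
            pass  # failure inside the grace period is not counted
        else:
            streak += 1
            if streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history


print(simulate_health([True, False, False, False, True]))
# → ['healthy', 'healthy', 'healthy', 'unhealthy', 'healthy']
print(simulate_health([False, False, False], interval=5, start_period=12))
# → ['starting', 'starting', 'starting']
```

In the first run the container stays healthy through two failures and only flips on the third in a row. In the second run the two failures inside the 12-second grace period are forgiven, so only one failure has been counted by the end.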
Checking Health Status
# See health status
docker ps
# Shows: STATUS column with (healthy) or (unhealthy)
# Detailed health info
docker inspect --format='{{.State.Health.Status}}' my-container
# See health check history
docker inspect --format='{{json .State.Health}}' my-container | jq .
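The health history comes back as JSON with a Status, a FailingStreak counter, and a Log list holding the most recent check results. If you want a script to read it instead of your eyes, a sketch like this could work. The sample data is made up (with timestamps left out for brevity), and summarize_health is a hypothetical helper name.

```python
import json


def summarize_health(health_json):
    """Summarize the JSON printed by docker inspect for .State.Health."""
    health = json.loads(health_json)
    log = health.get("Log") or []
    failed = [entry for entry in log if entry["ExitCode"] != 0]
    return {
        "status": health["Status"],
        "failing_streak": health["FailingStreak"],
        "checks_recorded": len(log),
        "last_failure_output": failed[-1]["Output"].strip() if failed else None,
    }


# Made-up sample in the shape Docker reports.
sample = """
{"Status": "healthy", "FailingStreak": 1, "Log": [
  {"ExitCode": 0, "Output": "ok"},
  {"ExitCode": 1, "Output": "curl: (7) Failed to connect\\n"}
]}
"""
print(summarize_health(sample))
```

Notice that the sample container is still healthy even though its latest check failed: one failure is below the retries limit.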
Using Health in Compose Dependencies
version: '3.8'
services:
  db:
    image: postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  web:
    image: my-web-app
    depends_on:
      db:
        condition: service_healthy
This means: “Don’t start web until db is healthy!”
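Behind the scenes this is a waiting game: keep asking for the db status until it says healthy, and give up if it turns unhealthy. The sketch below shows that idea only; it is not how Compose is implemented. The status source is passed in as a function so the example runs without Docker, and wait_until_healthy is a made-up name.

```python
import time


def wait_until_healthy(get_status, timeout=60.0, poll_every=1.0, sleep=time.sleep):
    """Poll get_status() until it returns "healthy"; False on failure or timeout."""
    waited = 0.0
    while waited <= timeout:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False  # no point starting the dependent service
        sleep(poll_every)
        waited += poll_every
    return False


# Fake status source standing in for the db container (no Docker needed).
statuses = iter(["starting", "starting", "healthy"])
print(wait_until_healthy(lambda: next(statuses), sleep=lambda seconds: None))
# → True
```

With a real container, get_status could run the docker inspect command for .State.Health.Status and return its output.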
Quick Summary 🎯
| Topic | Think of it as… |
|---|---|
| Logging | Container’s diary |
| Centralized Logging | One library for all diaries |
| Monitoring | Live health dashboard |
| Prometheus | The data collector |
| cAdvisor | Container fitness tracker |
| Healthcheck | Doctor’s visit |
| Health Status | Starting → Healthy → Unhealthy |
Your Container Health Journey 🚀
You now know how to:
- ✅ Make containers write good logs
- ✅ Collect all logs in one place
- ✅ Watch containers in real-time
- ✅ Set up Prometheus for metrics
- ✅ Use cAdvisor for container stats
- ✅ Add healthchecks to containers
- ✅ Understand health status and timing
Your containers are no longer mysterious black boxes. You can see inside them, hear what they’re saying, and know when they need help!
Remember: A well-monitored container is a happy container! 🐳💚
