Kubernetes Cluster Operations: Your Mission Control Center 🚀
Imagine you’re the captain of a massive spaceship fleet. Each spaceship is a node, and you need to keep them flying together in perfect harmony. That’s exactly what cluster operations is all about!
The Big Picture: What is Cluster Operations?
Think of a Kubernetes cluster like a team of robots working together in a factory. Sometimes you need to:
- Add new robots (nodes) to the team
- Send robots for repairs (maintenance)
- Upgrade the robots’ brains (software updates)
- Make backup copies of the factory’s memory (etcd backup)
kubeadm is your magic remote control that makes all this happen!
🔧 Kubeadm Installation
What is kubeadm?
kubeadm is like the instruction manual + toolbox that helps you build a Kubernetes cluster from scratch. It’s the official way to set things up!
Installing kubeadm (The Recipe)
Before you can use kubeadm, you need three tools:
- kubelet - The worker that runs on every node
- kubeadm - The builder tool
- kubectl - Your command line remote control
# Step 1: Add Kubernetes repo
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Step 2: Add to sources
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Step 3: Install the trio
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
Pro tip: Always disable swap on your nodes! By default, the kubelet refuses to start while swap is enabled.
sudo swapoff -a
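Note that `swapoff -a` only lasts until the next reboot. A common way to make it permanent is to comment out the swap entries in `/etc/fstab` (a sketch; the exact entry varies by distro):

```shell
# Disable swap now...
sudo swapoff -a
# ...and comment out any swap entries in /etc/fstab so the
# setting survives reboots.
sudo sed -i '/\sswap\s/ s/^#*/#/' /etc/fstab
```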
🎬 Kubeadm Init and Join
The Two Magic Words
kubeadm init = “Start a new cluster! I’m the boss node (control plane)!”
kubeadm join = “Hey boss, can I join your team?”
Starting Your First Cluster (init)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
When this finishes, you’ll see a special token. Save it! It’s like a secret password for other nodes to join.
graph TD
  A["Run kubeadm init"] --> B["Control Plane Created"]
  B --> C["Get Join Token"]
  C --> D["Share Token with Workers"]
  D --> E["Workers Run kubeadm join"]
  E --> F["Cluster Ready!"]
Joining Worker Nodes
On each worker node, run the join command you got:
sudo kubeadm join 192.168.1.100:6443 \
--token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:abc123...
Lost your token? No worries! Create a new one:
kubeadm token create --print-join-command
⚙️ Kubeadm Configuration
Custom Settings for Your Cluster
Instead of typing long commands, you can write a config file - like a recipe card for your cluster!
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
controlPlaneEndpoint: "cluster.example.com:6443"
Use it like this:
sudo kubeadm init --config=cluster-config.yaml
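Not sure which fields are available? kubeadm can generate a default config for you to edit, rather than writing one from scratch (the filename here is just an example):

```shell
# Print a default ClusterConfiguration as a starting point
kubeadm config print init-defaults > cluster-config.yaml
```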
Common Configuration Options
| Setting | What it does |
|---|---|
| `kubernetesVersion` | Which Kubernetes version to use |
| `podSubnet` | IP range for pods |
| `serviceSubnet` | IP range for services |
| `controlPlaneEndpoint` | Address of your control plane |
⬆️ Cluster Upgrades
Why Upgrade?
Just like updating apps on your phone, Kubernetes gets better over time. New features, security fixes, and performance improvements!
The Golden Rule
Always upgrade one minor version at a time!
- ✅ 1.29 → 1.30 (Good!)
- ❌ 1.28 → 1.30 (Too big a jump!)
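Before planning the jump, check what you're currently running (assumes a working kubeconfig on the node):

```shell
# The VERSION column shows the kubelet version on each node
kubectl get nodes
# Version of the kubeadm binary itself
kubeadm version -o short
```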
Upgrade Steps
graph TD
  A["Upgrade kubeadm"] --> B["Upgrade Control Plane"]
  B --> C["Drain Workers One by One"]
  C --> D["Upgrade kubelet on Workers"]
  D --> E["Uncordon Workers"]
  E --> F["Verify Everything Works"]
Step 1: Upgrade kubeadm on control plane
sudo apt-get update
sudo apt-get install -y kubeadm='1.30.0-*'
Step 2: Plan and apply the upgrade
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.30.0
Step 3: Upgrade kubelet and kubectl
sudo apt-get install -y kubelet='1.30.0-*' kubectl='1.30.0-*'
sudo systemctl daemon-reload
sudo systemctl restart kubelet
🔧 Node Maintenance
Taking Care of Your Nodes
Sometimes nodes need a break - hardware fixes, OS updates, or troubleshooting. Here’s how to do it safely!
The Three States of a Node
graph TD
  A["Normal Node"] -->|kubectl cordon| B["Cordoned - No New Pods"]
  B -->|kubectl drain| C["Drained - Empty Node"]
  C -->|Do Maintenance| D["Maintenance Done"]
  D -->|kubectl uncordon| A
🚧 Draining and Cordoning Nodes
Cordoning: “No New Guests Please!”
Think of it like putting up a “No Vacancy” sign at a hotel. Current guests stay, but no new ones come in.
kubectl cordon node-worker-1
Check it:
kubectl get nodes
# You'll see SchedulingDisabled
Draining: “Everybody Out!”
This is like evacuating the hotel. All pods move to other nodes.
kubectl drain node-worker-1 \
--ignore-daemonsets \
--delete-emptydir-data
Why --ignore-daemonsets? DaemonSets are special pods that run on every node. They’ll come back after maintenance anyway!
Uncordoning: “Welcome Back!”
After maintenance, open the doors again:
kubectl uncordon node-worker-1
Real Example: Kernel Update
# 1. Cordon the node
kubectl cordon node-worker-1
# 2. Drain all pods
kubectl drain node-worker-1 \
--ignore-daemonsets \
--delete-emptydir-data
# 3. SSH to node, do the update
ssh node-worker-1
sudo apt-get update && sudo apt-get upgrade -y
sudo reboot
# 4. After reboot, uncordon
kubectl uncordon node-worker-1
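After uncordoning, it's worth a quick check that the node is back and taking work again:

```shell
# STATUS should be Ready, with no SchedulingDisabled marker
kubectl get nodes node-worker-1
# Watch pods land on the node as the scheduler starts using it again
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node-worker-1
```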
💾 etcd Backup and Restore
What is etcd?
etcd is the brain of your cluster - it remembers everything! All your deployments, secrets, configs - everything lives here.
If etcd dies and you have no backup, you lose your entire cluster!
Creating a Backup
ETCDCTL_API=3 etcdctl snapshot save \
/backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Verify your backup:
ETCDCTL_API=3 etcdctl snapshot status \
/backup/etcd-snapshot.db --write-out=table
Restoring from Backup
When disaster strikes, here’s how to bring it back:
# 1. Stop etcd (on kubeadm clusters etcd runs as a static pod,
#    so there's no systemd service - move the manifest away instead)
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
# 2. Restore the snapshot into a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore \
/backup/etcd-snapshot.db \
--data-dir=/var/lib/etcd-restored
# 3. Point etcd at the new directory
# (edit the etcd-data hostPath in /tmp/etcd.yaml to /var/lib/etcd-restored)
# 4. Restart etcd by moving the manifest back
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
Backup Schedule Tip
Set up a cron job for automatic backups:
# Every 6 hours, backup etcd
0 */6 * * * /usr/local/bin/backup-etcd.sh
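The `backup-etcd.sh` script itself might look something like this - a minimal sketch, where the backup directory, cert paths, and 10-snapshot retention policy are all assumptions to adapt:

```shell
#!/bin/bash
# Sketch of an etcd backup script - adjust paths and certs for your cluster.
set -euo pipefail

BACKUP_DIR=/backup
SNAPSHOT="$BACKUP_DIR/etcd-snapshot-$(date +%F-%H%M).db"

mkdir -p "$BACKUP_DIR"

ETCDCTL_API=3 etcdctl snapshot save "$SNAPSHOT" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Keep only the 10 most recent snapshots
ls -1t "$BACKUP_DIR"/etcd-snapshot-*.db | tail -n +11 | xargs -r rm -f
```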
🌐 etcd Cluster Management
Why Multiple etcd Nodes?
One etcd is risky - if it fails, game over! With 3 or 5 etcd members, your cluster survives failures.
The magic number: Always use odd numbers (3, 5, 7)
Why? They vote on decisions. Odd numbers prevent ties!
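The arithmetic behind this: a cluster needs a majority (quorum) of n/2 + 1 members alive to keep working. A quick shell loop makes the trade-off obvious:

```shell
# Quorum = n/2 + 1 (integer division); fault tolerance = n - quorum.
# Note that 4 members tolerate no more failures than 3 do -
# even sizes add risk without adding safety.
for n in 1 3 4 5 7; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerates=$(( n - quorum )) failure(s)"
done
```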
Checking Cluster Health
ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://etcd1:2379,https://etcd2:2379,https://etcd3:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Adding a New etcd Member
# 1. Add member
ETCDCTL_API=3 etcdctl member add etcd-new \
--peer-urls=https://192.168.1.104:2380
# 2. Start etcd on new node with --initial-cluster-state=existing
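For step 2, the new member needs the full peer list and the flag telling it to join an existing cluster rather than bootstrap a new one. A sketch of those flags (the member names and IPs are examples matching the command above; `etcdctl member add` prints the exact values to use):

```shell
# Start etcd on the new node; --initial-cluster-state existing tells it
# to join the running cluster instead of starting a fresh one.
etcd --name etcd-new \
  --initial-advertise-peer-urls https://192.168.1.104:2380 \
  --listen-peer-urls https://192.168.1.104:2380 \
  --advertise-client-urls https://192.168.1.104:2379 \
  --listen-client-urls https://192.168.1.104:2379,https://127.0.0.1:2379 \
  --initial-cluster etcd1=https://192.168.1.101:2380,etcd2=https://192.168.1.102:2380,etcd3=https://192.168.1.103:2380,etcd-new=https://192.168.1.104:2380 \
  --initial-cluster-state existing
```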
Removing a Failed Member
# 1. List members to get ID
ETCDCTL_API=3 etcdctl member list
# 2. Remove by ID
ETCDCTL_API=3 etcdctl member remove abc123def456
graph TD
  A["3-Node etcd Cluster"] --> B{Node Fails?}
  B -->|1 node fails| C["Cluster Still Works!"]
  B -->|2 nodes fail| D["Cluster STOPS - Lost Quorum"]
  C --> E["Replace Failed Node"]
  E --> A
🎯 Quick Reference Commands
| Task | Command |
|---|---|
| Initialize cluster | kubeadm init |
| Join cluster | kubeadm join <endpoint> --token <token> |
| Create join token | kubeadm token create --print-join-command |
| Upgrade cluster | kubeadm upgrade apply v1.30.0 |
| Cordon node | kubectl cordon <node> |
| Drain node | kubectl drain <node> --ignore-daemonsets |
| Uncordon node | kubectl uncordon <node> |
| Backup etcd | etcdctl snapshot save <file> |
| Restore etcd | etcdctl snapshot restore <file> |
| Check etcd health | etcdctl endpoint health |
🌟 You’ve Got This!
Running a Kubernetes cluster is like being a conductor of an orchestra. Each node is a musician, kubeadm is your baton, and etcd is your sheet music.
Remember:
- 🔧 kubeadm builds and manages your cluster
- 🚀 init starts it, join grows it
- ⬆️ Upgrades go one step at a time
- 🚧 Drain before maintenance, uncordon after
- 💾 Backup etcd - your cluster’s memory!
Now you’re ready to orchestrate at scale! 🎵
