Kubernetes Disruption Management: Keeping Your Pods Safe & Sound 🛡️

The Playground Supervisor Story

Imagine you’re running a busy playground with kids playing on swings, slides, and seesaws. You’re the supervisor. Sometimes, you need to move kids around:

  • The slide needs fixing → You gently ask kids to move to the swings (voluntary)
  • A sudden storm comes → You rush everyone inside immediately (involuntary)
  • You have a rule: At least 3 kids must always be playing outside (disruption budget)

Kubernetes works exactly the same way with your pods!


1. Eviction and Node Pressure

What is Node Pressure?

Think of a node like a toy box. It can only hold so many toys (pods). When the box gets too full, something has to come out!

Node pressure happens when:

  • 📦 Memory is running low (too many big toys)
  • 💾 Disk space is filling up (no room for new toys)
  • 🔢 Too many processes (too many toys making noise)

What is Eviction?

Eviction = Kubernetes politely (or not so politely) removing a pod from a node.

When a node comes under “pressure,” the kubelet (the agent running on that node) starts evicting pods so the node itself survives. It’s like your toy box pushing out some toys to avoid breaking!
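How much pressure is “too much” is set per node in the kubelet configuration. The sketch below shows what such a file might look like; the threshold values are illustrative assumptions, not recommendations, and your cluster’s defaults may differ.

```yaml
# Sketch of a kubelet configuration file (values are made up for illustration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: crossing one triggers eviction with no grace period.
evictionHard:
  memory.available: "500Mi"   # memory pressure
  nodefs.available: "10%"     # disk pressure on the node filesystem
  pid.available: "5%"         # too many processes
# Soft thresholds: must stay crossed for the matching grace period
# below before the kubelet starts evicting.
evictionSoft:
  memory.available: "1Gi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
```

Hard thresholds are the “not so polite” case: pods are removed immediately. Soft thresholds give the node a little time to recover first.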

graph TD
    A["Node Running Smoothly"] --> B{Memory Running Low?}
    B -->|Yes| C["Node Under Pressure!"]
    B -->|No| A
    C --> D["Start Evicting Pods"]
    D --> E["Pick Low-Priority Pods First"]
    E --> F["Remove Pod from Node"]
    F --> G["Node Recovers"]

Real Example

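Your node has 8GB of memory. Pods are using 7.5GB, and one of them keeps asking for more.

What happens?

  1. Free memory drops below the kubelet’s eviction threshold, and the node reports memory pressure
  2. The kubelet looks for pods to evict
  3. It ranks them: pods using more memory than they requested go first, then lower-priority pods before higher-priority ones
  4. It evicts pods until free memory is back above the threshold
  5. The remaining pods keep running, and controllers such as Deployments recreate the evicted pods on other nodes

Note: Node-pressure eviction reacts to what running pods actually use. It does not make room for new pods: a pod that doesn’t fit is placed on another node or stays Pending.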

Simple Rule: Kubernetes always tries to keep nodes healthy, even if it means removing some pods.
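Since priority feeds into the eviction order, one way to influence which pods go first is a PriorityClass. The following is a minimal sketch; the class name, value, and image are hypothetical placeholders chosen for illustration.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: important-app          # hypothetical name
value: 100000                  # higher value = evicted later
globalDefault: false
description: "Pods that should outlast lower-priority pods under node pressure."
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  priorityClassName: important-app   # refers to the class above
  containers:
    - name: my-app
      image: registry.example.com/my-app:1.0   # placeholder image
```

Priority is only one input. A high-priority pod that uses far more memory than it requested can still be picked before a well-behaved low-priority one.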


2. Pod Eviction and Disruption

Types of Pod Eviction

There are two ways pods get evicted:

| Type | Cause | Speed |
|---|---|---|
| API-Initiated | Someone asked Kubernetes to remove it | Graceful |
| Node-Pressure | Node is struggling | Can be sudden |

The Graceful Goodbye

When a pod is evicted gracefully:

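  1. ⏰ Pod gets a warning: its containers receive SIGTERM and the termination grace period starts
  2. 📤 Pod finishes what it’s doing
  3. 💾 Pod saves its work
  4. 👋 Pod shuts down cleanly (anything still running when the grace period ends is killed with SIGKILL)

Default grace period: 30 seconds (set per pod with terminationGracePeriodSeconds)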
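If 30 seconds isn’t enough for your app to say goodbye, the grace period can be changed in the pod spec. Here is a minimal sketch; the 60-second value, the sleep command, and the image name are assumptions for illustration only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  # Total time the pod gets to shut down before it is force-killed
  terminationGracePeriodSeconds: 60
  containers:
    - name: my-app
      image: registry.example.com/my-app:1.0   # placeholder image
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is sent; its run time counts
            # against the grace period above
            command: ["/bin/sh", "-c", "sleep 5"]
```

A short preStop pause like this is often used to let in-flight requests drain, but whether your app needs one depends on how it handles SIGTERM.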

Disruption = Anything That Stops Your Pod

Disruption is the fancy word for “something stopped my pod from running.”

Think of it like this:

  • Your swing stopped → That’s a disruption
  • You chose to stop → Voluntary disruption
  • Someone pushed you off → Involuntary disruption

3. Voluntary vs Involuntary Disruptions

Voluntary Disruptions (Planned Events)

You (or your team) caused this on purpose.

| Example | Why It Happens |
|---|---|
| kubectl drain <node> | Preparing node for maintenance |
| Deleting a deployment | You don’t need those pods anymore |
| Updating pod template | Rolling out new version |
| Node upgrade | Installing new Kubernetes version |
| Cluster scaling down | Saving costs |

Key Point: You have control. You can plan for these!

Involuntary Disruptions (Unplanned Events)

The universe happened to your pods.

| Example | Why It Happens |
|---|---|
| Hardware failure | Server died |
| Kernel panic | Operating system crashed |
| Node disappears | Network issues |
| Out of memory | Node pressure eviction |
| Cloud provider issues | VM deleted by accident |

Key Point: You can’t prevent these, but you CAN prepare for them!

graph TD
    A["Disruption Happens"] --> B{Was it planned?}
    B -->|Yes| C["Voluntary"]
    B -->|No| D["Involuntary"]
    C --> E["You control timing"]
    C --> F["Can use PDB"]
    D --> G["Unexpected event"]
    D --> H["Must be resilient"]

Why Does This Matter?

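Voluntary disruptions that go through the Eviction API respect your rules.

When you drain a node, Kubernetes:

  • ✅ Checks your Pod Disruption Budgets before each eviction
  • ✅ Refuses evictions that would break a budget, so the drain retries until replacement pods are ready
  • ✅ Gives every evicted pod its graceful shutdown period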
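Under the hood, a drain asks for each pod to be evicted through the Eviction API. The request is a tiny object sent to the pod’s eviction subresource; it is shown here as YAML, and the pod name is a hypothetical placeholder.

```yaml
# Body of an eviction request for one pod
apiVersion: policy/v1
kind: Eviction
metadata:
  name: my-app-7d4b9     # hypothetical pod name
  namespace: default
```

If evicting that pod would break a disruption budget, the API server answers with 429 Too Many Requests and the pod stays where it is until a later attempt succeeds.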

Involuntary disruptions don’t care about rules.

When a node dies:

  • ❌ No warning
  • ❌ No graceful shutdown
  • ❌ Pods just disappear

4. Pod Disruption Budgets (PDB)

The Safety Rule

Remember our playground rule? “At least 3 kids must always be playing.”

Pod Disruption Budget = Your rule for how many pods must stay running during voluntary disruptions.

How to Create a PDB

You have two ways to set your safety rule:

Option 1: Minimum Available

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

“At least 2 pods must always be running”

Option 2: Maximum Unavailable

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

“At most 1 pod can be down at a time”

PDB in Action

Scenario: You have 5 pods. PDB says minAvailable: 3.

| Action | Allowed? |
|---|---|
| Evict 1 pod (4 remain) | ✅ Yes |
| Evict 2 pods (3 remain) | ✅ Yes |
| Evict 3 pods (2 remain) | ❌ No! Below minimum |

graph TD
    A["5 Pods Running"] --> B["Try to evict 1"]
    B --> C{3+ pods remain?}
    C -->|Yes, 4 remain| D["✅ Eviction Allowed"]
    D --> E["4 Pods Running"]
    E --> F["Try to evict 1 more"]
    F --> G{3+ pods remain?}
    G -->|Yes, exactly 3| H["✅ Eviction Allowed"]
    H --> I["3 Pods Running"]
    I --> J["Try to evict 1 more"]
    J --> K{3+ pods remain?}
    K -->|No, only 2| L["❌ Eviction Blocked"]
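Kubernetes does this arithmetic for you and records it in the budget’s status. With 5 healthy pods and minAvailable: 3, the budget works out to 5 − 3 = 2 allowed disruptions. The excerpt below is a sketch of what the reported object might contain for this scenario, not output captured from a real cluster.

```yaml
# Excerpt of a PodDisruptionBudget for the 5-pod scenario
spec:
  minAvailable: 3
status:
  expectedPods: 5         # pods matched by the selector
  currentHealthy: 5       # pods that are ready right now
  desiredHealthy: 3       # the minimum the budget demands
  disruptionsAllowed: 2   # currentHealthy - desiredHealthy
```

Once disruptionsAllowed reaches 0, further evictions are refused until more pods become healthy.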

When PDB Doesn’t Help

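Important: PDB only works for voluntary disruptions that go through the Eviction API!

| Disruption Type | PDB Protects? |
|---|---|
| kubectl drain | ✅ Yes |
| Node upgrade (when it drains nodes) | ✅ Yes |
| Deployment update | ❌ No (paced by the Deployment’s own rollout settings) |
| Node crash | ❌ No |
| Out of memory | ❌ No |
| Hardware failure | ❌ No |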

For involuntary disruptions: Use multiple replicas spread across nodes!


Putting It All Together

The Complete Picture

graph TD
    A["Your Pods Running"] --> B{Disruption Type?}
    B -->|Voluntary| C["Check PDB"]
    B -->|Involuntary| D["No Protection"]
    C --> E{Budget Allows?}
    E -->|Yes| F["Pod Evicted Gracefully"]
    E -->|No| G["Eviction Blocked"]
    D --> H["Pod Dies Immediately"]
    F --> I["New Pod Scheduled"]
    H --> I
    G --> J["Wait & Retry"]

Best Practices Checklist

1. Always Set PDBs for Important Apps

  • Web servers: minAvailable: 2
  • Databases: maxUnavailable: 1
  • Critical services: At least 50% available
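The “at least 50%” rule can be written as a percentage, so it keeps working as the app scales. A minimal sketch, with a hypothetical name and label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb     # hypothetical name
spec:
  minAvailable: "50%"            # percentage of the matched pods
  selector:
    matchLabels:
      app: critical-service      # hypothetical label
```

When the percentage doesn’t land on a whole number, Kubernetes rounds up: with 7 pods, 50% means 4 must stay available.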

2. Spread Pods Across Nodes

  • Use podAntiAffinity
  • Don’t put all eggs in one basket
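Here is one way the anti-affinity rule might look in a Deployment. This is a sketch that assumes the app: my-app label used earlier; the replica count and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" = try hard to separate replicas, but still
          # schedule them if there aren't enough nodes
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: kubernetes.io/hostname   # one node = one basket
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0      # placeholder image
```

The “preferred” form is a soft rule. The “required” variant is stricter, but it can leave pods Pending when the cluster has fewer nodes than replicas.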

3. Set Resource Requests

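  • Helps Kubernetes make smart scheduling and eviction choices (pods using more than they requested are evicted first)
  • Makes unexpected memory pressure less likely, because the scheduler won’t overpack a node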
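Requests and limits are set per container. A minimal sketch follows; the numbers are assumptions for illustration and should come from measuring your own app.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: registry.example.com/my-app:1.0   # placeholder image
      resources:
        requests:            # what the scheduler reserves on the node
          memory: "256Mi"
          cpu: "250m"
        limits:              # the most the container may use
          memory: "512Mi"
```

A pod that stays within its memory request is among the last candidates when the node has to evict something.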

4. Plan for the Worst

  • Assume nodes will die
  • Test with chaos engineering
  • Monitor pod disruptions

Summary: The Key Takeaways

| Concept | Simple Definition |
|---|---|
| Node Pressure | Node running out of resources (memory/disk) |
| Eviction | Removing a pod from a node |
| Voluntary Disruption | Planned pod removal (you control it) |
| Involuntary Disruption | Unplanned pod removal (accidents happen) |
| PDB | Your rule for minimum running pods |

Remember This

PDBs are your seatbelt for voluntary disruptions.

They won’t save you in a crash (involuntary), but they’ll protect you during normal driving (planned maintenance).


You Did It! 🎉

You now understand:

  • ✅ Why nodes evict pods (pressure!)
  • ✅ How pod eviction works (graceful or sudden)
  • ✅ The difference between voluntary and involuntary disruptions
  • ✅ How to protect your apps with Pod Disruption Budgets

Next time someone says “We need to drain that node,” you’ll know exactly what happens to your pods!
