Common Error Scenarios

🔧 Kubernetes Troubleshooting: When Pods Go Wrong

The Story of a Pod Hospital 🏥

Imagine Kubernetes is a hospital for little robot workers called Pods. These robots want to do their jobs—running your apps—but sometimes they get sick! When a robot (Pod) gets sick, it shows error symptoms. Your job? Be the Pod Doctor and heal them!

Let’s meet the six most common sicknesses that happen to Pods:


🔄 CrashLoopBackOff: The Robot That Keeps Falling Down

What’s Happening?

Think of a toy robot that tries to stand up, falls down, tries again, falls down again... over and over. That’s CrashLoopBackOff!

Your Pod starts, the app crashes, Kubernetes restarts the container, and it crashes again. The loop keeps going, and Kubernetes waits longer before each new restart. That growing wait is the “BackOff” in the name.
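To make the growing waits concrete, here is a minimal sketch of the restart schedule. It assumes the default kubelet behavior described in the Kubernetes documentation (start at 10 seconds, double each time, cap at 5 minutes); the function name `restart_delays` is made up for this illustration, and your cluster may be configured differently.

```python
def restart_delays(restarts, initial=10, cap=300):
    """Return the wait in seconds before each of the first `restarts` restarts.

    Assumed defaults: 10 s initial delay, doubling, capped at 300 s.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but never exceed the cap
    return delays


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

So a Pod that has been crash-looping for a while may sit for up to five minutes between attempts, which is why a fix can seem slow to take effect.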

Why Does This Happen?

graph TD
    A["Pod Starts"] --> B["App Crashes"]
    B --> C["Kubernetes Waits"]
    C --> D["Kubernetes Restarts Pod"]
    D --> A
    style A fill:#4ECDC4
    style B fill:#FF6B6B
    style C fill:#FFE66D
    style D fill:#4ECDC4

Common Causes:

  • 🐛 Bug in your code - The app has an error and exits
  • 📩 Missing files - App can’t find something it needs
  • 🔑 Wrong secrets - Database password is incorrect
  • 💾 Can’t connect - Database or service unreachable

How to Fix It

Step 1: Check the logs

kubectl logs pod-name
kubectl logs pod-name --previous

Step 2: Look at events

kubectl describe pod pod-name

Step 3: Find the real error

  • Read the last lines before the crash
  • Fix the bug in your application
  • Make sure all secrets and configs are correct
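If you prefer to pull the key facts out of the Pod status with a script, the sketch below reads the same data that kubectl describe shows. It assumes you have the output of kubectl get pod pod-name -o json loaded as a Python dict; `crash_summary` and the sample Pod are hypothetical and exist only for this illustration.

```python
def crash_summary(pod):
    """Summarise restarts and the last exit code for each container.

    `pod` is the dict form of `kubectl get pod <name> -o json`.
    """
    rows = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting", {})
        last = status.get("lastState", {}).get("terminated", {})
        rows.append({
            "container": status["name"],
            "restarts": status.get("restartCount", 0),
            "waiting_reason": waiting.get("reason"),
            "last_exit_code": last.get("exitCode"),
        })
    return rows


# Hypothetical, trimmed-down Pod status used only as sample input.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "my-app",
                "restartCount": 5,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}},
            }
        ]
    }
}

for row in crash_summary(sample_pod):
    print(row)
```

A non-zero last exit code with a climbing restart count points at the app itself, so the logs from the previous run are the next place to look.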

Real Example

# Pod keeps crashing because
# it can't find DATABASE_URL
containers:
- name: my-app
  env:
  - name: DATABASE_URL
    value: ""  # Empty! That's the problem!

The Fix: Add the correct database URL!


📩 ImagePullBackOff: Can’t Get the Robot Parts

What’s Happening?

Imagine ordering robot parts from a store, but:

  • The store doesn’t exist
  • You gave the wrong address
  • You don’t have permission to buy

ImagePullBackOff means Kubernetes can’t download the container image your Pod needs!

Why Does This Happen?

graph TD
    A["Pod Needs Image"] --> B{Can Find Image?}
    B -->|No| C["ImagePullBackOff"]
    B -->|Yes| D{Has Permission?}
    D -->|No| C
    D -->|Yes| E["Pod Runs!"]
    style C fill:#FF6B6B
    style E fill:#4ECDC4

Common Causes:

  • ❌ Typo in image name - nignx instead of nginx
  • đŸ·ïž Wrong tag - nginx:v999 doesn’t exist
  • 🔐 Private registry - Need login credentials
  • 🌐 Network problems - Can’t reach the registry

How to Fix It

Step 1: Check the image name

kubectl describe pod pod-name | grep Image

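Step 2: Test manually (from any machine with Docker)

docker pull your-image:tag

If this works but the Pod still fails, the cluster’s nodes may lack credentials or network access that your machine has.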

Step 3: Check for typos

# Wrong:
image: nignx:latest
# Correct:
image: nginx:latest
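Typos like nignx are easy to miss by eye. One possible safety net, sketched below, is to compare an image name against a list of names you expect to use before you deploy. This is only an illustration: `KNOWN_IMAGES` and `suggest_image` are hypothetical, and the tag splitting is naive (it does not handle registry addresses that include a port).

```python
import difflib

# Hypothetical list of image names your team actually uses.
KNOWN_IMAGES = ["nginx", "redis", "postgres"]


def suggest_image(image, known):
    """Return the closest known name if `image` looks like a typo, else None."""
    name = image.partition(":")[0]  # naive: drop everything after the first colon
    if name in known:
        return None  # exact match, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest_image("nignx:latest", KNOWN_IMAGES))  # nginx
print(suggest_image("nginx:latest", KNOWN_IMAGES))  # None
```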

Step 4: Add image pull secrets (for private registries)

spec:
  imagePullSecrets:
  - name: my-registry-secret

⏳ Pending Pod Issues: The Robot Waiting in Line

What’s Happening?

Your robot is ready to work but there’s no desk available! The Pod is created but stays in “Pending” state, waiting, waiting, waiting...

Why Does This Happen?

graph TD
    A["Pod Created"] --> B{Resources Available?}
    B -->|No CPU/Memory| C["Pending - No Resources"]
    B -->|No Matching Node| D["Pending - No Node"]
    B -->|PVC Not Ready| E["Pending - Volume Issue"]
    C --> F["Pod Waits..."]
    D --> F
    E --> F
    style F fill:#FFE66D

Common Causes:

  • 💻 Not enough CPU or memory - Cluster is full
  • 🏷️ Node selector mismatch - No node has the required label
  • 💾 Volume not ready - PersistentVolumeClaim pending
  • 🚫 Taints and tolerations - Pod not allowed on available nodes

How to Fix It

Step 1: See why it’s pending

kubectl describe pod pod-name

Look at the Events section at the bottom!

Step 2: Check resources

kubectl describe nodes | grep -A 5 "Allocated"

Step 3: Solutions

For no resources:

# Reduce your requests
resources:
  requests:
    memory: "64Mi"   # Ask for less
    cpu: "100m"

For node selector issues:

# Check available labels
kubectl get nodes --show-labels
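A node selector only matches when a node carries every label the Pod asks for, with exactly the same value. The sketch below shows that rule on made-up data; `nodes_matching` and the node names and labels are hypothetical, standing in for what kubectl get nodes --show-labels would print.

```python
def nodes_matching(node_selector, nodes):
    """Return the names of nodes whose labels satisfy every selector entry."""
    return [
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    ]


# Hypothetical node labels used only as sample input.
nodes = {
    "node-a": {"disktype": "ssd", "kubernetes.io/os": "linux"},
    "node-b": {"disktype": "hdd", "kubernetes.io/os": "linux"},
}

print(nodes_matching({"disktype": "ssd"}, nodes))   # ['node-a']
print(nodes_matching({"disktype": "nvme"}, nodes))  # []
```

An empty result is the situation that keeps a Pod Pending: either relabel a node or correct the selector in the Pod spec.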

💥 OOMKilled: The Robot Ate Too Much Memory

What’s Happening?

Imagine giving a robot a small backpack, but it tries to stuff a giant teddy bear inside. The backpack explodes!

OOMKilled = Out Of Memory Killed. Your app used more memory than its limit allows, so the container was killed. You will usually see exit code 137 alongside it.

Why Does This Happen?

graph TD
    A["App Uses Memory"] --> B{Under Limit?}
    B -->|Yes| C["App Runs Happy"]
    B -->|No - Over Limit| D["OOMKilled!"]
    D --> E["Pod Restarts"]
    style C fill:#4ECDC4
    style D fill:#FF6B6B

Common Causes:

  • 📊 Memory limit too low - App needs more than you allowed
  • 🐛 Memory leak - App keeps using more and more memory
  • 📈 Traffic spike - Sudden load uses extra memory

How to Fix It

Step 1: Confirm the problem

kubectl describe pod pod-name | grep OOMKilled
kubectl get pod pod-name -o yaml | grep -A 7 lastState
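The same check can be scripted against the Pod's JSON. This is a sketch under the assumption that you have loaded the output of kubectl get pod pod-name -o json into a dict; `oom_killed_containers` and the sample Pod are hypothetical.

```python
def oom_killed_containers(pod):
    """Return names of containers whose current or last state is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break  # report each container once
    return names


# Hypothetical, trimmed-down Pod status used only as sample input.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "my-app",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "state": {"running": {}}, "lastState": {}},
        ]
    }
}

print(oom_killed_containers(sample_pod))  # ['my-app']
```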

Step 2: Check current limits

resources:
  limits:
    memory: "128Mi"  # Too small?
  requests:
    memory: "64Mi"

Step 3: Increase memory (if needed)

resources:
  limits:
    memory: "512Mi"  # Give more room
  requests:
    memory: "256Mi"
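When comparing limits, it helps to remember that Mi means mebibytes (1024 × 1024 bytes) while M means megabytes (1,000,000 bytes). The helper below is a simplified sketch for comparing values like the ones above; `to_bytes` is hypothetical and handles only whole numbers with a few common suffixes, not the full Kubernetes quantity format.

```python
UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9,     # decimal suffixes
}


def to_bytes(quantity):
    """Convert a memory quantity such as "128Mi" to a number of bytes."""
    # Try two-letter suffixes first so "Mi" is never mistaken for "M".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * UNITS[suffix]
    return int(quantity)  # no suffix: already plain bytes


print(to_bytes("128Mi"))                      # 134217728
print(to_bytes("512Mi") // to_bytes("128Mi")) # 4
```

Going from 128Mi to 512Mi gives the app four times the room, which is a reasonable first step while you work out whether the real cause is a leak.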

Step 4: Fix memory leaks

  • Profile your application
  • Check for objects that never get cleaned up

🔧 CreateContainerConfigError: Wrong Robot Instructions

What’s Happening?

You’re giving the robot assembly instructions, but some parts are missing or the instructions have errors. The robot can’t even start!

CreateContainerConfigError means Kubernetes can’t configure the container properly before starting it.

Why Does This Happen?

graph TD
    A["Pod Starting"] --> B{Config Valid?}
    B -->|Secret Missing| C["ConfigError"]
    B -->|ConfigMap Missing| C
    B -->|Mount Error| C
    B -->|All Good| D["Container Starts"]
    style C fill:#FF6B6B
    style D fill:#4ECDC4

Common Causes:

  • 🔑 Secret doesn’t exist - Referenced secret not found
  • 📄 ConfigMap missing - Referenced ConfigMap not found
  • 📁 Key not found - Secret/ConfigMap exists but key is wrong
  • 🔒 Wrong permissions - Can’t access the resource

How to Fix It

Step 1: Find the exact error

kubectl describe pod pod-name

Look for messages like:

  • secret "my-secret" not found
  • configmap "my-config" not found

Step 2: Check if resources exist (in the Pod’s namespace)

kubectl get secrets
kubectl get configmaps

Step 3: Create missing resources

# Create a secret
kubectl create secret generic my-secret \
  --from-literal=password=mypassword

# Create a ConfigMap
kubectl create configmap my-config \
  --from-literal=key=value

Step 4: Verify the key names

# Make sure this key actually exists
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: my-secret
      key: password  # Does this key exist?
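The key check can also be done in bulk. The sketch below walks every secretKeyRef in a Pod's containers and reports the ones that cannot resolve. It assumes you have already collected which keys each Secret holds; `missing_secret_keys` and the sample data are hypothetical.

```python
def missing_secret_keys(containers, secrets):
    """Return (container, secret, key) for each secretKeyRef that cannot resolve.

    `containers` is `spec.containers` from the Pod; `secrets` maps each
    existing Secret name to the set of keys it holds.
    """
    problems = []
    for container in containers:
        for env in container.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref is None or ref.get("optional"):
                continue  # plain values and optional refs never block startup
            keys = secrets.get(ref["name"])
            if keys is None or ref["key"] not in keys:
                problems.append((container["name"], ref["name"], ref["key"]))
    return problems


# Hypothetical sample input: the Secret exists but holds a different key.
containers = [
    {
        "name": "my-app",
        "env": [
            {
                "name": "DB_PASSWORD",
                "valueFrom": {"secretKeyRef": {"name": "my-secret", "key": "password"}},
            }
        ],
    }
]
secrets = {"my-secret": {"username"}}

print(missing_secret_keys(containers, secrets))  # [('my-app', 'my-secret', 'password')]
```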

🚪 Pod Stuck Terminating: The Robot Won’t Leave

What’s Happening?

It’s closing time, but one robot refuses to leave the building! You told the Pod to stop, but it’s stuck in “Terminating” state forever.

Why Does This Happen?

graph TD
    A["Delete Pod"] --> B["Send SIGTERM"]
    B --> C{App Responds?}
    C -->|Yes| D["Pod Stops"]
    C -->|No| E["Wait Grace Period"]
    E --> F["Send SIGKILL"]
    F --> G{Still Stuck?}
    G -->|Finalizers| H["Terminating Forever"]
    G -->|Volume Issues| H
    style D fill:#4ECDC4
    style H fill:#FF6B6B

Common Causes:

  • ⏰ App ignores shutdown signal - Doesn’t handle SIGTERM
  • 🔗 Finalizers blocking - Cleanup tasks stuck
  • 💾 Volume unmount issues - Can’t detach storage
  • 🌐 Network problems - Webhook timeouts
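For the first cause, the cure lives in your app: it has to notice SIGTERM and wind down. Here is a minimal sketch of one way to do that in a Python service. The names `serve`, `do_work`, and `main` are hypothetical, and a real app would also finish in-flight requests and close connections before exiting.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the main loop does the actual cleanup."""
    shutdown_requested.set()


def serve(do_work, poll_seconds=1.0):
    """Call do_work() repeatedly until a shutdown has been requested."""
    while not shutdown_requested.is_set():
        do_work()
        # Sleep, but wake up immediately if SIGTERM arrives.
        shutdown_requested.wait(poll_seconds)


def main():
    # Register the handler before starting work (must run in the main thread).
    signal.signal(signal.SIGTERM, handle_sigterm)
    serve(lambda: None)  # placeholder for the app's real work
    print("SIGTERM received, exiting cleanly")
```

Calling main() as the program's entry point lets the Pod stop as soon as Kubernetes asks, instead of waiting out the grace period and being killed.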

How to Fix It

Step 1: Check what’s blocking

kubectl describe pod pod-name
kubectl get pod pod-name -o yaml | grep -A 3 finalizers
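To tell a Pod that is still inside its grace period from one that is truly stuck, you can compare its deletionTimestamp with the current time. The sketch below assumes the timestamp already includes the grace period (which is how the API documentation describes it) and that you have the Pod as a dict from kubectl get pod pod-name -o json. `termination_report`, the sample Pod, and the finalizer name are all hypothetical.

```python
from datetime import datetime, timezone


def termination_report(pod, now=None):
    """Return overdue seconds and finalizers for a Pod that is being deleted.

    Returns None when the Pod has no deletionTimestamp (not being deleted).
    """
    meta = pod.get("metadata", {})
    stamp = meta.get("deletionTimestamp")
    if stamp is None:
        return None
    deadline = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return {
        "overdue_seconds": max(0, int((now - deadline).total_seconds())),
        "finalizers": meta.get("finalizers", []),
    }


# Hypothetical sample input.
sample_pod = {
    "metadata": {
        "deletionTimestamp": "2024-01-01T12:00:30Z",
        "finalizers": ["example.com/cleanup"],
    }
}
check_time = datetime(2024, 1, 1, 12, 5, 30, tzinfo=timezone.utc)

print(termination_report(sample_pod, now=check_time))
# {'overdue_seconds': 300, 'finalizers': ['example.com/cleanup']}
```

A Pod that is minutes overdue and still lists finalizers is waiting on whatever controller owns those finalizers, so that controller is the thing to investigate.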

Step 2: Wait for grace period (default 30 seconds)

Step 3: Force delete (use carefully!)

kubectl delete pod pod-name --grace-period=0 --force

Step 4: Remove finalizers (last resort)

kubectl patch pod pod-name \
  -p '{"metadata":{"finalizers":null}}'

⚠ Warning: Force deleting can leave resources behind. Always try to fix the root cause first!


🎯 Quick Diagnosis Flowchart

graph TD
    A["Pod Not Working"] --> B{What's the Status?}
    B -->|CrashLoopBackOff| C["Check logs for app errors"]
    B -->|ImagePullBackOff| D["Verify image name & auth"]
    B -->|Pending| E["Check resources & node selectors"]
    B -->|OOMKilled| F["Increase memory limits"]
    B -->|CreateContainerConfigError| G["Check secrets & configmaps"]
    B -->|Terminating| H["Check finalizers & volumes"]
    style A fill:#667eea
    style C fill:#4ECDC4
    style D fill:#4ECDC4
    style E fill:#4ECDC4
    style F fill:#4ECDC4
    style G fill:#4ECDC4
    style H fill:#4ECDC4
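The flowchart is really a lookup table, so it can be written as one. The sketch below simply restates the chart; `FIRST_STEP` and `first_step` are hypothetical names, and the fallback advice for unknown statuses is an assumption based on the golden rules in this guide.

```python
FIRST_STEP = {
    "CrashLoopBackOff": "Check logs for app errors",
    "ImagePullBackOff": "Verify image name & auth",
    "Pending": "Check resources & node selectors",
    "OOMKilled": "Increase memory limits",
    "CreateContainerConfigError": "Check secrets & configmaps",
    "Terminating": "Check finalizers & volumes",
}


def first_step(status):
    """Return the first thing to check for a given Pod status."""
    # Fallback for statuses the chart does not cover.
    return FIRST_STEP.get(status, "Run kubectl describe pod and read the Events")


print(first_step("OOMKilled"))  # Increase memory limits
```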

🏆 You’re Now a Pod Doctor!

Remember these golden rules:

  1. Always start with kubectl describe pod - It tells the story
  2. Check logs with kubectl logs - See what your app says
  3. Don’t panic! - Every error has a solution
  4. Learn the patterns - Most issues fall into these 6 categories

You’ve got this! Every Kubernetes expert started by fixing these same errors. Each bug you fix makes you stronger! 💪
