GPU and Special Hardware in Kubernetes
The Story of the Super Workshop 🏭
Imagine you have a giant workshop with many worker tables (these are your Kubernetes nodes). Most tables are good for regular work: cutting paper, writing, drawing. But sometimes you need special tools: a super-powerful laser cutter, a 3D printer, or a microscope.
In Kubernetes, GPUs and other special hardware are those special tools. Not every table has them, and you need a smart way to:
- Tell Kubernetes which tables have special tools (Node Feature Discovery)
- Let your projects use those tools properly (Device Plugins)
Let's explore this magical workshop!
What is a GPU? 🎮
A GPU (Graphics Processing Unit) is like a super-brain that's really good at doing many small tasks at once.
Simple Example:
- Your regular brain (CPU): Solves one hard math problem at a time
- GPU brain: Solves 1000 easy math problems ALL AT ONCE!
Why Do We Need GPUs?
| Task | CPU (Regular Brain) | GPU (Super Brain) |
|---|---|---|
| Training AI | 🐢 Slow (days) | 🚀 Fast (hours) |
| Video editing | 😴 Sluggish | ⚡ Smooth |
| Scientific math | 🐌 One by one | 🚀 Thousands together |
Device Plugins: The Tool Librarians 📚
The Problem
Kubernetes is smart, but it doesn't automatically know about special hardware. It's like having a librarian who knows about books but not about the 3D printer in the corner.
The Solution: Device Plugins!
A Device Plugin is like a special helper that tells Kubernetes:
"Hey! This node has 2 GPUs ready to use!"
graph TD
  A["🖥️ Node with GPU"] --> B["Device Plugin"]
  B --> C["📢 Tells Kubernetes"]
  C --> D["✅ GPU Available!"]
  D --> E["🚀 Pods Can Use GPU"]
How Device Plugins Work
Step 1: Discovery. The device plugin finds all the GPUs on the node.
Step 2: Registration. It tells the kubelet: "I manage GPUs!"
Step 3: Allocation. When a pod asks for a GPU, the plugin assigns it one.
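You can watch this happen on a real cluster. Once a device plugin registers, its resource shows up in the node's allocatable list (my-gpu-node below is a placeholder name):

# Show the resources the kubelet advertises to the scheduler
kubectl get node my-gpu-node -o jsonpath='{.status.allocatable}'
# Example output (abbreviated): {"cpu":"16","memory":"64Gi","nvidia.com/gpu":"2"}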
Real Example: NVIDIA Device Plugin
This is the most popular GPU plugin. It lets your pods use NVIDIA graphics cards.
# Installing the NVIDIA device plugin (simplified)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin
spec:
  selector:
    matchLabels:
      name: nvidia-plugin
  template:
    metadata:
      labels:
        name: nvidia-plugin
    spec:
      containers:
        - name: nvidia-plugin
          image: nvidia/k8s-device-plugin
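This manifest is a simplified sketch; the official install (used in the practice example below) pins a version and runs in the kube-system namespace. Either way, you can check that one plugin pod is running per node:

kubectl get pods -l name=nvidia-plugin -o wide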
Requesting a GPU in Your Pod
Once the plugin is running, asking for a GPU is easy!
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-app
      image: my-ml-app
      resources:
        limits:
          nvidia.com/gpu: 1
🎯 Key Point: The nvidia.com/gpu: 1 line is like saying "I need 1 special tool from the GPU shelf!"
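Two details worth knowing: for extended resources like nvidia.com/gpu you only set limits (requests default to the same value), and GPUs cannot be fractional or overcommitted. Once the pod is running, you can look at its GPU from the inside, assuming the image ships NVIDIA's tools:

kubectl exec gpu-pod -- nvidia-smi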
Node Feature Discovery: The Detective 🔍
The Problem
Your cluster has 100 nodes. Some have GPUs. Some have fast SSDs. Some have special Intel features. How does Kubernetes know what each node can do?
Enter: Node Feature Discovery (NFD)!
NFD is like a detective that visits every node and creates a detailed report of its special abilities.
graph TD
  A["🔍 NFD Visits Node"] --> B["Checks Hardware"]
  B --> C["Finds: GPU ✅"]
  B --> D["Finds: Fast SSD ✅"]
  B --> E["Finds: Intel AVX ✅"]
  C --> F["🏷️ Adds Labels to Node"]
  D --> F
  E --> F
  F --> G["Scheduler Knows Everything!"]
What NFD Discovers
| Category | Examples |
|---|---|
| CPU | Intel, AMD, number of cores, special instructions |
| Memory | How much RAM, memory speed |
| Storage | SSD, NVMe, rotational drives |
| Network | Speed, SR-IOV capability |
| GPU | NVIDIA, AMD, model, memory |
| Custom | Your own special features! |
NFD Labels: The Name Tags
After NFD runs, your nodes get labels like name tags:
feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
feature.node.kubernetes.io/pci-1234.present=true
feature.node.kubernetes.io/system-os_release.ID=ubuntu
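To see which name tags one of your nodes actually received (my-node is a placeholder):

kubectl label node my-node --list | grep feature.node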
Installing Node Feature Discovery
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nfd-worker
  namespace: node-feature-discovery
spec:
  selector:
    matchLabels:
      app: nfd-worker
  template:
    metadata:
      labels:
        app: nfd-worker
    spec:
      containers:
        - name: nfd-worker
          image: registry.k8s.io/nfd/node-feature-discovery:v0.14.0
          args:
            - "-feature-sources=all"
Using NFD Labels for Scheduling
Now you can tell Kubernetes: "Run this pod ONLY on nodes with GPUs!"
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  nodeSelector:
    feature.node.kubernetes.io/pci-10de.present: "true"
  containers:
    - name: trainer
      image: my-ml-trainer
🧠 Fun Fact: 10de is NVIDIA's PCI vendor ID. NFD found it automatically!
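nodeSelector is a hard, all-or-nothing rule. If you would rather say "prefer GPU nodes, but run anywhere," the same NFD label works with node affinity. A minimal sketch (the pod name is made up for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: ml-training-soft
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: feature.node.kubernetes.io/pci-10de.present
                operator: In
                values: ["true"]
  containers:
    - name: trainer
      image: my-ml-trainer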
How They Work Together 🤝
Device Plugins and NFD are best friends!
graph TD
  A["Node Feature Discovery"] -->|Finds GPUs| B["Adds Labels"]
  B --> C["Scheduler Sees Labels"]
  D["Device Plugin"] -->|Registers GPUs| E["Kubelet Knows Count"]
  E --> F["Pods Can Request GPUs"]
  C --> G["Smart Scheduling!"]
  F --> G
The Complete Flow
- NFD scans the node and adds labels
- Device Plugin tells the kubelet about the GPU count
- You write a pod spec asking for a GPU
- Scheduler finds nodes with the GPU label
- Kubelet allocates an actual GPU to your pod
- Your pod runs with GPU power! 🚀
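And if no node can satisfy the request, the pod simply stays Pending; the scheduler explains why in the pod's events, typically with a message like "0/3 nodes are available: 3 Insufficient nvidia.com/gpu" (exact wording varies by Kubernetes version):

kubectl describe pod gpu-pod | grep -A 5 Events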
Common Device Plugins 📋
| Plugin | Hardware | What It Does |
|---|---|---|
| NVIDIA | GPU | Exposes NVIDIA graphics cards |
| AMD | GPU | Exposes AMD graphics cards |
| Intel | GPU/FPGA | Intel accelerators |
| SR-IOV | Network | Fast network cards |
| RDMA | Network | Ultra-fast networking |
Practice Example: ML Training Setup
Let's set up a cluster for machine learning!
Step 1: Install NFD
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.0"
Step 2: Install NVIDIA Device Plugin
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
Step 3: Check Your Nodes
kubectl get nodes -o json | jq '.items[].metadata.labels' | grep feature
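The jq pipeline shows the NFD labels. To also see how many GPUs each node advertises (the backslash escapes the dots inside the resource name):

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"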
Step 4: Run Your ML Pod
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-training
spec:
  nodeSelector:
    feature.node.kubernetes.io/pci-10de.present: "true"
  containers:
    - name: pytorch
      image: pytorch/pytorch:latest
      resources:
        limits:
          nvidia.com/gpu: 2
      command: ["python", "train.py"]
Key Takeaways 🎯
- Device Plugins = Librarians that manage special hardware
- NFD = Detective that discovers what each node can do
- Labels = Name tags that help scheduling
- Together = Smart placement of GPU workloads!
Remember This Analogy:
- 🏭 Workshop = Cluster
- 🪑 Tables = Nodes
- 🔧 Special Tools = GPUs/Hardware
- 📚 Tool Inventory = Device Plugin
- 🔍 Inspector = Node Feature Discovery
- 🏷️ Labels = What tools each table has
Quick Reference
Request 1 GPU:
resources:
  limits:
    nvidia.com/gpu: 1
Target GPU Nodes:
nodeSelector:
  feature.node.kubernetes.io/pci-10de.present: "true"
Check Available GPUs:
kubectl describe node <node-name> | grep nvidia
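Check Allocated GPUs (the "Allocated resources" section shows what running pods have already claimed; adjust the -A count as needed):
kubectl describe node <node-name> | grep -A 8 "Allocated resources"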
You now understand how Kubernetes finds and uses special hardware! 🎉
The detective (NFD) discovers what's special about each node, and the librarian (Device Plugin) makes sure your pods can use those special tools. Together, they make GPU workloads on Kubernetes magical! ✨
