GPU and Device Management in PyTorch
The Story: Your Tensor’s Magical Moving Day 🚚
Imagine your tensor is a tiny worker bee 🐝 living in a big city. The city has different neighborhoods:
- CPU Land – A quiet suburb where everyone can live (your regular computer memory)
- CUDA City – A super-fast downtown with skyscrapers (NVIDIA GPU)
- MPS Village – A cozy Apple neighborhood (Apple Silicon GPU)
- Meta Cloud – A magical place where things exist but take no space (for planning)
Your tensor worker bee can live in any of these places, but to work together, bees must be in the same neighborhood!
1. Tensor Device Placement
What is a “Device”?
A device is simply where your tensor lives in your computer’s memory.
Think of it like choosing which room to put your toys in:
- Some toys go in the living room (CPU)
- Some toys go in the game room (GPU)
Creating Tensors on Specific Devices
```python
import torch

# Lives in CPU Land (default)
cpu_tensor = torch.tensor([1, 2, 3])

# Lives in CUDA City (GPU)
gpu_tensor = torch.tensor([1, 2, 3], device='cuda')

# Lives in MPS Village (Apple GPU)
mps_tensor = torch.tensor([1, 2, 3], device='mps')
```
Checking Where Your Tensor Lives
```python
my_tensor = torch.tensor([1, 2, 3])
print(my_tensor.device)
# Output: cpu

gpu_tensor = my_tensor.to('cuda')
print(gpu_tensor.device)
# Output: cuda:0
```
💡 Simple Rule: The `:0` means "first GPU". If you have two GPUs, they're `cuda:0` and `cuda:1`.
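To see every CUDA neighborhood on your machine, you can loop over the device count. A quick sketch using the same APIs covered later in this guide:

```python
import torch

# Print the index and name of each visible CUDA device
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")
```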
2. Moving Tensors Between Devices
The Golden Rule ⚠️
Tensors must be on the SAME device to work together!
This is like saying: “Two people can only high-five if they’re in the same room.”
```python
# ❌ THIS BREAKS!
cpu_tensor = torch.tensor([1, 2, 3])
gpu_tensor = torch.tensor([4, 5, 6], device='cuda')

# result = cpu_tensor + gpu_tensor
# RuntimeError: Expected all tensors to be on the same device!

# ✅ THIS WORKS!
cpu_tensor = cpu_tensor.to('cuda')
result = cpu_tensor + gpu_tensor
# Both in CUDA City now!
```
Three Ways to Move Your Tensor
```python
my_tensor = torch.tensor([1, 2, 3])

# Method 1: .to() - Most flexible
gpu_tensor = my_tensor.to('cuda')

# Method 2: .cuda() - Quick shortcut
gpu_tensor = my_tensor.cuda()

# Method 3: .cpu() - Back to CPU
back_to_cpu = gpu_tensor.cpu()
```
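`.to()` earns its "most flexible" title because it can change the device and the dtype in one call. A small sketch:

```python
my_tensor = torch.tensor([1, 2, 3])

# Move to the GPU and convert to float16 in a single call
half_on_gpu = my_tensor.to('cuda', torch.float16)
print(half_on_gpu.device, half_on_gpu.dtype)
# cuda:0 torch.float16
```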
The “Copy vs Move” Secret
```python
original = torch.tensor([1, 2, 3])
moved = original.to('cuda')

# original still exists on CPU!
# moved is a NEW copy on GPU!
```
🎯 Key Insight: `.to()` creates a COPY when the device changes. Your original tensor stays where it was. (If the tensor is already on the target device, `.to()` simply returns the original.)
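You can prove the two tensors are independent: editing the GPU copy leaves the CPU original untouched. A minimal sketch:

```python
original = torch.tensor([1, 2, 3])
moved = original.to('cuda')

# Edit the GPU copy...
moved[0] = 99

# ...and the CPU original is unchanged
print(original)  # tensor([1, 2, 3])
print(moved)     # tensor([99,  2,  3], device='cuda:0')
```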
3. CUDA Device Management
Is CUDA Available?
Before moving to GPU, always ask: “Is the GPU home open?”
```python
# Check if CUDA is available
if torch.cuda.is_available():
    print("GPU ready!")
    device = 'cuda'
else:
    print("Using CPU")
    device = 'cpu'
```
How Many GPUs Do I Have?
```python
gpu_count = torch.cuda.device_count()
print(f"You have {gpu_count} GPUs!")
```
Which GPU Am I Using?
```python
# Get current GPU index
current = torch.cuda.current_device()
print(f"Using GPU #{current}")

# Get GPU name
name = torch.cuda.get_device_name(0)
print(f"GPU name: {name}")
```
Memory Management
GPUs have limited memory. Here’s how to check:
```python
# See memory usage (in bytes)
allocated = torch.cuda.memory_allocated()
reserved = torch.cuda.memory_reserved()

print(f"Used: {allocated / 1e9:.2f} GB")
print(f"Reserved: {reserved / 1e9:.2f} GB")

# Clear unused memory
torch.cuda.empty_cache()
```
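To see these numbers move, measure before and after allocating a big tensor. A minimal sketch (a 1000×1000 float32 tensor is 4 bytes × 1,000,000 ≈ 4 MB):

```python
before = torch.cuda.memory_allocated()

big = torch.randn(1000, 1000, device='cuda')

after = torch.cuda.memory_allocated()
print(f"Tensor used ~{(after - before) / 1e6:.1f} MB")
```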
4. Multi-GPU Setup Basics
Choosing a Specific GPU
```python
# Method 1: Specify in .to()
tensor_on_gpu1 = my_tensor.to('cuda:1')

# Method 2: Set default device
torch.cuda.set_device(1)
# New tensors created with plain 'cuda' now go to GPU 1
```
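If you only need to switch GPUs temporarily, `torch.cuda.device` also works as a context manager. A sketch:

```python
# Everything inside the block defaults to GPU 1
with torch.cuda.device(1):
    temp = torch.tensor([1, 2, 3], device='cuda')
    print(temp.device)  # cuda:1
# Outside the block, the previous default is restored
```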
The Device-Agnostic Pattern ✨
Write code that works everywhere:
```python
# Smart device selection
device = torch.device(
    'cuda' if torch.cuda.is_available() else 'cpu'
)

# Create tensor on best available device
my_tensor = torch.tensor([1, 2, 3], device=device)
model = MyModel().to(device)
```
```mermaid
graph TD
    A[Start] --> B{CUDA Available?}
    B -->|Yes| C[Use cuda]
    B -->|No| D{MPS Available?}
    D -->|Yes| E[Use mps]
    D -->|No| F[Use cpu]
```
5. MPS for Apple Silicon
What is MPS?
MPS = Metal Performance Shaders
It’s Apple’s way to use the GPU on M1/M2/M3 chips!
Checking MPS Availability
```python
# Is MPS available?
if torch.backends.mps.is_available():
    device = torch.device('mps')
    print("Using Apple GPU!")
else:
    device = torch.device('cpu')
```
Using MPS
```python
# Create tensor on Apple GPU
mps_tensor = torch.tensor([1, 2, 3], device='mps')

# Move existing tensor
cpu_tensor = torch.tensor([4, 5, 6])
mps_tensor = cpu_tensor.to('mps')
```
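One gotcha worth knowing: `.numpy()` only works on CPU tensors, so hop back before converting MPS (or CUDA) results. A small sketch:

```python
result = torch.tensor([1.0, 2.0, 3.0], device='mps')

# .numpy() requires a CPU tensor, so move back first
as_numpy = result.cpu().numpy()
print(as_numpy)  # [1. 2. 3.]
```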
The Universal Device Selector
```python
def get_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    elif torch.backends.mps.is_available():
        return torch.device('mps')
    else:
        return torch.device('cpu')

device = get_device()
```
6. Meta Device
What is the Meta Device?
The meta device is like a blueprint.
Imagine you want to build a huge LEGO castle:
- Instead of buying all the bricks first…
- You draw a plan showing how big it will be
- Meta tensors are just the plan, not the actual bricks!
Why Use Meta?
- Save Memory: Plan big models without using RAM
- Check Shapes: See if your model fits before building it
Creating Meta Tensors
```python
# Create a "ghost" tensor - shape only!
meta_tensor = torch.empty(1000, 1000, device='meta')

# It has shape but no memory!
print(meta_tensor.shape)   # torch.Size([1000, 1000])
print(meta_tensor.device)  # meta

# But NO actual numbers inside!
```
Practical Example: Testing Model Size
```python
# Test if a huge model fits
with torch.device('meta'):
    huge_model = BigModel()

# Check parameter count without using any real memory!
params = sum(p.numel() for p in huge_model.parameters())
print(f"Model has {params:,} parameters")
```
Quick Reference Table
| Device | When to Use | Check Availability |
|---|---|---|
| `cpu` | Always works | Always available |
| `cuda` | NVIDIA GPU | `torch.cuda.is_available()` |
| `mps` | Apple M-chip | `torch.backends.mps.is_available()` |
| `meta` | Planning only | Always available |
The Complete Smart Device Pattern
```python
import torch

def smart_device():
    """Pick the best available device."""
    if torch.cuda.is_available():
        return 'cuda'
    if torch.backends.mps.is_available():
        return 'mps'
    return 'cpu'

# Use it everywhere!
device = smart_device()
data = torch.randn(100, device=device)
model = MyModel().to(device)
```
Summary: Your Tensor’s Home Address 🏠
```mermaid
graph TD
    T[Your Tensor] --> Q{Where to live?}
    Q --> CPU[CPU: Safe, Slow]
    Q --> CUDA[CUDA: Fast, NVIDIA]
    Q --> MPS[MPS: Fast, Apple]
    Q --> META[Meta: Planning Only]
    CPU --> RULE[Same device = Can work together!]
    CUDA --> RULE
    MPS --> RULE
```
Remember:
- ✅ Check device availability before using
- ✅ Keep tensors on the same device for operations
- ✅ Use `.to(device)` to move tensors
- ✅ Write device-agnostic code for portability
- ✅ Use the meta device for planning big models
You’re now ready to manage your tensors across any device! 🚀