🎯 Automatic Differentiation: PyTorch’s Autograd Core
The Magic Behind Learning Machines
Imagine you’re a detective trying to find hidden treasure. You have a map, but the map only tells you “you’re getting warmer” or “you’re getting colder.” How do you find the treasure? You take small steps in different directions and see which way makes you “warmer.”
That’s exactly what neural networks do! And the tool that tells them “warmer” or “colder”? That’s Autograd – PyTorch’s automatic differentiation engine.
🎨 Our Analogy: The River Flow
Throughout this guide, think of calculations as water flowing through rivers:
- Numbers flow downstream like water
- Gradients (learning signals) flow upstream like salmon swimming back home
- The river system itself is the computational graph
1️⃣ What is Automatic Differentiation?
The Big Idea
When you throw a ball, your brain automatically works out how hard to throw it and at what angle. You don’t solve physics equations – your brain just knows.
Automatic differentiation gives computers this same power. Instead of manually calculating how to adjust parameters, PyTorch figures it out automatically!
Simple Example
```python
import torch

# Create a number we want to learn from
x = torch.tensor(3.0, requires_grad=True)

# Do some math
y = x * x + 2 * x + 1  # y = x² + 2x + 1

# Ask: "How does y change when x changes?"
y.backward()

# The answer!
print(x.grad)  # Output: 8.0
```
What happened?
- When x = 3, the gradient is 2x + 2 = 8
- PyTorch calculated this automatically!
- No manual calculus needed 🎉
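If you want to sanity-check a gradient without touching x.grad, torch.autograd.grad returns it directly. A small sketch of the same calculation:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x + 2 * x + 1

# torch.autograd.grad computes the gradient without writing into x.grad
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)  # tensor(8.)
```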
2️⃣ Computational Graphs
Rivers Have a Map
Every calculation in PyTorch creates an invisible map called a computational graph. Think of it as drawing the rivers where your numbers flow.
```mermaid
graph TD
    A[x = 3] --> B[x × x = 9]
    A --> C[2 × x = 6]
    B --> D[9 + 6 = 15]
    C --> D
    D --> E[15 + 1 = 16]
    style A fill:#e1f5fe
    style E fill:#c8e6c9
```
Why Does This Matter?
The graph remembers every step of your calculation. When it’s time to learn (backpropagation), PyTorch follows the map backward to figure out how to improve.
Example: Building a Graph
```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a + b  # Node 1: addition
d = a * b  # Node 2: multiplication
e = c * d  # Node 3: final result

# Graph built automatically!
e.backward()

print(a.grad)  # How 'a' affects 'e' -> tensor(21.)
print(b.grad)  # How 'b' affects 'e' -> tensor(16.)
```
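You can actually peek at the graph PyTorch built: every non-leaf tensor carries a grad_fn, the backward node that produced it. A quick sketch (the exact printed addresses will differ on your machine):

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a + b
d = a * b
e = c * d

# Each operation records the backward node that produced its output
print(e.grad_fn)                 # <MulBackward0 object at 0x...>
print(c.grad_fn, d.grad_fn)      # <AddBackward0 ...> <MulBackward0 ...>
print(e.grad_fn.next_functions)  # the upstream nodes that e connects to
```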
3️⃣ requires_grad and Gradients
Telling PyTorch: “Watch This!”
Not every number needs gradients. Your input data? No gradients needed. Your learnable weights? Absolutely!
requires_grad=True is like putting a GPS tracker on a number – PyTorch will watch where it goes and remember how to trace back.
The Flag System
| Setting | Meaning |
|---|---|
| requires_grad=True | “Track this! I want to learn from it” |
| requires_grad=False | “Ignore this, just pass through” |
Example: Tracking vs Not Tracking
```python
import torch

# Tracked tensor (learnable)
weights = torch.tensor([1.0, 2.0], requires_grad=True)

# Untracked tensor (input data)
inputs = torch.tensor([3.0, 4.0])

# Only weights will receive gradients
output = (weights * inputs).sum()
output.backward()

print(weights.grad)  # [3.0, 4.0]
print(inputs.grad)   # None (not tracked!)
```
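Tracking is contagious: if any input to an operation is tracked, the result is tracked too. The torch.no_grad() context switches tracking off temporarily, which is handy during inference. A small sketch:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

print((w * x).requires_grad)      # True – one input is tracked, so the result is too
with torch.no_grad():             # temporarily disable tracking
    print((w * x).requires_grad)  # False
```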
4️⃣ Leaf and Non-Leaf Tensors
The Family Tree of Numbers
In our river analogy:
- Leaf tensors = The mountain springs (sources)
- Non-leaf tensors = Points where rivers merge
What’s a Leaf Tensor?
A leaf tensor is one you create directly. It’s the starting point – the “leaves” of your computational family tree.
```python
import torch

# ✅ Leaf tensor (you created it directly)
x = torch.tensor(5.0, requires_grad=True)
print(x.is_leaf)  # True

# ❌ Non-leaf tensor (result of operation)
y = x * 2
print(y.is_leaf)  # False
```
Why Does This Matter?
Only leaf tensors keep their gradients by default!
Non-leaf gradients are thrown away to save memory. If you need them, use retain_grad():
```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * 2        # Non-leaf
y.retain_grad()  # "Keep my gradient!"
z = y * 3
z.backward()

print(x.grad)  # 6.0 (always kept)
print(y.grad)  # 3.0 (kept because we asked)
```
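For contrast, if you skip retain_grad(), the intermediate gradient is simply discarded: reading it gives None, and PyTorch prints a warning to remind you.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * 2          # Non-leaf, gradient not retained
z = y * 3
z.backward()

print(x.grad)      # tensor(6.) – leaf gradient is kept
print(y.grad)      # None – non-leaf gradient was discarded (PyTorch warns here)
```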
5️⃣ Backpropagation Mechanics
Salmon Swimming Upstream
Remember our river analogy? Backpropagation is like salmon swimming upstream – going backward through the river system to reach the mountain springs.
The Chain Rule Made Simple
Imagine a chain of dominoes:
- Push the last one → it knocks the one before it
- Each domino’s “knock strength” multiplies
That’s the chain rule! Gradients multiply as they flow backward.
```mermaid
graph LR
    A[Input x] -->|forward| B[Hidden h]
    B -->|forward| C[Output y]
    C -->|backward| B
    B -->|backward| A
    style C fill:#ffcdd2
    style A fill:#c8e6c9
```
Step-by-Step Example
```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Forward pass: y = (x + 1)²
h = x + 1  # h = 3
y = h * h  # y = 9

# Backward pass
y.backward()

# Gradient flows: dy/dh = 2h = 6
#                 dh/dx = 1
#                 dy/dx = 6 × 1 = 6
print(x.grad)  # 6.0 ✓
```
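You can watch the chain rule at work by keeping the intermediate gradient around with retain_grad() (introduced in the previous section):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
h = x + 1
h.retain_grad()   # keep dy/dh so we can inspect it
y = h * h
y.backward()

print(h.grad)     # tensor(6.)  -> dy/dh = 2h
print(x.grad)     # tensor(6.)  -> dy/dx = dy/dh × dh/dx = 6 × 1
```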
6️⃣ Zeroing Gradients
Cleaning the Whiteboard
Here’s a common trap for beginners: PyTorch accumulates gradients!
Each time you call .backward(), gradients ADD to existing ones. It’s like writing on a whiteboard without erasing – things get messy fast.
The Problem
```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First calculation
y1 = x * 2
y1.backward()
print(x.grad)  # 2.0 ✓

# Second calculation (OOPS!)
y2 = x * 3
y2.backward()
print(x.grad)  # 5.0 (2 + 3, accumulated!)
```
The Solution: Zero Your Gradients!
```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First calculation
y1 = x * 2
y1.backward()
print(x.grad)  # 2.0

# CLEAN THE WHITEBOARD
x.grad.zero_()

# Second calculation (correct!)
y2 = x * 3
y2.backward()
print(x.grad)  # 3.0 ✓
```
In Neural Network Training
```python
# Standard training loop pattern
optimizer.zero_grad()  # Erase old gradients
loss.backward()        # Calculate new gradients
optimizer.step()       # Update weights
```
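Here is a minimal runnable sketch of that pattern, using a made-up toy setup (a tiny linear model and one random batch) just to show where each call goes:

```python
import torch
import torch.nn as nn

# Toy setup (made up for illustration): tiny model, random data
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

for step in range(3):
    optimizer.zero_grad()                     # erase old gradients
    loss = criterion(model(inputs), targets)  # forward pass
    loss.backward()                           # fill .grad on every parameter
    optimizer.step()                          # update weights from .grad
    print(f"step {step}: loss = {loss.item():.4f}")
```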
7️⃣ Gradient Accumulation
Sometimes You WANT to Accumulate!
Wait – if accumulation is usually bad, why does PyTorch do it?
Answer: Sometimes you NEED it! Especially when your GPU can’t fit a big batch.
The Mini-Batch Trick
Instead of processing 32 images at once (too big for the GPU), process four mini-batches of 8 images and accumulate their gradients before updating!
```mermaid
graph TD
    A[Batch 1: 8 images] -->|grad| E[Accumulated Gradient]
    B[Batch 2: 8 images] -->|grad| E
    C[Batch 3: 8 images] -->|grad| E
    D[Batch 4: 8 images] -->|grad| E
    E -->|update| F[Update Weights Once]
    style E fill:#fff9c4
    style F fill:#c8e6c9
```
Example: Accumulating Over Mini-Batches
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data (made up for illustration) so the loop runs end to end
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
dataloader = DataLoader(TensorDataset(torch.randn(32, 10), torch.randn(32, 1)),
                        batch_size=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4

optimizer.zero_grad()  # Start fresh
for i, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss = loss / accumulation_steps  # Scale so the sum matches one big batch

    # Gradients accumulate automatically!
    loss.backward()

    # Update only every 4 mini-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # Reset for the next cycle
```
🎯 Quick Summary
| Concept | What It Does | Key Point |
|---|---|---|
| Automatic Differentiation | Calculates gradients automatically | No manual calculus needed |
| Computational Graph | Records operations | Built during forward pass |
| requires_grad | Flags tensors for tracking | Only tracked tensors get gradients |
| Leaf Tensors | User-created tensors | Keep gradients by default |
| Backpropagation | Flows gradients backward | Uses chain rule |
| Zeroing Gradients | Clears accumulated gradients | Essential before each backward |
| Gradient Accumulation | Sums gradients across batches | Useful for large effective batches |
🏆 You Did It!
You now understand the heart of PyTorch’s learning engine!
Think of it this way:
- 🏔️ Your parameters are mountain springs (leaf tensors)
- 🌊 Forward pass = Water flowing downhill (computation)
- 🐟 Backward pass = Salmon swimming upstream (gradients)
- 🧹 Zero gradients = Cleaning up for the next journey
Every time you train a neural network, this beautiful dance happens automatically. PyTorch builds the graph, traces the path, and tells each parameter exactly how to improve.
Now go build something amazing! 🚀