Autograd Core

🎯 Automatic Differentiation: PyTorch’s Autograd Core

The Magic Behind Learning Machines

Imagine you’re a detective trying to find hidden treasure. You have a map, but the map only tells you “you’re getting warmer” or “you’re getting colder.” How do you find the treasure? You take small steps in different directions and see which way makes you “warmer.”

That’s exactly what neural networks do! And the tool that tells them “warmer” or “colder”? That’s Autograd – PyTorch’s automatic differentiation engine.


🎨 Our Analogy: The River Flow

Throughout this guide, think of calculations as water flowing through rivers:

  • Numbers flow downstream like water
  • Gradients (learning signals) flow upstream like salmon swimming back home
  • The river system itself is the computational graph

1️⃣ What is Automatic Differentiation?

The Big Idea

When you throw a ball, your brain automatically works out how hard to throw it and at what angle. You don’t solve physics equations – your brain just knows.

Automatic differentiation gives computers this same power. Instead of manually calculating how to adjust parameters, PyTorch figures it out automatically!

Simple Example

import torch

# Create a number we want to learn from
x = torch.tensor(3.0, requires_grad=True)

# Do some math
y = x * x + 2 * x + 1  # y = x² + 2x + 1

# Ask: "How does y change when x changes?"
y.backward()

# The answer!
print(x.grad)  # Output: 8.0

What happened?

  • When x = 3, the gradient is 2x + 2 = 8
  • PyTorch calculated this automatically!
  • No manual calculus needed 🎉
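
If you want to convince yourself the number is right, a quick numerical sanity check works too. This is just a sketch using a centered finite difference in plain Python – the function f below simply restates y = x² + 2x + 1:

def f(x):
    return x * x + 2 * x + 1

# Centered finite difference approximates the derivative at x = 3
h = 1e-6
numeric = (f(3.0 + h) - f(3.0 - h)) / (2 * h)
print(numeric)  # ≈ 8.0, matching x.grad from autograd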

2️⃣ Computational Graphs

Rivers Have a Map

Every calculation in PyTorch creates an invisible map called a computational graph. Think of it as drawing the rivers where your numbers flow.

graph TD
    A[x = 3] --> B[x × x = 9]
    A --> C[2 × x = 6]
    B --> D[9 + 6 = 15]
    C --> D
    D --> E[15 + 1 = 16]
    style A fill:#e1f5fe
    style E fill:#c8e6c9

Why Does This Matter?

The graph remembers every step of your calculation. When it’s time to learn (backpropagation), PyTorch follows the map backward to figure out how to improve.
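
You can actually peek at this map: every tensor produced by an operation carries a grad_fn attribute pointing to the graph node that created it. A small sketch (the exact object names printed may vary between PyTorch versions):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x + 2 * x + 1

# The last operation that produced y was an addition
print(y.grad_fn)                 # e.g. <AddBackward0 object at ...>

# The nodes that feed into it – the rest of the river map
print(y.grad_fn.next_functions)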

Example: Building a Graph

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a + b        # Node 1: addition
d = a * b        # Node 2: multiplication
e = c * d        # Node 3: final result

# Graph built automatically!
e.backward()

print(a.grad)  # tensor(21.): how 'a' affects 'e'
print(b.grad)  # tensor(16.): how 'b' affects 'e'

3️⃣ requires_grad and Gradients

Telling PyTorch: “Watch This!”

Not every number needs gradients. Your input data? No gradients needed. Your learnable weights? Absolutely!

requires_grad=True is like putting a GPS tracker on a number – PyTorch will watch where it goes and remember how to trace back.

The Flag System

Setting                 Meaning
requires_grad=True      “Track this! I want to learn from it”
requires_grad=False     “Ignore this, just pass through”
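
You don’t have to decide at creation time, either. If you realize later that a tensor should be tracked, the in-place method requires_grad_() flips the flag on an existing tensor (a small sketch):

import torch

data = torch.tensor([3.0, 4.0])
print(data.requires_grad)   # False: plain data, not tracked

data.requires_grad_(True)   # start tracking from here on
print(data.requires_grad)   # True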

Example: Tracking vs Not Tracking

import torch

# Tracked tensor (learnable)
weights = torch.tensor([1.0, 2.0],
                       requires_grad=True)

# Untracked tensor (input data)
inputs = torch.tensor([3.0, 4.0])

# Only weights will receive gradients
output = (weights * inputs).sum()
output.backward()

print(weights.grad)  # [3.0, 4.0]
print(inputs.grad)   # None (not tracked!)

4️⃣ Leaf and Non-Leaf Tensors

The Family Tree of Numbers

In our river analogy:

  • Leaf tensors = The mountain springs (sources)
  • Non-leaf tensors = Points where rivers merge

What’s a Leaf Tensor?

A leaf tensor is one you create directly. It’s the starting point – the “leaves” of your computational family tree.

import torch

# ✅ Leaf tensor (you created it directly)
x = torch.tensor(5.0, requires_grad=True)
print(x.is_leaf)  # True

# ❌ Non-leaf tensor (result of operation)
y = x * 2
print(y.is_leaf)  # False

Why Does This Matter?

Only leaf tensors keep their gradients by default!

Non-leaf gradients are thrown away to save memory. If you need them, use retain_grad():

x = torch.tensor(3.0, requires_grad=True)
y = x * 2  # Non-leaf

y.retain_grad()  # "Keep my gradient!"
z = y * 3
z.backward()

print(x.grad)  # 6.0 (always kept)
print(y.grad)  # 3.0 (kept because we asked)

5️⃣ Backpropagation Mechanics

Salmon Swimming Upstream

Remember our river analogy? Backpropagation is like salmon swimming upstream – going backward through the river system to reach the mountain springs.

The Chain Rule Made Simple

Imagine a chain of dominoes:

  • Push the last one → it knocks the one before it
  • Each domino’s “knock strength” multiplies

That’s the chain rule! Gradients multiply as they flow backward.

graph LR
    A[Input x] -->|forward| B[Hidden h]
    B -->|forward| C[Output y]
    C -->|backward| B
    B -->|backward| A
    style C fill:#ffcdd2
    style A fill:#c8e6c9

Step-by-Step Example

import torch

x = torch.tensor(2.0, requires_grad=True)

# Forward pass: y = (x + 1)²
h = x + 1        # h = 3
y = h * h        # y = 9

# Backward pass
y.backward()

# Gradient flows: dy/dh = 2h = 6
#                dh/dx = 1
#                dy/dx = 6 × 1 = 6
print(x.grad)  # 6.0 ✓
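
If you prefer not to touch .grad at all, torch.autograd.grad computes the same derivative and simply hands it back. A minimal sketch of the same example:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = (x + 1) ** 2

# Returns a tuple of gradients, one per input tensor
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)  # tensor(6.)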

6️⃣ Zeroing Gradients

Cleaning the Whiteboard

Here’s a common trap for beginners: PyTorch accumulates gradients!

Each time you call .backward(), gradients ADD to existing ones. It’s like writing on a whiteboard without erasing – things get messy fast.

The Problem

import torch

x = torch.tensor(3.0, requires_grad=True)

# First calculation
y1 = x * 2
y1.backward()
print(x.grad)  # 2.0 ✓

# Second calculation (OOPS!)
y2 = x * 3
y2.backward()
print(x.grad)  # 5.0 (2 + 3, accumulated!)

The Solution: Zero Your Gradients!

import torch

x = torch.tensor(3.0, requires_grad=True)

# First calculation
y1 = x * 2
y1.backward()
print(x.grad)  # 2.0

# CLEAN THE WHITEBOARD
x.grad.zero_()

# Second calculation (correct!)
y2 = x * 3
y2.backward()
print(x.grad)  # 3.0 ✓

In Neural Network Training

# Standard training loop pattern
optimizer.zero_grad()  # Erase old gradients
loss.backward()        # Calculate new gradients
optimizer.step()       # Update weights
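
Here is a minimal, self-contained sketch of where that pattern sits in a real loop. The tiny linear model, loss, and random data are placeholders just to make it runnable:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

inputs = torch.randn(16, 4)              # placeholder data
targets = torch.randn(16, 1)

for epoch in range(5):
    optimizer.zero_grad()                      # erase old gradients
    loss = criterion(model(inputs), targets)   # forward pass
    loss.backward()                            # calculate new gradients
    optimizer.step()                           # update weights
    print(f"epoch {epoch}: loss = {loss.item():.4f}")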

7️⃣ Gradient Accumulation

Sometimes You WANT to Accumulate!

Wait – if accumulation is usually bad, why does PyTorch do it?

Answer: Sometimes you NEED it! Especially when your GPU can’t fit a big batch.

The Mini-Batch Trick

Instead of processing 32 images at once (too big for GPU), process 8 images 4 times and accumulate!

graph TD
    A[Batch 1: 8 images] -->|grad| E[Accumulated Gradient]
    B[Batch 2: 8 images] -->|grad| E
    C[Batch 3: 8 images] -->|grad| E
    D[Batch 4: 8 images] -->|grad| E
    E -->|update| F[Update Weights Once]
    style E fill:#fff9c4
    style F fill:#c8e6c9

Example: Accumulating Over Mini-Batches

import torch
import torch.nn as nn

# Stand-in model, loss, and data so the loop runs end to end
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
dataloader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4

optimizer.zero_grad()  # Start fresh

for i, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Gradients accumulate automatically across backward() calls
    # (dividing loss by accumulation_steps here would average them,
    #  which is common practice)
    loss.backward()

    # Update only every 4 mini-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # Reset for next cycle

🎯 Quick Summary

Concept                      What It Does                          Key Point
Automatic Differentiation    Calculates gradients automatically    No manual calculus needed
Computational Graph          Records operations                    Built during forward pass
requires_grad                Flags tensors for tracking            Only tracked tensors get gradients
Leaf Tensors                 User-created tensors                  Keep gradients by default
Backpropagation              Flows gradients backward              Uses chain rule
Zeroing Gradients            Clears accumulated gradients          Essential before each backward
Gradient Accumulation        Sums gradients across batches         Useful for large effective batches

🏆 You Did It!

You now understand the heart of PyTorch’s learning engine!

Think of it this way:

  • 🏔️ Your parameters are mountain springs (leaf tensors)
  • 🌊 Forward pass = Water flowing downhill (computation)
  • 🐟 Backward pass = Salmon swimming upstream (gradients)
  • 🧹 Zero gradients = Cleaning up for the next journey

Every time you train a neural network, this beautiful dance happens automatically. PyTorch builds the graph, traces the path, and tells each parameter exactly how to improve.

Now go build something amazing! 🚀
