PyTorch Loss Functions: Your GPS for Training Neural Networks
The Big Picture: What Are Loss Functions?
Imagine you’re learning to throw darts at a target. Every time you throw, someone tells you how far you missed. That feedback helps you adjust your next throw. Loss functions do exactly this for neural networks!
Loss = The distance between what your model predicted and what the right answer actually was.
The smaller the loss, the closer your model is to being right. Training a neural network is like playing a game where you try to make this number as tiny as possible.
graph TD
    A["🎯 Your Model's Prediction"] --> B["📏 Loss Function"]
    C["✅ Correct Answer"] --> B
    B --> D["📊 Loss Value"]
    D --> E["🔧 Model Adjusts Itself"]
    E --> A
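Here is a minimal sketch of that feedback loop in code. The tiny model, data, and learning rate are invented for illustration; any model and any loss from this guide slot into the same four steps.
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                       # a hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(16, 3)                        # made-up inputs
y = torch.randn(16, 1)                        # made-up "correct answers"

for step in range(100):
    prediction = model(x)                     # 🎯 your model's prediction
    loss = loss_fn(prediction, y)             # 📏 how far off was it?
    optimizer.zero_grad()
    loss.backward()                           # 🔧 work out how to adjust
    optimizer.step()                          # model adjusts itself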
Loss Functions Overview
Think of loss functions as different types of report cards. Some measure how well you did on math problems (regression). Others check if you picked the right category (classification). Each one is designed for a specific type of task.
The Three Big Questions:
- What type of problem am I solving?
  - Predicting a number? → Use Regression Loss
  - Picking a category? → Use Classification Loss
  - Comparing things? → Use Distance/Similarity Loss
- How sensitive should the loss be to mistakes?
  - Some losses punish big mistakes HARD
  - Others are more forgiving
- Do I need something special?
  - Sometimes, you need a custom loss for unique problems
Regression Losses: Measuring “How Far Off?”
Regression is when your model predicts a number (like predicting house prices or temperature).
MSE Loss (Mean Squared Error)
The Story: Imagine measuring how far each dart landed from the bullseye, then squaring that distance. Big misses get MUCH bigger penalties!
import torch
import torch.nn as nn
loss_fn = nn.MSELoss()
prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
loss = loss_fn(prediction, target)
# Calculates: ((2.5-3)² + (0-(-0.5))² + (2-2)²) / 3 ≈ 0.1667
When to use: Standard choice for most regression problems.
Watch out: Very sensitive to outliers (one big mistake can dominate).
MAE Loss (L1 Loss)
The Story: Just measure the actual distance without squaring. More forgiving for big mistakes!
loss_fn = nn.L1Loss()
prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
loss = loss_fn(prediction, target)
# Calculates: (|2.5-3| + |0-(-0.5)| + |2-2|) / 3 ≈ 0.3333
When to use: When your data has outliers (weird extreme values).
HuberLoss: The Best of Both Worlds
The Story: Imagine a referee who scores small mistakes precisely but doesn't blow up over huge blunders. That's HuberLoss!
- For small errors: Acts like MSE (squared penalty)
- For big errors: Acts like MAE (linear penalty)
graph TD A["Error Size?"] --> B{Small Error} A --> C{Big Error} B --> D["📈 Square it - MSE style"] C --> E["📏 Linear - MAE style"] D --> F["Smooth & Precise"] E --> G["Robust to Outliers"]
loss_fn = nn.HuberLoss(delta=1.0)
# delta controls the switch point
# - Below delta: behaves like MSE
# - Above delta: behaves like MAE
When to use: When you want MSE’s precision but MAE’s robustness. Great for noisy data!
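To see the difference, here is a small comparison on made-up data with one wild outlier; the exact numbers are illustrative:
import torch
import torch.nn as nn

pred = torch.tensor([2.0, 3.0, 100.0])        # one huge mistake
target = torch.tensor([2.5, 3.5, 4.0])

print(nn.MSELoss()(pred, target))             # ~3072: dominated by the outlier
print(nn.HuberLoss(delta=1.0)(pred, target))  # ~32: outlier only penalized linearly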
Classification Losses: “Did You Pick the Right One?”
Classification is when your model chooses from categories (like “cat” vs “dog” or spam vs not-spam).
Cross-Entropy Loss
The Story: Your model says “I’m 90% sure this is a cat.” If it’s actually a cat, small penalty. If it’s actually a dog, BIG penalty for being so confident and wrong!
loss_fn = nn.CrossEntropyLoss()
# Model outputs (raw scores, not probabilities)
predictions = torch.tensor([[2.0, 1.0, 0.1]])
# True class (index 0 = first class)
targets = torch.tensor([0])
loss = loss_fn(predictions, targets)
Key insight: Punishes confident wrong answers more than uncertain wrong answers.
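A quick illustration of that insight, with invented scores:
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
wrong_target = torch.tensor([1])              # true class is index 1

confident = torch.tensor([[5.0, 0.0, 0.0]])   # very sure it's class 0
uncertain = torch.tensor([[1.0, 0.8, 0.9]])   # hedging between classes

print(loss_fn(confident, wrong_target))       # ~5.0: confident AND wrong → big penalty
print(loss_fn(uncertain, wrong_target))       # ~1.2: unsure and wrong → milder penalty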
Binary Cross-Entropy (BCE)
The Story: For yes/no questions. Is it spam? Is the patient sick?
loss_fn = nn.BCEWithLogitsLoss()
# Raw model output (before sigmoid)
prediction = torch.tensor([0.8, -0.5, 1.2])
# True labels (0 or 1)
target = torch.tensor([1.0, 0.0, 1.0])
loss = loss_fn(prediction, target)
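BCEWithLogitsLoss applies the sigmoid internally (in a numerically stable way), which is why it takes raw scores. This small check shows it matches doing sigmoid + BCELoss by hand:
import torch
import torch.nn as nn

prediction = torch.tensor([0.8, -0.5, 1.2])   # raw logits
target = torch.tensor([1.0, 0.0, 1.0])

combined = nn.BCEWithLogitsLoss()(prediction, target)
manual = nn.BCELoss()(torch.sigmoid(prediction), target)
print(combined, manual)                       # same value (up to rounding)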
NLL Loss (Negative Log Likelihood)
The Story: Expects log-probabilities as input, so it's almost always paired with LogSoftmax.
m = nn.LogSoftmax(dim=1)
loss_fn = nn.NLLLoss()
input = torch.randn(3, 5) # 3 samples, 5 classes
target = torch.tensor([1, 0, 4])
loss = loss_fn(m(input), target)
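In fact, CrossEntropyLoss is just LogSoftmax followed by NLLLoss rolled into one call, which this quick check confirms:
import torch
import torch.nn as nn

input = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])

nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(input), target)
ce = nn.CrossEntropyLoss()(input, target)
print(nll, ce)                                # identical values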
Specialized Losses: For Special Tasks
Margin Losses (for Ranking)
The Story: “Make sure the correct answer scores higher than wrong answers by at least this much margin.”
# Hinge Embedding Loss - for learning whether pairs are similar or dissimilar
hinge_loss_fn = nn.HingeEmbeddingLoss(margin=1.0)

# Margin Ranking Loss - for ranking tasks
loss_fn = nn.MarginRankingLoss(margin=1.0)
input1 = torch.tensor([1.0, 2.0, 3.0])
input2 = torch.tensor([0.5, 2.5, 2.0])
target = torch.tensor([1, -1, 1])  # 1: input1 should rank higher; -1: input2 should
loss = loss_fn(input1, input2, target)
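Under the hood, MarginRankingLoss computes max(0, -y * (x1 - x2) + margin) for each pair and averages. A manual check for the example above:
import torch

input1 = torch.tensor([1.0, 2.0, 3.0])
input2 = torch.tensor([0.5, 2.5, 2.0])
target = torch.tensor([1.0, -1.0, 1.0])

manual = torch.clamp(-target * (input1 - input2) + 1.0, min=0).mean()
print(manual)   # tensor(0.3333), matching nn.MarginRankingLoss(margin=1.0)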
Triplet Margin Loss
The Story: Given three things - an anchor, something similar, and something different - make sure similar things are closer than different things!
graph LR A["🎯 Anchor"] -->|Should be CLOSE| B["✅ Positive"] A -->|Should be FAR| C["❌ Negative"]
loss_fn = nn.TripletMarginLoss(margin=1.0)
anchor = torch.randn(10, 128) # 10 samples, 128 features
positive = torch.randn(10, 128) # Similar to anchor
negative = torch.randn(10, 128) # Different from anchor
loss = loss_fn(anchor, positive, negative)
When to use: Face recognition, image similarity, recommendation systems.
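In practice, the anchor, positive, and negative are usually embeddings produced by the same network with shared weights; a hedged sketch with a made-up embedding model:
import torch
import torch.nn as nn

embed = nn.Linear(784, 128)                   # hypothetical embedding network
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor = embed(torch.randn(10, 784))          # e.g. a face image
positive = embed(torch.randn(10, 784))        # same person, different photo
negative = embed(torch.randn(10, 784))        # a different person
loss = loss_fn(anchor, positive, negative)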
KL Divergence Loss
The Story: Measures how different two probability distributions are. Like asking “how surprised would I be if I expected distribution A but got distribution B?”
loss_fn = nn.KLDivLoss(reduction='batchmean')
# Log-probabilities from your model
log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
# Target distribution (probabilities)
target_probs = torch.softmax(torch.randn(3, 5), dim=1)
loss = loss_fn(log_probs, target_probs)
When to use: Knowledge distillation, VAEs, comparing distributions.
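A minimal knowledge-distillation sketch, assuming a teacher and a student model whose logits are faked with random tensors here: the student's log-probabilities are pushed toward the teacher's probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_logits = torch.randn(4, 10)           # stand-in for the teacher's output
student_logits = torch.randn(4, 10)           # stand-in for the student's output

kl = nn.KLDivLoss(reduction='batchmean')
loss = kl(F.log_softmax(student_logits, dim=1),   # input must be log-probabilities
          F.softmax(teacher_logits, dim=1))       # target is plain probabilities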
Distance and Similarity Losses
Cosine Embedding Loss
The Story: Are two vectors pointing in the same direction? Useful when direction matters more than magnitude.
loss_fn = nn.CosineEmbeddingLoss()
vec1 = torch.randn(5, 10)
vec2 = torch.randn(5, 10)
# +1 means they should be similar
# -1 means they should be different
label = torch.tensor([1, -1, 1, 1, -1])
loss = loss_fn(vec1, vec2, label)
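To see the "direction over magnitude" point, compare two vectors that point the same way but have different lengths (numbers invented): the loss treats them as already similar.
import torch
import torch.nn as nn

a = torch.tensor([[1.0, 2.0]])
b = torch.tensor([[2.0, 4.0]])                # same direction, twice as long
print(nn.CosineEmbeddingLoss()(a, b, torch.tensor([1])))   # 0.0 - perfectly "similar"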
Pairwise Distance
pdist = nn.PairwiseDistance(p=2) # L2 distance
x1 = torch.tensor([[0., 0.], [1., 1.]])
x2 = torch.tensor([[1., 1.], [2., 2.]])
distances = pdist(x1, x2)
# Result: tensor([1.4142, 1.4142])
Custom Loss Functions: Build Your Own!
Sometimes, pre-built losses don’t fit your problem. Here’s how to create your own!
Method 1: Simple Function
def my_custom_loss(pred, target):
    # Example: weighted MSE with per-element weights
    # (hypothetical weights; must match the shape of pred/target)
    weights = torch.tensor([1.0, 2.0, 3.0])
    squared_diff = (pred - target) ** 2
    return (weights * squared_diff).mean()
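Because it's built from ordinary tensor operations, autograd backpropagates through it with no extra work; you call it exactly like a built-in loss:
import torch

pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([1.5, 2.0, 2.5])
loss = my_custom_loss(pred, target)
loss.backward()                               # gradients flow through the custom loss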
Method 2: Class-Based (More Flexible)
import torch.nn.functional as F

class FocalLoss(nn.Module):
    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, pred, target):
        # Standard cross-entropy per sample, kept unreduced
        ce_loss = F.cross_entropy(pred, target, reduction='none')
        pt = torch.exp(-ce_loss)  # probability assigned to the true class
        focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss
        return focal_loss.mean()
# Usage
loss_fn = FocalLoss(gamma=2.0)
loss = loss_fn(predictions, targets)
When to Create Custom Loss:
- Imbalanced data → Weight classes differently
- Multiple objectives → Combine several losses
- Domain-specific needs → Physics constraints, business rules
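For the first case, many built-in losses already accept a weight argument, so a full custom class may not be needed; a small sketch with invented class weights:
import torch
import torch.nn as nn

# Hypothetical 3-class problem where class 2 is rare: make its mistakes count 5x
class_weights = torch.tensor([1.0, 1.0, 5.0])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)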
# Example: Combining losses
class CombinedLoss(nn.Module):
    def __init__(self, w1=0.5, w2=0.5):
        super().__init__()
        self.mse = nn.MSELoss()
        self.l1 = nn.L1Loss()
        self.w1 = w1
        self.w2 = w2

    def forward(self, pred, target):
        return self.w1 * self.mse(pred, target) + self.w2 * self.l1(pred, target)
Quick Decision Guide
graph LR
    A["What's your task?"] --> B{Regression?}
    A --> C{Classification?}
    A --> D{Similarity/Ranking?}
    B --> E{Clean data?}
    E -->|Yes| F["MSELoss"]
    E -->|Outliers| G["HuberLoss or L1Loss"]
    C --> H{How many classes?}
    H -->|2 classes| I["BCEWithLogitsLoss"]
    H -->|3+ classes| J["CrossEntropyLoss"]
    D --> K{Task type?}
    K -->|Face/Image matching| L["TripletMarginLoss"]
    K -->|Direction similarity| M["CosineEmbeddingLoss"]
    K -->|Ranking items| N["MarginRankingLoss"]
Summary: Your Loss Function Toolbox
| Problem Type | Loss Function | One-Line Description |
|---|---|---|
| Regression | MSELoss | Squares errors, punishes big mistakes |
| Regression | L1Loss/MAE | Absolute errors, robust to outliers |
| Regression | HuberLoss | Hybrid: MSE for small, MAE for big |
| Binary Classification | BCEWithLogitsLoss | Yes/No questions |
| Multi-class | CrossEntropyLoss | Pick one from many |
| Similarity | TripletMarginLoss | Learn embeddings |
| Direction | CosineEmbeddingLoss | Same direction = similar |
| Distribution | KLDivLoss | Compare probabilities |
| Custom | Your own class | When nothing else fits |
You’ve Got This!
Loss functions might seem like just math, but they’re actually your communication tool with the neural network. You’re telling it: “Here’s what matters to me. Here’s how I’ll judge your mistakes.”
Choose wisely, and your model will learn exactly what you want it to learn!
