Loss Functions


PyTorch Loss Functions: Your GPS for Training Neural Networks

The Big Picture: What Are Loss Functions?

Imagine you’re learning to throw darts at a target. Every time you throw, someone tells you how far you missed. That feedback helps you adjust your next throw. Loss functions do exactly this for neural networks!

Loss = The distance between what your model predicted and what the right answer actually was.

The smaller the loss, the closer your model is to being right. Training a neural network is like playing a game where you try to make this number as tiny as possible.

(Diagram: your model's prediction and the correct answer feed into the loss function, which produces a loss value; the model adjusts itself based on that value, and the cycle repeats.)
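Here is that feedback loop in code form. This is only a minimal sketch with a made-up one-layer model, toy data, and a placeholder learning rate, but it shows how the loss value drives every adjustment:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)                                   # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate is a placeholder
loss_fn = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])  # toy inputs
y = torch.tensor([[2.0], [4.0], [6.0]])  # toy targets

for step in range(100):
    prediction = model(x)          # the model "throws the dart"
    loss = loss_fn(prediction, y)  # how far did it miss?
    optimizer.zero_grad()
    loss.backward()                # figure out which way to adjust
    optimizer.step()               # adjust the next throw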

Loss Functions Overview

Think of loss functions as different types of report cards. Some measure how well you did on math problems (regression). Others check if you picked the right category (classification). Each one is designed for a specific type of task.

The Three Big Questions:

  1. What type of problem am I solving?

    • Predicting a number? → Use Regression Loss
    • Picking a category? → Use Classification Loss
    • Comparing things? → Use Distance/Similarity Loss
  2. How sensitive should the loss be to mistakes?

    • Some losses punish big mistakes HARD
    • Others are more forgiving
  3. Do I need something special?

    • Sometimes, you need a custom loss for unique problems

Regression Losses: Measuring “How Far Off?”

Regression is when your model predicts a number (like predicting house prices or temperature).

MSE Loss (Mean Squared Error)

The Story: Imagine measuring how far each dart landed from the bullseye, then squaring that distance. Big misses get MUCH bigger penalties!

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()

prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

loss = loss_fn(prediction, target)
# Calculates: ((2.5-3)² + (0-(-0.5))² + (2-2)²) / 3

When to use: Standard choice for most regression problems.

Watch out: Very sensitive to outliers (one big mistake can dominate).

MAE Loss (L1 Loss)

The Story: Just measure the actual distance without squaring. More forgiving for big mistakes!

loss_fn = nn.L1Loss()

prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

loss = loss_fn(prediction, target)
# Calculates: (|2.5-3| + |0-(-0.5)| + |2-2|) / 3

When to use: When your data has outliers (weird extreme values).
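To see the difference, compare the two losses on the same made-up data where one prediction is badly off:

pred = torch.tensor([2.0, 3.0, 10.0])  # the last prediction misses by 9
target = torch.tensor([2.0, 3.0, 1.0])

mse = nn.MSELoss()(pred, target)  # (0 + 0 + 81) / 3 = 27.0  (the outlier dominates)
mae = nn.L1Loss()(pred, target)   # (0 + 0 + 9) / 3 = 3.0    (much calmer)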


HuberLoss: The Best of Both Worlds

The Story: Imagine a referee who scores small mistakes precisely (like MSE) but refuses to let one huge blunder dominate the whole game (like MAE). That’s HuberLoss!

  • For small errors: Acts like MSE (squared penalty)
  • For big errors: Acts like MAE (linear penalty)
(Diagram: small errors get the squared, MSE-style penalty, keeping training smooth and precise; big errors get the linear, MAE-style penalty, staying robust to outliers.)
loss_fn = nn.HuberLoss(delta=1.0)

prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

loss = loss_fn(prediction, target)

# delta controls the switch point:
# - errors below delta get the squared (MSE-style) penalty
# - errors above delta get the linear (MAE-style) penalty

When to use: When you want MSE’s precision but MAE’s robustness. Great for noisy data!


Classification Losses: “Did You Pick the Right One?”

Classification is when your model chooses from categories (like “cat” vs “dog” or spam vs not-spam).

Cross-Entropy Loss

The Story: Your model says “I’m 90% sure this is a cat.” If it’s actually a cat, small penalty. If it’s actually a dog, BIG penalty for being so confident and wrong!

loss_fn = nn.CrossEntropyLoss()

# Model outputs (raw scores, not probabilities)
predictions = torch.tensor([[2.0, 1.0, 0.1]])
# True class (index 0 = first class)
targets = torch.tensor([0])

loss = loss_fn(predictions, targets)

Key insight: Punishes confident wrong answers more than uncertain wrong answers.
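A quick illustration of that claim: two wrong predictions for the same target, one confident and one hesitant (the logits are made up):

confident_wrong = torch.tensor([[0.1, 5.0, 0.1]])  # very sure it's class 1
hesitant_wrong = torch.tensor([[0.1, 0.5, 0.1]])   # only slightly leaning to class 1

print(loss_fn(confident_wrong, targets))  # roughly 4.9: a big penalty
print(loss_fn(hesitant_wrong, targets))   # roughly 1.25: a much smaller penalty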

Binary Cross-Entropy (BCE)

The Story: For yes/no questions. Is it spam? Is the patient sick?

loss_fn = nn.BCEWithLogitsLoss()

# Raw model output (before sigmoid)
prediction = torch.tensor([0.8, -0.5, 1.2])
# True labels (0 or 1)
target = torch.tensor([1.0, 0.0, 1.0])

loss = loss_fn(prediction, target)
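Under the hood, BCEWithLogitsLoss applies a sigmoid and then binary cross-entropy in a single numerically stable step, so the explicit two-step version below should give (almost) exactly the same number:

# Same computation in two explicit steps (less numerically stable)
manual = nn.BCELoss()(torch.sigmoid(prediction), target)
# manual and the loss above are nearly identical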

NLL Loss (Negative Log Likelihood)

The Story: Works with log-probabilities. Often used with LogSoftmax.

m = nn.LogSoftmax(dim=1)
loss_fn = nn.NLLLoss()

input = torch.randn(3, 5)  # 3 samples, 5 classes
target = torch.tensor([1, 0, 4])

loss = loss_fn(m(input), target)
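In fact, LogSoftmax followed by NLLLoss is exactly what CrossEntropyLoss does internally, so this one-liner produces the same value as the two-step version above:

same_loss = nn.CrossEntropyLoss()(input, target)  # equals the LogSoftmax + NLLLoss result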

Specialized Losses: For Special Tasks

Margin Losses (for Ranking)

The Story: “Make sure the correct answer scores higher than wrong answers by at least this margin.”

# Hinge Embedding Loss - for similar/dissimilar pairs
hinge_loss_fn = nn.HingeEmbeddingLoss(margin=1.0)

# Margin Ranking Loss - for ranking tasks
loss_fn = nn.MarginRankingLoss(margin=1.0)

input1 = torch.tensor([1.0, 2.0, 3.0])
input2 = torch.tensor([0.5, 2.5, 2.0])
target = torch.tensor([1, -1, 1])  # 1: input1 should rank higher, -1: input2 should

loss = loss_fn(input1, input2, target)
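For each pair, MarginRankingLoss computes max(0, -target * (input1 - input2) + margin) and averages the results; a quick manual check on the values above:

manual = torch.clamp(-target * (input1 - input2) + 1.0, min=0).mean()
# per-pair values: 0.5, 0.5, 0.0  ->  manual == loss == 0.3333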

Triplet Margin Loss

The Story: Given three things - an anchor, something similar, and something different - make sure similar things are closer than different things!

(Diagram: the anchor should end up close to the positive example and far from the negative one.)
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(10, 128)    # 10 samples, 128 features
positive = torch.randn(10, 128)  # Similar to anchor
negative = torch.randn(10, 128)  # Different from anchor

loss = loss_fn(anchor, positive, negative)

When to use: Face recognition, image similarity, recommendation systems.
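In real use, the three tensors usually come from the same embedding network applied to three inputs. A rough sketch, where the encoder architecture and input sizes are made up:

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))

anchor_img = torch.randn(10, 784)    # e.g. photos of person A
positive_img = torch.randn(10, 784)  # more photos of person A
negative_img = torch.randn(10, 784)  # photos of other people

loss = loss_fn(encoder(anchor_img), encoder(positive_img), encoder(negative_img))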

KL Divergence Loss

The Story: Measures how different two probability distributions are. Like asking “how surprised would I be if I expected distribution A but got distribution B?”

loss_fn = nn.KLDivLoss(reduction='batchmean')

# Log-probabilities from your model
log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
# Target distribution (probabilities)
target_probs = torch.softmax(torch.randn(3, 5), dim=1)

loss = loss_fn(log_probs, target_probs)

When to use: Knowledge distillation, VAEs, comparing distributions.
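As one concrete example, knowledge distillation commonly softens both the teacher's and the student's outputs with a temperature and matches them with KLDivLoss. The temperature and logits below are placeholders, and the T-squared scaling is a common convention rather than a requirement:

T = 2.0  # softening temperature (hypothetical choice)
teacher_logits = torch.randn(3, 5)  # stand-in for a trained teacher's outputs
student_logits = torch.randn(3, 5)  # stand-in for the student's outputs

distill_loss = nn.KLDivLoss(reduction='batchmean')(
    torch.log_softmax(student_logits / T, dim=1),
    torch.softmax(teacher_logits / T, dim=1),
) * (T * T)  # rescale so gradients keep roughly the same magnitude as T grows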


Distance and Similarity Losses

Cosine Embedding Loss

The Story: Are two vectors pointing in the same direction? Useful when direction matters more than magnitude.

loss_fn = nn.CosineEmbeddingLoss()

vec1 = torch.randn(5, 10)
vec2 = torch.randn(5, 10)
# +1 means they should be similar
# -1 means they should be different
label = torch.tensor([1, -1, 1, 1, -1])

loss = loss_fn(vec1, vec2, label)

Pairwise Distance

pdist = nn.PairwiseDistance(p=2)  # L2 distance

x1 = torch.tensor([[0., 0.], [1., 1.]])
x2 = torch.tensor([[1., 1.], [2., 2.]])

distances = pdist(x1, x2)
# Result: tensor([1.4142, 1.4142])
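As a sanity check, this matches a plain Euclidean norm of the difference (up to a tiny eps that PairwiseDistance adds for numerical stability):

manual = (x1 - x2).norm(dim=1)  # also tensor([1.4142, 1.4142])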

Custom Loss Functions: Build Your Own!

Sometimes, pre-built losses don’t fit your problem. Here’s how to create your own!

Method 1: Simple Function

def my_custom_loss(pred, target):
    # Example: weighted MSE
    weights = torch.tensor([1.0, 2.0, 3.0])
    squared_diff = (pred - target) ** 2
    return (weights * squared_diff).mean()

Method 2: Class-Based (More Flexible)

import torch.nn.functional as F

class FocalLoss(nn.Module):
    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, pred, target):
        ce_loss = F.cross_entropy(pred, target,
                                   reduction='none')
        pt = torch.exp(-ce_loss)
        focal_loss = self.alpha * (1-pt)**self.gamma * ce_loss
        return focal_loss.mean()

# Usage
loss_fn = FocalLoss(gamma=2.0)
loss = loss_fn(predictions, targets)

When to Create Custom Loss:

  1. Imbalanced data → Weight classes differently (see the class-weight sketch after the combined-loss example below)
  2. Multiple objectives → Combine several losses
  3. Domain-specific needs → Physics constraints, business rules
# Example: Combining losses
class CombinedLoss(nn.Module):
    def __init__(self, w1=0.5, w2=0.5):
        super().__init__()
        self.mse = nn.MSELoss()
        self.l1 = nn.L1Loss()
        self.w1 = w1
        self.w2 = w2

    def forward(self, pred, target):
        return self.w1 * self.mse(pred, target) + \
               self.w2 * self.l1(pred, target)
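For case 1 (imbalanced data), you often don't even need a custom class: CrossEntropyLoss accepts per-class weights directly. A small sketch with made-up class counts and random stand-in data:

# Suppose class 2 is much rarer than class 0 (hypothetical situation)
class_weights = torch.tensor([0.5, 1.0, 3.0])  # rarer classes count more
weighted_ce = nn.CrossEntropyLoss(weight=class_weights)

predictions = torch.randn(8, 3)       # 8 samples, 3 classes
targets = torch.randint(0, 3, (8,))   # random labels, just for illustration
loss = weighted_ce(predictions, targets)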

Quick Decision Guide

What's your task?

  • Regression → Clean data: MSELoss. Outliers: HuberLoss or L1Loss.
  • Classification → 2 classes: BCEWithLogitsLoss. 3+ classes: CrossEntropyLoss.
  • Similarity/Ranking → Face or image matching: TripletMarginLoss. Direction similarity: CosineEmbeddingLoss. Ranking items: MarginRankingLoss.

Summary: Your Loss Function Toolbox

| Problem Type | Loss Function | One-Line Description |
| --- | --- | --- |
| Regression | MSELoss | Squares errors, punishes big mistakes |
| Regression | L1Loss (MAE) | Absolute errors, robust to outliers |
| Regression | HuberLoss | Hybrid: MSE for small errors, MAE for big ones |
| Binary classification | BCEWithLogitsLoss | Yes/no questions |
| Multi-class classification | CrossEntropyLoss | Pick one from many |
| Similarity | TripletMarginLoss | Learn embeddings |
| Direction | CosineEmbeddingLoss | Same direction = similar |
| Distribution | KLDivLoss | Compare probabilities |
| Custom | Your own class | When nothing else fits |

You’ve Got This!

Loss functions might seem like just math, but they are really how you communicate with your neural network. You’re telling it: “Here’s what matters to me. Here’s how I’ll judge your mistakes.”

Choose wisely, and your model will learn exactly what you want it to learn!
