Model Training


🎓 Training Your Neural Network: Teaching a Robot Dog New Tricks


๐Ÿ• The Big Picture: What is Model Training?

Imagine you just got a robot dog. It doesn't know anything yet - not how to sit, fetch, or even recognize your voice. Training is how you teach it!

In TensorFlow, your neural network is like that robot dog. It starts knowing nothing. Through training, it learns patterns from examples you show it, getting smarter with each practice session.

One Analogy for Everything: Throughout this guide, think of training a model like teaching a pet. You show examples, give feedback, and repeat until it learns!


📦 Model Training with fit() - The Magic Teaching Button

What is fit()?

The fit() method is your "Start Teaching" button. You press it, and TensorFlow begins showing your model examples, checking its answers, and helping it improve.

The Simplest Example

# Your robot dog learns from 1000 examples
# It practices 10 times over all examples
model.fit(
    training_data,   # Examples to learn from
    training_labels, # Correct answers
    epochs=10        # Practice 10 rounds
)

What happens inside fit()?

graph TD
  A["🎯 Show Example"] --> B["🤔 Model Guesses"]
  B --> C["✅ Check Answer"]
  C --> D["📊 Calculate Error"]
  D --> E["🔧 Adjust Brain"]
  E --> A
  1. Show the model one example
  2. Model guesses the answer
  3. Check if it's right or wrong
  4. Calculate how wrong it was
  5. Adjust the model's "brain" to do better
  6. Repeat thousands of times!

Real Code You Can Use

import tensorflow as tf

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Prepare it for training
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# TEACH IT! 🎓
history = model.fit(
    x_train,      # Pictures/data
    y_train,      # Correct labels
    epochs=5,     # 5 practice rounds
    batch_size=32 # Learn 32 examples at once
)
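
By the way, fit() returns a History object. Its history attribute is a plain Python dict with one list per tracked metric, one entry per epoch. A quick way to peek at it:

print(history.history.keys())       # e.g. dict_keys(['loss', 'accuracy'])
print(history.history['loss'])      # loss after each of the 5 epochs
print(history.history['accuracy'])  # accuracy after each epoch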

💡 Why batch_size=32? Imagine teaching 32 students at once instead of one-by-one. It's faster and helps the robot dog learn more general patterns!
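
To make that concrete with the numbers from the robot-dog example above (1000 examples, batch_size=32), here's the quick math on how many weight updates happen in one epoch:

import math

examples = 1000    # total training examples (from the example above)
batch_size = 32
steps_per_epoch = math.ceil(examples / batch_size)
print(steps_per_epoch)  # 32 updates per epoch: 31 full batches plus one smaller batch of 8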


โš™๏ธ Training Configuration โ€” Setting Up the Classroom

Before training starts, you need to configure your model. This is like setting up the classroom rules before teaching begins.

The Three Essential Settings

model.compile(
    optimizer='adam',        # HOW to learn
    loss='mse',              # HOW to measure mistakes
    metrics=['accuracy']     # WHAT to track
)

1๏ธโƒฃ Optimizer โ€” The Learning Coach

The optimizer decides how the model improves after each mistake.

Optimizer  | Think of it as…          | Best for
'adam'     | Smart adaptive coach     | Most cases ✅
'sgd'      | Simple but steady coach  | When you want control
'rmsprop'  | Good with sequences      | Time-series data

# Custom optimizer with learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001  # How big each step is
    ),
    loss='mse',
    metrics=['accuracy']
)

🎯 Learning Rate = How big the corrections are. Too big? It overshoots. Too small? It takes forever.
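
You can see this happen without any neural network at all. A minimal sketch using plain gradient descent on f(x) = x² (the step sizes below are made up purely for illustration):

# Plain gradient descent on f(x) = x**2, whose gradient is 2*x.
# The only thing that changes between the two runs is the learning rate.
def descend(learning_rate, steps=5, x=1.0):
    for _ in range(steps):
        x = x - learning_rate * (2 * x)   # step against the gradient
    return x

print(descend(0.1))   # about 0.33 -> small steps, steadily sliding toward the minimum at 0
print(descend(1.1))   # about -2.5 -> steps too big, every update overshoots and |x| grows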

2๏ธโƒฃ Loss Function โ€” The Mistake Measurer

This tells the model how wrong it was.

Problem Type      | Loss Function             | Plain English
Yes/No answers    | binary_crossentropy       | "How sure and wrong were you?"
Multiple choices  | categorical_crossentropy  | "Which choice did you mess up?"
Number prediction | mse (Mean Squared Error)  | "How far off was your number?"

# For predicting categories (dog, cat, bird)
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
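
If you want to see what these loss functions actually measure, you can call them directly on tiny hand-made numbers (the values below are made up purely for illustration):

import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
print(mse([1.0, 2.0], [1.5, 2.5]).numpy())   # 0.25 -> the average squared distance

bce = tf.keras.losses.BinaryCrossentropy()
print(bce([1.0, 0.0], [0.9, 0.1]).numpy())   # about 0.11 -> confident and right = small loss
print(bce([1.0, 0.0], [0.1, 0.9]).numpy())   # about 2.3  -> confident and wrong = big loss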

3๏ธโƒฃ Metrics โ€” The Report Card

Metrics are what you watch during training. They don't affect learning - they just help you see progress!

model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['accuracy', 'mae']  # Track both!
)

📞 Built-in Callbacks - Automatic Helpers

Callbacks are like automatic assistants that watch training and take action when something happens.

What Can Callbacks Do?

graph TD
  A["🎓 Training Starts"] --> B{Callback Watches}
  B --> C["📊 Save Best Model"]
  B --> D["⏹️ Stop if Stuck"]
  B --> E["📉 Adjust Learning Rate"]
  B --> F["📝 Log to TensorBoard"]

The Most Useful Built-in Callbacks

1. ModelCheckpoint - Save Your Best Work!

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.keras',    # Where to save
    monitor='val_loss',    # What to watch
    save_best_only=True    # Only save if better
)

model.fit(
    x_train, y_train,
    epochs=50,
    callbacks=[checkpoint]  # ← Add it here!
)

๐Ÿ† This saves your model whenever it beats its personal best!

2. EarlyStopping - Stop When You're Done

Why keep training if the model stopped improving?

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',  # What to watch
    patience=5,          # Wait 5 epochs before stopping
    restore_best_weights=True  # Go back to best version
)

model.fit(
    x_train, y_train,
    epochs=100,  # Max 100, but might stop early
    callbacks=[early_stop]
)

โฑ๏ธ Patience=5 means: โ€œIf nothing improves for 5 rounds, stop!โ€

3. ReduceLROnPlateau - Slow Down When Stuck

Sometimes the model is taking steps that are too big. This callback shrinks them automatically.

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,      # Cut learning rate in half
    patience=3,      # After 3 stuck epochs
    min_lr=0.00001   # Never go below this
)

4. TensorBoard - Watch Training Live!

tensorboard = tf.keras.callbacks.TensorBoard(
    log_dir='./logs'
)
# Run: tensorboard --logdir=./logs

Using Multiple Callbacks Together

callbacks_list = [
    checkpoint,
    early_stop,
    reduce_lr,
    tensorboard
]

model.fit(
    x_train, y_train,
    epochs=100,
    validation_split=0.2,
    callbacks=callbacks_list  # All helpers active!
)

๐Ÿ› ๏ธ Custom Callbacks โ€” Build Your Own Helper

Sometimes built-in callbacks aren't enough. You can create your own!

The Callback Recipe

class MyCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        # This runs after EVERY epoch
        print(f"Epoch {epoch}: loss = {logs['loss']}")

Available Trigger Points

Method          | When it runs
on_train_begin  | Training starts
on_train_end    | Training finishes
on_epoch_begin  | Each epoch starts
on_epoch_end    | Each epoch finishes
on_batch_begin  | Each batch starts
on_batch_end    | Each batch finishes
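
To see how several trigger points work together, here's a small hypothetical EpochTimer callback that clocks every epoch (the class name and print format are made up; the hooks are the real ones from the table above):

import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.times = []                            # one duration per epoch

    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.time()                   # stamp when the epoch starts

    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self.start)
        print(f"Epoch {epoch} took {self.times[-1]:.1f}s")

    def on_train_end(self, logs=None):
        print(f"Average epoch time: {sum(self.times) / len(self.times):.1f}s")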

Example: Stop if Accuracy Reaches Target

class AccuracyThreshold(tf.keras.callbacks.Callback):
    def __init__(self, threshold=0.95):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        acc = logs.get('accuracy')
        if acc and acc >= self.threshold:
            print(f"\n🎉 Reached {self.threshold*100}%!")
            self.model.stop_training = True

# Use it!
model.fit(
    x_train, y_train,
    epochs=100,
    callbacks=[AccuracyThreshold(0.99)]
)

Example: Custom Logging to File

class FileLogger(tf.keras.callbacks.Callback):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def on_epoch_end(self, epoch, logs=None):
        with open(self.filename, 'a') as f:
            f.write(f"Epoch {epoch}: {logs}\n")
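
Hook it up the same way as any other callback (the filename here is just an example):

model.fit(
    x_train, y_train,
    epochs=10,
    callbacks=[FileLogger('training_log.txt')]
)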

🎮 Custom Training Loops - Total Control Mode

The fit() method is great, but sometimes you need complete control. That's when you write your own training loop!

Why Use Custom Training Loops?

  • 🔬 Research: Try new training techniques
  • 🎯 Special Logic: Different loss for different samples
  • 📊 Detailed Logging: Track exactly what you want
  • 🧪 Experiments: Mix multiple models together

The Basic Recipe

# 1. Prepare ingredients
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

# 2. Create dataset
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)
).batch(32)

# 3. Training loop!
for epoch in range(10):
    print(f"Epoch {epoch + 1}")

    for x_batch, y_batch in dataset:

        # Record operations for gradient
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = loss_fn(y_batch, predictions)

        # Calculate how to improve
        gradients = tape.gradient(loss, model.trainable_variables)

        # Apply improvements
        optimizer.apply_gradients(
            zip(gradients, model.trainable_variables)
        )

Understanding GradientTape

graph TD
  A["📼 Start Recording"] --> B["🧮 Do Math Operations"]
  B --> C["📊 Calculate Loss"]
  C --> D["⏹️ Stop Recording"]
  D --> E["🔍 Ask: How to Improve?"]
  E --> F["📏 Get Gradients"]
  F --> G["🔧 Apply Changes"]

GradientTape is like a video recorder for math. It watches every calculation, then can "rewind" to figure out how each weight affected the final answer.
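
You can watch the tape do its job on something much smaller than a neural network. A minimal sketch: record y = x², then ask for dy/dx:

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x                     # the tape records this operation

dy_dx = tape.gradient(y, x)       # "rewind" and ask how y changes with x
print(dy_dx.numpy())              # 6.0, i.e. 2 * x at x = 3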

Full Example with Metrics

# Setup
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
train_acc = tf.keras.metrics.SparseCategoricalAccuracy()
val_acc = tf.keras.metrics.SparseCategoricalAccuracy()

@tf.function  # Makes it faster!
def train_step(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = loss_fn(y, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(
        zip(gradients, model.trainable_variables)
    )

    train_acc.update_state(y, predictions)
    return loss

# Training loop
for epoch in range(10):
    train_acc.reset_states()

    for x_batch, y_batch in train_dataset:
        loss = train_step(x_batch, y_batch)

    print(f"Epoch {epoch+1}")
    print(f"  Loss: {float(loss):.4f}")
    print(f"  Accuracy: {float(train_acc.result()):.2%}")

Adding Validation

@tf.function
def test_step(x, y):
    predictions = model(x, training=False)
    loss = loss_fn(y, predictions)
    val_acc.update_state(y, predictions)
    return loss

# In your training loop:
for epoch in range(10):
    # Training
    for x_batch, y_batch in train_dataset:
        train_step(x_batch, y_batch)

    # Validation
    val_acc.reset_states()
    for x_batch, y_batch in val_dataset:
        test_step(x_batch, y_batch)

    print(f"Val Accuracy: {float(val_acc.result()):.2%}")

🎯 Quick Reference: When to Use What

Situation                  | Use
Normal training            | model.fit()
Need auto-save/early stop  | model.fit() + callbacks
Custom logging             | Custom callback
Complex training logic     | Custom training loop
Research / experiments     | Custom training loop
Multi-GPU / distributed    | Custom training loop

🌟 You Did It!

You've learned the complete training toolkit for TensorFlow:

✅ fit() - The easy button for training
✅ Training Configuration - Optimizer, loss, and metrics
✅ Built-in Callbacks - Automatic helpers
✅ Custom Callbacks - Build your own helpers
✅ Custom Training Loops - Total control

Now your neural network can learn anything you teach it. Like a robot dog that finally knows all the tricks! 🐕🎓

Remember: Start simple with fit(). Add callbacks when needed. Only use custom loops when you need complete control!
