Recurrent Neural Networks


Sequence Models: Recurrent Neural Networks (RNNs)

The Memory Machine Story

Imagine you’re reading a book. Each word you read makes sense because you remember what came before. You don’t forget the beginning of a sentence by the time you reach the end!

Regular neural networks have no memory. They see one thing, make a decision, and forget everything. Like a goldfish!

RNNs are different. They remember. They carry a little “memory bag” from one step to the next. That’s what makes them special for understanding sequences—like sentences, music, or stock prices.


RNN Architecture Overview

What Does an RNN Look Like?

Think of an RNN as a conveyor belt with a worker.

  • Each item (like a word) comes along the belt
  • The worker looks at the item
  • The worker also remembers what they saw before
  • They make a decision based on BOTH
graph TD A["Word 1: I"] --> B["RNN Cell"] B --> C["Hidden State h1"] C --> D["Word 2: love"] D --> E["RNN Cell"] E --> F["Hidden State h2"] F --> G["Word 3: cats"] G --> H["RNN Cell"] H --> I["Output: Positive!"]

The Basic Formula

At each step, the RNN does this:

New Memory = f(Old Memory + New Input)

That’s it! Simple but powerful.

Real Example:

  • You hear “I” → Brain thinks: “Subject starting”
  • You hear “love” → Brain thinks: “Oh, positive emotion!”
  • You hear “cats” → Brain thinks: “Complete thought: person loves cats!”
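
In code, the formula above is just one loop that keeps overwriting a single memory variable. Here is a minimal Python sketch (update_memory and the numbers are made up for illustration, not a real library):

# Toy version of the RNN recurrence: one memory value, updated at every step.
# update_memory stands in for the real learned function f.
def update_memory(old_memory, new_input):
    return 0.5 * old_memory + 0.5 * new_input   # made-up mixing rule

memory = 0.0                                    # start with an empty memory
for new_input in [1.0, 2.0, 3.0]:               # a toy "sequence"
    memory = update_memory(memory, new_input)
    print(memory)                               # 0.5, then 1.25, then 2.125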

Sequence Modeling Concept

Why Sequences Matter

Not everything in life happens at once. Things happen one after another:

  • Words in a sentence
  • Notes in a song
  • Prices over days
  • Steps in a recipe

The order matters!

“Dog bites man” vs “Man bites dog” — same words, totally different meaning!

What is Sequence Modeling?

Sequence modeling is teaching a computer to understand patterns that unfold over time.

graph TD A["Input Sequence"] --> B["Process Step 1"] B --> C["Process Step 2"] C --> D["Process Step 3"] D --> E["Output/Prediction"]

Everyday Examples:

Task         | Input Sequence | Output
Translation  | English words  | French words
Autocomplete | “How are…”     | “you?”
Weather      | Past 7 days    | Tomorrow’s forecast
Music        | Notes so far   | Next note

The Key Insight

To predict what comes next, you need to know what came before. An RNN’s job is to compress all the past into a useful summary.


Hidden State: The Memory Bag

What is a Hidden State?

The hidden state is the RNN’s memory. It’s like a secret note the RNN passes to itself.

Imagine passing notes in class:

  1. Friend 1 writes: “Party at 5”
  2. Passes note to Friend 2
  3. Friend 2 adds: “Bring snacks”
  4. Passes to Friend 3
  5. Friend 3 adds: “I’ll bring music”

The note grows with each person. That’s the hidden state!

How It Works

h_t = tanh(W_h * h_{t-1} + W_x * x_t)

In plain English:

  • h_t = new hidden state (new memory)
  • h_{t-1} = previous hidden state (old memory)
  • x_t = current input (what you see now)
  • W_h, W_x = weight matrices (what’s important to keep from the old memory and the new input)
  • tanh = squashes every value into a range between -1 and 1
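
Here is a minimal numpy sketch of that single update, with made-up sizes (a 3-dimensional input, a 4-dimensional memory) and random weights purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
W_h = 0.1 * rng.normal(size=(4, 4))   # weights applied to the old memory
W_x = 0.1 * rng.normal(size=(4, 3))   # weights applied to the current input

h_prev = np.zeros(4)                  # h_{t-1}: the old memory, empty at the start
x_t = np.array([1.0, 0.0, 0.0])       # x_t: the current input (e.g. a word vector)

h_t = np.tanh(W_h @ h_prev + W_x @ x_t)   # the formula above, one step
print(h_t.shape)                          # (4,) — the new memory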

Visual Example

Reading: “The cat sat on the ___”

Step | Word  | Hidden State (Memory)
1    | “The” | “Something’s starting”
2    | “cat” | “A cat is involved”
3    | “sat” | “Cat is sitting”
4    | “on”  | “Sitting on something”
5    | “the” | “Location coming…”
?    | ???   | Probably “mat”!

The hidden state carries forward everything needed to predict “mat”.


The Vanishing Gradient Problem

The Fading Memory Problem

Here’s where RNNs struggle. Imagine that note-passing game, but across 100 friends. By the time it reaches the end, the original message might be completely forgotten!

This is the vanishing gradient problem.

What Happens?

When training an RNN, we send error signals backward through time. But with each step back, the signal gets weaker. Like a whisper getting quieter and quieter.

graph TD A["Error Signal 100%"] --> B["Step -1: 50%"] B --> C["Step -2: 25%"] C --> D["Step -3: 12.5%"] D --> E["Step -4: 6.25%"] E --> F["Step -5: Nearly 0!"]

Why Does This Happen?

Math time (simple version):

  • Gradients are multiplied at each step
  • If the multiplier is < 1, things shrink
  • After many steps: tiny × tiny × tiny = basically zero
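
You can watch the shrinking happen with nothing more than repeated multiplication. Here 0.5 is just a stand-in for a typical per-step factor that is less than 1:

# Multiply the error signal by 0.5 at every step back through time
signal = 1.0
for step in range(30):
    signal *= 0.5
print(signal)   # about 9.3e-10 — after 30 steps the signal is effectively gone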

Real Impact:

  • The RNN can’t learn from things far in the past
  • Long-term dependencies are lost
  • “I grew up in France… I speak fluent ___” → RNN forgets “France”!

The Core Problem

Sequence Length | Memory Quality
5 words         | Great!
10 words        | Good
50 words        | Struggling
100+ words      | Almost blind to early words

The Exploding Gradient Problem

The Opposite Disaster

If vanishing gradients are a whisper fading away, exploding gradients are a whisper turning into a SCREAM!

Instead of the error signal getting too small, it gets way too big.

What Happens?

graph TD A["Error Signal 1"] --> B["Step -1: 2"] B --> C["Step -2: 4"] C --> D["Step -3: 8"] D --> E["Step -4: 16"] E --> F["Step -5: Infinity!"]

Numbers explode. The model goes crazy. Weights become astronomically large or turn into “NaN” (not a number).

Symptoms

  • Training loss suddenly jumps to infinity
  • Model weights become huge random numbers
  • The network is “broken”

Why Does This Happen?

Same math, different numbers:

  • If the gradient multiplier is > 1
  • After many steps: big × big × big = HUGE
  • Eventually: numbers overflow
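
The same toy calculation as before, with a factor of 1.5 instead of 0.5, shows the blow-up:

# Multiply the error signal by 1.5 at every step back through time
signal = 1.0
for step in range(30):
    signal *= 1.5
print(signal)   # about 191751 — and it keeps growing without bound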

A Simple Fix: Gradient Clipping

Unlike vanishing gradients (which are harder to fix), exploding gradients have a simple solution:

Just put a limit!

if gradient > max_value:        # cap large positive gradients
    gradient = max_value
elif gradient < -max_value:     # cap large negative gradients
    gradient = -max_value

It’s like putting a speed limiter on a car. No matter how hard you press the pedal, you can’t go faster than the limit.
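
In practice, most toolkits clip the overall size (norm) of the gradient rather than each value separately. Here is a minimal numpy sketch, assuming the gradients have been gathered into one array (the function name and threshold are illustrative, not a specific library’s API):

import numpy as np

def clip_by_norm(grads, max_norm):
    """Scale the gradients down so their overall norm never exceeds max_norm."""
    norm = np.linalg.norm(grads)
    if norm > max_norm:
        # Shrink every component by the same factor, keeping the direction
        grads = grads * (max_norm / norm)
    return grads

g = np.array([30.0, -40.0])     # an "exploded" gradient with norm 50
print(clip_by_norm(g, 5.0))     # [ 3. -4.] — same direction, safe size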


Quick Comparison

Problem   | What Happens   | Effect               | Solution
Vanishing | Gradients → 0  | Forgets distant past | LSTM, GRU (special architectures)
Exploding | Gradients → ∞  | Training breaks      | Gradient clipping

The Big Picture

RNNs are powerful because they have memory. But that memory comes with challenges:

  1. Architecture: Loop structure lets information persist
  2. Sequences: Perfect for ordered data
  3. Hidden State: The “memory bag” carrying context
  4. Vanishing Gradients: Long-term memory fades
  5. Exploding Gradients: Training can blow up

The Next Step:

Scientists invented LSTM and GRU networks to fix the vanishing gradient problem. They’re like RNNs with better memory systems — like upgrading from a small notepad to a filing cabinet!


Remember This

RNNs are neural networks with a loop.

They pass a hidden state from step to step, giving them memory. But that memory can fade (vanishing gradients) or go wild (exploding gradients).

The hidden state is the heart of an RNN — it’s what makes sequences understandable.


You now understand how machines learn to read, listen, and predict sequences. Not bad for a few minutes of reading!
