Recurrent Neural Networks


Sequence Models: Recurrent Neural Networks (RNNs)

The Memory Machine Story

Imagine you’re reading a book. Each word you read makes sense because you remember what came before. You don’t forget the beginning of a sentence by the time you reach the end!

Regular neural networks have no memory. They see one thing, make a decision, and forget everything. Like a goldfish!

RNNs are different. They remember. They carry a little “memory bag” from one step to the next. That’s what makes them special for understanding sequences—like sentences, music, or stock prices.


RNN Architecture Overview

What Does an RNN Look Like?

Think of an RNN as a conveyor belt with a worker.

  • Each item (like a word) comes along the belt
  • The worker looks at the item
  • The worker also remembers what they saw before
  • They make a decision based on BOTH
graph TD A["Word 1: I"] --> B["RNN Cell"] B --> C["Hidden State h1"] C --> D["Word 2: love"] D --> E["RNN Cell"] E --> F["Hidden State h2"] F --> G["Word 3: cats"] G --> H["RNN Cell"] H --> I["Output: Positive!"]

The Basic Formula

At each step, the RNN does this:

New Memory = f(Old Memory + New Input)

That’s it! Simple but powerful.

Real Example:

  • You hear “I” → Brain thinks: “Subject starting”
  • You hear “love” → Brain thinks: “Oh, positive emotion!”
  • You hear “cats” → Brain thinks: “Complete thought: person loves cats!”
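
In code, the formula above is just one loop that keeps overwriting a single memory variable. Here is a minimal Python sketch (update_memory and the numbers are made up for illustration, not a real library):

# Toy version of the RNN recurrence: one memory value, updated at every step.
# update_memory stands in for the real learned function f.
def update_memory(old_memory, new_input):
    return 0.5 * old_memory + 0.5 * new_input   # made-up mixing rule

memory = 0.0                                    # start with an empty memory
for new_input in [1.0, 2.0, 3.0]:               # a toy "sequence"
    memory = update_memory(memory, new_input)
    print(memory)                               # 0.5, then 1.25, then 2.125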

Sequence Modeling Concept

Why Sequences Matter

Not everything in life happens at once. Things happen one after another:

  • Words in a sentence
  • Notes in a song
  • Prices over days
  • Steps in a recipe

The order matters!

“Dog bites man” vs “Man bites dog” — same words, totally different meaning!

What is Sequence Modeling?

Sequence modeling is teaching a computer to understand patterns that unfold over time.

graph TD A["Input Sequence"] --> B["Process Step 1"] B --> C["Process Step 2"] C --> D["Process Step 3"] D --> E["Output/Prediction"]

Everyday Examples:

Task         | Input Sequence | Output
Translation  | English words  | French words
Autocomplete | “How are…”     | “you?”
Weather      | Past 7 days    | Tomorrow’s forecast
Music        | Notes so far   | Next note

The Key Insight

To predict what comes next, you need to know what came before. An RNN’s job is to compress all the past into a useful summary.


Hidden State: The Memory Bag

What is a Hidden State?

The hidden state is the RNN’s memory. It’s like a secret note the RNN passes to itself.

Imagine passing notes in class:

  1. Friend 1 writes: “Party at 5”
  2. Passes note to Friend 2
  3. Friend 2 adds: “Bring snacks”
  4. Passes to Friend 3
  5. Friend 3 adds: “I’ll bring music”

The note grows with each person. That’s the hidden state!

How It Works

h_t = tanh(W_h * h_{t-1} + W_x * x_t)

In plain English:

  • h_t = new hidden state (new memory)
  • h_{t-1} = previous hidden state (old memory)
  • x_t = current input (what you see now)
  • W_h, W_x = weight matrices (what’s important to keep from the old memory and the new input)
  • tanh = squashes every value into a range between -1 and 1
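
Here is a minimal numpy sketch of that single update, with made-up sizes (a 3-dimensional input, a 4-dimensional memory) and random weights purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
W_h = 0.1 * rng.normal(size=(4, 4))   # weights applied to the old memory
W_x = 0.1 * rng.normal(size=(4, 3))   # weights applied to the current input

h_prev = np.zeros(4)                  # h_{t-1}: the old memory, empty at the start
x_t = np.array([1.0, 0.0, 0.0])       # x_t: the current input (e.g. a word vector)

h_t = np.tanh(W_h @ h_prev + W_x @ x_t)   # the formula above, one step
print(h_t.shape)                          # (4,) — the new memory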

Visual Example

Reading: “The cat sat on the ___”

Step | Word  | Hidden State (Memory)
1    | “The” | “Something’s starting”
2    | “cat” | “A cat is involved”
3    | “sat” | “Cat is sitting”
4    | “on”  | “Sitting on something”
5    | “the” | “Location coming…”
?    | ???   | Probably “mat”!

The hidden state carries forward everything needed to predict “mat”.


The Vanishing Gradient Problem

The Fading Memory Problem

Here’s where RNNs struggle. Imagine that note-passing game, but across 100 friends. By the time it reaches the end, the original message might be completely forgotten!

This is the vanishing gradient problem.

What Happens?

When training an RNN, we send error signals backward through time. But with each step back, the signal gets weaker. Like a whisper getting quieter and quieter.

graph TD A["Error Signal 100%"] --> B["Step -1: 50%"] B --> C["Step -2: 25%"] C --> D["Step -3: 12.5%"] D --> E["Step -4: 6.25%"] E --> F["Step -5: Nearly 0!"]

Why Does This Happen?

Math time (simple version):

  • Gradients are multiplied at each step
  • If the multiplier is < 1, things shrink
  • After many steps: tiny × tiny × tiny = basically zero
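
You can watch the shrinking happen with nothing more than repeated multiplication. Here 0.5 is just a stand-in for a typical per-step factor that is less than 1:

# Multiply the error signal by 0.5 at every step back through time
signal = 1.0
for step in range(30):
    signal *= 0.5
print(signal)   # about 9.3e-10 — after 30 steps the signal is effectively gone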

Real Impact:

  • The RNN can’t learn from things far in the past
  • Long-term dependencies are lost
  • “I grew up in France… I speak fluent ___” → RNN forgets “France”!

The Core Problem

Sequence Length | Memory Quality
5 words         | Great!
10 words        | Good
50 words        | Struggling
100+ words      | Almost blind to early words

The Exploding Gradient Problem

The Opposite Disaster

If vanishing gradients are a whisper fading away, exploding gradients are a whisper turning into a SCREAM!

Instead of the error signal getting too small, it gets way too big.

What Happens?

graph TD A["Error Signal 1"] --> B["Step -1: 2"] B --> C["Step -2: 4"] C --> D["Step -3: 8"] D --> E["Step -4: 16"] E --> F["Step -5: Infinity!"]

Numbers explode. The model goes crazy. Weights become astronomically large or turn into “NaN” (not a number).

Symptoms

  • Training loss suddenly jumps to infinity
  • Model weights become huge random numbers
  • The network is “broken”

Why Does This Happen?

Same math, different numbers:

  • If the gradient multiplier is > 1
  • After many steps: big × big × big = HUGE
  • Eventually: numbers overflow
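
The same toy calculation as before, with a factor of 1.5 instead of 0.5, shows the blow-up:

# Multiply the error signal by 1.5 at every step back through time
signal = 1.0
for step in range(30):
    signal *= 1.5
print(signal)   # about 191751 — and it keeps growing without bound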

A Simple Fix: Gradient Clipping

Unlike vanishing gradients (which are harder to fix), exploding gradients have a simple solution:

Just put a limit!

if gradient > max_value:        # cap large positive gradients
    gradient = max_value
elif gradient < -max_value:     # cap large negative gradients
    gradient = -max_value

It’s like putting a speed limiter on a car. No matter how hard you press the pedal, you can’t go faster than the limit.
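
In practice, most toolkits clip the overall size (norm) of the gradient rather than each value separately. Here is a minimal numpy sketch, assuming the gradients have been gathered into one array (the function name and threshold are illustrative, not a specific library’s API):

import numpy as np

def clip_by_norm(grads, max_norm):
    """Scale the gradients down so their overall norm never exceeds max_norm."""
    norm = np.linalg.norm(grads)
    if norm > max_norm:
        # Shrink every component by the same factor, keeping the direction
        grads = grads * (max_norm / norm)
    return grads

g = np.array([30.0, -40.0])     # an "exploded" gradient with norm 50
print(clip_by_norm(g, 5.0))     # [ 3. -4.] — same direction, safe size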


Quick Comparison

Problem   | What Happens   | Effect               | Solution
Vanishing | Gradients → 0  | Forgets distant past | LSTM, GRU (special architectures)
Exploding | Gradients → ∞  | Training breaks      | Gradient clipping

The Big Picture

RNNs are powerful because they have memory. But that memory comes with challenges:

  1. Architecture: Loop structure lets information persist
  2. Sequences: Perfect for ordered data
  3. Hidden State: The “memory bag” carrying context
  4. Vanishing Gradients: Long-term memory fades
  5. Exploding Gradients: Training can blow up

The Next Step:

Scientists invented LSTM and GRU networks to fix the vanishing gradient problem. They’re like RNNs with better memory systems — like upgrading from a small notepad to a filing cabinet!


Remember This

RNNs are neural networks with a loop.

They pass a hidden state from step to step, giving them memory. But that memory can fade (vanishing gradients) or go wild (exploding gradients).

The hidden state is the heart of an RNN — it’s what makes sequences understandable.


You now understand how machines learn to read, listen, and predict sequences. Not bad for a few minutes of reading!
