Time Series Analysis in Pandas: Your Time-Traveling Adventure!
Imagine you're a weather wizard with a magical diary. Every day, you write down the temperature. Now, what if you could:
- See the average temperature each week instead of every day?
- Smooth out the bumpy numbers to see trends?
- Peek into the future or look back in time?
That's exactly what Time Series Analysis does! Let's go on this adventure together.
What is Time Series Data?
Time series data is like a diary with dates. Each entry has:
- A date/time (when it happened)
- A value (what happened)
import pandas as pd
# Your weather diary!
dates = pd.date_range('2024-01-01', periods=10, freq='D')
temps = [22, 24, 23, 25, 27, 26, 28, 30, 29, 31]
weather = pd.Series(temps, index=dates)
print(weather)
Output:
2024-01-01 22
2024-01-02 24
2024-01-03 23
...
2024-01-10 31
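Because the diary is indexed by dates, you can also flip straight to the pages you want. Here is a minimal sketch, assuming the same ten-day diary as above; the name `mid_week` is made up for illustration.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="D")
weather = pd.Series([22, 24, 23, 25, 27, 26, 28, 30, 29, 31], index=dates)

# Slicing by date labels includes BOTH endpoints
mid_week = weather.loc["2024-01-03":"2024-01-05"]
print(mid_week)

# A single date string picks out one entry
print(weather.loc["2024-01-08"])
```

Unlike ordinary Python list slices, a date slice keeps the end date too, so this returns three entries.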
1. Resampling Time Series
What is Resampling?
Think of resampling like changing the zoom level on a map.
- Zoomed in = daily data (lots of detail)
- Zoomed out = monthly data (big picture)
graph TD
    A["Daily Data"] --> B{Resample}
    B --> C["Weekly Summary"]
    B --> D["Monthly Summary"]
    B --> E["Yearly Summary"]
The Magic Word: .resample()
# Change daily temps to weekly
weekly_temps = weather.resample('W').mean()
print(weekly_temps)
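The 'W' means 'Week'! Other options:
- 'D' = Day
- 'h' = Hour (written 'H' before pandas 2.2)
- 'ME' = Month end (written 'M' before pandas 2.2)
- 'YE' = Year end (written 'Y' before pandas 2.2)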
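It helps to see where the weekly labels land. The sketch below reuses the ten-day diary and assumes pandas' default weekly rule, where each week ends on a Sunday and is labeled with that Sunday's date.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="D")  # Jan 1, 2024 is a Monday
weather = pd.Series([22, 24, 23, 25, 27, 26, 28, 30, 29, 31], index=dates)

weekly = weather.resample("W").mean()
print(weekly)
# 2024-01-07    25.0   <- Jan 1-7 (seven readings)
# 2024-01-14    30.0   <- Jan 8-10 (only three readings so far)
```

Notice that the second week is averaged from just three days, so a partial week can look more extreme than a full one.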
2. Resample with Aggregation
What is Aggregation?
Aggregation is combining many values into one. Like counting all your candies and saying "I have 50 candies" instead of listing each one!
# Different ways to combine weekly data
weekly_max = weather.resample('W').max() # Hottest day
weekly_min = weather.resample('W').min() # Coldest day
weekly_sum = weather.resample('W').sum() # Total
weekly_count = weather.resample('W').count() # How many days
Multiple Aggregations at Once
# Get many stats at once!
weekly_stats = weather.resample('W').agg([
    'mean', 'min', 'max', 'std'
])
print(weekly_stats)
This gives you a report card for each week!
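When your diary has more than one column, each column can get its own kind of summary. This is a sketch under assumptions: the `diary` DataFrame and its `rain_mm` column are invented for illustration and are not part of the original weather data.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="D")

# Hypothetical two-column diary; the rain values are made up
diary = pd.DataFrame(
    {
        "temp": [22, 24, 23, 25, 27, 26, 28, 30, 29, 31],
        "rain_mm": [0, 5, 0, 0, 12, 3, 0, 0, 8, 0],
    },
    index=dates,
)

# Average the temperature, but add up the rainfall
weekly = diary.resample("W").agg({"temp": "mean", "rain_mm": "sum"})
print(weekly)
```

Passing a dictionary to `.agg()` makes sense here because averaging rainfall or summing temperatures would answer the wrong question.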
3. Downsampling
What is Downsampling?
Downsampling is like making a summary. You start with LOTS of data and end with LESS data.
Story Time: You have a diary with notes every hour. Your mom asks, "How was your day?" You don't read 24 entries - you give a summary!
graph TD
    A["Hourly Data: 24 points"] --> B["Downsample"]
    B --> C["Daily Summary: 1 point"]
    style A fill:#ffcccc
    style C fill:#ccffcc
Example: Hourly to Daily
# Create hourly temperature data
hours = pd.date_range('2024-01-01', periods=48, freq='h')
hourly_temps = pd.Series(
    [20 + i % 10 for i in range(48)],
    index=hours
)
# Downsample to daily average
daily_avg = hourly_temps.resample('D').mean()
print(daily_avg)
Result: 48 hourly readings become 2 daily averages!
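You are not limited to whole days. A frequency string can carry a number in front, so you can pick an in-between zoom level. A small sketch, assuming the same 48 hourly readings; `six_hourly` is an illustrative name.

```python
import pandas as pd

hours = pd.date_range("2024-01-01", periods=48, freq="h")
hourly_temps = pd.Series([20 + i % 10 for i in range(48)], index=hours)

# "6h" means one bucket per six hours
six_hourly = hourly_temps.resample("6h").mean()

print(len(hourly_temps), "->", len(six_hourly))
print(six_hourly.head(2))
```

Here 48 readings shrink to 8 buckets, four per day, which keeps more of the daily shape than a single daily average.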
4. Upsampling
What is Upsampling?
Upsampling is the opposite of downsampling. You start with LESS data and create MORE data.
Story Time: You only measured temperature at noon. But you want to guess the temperature at 3 PM too!
graph TD
    A["Daily Data: 7 points"] --> B["Upsample"]
    B --> C["Hourly Data: 145 points"]
    style A fill:#ccffcc
    style C fill:#ffcccc
Example: Daily to Hourly
# Daily temperatures
daily = pd.Series([20, 25, 22],
                  index=pd.date_range('2024-01-01', periods=3, freq='D'))
# Upsample to hourly
hourly = daily.resample('h').asfreq()
print(hourly.head(10))
Uh oh! We get NaN (empty values) for the new hours!
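To see how empty the upsampled diary really is, you can count the rows. This sketch repeats the three-day example and only adds two checks.

```python
import pandas as pd

daily = pd.Series(
    [20, 25, 22],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
hourly = daily.resample("h").asfreq()

# Rows run from Jan 1 00:00 through Jan 3 00:00, the last original reading
print(len(hourly))

# Only the three original readings carry a value
print(hourly.notna().sum())
```

The new rows stop at the last original timestamp, so three daily points become 49 hourly rows (2 x 24 + 1), not 72.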
Filling the Gaps
# Forward fill: use previous value
hourly_ffill = daily.resample('h').ffill()
# Interpolate: smooth guess between values
hourly_interp = daily.resample('h').interpolate()
| Method | What it does |
|---|---|
| .ffill() | Copy the last known value forward |
| .bfill() | Copy the next known value backward |
| .interpolate() | Draw a smooth line between values |
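The three methods in the table give three different guesses for the same moment. The sketch below compares them at noon on the first day; the `filled` and `noon` names are illustrative, and it relies on `.interpolate()` being linear by default.

```python
import pandas as pd

daily = pd.Series(
    [20, 25, 22],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# One column per filling strategy, all on the same hourly index
filled = pd.DataFrame(
    {
        "ffill": daily.resample("h").ffill(),
        "bfill": daily.resample("h").bfill(),
        "interpolate": daily.resample("h").interpolate(),
    }
)

# Noon on Jan 1 sits halfway between the readings 20 and 25
noon = filled.loc[pd.Timestamp("2024-01-01 12:00")]
print(noon)
```

Forward fill answers 20, backward fill answers 25, and interpolation lands halfway at 22.5. None of these is a real measurement, so pick the one that matches how your data actually behaves.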
5. Rolling Window Calculations
What is a Rolling Window?
Imagine looking through a sliding window at your data. The window shows only a few values at a time, and it slides forward!
Story Time: You want to know "What was the average temperature for the LAST 3 days?" As each new day arrives, you forget the oldest day and include the new one!
graph TD
    A["Day 1, 2, 3"] --> B["Average = 23"]
    C["Day 2, 3, 4"] --> D["Average = 24"]
    E["Day 3, 4, 5"] --> F["Average = 25"]
The Magic: .rolling()
# 3-day rolling average
temps = pd.Series([22, 24, 23, 25, 27, 26, 28])
rolling_avg = temps.rolling(window=3).mean()
print(rolling_avg)
Output:
0 NaN # Not enough data yet
1 NaN # Still not enough
2 23.00 # (22+24+23)/3
3 24.00 # (24+23+25)/3
4 25.00 # (23+25+27)/3
...
The first two values are NaN because we need 3 days to calculate!
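If those leading NaN values get in the way, the `min_periods` argument of `.rolling()` lets the window start with fewer values. A short sketch with the same seven temperatures; `strict` and `lenient` are illustrative names.

```python
import pandas as pd

temps = pd.Series([22, 24, 23, 25, 27, 26, 28])

# Default: wait until the window holds all 3 values
strict = temps.rolling(window=3).mean()

# min_periods=1: average whatever is available so far
lenient = temps.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

The trade-off is that the first few lenient averages are built from fewer days, so they are less smooth than the rest.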
6. Rolling Window Aggregations
Different Rolling Calculations
Just like resampling, you can apply different calculations to your rolling window!
temps = pd.Series([22, 24, 23, 25, 27, 26, 28, 30])
# Different rolling stats
rolling_sum = temps.rolling(3).sum()
rolling_max = temps.rolling(3).max()
rolling_min = temps.rolling(3).min()
rolling_std = temps.rolling(3).std()
Real-World Example: Stock Prices
# 3-day moving average (smooth the bumps!)
stock_prices = pd.Series([100, 102, 98, 105,
                          103, 107, 110, 108])
smooth_price = stock_prices.rolling(3).mean()
This helps you see the trend without the daily noise!
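Real price data often skips days (weekends, holidays), and then "the last 3 rows" is not the same as "the last 3 days". When the index holds dates, `.rolling()` also accepts a time span such as `"3D"`. The readings below are hypothetical and have a deliberate gap.

```python
import pandas as pd

# Hypothetical readings: nothing recorded on Jan 4 or Jan 5
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-06"])
readings = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

by_rows = readings.rolling(3).mean()     # last 3 rows, whatever their dates
by_time = readings.rolling("3D").mean()  # rows within the last 3 calendar days

print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
```

On Jan 6 the row-based window still reaches back to Jan 2, while the time-based window only sees Jan 6 itself. Time-based windows also start producing values right away instead of waiting for a full window.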
7. Expanding Window Calculations
What is Expanding Window?
Unlike rolling (fixed window), expanding includes ALL data from the start up to now!
Story Time: On day 1, you calculate the average of 1 day. On day 5, you calculate the average of ALL 5 days. The window keeps expanding!
graph TD
    A["Day 1"] --> B["Avg of 1 day"]
    C["Day 1-2"] --> D["Avg of 2 days"]
    E["Day 1-3"] --> F["Avg of 3 days"]
    G["Day 1-4"] --> H["Avg of 4 days"]
Example
temps = pd.Series([22, 24, 23, 25, 27])
# Running average (all data so far)
expanding_avg = temps.expanding().mean()
print(expanding_avg)
Output:
0 22.00 # 22/1
1 23.00 # (22+24)/2
2 23.00 # (22+24+23)/3
3 23.50 # (22+24+23+25)/4
4 24.20 # (22+24+23+25+27)/5
Rolling vs Expanding
| Rolling | Expanding |
|---|---|
| Fixed window size | Growing window |
| Forgets old data | Remembers all data |
| Good for recent trends | Good for cumulative stats |
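Expanding windows work with other statistics too, and two of them have familiar shortcuts. The sketch below uses the same five temperatures to track the hottest day so far and the running total.

```python
import pandas as pd

temps = pd.Series([22, 24, 23, 25, 27])

running_max = temps.expanding().max()  # hottest day seen so far
running_sum = temps.expanding().sum()  # total of all days so far

print(running_max.tolist())
print(running_sum.tolist())

# The cumulative shortcuts give the same numbers
print(temps.cummax().tolist(), temps.cumsum().tolist())
```

For sums and maximums, `cumsum()` and `cummax()` are the shorter spelling. `expanding()` earns its place when you need a statistic without a shortcut, such as the mean or standard deviation.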
8. Exponential Weighted Functions
What are Exponential Weights?
Some values matter MORE than others! Recent data is more important than old data.
Story Time: If someone asks "How are you feeling?", yesterday's feelings matter more than last month's feelings!
The Magic: .ewm()
temps = pd.Series([22, 24, 23, 25, 27, 26, 28])
# Exponentially weighted average
# span=3 gives alpha = 2 / (3 + 1) = 0.5, so the weight halves with each step back
ewm_avg = temps.ewm(span=3).mean()
print(ewm_avg)
Parameters Explained
| Parameter | What it means |
|---|---|
| span | How many periods to focus on |
| alpha | Smoothing factor (0 to 1) |
| halflife | Time for weight to drop to half |
# Different ways to set the weighting
temps.ewm(span=3).mean() # Using span
temps.ewm(alpha=0.5).mean() # Using alpha
temps.ewm(halflife=2).mean() # Using halflife
Higher alpha = more weight on recent data!
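To peek under the hood, here is a sketch that rebuilds the weighted average by hand. It uses `adjust=False`, which is not the pandas default, because that setting follows the simple rule "new average = (1 - alpha) x old average + alpha x new value". The `manual` list is illustrative.

```python
import pandas as pd

temps = pd.Series([22, 24, 23, 25, 27, 26, 28], dtype=float)

span = 3
alpha = 2 / (span + 1)  # pandas' conversion from span to alpha: 0.5 here

by_span = temps.ewm(span=span, adjust=False).mean()
by_alpha = temps.ewm(alpha=alpha, adjust=False).mean()

# Hand-rolled version of the adjust=False recursion
manual = [temps.iloc[0]]
for x in temps.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

print(by_span.tolist())
print(manual)
```

All three give the same numbers, which shows that `span` and `alpha` are two dials for the same setting. With the default `adjust=True`, the early values differ slightly because pandas rescales the weights while the history is still short.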
9. Shifting Data
What is Shifting?
Shifting moves your data forward or backward in time. It's like time travel for your data!
Story Time: "What was the temperature YESTERDAY?" To answer that, slide every value forward by 1 day so each date shows the reading from the day before!
graph LR
    A["Today: 25"] --> B["Yesterday: 24"]
    B --> C["Day Before: 23"]
Example: .shift()
temps = pd.Series([22, 24, 23, 25, 27],
                  index=pd.date_range('2024-01-01',
                                      periods=5, freq='D'))
# Shift forward (lag)
yesterday = temps.shift(1)
print(yesterday)
Output:
2024-01-01 NaN # No yesterday!
2024-01-02 22.0 # Was 22 on Jan 1
2024-01-03 24.0 # Was 24 on Jan 2
2024-01-04 23.0
2024-01-05 25.0
Practical Use: Calculate Change
# How much did temperature change each day?
temp_change = temps - temps.shift(1)
print(temp_change)
Output:
2024-01-01 NaN
2024-01-02 2.0 # 24 - 22 = +2
2024-01-03 -1.0 # 23 - 24 = -1
2024-01-04 2.0 # 25 - 23 = +2
2024-01-05 2.0 # 27 - 25 = +2
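This subtract-the-shift pattern is common enough that pandas has shortcuts for it. The sketch below compares the manual version with `.diff()` and also shows `.pct_change()` for the change as a percentage; `manual_change`, `built_in`, and `pct` are illustrative names.

```python
import pandas as pd

temps = pd.Series(
    [22, 24, 23, 25, 27],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

manual_change = temps - temps.shift(1)
built_in = temps.diff()          # same result in one call
pct = temps.pct_change() * 100   # change relative to the previous day, in percent

print(built_in)
print(pct.round(2))
```

Both versions leave NaN on the first day, because there is no earlier day to compare against.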
Shift Backward (Future Peek)
# Peek at TOMORROW's temperature
tomorrow = temps.shift(-1)
Negative shift = look into the future!
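There is a second flavor of time travel: instead of sliding the values, you can slide the dates. Passing `freq` to `.shift()` moves the index and leaves the values where they are. A minimal sketch with the same five temperatures; `moved_values` and `moved_dates` are illustrative names.

```python
import pandas as pd

temps = pd.Series(
    [22, 24, 23, 25, 27],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

moved_values = temps.shift(1)           # same dates, values slide down, NaN appears
moved_dates = temps.shift(1, freq="D")  # same values, every date moves one day later

print(moved_values)
print(moved_dates)
```

The date-shifting version loses no data, since nothing falls off the end. It simply relabels each reading with the next day's date.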
Quick Reference Table
| Task | Method | Example |
|---|---|---|
| Change frequency | .resample() | df.resample('W').mean() |
| Combine values | .agg() | df.resample('ME').agg(['sum', 'mean']) |
| Reduce data | Downsample | hourly.resample('D').mean() |
| Increase data | Upsample | daily.resample('h').interpolate() |
| Moving average | .rolling() | df.rolling(7).mean() |
| Cumulative stat | .expanding() | df.expanding().sum() |
| Weighted average | .ewm() | df.ewm(span=10).mean() |
| Time travel | .shift() | df.shift(1) |
Your Time Series Superpowers!
You've just learned how to:
- Resample - Change the zoom level of your data
- Aggregate - Combine values in smart ways
- Downsample - Summarize detailed data
- Upsample - Fill in missing details
- Rolling - Calculate moving statistics
- Expanding - Track cumulative growth
- EWM - Give more weight to recent data
- Shift - Compare with past or future values
Now go analyze some time series data like a pro! You've got this!
