🎯 NumPy: The Number Detective’s Toolkit
The Story of the Number Detective
Imagine you’re a detective with a giant box of colorful marbles. Each marble has a number written on it. Your job? Find secrets hidden in those numbers!
NumPy is your magnifying glass—it helps you discover things like:
- Which marble has the biggest number?
- What’s the average of all marbles?
- How do numbers add up as you go?
Let’s become number detectives together! 🔍
🧮 Aggregation Functions: Squishing Many Into One
What’s aggregation? It’s like squishing a whole bag of gummy bears into one super gummy bear that tells you something about ALL of them.
The Big Three: Sum, Mean, Product
import numpy as np
marbles = np.array([2, 4, 6, 8])
# Sum: Add all marbles together
total = np.sum(marbles) # 20
# Mean: The "fair share" number
average = np.mean(marbles) # 5.0
# Product: Multiply all together
multiplied = np.prod(marbles) # 384
Think of it this way:
- Sum = How many candies if you dump all bags together?
- Mean = If everyone gets equal candies, how many each?
- Product = What if each number were a multiplier in a game?
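To see what the three functions actually compute, here is a small sketch that redoes them with plain Python loops and compares the results with NumPy. The helper names loop_sum, loop_mean and loop_prod are made up for illustration. In practice you would just call the NumPy functions.

```python
import numpy as np

marbles = np.array([2, 4, 6, 8])

def loop_sum(values):
    # Add every value into a running total
    total = 0
    for v in values:
        total += v
    return total

def loop_mean(values):
    # Mean = sum divided by how many values there are
    return loop_sum(values) / len(values)

def loop_prod(values):
    # Start at 1 (not 0!) so the first multiplication keeps its value
    result = 1
    for v in values:
        result *= v
    return result

print(loop_sum(marbles), np.sum(marbles))    # 20 20
print(loop_mean(marbles), np.mean(marbles))  # 5.0 5.0
print(loop_prod(marbles), np.prod(marbles))  # 384 384
```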
🏆 Min, Max & Arg Functions: Finding Champions
Finding the Biggest and Smallest
scores = np.array([85, 92, 78, 96, 88])
# What is the highest score?
best = np.max(scores) # 96
# What is the lowest?
lowest = np.min(scores) # 78
But WHERE Are They? (Arg Functions!)
Sometimes you don’t just want the value—you want the position!
# WHERE is the highest score?
best_position = np.argmax(scores) # 3
# WHERE is the lowest score?
worst_position = np.argmin(scores) # 2
Real-life example: In a race, the fastest (smallest) time wins, so min() tells you the winning time and argmin() tells you which runner won!
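Here is a sketch of the race example. The finishing times (in seconds) and the runner names are made up for illustration. Lower times are better, so the winner comes from the min/argmin pair, not max/argmax.

```python
import numpy as np

# Made-up finishing times in seconds, one per runner
times = np.array([12.4, 11.9, 12.8, 11.7, 12.1])
runners = ["Ava", "Ben", "Cai", "Dee", "Eli"]  # hypothetical names

winning_time = np.min(times)      # the fastest (smallest) time
winner_index = np.argmin(times)   # position of that time in the array
print(winning_time, runners[winner_index])  # 11.7 Dee
```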
📈 Cumulative Operations: Running Totals
Imagine you’re counting coins as you find them, one by one. That’s cumulative!
Cumulative Sum
coins = np.array([5, 3, 7, 2])
running_total = np.cumsum(coins)
# [5, 8, 15, 17]
# 5 → 5+3=8 → 8+7=15 → 15+2=17
Cumulative Product
multipliers = np.array([2, 3, 2])
running_product = np.cumprod(multipliers)
# [2, 6, 12]
# 2 → 2×3=6 → 6×2=12
Like a snowball rolling downhill, it keeps getting bigger, as long as each new number is positive (or, for products, bigger than 1)!
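The arrays above show two handy facts. First, the last entry of a cumulative sum (or product) is the same as the plain sum (or product). Second, a running total only keeps growing while the values are positive. The balance_changes array, which includes negative values, is made up for illustration.

```python
import numpy as np

coins = np.array([5, 3, 7, 2])
multipliers = np.array([2, 3, 2])

# The final running total equals the ordinary sum / product
print(np.cumsum(coins)[-1], np.sum(coins))                 # 17 17
print(np.cumprod(multipliers)[-1], np.prod(multipliers))   # 12 12

# With negative values the running total can shrink (made-up data)
balance_changes = np.array([5, -3, 7, -2])
print(np.cumsum(balance_changes))  # [5 2 9 7]
```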
🎚️ The Axis Parameter: Rows vs Columns
When your numbers live in a grid (2D array), you can squish them in different directions!
grid = np.array([
[1, 2, 3],
[4, 5, 6]
])
Picture it like this: axis=0 squishes DOWN each column → [5, 7, 9]; axis=1 squishes RIGHT along each row → [6, 15]; no axis squishes ALL → 21.
# Sum down columns (axis=0)
np.sum(grid, axis=0) # [5, 7, 9]
# Sum across rows (axis=1)
np.sum(grid, axis=1) # [6, 15]
# Sum everything (no axis)
np.sum(grid) # 21
Memory trick:
- axis=0 → Vertical crush (columns survive)
- axis=1 → Horizontal crush (rows survive)
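You can check the memory trick by looking at the shapes of the results. On this (2, 3) grid, axis=0 removes the row dimension, so one result per column survives. axis=1 removes the column dimension, so one result per row survives. The same axis rule works for other aggregations such as np.max.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(grid.shape)                   # (2, 3): 2 rows, 3 columns
print(np.sum(grid, axis=0).shape)   # (3,): one sum per column
print(np.sum(grid, axis=1).shape)   # (2,): one sum per row
print(np.max(grid, axis=0))         # [4 5 6]: biggest in each column
```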
📐 Keepdims Parameter: Keeping Shape
When you squish numbers, your array gets smaller. But sometimes you need it to stay the same shape!
grid = np.array([
[1, 2, 3],
[4, 5, 6]
])
# Without keepdims: shape becomes (2,)
row_sums = np.sum(grid, axis=1)
# [6, 15] - just a flat list
# With keepdims: shape stays (2, 1)
row_sums_kept = np.sum(grid, axis=1,
keepdims=True)
# [[6],
# [15]] - still looks like a column!
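Why does this matter? When you do math between the result and the original grid, the shapes have to be compatible for broadcasting. A (2, 1) column lines up with the (2, 3) grid row by row, but a flat (2,) array does not.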
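For example, you can turn each row into fractions of its own total by dividing the grid by its row sums. In this sketch, the (2, 1) result from keepdims=True broadcasts across the (2, 3) grid. The flat (2,) result does not line up, so NumPy raises a ValueError.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

row_sums_kept = np.sum(grid, axis=1, keepdims=True)  # shape (2, 1)
fractions = grid / row_sums_kept                     # each row now sums to 1
print(np.allclose(fractions.sum(axis=1), 1.0))       # True

row_sums = np.sum(grid, axis=1)  # shape (2,)
try:
    grid / row_sums              # (2, 3) vs (2,): trailing sizes 3 and 2 clash
except ValueError as err:
    print("Shapes don't fit:", err)
```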
📊 Statistical Functions: Understanding Your Data
Standard Deviation & Variance
These tell you how spread out your numbers are.
test_scores = np.array([70, 75, 80, 85, 90])
# Variance: Average of squared differences from the mean
variance = np.var(test_scores) # 50.0
# Std Dev: Square root of variance
std_dev = np.std(test_scores) # 7.07
Analogy: Imagine kids standing in a line.
- Low std dev = Everyone’s about the same height
- High std dev = Some are very short, some very tall
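To connect the definition to the numbers, here is the variance computed step by step. By default, np.var and np.std divide by the number of values N (population variance, ddof=0). Passing ddof=1 divides by N - 1 instead, which is the usual "sample" estimate.

```python
import numpy as np

test_scores = np.array([70, 75, 80, 85, 90])

mean = test_scores.mean()                   # 80.0
squared_diffs = (test_scores - mean) ** 2   # [100, 25, 0, 25, 100]
variance = squared_diffs.mean()             # 250 / 5 = 50.0
std_dev = np.sqrt(variance)                 # about 7.07

print(variance, np.var(test_scores))              # 50.0 50.0
print(np.isclose(std_dev, np.std(test_scores)))   # True
print(np.var(test_scores, ddof=1))                # 250 / 4 = 62.5
```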
Other Helpful Stats
data = np.array([3, 1, 4, 1, 5, 9, 2, 6])
np.median(data) # 3.5 (average of the two middle values, 3 and 4)
np.ptp(data) # 8 (peak-to-peak range: max minus min)
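Both numbers can be checked by hand. With an even number of values, the median averages the two middle values of the sorted data. np.ptp is simply the max minus the min.

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

sorted_data = np.sort(data)        # [1 1 2 3 4 5 6 9]
middle = len(sorted_data) // 2     # 8 values -> middle pair at indices 3 and 4
median_by_hand = (sorted_data[middle - 1] + sorted_data[middle]) / 2
print(median_by_hand, np.median(data))  # 3.5 3.5

range_by_hand = data.max() - data.min()
print(range_by_hand, np.ptp(data))      # 8 8
```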
📏 Percentiles & Quantiles: Dividing the Data
Percentiles
“What value is bigger than X% of the data?”
scores = np.array([10, 20, 30, 40, 50,
60, 70, 80, 90, 100])
# 50th percentile = median
np.percentile(scores, 50) # 55.0
# 25th percentile = first quartile
np.percentile(scores, 25) # 32.5
# 90th percentile = cutoff for the top 10%
np.percentile(scores, 90) # 91.0
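Why 32.5 and not 30? NumPy's default method ("linear") places the p-th percentile at position (n - 1) * p / 100 in the sorted data. When that position is not a whole number, it interpolates between the two neighbouring values. Below is a hand-rolled sketch of that rule. The helper name linear_percentile is hypothetical.

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

def linear_percentile(values, p):
    # Hypothetical helper mirroring NumPy's default "linear" method
    ordered = np.sort(values)
    position = (len(ordered) - 1) * p / 100   # e.g. 9 * 25 / 100 = 2.25
    lower = int(np.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower               # how far past the lower neighbour
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

for p in (25, 50, 90):
    print(p, linear_percentile(scores, p), np.percentile(scores, p))
# 25 32.5 32.5
# 50 55.0 55.0
# 90 91.0 91.0
```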
Quantiles
Same idea, but using 0-1 instead of 0-100:
# 0.5 quantile = 50th percentile
np.quantile(scores, 0.5) # 55.0
# 0.25 quantile = 25th percentile
np.quantile(scores, 0.25) # 32.5
Real-world use: “You scored in the 95th percentile!” means you did better than 95% of people.
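You can also go the other way, from a score to its percentile rank. One way to sketch this is as the share of scores strictly below yours. Definitions of percentile rank vary (some count ties as half), so the strict "below" convention and the your_score value here are assumptions for illustration.

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
your_score = 85  # hypothetical score

# Fraction of scores strictly below yours, as a percentage
percentile_rank = np.mean(scores < your_score) * 100
print(percentile_rank)  # 80.0 -> better than 80% of the scores
```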
⚖️ Weighted Average: Not All Numbers Are Equal
Sometimes certain values matter more than others!
Regular Average vs Weighted Average
grades = np.array([90, 80, 70])
# Regular average: all equal
np.mean(grades) # 80.0
# Weighted: some count more!
weights = np.array([3, 2, 1])
# 90 counts 3x, 80 counts 2x, 70 counts 1x
np.average(grades, weights=weights) # 83.33
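A weighted average multiplies each value by its weight, adds those products up, and divides by the total weight. This sketch checks that recipe against np.average:

```python
import numpy as np

grades = np.array([90, 80, 70])
weights = np.array([3, 2, 1])

# (90*3 + 80*2 + 70*1) / (3 + 2 + 1) = 500 / 6
by_hand = np.sum(grades * weights) / np.sum(weights)
print(by_hand)                                                   # 83.33333333333333
print(np.isclose(by_hand, np.average(grades, weights=weights)))  # True
```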
Real-life example:
- Homework = 20% of grade
- Tests = 50% of grade
- Final = 30% of grade
scores = np.array([85, 78, 92])
weights = np.array([0.2, 0.5, 0.3])
final_grade = np.average(scores,
weights=weights)
# 83.6
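np.average divides by the sum of the weights, so the weights do not have to add up to 1. Writing them as whole percentages gives the same final grade: 85 * 0.2 + 78 * 0.5 + 92 * 0.3 = 17 + 39 + 27.6 = 83.6.

```python
import numpy as np

scores = np.array([85, 78, 92])

as_fractions = np.average(scores, weights=[0.2, 0.5, 0.3])
as_percents = np.average(scores, weights=[20, 50, 30])  # same ratios

print(np.isclose(as_fractions, as_percents))  # True
print(round(as_percents, 2))                  # 83.6
```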
🎮 Quick Reference Card
| Function | What It Does | Example |
|---|---|---|
| np.sum() | Add all | [1,2,3] → 6 |
| np.mean() | Average | [2,4,6] → 4 |
| np.max() | Biggest | [5,9,3] → 9 |
| np.argmax() | Position of biggest | [5,9,3] → 1 |
| np.cumsum() | Running total | [1,2,3] → [1,3,6] |
| np.std() | Spread | Measures variation |
| np.percentile() | Ranking position | Top X% |
| np.average() | Weighted mean | With importance |
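A quick way to confirm the card is to run each example and check the result. The card only describes np.std, np.percentile and np.average in words, so the inputs used for them below are made up.

```python
import numpy as np

assert np.sum([1, 2, 3]) == 6
assert np.mean([2, 4, 6]) == 4
assert np.max([5, 9, 3]) == 9
assert np.argmax([5, 9, 3]) == 1
assert np.array_equal(np.cumsum([1, 2, 3]), [1, 3, 6])
assert np.std([5, 5, 5]) == 0.0                     # no spread at all
assert np.percentile([1, 2, 3, 4, 5], 100) == 5     # 100th percentile is the max
assert np.average([1, 3], weights=[1, 3]) == 2.5    # (1*1 + 3*3) / 4
print("All quick-reference examples check out!")
```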
🚀 You’re Now a Number Detective!
You’ve learned to:
- ✅ Squish numbers with aggregations
- ✅ Find champions with min/max
- ✅ Track running totals with cumsum
- ✅ Control direction with axis
- ✅ Keep shapes with keepdims
- ✅ Measure spread with statistics
- ✅ Rank with percentiles
- ✅ Weigh with averages
You’ll reach for these tools again and again in everyday data work. Now you know them too!
Next step: Practice! Try these on your own arrays and see what secrets you can discover. 🔍✨