🔍 Handling Special Values in NumPy
When Numbers Go Wild!
🎭 The Story: The Detective’s Mystery Box
Imagine you’re a detective, and someone gives you a mystery box full of numbers. Most numbers are normal—like 5, 10, 3.14. But some numbers are weird:
- 🕳️ Empty slots where numbers should be (like missing puzzle pieces)
- ♾️ Infinity (numbers so big they never end!)
- 🚫 “Not a Number” (NaN) — things that broke and can’t be fixed
Your job? Find these weird values and handle them like a pro!
📦 What Are “Special Values”?
In the world of numbers, most values are normal. But computers sometimes create special values when things go wrong:
| Special Value | What It Means | Real-Life Example |
|---|---|---|
| np.nan | Not a Number | A survey question left blank |
| np.inf | Positive Infinity | Dividing a positive float by zero (np.float64(1) / 0) |
| -np.inf | Negative Infinity | Dividing a negative float by zero (np.float64(-1) / 0) |
Think of it like this:
- Normal numbers = People at a party 🎉
- NaN = Ghost at the party 👻 (you can see it, but it’s not really there)
- Infinity = Someone who never stops talking ♾️
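To see where these values come from, here is a small sketch. It assumes NumPy floats: plain Python `1 / 0` raises `ZeroDivisionError` instead of giving infinity. The variable names are just for illustration.

```python
import numpy as np

# Dividing NumPy floats by zero normally emits a RuntimeWarning;
# errstate silences it for this demonstration only.
with np.errstate(divide="ignore", invalid="ignore"):
    pos = np.float64(1.0) / 0.0        # positive infinity
    neg = np.float64(-1.0) / 0.0       # negative infinity
    undefined = np.float64(0.0) / 0.0  # NaN: the result is undefined

print(pos, neg, undefined)

# NaN is never equal to anything, not even itself,
# so == cannot be used to find it.
print(np.nan == np.nan)     # False
print(np.isnan(undefined))  # True
```

That last comparison is why the detection functions in the next part exist.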
🔎 Part 1: Checking for Special Values
Finding the Ghosts (NaN Detection)
import numpy as np
# Create an array with some ghosts (NaN)
data = np.array([1, 2, np.nan, 4, np.nan])
# Find where the ghosts are hiding
ghosts = np.isnan(data)
print(ghosts)
# Output: [False False True False True]
What happened?
- np.isnan() is like a ghost detector 👻
- It returns True where NaN is hiding
- It returns False for normal numbers
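A True/False mask is handy, but sometimes you want the positions themselves, or just a quick yes/no answer. One possible way to get both, reusing the same example array (the names `nan_positions` and `has_nan` are made up for this sketch):

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, np.nan])

# Turn the boolean mask into the integer positions of the NaN entries
nan_positions = np.flatnonzero(np.isnan(data))
print(nan_positions)  # [2 4]

# Quick yes/no check: is there any NaN at all?
has_nan = np.isnan(data).any()
print(has_nan)  # True
```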
Finding Infinity
# Create an array with infinity
data = np.array([1, np.inf, -np.inf, 4])
# Find any infinity (positive or negative)
has_inf = np.isinf(data)
print(has_inf)
# Output: [False True True False]
# Find ONLY positive infinity
pos_inf = np.isposinf(data)
print(pos_inf)
# Output: [False True False False]
# Find ONLY negative infinity
neg_inf = np.isneginf(data)
print(neg_inf)
# Output: [False False True False]
The “Are You Normal?” Check
# Check if values are "finite" (not NaN, not Inf)
data = np.array([1, np.nan, np.inf, 4])
normal_ones = np.isfinite(data)
print(normal_ones)
# Output: [True False False True]
np.isfinite() = Your “normal number” detector!
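Because np.isfinite() rejects NaN and both infinities at once, it can double as a one-step filter. A minimal sketch, with illustrative variable names:

```python
import numpy as np

data = np.array([1.0, np.nan, np.inf, -np.inf, 4.0])

# Keep only the ordinary, finite numbers
finite_only = data[np.isfinite(data)]
print(finite_only)  # [1. 4.]

# Count how many entries were NaN or infinite
problem_count = np.count_nonzero(~np.isfinite(data))
print(problem_count)  # 3
```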
🧹 Part 2: Handling NaN Values
NaN values are like holes in your data. Here’s how to deal with them:
Strategy 1: Count the Ghosts
data = np.array([1, np.nan, 3, np.nan, 5])
# How many ghosts do we have?
ghost_count = np.sum(np.isnan(data))
print(f"Found {ghost_count} ghosts!")
# Output: Found 2 ghosts!
Strategy 2: Replace the Ghosts
data = np.array([1, np.nan, 3, np.nan, 5])
# Replace ghosts with 0
clean_data = np.where(np.isnan(data), 0, data)
print(clean_data)
# Output: [1. 0. 3. 0. 5.]
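Zero is not always the right replacement. Two alternatives are sketched below: np.nan_to_num(), which can also replace infinities (its `nan` and `posinf` keyword arguments need NumPy 1.17 or newer), and filling with the mean of the non-NaN values. The cap of 1e6 is an arbitrary value chosen for illustration.

```python
import numpy as np

data = np.array([1, np.nan, 3, np.inf, 5])

# Replace NaN with 0 and +inf with a chosen cap (both values are illustrative)
cleaned = np.nan_to_num(data, nan=0.0, posinf=1e6)
print(cleaned)

# Alternative: fill NaN with the mean of the values that are present
values = np.array([1, np.nan, 3, np.nan, 5])
filled = np.where(np.isnan(values), np.nanmean(values), values)
print(filled)  # [1. 3. 3. 3. 5.]
```

Which fill value makes sense depends on the data: a zero can drag an average down, while a mean keeps the average unchanged but hides how much was missing.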
Strategy 3: Remove the Ghosts Entirely
data = np.array([1, np.nan, 3, np.nan, 5])
# Keep only the normal numbers
no_ghosts = data[~np.isnan(data)]
print(no_ghosts)
# Output: [1. 3. 5.]
The ~ symbol means “NOT” — so ~np.isnan(data) means “NOT a ghost”!
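The same idea extends to 2D data, where you usually drop a whole row if any value in it is missing. A small sketch under that assumption (the `table` array is invented for the example):

```python
import numpy as np

table = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [5.0, 6.0],
])

# True for each row that contains at least one NaN
row_has_nan = np.isnan(table).any(axis=1)

# Keep only the rows with no ghosts
complete_rows = table[~row_has_nan]
print(complete_rows)
```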
🧮 Part 3: NaN-Safe Aggregations
The Problem: NaN Ruins Everything!
data = np.array([10, 20, np.nan, 40])
# Regular sum includes the ghost
regular_sum = np.sum(data)
print(regular_sum)
# Output: nan 😱 One ghost ruined it all!
The Solution: NaN-Safe Functions!
NumPy has special “nan-safe” versions that ignore the ghosts:
data = np.array([10, 20, np.nan, 40])
# NaN-safe sum (ignores ghosts)
safe_sum = np.nansum(data)
print(safe_sum) # Output: 70.0 ✅
# NaN-safe mean
safe_mean = np.nanmean(data)
print(safe_mean) # Output: 23.33... ✅
# NaN-safe max
safe_max = np.nanmax(data)
print(safe_max) # Output: 40.0 ✅
# NaN-safe min
safe_min = np.nanmin(data)
print(safe_min) # Output: 10.0 ✅
# NaN-safe standard deviation
safe_std = np.nanstd(data)
print(safe_std) # Output: 12.47... ✅
🎯 The NaN-Safe Family (Most Common Members)
| Regular Function | NaN-Safe Version | What It Does |
|---|---|---|
| np.sum() | np.nansum() | Add all numbers |
| np.mean() | np.nanmean() | Find average |
| np.max() | np.nanmax() | Find biggest |
| np.min() | np.nanmin() | Find smallest |
| np.std() | np.nanstd() | Find spread |
| np.var() | np.nanvar() | Find variance |
| np.median() | np.nanmedian() | Find middle |
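These functions also accept an `axis` argument, and they behave differently when a row contains nothing but NaN. The sketch below uses a made-up `readings` array to show both points; the warning filter is only there to keep the demo output quiet.

```python
import warnings

import numpy as np

readings = np.array([
    [10.0, np.nan, 30.0],
    [np.nan, np.nan, np.nan],  # a row with no real values at all
])

# Row-wise sum: an all-NaN row sums to 0.0
row_sums = np.nansum(readings, axis=1)
print(row_sums)  # [40.  0.]

# Row-wise mean: an all-NaN row has nothing to average, so the result
# is NaN and NumPy emits a RuntimeWarning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(readings, axis=1)
print(row_means)  # [20. nan]
```

So "nan-safe" does not mean "never returns NaN": if everything is missing, there is still no average to report.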
📏 Part 4: Floating Point Limits
What’s the Biggest Number a Computer Can Handle?
Computers have limits! NumPy tells us these limits:
# Get info about float64 (default)
info = np.finfo(np.float64)
print(f"Biggest number: {info.max}")
print(f"Smallest positive normal number: {info.tiny}")
print(f"Gap between 1.0 and the next float: {info.eps}")
Output:
Biggest number: 1.7976931348623157e+308
Smallest positive normal number: 2.2250738585072014e-308
Gap between 1.0 and the next float: 2.220446049250313e-16
Understanding the Limits
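# Machine epsilon: the gap between 1.0 and the next larger float
eps = np.finfo(float).eps
print(f"Epsilon: {eps}")
# Adding eps/2 lands halfway to the next float, and the sum rounds back to 1.0
print(1.0 + eps/2 == 1.0) # True 😮
# Adding a full eps reaches the next float, so the sum is different from 1.0
print(1.0 + eps == 1.0) # False ✅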
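Two practical consequences follow. The size of a "step" grows with the size of the number, and exact equality checks on floats can surprise you. A short sketch using np.spacing() and np.isclose(), both standard NumPy functions:

```python
import numpy as np

eps = np.finfo(float).eps

# The gap to the next float depends on how big the number is
print(np.spacing(1.0) == eps)  # True: at 1.0 the gap is exactly eps
print(np.spacing(1e16))        # 2.0: up here, neighbors are 2 apart

# Rounding error makes exact comparison unreliable
print(0.1 + 0.2 == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))  # True: compares within a tolerance
```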
The Key Numbers to Know
info = np.finfo(np.float64)
# 1. Maximum value (overflow beyond this)
print(info.max) # ~1.8e+308
# 2. Smallest positive normal value (below this come subnormals, then zero)
print(info.tiny) # ~2.2e-308
# 3. Machine epsilon (precision limit)
print(info.eps) # ~2.2e-16
# 4. Number of decimal digits of precision
print(info.precision) # 15
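These limits depend on the data type. As a rough comparison, the sketch below prints the same numbers for float32 and float64, which shows why float32 runs out of range and precision much sooner.

```python
import numpy as np

# Print the key limits side by side for two float types
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{dtype.__name__}: max={info.max}, eps={info.eps}, "
          f"digits={info.precision}")

f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
```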
What Happens at the Limits?
# Going beyond max creates infinity!
big = np.float64(1e308)
too_big = big * 10
print(too_big) # inf
# Really tiny numbers become zero
tiny = np.float64(1e-308)
too_tiny = tiny / 1e100
print(too_tiny) # 0.0 (underflow)
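By default an overflow only produces a warning, so infinity can slip into your results unnoticed. If you would rather stop at the first overflow, np.errstate() can turn it into an exception. The sketch below also shows that the numbers do not jump straight from info.tiny to zero: np.nextafter() reveals a much smaller positive value in between.

```python
import numpy as np

values = np.array([1e308])
overflowed = False

# Turn silent overflow into an exception inside this block only
try:
    with np.errstate(over="raise"):
        values * 10
except FloatingPointError as exc:
    overflowed = True
    print(f"Caught: {exc}")

# The smallest positive float64 is far below info.tiny (a "subnormal")
smallest = np.nextafter(0.0, 1.0)
print(smallest)
print(0.0 < smallest < np.finfo(np.float64).tiny)  # True
```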
🎨 Visual Summary
graph LR
  A["Special Values"] --> B["NaN"]
  A --> C["Infinity"]
  A --> D["Limits"]
  B --> B1["np.isnan - Find NaN"]
  B --> B2["np.nansum - Safe math"]
  B --> B3["Replace/Remove NaN"]
  C --> C1["np.isinf - Find infinity"]
  C --> C2["np.isposinf - Positive only"]
  C --> C3["np.isneginf - Negative only"]
  D --> D1["np.finfo - Get limits"]
  D --> D2["max - Biggest number"]
  D --> D3["eps - Precision"]
🚀 Quick Reference
Detection Functions
np.isnan(x) # Find NaN
np.isinf(x) # Find infinity
np.isfinite(x) # Find normal numbers
np.isposinf(x) # Find +infinity
np.isneginf(x) # Find -infinity
NaN-Safe Math
np.nansum(x) # Sum ignoring NaN
np.nanmean(x) # Mean ignoring NaN
np.nanmax(x) # Max ignoring NaN
np.nanmin(x) # Min ignoring NaN
np.nanstd(x) # Std ignoring NaN
Float Limits
np.finfo(float).max # Biggest number
np.finfo(float).eps # Gap between 1.0 and the next float
np.finfo(float).tiny # Smallest positive normal number
🎯 Key Takeaways
- NaN is sneaky: it spreads like a virus in calculations!
- Use nan-safe functions: they ignore the ghosts
- Check your data first: np.isnan() is your friend
- Computers have limits: know them with np.finfo()
- Infinity happens: when you go beyond the limits or divide a float by zero
Remember: Good data scientists always check for special values FIRST! 🔍
