NumPy Essentials: Your Data Superpower 🚀
Imagine you have a magical toolbox. Instead of hammers and screwdrivers, it holds powerful tools that can crunch millions of numbers in the blink of an eye. That toolbox is called NumPy—and today, you’re going to learn how to use it!
🎯 What is NumPy?
Think of NumPy like a super-organized shelf for your data.
Regular Python lists are like throwing toys into a messy box: you can store anything, but doing math on thousands of items means handling them one by one, which is slow.
NumPy arrays are like a perfectly organized bookshelf—every item has its exact spot, and you can grab or change things lightning fast.
Why does this matter?
- Data scientists use NumPy to analyze millions of data points
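- For math on large collections of numbers, it is often many times faster than looping over regular Python lists (the exact speedup depends on the task)
- Much of the Python data ecosystem (Pandas, SciPy, scikit-learn, and more) is built on NumPy, and tools like TensorFlow work directly with NumPy arrays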
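To see the speed difference on your own machine, here is a rough timing sketch (the variable names are made up for illustration). It doubles one million numbers with a Python loop and with a single NumPy operation. Timings vary by computer and NumPy version, so no numbers are promised here; the point is that both give the same answer, one much more directly.

```python
import time

import numpy as np

# One million numbers, stored two ways
values_list = list(range(1_000_000))
values_array = np.arange(1_000_000)

# Plain Python: a loop touches every item one at a time
start = time.perf_counter()
doubled_list = [v * 2 for v in values_list]
list_seconds = time.perf_counter() - start

# NumPy: one vectorized operation on the whole array
start = time.perf_counter()
doubled_array = values_array * 2
array_seconds = time.perf_counter() - start

print(f"list loop:   {list_seconds:.4f} s")
print(f"NumPy array: {array_seconds:.4f} s")

# Both approaches produce exactly the same numbers
print(doubled_array.tolist() == doubled_list)  # True
```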
📦 NumPy Arrays: Your Data Containers
What is an Array?
An array is like a row of lockers. Each locker holds one item, and each has a number (starting from 0).
import numpy as np
# Creating your first array
my_locker = np.array([10, 20, 30, 40, 50])
print(my_locker)
# Output: [10 20 30 40 50]
Creating Arrays Different Ways
1. From a regular list:
grades = np.array([85, 92, 78, 95, 88])
2. Array of zeros (empty lockers):
empty = np.zeros(5)
# Output: [0. 0. 0. 0. 0.]
3. Array of ones:
ones = np.ones(4)
# Output: [1. 1. 1. 1.]
4. Range of numbers:
counting = np.arange(1, 6)
# Output: [1 2 3 4 5]
5. Evenly spaced numbers:
even_split = np.linspace(0, 10, 5)
# Output: [0. 2.5 5. 7.5 10.]
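Two details are easy to mix up here: np.arange works like Python's range and stops before the stop value, while np.linspace includes it. Also, zeros and ones are floats unless you ask otherwise. A small sketch (the names evens, quarters, and int_zeros are just examples):

```python
import numpy as np

# arange stops BEFORE the stop value; the third argument is the step
evens = np.arange(0, 10, 2)
print(evens)                  # [0 2 4 6 8]

# linspace INCLUDES the stop value and works out the step for you
quarters = np.linspace(0, 1, 5)
print(quarters.tolist())      # [0.0, 0.25, 0.5, 0.75, 1.0]

# zeros and ones give floats by default; dtype= changes that
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)              # [0 0 0]
```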
2D Arrays: The Grid
Imagine a spreadsheet or a tic-tac-toe board. That’s a 2D array!
grid = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
Diagram: the 2D array is made of Row 0 (1, 2, 3), Row 1 (4, 5, 6), and Row 2 (7, 8, 9).
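To build a 2D grid of zeros, or of any repeated value, pass the shape as a (rows, columns) tuple. Note the extra parentheses: np.zeros(3, 3) would raise an error, because the second argument is read as the data type. A small sketch (board and sevens are example names):

```python
import numpy as np

# Pass a (rows, columns) tuple to get a 2D array
board = np.zeros((3, 3))        # 3x3 grid filled with 0.0
sevens = np.full((2, 4), 7)     # 2 rows x 4 columns, every cell is 7

print(board.shape)   # (3, 3)
print(sevens)
# [[7 7 7 7]
#  [7 7 7 7]]
```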
Array Properties
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3) - 2 rows, 3 columns
print(arr.size) # 6 total elements
print(arr.ndim) # 2 dimensions
print(arr.dtype) # int64 on most systems (the data type)
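One rule behind the “organized shelf”: every item in an array shares one data type (dtype). If you mix whole numbers and decimals, NumPy converts everything to floats. A minimal sketch of that, plus astype for converting on purpose:

```python
import numpy as np

# Mixing ints and floats: NumPy upgrades every item to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# astype makes a converted copy (decimals are cut off, not rounded)
whole = mixed.astype(int)
print(whole)         # [1 2 3]
```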
🔢 Array Operations and Indexing
Getting Items: Indexing
Just like finding a book on a shelf, you use the position number.
Remember: Python counts from 0!
fruits = np.array(['apple', 'banana', 'cherry', 'date'])
# indexes: 0 = apple, 1 = banana, 2 = cherry, 3 = date
print(fruits[0]) # apple (first item)
print(fruits[2]) # cherry (third item)
print(fruits[-1]) # date (last item; negative indexes count from the end)
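Indexing works for changing items too, not just reading them. Here is a short sketch with a made-up prices array; asking for a position that does not exist raises an IndexError:

```python
import numpy as np

prices = np.array([5, 12, 8, 20])

# Use an index on the left of = to change an item in place
prices[1] = 15
prices[-1] = 18
print(prices.tolist())    # [5, 15, 8, 18]

# Asking for a locker that doesn't exist raises an IndexError
try:
    prices[10]
except IndexError as err:
    print("IndexError:", err)
```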
Grabbing Multiple Items: Slicing
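Think of slicing like cutting a piece of cake: you decide where to start and where to stop. The start index is included but the stop index is not, so numbers[1:4] gives the items at indexes 1, 2, and 3.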
numbers = np.array([10, 20, 30, 40, 50, 60])
print(numbers[1:4]) # [20 30 40]
print(numbers[:3]) # [10 20 30] (first 3)
print(numbers[3:]) # [40 50 60] (from index 3)
print(numbers[::2]) # [10 30 50] (every 2nd)
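Two more slicing tricks are worth knowing: negative positions and a negative step. There is also one important difference from Python lists: a NumPy slice is a view that shares memory with the original array, so changing the slice changes the original. A sketch:

```python
import numpy as np

numbers = np.array([10, 20, 30, 40, 50, 60])

# Negative indexes count from the end; a negative step walks backwards
print(numbers[-3:])      # [40 50 60]
print(numbers[::-1])     # [60 50 40 30 20 10]

# A slice is a view: it shares memory with the original array
middle = numbers[1:4]
middle[0] = 999
print(numbers.tolist())  # [10, 999, 30, 40, 50, 60]

# Use .copy() when you want an independent piece
safe = numbers[1:4].copy()
safe[0] = 0
print(numbers[1])        # 999 (the copy did not change the original)
```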
2D Indexing: Row and Column
grid = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
print(grid[0, 0]) # 1 (row 0, column 0)
print(grid[1, 2]) # 6 (row 1, column 2)
print(grid[0, :]) # [1 2 3] (entire row 0)
print(grid[:, 1]) # [2 5 8] (entire column 1)
Diagram: grid[1, 2] picks Row 1, then Column 2, giving the result 6.
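You can also combine row and column slices to cut a smaller block out of the grid. A short sketch using the same grid (corner is an example name):

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Rows 0-1 and columns 1-2 (stop indexes are excluded, as with 1D slices)
corner = grid[0:2, 1:3]
print(corner)
# [[2 3]
#  [5 6]]

# Every other row, all columns
print(grid[::2, :])
# [[1 2 3]
#  [7 8 9]]
```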
Boolean Indexing: Smart Filtering
This is like having a magical filter that only shows items matching your rules.
scores = np.array([45, 82, 67, 91, 55, 78])
# Find all scores above 70
high_scores = scores[scores > 70]
print(high_scores) # [82 91 78]
# Find all passing scores (>= 60)
passing = scores[scores >= 60]
print(passing) # [82 67 91 78]
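To filter on more than one rule, combine conditions with & (and) or | (or), wrapping each condition in parentheses; Python's plain and/or keywords raise an error on arrays. np.where then picks one of two values for each element. A sketch with the same scores (the "pass"/"fail" labels are just examples):

```python
import numpy as np

scores = np.array([45, 82, 67, 91, 55, 78])

# Combine conditions with & (and) / | (or); the parentheses are required
middle_band = scores[(scores >= 60) & (scores < 80)]
print(middle_band)                 # [67 78]

# Count how many scores pass (each True counts as 1)
print(np.sum(scores >= 60))        # 4

# np.where picks a value for each element based on the condition
labels = np.where(scores >= 60, "pass", "fail")
print(labels.tolist())  # ['fail', 'pass', 'pass', 'pass', 'fail', 'pass']
```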
Array Shape Operations
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshape to 2x3 grid
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
# [4 5 6]]
# Flatten back to 1D
flat = reshaped.flatten()
print(flat) # [1 2 3 4 5 6]
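Two handy shape helpers: passing -1 to reshape lets NumPy work out one dimension for you, and .T swaps rows and columns. The total number of items must stay the same, or reshape raises a ValueError. A sketch:

```python
import numpy as np

arr = np.arange(1, 7)          # [1 2 3 4 5 6]

# -1 tells NumPy to work out that dimension for you
tall = arr.reshape(-1, 2)      # 6 items / 2 columns = 3 rows
print(tall.shape)              # (3, 2)

# .T (transpose) swaps rows and columns
print(tall.T)
# [[1 3 5]
#  [2 4 6]]

# 6 items cannot fill a 4x2 grid (8 spots)
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```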
➕ NumPy Mathematical Operations
Element-wise Operations
When you do math with arrays, it happens to every element automatically. It’s like having tiny calculators in each locker!
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
print(a + b) # [11 22 33 44]
print(a - b) # [-9 -18 -27 -36]
print(a * b) # [10 40 90 160]
print(a / b) # [0.1 0.1 0.1 0.1]
print(a ** 2) # [1 4 9 16] (squared)
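Why not just use lists? The same + and * operators mean something different for plain Python lists, as this short comparison shows:

```python
import numpy as np

list_a = [1, 2, 3, 4]
list_b = [10, 20, 30, 40]

# With plain lists, + glues the lists together and * repeats them
print(list_a + list_b)     # [1, 2, 3, 4, 10, 20, 30, 40]
print(list_a * 2)          # [1, 2, 3, 4, 1, 2, 3, 4]

# The same operators on arrays do math element by element
print(np.array(list_a) + np.array(list_b))   # [11 22 33 44]
print(np.array(list_a) * 2)                  # [2 4 6 8]
```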
Broadcasting: The Magic Stretch
When shapes differ, NumPy can stretch the smaller array (or a single number) to match the bigger one without actually copying the data. This is called broadcasting.
arr = np.array([1, 2, 3, 4, 5])
# Add 10 to every element
print(arr + 10) # [11 12 13 14 15]
# Multiply all by 3
print(arr * 3) # [3 6 9 12 15]
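Broadcasting also works between arrays of different shapes. NumPy compares shapes starting from the last dimension; each pair of sizes must be equal, or one of them must be 1. A sketch with a small grid (row_bonus and col_bonus are example names):

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6]
])                                  # shape (2, 3)

# A row of 3 values (shape (3,)) is stretched down both rows
row_bonus = np.array([10, 20, 30])
print(grid + row_bonus)
# [[11 22 33]
#  [14 25 36]]

# A column of 2 values (shape (2, 1)) is stretched across all columns
col_bonus = np.array([[100], [200]])
print(grid + col_bonus)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) don't line up: last sizes 3 and 2 differ
try:
    grid + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```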
Built-in Math Functions
NumPy has special math functions that work on entire arrays:
numbers = np.array([1, 4, 9, 16, 25])
print(np.sqrt(numbers)) # [1. 2. 3. 4. 5.]
print(np.square(numbers)) # [1 16 81 256 625]
print(np.log(numbers)) # Natural log
print(np.exp(numbers)) # e^x
Trigonometry
angles = np.array([0, np.pi/2, np.pi])
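print(np.sin(angles)) # about [0. 1. 0.] (the last value prints as a tiny number like 1.22e-16 instead of exactly 0)
print(np.cos(angles)) # about [1. 0. -1.] (the middle value is a tiny number like 6.12e-17 instead of exactly 0)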
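Because floating-point math is approximate, checking results with == can surprise you. A safer habit is np.isclose for single values and np.allclose for whole arrays, as in this sketch:

```python
import numpy as np

angles = np.array([0, np.pi/2, np.pi])
result = np.sin(angles)

# sin(pi) comes out as about 1.22e-16, not exactly 0
print(result[2] == 0)                   # False

# isclose / allclose allow for tiny rounding errors
print(np.isclose(result[2], 0))         # True
print(np.allclose(result, [0, 1, 0]))   # True
```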
Rounding Functions
decimals = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(decimals)) # [1. 3. 4. 5.]
print(np.floor(decimals)) # [1. 2. 3. 4.]
print(np.ceil(decimals)) # [2. 3. 4. 5.]
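Notice that 3.5 rounded up to 4 above. NumPy rounds exact halves to the nearest even number (Python's built-in round does the same), so 2.5 becomes 2, not 3. The decimals argument controls how many digits to keep. A sketch:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Exact halves go to the nearest EVEN number
print(np.round(halves))                        # [0. 2. 2. 4.]

# decimals= keeps that many digits after the decimal point
print(np.round(np.pi, 2))                      # 3.14
print(np.round(np.array([1.234, 5.678]), 1))   # [1.2 5.7]
```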
Aggregate Functions
These functions crunch all numbers into one result:
data = np.array([5, 10, 15, 20, 25])
print(np.sum(data)) # 75 (add all)
print(np.prod(data)) # 375000 (multiply all)
print(np.min(data)) # 5
print(np.max(data)) # 25
📊 NumPy Statistical Functions
Basic Statistics
scores = np.array([72, 85, 90, 78, 95, 88, 76])
print(np.mean(scores)) # 83.43 (average)
print(np.median(scores)) # 85.0 (middle value)
print(np.std(scores)) # 7.71 (spread, rounded)
print(np.var(scores)) # 59.39 (variance = std squared, rounded)
Diagram: for the data 72, 85, 90, 78, 95, 88, 76, the mean is 83.43, the median is 85, the standard deviation is 7.71, and the range is 23.
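By default, np.std and np.var treat your data as a whole population and divide by the number of values n. If your data is a sample from a bigger group, statisticians usually divide by n - 1 instead, which NumPy does when you pass ddof=1. A sketch with the same scores:

```python
import numpy as np

scores = np.array([72, 85, 90, 78, 95, 88, 76])

# Default (ddof=0): population std, divides by n
print(np.std(scores))           # about 7.71

# ddof=1: sample std, divides by n - 1 (slightly larger)
print(np.std(scores, ddof=1))   # about 8.32

# Variance is the standard deviation squared
print(np.isclose(np.var(scores), np.std(scores) ** 2))   # True
```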
Understanding Mean, Median, Mode
Mean = Add everything, divide by count (the “average”)
Median = The middle number when sorted (not fooled by outliers!)
Mode = The value that appears most often (NumPy has no dedicated mode function)
Example:
salaries = np.array([30, 35, 40, 45, 1000])
print(np.mean(salaries)) # 230 (skewed by 1000!)
print(np.median(salaries)) # 40 (true middle)
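Since NumPy has no dedicated mode function, one way to find the mode is to count each value with np.unique(..., return_counts=True) and pick the most frequent one. A sketch with made-up shoe sizes (if there is a tie, argmax returns the first one; SciPy's scipy.stats.mode is another option):

```python
import numpy as np

shoe_sizes = np.array([7, 8, 8, 9, 8, 10, 7])

# np.unique can also count how often each distinct value appears
values, counts = np.unique(shoe_sizes, return_counts=True)
print(values.tolist())   # [7, 8, 9, 10]
print(counts.tolist())   # [2, 3, 1, 1]

# The mode is the value with the highest count
mode = values[np.argmax(counts)]
print(mode)              # 8
```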
Percentiles and Quartiles
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(np.percentile(data, 25)) # 32.5 (25th percentile, first quartile Q1)
print(np.percentile(data, 50)) # 55.0 (50th percentile = median, Q2)
print(np.percentile(data, 75)) # 77.5 (75th percentile, third quartile Q3)
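The 25th, 50th, and 75th percentiles are the quartiles, and the gap between the first and third quartile (the interquartile range, or IQR) measures how spread out the middle half of the data is. np.percentile accepts a list of percentiles, so a sketch looks like this (NumPy interpolates between neighboring values by default):

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Ask for several percentiles at once
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)        # 32.5 55.0 77.5

# Interquartile range (IQR): the spread of the middle half of the data
iqr = q3 - q1
print(iqr)               # 45.0
```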
Min, Max, and Range
temps = np.array([68, 72, 75, 71, 69, 80, 77])
print(np.min(temps)) # 68
print(np.max(temps)) # 80
print(np.ptp(temps)) # 12 (peak-to-peak range)
print(np.argmin(temps)) # 0 (index of min)
print(np.argmax(temps)) # 5 (index of max)
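argmin and argmax give one position each. To find every position that matches a rule, np.flatnonzero turns a True/False mask into index numbers. A sketch with the same temperatures (warm_days and ties are example names):

```python
import numpy as np

temps = np.array([68, 72, 75, 71, 69, 80, 77])

# Indexes (day numbers) where the temperature was above 72
warm_days = np.flatnonzero(temps > 72)
print(warm_days)          # [2 5 6]
print(temps[warm_days])   # [75 80 77]

# argmax only reports the FIRST position of the maximum value
ties = np.array([3, 9, 9, 1])
print(np.argmax(ties))    # 1
```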
Statistical Operations on 2D Arrays
grades = np.array([
[85, 90, 78], # Student 1
[92, 88, 95], # Student 2
[76, 82, 80] # Student 3
])
# Mean of all grades
print(np.mean(grades)) # 85.11
# Mean per student (across columns)
print(np.mean(grades, axis=1)) # [84.33 91.67 79.33] (rounded)
# Mean per subject (across rows)
print(np.mean(grades, axis=0)) # [84.33 86.67 84.33] (rounded)
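The axis argument works the same way for other functions. Here is a sketch that finds the top score in each subject, which student earned it, and each student's total:

```python
import numpy as np

grades = np.array([
    [85, 90, 78],   # Student 1
    [92, 88, 95],   # Student 2
    [76, 82, 80]    # Student 3
])

# axis=0 collapses the rows: one result per subject (column)
print(np.max(grades, axis=0))      # [92 90 95]
print(np.argmax(grades, axis=0))   # [1 0 1] (row index of the top student)

# axis=1 collapses the columns: one result per student (row)
print(np.sum(grades, axis=1))      # [253 275 238]
```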
Correlation
See how two things relate to each other:
study_hours = np.array([2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([60, 65, 70, 75, 85, 90, 95])
correlation = np.corrcoef(study_hours, test_scores)
print(correlation[0, 1]) # 0.99 (strong positive!)
Correlation Values:
- Close to 1 = Strong positive (both go up together)
- Close to -1 = Strong negative (one up, other down)
- Close to 0 = No straight-line (linear) relationship
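Here is a sketch with made-up numbers showing a perfect negative correlation, plus a reminder that correlation only measures straight-line relationships: y = x² depends completely on x, yet its correlation with x is about 0.

```python
import numpy as np

screen_time = np.array([1, 2, 3, 4, 5])
sleep_hours = np.array([10, 8, 6, 4, 2])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
matrix = np.corrcoef(screen_time, sleep_hours)
print(matrix.shape)                 # (2, 2)
print(np.round(matrix[0, 1], 2))    # -1.0 (perfect negative: a line going down)

# A U-shaped curve: y depends on x, but not in a straight line
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2
print(np.corrcoef(x, y)[0, 1])      # close to 0
```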
🎉 You Did It!
You’ve just learned the essentials of NumPy:
✅ Arrays - Your organized data containers
✅ Indexing & Slicing - Finding and grabbing data
✅ Math Operations - Lightning-fast calculations
✅ Statistical Functions - Understanding your data
Next Steps:
- Practice with real datasets
- Explore Pandas (built on NumPy!)
- Dive into data visualization
Remember: Every data scientist started exactly where you are now. Keep practicing, and these tools will become second nature! 🌟
NumPy is your superpower for data. Now go crunch some numbers! 💪
