Practical NumPy: Best Practices and Tips
The Kitchen Metaphor
Imagine you're a chef in a busy restaurant kitchen. You have all the right ingredients (arrays), but how you organize your workspace, how you display your dishes, and what shortcuts you know make all the difference between chaos and a Michelin star!
NumPy is your kitchen. Today, we'll learn the pro chef secrets that turn messy code into elegant, fast, bug-free masterpieces.
Array Printing Configuration
The Problem: Information Overload!
Picture this: You have a HUGE shopping list (array) with 10,000 items. Do you want to read ALL 10,000 items? No way! You want a summary.
NumPy thinks the same way. By default, it shows you just a preview of big arrays.
The Magic Control Panel: np.set_printoptions()
import numpy as np
# Create a big array (10,000 numbers, well above the default threshold of 1000)
big_array = np.arange(10000)
# Default printing - NumPy hides the middle!
print(big_array)
# [   0    1    2 ... 9997 9998 9999]
Your Control Knobs
1. threshold - How many items before summarizing?
# Show ALL items (no matter how many)
np.set_printoptions(threshold=np.inf)
# Or summarize any array with more than 10 items
np.set_printoptions(threshold=10)
2. precision - Decimal places
arr = np.array([3.14159265, 2.71828])
np.set_printoptions(precision=2)
print(arr) # [3.14 2.72]
np.set_printoptions(precision=4)
print(arr) # [3.1416 2.7183]
3. suppress - Print tiny values in fixed-point instead of scientific notation
tiny = np.array([0.000000001, 1.5])
np.set_printoptions(suppress=True)
print(tiny) # [0.  1.5]
# No more 1.0e-09 notation; values smaller than the precision show as 0.
4. linewidth - Characters per line
np.set_printoptions(linewidth=50)
# Arrays wrap at 50 characters
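The knobs above change printing globally. When only one array needs special formatting, a per-call alternative may be simpler. The sketch below uses `np.array2string`, which accepts similar options as keyword arguments; the variable names are illustrative only.

```python
import numpy as np

readings = np.array([3.14159265, 2.71828])

# Format one array without touching the global print options.
short_text = np.array2string(readings, precision=2, separator=", ")
print(short_text)  # [3.14, 2.72]

# Summarize a longer array for this call only:
# keep two items at each end and elide the middle.
long_text = np.array2string(np.arange(100), threshold=5, edgeitems=2)
print(long_text)
```

Because nothing global is modified, there is nothing to reset afterwards.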
Real Example: Reset to Normal
# Go back to defaults anytime!
np.set_printoptions(edgeitems=3,
                    threshold=1000,
                    precision=8,
                    suppress=False,
                    linewidth=75)
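For temporary changes, a context manager avoids the manual reset entirely. This is a minimal sketch, assuming NumPy 1.15 or newer, where `np.printoptions` is available; the variable names are made up for the example.

```python
import numpy as np

values = np.array([3.14159265, 2.71828])

precision_before = np.get_printoptions()["precision"]

# Options set here apply only inside the with-block.
with np.printoptions(precision=2, suppress=True):
    inside_text = str(values)
    print(inside_text)  # [3.14 2.72]

# Outside the block the previous settings are back.
precision_after = np.get_printoptions()["precision"]
print(precision_before == precision_after)  # True
```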
Ensuring Minimum Dimensions
The Problem: Shape Surprises!
Imagine you're packing boxes. Sometimes you get a single item, sometimes a row of items, sometimes a stack of boxes. Your packing machine expects boxes of a certain shape!
# These look different to NumPy!
single = np.array(5) # shape: ()
row = np.array([1, 2, 3]) # shape: (3,)
grid = np.array([[1,2],[3,4]]) # shape: (2, 2)
The Solution: np.atleast_Nd()
Think of it as gift wrapping - adding boxes around your item until it has the right shape!
np.atleast_1d() - At least a row
x = np.array(5) # Just a number
wrapped = np.atleast_1d(x)
print(wrapped) # [5]
print(wrapped.shape) # (1,)
np.atleast_2d() - At least a table
row = np.array([1, 2, 3])
table = np.atleast_2d(row)
print(table) # [[1 2 3]]
print(table.shape) # (1, 3)
np.atleast_3d() - At least a cube
flat = np.array([[1, 2], [3, 4]])
cube = np.atleast_3d(flat)
print(cube.shape) # (2, 2, 1)
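Where the new axes land is not always obvious, especially for `np.atleast_3d` on a 1-D input. The sketch below simply records the resulting shapes for a scalar, a row, and a table; the input names are illustrative.

```python
import numpy as np

inputs = {
    "scalar": np.array(5),        # shape ()
    "row": np.array([1, 2, 3]),   # shape (3,)
    "table": np.ones((2, 4)),     # shape (2, 4)
}

shapes_by_name = {}
for name, value in inputs.items():
    shapes_by_name[name] = (
        np.atleast_1d(value).shape,
        np.atleast_2d(value).shape,
        np.atleast_3d(value).shape,
    )
    print(name, shapes_by_name[name])

# scalar ((1,), (1, 1), (1, 1, 1))
# row    ((3,), (1, 3), (1, 3, 1))
# table  ((2, 4), (2, 4), (2, 4, 1))
```

Note that a row of length 3 becomes `(1, 3, 1)` under `np.atleast_3d`: one axis is added in front and one at the end.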
Why Does This Matter?
Many NumPy operations expect inputs with a specific number of dimensions. Normalizing the input up front avoids indexing and broadcasting errors when a caller passes a scalar or a single row.
def safe_process(data):
    # Always works, even if data is a scalar!
    data = np.atleast_1d(data)
    return data.sum()
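The same idea extends to functions that expect a batch of rows. Here is a minimal sketch, assuming a hypothetical helper named `row_means` that should accept a scalar, a single row, or a full table:

```python
import numpy as np

def row_means(samples):
    """Hypothetical helper: mean of each row, for scalar, 1-D, or 2-D input."""
    # atleast_2d turns a scalar into shape (1, 1) and a row into shape (1, n),
    # so axis=1 always exists.
    batch = np.atleast_2d(np.asarray(samples, dtype=float))
    return batch.mean(axis=1)

print(row_means(5))                        # [5.]
print(row_means([1, 2, 3]))                # [2.]
print(row_means([[1, 2, 3], [4, 5, 6]]))   # [2. 5.]
```

Without the `np.atleast_2d` call, `mean(axis=1)` would raise an error for the first two inputs.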
Performance Best Practices
The Racing Track Metaphor
NumPy is like a race car. But even a race car goes slow if you:
- Drive on a bumpy road (bad memory access)
- Stop at every light (Python loops)
- Carry heavy luggage (copying data)
Let's make your code ZOOM!
Rule 1: Avoid Python Loops
# SLOW - Python loop (like walking)
result = []
for x in range(1000000):
    result.append(x * 2)
# FAST - Vectorized (like flying!)
arr = np.arange(1000000)
result = arr * 2 # often 10-100x faster on large arrays
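Speed-ups depend on the machine and the array size, so it is worth measuring rather than trusting a fixed number. The sketch below times both versions with the standard `timeit` module; the function names and the smaller `n` are choices made for this example, and no particular ratio is guaranteed.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the example above so the measurement stays quick

def double_with_loop():
    result = []
    for x in range(n):
        result.append(x * 2)
    return result

def double_vectorized():
    return np.arange(n) * 2

# Run each version 10 times and report the total time.
loop_seconds = timeit.timeit(double_with_loop, number=10)
vector_seconds = timeit.timeit(double_vectorized, number=10)
print(f"loop: {loop_seconds:.4f}s  vectorized: {vector_seconds:.4f}s")
```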
Rule 2: Use Views, Not Copies
arr = np.array([1, 2, 3, 4, 5])
# VIEW - shares memory (fast, no copy)
view = arr[1:4]
# COPY - new memory (slower)
copy = arr[1:4].copy()
# Check if it's a view
print(view.base is arr) # True = view
print(copy.base is arr) # False = copy
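The `.base` check works for direct slices, but `np.shares_memory` answers the question more generally. A small sketch comparing three common indexing styles (variable names are illustrative):

```python
import numpy as np

arr = np.arange(6)

basic_slice = arr[1:4]         # basic slicing: a view
reshaped = arr.reshape(2, 3)   # reshaping a contiguous array: a view
fancy = arr[[1, 2, 3]]         # indexing with a list of integers: a copy

print(np.shares_memory(arr, basic_slice))  # True
print(np.shares_memory(arr, reshaped))     # True
print(np.shares_memory(arr, fancy))        # False
```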
Rule 3: Prefer Contiguous Arrays
Arrays can be stored in two ways:
- C-order (row-major): Read left to right, then down
- F-order (column-major): Read top to bottom, then right
# C-order is the default, and row-by-row access is usually fastest in it
arr = np.array([[1,2,3],[4,5,6]])
print(arr.flags['C_CONTIGUOUS']) # True
# Make contiguous if not
arr = np.ascontiguousarray(arr)
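A freshly created array is already C-contiguous, so the check is more interesting on a transposed array. The following sketch shows how the flags change and when `np.ascontiguousarray` actually has to copy; the names are illustrative.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)
transposed = grid.T  # a view with swapped strides, no data moved

print(transposed.flags['C_CONTIGUOUS'])  # False
print(transposed.flags['F_CONTIGUOUS'])  # True

# Not C-contiguous, so a new C-ordered copy is made.
fixed = np.ascontiguousarray(transposed)
print(fixed.flags['C_CONTIGUOUS'])       # True
print(np.shares_memory(fixed, grid))     # False

# Already C-contiguous, so no new buffer is needed.
same = np.ascontiguousarray(grid)
print(np.shares_memory(same, grid))      # True
```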
Rule 4: Use In-Place Operations
arr = np.array([1, 2, 3])
# Creates NEW array (uses more memory)
arr = arr + 1
# Modifies IN PLACE (faster!)
arr += 1
# or
np.add(arr, 1, out=arr)
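In-place operations have two side effects worth knowing: every other name bound to the same array sees the change, and the result must fit the existing dtype. A short sketch of both (the variable names are invented for the example):

```python
import numpy as np

counts = np.array([1, 2, 3])
alias = counts                 # second name for the same array

counts += 1                    # in place: alias sees the change
in_place_seen = alias.tolist()
print(in_place_seen)           # [2, 3, 4]

counts = counts + 1            # new array: alias keeps the old values
print(alias, counts)           # [2 3 4] [3 4 5]

# An integer array cannot hold the float result of an in-place update.
try:
    alias += 0.5
except TypeError as err:
    cast_error = err
else:
    cast_error = None
print(type(cast_error).__name__)
```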
Rule 5: Pre-allocate Arrays
# BAD - np.append copies the whole array on every call
result = np.array([])
for i in range(1000):
    result = np.append(result, i)
# GOOD - Pre-allocate, then fill
result = np.empty(1000)
for i in range(1000):
    result[i] = i
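Pre-allocation needs the final size in advance. When the size is not known, or the values follow a simple pattern, these alternatives may fit better. This is a sketch with made-up example data, not the only way to do it:

```python
import numpy as np

# Values follow a pattern: build the array directly, no loop at all.
direct = np.arange(1000, dtype=float)

# Size unknown up front: collect in a Python list, convert once at the end.
collected = []
for i in range(10):
    if i % 3 == 0:
        collected.append(i * i)
from_list = np.array(collected)
print(from_list)  # [ 0  9 36 81]

# Or build the array straight from a generator.
from_iter = np.fromiter((i * i for i in range(10) if i % 3 == 0), dtype=np.int64)
print(from_iter)  # [ 0  9 36 81]
```

Appending to a Python list is cheap, so the single conversion at the end avoids the repeated copying that `np.append` causes.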
Performance Summary
| Technique | Typical Speed Boost (rough, workload-dependent) |
|---|---|
| Vectorization | 10-100x |
| Views vs Copies | 2-10x |
| Contiguous Memory | 1.5-3x |
| In-place Operations | 1.5-2x |
| Pre-allocation | 5-20x |
Common NumPy Pitfalls
The Banana Peel Moments
Even expert chefs slip sometimes! Here are the traps to avoid.
Pitfall 1: The View Trap
You change a slice, and suddenly your original array changes too!
original = np.array([1, 2, 3, 4, 5])
slice_view = original[1:4]
slice_view[0] = 999
print(original)
# [ 1 999 3 4 5] SURPRISE!
Fix: Use .copy() when you need independence!
slice_copy = original[1:4].copy()
slice_copy[0] = 888
print(original) # Still [1 999 3 4 5]
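When an array should never be modified through a view, another option is to lock it. The sketch below sets the array's `writeable` flag to `False`, so an accidental write raises an error instead of silently changing the data; the variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
original.flags.writeable = False   # lock the data

locked_view = original[1:4]        # views taken now are read-only too

try:
    locked_view[0] = 999
except ValueError as err:
    write_error = err
else:
    write_error = None

print(write_error)   # the write was rejected
print(original)      # [1 2 3 4 5]
```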
Pitfall 2: Integer Division Surprise
# In Python 3, / always gives a float
print(5 / 2) # 2.5
# NumPy does the same: / on an integer array returns floats
arr = np.array([5, 7, 9])
result = arr / 2 # dtype is now float64
# [2.5 3.5 4.5]
# If you WANT integers, use floor division:
result = arr // 2 # [2 3 4]
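Floor division has its own surprise with negative numbers: it rounds toward negative infinity, not toward zero. A small sketch contrasting the two, using an example array that includes a negative value:

```python
import numpy as np

arr = np.array([5, 7, -9])

true_div = arr / 2                   # floats
floor_div = arr // 2                 # integers, rounded toward negative infinity
truncated = (arr / 2).astype(int)    # integers, rounded toward zero

print(true_div.dtype)   # float64
print(floor_div)        # [ 2  3 -5]
print(truncated)        # [ 2  3 -4]
```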
Pitfall 3: Boolean Indexing Creates Copies!
arr = np.array([1, 2, 3, 4, 5])
# This creates a COPY, not a view
selected = arr[arr > 2]
selected[0] = 999
print(arr) # Still [1 2 3 4 5]
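To change the selected elements of the original array, assign through the mask instead of through the extracted copy. A minimal sketch, also showing `np.where` for the case where the original should stay untouched:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mask = arr > 2

# Assigning through the mask writes into arr itself.
arr[mask] = 0
print(arr)     # [1 2 0 0 0]

# np.where builds a new array and leaves arr alone.
capped = np.where(arr == 0, -1, arr)
print(capped)  # [ 1  2 -1 -1 -1]
print(arr)     # [1 2 0 0 0]
```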
Pitfall 4: Shape vs Size Confusion
arr = np.array([[1, 2, 3], [4, 5, 6]])
# shape = dimensions as tuple
print(arr.shape) # (2, 3)
# size = total number of elements
print(arr.size) # 6
# len = first dimension only!
print(len(arr)) # 2
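The differences matter most at the edges. The sketch below compares the same attributes on a regular grid, an array with zero rows, and a 0-dimensional array, where `len()` fails outright; the names are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
empty_rows = np.empty((0, 3))
scalar = np.array(5)

print(grid.ndim, grid.shape, grid.size, len(grid))         # 2 (2, 3) 6 2
print(empty_rows.shape, empty_rows.size, len(empty_rows))  # (0, 3) 0 0
print(scalar.ndim, scalar.shape, scalar.size)              # 0 () 1

# A 0-dimensional array has no first dimension to measure.
try:
    len(scalar)
except TypeError as err:
    len_error = err
else:
    len_error = None
print(len_error)
```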
Pitfall 5: NaN Comparisons are Weird
import numpy as np
nan_val = np.nan
# This is ALWAYS False!
print(nan_val == np.nan) # False
# Use isnan() instead
print(np.isnan(nan_val)) # True
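NaN also spreads through aggregations: a single NaN turns an ordinary mean into NaN. A short sketch of finding, skipping, and filtering missing values in an example array:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

missing = np.isnan(data)         # boolean mask of NaN positions
print(missing.sum())             # 1
print(np.mean(data))             # nan
print(np.nanmean(data))          # 2.0
print(data[~missing])            # [1. 3.]
```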
Pitfall 6: Broadcasting Gone Wrong
a = np.array([[1, 2, 3]]) # (1, 3)
b = np.array([[1], [2], [3]]) # (3, 1)
# This broadcasts to (3, 3)!
result = a + b
# [[2 3 4]
# [3 4 5]
# [4 5 6]]
# Is this what you wanted?
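One way to catch this early is to check shapes before doing the arithmetic. The sketch below asks NumPy for the broadcast shape up front, then uses a hypothetical helper, `subtract_same_shape`, that refuses to broadcast at all. Both the helper and the example data are assumptions made for illustration.

```python
import numpy as np

predictions = np.array([1.0, 2.0, 3.0])      # shape (3,)
targets = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)

# Ask what shape the operation would produce, without computing it.
print(np.broadcast(predictions, targets).shape)  # (3, 3)

def subtract_same_shape(left, right):
    """Hypothetical helper: subtract only when shapes match exactly."""
    left = np.asarray(left)
    right = np.asarray(right)
    if left.shape != right.shape:
        raise ValueError(f"shape mismatch: {left.shape} vs {right.shape}")
    return left - right

# Flatten the column so both inputs have shape (3,).
errors = subtract_same_shape(predictions, targets.ravel())
print(errors)  # [0. 0. 0.]
```

A strict check like this is a design choice: it gives up the convenience of broadcasting in exchange for an immediate error when shapes drift apart.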
Quick Reference Diagram
graph LR
    A["NumPy Best Practices"] --> B["Print Config"]
    A --> C["Min Dimensions"]
    A --> D["Performance"]
    A --> E["Pitfalls"]
    B --> B1["threshold"]
    B --> B2["precision"]
    B --> B3["suppress"]
    C --> C1["atleast_1d"]
    C --> C2["atleast_2d"]
    C --> C3["atleast_3d"]
    D --> D1["Vectorize"]
    D --> D2["Use Views"]
    D --> D3["In-place Ops"]
    E --> E1["View Trap"]
    E --> E2["NaN Issues"]
    E --> E3["Broadcasting"]
Golden Rules to Remember
- Configure printing for readable output
- Ensure dimensions match expectations
- Vectorize wherever you can - avoid Python loops over array elements
- Know when you have a view vs a copy
- Test with edge cases - empty arrays, NaN, scalars
You're now equipped with pro-level NumPy skills! Go forth and compute with confidence!
