NumPy Data Types: The Secret Labels on Your Containers 🏷️
Imagine you have a magical warehouse full of boxes. Each box can hold numbers, words, or true/false answers. But here’s the trick: every box has a label that tells you exactly what’s inside and how much space it takes.
In NumPy, these labels are called data types, or dtypes for short. Let’s explore!
The dtype Object: Your Box Label
Think of dtype as a sticky note on each box. It tells NumPy:
- What kind of stuff is inside (numbers? text? yes/no?)
- How big each item is (small number? huge number?)
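```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype)  # int64 on most systems (int32 on Windows with NumPy < 2.0)
```
The dtype object carries useful details about the label:
```python
print(arr.dtype.name)      # int64
print(arr.dtype.itemsize)  # 8 (bytes per element)
```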
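The label can also be built on its own with `np.dtype()`, which is handy for inspecting a type before any data exists. A small sketch; the `kind` attribute is a one-letter category code (`'i'` for signed integers, `'f'` for floats):

```python
import numpy as np

# Build a dtype object directly, without creating an array first
d16 = np.dtype(np.int16)
print(d16.name)      # int16
print(d16.itemsize)  # 2
print(d16.kind)      # i  (signed integer)

# Every array carries one of these objects
arr = np.array([1.5, 2.5])
print(arr.dtype.kind)      # f  (floating point)
print(arr.dtype.itemsize)  # 8
```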
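Why care? Because every element shares one fixed-size label, NumPy can pack values tightly in memory and process them in fast compiled loops instead of handling one Python object at a time.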
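To see the memory side concretely, here is a sketch that stores the same one million small values (all between 0 and 99, an arbitrary choice for illustration) under two different labels and compares `nbytes`:

```python
import numpy as np

# One million values that all fit comfortably in 0..99
values = np.arange(1_000_000) % 100

as_int64 = values.astype(np.int64)
as_int8 = values.astype(np.int8)

print(as_int64.nbytes)  # 8000000
print(as_int8.nbytes)   # 1000000
```

Same numbers, one eighth of the memory, because each value fits in a 1-byte box.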
Numeric Data Types: Numbers of All Sizes
Numbers come in different “box sizes.” Small numbers need small boxes. Giant numbers need big boxes!
Integer Types (Whole Numbers)
| Type | Size | Range |
|---|---|---|
| int8 | 1 byte | -128 to 127 |
| int16 | 2 bytes | -32,768 to 32,767 |
| int32 | 4 bytes | about -2.1 billion to 2.1 billion |
| int64 | 8 bytes | about ±9.2 quintillion (really huge!) |
```python
tiny = np.array([1, 2], dtype=np.int8)
big = np.array([1, 2], dtype=np.int64)
print(tiny.itemsize)  # 1 byte each
print(big.itemsize)   # 8 bytes each
```
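If you need the exact limits rather than the rough ones in the table, `np.iinfo` reports them. A quick sketch:

```python
import numpy as np

# np.iinfo reports the exact limits of each integer type
for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)
# int8 -128 127
# int16 -32768 32767
# int32 -2147483648 2147483647
# int64 -9223372036854775808 9223372036854775807
```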
Unsigned Integers (No Negatives!)
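Their names start with u, and they hold only non-negative numbers (zero and up). In exchange, they reach about twice as high as the signed type of the same size:
```python
positive = np.array([200], dtype=np.uint8)  # 200 would not fit in int8
# Range: 0 to 255
```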
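One thing to watch: when array arithmetic goes past the top of an integer range, NumPy doesn't raise an error. The value quietly wraps around (for uint8, modulo 256). A minimal sketch:

```python
import numpy as np

print(np.iinfo(np.uint8).max)  # 255

# Array arithmetic past the top of the range wraps around silently
a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)
wrapped = a + b
print(wrapped)  # [4]  (260 - 256)
```

Signed integer arrays wrap the same way, so if values might grow, pick a wider type up front.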
Floating-Point (Decimal Numbers)
For numbers with decimal points:
| Type | Size | Precision |
|---|---|---|
| float16 | 2 bytes | Low (about 3 decimal digits) |
| float32 | 4 bytes | Medium (about 6-7 decimal digits) |
| float64 | 8 bytes | High (about 15-16 decimal digits) |
```python
precise = np.array([3.14159], dtype=np.float64)
```
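`np.finfo` is the float counterpart of `np.iinfo`. This sketch prints its `precision` attribute (an approximate count of reliable decimal digits) and shows that the same literal stored at two precisions gives two slightly different numbers:

```python
import numpy as np

# np.finfo(...).precision: approximate number of reliable decimal digits
for t in (np.float16, np.float32, np.float64):
    print(t.__name__, np.finfo(t).precision)
# float16 3
# float32 6
# float64 15

# The same literal stored at two precisions is not the same number
print(np.float32(0.1) == np.float64(0.1))  # False
```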
Complex Numbers
For math with imaginary parts:
```python
comp = np.array([1+2j], dtype=np.complex128)
print(comp)  # [1.+2.j]
```
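Each complex element is stored as two floats, a real part and an imaginary part, so complex128 takes 16 bytes and complex64 takes 8. A short sketch of pulling the parts back out:

```python
import numpy as np

comp = np.array([1+2j, 3-4j], dtype=np.complex128)
print(comp.real)      # [1. 3.]
print(comp.imag)      # [ 2. -4.]
print(np.abs(comp))   # magnitudes: sqrt(real**2 + imag**2)
print(comp.itemsize)  # 16  (two float64 parts)
```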
Boolean Data Type: Yes or No!
Booleans are like light switches. Only two states: ON (True) or OFF (False).
```python
flags = np.array([True, False, True])
print(flags.dtype)     # bool
# Takes only 1 byte per item!
print(flags.itemsize)  # 1
```
Super useful for filtering:
```python
numbers = np.array([1, 5, 3, 8, 2])
mask = numbers > 3
print(mask)           # [False  True False  True False]
print(numbers[mask])  # [5 8]
```
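Masks can also be combined. A sketch using `&` (and) with a made-up range check; the parentheses matter because `&` binds more tightly than the comparison operators:

```python
import numpy as np

numbers = np.array([1, 5, 3, 8, 2])

# Combine conditions with & (and) or | (or); parentheses are required
mask = (numbers > 2) & (numbers < 8)
print(mask)           # [False  True  True False False]
print(numbers[mask])  # [5 3]

# True counts as 1, so summing a mask counts the matches
print(mask.sum())     # 2
```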
String and Object dtypes: Words and Everything Else
Fixed-Length Strings
NumPy strings have a set maximum length:
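```python
names = np.array(['cat', 'dog', 'bird'])
print(names.dtype)  # <U4 (Unicode, max 4 chars)
```
The U means Unicode text, and the number is the maximum length in characters. The leading < just records the byte order (little-endian).
```python
# Force a specific length
fixed = np.array(['hi'], dtype='U10')
print(fixed.dtype)  # <U10
```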
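Because the length is part of the label, a string that is too long gets cut off silently, with no error. A small sketch ('hippopotamus' is just an example value):

```python
import numpy as np

names = np.array(['cat', 'dog', 'bird'])  # dtype <U4

# Assigning a longer string silently cuts it to fit the label
names[0] = 'hippopotamus'
print(names)  # ['hipp' 'dog' 'bird']

# Each Unicode character takes 4 bytes, so <U4 uses 16 bytes per item
print(names.itemsize)  # 16
```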
Object dtype: The “Anything” Box
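To store arbitrary Python objects (numbers, strings, lists, all mixed together), ask for object dtype explicitly:
```python
mixed = np.array([1, 'hello', [1, 2, 3]],
                 dtype=object)
print(mixed.dtype)  # object
```
Warning: Object arrays are slower. Each element is a pointer to a separate Python object, so NumPy cannot use its fast typed loops on them. Stick to specific types whenever you can!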
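A sketch contrasting the two cases: without `dtype=object`, NumPy converts mixed numbers and text into a string dtype, while with it each slot holds an ordinary Python object:

```python
import numpy as np

# Without dtype=object, mixed numbers and text all become strings
auto = np.array([1, 'hello'])
print(auto.dtype.kind)  # U  (Unicode string)

mixed = np.array([1, 'hello', [1, 2, 3]], dtype=object)
# Each slot holds an ordinary Python object, untouched
print(type(mixed[2]))   # <class 'list'>
```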
Specifying dtype at Creation
You can choose the box size when you create an array:
```python
# Method 1: Using a NumPy type object
arr1 = np.array([1, 2, 3], dtype=np.float32)
# Method 2: Using the type name as a string
arr2 = np.array([1, 2, 3], dtype='float32')
# Method 3: Using a type code
arr3 = np.zeros(5, dtype='i4')  # int32
```
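A quick sketch showing that the three spellings name the same dtype, and that `dtype=` works on other array-creation functions (`np.ones`, `np.arange`, `np.full`) as well:

```python
import numpy as np

# All three spellings name the same dtype
print(np.dtype(np.float32) == np.dtype('float32') == np.dtype('f4'))  # True

# The dtype= parameter works on other creation functions too
ones = np.ones(3, dtype=np.int16)
steps = np.arange(0, 1, 0.25, dtype=np.float32)
filled = np.full(2, 7, dtype='f8')
print(ones.dtype, steps.dtype, filled.dtype)  # int16 float32 float64
```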
Common shortcuts:
| Shorthand | Meaning |
|---|---|
| 'i4' | 4-byte integer |
| 'f8' | 8-byte float |
| 'U10' | Unicode string (10 chars) |
| '?' | Boolean |
```python
# Create an array of zeros with a specific type
zeros = np.zeros(10, dtype=np.int8)
print(zeros.nbytes)  # Only 10 bytes!
```
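To check what a shorthand expands to, pass it to `np.dtype()`. A sketch looping over the codes from the table (note that 'U10' takes 40 bytes, since each Unicode character uses 4):

```python
import numpy as np

# Each shorthand from the table maps to a full dtype
for code in ('i4', 'f8', 'U10', '?'):
    d = np.dtype(code)
    print(code, '->', d, d.itemsize)
# i4 -> int32 4
# f8 -> float64 8
# U10 -> <U10 40
# ? -> bool 1
```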
Changing dtype with astype
What if you picked the wrong box? Swap it!
```python
# Start with integers
nums = np.array([1, 2, 3])
# Convert to floats
floats = nums.astype(np.float64)
print(floats)  # [1. 2. 3.]
```
Important Rules
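Integer to Float: Safe for everyday values. Only integers beyond 2**53 (about 9 quadrillion) can lose precision in float64.
```python
np.array([1, 2]).astype(float)
# array([1., 2.])
```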
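Here is that edge case in action: a sketch converting 2**53 + 1 to float64 and back, which lands on a neighboring value:

```python
import numpy as np

# float64 stores integers exactly only up to 2**53
big = np.array([2**53 + 1], dtype=np.int64)
round_trip = big.astype(np.float64).astype(np.int64)
print(big[0], round_trip[0])  # 9007199254740993 9007199254740992
```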
Float to Integer: Decimals get chopped off!
```python
np.array([1.9, 2.7]).astype(int)
# array([1, 2]): truncated toward zero, not rounded!
```
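If you want rounding instead of truncation, round first and then convert. A sketch comparing three options, including a negative value to show that plain `astype()` truncates toward zero:

```python
import numpy as np

x = np.array([1.9, 2.7, -1.9])

print(x.astype(int))            # [ 1  2 -1]  truncates toward zero
print(np.round(x).astype(int))  # [ 2  3 -2]  rounds to nearest first
print(np.floor(x).astype(int))  # [ 1  2 -2]  always rounds down
```

Note that `np.round` rounds exact halves to the nearest even number, so `np.round(2.5)` gives `2.0`.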
String to Number:
```python
text = np.array(['1.5', '2.5', '3.5'])
numbers = text.astype(float)
print(numbers)  # [1.5 2.5 3.5]
```
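The strings must actually look like numbers. A sketch with one bad entry ('oops' is an arbitrary example): `astype()` raises a `ValueError` rather than skipping it.

```python
import numpy as np

text = np.array(['1.5', 'oops', '3.5'])
try:
    text.astype(float)
    failed = False
except ValueError as err:
    # The whole conversion fails; no partial result is returned
    failed = True
    print('Conversion failed:', err)
```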
Remember: astype() creates a NEW array by default. The original stays the same!
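A small sketch confirming that the copy is independent: changing the new array leaves the original alone.

```python
import numpy as np

nums = np.array([1, 2, 3])
floats = nums.astype(np.float64)

floats[0] = 99.5  # change the new array...
print(nums)       # [1 2 3]  ...the original is untouched
print(np.shares_memory(nums, floats))  # False
```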
Type Promotion Rules: When Types Meet
What happens when different types mix together? NumPy promotes them to a common type that can hold values of both kinds, which usually means the bigger type!
```mermaid
graph TD
    A[bool] --> B[int8]
    B --> C[int16]
    C --> D[int32]
    D --> E[int64]
    E --> F[float64]
    G[float16] --> H[float32]
    H --> F
```
The Golden Rule
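Bigger type wins! For example, int32 mixed with float64 becomes float64, since float64 can represent every int32 value exactly.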
```python
int_arr = np.array([1, 2], dtype=np.int32)
float_arr = np.array([1.5, 2.5], dtype=np.float64)
result = int_arr + float_arr
print(result.dtype)  # float64
```
Promotion Examples
```python
# Bool + Int = Int
np.array([True]) + np.array([5])
# Result: array([6]), default int dtype (int64 on most platforms)

# Int + Float = Float
np.array([1]) + np.array([1.5])
# Result: array([2.5]), dtype float64

# Small int + Big int = Big int
np.int8(10) + np.int64(20)
# Result: 30, dtype int64
```
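One exception to "bigger type wins" involves plain Python numbers (int and float literals, as opposed to NumPy arrays or scalars). They adapt to the array's type instead of upgrading it. A sketch, assuming the Python values fit in the array's type:

```python
import numpy as np

small = np.array([1, 2], dtype=np.int8)
single = np.array([1.5, 2.5], dtype=np.float32)

# Plain Python numbers adapt to the array instead of upgrading it
print((small + 1).dtype)     # int8
print((single + 0.5).dtype)  # float32
```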
Checking Result Type
Use np.result_type() to predict:
```python
np.result_type(np.int32, np.float64)
# dtype('float64')
np.result_type('i4', 'f8')
# dtype('float64')
```
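`np.result_type` is also handy for the cases where the rule isn't obvious. A sketch of three such pairs:

```python
import numpy as np

# Signed + unsigned of the same size needs a bigger signed type
print(np.result_type(np.int8, np.uint8))     # int16

# No integer type holds both int64 and uint64, so NumPy falls back to float
print(np.result_type(np.int64, np.uint64))   # float64

# Mixing int64 with float32 gives float64, not float32
print(np.result_type(np.int64, np.float32))  # float64
```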
Quick Reference Chart
```mermaid
graph TD
    subgraph "NumPy Data Types"
        A[dtype] --> B[Numeric]
        A --> C[Boolean]
        A --> D[String]
        A --> E[Object]
        B --> F[Integer<br>int8,16,32,64]
        B --> G[Unsigned<br>uint8,16,32,64]
        B --> H[Float<br>float16,32,64]
        B --> I[Complex<br>complex64,128]
    end
```
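NumPy exposes this kind of hierarchy through abstract types such as `np.integer` and `np.floating`, and `np.issubdtype` checks where a dtype sits in it. A sketch; note that in NumPy's own hierarchy the unsigned types are a branch of `np.integer` rather than a separate group:

```python
import numpy as np

# np.issubdtype checks where a dtype sits in NumPy's type hierarchy
print(np.issubdtype(np.int16, np.integer))              # True
print(np.issubdtype(np.uint8, np.integer))              # True
print(np.issubdtype(np.uint8, np.signedinteger))        # False
print(np.issubdtype(np.float32, np.floating))           # True
print(np.issubdtype(np.complex64, np.complexfloating))  # True
print(np.issubdtype(np.bool_, np.number))               # False
```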
Key Takeaways
- dtype = label telling NumPy what’s in the box
- Pick the smallest type that safely fits your values (saves memory!)
- astype() converts between types (makes a copy)
- When types mix, NumPy promotes to a common type (usually the bigger one)
- Avoid object dtype for best performance
You now understand how NumPy labels its data! Like a well-organized warehouse, the right labels make everything faster and more efficient. 🚀