🎯 NumPy Sorting & Searching: Finding Things Fast!
The Messy Drawer Problem
Imagine you have a drawer full of toys. They’re all mixed up—cars, dolls, blocks, everything jumbled together. Now your friend asks: “Do you have a red car?”
Two ways to find it:
- The slow way: Pick up every single toy, check if it’s a red car. This takes forever!
- The fast way: First, organize the toys. Put all cars together. Now finding the red car is super quick!
This is exactly why sorting and searching matter in NumPy. When your data is organized, finding things becomes lightning fast! ⚡
🔢 Sorting Arrays: Putting Things in Order
Sorting means arranging things from smallest to biggest (or biggest to smallest).
The Magic Spell: np.sort()
import numpy as np
messy = np.array([5, 2, 8, 1, 9])
tidy = np.sort(messy)
print(tidy) # [1 2 5 8 9]
What happened? NumPy looked at all the numbers and lined them up from smallest to biggest!
Sorting Doesn’t Change the Original
Here’s something cool: np.sort() gives you a NEW sorted array. Your original stays messy!
messy = np.array([5, 2, 8, 1, 9])
tidy = np.sort(messy)
print("Original:", messy) # [5 2 8 1 9]
print("Sorted:", tidy) # [1 2 5 8 9]
Sorting Backwards (Biggest First)
What if you want the biggest numbers first? Easy trick!
scores = np.array([85, 92, 78, 95, 88])
top_first = np.sort(scores)[::-1]
print(top_first) # [95 92 88 85 78]
The [::-1] part flips everything around!
Sorting 2D Arrays: Rows vs Columns
When you have a grid of numbers, you can sort by rows OR columns.
grid = np.array([
[3, 1, 4],
[1, 5, 9],
[2, 6, 5]
])
# Sort each row (left to right)
by_row = np.sort(grid, axis=1)
print(by_row)
# [[1 3 4]
# [1 5 9]
# [2 5 6]]
# Sort each column (top to bottom)
by_col = np.sort(grid, axis=0)
print(by_col)
# [[1 1 4]
# [2 5 5]
# [3 6 9]]
Think of axis=0 as “up and down” and axis=1 as “left and right.”
🏷️ Indirect Sorting with argsort: “Where Should This Go?”
Sometimes you don’t want the sorted values—you want to know WHERE each value should go!
The Story of the Race
Imagine 4 friends finished a race with these times:
Friend A: 12 seconds
Friend B: 8 seconds
Friend C: 15 seconds
Friend D: 10 seconds
Who came 1st, 2nd, 3rd, 4th?
times = np.array([12, 8, 15, 10])
positions = np.argsort(times)
print(positions) # [1 3 0 2]
What does [1 3 0 2] mean?
- Position 0 (1st place): Index 1 → Friend B (8 sec) 🥇
- Position 1 (2nd place): Index 3 → Friend D (10 sec) 🥈
- Position 2 (3rd place): Index 0 → Friend A (12 sec) 🥉
- Position 3 (4th place): Index 2 → Friend C (15 sec)
Using argsort to Get Sorted Values
Here’s a cool trick: use those positions to sort!
times = np.array([12, 8, 15, 10])
order = np.argsort(times)
sorted_times = times[order]
print(sorted_times) # [8 10 12 15]
Why Is argsort Useful?
Real example: You have student names and their scores:
names = np.array(['Ali', 'Bob', 'Cat', 'Dan'])
scores = np.array([75, 92, 88, 65])
# Sort by scores but keep names matched!
order = np.argsort(scores)[::-1] # Best first
for i in order:
print(f"{names[i]}: {scores[i]}")
# Bob: 92
# Cat: 88
# Ali: 75
# Dan: 65
Without argsort, keeping names matched to scores would be tricky!
⚡ Partial Sorting with partition: “I Just Need the Top Few!”
Imagine you have 1000 test scores but only need the top 10. Why sort ALL 1000?
The Smart Shortcut: np.partition()
Partition does just enough work to find what you need!
scores = np.array([45, 92, 78, 88, 55, 99, 72, 85])
# Get the 3 smallest scores
partial = np.partition(scores, 3)
print(partial)
# [45 55 72 78 88 99 92 85]
# ↑ ↑ ↑
# These 3 are the smallest (not sorted among themselves)
What partition does:
- Puts the 3 smallest values at the front
- Everything else goes after
- But it DOESN’T fully sort them!
Why Not Just Sort?
Sorting 1 million numbers takes time. But partition is MUCH faster when you only need a few values!
big_array = np.random.rand(1000000)
# Get just the 5 smallest
top_5_area = np.partition(big_array, 5)[:5]
This is way faster than sorting all million numbers!
Finding the Top K Values
Want the biggest instead of smallest? Use negative!
scores = np.array([45, 92, 78, 88, 55, 99, 72, 85])
# Get the 3 BIGGEST scores
partial = np.partition(scores, -3)
print(partial[-3:]) # [88 92 99]
argpartition: Positions of Top Values
Just like argsort, there’s np.argpartition()!
scores = np.array([45, 92, 78, 88, 55, 99, 72, 85])
# Which students are in top 3?
top_3_indices = np.argpartition(scores, -3)[-3:]
print(top_3_indices) # [3 1 5] → positions of 88, 92, 99
🔍 Searching in Sorted Arrays: Finding Things Super Fast
Once your array is sorted, finding things becomes magical!
The Phone Book Trick: Binary Search
Remember phone books? They’re sorted by name. To find “Smith”:
- Open to the middle: “M”—too early!
- Open to 3/4: “R”—still early!
- Open to 7/8: “T”—too far!
- Open between: Found “Smith”!
This is called binary search—cutting the search in half each time!
np.searchsorted(): Where Would This Fit?
sorted_arr = np.array([10, 20, 30, 40, 50])
# Where would 25 fit?
pos = np.searchsorted(sorted_arr, 25)
print(pos) # 2
# Insert 25 at position 2 → [10, 20, 25, 30, 40, 50]
NumPy tells you: “Put 25 at position 2 to keep it sorted!”
Finding Multiple Values at Once
sorted_arr = np.array([10, 20, 30, 40, 50])
values = np.array([15, 35, 55])
positions = np.searchsorted(sorted_arr, values)
print(positions) # [1 3 5]
# 15 → position 1 (between 10 and 20)
# 35 → position 3 (between 30 and 40)
# 55 → position 5 (after everything)
left vs right: Handling Duplicates
What if the value already exists?
arr = np.array([10, 20, 20, 20, 30])
left = np.searchsorted(arr, 20, side='left')
right = np.searchsorted(arr, 20, side='right')
print(f"Left: {left}") # 1 (before all 20s)
print(f"Right: {right}") # 4 (after all 20s)
Checking If a Value Exists
sorted_arr = np.array([10, 20, 30, 40, 50])
def is_in_sorted(arr, value):
pos = np.searchsorted(arr, value)
return pos < len(arr) and arr[pos] == value
print(is_in_sorted(sorted_arr, 30)) # True
print(is_in_sorted(sorted_arr, 35)) # False
🎯 Summary: When to Use What
graph TD A["Need to organize data?"] --> B{Full sort needed?} B -->|Yes| C["np.sort"] B -->|Only top/bottom K| D["np.partition"] E["Need positions, not values?"] --> F{All positions?} F -->|Yes| G["np.argsort"] F -->|Only top K positions| H["np.argpartition"] I["Array already sorted?"] --> J["np.searchsorted"]
| Task | Function | Speed |
|---|---|---|
| Sort everything | np.sort() |
Normal |
| Get top/bottom K | np.partition() |
Fast! |
| Find insert position | np.searchsorted() |
Super fast! |
| Get sort order | np.argsort() |
Normal |
| Get top K indices | np.argpartition() |
Fast! |
🚀 You Did It!
Now you know how to:
- ✅ Sort arrays with
np.sort() - ✅ Get positions with
np.argsort() - ✅ Find top values quickly with
np.partition() - ✅ Search sorted arrays with
np.searchsorted()
Think of your data as toys in a drawer. Keep them organized, and you’ll always find what you need—fast! 🎉
