🎯 Pandas: Filtering Data Like a Detective
Imagine you’re a detective with a giant room full of treasure chests. Each chest has different items inside. You need to find ONLY the chests with gold coins. How do you do it? You CHECK each chest and KEEP the ones that match your rule!
That’s exactly what filtering in Pandas does. Let’s become data detectives!
🔍 Boolean Indexing: The Yes/No Magic Wand
Think of Boolean indexing like having a magic wand that points at each row and says “YES!” or “NO!”
When you wave your wand, you get a list of True/False answers. Then Pandas shows you ONLY the “True” rows.
import pandas as pd
# Our treasure data
df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20]
})
# Wave the magic wand!
mask = df['value'] > 50
# Result: [True, False, True, False]
# Use the mask to filter
treasures = df[mask]
What happened?
- We asked: “Is value greater than 50?”
- Pandas checked EACH row
- We got: True, False, True, False
- Pandas kept ONLY the True rows!
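To look at the wand's answers directly, here is a minimal sketch that reuses the treasure data from above. The name `kept` is only for illustration, and `df.loc[mask]` is shown as a more explicit spelling of `df[mask]`.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# The mask is a Series of booleans, one answer per row
mask = df['value'] > 50
print(mask.tolist())  # [True, False, True, False]

# True counts as 1, so sum() says how many rows will be kept
print(mask.sum())     # 2

# df[mask] and df.loc[mask] select the same rows
kept = df.loc[mask]
print(kept['item'].tolist())  # ['gold', 'gold']
```

Saving the mask in its own variable makes it easy to inspect or reuse before filtering.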
🎯 Conditional Selection: Asking Questions
Conditional selection is like asking a question about your data. The answer is always YES or NO for each row.
# Question: Which items are gold?
gold_items = df[df['item'] == 'gold']
# Question: Which items cost less than 60?
cheap_items = df[df['value'] < 60]
Common questions you can ask:
- == means “is exactly equal to”
- != means “is NOT equal to”
- > means “is greater than”
- < means “is less than”
- >= means “is greater than or equal to”
- <= means “is less than or equal to”
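A short sketch of a few of these questions, asked of the same treasure data. The variable names (`not_gold`, `at_least_50`, `at_most_50`) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# != : everything that is NOT gold
not_gold = df[df['item'] != 'gold']
print(not_gold['item'].tolist())      # ['silver', 'bronze']

# >= : 50 itself is included
at_least_50 = df[df['value'] >= 50]
print(at_least_50['value'].tolist())  # [100, 50, 80]

# <= : 50 itself is included here too
at_most_50 = df[df['value'] <= 50]
print(at_most_50['value'].tolist())   # [50, 20]
```

Notice that the silver row (value 50) shows up in both of the last two results, because >= and <= both include the boundary value.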
🔗 Multiple Conditions with AND: Both Must Be True
What if you want items that are BOTH gold AND worth more than 50?
Use the & symbol. Think of it as “AND” - BOTH conditions must be true!
# Gold items AND value > 50
result = df[
    (df['item'] == 'gold') &
    (df['value'] > 50)
]
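⚠️ SUPER IMPORTANT: Always wrap each condition in parentheses ()! Python evaluates & before comparisons like == and >, so without parentheses the expression is grouped the wrong way and fails with an error.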
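Here is a small sketch of what goes wrong without the parentheses. The exact error type and message may differ between pandas versions (that part is an assumption), so the example catches both `TypeError` and `ValueError` and only prints the error's name.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# Without parentheses, Python groups the expression as
#   df['item'] == ('gold' & df['value']) > 50
# because & binds more tightly than == and >.
try:
    broken = df[df['item'] == 'gold' & df['value'] > 50]
    failed = False
except (TypeError, ValueError) as err:
    failed = True
    print(type(err).__name__)

# With parentheses, each comparison is evaluated first
fixed = df[(df['item'] == 'gold') & (df['value'] > 50)]
print(fixed['value'].tolist())  # [100, 80]
```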
graph TD
    A[Check Row] --> B{Is it gold?}
    B -->|Yes| C{Value > 50?}
    B -->|No| D[❌ Skip it]
    C -->|Yes| E[✅ Keep it!]
    C -->|No| D
🔀 Multiple Conditions with OR: Either One Works
What if you want items that are gold OR worth more than 90?
Use the | symbol. Think of it as “OR” - if EITHER condition is true, keep the row!
# Gold items OR value > 90
result = df[
    (df['item'] == 'gold') |
    (df['value'] > 90)
]
graph TD
    A[Check Row] --> B{Is it gold?}
    B -->|Yes| E[✅ Keep it!]
    B -->|No| C{Value > 90?}
    C -->|Yes| E
    C -->|No| D[❌ Skip it]
Quick Memory Trick:
- & = AND = BOTH must be true
- | = OR = ANY one can be true
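To compare the two side by side, this sketch builds two masks on the treasure data and combines them both ways. The names `is_gold`, `over_90`, `both` and `either` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

is_gold = df['item'] == 'gold'  # [True, False, True, False]
over_90 = df['value'] > 90      # [True, False, False, False]

# AND: a row survives only if BOTH masks say True
both = is_gold & over_90
print(both.tolist())    # [True, False, False, False]

# OR: a row survives if EITHER mask says True
either = is_gold | over_90
print(either.tolist())  # [True, False, True, False]

print(df[both]['value'].tolist())    # [100]
print(df[either]['value'].tolist())  # [100, 80]
```

AND can only shrink the result, while OR can only grow it.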
📋 The isin() Method: Shopping List Filtering
Imagine you have a shopping list: “I want apples, oranges, or bananas.”
The isin() method checks if values are IN your list!
# My shopping list
wanted = ['gold', 'silver']
# Find items in my list
result = df[df['item'].isin(wanted)]
This is MUCH cleaner than writing:
# The hard way (don't do this!)
df[
    (df['item'] == 'gold') |
    (df['item'] == 'silver')
]
When to use isin():
- Checking for multiple specific values
- Filtering by a list of options
- Making your code cleaner and easier to read
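As a quick check that the short way and the long way really agree, here is a sketch on the treasure data. The names `with_isin` and `with_or` are made up, and the numeric list at the end is just an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

wanted = ['gold', 'silver']

# The clean way and the long way
with_isin = df[df['item'].isin(wanted)]
with_or = df[(df['item'] == 'gold') | (df['item'] == 'silver')]

print(with_isin.equals(with_or))   # True
print(with_isin['item'].tolist())  # ['gold', 'silver', 'gold']

# isin() works on number columns too
exact_values = df[df['value'].isin([20, 100])]
print(exact_values['item'].tolist())  # ['gold', 'bronze']
```

The longer the shopping list gets, the more isin() pays off, since the OR version needs one extra condition per value.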
⚠️ SettingWithCopyWarning: The Mysterious Warning
This warning is like a teacher saying: “Are you SURE you want to do that?”
The Problem
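When you filter data, Pandas sometimes gives you a view (like looking through a window) and sometimes a copy (like taking a photo).
If you then change the result, Pandas can't tell whether you meant to change the original too. Your change might land on a copy and never reach the original, or it might reach the original when you didn't expect it. The warning asks you to make your intention clear!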
# This might cause a warning!
filtered = df[df['value'] > 50]
filtered['value'] = 999 # ⚠️ Warning!
The Solution
Always use .copy() when you want to modify filtered data:
# Safe way - make a copy first!
filtered = df[df['value'] > 50].copy()
filtered['value'] = 999 # ✅ No warning!
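A minimal sketch to confirm that the copy really is independent, using the same treasure data:

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# Take an explicit copy, then change it freely
filtered = df[df['value'] > 50].copy()
filtered['value'] = 999

print(filtered['value'].tolist())  # [999, 999]
print(df['value'].tolist())        # [100, 50, 80, 20]
```

The original `df` keeps its values, so the edit stays inside `filtered`.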
👁️ View vs Copy: Looking vs Taking
This is like the difference between:
- View: Looking through a window at a painting (changes to the painting show through)
- Copy: Taking a photo of the painting (your photo is independent)
When You Get a View
# Simple column selection often gives a view
col = df['value']
col[0] = 999 # Might change df too!
When You Get a Copy
# Filtering with a True/False mask gives a copy
filtered = df[df['value'] > 50]
# But other kinds of selection can go either way!
The Golden Rule
When in doubt, use .copy()!
# Always safe
my_data = df[df['value'] > 50].copy()
graph TD
    A[Filter Data] --> B{Want to modify?}
    B -->|Yes| C[Use .copy]
    B -->|No| D[View is fine]
    C --> E[✅ Safe to edit]
    D --> F[✅ Just reading]
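Sometimes the goal is the opposite: you really do want to change the original. A common pattern for that is a single `.loc` assignment that names the rows (the mask) and the column together. This is a sketch on the treasure data, with 999 borrowed from the earlier examples.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# One step: pick the rows AND the column, then assign
df.loc[df['value'] > 50, 'value'] = 999

print(df['value'].tolist())  # [999, 50, 999, 20]
```

Because the selection and the assignment happen in one step on `df` itself, there is no in-between object for Pandas to be unsure about.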
🆕 Copy-on-Write: The Future is Here!
Copy-on-Write (CoW) is Pandas’ new smart system. It’s like having a magical photocopier that only makes copies when you actually need them!
How It Works
- You filter data → Pandas gives you something that LOOKS like a copy
- You just READ it → No actual copy is made (saves memory!)
- You try to WRITE/CHANGE it → NOW Pandas makes a real copy
Enabling Copy-on-Write
# Turn on CoW (recommended for new code!)
pd.options.mode.copy_on_write = True
# Now this is always safe!
filtered = df[df['value'] > 50]
filtered['value'] = 999 # Automatically safe!
Why It’s Amazing
- No more warnings about views vs copies
- Saves memory (copies only when needed)
- Predictable behavior (always works the same way)
- The future of Pandas (it becomes the default behavior in pandas 3.0!)
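Here is a small sketch of that behavior in action. It assumes a pandas version that supports the `copy_on_write` option, and the expected values in the comments assume CoW is active.

```python
import pandas as pd

# Assumption: this pandas version supports the copy_on_write option
pd.options.mode.copy_on_write = True

df = pd.DataFrame({
    'item': ['gold', 'silver', 'gold', 'bronze'],
    'value': [100, 50, 80, 20],
})

# Selecting a column, then writing to it: only col changes
col = df['value']
col.iloc[0] = 999
print(col.tolist())          # [999, 50, 80, 20]
print(df['value'].tolist())  # [100, 50, 80, 20]

# Filtering, then writing: the original is untouched as well
filtered = df[df['value'] > 50]
filtered['value'] = 0
print(df['value'].tolist())  # [100, 50, 80, 20]
```

With CoW on, writing to `col` or `filtered` never reaches back into `df`, which is what makes the behavior predictable.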
🎓 Putting It All Together
Let’s be data detectives with everything we learned!
import pandas as pd
# Enable the future!
pd.options.mode.copy_on_write = True
# Our treasure room
treasures = pd.DataFrame({
    'type': ['gold', 'silver', 'gold', 'ruby', 'silver'],
    'value': [100, 30, 80, 200, 45],
    'found_in': ['cave', 'river', 'cave', 'mountain', 'river']
})
# 1. Boolean indexing
high_value = treasures['value'] > 50
# 2. Conditional selection
gold_only = treasures[treasures['type'] == 'gold']
# 3. AND condition: gold AND high value
gold_and_valuable = treasures[
    (treasures['type'] == 'gold') &
    (treasures['value'] > 50)
]
# 4. OR condition: gold OR ruby
gold_or_ruby = treasures[
    (treasures['type'] == 'gold') |
    (treasures['type'] == 'ruby')
]
# 5. isin: from specific locations
from_good_places = treasures[
    treasures['found_in'].isin(['cave', 'mountain'])
]
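To check the detective work, this sketch rebuilds the treasure room and looks at a few results. It also shows two optional variations that are not part of the walkthrough above: writing the OR from step 4 with isin(), and reusing a saved mask inside a bigger condition. The name `valuable_from_cave` is made up for the example.

```python
import pandas as pd

treasures = pd.DataFrame({
    'type': ['gold', 'silver', 'gold', 'ruby', 'silver'],
    'value': [100, 30, 80, 200, 45],
    'found_in': ['cave', 'river', 'cave', 'mountain', 'river'],
})

# The mask from step 1
high_value = treasures['value'] > 50
print(high_value.tolist())  # [True, False, True, True, False]

# Step 4 rewritten with isin(): same rows as the OR version
gold_or_ruby = treasures[treasures['type'].isin(['gold', 'ruby'])]
print(gold_or_ruby.index.tolist())  # [0, 2, 3]

# A saved mask can be combined with a new condition
valuable_from_cave = treasures[
    high_value & (treasures['found_in'] == 'cave')
]
print(valuable_from_cave['value'].tolist())  # [100, 80]
```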
🌟 Key Takeaways
| Concept | What It Does | Example |
|---|---|---|
| Boolean Indexing | Creates True/False mask | df['col'] > 5 |
| Conditional Selection | Filters rows by condition | df[df['col'] > 5] |
| AND (&) | Both conditions must be true | (cond1) & (cond2) |
| OR (\|) | Either condition can be true | (cond1) \| (cond2) |
| isin() | Check if value is in a list | df['col'].isin([1,2,3]) |
| .copy() | Make independent copy | df[filter].copy() |
| Copy-on-Write | Smart automatic copying | Enable in settings! |
You’re now a data detective! 🕵️ Go find those treasures in your data!