Pandas Advanced Operations

Back

Loading concept...

🐼 Pandas Advanced Operations: The Data Kitchen

Imagine you’re a master chef in a magical kitchen. Your ingredients? Data tables! Today, we’ll learn how to mix, match, clean, and transform them like a pro!


🍳 The Big Picture

Think of Pandas DataFrames like recipe cards in your kitchen. Sometimes you need to:

  • Combine two recipe cards into one (Merge & Join)
  • Stack multiple cards together (Concatenation)
  • Flip the card to see it differently (Pivot & Unpivot)
  • Apply a special sauce to every ingredient (Apply & Map)
  • Clean up messy handwriting (Data Cleaning)
  • Remove duplicate cards (Duplicate Detection)

Let’s cook! 🍴


1. 🔗 Merge and Join: Combining Recipe Cards

What’s the Story?

You have two recipe cards:

  • Card A: Lists ingredients with their prices
  • Card B: Lists ingredients with their calories

You want ONE card with prices AND calories together!

graph TD A["Card A: Ingredients + Prices"] --> C["Combined Card"] B["Card B: Ingredients + Calories"] --> C C --> D["Ingredients + Prices + Calories"]

The Magic Spell

# Two DataFrames
prices = pd.DataFrame({
    'item': ['apple', 'banana'],
    'price': [1.0, 0.5]
})

calories = pd.DataFrame({
    'item': ['apple', 'banana'],
    'cal': [95, 105]
})

# Merge them!
combined = pd.merge(
    prices,
    calories,
    on='item'
)

Result:

item price cal
apple 1.0 95
banana 0.5 105

Types of Merges (Like Venn Diagrams!)

Type What It Does Real Life
inner Only matching items Friends in BOTH clubs
left All from left + matches Your list + friend’s notes
right All from right + matches Friend’s list + your notes
outer Everything from both All friends combined
# Left merge example
pd.merge(A, B, on='key', how='left')

Join: The Index-Based Cousin

join() is like merge() but uses the index (row labels) instead of a column.

# Join using index
df1.join(df2, how='inner')

💡 Simple Rule: Use merge() for columns, join() for indexes!


2. 📚 Concatenation: Stacking Cards

What’s the Story?

Imagine you have recipe cards from Monday and Tuesday. You want to stack them into one big pile!

graph TD A["Monday Recipes"] --> C["All Recipes"] B["Tuesday Recipes"] --> C

The Magic Spell

monday = pd.DataFrame({
    'dish': ['pasta', 'salad']
})

tuesday = pd.DataFrame({
    'dish': ['soup', 'pizza']
})

# Stack vertically (like a pile)
all_dishes = pd.concat(
    [monday, tuesday]
)

Result:

dish
pasta
salad
soup
pizza

Two Directions

Direction Code Like…
Vertical (rows) axis=0 Stacking papers
Horizontal (columns) axis=1 Gluing side-by-side
# Side by side
pd.concat([df1, df2], axis=1)

💡 Pro Tip: Use ignore_index=True to get fresh row numbers!


3. 🔄 Pivot and Unpivot: Flipping the Card

What’s the Story?

Your recipe card shows data in a long list. But you want to see it as a neat table instead!

Pivot: Long → Wide

Think of it like unfolding a paper crane back into a flat sheet.

# Long format data
sales = pd.DataFrame({
    'day': ['Mon', 'Mon', 'Tue'],
    'fruit': ['apple', 'banana', 'apple'],
    'qty': [10, 15, 8]
})

# Pivot to wide format
wide = sales.pivot(
    index='day',
    columns='fruit',
    values='qty'
)

Before (Long):

day fruit qty
Mon apple 10
Mon banana 15
Tue apple 8

After (Wide):

day apple banana
Mon 10 15
Tue 8 NaN

Unpivot (Melt): Wide → Long

The opposite! Like folding paper into a crane.

# Melt back to long format
long = wide.melt(
    ignore_index=False,
    var_name='fruit',
    value_name='qty'
)

💡 Remember:

  • Pivot = Spread out (like opening an umbrella)
  • Melt = Gather up (like closing an umbrella)

4. 🎨 Apply and Map: The Special Sauce

What’s the Story?

You want to add the same sauce to EVERY dish. Or maybe different sauces based on the dish type!

Apply: Do Something to Each Row/Column

# Double every price
df['price'] = df['price'].apply(
    lambda x: x * 2
)

# Or use a regular function
def add_tax(price):
    return price * 1.1

df['final'] = df['price'].apply(add_tax)

Map: Replace Values Like Magic

# Map codes to names
grade_map = {
    'A': 'Excellent',
    'B': 'Good',
    'C': 'Okay'
}

df['grade_name'] = df['grade'].map(
    grade_map
)

Apply to Whole DataFrame

# Apply to each column
df.apply(lambda col: col.max())

# Apply to each row
df.apply(
    lambda row: row['a'] + row['b'],
    axis=1
)

💡 Quick Guide:

Function Use When
apply() Custom function on each element/row
map() Replace values using a dictionary
applymap() Apply function to EVERY cell

5. 🧹 Data Cleaning: Fixing Messy Handwriting

What’s the Story?

Your recipe card has messy handwriting:

  • Some words are missing
  • Some are spelled wrong
  • Some have extra spaces

Time to clean up!

Handling Missing Values

# Find missing values
df.isna().sum()

# Fill missing with a value
df['col'].fillna(0)

# Fill with the average
df['col'].fillna(df['col'].mean())

# Drop rows with missing data
df.dropna()

Fixing Text Data

# Remove extra spaces
df['name'] = df['name'].str.strip()

# Make lowercase
df['name'] = df['name'].str.lower()

# Replace values
df['name'] = df['name'].str.replace(
    'old', 'new'
)

Fixing Data Types

# Convert to number
df['age'] = pd.to_numeric(
    df['age'],
    errors='coerce'
)

# Convert to date
df['date'] = pd.to_datetime(
    df['date']
)

💡 Cleaning Checklist:

  • ✅ Check for NaN values
  • ✅ Strip whitespace
  • ✅ Standardize text case
  • ✅ Fix data types
  • ✅ Handle outliers

6. 🔍 Duplicate Detection & Removal

What’s the Story?

Oh no! Someone copied the same recipe card twice. Let’s find and remove the copies!

Finding Duplicates

# Check for duplicates
df.duplicated()

# See which rows are duplicates
df[df.duplicated()]

# Count duplicates
df.duplicated().sum()

Removing Duplicates

# Remove all duplicates
clean_df = df.drop_duplicates()

# Keep first occurrence
df.drop_duplicates(keep='first')

# Keep last occurrence
df.drop_duplicates(keep='last')

# Check specific columns only
df.drop_duplicates(
    subset=['name', 'date']
)
graph TD A["Original Data"] --> B{Duplicates?} B -->|Yes| C["drop_duplicates"] C --> D["Clean Data"] B -->|No| D

💡 Duplicate Strategy:

Situation Use
Keep first entry keep='first'
Keep last entry keep='last'
Remove all copies keep=False

🎓 Quick Summary

Operation What It Does Main Function
Merge/Join Combine tables by key pd.merge()
Concat Stack tables pd.concat()
Pivot Long → Wide df.pivot()
Melt Wide → Long df.melt()
Apply Custom function df.apply()
Map Replace values df.map()
Clean Fix messy data fillna(), strip()
Dedupe Remove copies drop_duplicates()

🚀 You Did It!

You’ve mastered the Pandas kitchen! Now you can:

  • ✅ Combine data from different sources
  • ✅ Reshape data any way you want
  • ✅ Apply transformations like a pro
  • ✅ Clean up messy datasets
  • ✅ Remove pesky duplicates

Remember: Data is like ingredients. With these tools, you can cook up anything! 🍳🎉

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.