🐼 Pandas: Your Data’s Best Friend
The Story of the Magic Spreadsheet
Imagine you have a giant box of LEGO bricks scattered all over your room. Finding the red ones? Nightmare! Sorting by size? Hours of work! Now imagine a magic helper that can instantly find, sort, combine, and organize ALL your bricks in seconds.
That’s Pandas! 🎉
Pandas is like having a super-smart assistant for your data. It takes messy information and makes it neat, organized, and easy to understand.
🎯 What We’ll Learn
🐼 Pandas Basics → 📊 DataFrames → 🔧 Data Manipulation → 📦 GroupBy Magic → 🔗 Merge & Join → 📅 DateTime Handling
📚 Chapter 1: Meet Pandas
What is Pandas?
Think of Pandas as a super-powered spreadsheet that lives inside Python. Just like how Excel has rows and columns, Pandas has them too—but with superpowers!
Real Life Examples:
- Streaming services can use it to organize viewer data
- Banks use it to track transactions
- Scientists use it to analyze experiments
Your First Pandas Code
import pandas as pd
# Create a simple table
data = {
'Name': ['Anna', 'Bob', 'Cara'],
'Age': [10, 12, 11],
'Score': [95, 88, 92]
}
df = pd.DataFrame(data)
print(df)
Output:
   Name  Age  Score
0  Anna   10     95
1   Bob   12     88
2  Cara   11     92
🎈 Think of it: Each row is a person, each column is information about them.
📊 Chapter 2: The DataFrame - Your Data Table
What’s a DataFrame?
A DataFrame is like a table in a notebook. It has:
- Rows = Each item (like each student)
- Columns = Information about items (like name, age, score)
- Index = Row labels (0, 1, 2, ... by default, like seat numbers)
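The three parts listed above can be inspected directly. Here is a minimal sketch, assuming the small table from Chapter 1; the variable name `table` is chosen here just for illustration.

```python
import pandas as pd

# Rebuild the Chapter 1 table so this snippet runs on its own
table = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Cara'],
    'Age': [10, 12, 11],
    'Score': [95, 88, 92],
})

n_rows, n_cols = table.shape        # (number of rows, number of columns)
column_names = list(table.columns)  # the column labels
row_labels = list(table.index)      # the index; 0, 1, 2 by default

print(n_rows, n_cols)   # 3 3
print(column_names)     # ['Name', 'Age', 'Score']
print(row_labels)       # [0, 1, 2]
```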
Creating DataFrames
Method 1: From a Dictionary
students = {
'Name': ['Emma', 'Liam'],
'Grade': ['A', 'B']
}
df = pd.DataFrame(students)
Method 2: From a List of Rows
data = [
['Emma', 'A'],
['Liam', 'B']
]
df = pd.DataFrame(data,
columns=['Name', 'Grade'])
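One way to check that both methods build the same table is `DataFrame.equals`, which the text above does not use. This is a small sketch; the names `from_dict` and `from_rows` are illustrative.

```python
import pandas as pd

# Method 1: dictionary of columns
from_dict = pd.DataFrame({
    'Name': ['Emma', 'Liam'],
    'Grade': ['A', 'B'],
})

# Method 2: list of rows, with column names given separately
from_rows = pd.DataFrame(
    [['Emma', 'A'], ['Liam', 'B']],
    columns=['Name', 'Grade'],
)

# equals() compares values, column names, and index
same = from_dict.equals(from_rows)
print(same)  # True
```

The dictionary form is organized by column, the list form by row. Which one is handier usually depends on how the raw data arrives.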
Selecting Data
# Get one column
df['Name']
# Get multiple columns
df[['Name', 'Grade']]
# Get one row by position
df.iloc[0] # First row
# Get rows by condition
df[df['Grade'] == 'A']
🧙‍♂️ Magic Tip: iloc = position (like “item 0”), loc = label (like “row named X”)
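With the default index 0, 1, 2, the two look identical, so the difference is easier to see with custom labels. A minimal sketch, assuming made-up seat labels (`s3`, `s1`, `s2`) as the index:

```python
import pandas as pd

# Hypothetical seat labels used as the index instead of 0, 1, 2
seats = pd.DataFrame(
    {'Name': ['Emma', 'Liam', 'Noah'], 'Grade': ['A', 'B', 'A']},
    index=['s3', 's1', 's2'],
)

first_by_position = seats.iloc[0]  # first row, whatever its label is
row_by_label = seats.loc['s1']     # the row whose label is 's1'

print(first_by_position['Name'])  # Emma
print(row_by_label['Name'])       # Liam
```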
🔧 Chapter 3: Data Manipulation
The Art of Shaping Data
Imagine you’re a chef preparing ingredients. Sometimes you need to:
- Add new ingredients (new columns)
- Remove bad parts (drop columns/rows)
- Change how things look (transform data)
- Filter out what you don’t need
Adding New Columns
# Assumes df has a numeric 'Score' column, like the Chapter 1 table
df['Bonus'] = df['Score'] * 0.1
# Or with a condition
df['Pass'] = df['Score'] >= 60
Removing Data
# Drop a column
df = df.drop(columns='Bonus')
# Drop the row whose index label is 0
df = df.drop(index=0)
# Drop rows with missing values
df = df.dropna()
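Because `dropna()` removes whole rows, it can help to count the missing values first. The sketch below uses made-up data. It also shows `fillna`, which the text above does not cover, purely as a contrast.

```python
import pandas as pd

# Illustrative data: Bob's score is missing (None is stored as NaN)
scores = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Cara'],
    'Score': [95, None, 92],
})

n_missing = int(scores['Score'].isna().sum())
print(n_missing)  # 1

dropped = scores.dropna()             # removes Bob's row entirely
filled = scores.fillna({'Score': 0})  # keeps Bob, with Score set to 0

print(len(dropped))  # 2
print(len(filled))   # 3
```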
Changing Values
# Replace values (assumes df has a 'Grade' column)
df['Grade'] = df['Grade'].replace('F', 'Fail')
# Apply a string method to every value in a column
df['Name'] = df['Name'].str.upper()
Filtering Data
# Students who passed
passed = df[df['Score'] >= 60]
# Multiple conditions
stars = df[
(df['Score'] >= 90) &
(df['Age'] <= 12)
]
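🎯 Remember: & means AND, | means OR. Put each condition in its own parentheses, because & and | are evaluated before comparisons like >=.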
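The filter above only uses `&`. Here is a short sketch of the same pattern with `|`, plus `~` (NOT), which the text above does not mention. The table is the Chapter 1 data, rebuilt under the illustrative name `kids`.

```python
import pandas as pd

kids = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Cara'],
    'Age': [10, 12, 11],
    'Score': [95, 88, 92],
})

# AND: both conditions must be true
both = kids[(kids['Score'] >= 90) & (kids['Age'] <= 10)]

# OR: at least one condition must be true
either = kids[(kids['Score'] >= 95) | (kids['Age'] >= 12)]

# NOT: ~ flips True and False
not_top = kids[~(kids['Score'] >= 90)]

print(list(both['Name']))     # ['Anna']
print(list(either['Name']))   # ['Anna', 'Bob']
print(list(not_top['Name']))  # ['Bob']
```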
📦 Chapter 4: GroupBy - Sorting into Buckets
The Bucket Story
Imagine you have a basket of mixed fruits and you want to:
- Group them by type (apples together, oranges together)
- Count how many of each
- Find the biggest one in each group
That’s exactly what GroupBy does!
Basic GroupBy
# Sample data
sales = pd.DataFrame({
'Store': ['A', 'B', 'A', 'B'],
'Product': ['Apple', 'Apple',
'Banana', 'Banana'],
'Amount': [10, 15, 20, 25]
})
# Group by Store and sum
by_store = sales.groupby('Store')
print(by_store['Amount'].sum())
Output:
Store
A    30
B    40
Name: Amount, dtype: int64
Multiple Aggregations
# Get multiple stats at once
stats = sales.groupby('Store').agg({
'Amount': ['sum', 'mean', 'count']
})
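The dictionary form above produces two-level column names such as `('Amount', 'sum')`. One alternative, sketched here as an option rather than the author's method, is named aggregation, which gives flat column names. The names `total`, `average`, and `n_sales` are made up for this example.

```python
import pandas as pd

sales = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana'],
    'Amount': [10, 15, 20, 25],
})

# Named aggregation: new_column=(source_column, function)
summary = sales.groupby('Store').agg(
    total=('Amount', 'sum'),
    average=('Amount', 'mean'),
    n_sales=('Amount', 'count'),
)
print(summary)
```

A single value can then be read with `summary.loc['A', 'total']`.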
Group by Multiple Columns
detailed = sales.groupby(
['Store', 'Product']
)['Amount'].sum()
🪣 Think of it: GroupBy = Put similar things in buckets, then do math on each bucket!
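To see the buckets themselves, a GroupBy object can be looped over. This sketch walks through each bucket by hand and then compares the result with the one-line version. The name `manual_totals` is illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana'],
    'Amount': [10, 15, 20, 25],
})

# Each loop step yields one bucket: the store name and that store's rows
manual_totals = {}
for store, bucket in sales.groupby('Store'):
    manual_totals[store] = int(bucket['Amount'].sum())

print(manual_totals)  # {'A': 30, 'B': 40}

# The one-line version does the same split, sum, and combine
totals = sales.groupby('Store')['Amount'].sum()
```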
🔗 Chapter 5: Merge and Join
The Puzzle Piece Story
Imagine you have two puzzle pieces that belong together:
- Piece 1: Student names and their IDs
- Piece 2: IDs and their test scores
To see “which student got which score,” you need to connect the pieces using the ID!
Types of Joins
Starting from two tables:
- Inner Join: Only matching rows
- Left Join: All left + matching right
- Right Join: All right + matching left
- Outer Join: All rows from both
Merge Example
# Table 1: Students
students = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['Anna', 'Bob', 'Cara']
})
# Table 2: Scores
scores = pd.DataFrame({
'ID': [1, 2, 4],
'Score': [95, 88, 75]
})
# Inner join (only matching IDs)
result = pd.merge(
students, scores,
on='ID', how='inner'
)
Result:
   ID  Name  Score
0   1  Anna     95
1   2   Bob     88
Different Join Types
# Left join - keep all students
left = pd.merge(
students, scores,
on='ID', how='left'
)
# Outer join - keep all rows from both tables
outer = pd.merge(
students, scores,
on='ID', how='outer'
)
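A left join keeps students who have no score, and pandas marks the gap with NaN. This sketch reuses the two tables from the merge example to show where the gap lands and how it changes the column type.

```python
import pandas as pd

students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Anna', 'Bob', 'Cara']})
scores = pd.DataFrame({'ID': [1, 2, 4], 'Score': [95, 88, 75]})

left = pd.merge(students, scores, on='ID', how='left')

# Cara (ID 3) has no row in scores, so her Score is NaN (missing)
missing = left[left['Score'].isna()]
print(list(missing['Name']))  # ['Cara']

# NaN is a float, so the whole Score column becomes float64
print(left['Score'].dtype)    # float64
```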
🧩 Remember:
- inner = Only matches
- left = All from left table
- right = All from right table
- outer = Everything from both
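One way to compare the four join types is to count how many rows each one keeps. The sketch below does that with the students and scores tables. It then uses `indicator=True`, a `pd.merge` option not shown in the text above, to label where each row came from.

```python
import pandas as pd

students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Anna', 'Bob', 'Cara']})
scores = pd.DataFrame({'ID': [1, 2, 4], 'Score': [95, 88, 75]})

# Count the rows each join type keeps
row_counts = {
    how: len(pd.merge(students, scores, on='ID', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}

# indicator=True adds a _merge column: both, left_only, or right_only
outer = pd.merge(students, scores, on='ID', how='outer', indicator=True)
print(outer[['ID', '_merge']])
```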
📅 Chapter 6: DateTime Handling
Time is Data Too!
Dates and times are special. They’re not just numbers or text—they have meaning! Pandas understands this.
Examples of date questions:
- “How many sales in January?”
- “What day had the most visitors?”
- “How many hours between these events?”
Converting to DateTime
# Create dates from strings
df = pd.DataFrame({
'date': ['2024-01-15',
'2024-02-20',
'2024-03-25']
})
# Convert to datetime
df['date'] = pd.to_datetime(df['date'])
Extracting Date Parts
# Get year, month, day
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
# Get day of week (0=Monday)
df['weekday'] = df['date'].dt.dayofweek
# Get day name
df['day_name'] = df['date'].dt.day_name()
Date Math
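# Add days
df['next_week'] = df['date'] + pd.Timedelta(days=7)
# Difference between each date and the previous row's date
df['diff'] = df['date'].diff()
# Count rows per month. size() is used because the text and
# date columns in df cannot be summed.
# 'ME' = month end; on pandas older than 2.2, use 'M' instead.
monthly = df.set_index('date').resample('ME').size()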
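Subtracting dates gives a Timedelta, which can be turned into plain numbers. Here is a minimal sketch for the "how many hours between these events?" question, using made-up event names and times:

```python
import pandas as pd

events = pd.DataFrame({
    'event': ['start', 'lunch', 'finish'],  # made-up event names
    'time': pd.to_datetime([
        '2024-01-15 09:00',
        '2024-01-15 12:30',
        '2024-01-16 09:00',
    ]),
})

# diff() gives the Timedelta between each row and the one before it
gap = events['time'].diff()

# Convert to hours; the first row has no previous row, so it is NaN
events['hours_since_prev'] = gap.dt.total_seconds() / 3600

hours = events['hours_since_prev'].tolist()
print(hours)  # [nan, 3.5, 20.5]
```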
Filtering by Date
# Sales in 2024
sales_2024 = df[
df['date'].dt.year == 2024
]
# Between two dates
jan_sales = df[
(df['date'] >= '2024-01-01') &
(df['date'] <= '2024-01-31')
]
📆 Pro Tip: Always convert date strings to datetime FIRST, then do operations!
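A small sketch of why the conversion matters: text is sorted character by character, not in calendar order. The day-first format here is just an illustrative choice.

```python
import pandas as pd

# Dates written day-first as plain text
raw = pd.Series(['15/01/2024', '02/03/2024', '20/02/2023'])

# Sorting the text compares characters, not calendar order
as_text = sorted(raw)
print(as_text)  # ['02/03/2024', '15/01/2024', '20/02/2023']

# After conversion, sorting follows the calendar
as_dates = pd.to_datetime(raw, format='%d/%m/%Y').sort_values()
in_order = as_dates.dt.strftime('%Y-%m-%d').tolist()
print(in_order)  # ['2023-02-20', '2024-01-15', '2024-03-02']
```

Passing `format` explicitly tells pandas which number is the day and which is the month, so it does not have to guess.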
🎉 You Did It!
You’ve learned the six superpowers of Pandas:
| Power | What It Does |
|---|---|
| 🐼 Pandas Basics | Import and create data |
| 📊 DataFrames | Organize in rows/columns |
| 🔧 Manipulation | Add, remove, change data |
| 📦 GroupBy | Sort into buckets & summarize |
| 🔗 Merge/Join | Connect two tables |
| 📅 DateTime | Work with dates & times |
🚀 Quick Reference
import pandas as pd
# Create DataFrame
df = pd.DataFrame(data)
# Select
df['column'] # One column
df[['a', 'b']] # Multiple columns
df.iloc[0] # Row by position
df.loc[0] # Row by label
# Filter
df[df['col'] > 5]
# Group
df.groupby('col').sum()
# Merge
pd.merge(df1, df2, on='key')
# DateTime
pd.to_datetime(df['date'])
df['date'].dt.year
Remember: Data is like LEGO bricks. Pandas helps you build anything you imagine! 🧱✨