Pandas Operations

Back

Loading concept...

🐼 Pandas Operations: Your Data Kitchen Adventure!

Imagine you’re a chef in a magical kitchen. Your ingredients? Data! And Pandas is your super-smart kitchen assistant that helps you chop, mix, sort, and transform ingredients into delicious insights.

Today, we’ll learn five powerful kitchen tricks:

  1. 🔤 String Operations (chopping and slicing text)
  2. 📅 DateTime Operations (working with time)
  3. 👥 Groupby Operations (sorting into baskets)
  4. 📊 Aggregation (counting and measuring)
  5. ↔️ Wide vs Long Data (reshaping your table)

🔤 String Operations in Pandas

What Are String Operations?

Think of text as a piece of string (like yarn!). String operations let you cut, tie, stretch, and change that string however you want.

Simple Example:

  • You have a list of names: “JOHN”, “mary”, “Bob”
  • You want them all to look the same: “John”, “Mary”, “Bob”
  • Pandas helps you fix them all at once!

The Magic .str Accessor

In Pandas, we use .str to work with text. It’s like saying “Hey Pandas, treat this as text!”

import pandas as pd

# Our messy data
names = pd.Series(['JOHN', 'mary', 'Bob'])

# Make all lowercase
names.str.lower()
# Result: ['john', 'mary', 'bob']

# Make all uppercase
names.str.upper()
# Result: ['JOHN', 'MARY', 'BOB']

# Proper case (first letter big)
names.str.title()
# Result: ['John', 'Mary', 'Bob']

Common String Tricks

Trick What It Does Example
.str.lower() All small letters “HELLO” → “hello”
.str.upper() All BIG letters “hello” → “HELLO”
.str.strip() Remove extra spaces " hi " → “hi”
.str.replace() Swap one thing for another “cat” → “dog”
.str.contains() Check if text has something “hello” has “ell”? ✅
.str.len() Count characters “hello” = 5
.str.split() Break text apart “a-b-c” → [‘a’,‘b’,‘c’]

Real Kitchen Example

# Messy ingredient list
ingredients = pd.Series([
    '  TOMATOES  ',
    'onions',
    'GARLIC cloves'
])

# Clean them up!
clean = ingredients.str.strip()
clean = clean.str.lower()
clean = clean.str.title()

# Result: ['Tomatoes', 'Onions',
#          'Garlic Cloves']

Finding Things in Text

emails = pd.Series([
    'john@gmail.com',
    'mary@yahoo.com',
    'bob@gmail.com'
])

# Find all Gmail users
gmail_users = emails.str.contains('gmail')
# Result: [True, False, True]

# Extract the domain
domains = emails.str.split('@').str[1]
# Result: ['gmail.com', 'yahoo.com',
#          'gmail.com']

📅 DateTime Operations

What Are DateTime Operations?

Time is tricky! Is “01/02/2024” January 2nd or February 1st? Pandas helps you work with dates and times without confusion.

Simple Example:

  • You have birthdays as text: “1990-05-15”
  • Pandas turns them into real dates you can calculate with!
  • “How old is this person?” becomes easy!

Converting to DateTime

import pandas as pd

# Text dates
dates_text = pd.Series([
    '2024-01-15',
    '2024-06-20',
    '2024-12-25'
])

# Convert to real dates
dates = pd.to_datetime(dates_text)

Extracting Parts of a Date

Like taking apart a clock to see the gears!

# Create a date column
df = pd.DataFrame({
    'event': ['Birthday', 'Holiday', 'Meeting'],
    'date': pd.to_datetime([
        '2024-03-15',
        '2024-12-25',
        '2024-07-04'
    ])
})

# Extract parts using .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.day_name()

Result:

event year month day weekday
Birthday 2024 3 15 Friday
Holiday 2024 12 25 Wednesday
Meeting 2024 7 4 Thursday

Date Math

from datetime import timedelta

# Today's date
today = pd.Timestamp('2024-06-15')

# Add 7 days
next_week = today + timedelta(days=7)
# Result: 2024-06-22

# Difference between dates
date1 = pd.Timestamp('2024-01-01')
date2 = pd.Timestamp('2024-12-31')
difference = date2 - date1
# Result: 365 days

Common DateTime Properties

Property What It Gives You
.dt.year The year (2024)
.dt.month Month number (1-12)
.dt.day Day of month (1-31)
.dt.hour Hour (0-23)
.dt.minute Minute (0-59)
.dt.day_name() “Monday”, “Tuesday”…
.dt.month_name() “January”, “February”…
.dt.quarter Quarter (1-4)
.dt.dayofweek 0=Monday, 6=Sunday

👥 Groupby Operations

What Is Groupby?

Imagine you have a big box of colored marbles. Groupby helps you sort them into separate piles by color, then do something with each pile!

Simple Example:

  • You have sales data for different stores
  • Group by store name
  • Now you can see total sales per store!
graph TD A["🎨 All Marbles Mixed"] --> B["Groupby Color"] B --> C["🔴 Red Pile"] B --> D["🔵 Blue Pile"] B --> E["🟢 Green Pile"] C --> F["Count: 5"] D --> G["Count: 3"] E --> H["Count: 7"]

Basic Groupby

import pandas as pd

# Sales data
df = pd.DataFrame({
    'store': ['A', 'B', 'A', 'B', 'A'],
    'product': ['Apple', 'Apple', 'Banana',
                'Banana', 'Apple'],
    'sales': [100, 150, 80, 120, 90]
})

# Group by store
grouped = df.groupby('store')

# Total sales per store
grouped['sales'].sum()
# Store A: 270
# Store B: 270

Multiple Grouping

# Group by TWO things: store AND product
grouped = df.groupby(['store', 'product'])

# Sales for each store-product combo
grouped['sales'].sum()

# Store A, Apple: 190
# Store A, Banana: 80
# Store B, Apple: 150
# Store B, Banana: 120

Common Groupby Operations

Operation What It Does
.sum() Add all values together
.mean() Find the average
.count() Count how many items
.min() Find smallest value
.max() Find largest value
.first() Get first item
.last() Get last item

📊 Pandas Aggregation

What Is Aggregation?

Aggregation means combining many values into one summary number. Like counting all your toys instead of listing each one!

Simple Example:

  • You have 50 test scores
  • Aggregation gives you: Average = 85, Highest = 100, Lowest = 60
  • One line of summary instead of 50 numbers!

The .agg() Function

import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B'],
    'value': [10, 20, 30, 40],
    'count': [1, 2, 3, 4]
})

# Multiple aggregations at once!
result = df.groupby('category').agg({
    'value': ['sum', 'mean'],
    'count': ['min', 'max']
})

Result:

category value_sum value_mean count_min count_max
A 30 15.0 1 2
B 70 35.0 3 4

Named Aggregations (Cleaner Way!)

result = df.groupby('category').agg(
    total_value=('value', 'sum'),
    avg_value=('value', 'mean'),
    min_count=('count', 'min'),
    max_count=('count', 'max')
)

Custom Aggregation Functions

# Your own function!
def range_func(x):
    return x.max() - x.min()

df.groupby('category')['value'].agg(range_func)
# A: 10 (20-10)
# B: 10 (40-30)

Aggregation Without Groupby

# Summarize entire DataFrame
df['value'].agg(['sum', 'mean', 'min', 'max'])

# Result:
# sum     100
# mean     25
# min      10
# max      40

↔️ Wide vs Long Data

What’s the Difference?

Think of a spreadsheet. Wide data spreads information across columns. Long data stacks everything in rows.

Wide Data (like a calendar):

Name Jan Feb Mar
Alice 100 110 120
Bob 200 210 220

Long Data (like a receipt):

Name Month Value
Alice Jan 100
Alice Feb 110
Alice Mar 120
Bob Jan 200
Bob Feb 210
Bob Mar 220

When to Use Which?

graph TD A["Your Data"] --> B{What do you need?} B -->|Easy to read| C["Wide Format"] B -->|Easy to analyze| D["Long Format"] C --> E["Reports, Dashboards"] D --> F["Groupby, Plotting, ML"]

Going from Wide to Long: melt()

import pandas as pd

# Wide data
wide_df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Jan': [100, 200],
    'Feb': [110, 210],
    'Mar': [120, 220]
})

# Melt to long format
long_df = pd.melt(
    wide_df,
    id_vars=['Name'],      # Keep this column
    var_name='Month',      # New column for headers
    value_name='Sales'     # New column for values
)

Result:

Name Month Sales
Alice Jan 100
Alice Feb 110
Alice Mar 120
Bob Jan 200
Bob Feb 210
Bob Mar 220

Going from Long to Wide: pivot()

# Long data back to wide
wide_again = long_df.pivot(
    index='Name',     # Row labels
    columns='Month',  # Column headers
    values='Sales'    # Values to fill in
)

pivot_table() for Aggregation

When you have duplicate entries, use pivot_table():

# Data with duplicates
sales = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B'],
    'Product': ['X', 'X', 'X', 'Y'],
    'Sales': [10, 20, 30, 40]
})

# Pivot with aggregation
result = pd.pivot_table(
    sales,
    index='Store',
    columns='Product',
    values='Sales',
    aggfunc='sum'  # Combine duplicates!
)

Result:

Store X Y
A 30 NaN
B 30 40

🎯 Quick Reference Summary

Operation Purpose Key Function
String Work with text .str.method()
DateTime Work with dates .dt.property
Groupby Split into groups .groupby()
Aggregation Summarize data .agg()
Reshape Change layout melt(), pivot()

🌟 You Did It!

You’ve just learned five powerful Pandas operations! Like a chef who now knows how to:

  • ✂️ Chop text perfectly (String Operations)
  • ⏰ Manage cooking times (DateTime)
  • 🗂️ Sort ingredients by type (Groupby)
  • 📏 Measure and count (Aggregation)
  • 🔄 Rearrange your kitchen (Wide vs Long)

Now go cook up some amazing data insights! 🐼✨

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.