GroupBy Operations

Back

Loading concept...

🎯 GroupBy Operations: Sorting Your Toy Collection!

The Big Idea

Imagine you have a huge toy box with hundreds of toys mixed together. Cars, dolls, blocks, action figures — all jumbled up!

Now, what if someone asked: “How many cars do you have?” or “What’s the total value of all your dolls?”

You’d need to:

  1. Group all similar toys together (cars in one pile, dolls in another)
  2. Count or measure each pile

That’s exactly what GroupBy does in Pandas! It’s like having a magical helper that instantly sorts your messy toy box into neat piles and tells you interesting facts about each pile.


🧠 The GroupBy Concept

What is GroupBy?

GroupBy is a three-step process:

graph TD A["🎲 Your Messy Data"] --> B["✂️ SPLIT: Make Groups"] B --> C["⚙️ APPLY: Do Something"] C --> D["🧩 COMBINE: Get Results"]

Split → Apply → Combine

Think of it like this:

  • SPLIT: Separate toys by type (cars here, dolls there)
  • APPLY: Count each pile, or add up their prices
  • COMBINE: Put all the answers together in a nice report

Real Example

Let’s say you have sales data:

import pandas as pd

# Your messy data
sales = pd.DataFrame({
    'Store': ['Mall', 'Mall', 'Downtown', 'Downtown'],
    'Item': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Sold': [50, 30, 40, 25]
})
Store Item Sold
Mall Apple 50
Mall Banana 30
Downtown Apple 40
Downtown Banana 25

📦 GroupBy with Single Column

The Simplest Way to Group

Want to know total sales per store? Group by the Store column!

# Group by Store, sum up Sold
result = sales.groupby('Store')['Sold'].sum()
print(result)

Output:

Store
Downtown    65
Mall        80
Name: Sold, dtype: int64

What Just Happened?

graph TD A["All Sales Data"] --> B{Group by Store} B --> C["📍 Downtown Pile<br>40 + 25 = 65"] B --> D["🏬 Mall Pile<br>50 + 30 = 80"]

It’s like asking: “Put all Downtown receipts in one pile, all Mall receipts in another, then add up each pile!”

More Single Column Examples

# Average sales per store
sales.groupby('Store')['Sold'].mean()

# Count items per store
sales.groupby('Store')['Sold'].count()

# Maximum sale per store
sales.groupby('Store')['Sold'].max()

🎨 GroupBy with Multiple Columns

Going Deeper with Two Keys

What if you want to know sales for each item at each store?

Group by BOTH columns!

# Group by Store AND Item
result = sales.groupby(['Store', 'Item'])['Sold'].sum()
print(result)

Output:

Store     Item
Downtown  Apple     40
          Banana    25
Mall      Apple     50
          Banana    30
Name: Sold, dtype: int64

The Magic of Multiple Keys

graph TD A["All Sales"] --> B{Group by Store + Item} B --> C["Downtown + Apple: 40"] B --> D["Downtown + Banana: 25"] B --> E["Mall + Apple: 50"] B --> F["Mall + Banana: 30"]

It’s like sorting toys twice:

  1. First, by toy type (cars, dolls)
  2. Then, by color (red cars, blue cars)

Now you have super-specific piles!

Getting a Clean Table Back

# Reset index for a regular table
result = sales.groupby(
    ['Store', 'Item']
)['Sold'].sum().reset_index()
Store Item Sold
Downtown Apple 40
Downtown Banana 25
Mall Apple 50
Mall Banana 30

⚡ GroupBy Aggregation with agg()

One Function to Rule Them All

The .agg() method lets you apply any aggregation function!

# Using agg() with a single function
result = sales.groupby('Store')['Sold'].agg('sum')

Why Use agg()?

Because you can use custom functions!

# Custom function: range of sales
def sales_range(x):
    return x.max() - x.min()

result = sales.groupby('Store')['Sold'].agg(sales_range)

Output:

Store
Downtown    15  # (40 - 25)
Mall        20  # (50 - 30)
Name: Sold, dtype: int64

Built-in Functions You Can Use

Function What It Does
'sum' Add all values
'mean' Average
'count' How many?
'min' Smallest
'max' Biggest
'std' How spread out?
'first' First value
'last' Last value

🎭 Multiple Aggregation Functions

Ask Multiple Questions at Once!

Why ask one question when you can ask many?

# Multiple aggregations on one column
result = sales.groupby('Store')['Sold'].agg(
    ['sum', 'mean', 'count']
)

Output:

sum mean count
Downtown 65 32.5 2
Mall 80 40.0 2

Different Functions for Different Columns

# Different aggregations per column
result = sales.groupby('Store').agg({
    'Sold': ['sum', 'mean'],
    'Item': 'count'
})

This is like asking:

  • For Sold: Give me total and average
  • For Item: Just count how many

The Power Move

# Full control with a dictionary
sales.groupby('Store').agg({
    'Sold': ['sum', 'mean', 'max', 'min'],
    'Item': ['count', 'nunique']
})

nunique = how many unique items (no repeats counted)


🏷️ Named Aggregation

The Cleanest Way to GroupBy

Named aggregation lets you:

  1. Choose which column to aggregate
  2. Pick the function to use
  3. Give it a nice name

All in one beautiful line!

result = sales.groupby('Store').agg(
    total_sold=('Sold', 'sum'),
    avg_sold=('Sold', 'mean'),
    num_items=('Sold', 'count')
)

Output:

Store total_sold avg_sold num_items
Downtown 65 32.5 2
Mall 80 40.0 2

The Syntax Explained

new_column_name=('source_column', 'function')

It’s like saying:

“Create a column called total_sold by taking the Sold column and applying sum

Why Named Aggregation is Amazing

Before (messy column names):

       Sold
        sum mean
Store
Mall     80   40

After (clean and clear):

       total_sold  avg_sold
Store
Mall           80        40

Complete Example

# Professional-looking summary
summary = sales.groupby('Store').agg(
    total_revenue=('Sold', 'sum'),
    average_sale=('Sold', 'mean'),
    best_sale=('Sold', 'max'),
    worst_sale=('Sold', 'min'),
    transaction_count=('Sold', 'count')
).reset_index()

🎯 Quick Reference

Want to… Code
Group by one column df.groupby('col')
Group by multiple df.groupby(['a', 'b'])
Sum a column .['col'].sum()
Use any function .agg('func')
Multiple functions .agg(['sum', 'mean'])
Named results .agg(name=('col', 'func'))

🚀 You Did It!

You just learned how to:

Group data by one or more columns ✅ Apply aggregation functions like sum, mean, count ✅ Combine multiple functions in one go ✅ Name your results for clean, professional output

GroupBy is like having a super-smart assistant who can instantly organize any messy data and answer questions about each group. Now you have that power too!

Remember: Split → Apply → Combine

That’s all there is to it! 🎉

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.