Reshaping Data

Back

Loading concept...

🔄 Reshaping Data in Pandas: The Art of Data Origami

Imagine you have a big box of LEGO blocks. Sometimes you need to rebuild them into different shapes to see the picture better. That’s exactly what reshaping data is all about!


🎭 The Big Picture: Why Reshape Data?

Think of your data like a photo. Sometimes the photo is sideways, sometimes it’s too wide to fit on your phone screen. Reshaping lets you flip, fold, and reorganize your data so it shows exactly what you need!

Our Analogy: The Kitchen Table 🍽️

Imagine a family dinner table:

  • Transpose = Spin the table 90 degrees
  • Pivot = Rearrange everyone’s seats by some rule
  • Melt = Turn the table into a long list
  • Stack/Unstack = Stack plates or spread them out

1️⃣ Transpose DataFrame: Flip the Table!

What Is It?

Transpose means swapping rows and columns. Like turning your notebook sideways!

The Magic Word

df.T

That’s it! Just .T and your table flips!

Simple Example

Before (Original Table):

Name Age Score
Alice 10 95
Bob 11 88

After df.T (Transposed):

0 1
Name Alice Bob
Age 10 11
Score 95 88

Real-Life Use

  • When your table is too wide for the screen
  • When you want to compare columns side by side
import pandas as pd

data = {'Name': ['Alice', 'Bob'],
        'Age': [10, 11],
        'Score': [95, 88]}
df = pd.DataFrame(data)

# Flip it!
flipped = df.T
print(flipped)

🧠 Remember: .T = Turn it around!


2️⃣ Pivot Method: Musical Chairs for Data

What Is It?

Pivot reorganizes your data by choosing:

  • What becomes the new row labels (index)
  • What becomes the new column labels (columns)
  • What values fill the table

The Recipe

df.pivot(
    index='column_for_rows',
    columns='column_for_columns',
    values='column_for_values'
)

Simple Example

Before:

Date City Temperature
Monday Paris 20
Monday London 15
Tuesday Paris 22
Tuesday London 14

After Pivot:

df.pivot(index='Date',
         columns='City',
         values='Temperature')
Date London Paris
Monday 15 20
Tuesday 14 22

Code Example

data = {
    'Date': ['Mon', 'Mon', 'Tue', 'Tue'],
    'City': ['Paris', 'London', 'Paris', 'London'],
    'Temp': [20, 15, 22, 14]
}
df = pd.DataFrame(data)

pivoted = df.pivot(
    index='Date',
    columns='City',
    values='Temp'
)
print(pivoted)

⚠️ Warning: Pivot fails if you have duplicate combinations!


3️⃣ Pivot Tables: The Smart Calculator

What Is It?

Pivot tables are like pivot BUT with superpowers! They can:

  • Handle duplicates (by calculating averages, sums, etc.)
  • Add totals automatically
  • Group things cleverly

The Recipe

pd.pivot_table(
    df,
    values='what_to_calculate',
    index='row_groups',
    columns='column_groups',
    aggfunc='how_to_calculate'
)

Simple Example

If Alice scored 90 and 95 on two tests, pivot_table can average them!

data = {
    'Student': ['Alice', 'Alice', 'Bob'],
    'Subject': ['Math', 'Math', 'Math'],
    'Score': [90, 95, 88]
}
df = pd.DataFrame(data)

table = pd.pivot_table(
    df,
    values='Score',
    index='Student',
    aggfunc='mean'  # Calculate average
)
print(table)

Result:

Student Score
Alice 92.5
Bob 88.0

Common aggfunc Options

  • 'mean' = average
  • 'sum' = add them up
  • 'count' = how many
  • 'max' = biggest
  • 'min' = smallest

🎯 Pro Tip: Use margins=True to add row/column totals!


4️⃣ Crosstab Function: The Counter

What Is It?

Crosstab counts how many times things appear together. Like counting how many kids like pizza AND are in 5th grade!

The Recipe

pd.crosstab(
    df['category1'],
    df['category2']
)

Simple Example

Question: How many students from each grade chose each lunch?

data = {
    'Grade': ['5th', '5th', '6th', '6th', '5th'],
    'Lunch': ['Pizza', 'Salad', 'Pizza', 'Pizza', 'Pizza']
}
df = pd.DataFrame(data)

result = pd.crosstab(df['Grade'], df['Lunch'])
print(result)

Result:

Grade Pizza Salad
5th 2 1
6th 2 0

Extras

  • Add margins=True for row/column totals
  • Add normalize=True for percentages
pd.crosstab(
    df['Grade'],
    df['Lunch'],
    margins=True,
    normalize='index'  # Percentages per row
)

🧮 Remember: Crosstab = Cross + Tab(ulate) = Count across categories!


5️⃣ Melt Function: Unpivoting (The Opposite of Pivot!)

What Is It?

Melt takes a wide table and makes it long and skinny. Like melting a block of cheese into a stretchy string!

The Concept

WIDE TABLE (before melt):
| Name  | Math | Science |
|-------|------|---------|
| Alice | 90   | 85      |

LONG TABLE (after melt):
| Name  | Subject | Score |
|-------|---------|-------|
| Alice | Math    | 90    |
| Alice | Science | 85    |

The Recipe

pd.melt(
    df,
    id_vars=['columns_to_keep'],
    value_vars=['columns_to_melt'],
    var_name='new_column_name',
    value_name='new_value_name'
)

Simple Example

data = {
    'Name': ['Alice', 'Bob'],
    'Math': [90, 85],
    'Science': [88, 92]
}
df = pd.DataFrame(data)

melted = pd.melt(
    df,
    id_vars=['Name'],
    value_vars=['Math', 'Science'],
    var_name='Subject',
    value_name='Score'
)
print(melted)

Result:

Name Subject Score
Alice Math 90
Alice Science 88
Bob Math 85
Bob Science 92

🍫 Memory Trick: Melt = Make it LONGER!


6️⃣ Stack Method: Stack Those Columns!

What Is It?

Stack takes columns and stacks them into rows. Like stacking pancakes!

Visual Flow

graph TD A["Wide DataFrame"] --> B["stack"] B --> C["Columns become row levels"] C --> D["Result: MultiIndex Series"]

Simple Example

data = {
    'Math': [90, 85],
    'Science': [88, 92]
}
df = pd.DataFrame(
    data,
    index=['Alice', 'Bob']
)

stacked = df.stack()
print(stacked)

Result:

Alice  Math       90
       Science    88
Bob    Math       85
       Science    92

When To Use Stack?

  • When you need a “longer” format
  • When preparing data for certain charts
  • When you want a MultiIndex (fancy index with levels)

📚 Remember: Stack = Put columns on top of each other!


7️⃣ Unstack Method: Spread Those Rows!

What Is It?

Unstack is the opposite of stack. It takes stacked rows and spreads them back into columns!

Visual Flow

graph TD A["Stacked Data"] --> B["unstack"] B --> C["Row levels become columns"] C --> D["Result: Wide DataFrame"]

Simple Example

# Start with stacked data
stacked = pd.Series(
    [90, 88, 85, 92],
    index=pd.MultiIndex.from_tuples([
        ('Alice', 'Math'),
        ('Alice', 'Science'),
        ('Bob', 'Math'),
        ('Bob', 'Science')
    ])
)

# Unstack it!
unstacked = stacked.unstack()
print(unstacked)

Result:

Math Science
Alice 90 88
Bob 85 92

Key Point

  • stack() and unstack() are inverses
  • df.stack().unstack() gets you back to the original!

🔄 Remember: Unstack = Undo the stack!


🗺️ The Complete Map

graph LR A["Your Data"] --> B{Need to...} B -->|Flip rows↔cols| C["Transpose .T"] B -->|Reorganize| D["Pivot"] B -->|Reorganize + Calculate| E["Pivot Table"] B -->|Count combinations| F["Crosstab"] B -->|Make longer| G["Melt"] B -->|Cols→Rows| H["Stack"] B -->|Rows→Cols| I["Unstack"]

🎯 Quick Reference Table

Method What It Does When To Use
.T Swap rows & columns Table too wide
pivot() Reorganize table Unique combinations
pivot_table() Reorganize + calculate Duplicates exist
crosstab() Count combinations Frequency counts
melt() Wide → Long Need “tidy” data
stack() Columns → Rows Create MultiIndex
unstack() Rows → Columns Undo stack

💡 Final Wisdom

Reshaping data is like having different camera angles:

  • Same information, different view!
  • Choose the shape that tells your story best
  • Practice with small examples first

You’ve got this! 🚀

“Data is like water. It takes the shape of whatever container you put it in. Reshaping is just picking the right container!”

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.