🐼 Pandas Performance & Configuration: Make Your Data Fly!
Imagine your data is a big backpack. If you stuff it with heavy rocks when you only need pebbles, you’ll walk slow. Let’s learn to pack smart and move fast!
🎒 The Backpack Analogy
Think of your computer’s memory like a backpack for a hike:
- Heavy backpack = slow walking = slow code
- Light backpack = fast walking = fast code
Pandas helps you pack your data backpack smartly!
📊 Memory Usage: Know What’s In Your Backpack
Before packing smarter, you need to see what’s already inside.
How to Check Memory
import pandas as pd
df = pd.read_csv('my_data.csv')
# See total memory used
df.info(memory_usage='deep')  # info() prints its summary directly - no print() needed
# Get exact bytes
memory_bytes = df.memory_usage(deep=True)
print(memory_bytes)
# Total in megabytes
total_mb = memory_bytes.sum() / 1024**2
print(f"Total: {total_mb:.2f} MB")
What You’ll See
Something like this (your numbers will differ):
Column   Memory
───────  ──────
name     2.4 MB
age      0.8 MB
city     4.1 MB
Total:   7.3 MB
Why it matters: If your backpack weighs 7 MB when it could weigh 2 MB, you’re carrying extra weight for nothing!
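Want that per-column breakdown in megabytes, like the table above? Here’s a small sketch using the same df as before:
# Per-column memory in MB, heaviest columns first
per_column_mb = df.memory_usage(deep=True) / 1024**2
print(per_column_mb.sort_values(ascending=False).round(2))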
🪨➡️🪶 Memory Downcasting: Turn Rocks Into Pebbles
The Problem
Pandas sometimes uses a big container for small things:
Storing the number 5:
├── int64 uses 8 bytes (like a big box)
└── int8 uses 1 byte (like a tiny box)
That’s 8x more space than needed!
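Don’t take the box sizes on faith - NumPy will tell you exactly how many bytes each container uses:
import numpy as np

# Bytes per value for each integer "box"
print(np.dtype('int64').itemsize)  # 8 bytes
print(np.dtype('int8').itemsize)   # 1 byte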
The Solution: Downcast!
# Before: whole numbers are stored as int64 by default (big)
print(df['age'].dtype)  # int64

# After: downcast to the smallest type that fits (tiny)
df['age'] = pd.to_numeric(
    df['age'],
    downcast='integer'
)
print(df['age'].dtype)  # int8, if every age fits in one byte
Downcast Cheatsheet
| Data Type | Use When | Command |
|---|---|---|
| integer | Whole numbers (1, 2, 99) | downcast='integer' |
| unsigned | Only positive (0, 1, 2) | downcast='unsigned' |
| float | Decimals (3.14, 9.99) | downcast='float' |
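If you want to shrink every numeric column at once, here’s a minimal sketch (it assumes you’re happy downcasting all integer and float columns in df):
# Shrink all integer columns, then all float columns
for col in df.select_dtypes(include='integer').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')

for col in df.select_dtypes(include='float').columns:
    df[col] = pd.to_numeric(df[col], downcast='float')

df.info(memory_usage='deep')  # check the lighter backpack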
Category Type: The Secret Weapon
For columns with repeated values (like city names):
# Before: Each "New York" stored separately
# Uses: 1000 copies × full string
# After: Store once, reference many times
df['city'] = df['city'].astype('category')
# Uses: 1 copy + tiny numbers
graph TD
    A["1000 rows with 'New York'"] --> B{Category Type}
    B --> C["Store 'New York' once"]
    B --> D["Use number 1 for each row"]
    D --> E["Huge memory savings!"]
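To see the savings on your own data, a quick before/after check (assuming a repetitive city column like in the example):
before = df['city'].memory_usage(deep=True)
df['city'] = df['city'].astype('category')
after = df['city'].memory_usage(deep=True)
print(f"Before: {before / 1024**2:.2f} MB, after: {after / 1024**2:.2f} MB")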
🚀 Vectorized Operations: The Conveyor Belt
The Slow Way: One by One
Imagine checking each apple for bruises, one at a time:
# SLOW - like picking apples one by one
result = []
for price in df['price']:
    result.append(price * 1.1)
df['new_price'] = result
The Fast Way: All At Once
Imagine a magic machine that checks ALL apples instantly:
# FAST - like a conveyor belt
df['new_price'] = df['price'] * 1.1
Speed difference (rough numbers - your machine will vary; you can measure it yourself with the sketch below):
- Loop: around 10 seconds for 1 million rows
- Vectorized: around 0.01 seconds for 1 million rows
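Here’s a simple timing sketch you can run yourself (it assumes df has a numeric price column like the examples above):
import time

# Time the slow loop
start = time.perf_counter()
result = []
for price in df['price']:
    result.append(price * 1.1)
df['new_price'] = result
print(f"Loop: {time.perf_counter() - start:.3f} s")

# Time the vectorized version
start = time.perf_counter()
df['new_price'] = df['price'] * 1.1
print(f"Vectorized: {time.perf_counter() - start:.3f} s")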
Common Vectorized Operations
# Math on whole columns (instant!)
df['total'] = df['price'] * df['quantity']
df['discounted'] = df['price'] * 0.9
df['rounded'] = df['price'].round(2)
# Conditions on whole columns
df['expensive'] = df['price'] > 100

# np.where needs NumPy imported
import numpy as np
df['category'] = np.where(
    df['price'] > 50,
    'Premium',
    'Budget'
)
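If you need more than two buckets, np.select keeps things vectorized too - here’s a sketch with made-up price bands:
import numpy as np

# Conditions are checked top to bottom; first match wins
conditions = [
    df['price'] > 100,
    df['price'] > 50,
]
choices = ['Luxury', 'Premium']
df['category'] = np.select(conditions, choices, default='Budget')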
🧮 Eval Method: Write Math Like a Human
The Problem
Complex math looks messy in code:
# Hard to read!
df['result'] = (
    (df['a'] + df['b']) * df['c']
    / (df['d'] - df['e'])
)
The Solution: eval()
# Easy to read - like writing on paper!
df['result'] = df.eval(
    '(a + b) * c / (d - e)'
)
Why Use eval()?
- Cleaner code - reads like math
- Often faster and lighter on memory for big DataFrames - pandas can evaluate the whole expression at once (using the numexpr engine when it’s installed) instead of building large temporary arrays
- Multiple columns at once:
df.eval('''
profit = revenue - cost
margin = profit / revenue
is_good = margin > 0.2
''', inplace=True)
Query: eval’s Cousin for Filtering
# Instead of this:
big_sales = df[
    (df['amount'] > 1000) &
    (df['region'] == 'West')
]
# Write this:
big_sales = df.query(
    'amount > 1000 and region == "West"'
)
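query() can also read local Python variables with an @ prefix, which keeps numbers and names out of the string. A small sketch:
min_amount = 1000
target_region = 'West'

# @variable pulls in the local Python value
big_sales = df.query(
    'amount > @min_amount and region == @target_region'
)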
🚶 Row Iteration Methods: When You Must Walk
Sometimes you NEED to look at each row. Here are your options from fastest to slowest:
Option 1: itertuples() - The Fast Walker 🏃
for row in df.itertuples():
    print(row.name, row.age)
# Access columns by name: row.name, row.age
# Access by position: row[1] (row[0] is the index)
Option 2: iterrows() - The Slow Walker 🐢
for index, row in df.iterrows():
    print(row['name'], row['age'])
# Returns a Series (slower)
Speed Comparison
graph LR
    A["1 Million Rows"] --> B["itertuples: 2 sec"]
    A --> C["iterrows: 30 sec"]
    A --> D["apply: 5 sec"]
    B --> E["Winner!"]
The Golden Rule
❌ Avoid loops when possible
✅ Use vectorized operations first
⚡ If you must loop, use itertuples()
When Looping Is Okay
- Complex logic that can’t be vectorized
- When you need the index AND row data
- Processing rows with external APIs (see the sketch below)
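For example, sending each row to an external service forces a loop. A minimal sketch, where send_to_api is a hypothetical stand-in for your real call:
results = []
for row in df.itertuples():
    # send_to_api is hypothetical - replace it with your actual API call
    response = send_to_api(row.name, row.age)
    results.append(response)
df['api_response'] = results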
⚙️ Display Option Settings: Control What You See
The Problem
Pandas sometimes hides your data:
A B ... Z
0 1 2 ... 26
.. .. .. ... ..
999 1 2 ... 26
[1000 rows × 26 columns]
Take Control!
import pandas as pd
# See more rows
pd.set_option('display.max_rows', 100)
# See more columns
pd.set_option('display.max_columns', 50)
# Wider display
pd.set_option('display.width', 200)
# More decimal places
pd.set_option('display.precision', 4)
# Don't truncate long text
pd.set_option('display.max_colwidth', None)
Handy Options Table
| Option | What It Does | Example |
|---|---|---|
| max_rows | Rows shown | 100 |
| max_columns | Columns shown | 50 |
| width | Total display width | 200 |
| precision | Decimal places | 4 |
| max_colwidth | Column text length | None (all) |
Temporary Changes
# Change just for one block
with pd.option_context(
    'display.max_rows', 10,
    'display.precision', 2
):
    print(df)
# Back to normal after!
Reset Everything
# Oops! Reset all options
pd.reset_option('all')
# Reset just one
pd.reset_option('display.max_rows')
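Forgot what an option does or what its default is? Pandas can tell you:
# Print the documentation, current value, and default for one option
pd.describe_option('display.max_rows')

# Or search by keyword - matches every option containing 'colwidth'
pd.describe_option('colwidth')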
🎯 Quick Summary
graph LR
    A["Pandas Performance"] --> B["Memory"]
    A --> C["Speed"]
    A --> D["Display"]
    B --> B1["Check with info"]
    B --> B2["Downcast numbers"]
    B --> B3["Use category"]
    C --> C1["Vectorize first!"]
    C --> C2["Use eval for math"]
    C --> C3["itertuples if needed"]
    D --> D1["set_option"]
    D --> D2["option_context"]
    D --> D3["reset_option"]
🌟 Remember This!
| Goal | Do This | Avoid This |
|---|---|---|
| Check memory | df.info(memory_usage='deep') | Guessing |
| Save memory | downcast='integer' | Default int64 |
| Fast math | df['a'] * df['b'] | For loops |
| Clean math | df.eval('a * b') | Messy brackets |
| Loop data | itertuples() | iterrows() |
| See more | set_option() | Truncated output |
You did it! 🎉
Your Pandas backpack is now lighter, your code is faster, and you can see exactly what you need. Go make your data fly!
