Pandas Fundamentals

🐼 Pandas Fundamentals: Your Data Kitchen Adventure

Imagine you’re a chef in a magical kitchen. Instead of cooking food, you’re cooking DATA! Pandas is your super-powered cooking assistant that helps you organize, clean, and transform ingredients (data) into delicious meals (insights).


🎯 What You’ll Learn

Think of this journey like learning to cook in a restaurant kitchen:

  1. Pandas Series → Single ingredient containers
  2. Pandas DataFrame → Your recipe organizer
  3. Index Concepts → Labels on your containers
  4. Reading & Writing Data → Getting ingredients in and out
  5. Data Selection & Filtering → Picking the right ingredients
  6. Handling Missing Data → Dealing with empty containers
  7. Data Type Conversion → Transforming ingredients

📦 Pandas Series: Your Single-Column Container

What is a Series?

Think of a Series like a single column of labeled boxes on a shelf. Each box has:

  • A label (index) on the outside
  • One thing inside (the value)

Real Life Example:

  • Your piggy bank slots labeled by month
  • Each slot has coins inside
import pandas as pd

# Create a Series - like labeling jars
fruits = pd.Series(
    [5, 3, 8, 2],
    index=['apples', 'bananas', 'oranges', 'grapes']
)

print(fruits)

Output:

apples     5
bananas    3
oranges    8
grapes     2
dtype: int64

Quick Operations on Series

# Get total fruits
total = fruits.sum()  # 18

# Find average
average = fruits.mean()  # 4.5

# Get specific fruit
apple_count = fruits['apples']  # 5
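One Series behavior worth seeing early: arithmetic between two Series matches values by index label, not by position. Here is a minimal sketch with two made-up shelves. The names `morning` and `evening` are hypothetical.

```python
import pandas as pd

# Two hypothetical shelves whose labels only partly overlap
morning = pd.Series([5, 3, 8], index=['apples', 'bananas', 'oranges'])
evening = pd.Series([1, 2, 4], index=['bananas', 'oranges', 'grapes'])

# + lines values up by label; a label missing on either side gives NaN
combined = morning + evening
print(combined)

# .add with fill_value=0 treats a missing label as 0 instead
combined_filled = morning.add(evening, fill_value=0)
print(combined_filled)
```

In `combined`, `apples` and `grapes` come out as NaN because each appears on only one shelf. `combined_filled` keeps every fruit.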

📊 Pandas DataFrame: Your Super Spreadsheet

What is a DataFrame?

A DataFrame is like a table with rows and columns. Think of it as:

  • A notebook with many columns
  • Each column is a Series
  • Like a mini Excel inside Python!
graph TD
    A["DataFrame"] --> B["Column 1<br>Series"]
    A --> C["Column 2<br>Series"]
    A --> D["Column 3<br>Series"]
    B --> E["Row 0"]
    B --> F["Row 1"]
    B --> G["Row 2"]

Creating a DataFrame

# Like making a class roster
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10],
    'grade': ['A', 'B', 'A']
})

print(students)

Output:

      name  age grade
0    Alice   10     A
1      Bob   11     B
2  Charlie   10     A

DataFrame from a List

data = [
    ['Pizza', 10],
    ['Burger', 8],
    ['Salad', 5]
]

menu = pd.DataFrame(
    data,
    columns=['food', 'price']
)
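The sketch below uses the `menu` table to check that each column is a Series. The `with_delivery` column and its flat fee of 2 are made up for illustration.

```python
import pandas as pd

menu = pd.DataFrame(
    [['Pizza', 10], ['Burger', 8], ['Salad', 5]],
    columns=['food', 'price'],
)

# Selecting one column hands back a Series
price_col = menu['price']
print(isinstance(price_col, pd.Series))  # True

# shape is (number of rows, number of columns)
shape_before = menu.shape
print(shape_before)  # (3, 2)

# New column computed from an existing one (hypothetical delivery fee of 2)
menu['with_delivery'] = menu['price'] + 2
print(menu)
```

The new column is computed for every row at once. You don't need a loop.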

🏷️ Pandas Index Concepts: Your Labeling System

What is an Index?

The index is like name tags on lockers. It helps you find things fast!

graph TD
    A["Index = Labels"] --> B["0: First Row"]
    A --> C["1: Second Row"]
    A --> D["2: Third Row"]
    E["Custom Index"] --> F["Alice: First Row"]
    E --> G["Bob: Second Row"]
    E --> H["Charlie: Third Row"]

Default vs Custom Index

# Default index: 0, 1, 2, 3...
df = pd.DataFrame({'score': [85, 90, 78]})
# Index: 0, 1, 2

# Custom index: meaningful labels
df_custom = pd.DataFrame(
    {'score': [85, 90, 78]},
    index=['Alice', 'Bob', 'Charlie']
)

Working with Index

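# Set the 'name' column as the index (using the students table from above)
df = students.set_index('name')

# Access by index label
df.loc['Alice']

# Access by position
df.iloc[0]

# Reset back to the default 0, 1, 2... index ('name' becomes a column again)
df = df.reset_index()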
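The difference between `loc` and `iloc` becomes visible once the index has gaps, which is what happens after filtering. Here is a small sketch with made-up scores:

```python
import pandas as pd

scores = pd.DataFrame({'score': [85, 90, 78, 92]})

# Filtering keeps the original labels, so the index now has a gap
high = scores[scores['score'] > 80]
print(high.index.tolist())  # [0, 1, 3]

# loc looks up the label 3; iloc looks up the third row (position 2)
print(high.loc[3, 'score'])  # 92
print(high.iloc[2, 0])       # 92
# high.loc[2] would raise a KeyError: no row is labeled 2

# reset_index(drop=True) renumbers 0, 1, 2 and discards the old labels
renumbered = high.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```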

📂 Reading and Writing Data: Import & Export

Reading Data Files

Think of this as opening a recipe book and copying recipes into your kitchen.

graph TD
    A["External Files"] --> B["CSV Files"]
    A --> C["Excel Files"]
    A --> D["JSON Files"]
    B --> E["pd.read_csv"]
    C --> F["pd.read_excel"]
    D --> G["pd.read_json"]
    E --> H["DataFrame in Python"]
    F --> H
    G --> H

Reading CSV (Most Common!)

# Read a CSV file
df = pd.read_csv('students.csv')

# See first 5 rows
print(df.head())

# See last 3 rows
print(df.tail(3))
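`read_csv` takes many options. The sketch below tries two common ones, `usecols` and `parse_dates`. The CSV text is made up and wrapped in `io.StringIO` so it stands in for a real `students.csv` and the example runs without a file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for students.csv
csv_text = """name,age,grade,joined
Alice,10,A,2024-01-15
Bob,11,B,2024-02-01
Charlie,10,A,2024-03-10
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['name', 'age', 'joined'],  # load only these columns
    parse_dates=['joined'],             # convert this column to datetime64
)
print(df.dtypes)
print(df.head(2))
```

Loading only the columns you need keeps memory down on large files. Parsing dates at load time saves a separate conversion step later.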

Reading Excel

# Read Excel file (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx')

# Read specific sheet
df = pd.read_excel(
    'data.xlsx',
    sheet_name='Sheet2'
)

Writing Data Out

# Save to CSV
df.to_csv('output.csv', index=False)

# Save to Excel
df.to_excel('output.xlsx', index=False)

# Save to JSON
df.to_json('output.json')
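Why pass `index=False`? When `to_csv` is called without a path, it returns the CSV as a string, which makes the difference easy to see. Reading that string back shows the extra column the saved index turns into. Here is a minimal sketch on a made-up two-row table:

```python
import io
import pandas as pd

students = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [10, 11]})

# With no path, to_csv returns the CSV text instead of writing a file
with_index = students.to_csv()
without_index = students.to_csv(index=False)
print(with_index)     # header starts with an empty cell for the index
print(without_index)

# Read both back; the saved index comes back as an extra 'Unnamed: 0' column
back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))
print(back_with.columns.tolist())     # ['Unnamed: 0', 'name', 'age']
print(back_without.columns.tolist())  # ['name', 'age']
```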

🎯 Data Selection and Filtering: Finding What You Need

Selecting Columns

# Single column (returns Series)
names = df['name']

# Multiple columns (returns DataFrame)
subset = df[['name', 'age']]

Selecting Rows

# By position (iloc)
first_row = df.iloc[0]
first_three = df.iloc[0:3]

# By label (loc); assumes 'name' has been set as the index
alice_row = df.loc['Alice']
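`loc` and `iloc` can also select rows and columns in one call, and their slices treat the end point differently. Here is a sketch on a small table indexed by name. The fourth student, Dana, is made up.

```python
import pandas as pd

# Small table indexed by name (Dana is a made-up extra row)
students = pd.DataFrame(
    {'age': [10, 11, 10, 12], 'grade': ['A', 'B', 'A', 'C']},
    index=['Alice', 'Bob', 'Charlie', 'Dana'],
)

# loc[rows, columns] uses labels; a label slice INCLUDES its end
by_label = students.loc['Alice':'Charlie', 'grade']
print(by_label.tolist())  # ['A', 'B', 'A']

# iloc[rows, columns] uses positions; a position slice EXCLUDES its end
by_position = students.iloc[0:2, 1]
print(by_position.tolist())  # ['A', 'B']

# One row label plus one column label gives a single value
print(students.loc['Bob', 'age'])  # 11
```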

Filtering with Conditions

This is like asking questions to your data!

# Who is older than 10?
older = df[df['age'] > 10]

# Who got grade A?
grade_a = df[df['grade'] == 'A']

# Combine conditions with & (wrap each condition in parentheses)
smart_young = df[
    (df['grade'] == 'A') &
    (df['age'] == 10)
]
graph TD
    A["All Data"] --> B{age > 10?}
    B -->|Yes| C["Keep Row"]
    B -->|No| D["Skip Row"]
    C --> E["Filtered Data"]
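Besides `&`, pandas filters use `|` for "or" and `~` for "not". In Python these operators bind more tightly than `==` and `>`, so each condition needs its own parentheses. Plain `and`/`or` raise an error on a Series. Here is a sketch on the same student data, also showing `isin` and `query`:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10],
    'grade': ['A', 'B', 'A'],
})

# | means "or": grade A, or older than 10
a_or_older = df[(df['grade'] == 'A') | (df['age'] > 10)]

# ~ means "not": everyone without an A
not_a = df[~(df['grade'] == 'A')]

# isin keeps rows whose value appears in the list
picked = df[df['name'].isin(['Alice', 'Bob'])]

# query writes the same condition as smart_young as a string
same_as_smart_young = df.query("grade == 'A' and age == 10")
print(same_as_smart_young)
```

Inside a `query` string you write `and`/`or` as words, which some people find easier to read than the bracket-and-ampersand form.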

Quick Selection Examples

# First 5 rows
df.head()

# Last 5 rows
df.tail()

# Random sample of 3 rows
df.sample(3)

# Rows at positions 10 through 20 (the end of a slice is excluded)
df.iloc[10:21]
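Two details hide in these snippets. `sample` picks different rows on each run unless you give it a seed, and an `iloc` slice stops just before its end position. Here is a sketch on a made-up 30-row table:

```python
import pandas as pd

# Made-up table with values 0..29
df = pd.DataFrame({'value': range(30)})

# random_state seeds the random pick, so both calls return the same rows
first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)
print(first.equals(second))  # True

# iloc[10:21] covers positions 10 through 20: 11 rows
window = df.iloc[10:21]
print(len(window), window['value'].iloc[0], window['value'].iloc[-1])  # 11 10 20
```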

🕳️ Handling Missing Data: Dealing with Empty Boxes

What is Missing Data?

Missing data shows up as NaN (Not a Number); missing dates show up as NaT (Not a Time). It’s like:

  • Empty boxes in your storage
  • Blank cells in a spreadsheet
  • Information we don’t have yet

Finding Missing Data

# Check for missing values
print(df.isnull())

# Count missing in each column
print(df.isnull().sum())

# Check which columns contain at least one missing value
print(df.isnull().any())
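Here is a minimal sketch of these checks on a made-up table with one gap per column. Note that `any()` answers per column by default. Passing `axis=1` asks the question per row instead.

```python
import numpy as np
import pandas as pd

# Small made-up table with one gap in each column
df = pd.DataFrame({
    'name': ['Alice', 'Bob', None],
    'age': [10, np.nan, 12],
})

# Count missing values per column (1 in each here)
print(df.isnull().sum())

# any() checks per column by default; axis=1 checks each row instead
print(df.isnull().any())
rows_with_gaps = df.isnull().any(axis=1)
print(rows_with_gaps.tolist())  # [False, True, True]
```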

Dealing with Missing Data

# Option 1: Remove rows with missing
df_clean = df.dropna()

# Option 2: Fill with a value
df_filled = df.fillna(0)

# Option 3: Fill with average
df['age'] = df['age'].fillna(
    df['age'].mean()
)

# Option 4: Fill with previous value (forward fill)
# fillna(method='ffill') is deprecated in recent pandas; use ffill()
df_ffill = df.ffill()
graph TD
    A["Missing Data?"] --> B{How to handle?}
    B --> C["dropna<br>Remove it"]
    B --> D["fillna<br>Replace it"]
    D --> E["With 0"]
    D --> F["With mean"]
    D --> G["With previous"]
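These options give quite different results on the same data. Here is a side-by-side sketch on a made-up `ages` Series. It uses `.ffill()` for forward fill.

```python
import numpy as np
import pandas as pd

ages = pd.Series([10, np.nan, 12, np.nan])

dropped = ages.dropna()                  # keeps only 10.0 and 12.0
zeros = ages.fillna(0)                   # gaps become 0.0
mean_filled = ages.fillna(ages.mean())   # mean() skips NaN, so gaps become 11.0
forward = ages.ffill()                   # each gap copies the value above it

print(zeros.mean())        # 5.5
print(mean_filled.mean())  # 11.0
```

The choice matters for later calculations. Filling with 0 pulls the average from 11.0 down to 5.5, while filling with the mean leaves it unchanged.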

🔄 Data Type Conversion: Transforming Your Data

Why Convert Types?

Sometimes data comes in wrong format:

  • Numbers stored as text, like “123”
  • Dates stored as text, like “2024-01-15”
  • Categories stored as text

Checking Data Types

# See all column types
print(df.dtypes)

# Common types:
# int64 = whole numbers
# float64 = decimal numbers
# object = text/mixed
# bool = True/False
# datetime64 = dates

Converting Types

# Text to number (raises an error if any value isn't a valid whole number)
df['price'] = df['price'].astype(int)

# Number to text
df['id'] = df['id'].astype(str)

# Text to datetime
df['date'] = pd.to_datetime(df['date'])

# Text to category (saves memory when a few values repeat many times)
df['color'] = df['color'].astype('category')
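Here are the same conversions applied end to end on a made-up table where every column arrived as text. The sketch also shows what each new type makes possible. The memory comparison at the end uses a hypothetical long, repetitive column, because the savings from `category` only show up when values repeat.

```python
import pandas as pd

# Hypothetical messy table where everything arrived as text
df = pd.DataFrame({
    'price': ['10', '8', '5'],
    'date': ['2024-01-15', '2024-02-01', '2024-03-10'],
    'color': ['red', 'blue', 'red'],
})

df['price'] = df['price'].astype(int)
df['date'] = pd.to_datetime(df['date'])
df['color'] = df['color'].astype('category')
print(df.dtypes)

# Numbers now add up arithmetically
print(df['price'].sum())  # 23

# datetime64 columns expose date parts through the .dt accessor
print(df['date'].dt.month.tolist())  # [1, 2, 3]

# A category column stores each distinct value only once
print(df['color'].cat.categories.tolist())  # ['blue', 'red']

# Memory check on a longer, highly repetitive column
colors = pd.Series(['red', 'blue'] * 5000)
as_category = colors.astype('category')
print(colors.memory_usage(deep=True), as_category.memory_usage(deep=True))
```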

Handling Conversion Errors

# Safe conversion with errors='coerce'
# Bad values become NaN instead of error
df['age'] = pd.to_numeric(
    df['age'],
    errors='coerce'
)
graph TD
    A["Original Type"] --> B{Convert to?}
    B --> C["int/float<br>astype or to_numeric"]
    B --> D["string<br>astype str"]
    B --> E["datetime<br>pd.to_datetime"]
    B --> F["category<br>astype category"]
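To see why `errors='coerce'` matters, compare it with a strict `astype(int)` on made-up messy input that mixes clean numbers with a word and a blank:

```python
import pandas as pd

# Hypothetical messy input: two clean numbers, a word, and a blank
raw = pd.Series(['10', '11', 'unknown', ''])

# astype(int) refuses the whole column if any value can't be parsed
try:
    raw.astype(int)
    strict_worked = True
except ValueError:
    strict_worked = False
print(strict_worked)  # False

# errors='coerce' keeps the good values and turns the bad ones into NaN
ages = pd.to_numeric(raw, errors='coerce')
print(ages)
```

After coercing, the NaN values can be handled with the missing-data tools from the previous section.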

🎉 Quick Reference Card

Task               Code
Create Series      pd.Series([1, 2, 3])
Create DataFrame   pd.DataFrame({'a': [1, 2]})
Read CSV           pd.read_csv('file.csv')
Write CSV          df.to_csv('out.csv', index=False)
Select column      df['column']
Filter rows        df[df['age'] > 18]
Check missing      df.isnull().sum()
Fill missing       df.fillna(0)
Convert type       df['col'].astype(int)

🚀 You Did It!

You now understand the 7 fundamental pillars of Pandas:

  1. ✅ Series - Single columns of data
  2. ✅ DataFrame - Tables with rows and columns
  3. ✅ Index - Labels for fast lookup
  4. ✅ Read/Write - Getting data in and out
  5. ✅ Selection/Filtering - Finding specific data
  6. ✅ Missing Data - Handling empty values
  7. ✅ Type Conversion - Transforming data types

Next step: Practice with real data! Try loading a CSV file and exploring it with what you learned.

Remember: Every data scientist started exactly where you are now. Keep practicing, keep exploring, and you’ll master Pandas in no time! 🐼✨
