π Pandas: Common File Formats
Your Dataβs Travel Documents
π The Story: Data is a Traveler
Imagine your data is like a traveler who needs to visit different places. Just like you need a passport to travel between countries, your data needs file formats to travel between programs!
- CSV = A simple backpack π (light, works everywhere)
- Excel = A fancy suitcase π§³ (organized with compartments)
- JSON = A messenger bag π¬ (perfect for web apps)
Pandas is like a magical translator who can read any passport and help your data travel anywhere!
π Reading CSV Files
What is CSV?
CSV stands for Comma-Separated Values. Think of it like a shopping list where each item is separated by commas.
name,age,city
Alice,25,New York
Bob,30,London
How to Read It
import pandas as pd
# Read a CSV file
df = pd.read_csv('my_data.csv')
# That's it! Your data is now ready
print(df)
Real Life Example
You downloaded a list of your favorite songs from Spotify. Itβs a CSV file. Pandas reads it in ONE line!
graph TD A[π CSV File] --> B[πΌ Pandas reads it] B --> C[π DataFrame ready!]
Pro Tips π
# Skip bad rows
df = pd.read_csv('file.csv',
error_bad_lines=False)
# Read only first 100 rows
df = pd.read_csv('file.csv',
nrows=100)
# Use different separator
df = pd.read_csv('file.csv',
sep=';')
πΎ Writing CSV Files
Why Save to CSV?
You cleaned your data. Now you want to share it. CSV works everywhere - Excel, Google Sheets, any program!
How to Save
# Save your DataFrame to CSV
df.to_csv('my_output.csv')
# Don't want row numbers?
df.to_csv('my_output.csv',
index=False)
Example: Saving Your Work
# You fixed some data
df['price'] = df['price'] * 1.1
# Save the updated version
df.to_csv('updated_prices.csv',
index=False)
print("Saved! β
")
Quick Options
| Parameter | What It Does |
|---|---|
index=False |
No row numbers |
header=False |
No column names |
sep=';' |
Use semicolon |
π Reading Excel Files
Excel = The Organized Suitcase
Excel files can have multiple sheets - like a suitcase with different compartments!
Basic Read
# Read Excel file
df = pd.read_excel('sales.xlsx')
# Read specific sheet
df = pd.read_excel('sales.xlsx',
sheet_name='January')
Real Life Example
Your boss sends monthly sales in Excel. Each month is a different sheet.
graph TD A[π Excel File] --> B{Which Sheet?} B -->|Sheet 1| C[January Data] B -->|Sheet 2| D[February Data] B -->|Sheet 3| E[March Data]
Need openpyxl!
β οΈ First time? Install the reader:
pip install openpyxl
π Writing Excel Files
Save Your Work to Excel
# Save to Excel
df.to_excel('output.xlsx',
index=False)
# Give the sheet a name
df.to_excel('output.xlsx',
sheet_name='Results',
index=False)
Example: Creating a Report
# Your analysis is done
summary = df.groupby('category').sum()
# Save as Excel report
summary.to_excel('monthly_report.xlsx',
sheet_name='Summary')
print("Report ready! π")
π Multi-Sheet Excel Handling
The Magic of Multiple Sheets
Sometimes you need ALL the sheets. Or you want to CREATE multiple sheets!
Reading All Sheets
# Read all sheets at once
all_sheets = pd.read_excel(
'data.xlsx',
sheet_name=None
)
# all_sheets is a dictionary!
# Key = sheet name
# Value = DataFrame
for name, data in all_sheets.items():
print(f"Sheet: {name}")
print(data.head())
Writing Multiple Sheets
# Create an Excel writer
with pd.ExcelWriter('report.xlsx') as writer:
# Write each DataFrame
sales_df.to_excel(writer,
sheet_name='Sales')
costs_df.to_excel(writer,
sheet_name='Costs')
summary_df.to_excel(writer,
sheet_name='Summary')
print("Multi-sheet Excel created! π")
graph TD A[π ExcelWriter] --> B[Sheet 1: Sales] A --> C[Sheet 2: Costs] A --> D[Sheet 3: Summary] B --> E[π One Excel File] C --> E D --> E
π¬ Reading JSON Files
What is JSON?
JSON is like a messenger for the web. It organizes data in pairs:
{
"name": "Alice",
"age": 25,
"city": "New York"
}
How to Read It
# Read JSON file
df = pd.read_json('data.json')
# Read from web API
df = pd.read_json(
'https://api.example.com/data'
)
Different JSON Shapes
JSON can be organized differently:
# Records format (most common)
# [{"name": "A"}, {"name": "B"}]
df = pd.read_json('file.json',
orient='records')
# Columns format
# {"name": ["A", "B"], "age": [1, 2]}
df = pd.read_json('file.json',
orient='columns')
Real Example: API Data
# Weather API returns JSON
weather = pd.read_json(
'weather_data.json'
)
print(weather[['city', 'temp']])
π€ Writing JSON Files
Share Data with Web Apps
# Save to JSON
df.to_json('output.json')
# Pretty format (human readable)
df.to_json('output.json',
indent=2)
# Different orientations
df.to_json('output.json',
orient='records')
Orient Options Explained
| Orient | Output Shape |
|---|---|
records |
List of dictionaries |
columns |
Dict of columns |
index |
Dict with index as keys |
values |
Just the values |
Example: Making API-Ready Data
# Prepare data for website
users = df[['id', 'name', 'email']]
# Save as JSON for API
users.to_json('api_users.json',
orient='records',
indent=2)
print("Ready for the web! π")
π― Quick Decision Guide
graph TD A{What do you need?} --> B[Simple sharing] A --> C[Multiple sheets] A --> D[Web/API use] B --> E[Use CSV π] C --> F[Use Excel π] D --> G[Use JSON π¬]
π You Did It!
Now you can:
- β Read CSV, Excel, and JSON files
- β Write data to any format
- β Handle multiple Excel sheets
- β Choose the right format for any job
Your data can now travel anywhere! π
π§ Remember This
| Task | CSV | Excel | JSON |
|---|---|---|---|
| Simple data | β Best | β Good | β OK |
| Multiple sheets | β No | β Best | β No |
| Web/API | β No | β No | β Best |
| Universal | β Best | β οΈ Needs Office | β Good |
When in doubt, use CSV. It works everywhere! π