Pandas Foundations: Inspecting and Exporting Your Data
The Detective's Toolkit
Imagine you just received a mystery box filled with data. Before you can solve any puzzle, you need to look inside and understand what you have. That's exactly what Pandas helps you do!
Think of a DataFrame like a filing cabinet with labeled drawers. Before organizing anything, you need to:
- See what types of things are in each drawer
- Get a quick summary of everything
- Check if any drawers are empty
- Eventually, share your organized files with others
Let's open our detective toolkit!
What Type Is This? The dtypes Attribute
The Label Maker
Every column in your DataFrame has a type, like the labels on jars in your kitchen:
- Numbers (int64, float64) → For math stuff
- Text (object) → Words and sentences
- Dates (datetime64) → Calendar things
- True/False (bool) → Yes or No answers
See Your Types
```python
import pandas as pd

# Our toy box of data
df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True]
})

# Look at all the labels!
print(df.dtypes)
```
Output:
```text
name        object
age          int64
height     float64
has_pet       bool
dtype: object
```
Quick Translation
| Type | What It Means |
|---|---|
| object | Text/words |
| int64 | Whole numbers |
| float64 | Decimal numbers |
| bool | True or False |
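A few related dtype tools can be sketched on the same toy data. `select_dtypes()` and `astype()` are standard pandas methods. Which columns are picked and converted here is only an illustrative choice.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True],
})

# The type of a single column
print(df['age'].dtype)  # int64

# Keep only the number columns (bool does not count as a number here)
numbers_only = df.select_dtypes(include='number')
print(list(numbers_only.columns))  # ['age', 'height']

# Relabel a column: turn whole numbers into decimals
df['age'] = df['age'].astype('float64')
print(df.dtypes)
```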
The Full Report: info() Method
Your Data's ID Card
The info() method is like asking your data: "Tell me everything about yourself!"
It shows:
- How many rows and columns
- Column names
- Data types
- How many non-null (filled-in) values each column has, which reveals missing values
- Memory usage
```python
df.info()
```
Output:
```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   name     3 non-null      object
 1   age      3 non-null      int64
 2   height   3 non-null      float64
 3   has_pet  3 non-null      bool
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 227.0+ bytes
```
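Key Things to Notice
- 3 entries = 3 rows of data
- Non-Null Count = How many cells in that column hold a real value. A count lower than the number of entries means some values are missing.
- memory usage = How much memory the DataFrame takes. The "+" means the text stored in object columns was not measured in full. The exact number varies between pandas versions and platforms.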
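To see the Non-Null Count actually drop, here is a sketch with one hypothetical missing height. `isna().sum()` counts the gaps in each column, and `memory_usage(deep=True)` also measures the text values that the "+" in info() leaves out.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's height is missing
df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'height': [4.2, np.nan, 4.5],
})

df.info()  # height now shows 2 non-null out of 3 entries

# Count the missing cells in each column
missing = df.isna().sum()
print(missing)

# deep=True also measures the strings stored in the name column
print(df.memory_usage(deep=True))
```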
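Quick Math Summary: describe() Method
The Statistics Wizard
Want quick math facts about your numbers? describe() does the magic! By default it only summarizes the numeric columns, so name and has_pet are left out of the output below.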
```python
print(df.describe())
```
Output:
```text
             age    height
count   3.000000  3.000000
mean   11.000000  4.500000
std     1.000000  0.300000
min    10.000000  4.200000
25%    10.500000  4.350000
50%    11.000000  4.500000
75%    11.500000  4.650000
max    12.000000  4.800000
```
What Each Row Means
```mermaid
graph LR
    A[describe output] --> B[count: How many values]
    A --> C[mean: The average]
    A --> D[std: How spread out]
    A --> E[min: Smallest value]
    A --> F[25%: Quarter mark]
    A --> G[50%: Middle value]
    A --> H[75%: Three-quarter mark]
    A --> I[max: Biggest value]
```
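Each describe() row can be recomputed with a single Series method, which is a handy way to check what the numbers mean. This sketch uses the age values from the toy data. Note two details: std is the sample standard deviation (dividing by n - 1), and the quartiles interpolate linearly between the sorted values.

```python
import pandas as pd

ages = pd.Series([10, 12, 11], name='age')
summary = ages.describe()

print(ages.mean())          # 11.0, same as the mean row
print(ages.std())           # 1.0: sqrt(((10-11)**2 + (12-11)**2 + 0) / (3 - 1))
print(ages.quantile(0.25))  # 10.5, halfway between 10 and 11 (linear interpolation)
print(ages.median())        # 11.0, same as the 50% row

# Dividing by n instead of n - 1 gives the population std, which is smaller
print(ages.std(ddof=0))     # about 0.816
```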
Include Text Columns Too!
```python
# See ALL columns, not just numbers
print(df.describe(include='all'))
```
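This adds unique, top, and freq rows for text (and True/False) columns! Cells where a statistic doesn't apply, like the average of the names, show NaN.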
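Here is a sketch of what include='all' adds, using the same toy data. Text and True/False columns get unique (number of different values), top (the most common value) and freq (how often top appears). Single cells are picked out with `.loc[row, column]`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True],
})

summary = df.describe(include='all')
print(summary)

print(summary.loc['unique', 'name'])   # 3 different names
print(summary.loc['top', 'has_pet'])   # True, the most common value
print(summary.loc['freq', 'has_pet'])  # 2, how many times True appears
print(summary.loc['mean', 'name'])     # NaN: text has no average
```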
Creating Empty DataFrames
Starting Fresh
Sometimes you need a blank canvas before painting. Here's how to create an empty DataFrame:
```python
# Method 1: Completely empty
empty_df = pd.DataFrame()

# Method 2: Empty with column names
empty_df = pd.DataFrame(
    columns=['name', 'score', 'grade']
)

# Method 3: Empty with specific types
empty_df = pd.DataFrame({
    'name': pd.Series(dtype='object'),
    'score': pd.Series(dtype='int64')
})
```
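The three methods differ in which types the columns get. Here is a quick check with `.shape` and `.dtypes`, reusing the example column names from above. Only Method 3 sets the types explicitly. Method 2 leaves every column as the generic object type.

```python
import pandas as pd

blank = pd.DataFrame()
named = pd.DataFrame(columns=['name', 'score', 'grade'])
typed = pd.DataFrame({
    'name': pd.Series(dtype='object'),
    'score': pd.Series(dtype='int64'),
})

print(blank.shape)   # (0, 0): no rows, no columns
print(named.shape)   # (0, 3): three columns, still no rows
print(named.dtypes)  # no types given, so every column is object
print(typed.dtypes)  # name is object, score is int64
```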
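Why Create Empty DataFrames?
- To fill with data later
- As a starting point for collecting results (though building a list of rows and calling pd.DataFrame once at the end is usually much faster than growing a DataFrame inside a loop)
- To define your structure first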
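For the "fill later" case, a common pandas pattern is to collect rows in a plain Python list and build the DataFrame once at the end. Calling pd.concat inside a loop copies all the data again on every pass, and DataFrame.append was removed in pandas 2.0. This sketch uses made-up scores in place of whatever a real loop would produce.

```python
import pandas as pd

# Made-up scores, standing in for whatever your loop produces
results = [('Anna', 91), ('Bob', 78), ('Cara', 85)]

rows = []  # appending to a list is cheap
for student, score in results:
    rows.append({'name': student, 'score': score})

# Build the DataFrame once; columns= keeps the structure even if rows is empty
scores = pd.DataFrame(rows, columns=['name', 'score'])
print(scores)

no_rows = pd.DataFrame([], columns=['name', 'score'])
print(no_rows.empty, list(no_rows.columns))  # True ['name', 'score']
```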
Is This DataFrame Empty?
The Empty Check
Before working with data, it's smart to check if there's actually anything there!
```python
# Create an empty DataFrame
empty_df = pd.DataFrame()

# Check if it's empty
if empty_df.empty:
    print("Nothing here!")
else:
    print("We have data!")
```
Output: Nothing here!
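Two Ways to Check
```python
# Method 1: The .empty attribute
df.empty      # True if there are no rows OR no columns
# Method 2: Check the length
len(df) == 0  # True only if there are no rows
```
The two usually agree. A DataFrame with rows but no columns is the exception: it counts as .empty even though its length is not 0.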
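Here is a sketch of the edge cases, using two small hypothetical frames. `.empty` is True when either axis has length 0, while `len()` only counts rows. Cells holding NaN still count as data.

```python
import numpy as np
import pandas as pd

no_columns = pd.DataFrame(index=[0, 1, 2])           # 3 rows, 0 columns
all_missing = pd.DataFrame({'a': [np.nan, np.nan]})  # 2 rows, every value NaN

print(no_columns.empty, len(no_columns))  # True 3
print(all_missing.empty)                  # False: NaN cells still count
print(all_missing.dropna().empty)         # True once the NaN rows are dropped
```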
Real-World Example
```python
# After filtering, check if results exist
results = df[df['age'] > 100]
if results.empty:
    print("No one is over 100 years old!")
else:
    print(f"Found {len(results)} centenarians!")
```
Exporting Your Data
Sharing Is Caring
Once you've cleaned and organized your data, you want to save it or share it. Pandas can export to many formats!
Export to CSV (Most Common)
```python
# Save to a CSV file
df.to_csv('my_data.csv', index=False)
# index=False means don't save row numbers
```
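To see what index=False changes, this sketch skips the file entirely. When no path is given, to_csv() returns the CSV text, and wrapping that text in io.StringIO lets read_csv() read it back.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True],
})

# No path given, so to_csv returns the text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[0])  # name,age,height,has_pet

restored = pd.read_csv(io.StringIO(csv_text))
print(restored.dtypes)  # the same four types come back

# Leaving the default index=True writes the row numbers as an extra, unnamed column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'name', 'age', 'height', 'has_pet']
```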
Export to Excel
```python
# Save to Excel file (needs an Excel engine such as openpyxl installed)
df.to_excel('my_data.xlsx', index=False)
```
Export to JSON
```python
# Save as JSON
df.to_json('my_data.json')
# Pretty format for reading
df.to_json('my_data.json', indent=2)
```
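The shape of the JSON depends on the `orient` parameter of to_json(). This sketch parses the output with Python's built-in json module so the default layout can be compared with orient='records'. It then reads the records back with read_json(), wrapping the text in io.StringIO.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True],
})

# Default layout for a DataFrame: {column -> {row label -> value}}
by_column = json.loads(df.to_json())
print(by_column['name'])  # {'0': 'Anna', '1': 'Bob', '2': 'Cara'}

# orient='records' gives one object per row, a shape many web APIs expect
records = json.loads(df.to_json(orient='records'))
print(records[0])  # {'name': 'Anna', 'age': 10, 'height': 4.2, 'has_pet': True}

# Reading it back
restored = pd.read_json(io.StringIO(df.to_json(orient='records')), orient='records')
print(restored.shape)  # (3, 4)
```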
More Export Options
```mermaid
graph LR
    A[DataFrame] --> B[.to_csv - Spreadsheets]
    A --> C[.to_excel - Microsoft Excel]
    A --> D[.to_json - Web apps]
    A --> E[.to_html - Web pages]
    A --> F[.to_sql - Databases]
    A --> G[.to_pickle - Python storage]
```
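Here is a sketch of three of the other exporters. The table name `kids` is made up. An in-memory SQLite database from Python's built-in sqlite3 module stands in for a real database, so the example needs no server. to_sql() also accepts SQLAlchemy connections. Only unpickle files you trust, since loading a pickle can run code.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Cara'],
    'age': [10, 12, 11],
    'height': [4.2, 4.8, 4.5],
    'has_pet': [True, False, True],
})

# to_html: with no file given, returns an HTML <table> as a string
html = df.to_html(index=False)
print(html.splitlines()[0])  # the opening <table> tag

# to_sql: write to a table (hypothetical name 'kids'), then query it back
conn = sqlite3.connect(':memory:')
df.to_sql('kids', conn, index=False)
older = pd.read_sql('SELECT name FROM kids WHERE age > 10 ORDER BY name', conn)
conn.close()
print(older['name'].tolist())  # ['Bob', 'Cara']

# to_pickle: keeps the dtypes exactly
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'kids.pkl'
    df.to_pickle(path)
    restored = pd.read_pickle(path)
print(restored.dtypes.equals(df.dtypes))  # True
```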
Quick Export Reference
| Method | File Type | Best For |
|---|---|---|
| to_csv() | .csv | Universal sharing |
| to_excel() | .xlsx | Business reports |
| to_json() | .json | Web applications |
| to_html() | .html | Web display |
| to_sql() | database | Large data storage |
Quick Summary Flow
```mermaid
graph TD
    A[Get DataFrame] --> B[Check dtypes]
    B --> C[Run info]
    C --> D[Run describe]
    D --> E{Is it empty?}
    E -->|Yes| F[Add data first!]
    E -->|No| G[Work with data]
    G --> H[Export when done]
    H --> I[CSV/Excel/JSON]
```
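The whole flow can be wrapped in one function. `inspect_and_export` is a hypothetical helper name, not a pandas API. This sketch moves the empty check to the front, because describe() on a DataFrame with no columns raises an error instead of returning a summary.

```python
import tempfile
from pathlib import Path

import pandas as pd


def inspect_and_export(df: pd.DataFrame, csv_path: Path) -> bool:
    """Hypothetical helper: inspect a DataFrame, then save it as CSV.

    Returns True if a file was written, False if there was no data.
    """
    if df.empty:
        print('Add data first!')
        return False

    print(df.dtypes)
    df.info()
    print(df.describe())

    df.to_csv(csv_path, index=False)
    return True


with tempfile.TemporaryDirectory() as folder:
    out = Path(folder) / 'kids.csv'
    wrote = inspect_and_export(pd.DataFrame({'age': [10, 12, 11]}), out)
    saved = out.exists()

skipped = inspect_and_export(pd.DataFrame(), Path('never_written.csv'))
print(wrote, saved, skipped)  # True True False
```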
You Did It!
You now know how to:
- See data types with .dtypes
- Get full info with .info()
- Get statistics with .describe()
- Create empty DataFrames
- Check if a DataFrame is empty with .empty
- Export data to CSV, Excel, JSON, and more!
You're ready to inspect and export like a pro!