What is astype() in Pandas?

astype() is a universal type converter that transforms data from one type to another, like converting strings to integers or floats to booleans.

How does to_numeric handle invalid data?

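to_numeric has an errors parameter: 'raise' (the default) raises an exception on failure, 'coerce' turns bad values into NaN, and 'ignore' returns the input unchanged. Note that 'ignore' is deprecated as of pandas 2.2, so prefer 'coerce' or an explicit try/except.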

What are nullable data types in Pandas?

Nullable types like Int64 (capital I) allow integers to contain missing values without converting to float. They use pd.NA for missing data.

What's the difference between datetime and timedelta?

Datetime represents a point in time (when something happens). Timedelta represents a duration (how long something takes).

Data Type Conversion in Pandas | Complete Guide

🎭 The Shape-Shifter’s Guide to Pandas Data Types

Imagine your data is like Play-Doh. Sometimes it arrives as a messy blob, but you need it to be a perfect star, or a circle, or a number. That’s what data type conversion does—it reshapes your data into exactly what you need!

🌟 The Big Picture

In Pandas, every column has a data type (like int64, float64, object, datetime64). Sometimes your data arrives wearing the wrong costume—numbers dressed as text, dates pretending to be strings. Type conversion is the magic wardrobe that helps your data wear the right outfit!

graph TD
    A["Raw Data"] --> B{Check Type}
    B --> C["Wrong Type?"]
    C -->|Yes| D["Convert It!"]
    C -->|No| E["Ready to Use"]
    D --> E
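
The check-then-convert flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the only way to do it: the column names are made up, and `pd.api.types.is_numeric_dtype` is just one convenient way to ask "is this column already a number?".

```python
import pandas as pd

# Hypothetical data: 'price' arrived as text, 'quantity' is already numeric.
df = pd.DataFrame({
    "price": ["100", "200", "300"],
    "quantity": [1, 2, 3],
})

# Step 1: check the type of each column.
print(df.dtypes)

# Step 2: convert only if the column is wearing the wrong costume.
needs_conversion = not pd.api.types.is_numeric_dtype(df["price"])
if needs_conversion:
    df["price"] = pd.to_numeric(df["price"])

# Step 3: ready to use.
print(df.dtypes)
```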

🪄 Type Conversion with astype()

What Is It?

astype() is like a universal translator. It takes data in one form and transforms it into another.

The Simple Story

Think of a toy that can change from a car to a robot. astype() is the button you press to make that change!

How It Works

import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'price': ['100', '200', '300'],
    'quantity': [1.5, 2.0, 3.0]
})

# Convert price from text to numbers
df['price'] = df['price'].astype(int)

# Convert quantity to integer
# (the fractional part is dropped: 1.5 becomes 1)
df['quantity'] = df['quantity'].astype(int)

🎯 Key Points

One destination type per call: Pass the dtype you want the column to become
Common conversions: int, float, str, bool
Be careful: Converting 'hello' to int raises a ValueError!
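
To make the key points concrete, here is a small sketch that converts two columns in one call and shows what the failure looks like. The `label` column is invented for the example; passing a `{column: dtype}` dict to `DataFrame.astype()` is standard Pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["100", "200", "300"],
    "quantity": [1.5, 2.0, 3.0],
    "label": ["a", "hello", "c"],   # hypothetical non-numeric column
})

# Convert several columns at once with a {column: dtype} mapping.
converted = df.astype({"price": "int64", "quantity": "int64"})
print(converted.dtypes)

# Float-to-int conversion drops the fractional part: 1.5 becomes 1.
print(converted["quantity"].tolist())

# Text that is not a number cannot be converted: astype raises ValueError.
try:
    df["label"].astype(int)
    failed = False
except ValueError as exc:
    failed = True
    print(f"Conversion failed: {exc}")
```

If silently dropping fractions is not what you want, round the column first and then call astype().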

🔢 to_numeric() Conversion

What Is It?

to_numeric() is a specialist—it ONLY converts things to numbers. But it’s super smart about it!

The Simple Story

Imagine a picky robot that only accepts numbers. When you hand it "42", it happily takes it. When you hand it "hello", it can either reject it OR replace it with a special “I don’t know” value.

How It Works

import pandas as pd

# Some messy data
data = pd.Series(['1', '2', 'three', '4'])

# Option 1: Errors become NaN
clean = pd.to_numeric(data, errors='coerce')
# Result: [1.0, 2.0, NaN, 4.0]

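# Option 2: Ignore errors and return the input unchanged
# (deprecated since pandas 2.2; prefer 'coerce' or try/except)
clean = pd.to_numeric(data, errors='ignore')
# Result: ['1', '2', 'three', '4']

# Option 3: Raise an error (the default)
# pd.to_numeric(data, errors='raise')
# This would raise a ValueError!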

🎯 The `errors` Parameter

Value	What Happens
`'raise'`	Raises an exception if any value fails (default)
`'coerce'`	Bad values become NaN
`'ignore'`	Returns the input unchanged if conversion fails (deprecated since pandas 2.2)
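
A common follow-up to `errors='coerce'` is finding out which values were thrown away. The sketch below is one way to do that, assuming the input has no missing values of its own that you want to keep separate; the variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(["1", "2", "three", "4"])

# Coerce unparseable entries to NaN instead of raising.
numbers = pd.to_numeric(raw, errors="coerce")

# Present in the input but NaN afterwards means the value failed to parse.
bad_mask = numbers.isna() & raw.notna()
print(raw[bad_mask].tolist())

# With the default errors='raise', the same input raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)
```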

📅 to_datetime() Conversion

What Is It?

to_datetime() is your time machine operator. It takes messy date strings and turns them into proper datetime objects.

The Simple Story

You write “December 9, 2025” on a piece of paper. Your computer doesn’t understand that. to_datetime() translates it into a language your computer speaks fluently!

How It Works

import pandas as pd

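# Messy date strings in three different formats
dates = pd.Series([
    '2025-12-09',
    'December 9, 2025',
    '09/12/2025'
])

# Convert to datetime. On pandas 2.0+, mixed formats need
# format='mixed'; without it the format is inferred from the
# first value and the other two fail to parse.
# Note: '09/12/2025' is ambiguous and is read month-first
# by default (September 12, not December 9).
clean_dates = pd.to_datetime(dates, format='mixed')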

# With custom format
custom = pd.to_datetime(
    '09-12-2025',
    format='%d-%m-%Y'
)

🎯 Common Format Codes

Code	Meaning	Example
`%Y`	4-digit year	2025
`%m`	Month (01-12)	12
`%d`	Day (01-31)	09
`%H`	Hour (00-23)	14
`%M`	Minute (00-59)	30
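
The format codes combine into a single pattern string. Here is a short sketch with invented timestamps that uses all five codes from the table, plus `errors='coerce'` so an unparseable entry becomes NaT (the datetime version of "missing") instead of raising.

```python
import pandas as pd

# Hypothetical day-first timestamps, with one bad entry.
stamps = pd.Series(["09-12-2025 14:30", "10-12-2025 08:05", "not a date"])

# %d-%m-%Y %H:%M means day-month-year, then hour:minute.
parsed = pd.to_datetime(stamps, format="%d-%m-%Y %H:%M", errors="coerce")
print(parsed)

# The first value is 9 December 2025 at 14:30.
first = parsed.iloc[0]
print(first.day, first.month, first.year, first.hour, first.minute)

# The bad entry became NaT.
print(parsed.isna().tolist())
```

Spelling out the format removes the day-first versus month-first guesswork.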

⏱️ to_timedelta() Conversion

What Is It?

to_timedelta() handles durations—how long something takes, not when it happens.

The Simple Story

Datetime = “The movie starts at 7 PM” (a point in time)
Timedelta = “The movie is 2 hours long” (a duration)

How It Works

import pandas as pd

# Duration strings
durations = pd.Series([
    '1 day',
    '2 hours',
    '30 minutes',
    '1 day 2 hours 30 min'
])

# Convert to timedelta
time_deltas = pd.to_timedelta(durations)

# Math with timedeltas!
start = pd.to_datetime('2025-12-09 10:00')
end = start + pd.to_timedelta('2 hours')
# end = 2025-12-09 12:00:00

🎯 Quick Reference

String	Result
`'1 day'`	1 days 00:00:00
`'2h'`	0 days 02:00:00
`'30min'`	0 days 00:30:00
`'1 day 5h 30min'`	1 days 05:30:00
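
Timedeltas also appear on their own whenever you subtract one datetime from another. The sketch below uses made-up start and end times to show that, along with converting a duration to plain hours and building timedeltas from numbers with the `unit` argument.

```python
import pandas as pd

# Hypothetical start and end times for two tasks.
start = pd.to_datetime(pd.Series(["2025-12-09 10:00", "2025-12-09 13:15"]))
end = pd.to_datetime(pd.Series(["2025-12-09 12:30", "2025-12-10 13:15"]))

# Subtracting datetimes yields timedeltas.
elapsed = end - start
print(elapsed)

# Convert durations to plain numbers of hours.
hours = elapsed.dt.total_seconds() / 3600
print(hours.tolist())

# to_timedelta also accepts numbers when you name the unit.
offsets = pd.to_timedelta([1, 2], unit="h")
print(offsets)
```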

✨ Convert to Best dtypes with convert_dtypes()

What Is It?

convert_dtypes() is like having a smart assistant who looks at ALL your data and picks the best type for each column automatically!

The Simple Story

Instead of you checking each column one by one, this helper scans everything and says: “This should be an integer, that should be a string, and this one looks like a boolean!”

How It Works

import pandas as pd

# Messy DataFrame
df = pd.DataFrame({
    'A': ['1', '2', '3'],
    'B': [1.0, 2.0, 3.0],
    'C': [True, False, True],
    'D': ['x', 'y', 'z']
})

# Let Pandas pick the best types
df_clean = df.convert_dtypes()

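# Check what it chose
# (pandas 2.x prints the string dtype as string[python])
print(df_clean.dtypes)
# A    string   (digit strings stay text; use to_numeric to parse them)
# B    Int64    (whole-number floats become nullable integers)
# C    boolean
# D    string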

🎯 Why Use It?

🚀 Automatic: No guessing what type to use
💾 Memory efficient: Picks optimal types
🛡️ Nullable-aware: Uses modern nullable types
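
The nullable-aware part is easiest to see with missing values in the mix. In this sketch (column names are invented) the float and object columns that Pandas would normally fall back to are upgraded to nullable integer, boolean, and string types.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap in every column.
df = pd.DataFrame({
    "count": [1.0, 2.0, np.nan],     # whole numbers forced to float by NaN
    "flag": [True, False, None],     # booleans forced to object by None
    "name": ["x", None, "z"],
})
print(df.dtypes)

converted = df.convert_dtypes()
print(converted.dtypes)

# The missing entries are still missing, now represented as pd.NA.
print(converted.isna().sum().tolist())
```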

🔍 Select dtypes with select_dtypes()

What Is It?

select_dtypes() is your data filter. It picks only the columns that match certain types.

The Simple Story

Imagine a classroom where you say “Everyone wearing blue, stand up!” That’s select_dtypes()—it selects columns based on their type!

How It Works

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'salary': [50000.0, 60000.0],
    'active': [True, False]
})

# Get only number columns
numbers = df.select_dtypes(include='number')
# Returns: age, salary

# Get only text columns
text = df.select_dtypes(include='object')
# Returns: name

# Exclude certain types
no_floats = df.select_dtypes(exclude='float')
# Returns: name, age, active

🎯 Common Type Selectors

Selector	What It Gets
`'number'`	All numeric types
`'int64'`	Only 64-bit integers
`'float64'`	Only floats
`'object'`	Strings/mixed
`'bool'`	Booleans
`'datetime'`	Dates and times
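
Selectors can be combined in a list, and the selected column names can feed straight into a conversion. A minimal sketch using the same example frame as above; the choice of float32 as the target is only an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "salary": [50000.0, 60000.0],
    "active": [True, False],
})

# include (and exclude) accept a list of selectors.
num_or_bool = df.select_dtypes(include=["number", "bool"])
print(list(num_or_bool.columns))

# Use the selected column names to convert only those columns.
float_cols = df.select_dtypes(include="float").columns
df = df.astype({col: "float32" for col in float_cols})
print(df.dtypes)
```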

🛡️ Nullable Data Types

What Is It?

Traditional NumPy-backed integer columns can’t hold missing values, so Pandas silently converts them to float. Nullable types solve this!

The Simple Story

Imagine a scorecard. Sometimes a student didn’t take the test—their score is “missing” (not zero!). Old integer types would crash or convert to float. Nullable types say “that’s okay, we’ll remember it’s missing!”

The Problem

import pandas as pd
import numpy as np

# Old way - integers can't have NaN
data = pd.Series([1, 2, np.nan, 4])
print(data.dtype)  # float64 (not int!)

The Solution

import pandas as pd

# New nullable integer type
data = pd.Series(
    [1, 2, pd.NA, 4],
    dtype='Int64'  # Capital I!
)
print(data.dtype)  # Int64
print(data)
# 0       1
# 1       2
# 2    <NA>
# 3       4

🎯 Nullable Types Reference

Old Type	Nullable Version
`int64`	`Int64`
`float64`	`Float64`
`bool`	`boolean`
`object`	`string`

Why Does This Matter?

✅ Keep integers as integers (even with missing data)
✅ Clearer distinction between “missing” and “zero”
✅ Better memory usage than object columns (for example, booleans with missing values)
✅ Consistent behavior across operations
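
Here is a short sketch of how a nullable integer column behaves in everyday operations, using an invented scorecard. It shows that aggregations skip the missing score, arithmetic and comparisons carry it through as `<NA>`, and `fillna()` gets you back to a plain `int64` once you decide what the gap should be.

```python
import pandas as pd

# Hypothetical test scores; the third student did not take the test.
scores = pd.Series([90, 85, pd.NA, 70], dtype="Int64")

# Aggregations skip missing values by default.
total = scores.sum()
print(total)

# Arithmetic keeps the Int64 dtype and the missing entry stays missing.
bonus = scores + 5
print(bonus)

# Comparisons give a nullable boolean column.
passed = scores >= 80
print(passed)

# Once the gaps are filled, converting to plain int64 is safe.
filled = scores.fillna(0).astype("int64")
print(filled.tolist())
```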

🎬 Putting It All Together

graph TD
    A["Raw Data"] --> B{What conversion?}
    B -->|Any type| C["astype"]
    B -->|To numbers| D["to_numeric"]
    B -->|To dates| E["to_datetime"]
    B -->|To durations| F["to_timedelta"]
    B -->|Auto-detect| G["convert_dtypes"]
    B -->|Filter columns| H["select_dtypes"]
    C --> I["Clean Data"]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I

💡 Pro Tips

Always check types first: Use df.dtypes before converting
Use errors='coerce': When you’re not sure if all values are clean
Prefer nullable types: Use Int64 over int64 for integers with possible nulls
convert_dtypes() is your friend: Let Pandas do the heavy lifting when starting out

🎯 Quick Decision Guide

I need to…	Use this
Convert to any specific type	`astype()`
Convert messy strings to numbers	`to_numeric()`
Convert strings to dates	`to_datetime()`
Convert strings to durations	`to_timedelta()`
Auto-detect best types	`convert_dtypes()`
Filter columns by type	`select_dtypes()`
Handle missing values in integers	Nullable types (`Int64`)
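
To see the decision guide in action, here is a sketch that cleans one small, entirely made-up orders table with a different tool per column. Treat it as one reasonable arrangement under these assumptions rather than a fixed recipe.

```python
import pandas as pd

# Hypothetical raw data: every column arrives as text.
raw = pd.DataFrame({
    "order_id": ["1", "2", "3"],
    "amount": ["19.99", "n/a", "5.00"],
    "ordered_at": ["2025-12-09", "2025-12-10", "2025-12-11"],
    "prep_time": ["30min", "1h", "45min"],
})

clean = pd.DataFrame({
    # Messy strings to numbers: bad values become NaN.
    "order_id": pd.to_numeric(raw["order_id"]),
    "amount": pd.to_numeric(raw["amount"], errors="coerce"),
    # Strings to dates, with an explicit format.
    "ordered_at": pd.to_datetime(raw["ordered_at"], format="%Y-%m-%d"),
    # Strings to durations.
    "prep_time": pd.to_timedelta(raw["prep_time"]),
})

# A specific target type: nullable integer for the ID column.
clean["order_id"] = clean["order_id"].astype("Int64")

# Datetime plus timedelta gives a new datetime.
clean["ready_at"] = clean["ordered_at"] + clean["prep_time"]

print(clean.dtypes)
print(clean)
```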

🌟 Remember: Your data is like water—it takes the shape of its container. Type conversion lets YOU choose the container!
