Data Type Conversion

🎭 The Shape-Shifter’s Guide to Pandas Data Types

Imagine your data is like Play-Doh. Sometimes it arrives as a messy blob, but you need it to be a perfect star, or a circle, or a number. That’s what data type conversion does—it reshapes your data into exactly what you need!


🌟 The Big Picture

In Pandas, every column has a data type (like int64, float64, object, datetime64). Sometimes your data arrives wearing the wrong costume—numbers dressed as text, dates pretending to be strings. Type conversion is the magic wardrobe that helps your data wear the right outfit!

graph TD
    A["Raw Data"] --> B{Check Type}
    B --> C["Wrong Type?"]
    C -->|Yes| D["Convert It!"]
    C -->|No| E["Ready to Use"]
    D --> E

🪄 Type Conversion with astype()

What Is It?

astype() is like a universal translator. It takes data in one form and transforms it into another.

The Simple Story

Think of a toy that can change from a car to a robot. astype() is the button you press to make that change!

How It Works

import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'price': ['100', '200', '300'],
    'quantity': [1.5, 2.0, 3.0]
})

# Convert price from text to numbers
df['price'] = df['price'].astype(int)

# Convert quantity to integer
# (the fractional part is dropped, so 1.5 becomes 1)
df['quantity'] = df['quantity'].astype(int)

🎯 Key Points

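  • One target per column: Pick the destination type (a dict of column names to types converts several columns in one call)
  • Common conversions: int, float, str, bool
  • Be careful: Converting 'hello' to int raises a ValueError, and converting floats to int drops the decimals!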
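Here is a small sketch of those key points in action. It reuses the price and quantity columns from the example above; the float32 target is just an illustrative choice, not a recommendation.

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["100", "200", "300"],
    "quantity": [1.5, 2.0, 3.0],
})

# A dict maps each column to its own destination type
converted = df.astype({"price": "int64", "quantity": "float32"})
print(converted.dtypes)

# Text that is not a number makes astype() raise ValueError
try:
    pd.Series(["1", "hello"]).astype(int)
    failed = False
except ValueError:
    failed = True
print(failed)
```

Note that astype() returns a new object, so the original df keeps its old types until you assign the result back.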

🔢 to_numeric() Conversion

What Is It?

to_numeric() is a specialist—it ONLY converts things to numbers. But it’s super smart about it!

The Simple Story

Imagine a picky robot that only accepts numbers. When you hand it "42", it happily takes it. When you hand it "hello", it can either reject it OR replace it with a special “I don’t know” value.

How It Works

import pandas as pd

# Some messy data
data = pd.Series(['1', '2', 'three', '4'])

# Option 1: Errors become NaN
clean = pd.to_numeric(data, errors='coerce')
# Result: [1.0, 2.0, NaN, 4.0]

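# Option 2: Keep the input unchanged on failure
# (deprecated since pandas 2.2; prefer 'coerce'
# or catch the exception yourself)
clean = pd.to_numeric(data, errors='ignore')
# Result: ['1', '2', 'three', '4']

# Option 3: Raise an error (the default)
# pd.to_numeric(data, errors='raise')
# This raises ValueError on 'three'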

🎯 The errors Parameter

| Value | What Happens |
|---|---|
| 'raise' | Raises an exception if any value fails (default) |
| 'coerce' | Bad values become NaN |
| 'ignore' | Returns the input unchanged if conversion fails (deprecated since pandas 2.2) |
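One handy pattern with errors='coerce' is to look up which values failed, so nothing disappears silently. This is a minimal sketch using the same messy data as above; the variable names are made up for illustration.

```python
import pandas as pd

raw = pd.Series(["1", "2", "three", "4"])
numbers = pd.to_numeric(raw, errors="coerce")

# Values that were present before but are NaN now failed to parse
bad_values = raw[numbers.isna() & raw.notna()]
print(bad_values.tolist())  # ['three']
```

Checking raw.notna() as well keeps values that were already missing from being reported as parse failures.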

📅 to_datetime() Conversion

What Is It?

to_datetime() is your time machine operator. It takes messy date strings and turns them into proper datetime objects.

The Simple Story

You write “December 9, 2025” on a piece of paper. Your computer doesn’t understand that. to_datetime() translates it into a language your computer speaks fluently!

How It Works

import pandas as pd

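# Messy date strings in different formats
dates = pd.Series([
    '2025-12-09',
    'December 9, 2025',
    '09/12/2025'  # ambiguous: read as month/day by default
])

# Convert to datetime
# (pandas 2.0+ needs format='mixed' when formats differ)
clean_dates = pd.to_datetime(dates, format='mixed')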

# With custom format
custom = pd.to_datetime(
    '09-12-2025',
    format='%d-%m-%Y'
)

🎯 Common Format Codes

| Code | Meaning | Example |
|---|---|---|
| %Y | 4-digit year | 2025 |
| %m | Month (01-12) | 12 |
| %d | Day (01-31) | 09 |
| %H | Hour (00-23) | 14 |
| %M | Minute (00-59) | 30 |
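The format codes can be combined into one pattern. The sketch below assumes day-first timestamps with hours and minutes (the sample values are invented) and uses errors='coerce' so that unparseable text becomes NaT instead of stopping the conversion.

```python
import pandas as pd

stamps = pd.Series([
    "09-12-2025 14:30",
    "10-12-2025 08:05",
    "not a date",
])

# Day, month, 4-digit year, then hour and minute
parsed = pd.to_datetime(
    stamps,
    format="%d-%m-%Y %H:%M",
    errors="coerce",
)
print(parsed)
```

Spelling out the format removes the day/month guesswork for strings like 09-12-2025.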

⏱️ to_timedelta() Conversion

What Is It?

to_timedelta() handles durations—how long something takes, not when it happens.

The Simple Story

  • Datetime = “The movie starts at 7 PM” (a point in time)
  • Timedelta = “The movie is 2 hours long” (a duration)

How It Works

import pandas as pd

# Duration strings
durations = pd.Series([
    '1 day',
    '2 hours',
    '30 minutes',
    '1 day 2 hours 30 min'
])

# Convert to timedelta
time_deltas = pd.to_timedelta(durations)

# Math with timedeltas!
start = pd.to_datetime('2025-12-09 10:00')
end = start + pd.to_timedelta('2 hours')
# end = 2025-12-09 12:00:00

🎯 Quick Reference

| String | Result |
|---|---|
| '1 day' | 1 days 00:00:00 |
| '2h' | 0 days 02:00:00 |
| '30min' | 0 days 00:30:00 |
| '1 day 5h 30min' | 1 days 05:30:00 |
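Durations do not have to start as strings. As a rough sketch, assuming a column of plain numbers that represent minutes (made-up values), the unit argument tells to_timedelta() how to read them:

```python
import pandas as pd

minutes = pd.Series([90, 45, 150])

# Interpret each number as a count of minutes
durations = pd.to_timedelta(minutes, unit="min")

# Timedeltas support math: add them up, or express them in hours
total = durations.sum()
hours = durations.dt.total_seconds() / 3600
print(total)
print(hours.tolist())
```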

✨ Convert to Best dtypes with convert_dtypes()

What Is It?

convert_dtypes() is like having a smart assistant who looks at ALL your data and picks the best type for each column automatically!

The Simple Story

Instead of you checking each column one by one, this helper scans everything and says: “This should be an integer, that should be a string, and this one looks like a boolean!”

How It Works

import pandas as pd

# Messy DataFrame
df = pd.DataFrame({
    'A': ['1', '2', '3'],
    'B': [1.0, 2.0, 3.0],
    'C': [True, False, True],
    'D': ['x', 'y', 'z']
})

# Let Pandas pick the best types
df_clean = df.convert_dtypes()

# Check what it chose
print(df_clean.dtypes)
# A    string   (text stays text; '1' is not parsed as a number)
# B    Int64    (whole-number floats become integers)
# C    boolean
# D    string

🎯 Why Use It?

  • 🚀 Automatic: No guessing what type to use
  • 💾 Better-fitting types: Whole-number floats become Int64 and object text becomes string
  • 🛡️ Nullable-aware: Uses modern nullable types
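The nullable-aware part is easiest to see with missing values in the mix. This is a small sketch with invented column names; it assumes the default convert_dtypes() settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "count": [1.0, 2.0, np.nan],   # whole numbers stored as floats
    "ratio": [0.5, 1.25, np.nan],  # real fractions
    "label": ["a", None, "c"],     # text with a gap
})

converted = df.convert_dtypes()
print(converted.dtypes)
```

Here count should come out as Int64 and ratio as Float64, with the gaps kept as missing values rather than forcing a different type.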

🔍 Select dtypes with select_dtypes()

What Is It?

select_dtypes() is your data filter. It picks only the columns that match certain types.

The Simple Story

Imagine a classroom where you say “Everyone wearing blue, stand up!” That’s select_dtypes()—it selects columns based on their type!

How It Works

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'salary': [50000.0, 60000.0],
    'active': [True, False]
})

# Get only number columns
numbers = df.select_dtypes(include='number')
# Returns: age, salary

# Get only text columns
text = df.select_dtypes(include='object')
# Returns: name

# Exclude certain types
no_floats = df.select_dtypes(exclude='float')
# Returns: name, age, active

🎯 Common Type Selectors

| Selector | What It Gets |
|---|---|
| 'number' | All numeric types |
| 'int64' | 64-bit integers only |
| 'float64' | 64-bit floats only |
| 'object' | Strings/mixed |
| 'bool' | Booleans |
| 'datetime' | Dates and times |
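select_dtypes() pairs nicely with astype(): first pick the columns by type, then convert just those. The sketch below reuses the DataFrame from this section; converting to float32 is only an example target.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "salary": [50000.0, 60000.0],
    "active": [True, False],
})

# Column names of every numeric column
numeric_cols = df.select_dtypes(include="number").columns

# Convert only those columns, leave the rest alone
smaller = df.astype({col: "float32" for col in numeric_cols})

# A list selects several kinds of columns at once
flags_and_numbers = df.select_dtypes(include=["number", "bool"])
print(list(flags_and_numbers.columns))
```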

🛡️ Nullable Data Types

What Is It?

Traditional Pandas types can’t handle missing values in integers. Nullable types solve this!

The Simple Story

Imagine a scorecard. Sometimes a student didn’t take the test—their score is “missing” (not zero!). Old integer types would crash or convert to float. Nullable types say “that’s okay, we’ll remember it’s missing!”

The Problem

import pandas as pd
import numpy as np

# Old way - integers can't have NaN
data = pd.Series([1, 2, np.nan, 4])
print(data.dtype)  # float64 (not int!)

The Solution

import pandas as pd

# New nullable integer type
data = pd.Series(
    [1, 2, pd.NA, 4],
    dtype='Int64'  # Capital I!
)
print(data.dtype)  # Int64
print(data)
# 0       1
# 1       2
# 2    <NA>
# 3       4

🎯 Nullable Types Reference

| Old Type | Nullable Version |
|---|---|
| int64 | Int64 |
| float64 | Float64 |
| bool | boolean |
| object | string |
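To see how a nullable column behaves once you start working with it, here is a short sketch based on the scorecard idea (the scores are made up):

```python
import pandas as pd

scores = pd.Series([90, 85, pd.NA, 70], dtype="Int64")

# Aggregations skip the missing score by default
total = scores.sum()

# Arithmetic keeps the Int64 type, and <NA> stays <NA>
bumped = scores + 5

# Filling the gap is an explicit choice, not a side effect
filled = scores.fillna(0)

print(total)
print(bumped)
print(filled.tolist())
```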

Why Does This Matter?

  • ✅ Keep integers as integers (even with missing data)
  • ✅ Clearer distinction between “missing” and “zero”
  • ✅ Less memory than storing mixed values in an object column
  • ✅ Consistent behavior across operations

🎬 Putting It All Together

graph TD
    A["Raw Data"] --> B{What conversion?}
    B -->|Any type| C["astype"]
    B -->|To numbers| D["to_numeric"]
    B -->|To dates| E["to_datetime"]
    B -->|To durations| F["to_timedelta"]
    B -->|Auto-detect| G["convert_dtypes"]
    B -->|Filter columns| H["select_dtypes"]
    C --> I["Clean Data"]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I

💡 Pro Tips

  1. Always check types first: Use df.dtypes before converting
  2. Use errors='coerce': When you’re not sure if all values are clean
  3. Prefer nullable types: Use Int64 over int64 for integers with possible nulls
  4. convert_dtypes() is your friend: Let Pandas do the heavy lifting when starting out
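The tips above can be chained into one small cleanup routine. This is only a sketch on an invented orders table (order_id and ordered_on are hypothetical column names), not a one-size-fits-all recipe.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": ["1", "2", "x3"],
    "ordered_on": ["2025-12-01", "2025-12-02", "2025-12-03"],
})

# Tip 1: look at the types before touching anything
print(df.dtypes)

# Tips 2 and 3: coerce bad values, then store as nullable Int64
df["order_id"] = pd.to_numeric(
    df["order_id"], errors="coerce"
).astype("Int64")

# Dates get their own converter with an explicit format
df["ordered_on"] = pd.to_datetime(df["ordered_on"], format="%Y-%m-%d")

print(df.dtypes)
```

The 'x3' entry ends up as a missing value in an integer column instead of turning the whole column into floats.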

🎯 Quick Decision Guide

| I need to… | Use this |
|---|---|
| Convert to any specific type | astype() |
| Convert messy strings to numbers | to_numeric() |
| Convert strings to dates | to_datetime() |
| Convert strings to durations | to_timedelta() |
| Auto-detect best types | convert_dtypes() |
| Filter columns by type | select_dtypes() |
| Handle missing values in integers | Nullable types (Int64) |

🌟 Remember: Your data is like water—it takes the shape of its container. Type conversion lets YOU choose the container!
