Merging DataFrames

Back

Loading concept...

🔗 Merging DataFrames in Pandas

The Story: Bringing Friends Together at a Party

Imagine you’re throwing a birthday party. You have two guest lists:

  • List A: Names and phone numbers
  • List B: Names and favorite cake flavors

To plan the perfect party, you need to combine these lists. That’s exactly what merging does in Pandas!


🎯 What is Merging?

Merging is like connecting puzzle pieces. You take two tables and join them based on something they have in common.

import pandas as pd

# List A: Guest contacts
contacts = pd.DataFrame({
    'name': ['Emma', 'Liam', 'Olivia'],
    'phone': ['111', '222', '333']
})

# List B: Cake preferences
cakes = pd.DataFrame({
    'name': ['Emma', 'Liam', 'Noah'],
    'cake': ['Chocolate', 'Vanilla', 'Strawberry']
})

contacts:

name phone
Emma 111
Liam 222
Olivia 333

cakes:

name cake
Emma Chocolate
Liam Vanilla
Noah Strawberry

1️⃣ Merge DataFrames Basics

The simplest merge finds matching values in both tables.

# Basic merge - finds common names
result = pd.merge(contacts, cakes)
print(result)

Result:

name phone cake
Emma 111 Chocolate
Liam 222 Vanilla

What happened?

  • Emma and Liam are in both lists
  • Olivia (only in contacts) is gone
  • Noah (only in cakes) is gone

Think of it like: “Who’s on BOTH guest lists?”


2️⃣ Merge on Column

Sometimes your columns have different names but mean the same thing.

# Different column names
employees = pd.DataFrame({
    'emp_name': ['Sara', 'Mike'],
    'dept': ['Sales', 'Tech']
})

salaries = pd.DataFrame({
    'worker': ['Sara', 'Mike'],
    'pay': [50000, 60000]
})

# Tell Pandas which columns to match
result = pd.merge(
    employees,
    salaries,
    left_on='emp_name',  # From left table
    right_on='worker'    # From right table
)

Result:

emp_name dept worker pay
Sara Sales Sara 50000
Mike Tech Mike 60000

Key Point: Use left_on and right_on when column names don’t match!


3️⃣ Merge on Index

What if the “key” is the row label (index) instead of a column?

# Tables with meaningful indexes
products = pd.DataFrame(
    {'price': [10, 20]},
    index=['apple', 'banana']
)

stock = pd.DataFrame(
    {'quantity': [100, 50]},
    index=['apple', 'banana']
)

# Merge using indexes
result = pd.merge(
    products,
    stock,
    left_index=True,   # Use left index
    right_index=True   # Use right index
)

Result:

index price quantity
apple 10 100
banana 20 50

Pro tip: You can mix! Use left_on='col' with right_index=True.


4️⃣ Merge Join Types

This is where merging gets powerful! There are 4 types:

graph TD A["Join Types"] --> B["inner"] A --> C["left"] A --> D["right"] A --> E["outer"] B --> B1["Only matches"] C --> C1["All from left"] D --> D1["All from right"] E --> E1["Everything"]

Let’s See Each Type:

left = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'value': [1, 2, 3]
})

right = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'score': [10, 20, 30]
})

INNER (Default) - “Only the Matches”

pd.merge(left, right, on='key', how='inner')
key value score
B 2 10
C 3 20

Only B and C are in both!

LEFT - “Keep Everyone from Left”

pd.merge(left, right, on='key', how='left')
key value score
A 1 NaN
B 2 10
C 3 20

A stays even with no match. Missing = NaN.

RIGHT - “Keep Everyone from Right”

pd.merge(left, right, on='key', how='right')
key value score
B 2 10
C 3 20
D NaN 30

D stays even with no match.

OUTER - “Keep Absolutely Everyone”

pd.merge(left, right, on='key', how='outer')
key value score
A 1 NaN
B 2 10
C 3 20
D NaN 30

Nobody left behind! Fill gaps with NaN.


5️⃣ Merge with Suffixes

Problem: What if both tables have the same column name?

jan = pd.DataFrame({
    'product': ['Book', 'Pen'],
    'sales': [100, 50]
})

feb = pd.DataFrame({
    'product': ['Book', 'Pen'],
    'sales': [120, 60]
})

# Without suffixes = confusion!
result = pd.merge(jan, feb, on='product')

Result (confusing):

product sales_x sales_y
Book 100 120
Pen 50 60

Solution: Add meaningful suffixes!

result = pd.merge(
    jan,
    feb,
    on='product',
    suffixes=('_jan', '_feb')  # Custom labels!
)

Result (clear!):

product sales_jan sales_feb
Book 100 120
Pen 50 60

Now you know exactly which sales came from which month!


🎉 Quick Summary

Task Code
Basic merge pd.merge(df1, df2)
Merge on specific column pd.merge(df1, df2, on='col')
Different column names pd.merge(df1, df2, left_on='a', right_on='b')
Merge on index pd.merge(df1, df2, left_index=True, right_index=True)
Keep all from left pd.merge(df1, df2, how='left')
Keep all from right pd.merge(df1, df2, how='right')
Keep everything pd.merge(df1, df2, how='outer')
Custom suffixes pd.merge(df1, df2, suffixes=('_1', '_2'))

🚀 You Did It!

Merging is like being a matchmaker for data. You bring tables together based on what they have in common. Now you can:

  • Combine any two DataFrames
  • Handle different column names
  • Use indexes as keys
  • Choose exactly who stays (join types)
  • Avoid confusion with suffixes

Go forth and merge with confidence! 🎊

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.