🔗 Merging DataFrames in Pandas
The Story: Bringing Friends Together at a Party
Imagine you’re throwing a birthday party. You have two guest lists:
- List A: Names and phone numbers
- List B: Names and favorite cake flavors
To plan the perfect party, you need to combine these lists. That’s exactly what merging does in Pandas!
🎯 What is Merging?
Merging is like connecting puzzle pieces. You take two tables and join them based on something they have in common.
import pandas as pd
# List A: Guest contacts
contacts = pd.DataFrame({
'name': ['Emma', 'Liam', 'Olivia'],
'phone': ['111', '222', '333']
})
# List B: Cake preferences
cakes = pd.DataFrame({
'name': ['Emma', 'Liam', 'Noah'],
'cake': ['Chocolate', 'Vanilla', 'Strawberry']
})
contacts:
| name | phone |
|---|---|
| Emma | 111 |
| Liam | 222 |
| Olivia | 333 |
cakes:
| name | cake |
|---|---|
| Emma | Chocolate |
| Liam | Vanilla |
| Noah | Strawberry |
1️⃣ Merge DataFrames Basics
The simplest merge finds matching values in both tables.
# Basic merge - finds common names
result = pd.merge(contacts, cakes)
print(result)
Result:
| name | phone | cake |
|---|---|---|
| Emma | 111 | Chocolate |
| Liam | 222 | Vanilla |
What happened?
- Emma and Liam are in both lists
- Olivia (only in contacts) is gone
- Noah (only in cakes) is gone
Think of it like: “Who’s on BOTH guest lists?”
2️⃣ Merge on Column
Sometimes your columns have different names but mean the same thing.
# Different column names
employees = pd.DataFrame({
'emp_name': ['Sara', 'Mike'],
'dept': ['Sales', 'Tech']
})
salaries = pd.DataFrame({
'worker': ['Sara', 'Mike'],
'pay': [50000, 60000]
})
# Tell Pandas which columns to match
result = pd.merge(
employees,
salaries,
left_on='emp_name', # From left table
right_on='worker' # From right table
)
Result:
| emp_name | dept | worker | pay |
|---|---|---|---|
| Sara | Sales | Sara | 50000 |
| Mike | Tech | Mike | 60000 |
Key Point: Use left_on and right_on when column names don’t match!
3️⃣ Merge on Index
What if the “key” is the row label (index) instead of a column?
# Tables with meaningful indexes
products = pd.DataFrame(
{'price': [10, 20]},
index=['apple', 'banana']
)
stock = pd.DataFrame(
{'quantity': [100, 50]},
index=['apple', 'banana']
)
# Merge using indexes
result = pd.merge(
products,
stock,
left_index=True, # Use left index
right_index=True # Use right index
)
Result:
| index | price | quantity |
|---|---|---|
| apple | 10 | 100 |
| banana | 20 | 50 |
Pro tip: You can mix! Use left_on='col' with right_index=True.
4️⃣ Merge Join Types
This is where merging gets powerful! There are 4 types:
graph TD A["Join Types"] --> B["inner"] A --> C["left"] A --> D["right"] A --> E["outer"] B --> B1["Only matches"] C --> C1["All from left"] D --> D1["All from right"] E --> E1["Everything"]
Let’s See Each Type:
left = pd.DataFrame({
'key': ['A', 'B', 'C'],
'value': [1, 2, 3]
})
right = pd.DataFrame({
'key': ['B', 'C', 'D'],
'score': [10, 20, 30]
})
INNER (Default) - “Only the Matches”
pd.merge(left, right, on='key', how='inner')
| key | value | score |
|---|---|---|
| B | 2 | 10 |
| C | 3 | 20 |
Only B and C are in both!
LEFT - “Keep Everyone from Left”
pd.merge(left, right, on='key', how='left')
| key | value | score |
|---|---|---|
| A | 1 | NaN |
| B | 2 | 10 |
| C | 3 | 20 |
A stays even with no match. Missing = NaN.
RIGHT - “Keep Everyone from Right”
pd.merge(left, right, on='key', how='right')
| key | value | score |
|---|---|---|
| B | 2 | 10 |
| C | 3 | 20 |
| D | NaN | 30 |
D stays even with no match.
OUTER - “Keep Absolutely Everyone”
pd.merge(left, right, on='key', how='outer')
| key | value | score |
|---|---|---|
| A | 1 | NaN |
| B | 2 | 10 |
| C | 3 | 20 |
| D | NaN | 30 |
Nobody left behind! Fill gaps with NaN.
5️⃣ Merge with Suffixes
Problem: What if both tables have the same column name?
jan = pd.DataFrame({
'product': ['Book', 'Pen'],
'sales': [100, 50]
})
feb = pd.DataFrame({
'product': ['Book', 'Pen'],
'sales': [120, 60]
})
# Without suffixes = confusion!
result = pd.merge(jan, feb, on='product')
Result (confusing):
| product | sales_x | sales_y |
|---|---|---|
| Book | 100 | 120 |
| Pen | 50 | 60 |
Solution: Add meaningful suffixes!
result = pd.merge(
jan,
feb,
on='product',
suffixes=('_jan', '_feb') # Custom labels!
)
Result (clear!):
| product | sales_jan | sales_feb |
|---|---|---|
| Book | 100 | 120 |
| Pen | 50 | 60 |
Now you know exactly which sales came from which month!
🎉 Quick Summary
| Task | Code |
|---|---|
| Basic merge | pd.merge(df1, df2) |
| Merge on specific column | pd.merge(df1, df2, on='col') |
| Different column names | pd.merge(df1, df2, left_on='a', right_on='b') |
| Merge on index | pd.merge(df1, df2, left_index=True, right_index=True) |
| Keep all from left | pd.merge(df1, df2, how='left') |
| Keep all from right | pd.merge(df1, df2, how='right') |
| Keep everything | pd.merge(df1, df2, how='outer') |
| Custom suffixes | pd.merge(df1, df2, suffixes=('_1', '_2')) |
🚀 You Did It!
Merging is like being a matchmaker for data. You bring tables together based on what they have in common. Now you can:
- Combine any two DataFrames
- Handle different column names
- Use indexes as keys
- Choose exactly who stays (join types)
- Avoid confusion with suffixes
Go forth and merge with confidence! 🎊
