🧩 Concatenating DataFrames in Pandas
The LEGO Brick Story
Imagine you have two boxes of LEGO bricks. One box has red bricks, the other has blue bricks. You want to play with ALL of them together!
Concatenating is just like dumping both boxes into one big pile so you can build something amazing.
That's exactly what pd.concat() does with DataFrames!
🎯 What is Concatenation?
Concatenation = Stacking DataFrames together
Think of it like:
- Rows: Stacking pancakes on top of each other 🥞
- Columns: Putting books side by side on a shelf 📚
import pandas as pd
# Two small DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'Name': ['Charlie', 'Diana']})
# Concatenate them!
result = pd.concat([df1, df2])
Result: One bigger DataFrame with all four names!
Concatenating Along ROWS (axis=0)
This is the default way. Like stacking plates!
The Setup
# Morning orders
morning = pd.DataFrame({
    'Item': ['Coffee', 'Toast'],
    'Price': [3, 2]
})
# Evening orders
evening = pd.DataFrame({
    'Item': ['Soup', 'Salad'],
    'Price': [5, 4]
})
Stack Them!
all_orders = pd.concat([morning, evening])
print(all_orders)
Output:
     Item  Price
0  Coffee      3
1   Toast      2
0    Soup      5
1   Salad      4
🤔 Wait… Why are there two "0"s and two "1"s?
Each DataFrame kept its original index!
Fix: Reset the Index
all_orders = pd.concat(
    [morning, evening],
    ignore_index=True
)
Now the output is clean:
     Item  Price
0  Coffee      3
1   Toast      2
2    Soup      5
3   Salad      4
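Why do repeated labels matter? Here is a small sketch that rebuilds the morning and evening DataFrames from above and looks up label 0 both ways. The names stacked, clean, rows_at_zero and row_at_zero are just made up for this illustration.

```python
import pandas as pd

morning = pd.DataFrame({'Item': ['Coffee', 'Toast'], 'Price': [3, 2]})
evening = pd.DataFrame({'Item': ['Soup', 'Salad'], 'Price': [5, 4]})

# Without ignore_index, the label 0 appears twice
stacked = pd.concat([morning, evening])
rows_at_zero = stacked.loc[0]      # a DataFrame holding TWO rows
print(len(rows_at_zero))           # 2

# With ignore_index=True, every label is unique
clean = pd.concat([morning, evening], ignore_index=True)
row_at_zero = clean.loc[0]         # a single row (a Series)
print(row_at_zero['Item'])         # Coffee
```

So a lookup by label can quietly hand back more rows than you expected when the index has repeats.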
🎨 Visual Flow
graph TD
    A["Morning DataFrame<br/>2 rows"] --> C["pd.concat"]
    B["Evening DataFrame<br/>2 rows"] --> C
    C --> D["Combined DataFrame<br/>4 rows stacked vertically"]
Concatenating Along COLUMNS (axis=1)
Now imagine putting two posters side by side on your wall.
The Setup
# Student names
names = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie']
})
# Their scores
scores = pd.DataFrame({
    'Math': [90, 85, 78],
    'Science': [88, 92, 80]
})
Put Them Side by Side!
full_report = pd.concat(
    [names, scores],
    axis=1
)
print(full_report)
Output:
      Name  Math  Science
0    Alice    90       88
1      Bob    85       92
2  Charlie    78       80
🎨 Visual Flow
graph TD
    A["Names DataFrame<br/>1 column"] --> C["pd.concat axis=1"]
    B["Scores DataFrame<br/>2 columns"] --> C
    C --> D["Full Report<br/>3 columns side by side"]
⚠️ Important Rule!
When combining columns, pandas lines rows up by index label, so the indexes should match.
If they don't (for example, the row counts differ)? You'll get NaN (missing values) filling the gaps!
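Here is a small sketch of those gaps. One detail worth knowing: pandas matches rows by index label rather than by position, so even two DataFrames with the same row count can end up with NaN if their labels differ. The shortened scores table and the shifted labels 10, 11, 12 are made up for this illustration.

```python
import pandas as pd

names = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})   # labels 0, 1, 2
scores = pd.DataFrame({'Math': [90, 85]})                      # labels 0, 1

# Label 2 has no score, so Math gets NaN for Charlie
report = pd.concat([names, scores], axis=1)
print(report)

# Same row count, different labels: nothing lines up
shifted = pd.DataFrame({'Math': [90, 85, 78]}, index=[10, 11, 12])
misaligned = pd.concat([names, shifted], axis=1)
print(misaligned.shape)   # (6, 2)
```

If the rows really do belong together in order, resetting the index on both DataFrames before concatenating is one way to make the labels agree.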
The Join Method
Sometimes your DataFrames don't have the same columns (when stacking rows) or the same row labels (when stacking columns).
The join parameter decides what to do!
Two Options:
| Join Type | What It Does | Like This… |
|---|---|---|
| outer | Keep EVERYTHING | All guests come to the party |
| inner | Keep only MATCHING | Only VIP guests allowed 🎫 |
Example: Different Columns
df1 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})
df2 = pd.DataFrame({
    'B': [5, 6],
    'C': [7, 8]
})
Outer Join (Default)
result = pd.concat(
    [df1, df2],
    join='outer'
)
print(result)
Output:
     A  B    C
0  1.0  3  NaN
1  2.0  4  NaN
0  NaN  5  7.0
1  NaN  6  8.0
Explanation: Columns A and C each exist in only one DataFrame, so the empty spots become NaN. Their numbers also turn into floats (1.0, 7.0) because NaN is a float value.
Inner Join
result = pd.concat(
    [df1, df2],
    join='inner'
)
print(result)
Output:
   B
0  3
1  4
0  5
1  6
Explanation: Only column B exists in BOTH. So only B survives!
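The same idea works sideways. With axis=1, join looks at row labels instead of column names. A minimal sketch, where the DataFrames left and right and their labels x, y, z, w are made up for this illustration:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
right = pd.DataFrame({'B': [10, 20, 30]}, index=['y', 'z', 'w'])

# Outer (the default): keep every row label, fill gaps with NaN
outer = pd.concat([left, right], axis=1)
print(outer.shape)            # (4, 2)

# Inner: keep only the row labels found in BOTH
inner = pd.concat([left, right], axis=1, join='inner')
print(sorted(inner.index))    # ['y', 'z']
```

Only y and z appear in both DataFrames, so they are the only rows that survive the inner join.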
🎨 Visual Comparison
graph TD
    subgraph "Outer Join"
        O1["A, B"] --> O3["A, B, C"]
        O2["B, C"] --> O3
    end
    subgraph "Inner Join"
        I1["A, B"] --> I3["B only"]
        I2["B, C"] --> I3
    end
Quick Summary
| Task | Code | Result |
|---|---|---|
| Stack rows | pd.concat([df1, df2]) | Taller DataFrame |
| Stack columns | pd.concat([df1, df2], axis=1) | Wider DataFrame |
| Clean index | ignore_index=True | Fresh 0,1,2,3… |
| Keep all data | join='outer' | NaN fills gaps |
| Only common | join='inner' | Matching only |
🎯 Real-World Example
You're a teacher with two class sections:
class_a = pd.DataFrame({
    'Student': ['Emma', 'Liam'],
    'Grade': ['A', 'B']
})
class_b = pd.DataFrame({
    'Student': ['Noah', 'Olivia'],
    'Grade': ['B', 'A']
})
# Combine all students
all_students = pd.concat(
    [class_a, class_b],
    ignore_index=True
)
print(all_students)
Output:
  Student Grade
0    Emma     A
1    Liam     B
2    Noah     B
3  Olivia     A
Now you have ONE list of all your students!
💡 Pro Tips
- Always use ignore_index=True when stacking rows unless you need the original indices.
- Check column names first! Mismatched spelling = separate columns.
- Use axis=1 carefully. Make sure the row labels align.
- outer is safe, inner is strict. Choose based on your needs.
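The tip about column names is easy to trip over, so here is a small sketch. The DataFrames week1 and week2 are made up for this illustration, and renaming is just one possible fix.

```python
import pandas as pd

week1 = pd.DataFrame({'Price': [3, 2]})
week2 = pd.DataFrame({'price': [5, 4]})   # lowercase "p": a different column!

# pandas treats 'Price' and 'price' as two separate columns
combined = pd.concat([week1, week2], ignore_index=True)
print(list(combined.columns))    # ['Price', 'price']

# One possible fix: rename before concatenating
week2_fixed = week2.rename(columns={'price': 'Price'})
fixed = pd.concat([week1, week2_fixed], ignore_index=True)
print(fixed['Price'].tolist())   # [3, 2, 5, 4]
```

In the first result each column is half NaN, which is usually the clue that two names were meant to be the same.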
You Did It!
You now know how to:
- ✅ Stack DataFrames vertically (rows)
- ✅ Stack DataFrames horizontally (columns)
- ✅ Control what happens with join
- ✅ Keep your index clean
Concatenating is like being a master puzzle builder. You take separate pieces and combine them into one beautiful picture!
Now go stack some DataFrames!
