Concatenating DataFrames

Back

Loading concept...

🧩 Concatenating DataFrames in Pandas

The LEGO Brick Story

Imagine you have two boxes of LEGO bricks. One box has red bricks, the other has blue bricks. You want to play with ALL of them together!

Concatenating is just like dumping both boxes into one big pile so you can build something amazing.

That’s exactly what pd.concat() does with DataFrames!


🎯 What is Concatenation?

Concatenation = Stacking DataFrames together

Think of it like:

  • Rows: Stacking pancakes on top of each other πŸ₯ž
  • Columns: Putting books side by side on a shelf πŸ“š
import pandas as pd

# Two small DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'Name': ['Charlie', 'Diana']})

# Concatenate them!
result = pd.concat([df1, df2])

Result: One bigger DataFrame with all four names!


πŸ“š Concatenating Along ROWS (axis=0)

This is the default way. Like stacking plates!

The Setup

# Morning orders
morning = pd.DataFrame({
    'Item': ['Coffee', 'Toast'],
    'Price': [3, 2]
})

# Evening orders
evening = pd.DataFrame({
    'Item': ['Soup', 'Salad'],
    'Price': [5, 4]
})

Stack Them!

all_orders = pd.concat([morning, evening])
print(all_orders)

Output:

     Item  Price
0  Coffee      3
1   Toast      2
0    Soup      5
1   Salad      4

πŸ€” Wait… Why are there two "0"s and two "1"s?

Each DataFrame kept its original index!

Fix: Reset the Index

all_orders = pd.concat(
    [morning, evening],
    ignore_index=True
)

Now the output is clean:

     Item  Price
0  Coffee      3
1   Toast      2
2    Soup      5
3   Salad      4

🎨 Visual Flow

graph TD A["Morning DataFrame<br/>2 rows"] --> C["pd.concat"] B["Evening DataFrame<br/>2 rows"] --> C C --> D["Combined DataFrame<br/>4 rows stacked vertically"]

πŸ“– Concatenating Along COLUMNS (axis=1)

Now imagine putting two posters side by side on your wall.

The Setup

# Student names
names = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie']
})

# Their scores
scores = pd.DataFrame({
    'Math': [90, 85, 78],
    'Science': [88, 92, 80]
})

Put Them Side by Side!

full_report = pd.concat(
    [names, scores],
    axis=1
)
print(full_report)

Output:

      Name  Math  Science
0    Alice    90       88
1      Bob    85       92
2  Charlie    78       80

🎨 Visual Flow

graph TD A["Names DataFrame<br/>1 column"] --> C["pd.concat axis=1"] B["Scores DataFrame<br/>2 columns"] --> C C --> D["Full Report<br/>3 columns side by side"]

⚠️ Important Rule!

When combining columns, row counts must match.

If they don’t? You’ll get NaN (missing values) filling the gaps!


πŸ”— The Join Method

Sometimes your DataFrames don’t have the same columns (for rows) or same rows (for columns).

Join decides what to do!

Two Options:

Join Type What It Does Like This…
outer Keep EVERYTHING All guests come to the party πŸŽ‰
inner Keep only MATCHING Only VIP guests allowed 🎫

Example: Different Columns

df1 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

df2 = pd.DataFrame({
    'B': [5, 6],
    'C': [7, 8]
})

Outer Join (Default)

result = pd.concat(
    [df1, df2],
    join='outer'
)
print(result)

Output:

     A  B    C
0  1.0  3  NaN
1  2.0  4  NaN
0  NaN  5  7.0
1  NaN  6  8.0

Explanation: Column A and C don’t exist in both. So empty spots become NaN.

Inner Join

result = pd.concat(
    [df1, df2],
    join='inner'
)
print(result)

Output:

   B
0  3
1  4
0  5
1  6

Explanation: Only column B exists in BOTH. So only B survives!

🎨 Visual Comparison

graph TD subgraph "Outer Join" O1["A, B"] --> O3["A, B, C"] O2["B, C"] --> O3 end subgraph "Inner Join" I1["A, B"] --> I3["B only"] I2["B, C"] --> I3 end

πŸ† Quick Summary

Task Code Result
Stack rows pd.concat([df1, df2]) Taller DataFrame
Stack columns pd.concat([df1, df2], axis=1) Wider DataFrame
Clean index ignore_index=True Fresh 0,1,2,3…
Keep all data join='outer' NaN fills gaps
Only common join='inner' Matching only

🎯 Real-World Example

You’re a teacher with two class sections:

class_a = pd.DataFrame({
    'Student': ['Emma', 'Liam'],
    'Grade': ['A', 'B']
})

class_b = pd.DataFrame({
    'Student': ['Noah', 'Olivia'],
    'Grade': ['B', 'A']
})

# Combine all students
all_students = pd.concat(
    [class_a, class_b],
    ignore_index=True
)

print(all_students)

Output:

  Student Grade
0    Emma     A
1    Liam     B
2    Noah     B
3  Olivia     A

Now you have ONE list of all your students! πŸŽ“


πŸ’‘ Pro Tips

  1. Always use ignore_index=True when stacking rows unless you need original indices.

  2. Check column names first! Mismatched spelling = separate columns.

  3. Use axis=1 carefully. Make sure row counts align.

  4. outer is safe, inner is strict. Choose based on your needs.


πŸš€ You Did It!

You now know how to:

  • βœ… Stack DataFrames vertically (rows)
  • βœ… Stack DataFrames horizontally (columns)
  • βœ… Control what happens with join
  • βœ… Keep your index clean

Concatenating is like being a master puzzle builder. You take separate pieces and combine them into one beautiful picture!

Now go stack some DataFrames! πŸŽ‰

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.