What is cutting and binning in R?

Cutting and binning groups continuous numbers into categories using cut(). Like sorting toys into Small, Medium, and Large boxes.

How do I find missing values in R?

Use is.na(x) to check each value, sum(is.na(x)) to count missing values, and any(is.na(x)) to check if any exist.

What does na.rm=TRUE do in R functions?

Adding na.rm=TRUE tells R functions to ignore NA values. Without it, functions like mean() return NA if any value is missing.

Data Cleaning in R | Binning & Missing Values

🧹 Data Cleaning in R: Cutting, Binning & Missing Values

The Messy Room Story

Imagine your room is super messy. Toys everywhere! Books scattered! Some things are even missing (where did that sock go?). Before you can play, you need to clean up and organize.

Data cleaning is the same! Real-world data is messy. Numbers are all over the place, and some values just… disappear. Today, we’ll learn two superpowers:

Cutting & Binning → Organizing scattered numbers into neat groups
Missing Value Handling → Finding and fixing the “lost socks” in your data

🎯 Part 1: Cutting and Binning

What’s the Big Idea?

Think of a toy sorting box with different compartments labeled “Small,” “Medium,” and “Large.” Instead of having 100 different toy sizes, you just sort them into 3 groups!

graph TD
    A["Messy Numbers: 5, 23, 67, 12, 89, 45"] --> B["Sorting Box"]
    B --> C["Low: 5, 12"]
    B --> D["Medium: 23, 45"]
    B --> E["High: 67, 89"]

Why do this?

Makes patterns easier to see
Simplifies analysis
Groups similar things together

The `cut()` Function - Your Sorting Tool

The cut() function is like having a magical sorting machine!

Basic Recipe:

cut(x, breaks, labels)

Part	What It Does
`x`	Your messy numbers
`breaks`	Where to make the cuts
`labels`	Names for each group

Example 1: Sorting Ages into Groups

Imagine you have ages of kids at a party:

# Kids' ages
ages <- c(5, 8, 12, 7, 15, 3, 10)

# Sort into groups
age_groups <- cut(
  ages,
  breaks = c(0, 6, 12, 18),
  labels = c("Little", "Medium", "Teen")
)

print(age_groups)
# Little, Medium, Medium,
# Medium, Teen, Little, Medium

What happened?

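Ages above 0 and up to 6 → “Little”
Ages above 6 and up to 12 → “Medium”
Ages above 12 and up to 18 → “Teen”

Each group includes its upper edge, so a 12-year-old is “Medium,” not “Teen.”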
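
To see exactly where each group starts and stops, it can help to leave out the labels and let cut() name the groups itself. This is a small sketch that reuses the party ages from above; the variable name age_intervals is just made up for illustration.

```r
# Same ages and breaks as the party example
ages <- c(5, 8, 12, 7, 15, 3, 10)

# Without labels, cut() names each group after its interval
age_intervals <- cut(ages, breaks = c(0, 6, 12, 18))

levels(age_intervals)
# "(0,6]"   "(6,12]"  "(12,18]"

# Count how many kids landed in each group
table(age_intervals)
# (0,6]: 2, (6,12]: 4, (12,18]: 1
```

A round bracket means "not including" and a square bracket means "including," so (6,12] holds ages above 6 up to and including 12.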

Example 2: Test Scores to Grades

scores <- c(92, 78, 65, 88, 45, 95)

grades <- cut(
  scores,
  breaks = c(0, 60, 70, 80, 90, 100),
  labels = c("F", "D", "C", "B", "A")
)

print(grades)
# A, C, D, B, F, A

Now messy numbers become clear letter grades!
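
One thing to watch: with these breaks, a score of exactly 90 lands in "B," because each group includes its upper edge. If you would rather have 90 count as an "A," one option is right = FALSE, which makes each group include its lower edge instead. The scores in edge_scores below are made up for this sketch.

```r
# Made-up scores that sit exactly on the group edges
edge_scores <- c(90, 80, 100)
grade_breaks <- c(0, 60, 70, 80, 90, 100)
grade_labels <- c("F", "D", "C", "B", "A")

# Default: groups include their upper edge, so 90 is a "B"
default_grades <- cut(edge_scores, breaks = grade_breaks,
                      labels = grade_labels)
print(default_grades)
# B, C, A

# right = FALSE: groups include their lower edge, so 90 is an "A".
# include.lowest = TRUE keeps a perfect 100 inside the top group.
left_closed_grades <- cut(edge_scores, breaks = grade_breaks,
                          labels = grade_labels,
                          right = FALSE, include.lowest = TRUE)
print(left_closed_grades)
# A, B, A
```

Which version is right depends on your grading rules, so it is worth testing a few edge scores before trusting the bins.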

The `include.lowest` Secret

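By default, each group includes its upper edge but not its lower edge. That means a value exactly equal to the lowest break (here, 0) gets NA instead of a group. Fix this: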

# Include the minimum value
cut(ages,
    breaks = c(0, 6, 12, 18),
    labels = c("Little", "Medium", "Teen"),
    include.lowest = TRUE)
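
The party ages above don't contain a 0, so the difference is easy to miss. Here is a small sketch with made-up ages (edge_ages) that sit right on the breaks, to show what include.lowest changes.

```r
# Made-up ages that sit exactly on the breaks
edge_ages <- c(0, 6, 18)
age_breaks <- c(0, 6, 12, 18)
age_labels <- c("Little", "Medium", "Teen")

# Default: the lowest break (0) is left out, so age 0 becomes NA
without_lowest <- cut(edge_ages, breaks = age_breaks,
                      labels = age_labels)
print(without_lowest)
# NA, Little, Teen

# include.lowest = TRUE pulls the lowest break into the first group
with_lowest <- cut(edge_ages, breaks = age_breaks,
                   labels = age_labels,
                   include.lowest = TRUE)
print(with_lowest)
# Little, Little, Teen
```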

Quick Binning with `ntile()`

Need equal-sized groups? Use ntile() from dplyr:

library(dplyr)

# Split into 3 equal groups
ages <- c(5, 8, 12, 7, 15, 3, 10)
ntile(ages, 3)
# 1, 2, 3, 1, 3, 1, 2

Each group gets roughly the same number of items!
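
Seven values can't be split into three perfectly equal groups, so one group has to take an extra value. A quick way to check the group sizes is table(); this sketch assumes dplyr is installed, and the name groups is just for illustration.

```r
library(dplyr)

ages <- c(5, 8, 12, 7, 15, 3, 10)

# Assign each age to one of 3 groups, smallest ages first
groups <- ntile(ages, 3)

# How many values are in each group?
table(groups)
# group 1: 3 values, group 2: 2 values, group 3: 2 values

# See which ages ended up together
split(ages, groups)
```

Unlike cut(), ntile() picks the cut points for you based on rank, so the boundaries move whenever the data changes.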

🕳️ Part 2: Missing Value Handling

The Mystery of NA

In R, missing values are shown as NA (Not Available). It’s like a blank space where a number should be.

Where do NAs come from?

Someone forgot to fill in a form
A sensor broke
Data got lost during transfer

# A vector with missing values
temps <- c(72, NA, 68, 75, NA, 70)

Finding Missing Values

Question: “Do I have any missing socks… I mean, values?”

# Check each value
is.na(temps)
# FALSE, TRUE, FALSE, FALSE, TRUE, FALSE

# Count the missing ones
sum(is.na(temps))
# 2

# Are ANY missing?
any(is.na(temps))
# TRUE
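
Knowing that values are missing is a good start, but you often also want to know where they are and how big the problem is. The sketch below uses only base R; the readings data frame and its humidity column are made up for illustration.

```r
temps <- c(72, NA, 68, 75, NA, 70)

# Positions of the missing values
which(is.na(temps))
# 2, 5

# Share of values that are missing (2 out of 6)
mean(is.na(temps))
# 0.3333333

# For a data frame, count the NAs in every column at once
readings <- data.frame(
  temp = temps,
  humidity = c(40, 42, NA, 45, 41, 43)
)
colSums(is.na(readings))
# temp: 2, humidity: 1
```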

graph TD
    A["Your Data"] --> B{is.na?}
    B -->|TRUE| C["Missing! 🕳️"]
    B -->|FALSE| D["Got it! ✓"]

Strategy 1: Remove Missing Values

Sometimes the easiest fix is to just skip the NAs.

temps <- c(72, NA, 68, 75, NA, 70)

# Remove NAs completely
clean_temps <- na.omit(temps)
print(clean_temps)
# 72, 68, 75, 70

# Or use complete.cases for data frames
df <- data.frame(
  name = c("Ana", "Bob", "Cat"),
  age = c(10, NA, 8)
)
df[complete.cases(df), ]

When to use: You have lots of data and losing a few rows is okay.
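
Before you drop rows, it is worth checking how many you are about to lose. This sketch reuses the small df from above and counts rows before and after.

```r
df <- data.frame(
  name = c("Ana", "Bob", "Cat"),
  age = c(10, NA, 8)
)

# na.omit() also works on data frames: it drops every row with an NA
clean_df <- na.omit(df)

nrow(df)        # 3 rows before
nrow(clean_df)  # 2 rows after (Bob's row is gone)

# For a plain vector, logical indexing keeps just the non-missing values
temps <- c(72, NA, 68, 75, NA, 70)
kept_temps <- temps[!is.na(temps)]
print(kept_temps)
# 72, 68, 75, 70
```

Here one missing age costs a third of the data, which is fine for a toy example but would be a lot to lose in a small real dataset.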

Strategy 2: Replace with a Fixed Value

Fill the gaps with a specific number:

temps <- c(72, NA, 68, 75, NA, 70)

# Replace NA with 0
temps[is.na(temps)] <- 0

# Or use tidyr's replace_na
library(tidyr)
temps <- c(72, NA, 68, 75, NA, 70)
replace_na(temps, 0)
# 72, 0, 68, 75, 0, 70
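
Be careful with the number you pick. A 0 looks harmless, but for temperatures it is a real (and very cold!) reading, so it drags the average down. This short sketch compares the two averages; zero_filled is a made-up name.

```r
temps <- c(72, NA, 68, 75, NA, 70)

# Average of the values we actually have
mean(temps, na.rm = TRUE)
# 71.25

# Average after pretending the missing readings were 0
zero_filled <- temps
zero_filled[is.na(zero_filled)] <- 0
mean(zero_filled)
# 47.5
```

A fixed value tends to work best when it truly means "nothing," like 0 items sold, rather than "we don't know."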

Strategy 3: Replace with Mean/Median

Fill gaps with the average value - a smart guess!

temps <- c(72, NA, 68, 75, NA, 70)

# Calculate mean (ignoring NAs)
avg_temp <- mean(temps, na.rm = TRUE)
# avg_temp = 71.25

# Fill NAs with average
temps[is.na(temps)] <- avg_temp
# 72, 71.25, 68, 75, 71.25, 70
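
The median works the same way and is often a safer guess when a few values are unusually large or small. This sketch fills the same temps with the median, then uses a made-up vector (with_outlier) to show how one odd reading pulls the mean but barely moves the median.

```r
temps <- c(72, NA, 68, 75, NA, 70)

# Middle value of the non-missing readings
mid_temp <- median(temps, na.rm = TRUE)
# mid_temp = 71

# Fill NAs with the median
temps[is.na(temps)] <- mid_temp
# 72, 71, 68, 75, 71, 70

# Made-up example: one wild reading of 700
with_outlier <- c(72, NA, 68, 75, NA, 700)
mean(with_outlier, na.rm = TRUE)    # 228.75
median(with_outlier, na.rm = TRUE)  # 73.5
```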

The na.rm = TRUE trick: Many summary functions, such as mean(), sum(), and max(), return NA if even one value is missing. Add na.rm = TRUE to make them skip the NAs!

mean(c(1, NA, 3))        # NA (broken!)
mean(c(1, NA, 3), na.rm=TRUE)  # 2 (works!)
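
The same argument works in several other base R summary functions. A short sketch, with x as a made-up vector:

```r
x <- c(1, NA, 3)

sum(x)                   # NA
sum(x, na.rm = TRUE)     # 4
max(x, na.rm = TRUE)     # 3
median(x, na.rm = TRUE)  # 2

# length() has no na.rm argument, so count the non-missing values like this
sum(!is.na(x))           # 2
```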

Strategy 4: Forward/Backward Fill

Use the previous (or next) value to fill gaps:

library(tidyr)

temps <- c(72, NA, NA, 75, NA, 70)

# Fill with previous value
fill(data.frame(t=temps), t,
     .direction = "down")
# 72, 72, 72, 75, 75, 70

# Fill with next value
fill(data.frame(t=temps), t,
     .direction = "up")
# 72, 75, 75, 75, 70, 70

When to use: Time series data where values flow naturally.
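
If you are working with a plain vector and don't want to build a data frame first, the same idea can be sketched in base R. The helpers fill_down() and fill_up() below are made-up names for this illustration, not functions from tidyr.

```r
# Hypothetical helper: carry the last seen value forward
fill_down <- function(x) {
  # For each position, count how many non-missing values we've seen so far
  seen <- cumsum(!is.na(x))
  # Look up that value; the leading NA covers gaps before the first value
  c(NA, x[!is.na(x)])[seen + 1]
}

# Hypothetical helper: fill backward by reversing, filling, and reversing again
fill_up <- function(x) rev(fill_down(rev(x)))

temps <- c(72, NA, NA, 75, NA, 70)

fill_down(temps)
# 72, 72, 72, 75, 75, 70

fill_up(temps)
# 72, 75, 75, 75, 70, 70
```

A gap at the very start has no previous value to copy, so fill_down() leaves it as NA.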

🧩 Putting It All Together

Here’s a real example combining both skills:

library(dplyr)
library(tidyr)

# Messy student data
students <- data.frame(
  name = c("Ava", "Ben", "Cara", "Dan"),
  score = c(85, NA, 72, 91)
)

# Step 1: Handle missing scores
students <- students %>%
  mutate(score = replace_na(
    score,
    mean(score, na.rm = TRUE)
  ))

# Step 2: Bin into grade groups
students <- students %>%
  mutate(grade = cut(
    score,
    breaks = c(0, 60, 70, 80, 90, 100),
    labels = c("F","D","C","B","A")
  ))

print(students)
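
If dplyr and tidyr aren't installed, the same two steps can be sketched in base R. This version is an alternative for illustration and uses the same student data, so you can check the result by hand.

```r
# Messy student data
students <- data.frame(
  name = c("Ava", "Ben", "Cara", "Dan"),
  score = c(85, NA, 72, 91)
)

# Step 1: Fill the missing score with the average of the others
avg_score <- mean(students$score, na.rm = TRUE)
students$score[is.na(students$score)] <- avg_score

# Step 2: Bin into grade groups
students$grade <- cut(
  students$score,
  breaks = c(0, 60, 70, 80, 90, 100),
  labels = c("F", "D", "C", "B", "A")
)

print(students)
# Ava 85 B, Ben 82.67 B, Cara 72 C, Dan 91 A
```

Ben's filled-in score is the average of 85, 72, and 91 (about 82.67), which lands him in the "B" group.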

🎯 Quick Reference

Task	Function	Example
Create bins	`cut()`	`cut(x, breaks=c(0,50,100))`
Equal groups	`ntile()`	`ntile(x, 4)`
Find NAs	`is.na()`	`is.na(x)`
Remove NAs	`na.omit()`	`na.omit(x)`
Replace NAs	`replace_na()`	`replace_na(x, 0)`
Ignore NAs	`na.rm=TRUE`	`mean(x, na.rm=TRUE)`

🏆 You Did It!

You now have two powerful data cleaning superpowers:

✅ Cutting & Binning - Transform messy numbers into organized groups
✅ Missing Value Handling - Find and fix the gaps in your data

Remember: Clean data = Happy analysis! 🎉

Just like cleaning your room makes it easier to find your toys, cleaning your data makes it easier to find insights!
