🍳 R Data Processing: Your Kitchen Adventure!
Imagine you're a master chef in a magical kitchen. Your ingredients? Data! Your cooking tools? R functions! Let's learn how to slice, dice, mix, and serve beautiful data dishes.
The Kitchen Metaphor
Think of your data like a big basket of ingredients. Sometimes you need to:
- Pick only the tomatoes (Subset)
- Chop them into pieces (Transform)
- Work inside the basket easily (with/within)
- Mix two baskets together (Merge)
- Find what's common or different (Set Operations)
- Count how many of each (Contingency Tables)
- Rearrange your table nicely (Table Manipulation)
Let's cook! 🍽️
1️⃣ Subset Function: Picking Your Ingredients
What is it?
subset() helps you pick only the data you want, like reaching into a fruit basket and grabbing only the apples!
The Magic Words
subset(data, condition)
subset(data, condition, select = columns)
Simple Example
# Our fruit basket
fruits <- data.frame(
name = c("apple", "banana", "cherry"),
color = c("red", "yellow", "red"),
price = c(2, 1, 3)
)
# Pick only red fruits
red_fruits <- subset(fruits,
color == "red")
# Result: apple and cherry!
Pro Tips
- Use select to pick specific columns:
# Get names of cheap fruits
subset(fruits,
price < 2,
select = name)
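Conditions can be combined with & (and) or | (or), and select accepts several columns at once. Here is a small sketch of that, rebuilding the fruit basket so it runs on its own; the names cheap_red and cheap_red_idx are made up for illustration. R's help page for subset() describes it as a convenience function for interactive use, so the same pick is also shown with plain bracket indexing.

```r
# Rebuild the fruit basket so this block runs by itself
fruits <- data.frame(
  name  = c("apple", "banana", "cherry"),
  color = c("red", "yellow", "red"),
  price = c(2, 1, 3)
)

# Two conditions joined with &, and two columns kept via select
cheap_red <- subset(fruits,
                    color == "red" & price < 3,
                    select = c(name, price))

# The same pick with bracket indexing (no non-standard evaluation)
cheap_red_idx <- fruits[fruits$color == "red" & fruits$price < 3,
                        c("name", "price")]

print(cheap_red)
print(cheap_red_idx)
```

Only the apple is both red and cheaper than 3, so both versions keep a single row.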
Why Kids Love It
It's like having a magic wand that says "Give me only the toys that are blue!" and poof, you get exactly what you asked for!
2️⃣ Transform Function: Cooking Your Data
What is it?
transform() lets you add new columns or change existing ones, like adding seasoning to your dish!
The Magic Words
transform(data, new_column = calculation)
🧮 Simple Example
# Our students
students <- data.frame(
name = c("Amy", "Bob"),
math = c(80, 90),
science = c(70, 85)
)
# Add total score
students <- transform(students,
total = math + science,
average = (math + science) / 2
)
What Happens?
name math science total average
1 Amy 80 70 150 75.0
2 Bob 90 85 175 87.5
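One thing worth knowing: transform() evaluates every expression against the columns the data frame already has, so a brand-new column can't be used by another new column in the same call. That is why the example above spells out math + science twice. A minimal sketch of the usual workaround, chaining two calls (step1 and step2 are illustrative names):

```r
students <- data.frame(
  name    = c("Amy", "Bob"),
  math    = c(80, 90),
  science = c(70, 85)
)

# First call: create total from existing columns
step1 <- transform(students, total = math + science)

# Second call: total now exists, so average can build on it
step2 <- transform(step1, average = total / 2)

print(step2)
```

The result has the same total and average columns as before, just built in two steps.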
Why Kids Love It
It's like putting stickers on your notebook: you're adding new stuff without throwing anything away!
3️⃣ With and Within Functions: Working Inside the Box
The Problem
Typing data$column everywhere is tiring! Like saying "the red box's apple, the red box's banana, the red box's cherry..."
The Solution: with() and within()
Start from your data frame and ask what you want:
- Just calculate something: use with(), which returns a result.
- Change the data: use within(), which returns modified data.
with() Example
# Calculate without $ signs
with(students, {
total <- math + science
print(mean(total))
})
# Just shows the answer!
within() Example
# Modify the data itself
students <- within(students, {
grade <- ifelse(average >= 80,
"A", "B")
})
# Now students HAS a grade column!
The Difference
| Function | Does What? | Returns |
|---|---|---|
| with() | Calculates | Just the answer |
| within() | Modifies | Changed data frame |
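To see the difference side by side, here is a small self-contained sketch. The names avg_math, graded, and passed are invented for this example. Note that within() hands back a modified copy, so the original data frame stays the same unless you assign the result back to it.

```r
students <- data.frame(
  name    = c("Amy", "Bob"),
  math    = c(80, 90),
  science = c(70, 85)
)

# with(): compute a value, leave the data frame untouched
avg_math <- with(students, mean(math))

# within(): returns a copy that carries the new columns
graded <- within(students, {
  total  <- math + science
  passed <- total >= 160
})

print(avg_math)
print(names(students))  # still only name, math, science
print(graded)
```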
4️⃣ Merge Function: Mixing Two Baskets
What is it?
merge() is like having two puzzle pieces that snap together! It combines data from different tables.
The Magic Words
merge(table1, table2, by = "matching_column")
🧩 Simple Example
# Student names
names_df <- data.frame(
id = c(1, 2, 3),
name = c("Amy", "Bob", "Cat")
)
# Student scores
scores_df <- data.frame(
id = c(1, 2, 3),
score = c(95, 87, 92)
)
# Snap them together!
complete <- merge(names_df,
scores_df,
by = "id")
Result
id name score
1 1 Amy 95
2 2 Bob 87
3 3 Cat 92
Different Types of Merge
- Inner: only matching rows (the default)
- Left: keep all rows from the left table
- Right: keep all rows from the right table
- Full: keep everything
# Keep everyone from left table
merge(x, y, by="id", all.x=TRUE)
# Keep everyone from both
merge(x, y, by="id", all=TRUE)
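The merge types only look different when the two tables don't match perfectly. Here is a sketch with made-up tables (left_df and right_df are illustrative) where id 1 exists only on the left and id 4 only on the right:

```r
left_df <- data.frame(
  id   = c(1, 2, 3),
  name = c("Amy", "Bob", "Cat")
)
right_df <- data.frame(
  id    = c(2, 3, 4),
  score = c(87, 92, 78)
)

# Inner: only ids found in both tables (2 and 3)
inner_rows <- merge(left_df, right_df, by = "id")

# Left: every row of left_df; Amy gets NA for score
left_rows <- merge(left_df, right_df, by = "id", all.x = TRUE)

# Right: every row of right_df; id 4 gets NA for name
right_rows <- merge(left_df, right_df, by = "id", all.y = TRUE)

# Full: every id from either table (1 to 4)
full_rows <- merge(left_df, right_df, by = "id", all = TRUE)

print(inner_rows)
print(left_rows)
print(right_rows)
print(full_rows)
```

Wherever a row has no partner, merge() fills the missing columns with NA.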
5️⃣ Set Operations: Finding Friends & Strangers
What is it?
Like comparing two groups of friends:
- Who's in BOTH groups? (intersect)
- Who's in EITHER group? (union)
- Who's ONLY in group A? (setdiff)
Simple Examples
group_a <- c("Amy", "Bob", "Cat")
group_b <- c("Bob", "Cat", "Dan")
# Friends in BOTH
intersect(group_a, group_b)
# "Bob" "Cat"
# ALL friends combined
union(group_a, group_b)
# "Amy" "Bob" "Cat" "Dan"
# Only in group A
setdiff(group_a, group_b)
# "Amy"
# Only in group B
setdiff(group_b, group_a)
# "Dan"
Visual Summary
Group A: 🔴 Amy | 🟡 Bob | 🟢 Cat
Group B: | 🟡 Bob | 🟢 Cat | 🔵 Dan
intersect: 🟡🟢 (Bob, Cat)
union: 🔴🟡🟢🔵 (All four)
setdiff A-B: 🔴 (Just Amy)
setdiff B-A: 🔵 (Just Dan)
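Set operations treat their inputs as sets, so repeated values are dropped from the result. A short sketch with made-up id vectors (ids_a and ids_b are illustrative), plus %in% for when you want a TRUE/FALSE answer per element instead:

```r
ids_a <- c(1, 2, 2, 3)   # note the repeated 2
ids_b <- c(2, 3, 4)

all_ids    <- union(ids_a, ids_b)      # each value appears once
shared_ids <- intersect(ids_a, ids_b)  # values present in both
only_a     <- setdiff(ids_a, ids_b)    # values in A but not in B

# %in% keeps one answer per element of ids_a, duplicates included
in_b <- ids_a %in% ids_b

print(all_ids)
print(shared_ids)
print(only_a)
print(in_b)
```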
6️⃣ Contingency Tables: Counting Your Stickers
What is it?
table() counts how many of each thing you have, like organizing your sticker collection by color and shape!
Simple Example
# Our pets
pets <- data.frame(
animal = c("cat", "dog", "cat",
"dog", "cat"),
color = c("white", "brown", "brown",
"white", "white")
)
# Count by animal type
table(pets$animal)
# cat: 3, dog: 2
# Two-way table
table(pets$animal, pets$color)
Two-Way Result
brown white
cat 1 2
dog 1 1
Add Margins (Totals)
my_table <- table(pets$animal,
pets$color)
addmargins(my_table)
brown white Sum
cat 1 2 3
dog 1 1 2
Sum 2 3 5
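A table is easier to read and reuse when its dimensions have names. The sketch below passes named arguments to table(), pulls out a single count, and converts the counts back into a data frame. The object names tbl and counts_df are just for illustration.

```r
pets <- data.frame(
  animal = c("cat", "dog", "cat", "dog", "cat"),
  color  = c("white", "brown", "brown", "white", "white")
)

# Named arguments label the two dimensions
tbl <- table(animal = pets$animal, color = pets$color)

# Look up one cell by row and column name
white_cats <- tbl["cat", "white"]

# Long format: one row per animal/color pair, counts in Freq
counts_df <- as.data.frame(tbl)

print(tbl)
print(white_cats)
print(counts_df)
```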
7️⃣ Table Manipulation: Arranging Your Display
Key Functions
| Function | What It Does |
|---|---|
| prop.table() | Show proportions (fractions of a total) |
| margin.table() | Get row/column totals |
| addmargins() | Add sum rows/columns |
| ftable() | Flatten multi-way tables |
Proportion Tables
my_table <- table(pets$animal,
pets$color)
# Overall proportions
prop.table(my_table)
# Each cell / total
# Row proportions
prop.table(my_table, 1)
# Each row adds to 1
# Column proportions
prop.table(my_table, 2)
# Each column adds to 1
Row Proportions Example (rounded to two decimals)
brown white
cat 0.33 0.67 (1 brown, 2 white)
dog 0.50 0.50 (1 brown, 1 white)
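prop.table() returns fractions between 0 and 1. If you want actual percentages, multiply by 100 and round. The sketch below does that and also shows margin.table() for the row and column totals; row_pct, row_totals, and col_totals are illustrative names.

```r
pets <- data.frame(
  animal = c("cat", "dog", "cat", "dog", "cat"),
  color  = c("white", "brown", "brown", "white", "white")
)
my_table <- table(pets$animal, pets$color)

# Row proportions turned into percentages with one decimal
row_pct <- round(100 * prop.table(my_table, 1), 1)

# Totals per row (animals) and per column (colors)
row_totals <- margin.table(my_table, 1)
col_totals <- margin.table(my_table, 2)

print(row_pct)
print(row_totals)
print(col_totals)
```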
Flatten Complex Tables
# 3-way table (assumes a data frame named survey with gender, age, and vote columns)
t3 <- table(survey$gender,
survey$age,
survey$vote)
# Make it readable
ftable(t3)
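The survey data frame isn't defined in this tutorial, so here is a runnable sketch with a tiny invented one. The six rows and their values are made up purely to show the shape of the output.

```r
# Hypothetical survey data, invented for this example
survey <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  age    = c("young", "young", "old", "old", "young", "old"),
  vote   = c("yes", "no", "yes", "yes", "no", "no")
)

# A 3-way table is a 2 x 2 x 2 cube of counts
t3 <- table(survey$gender, survey$age, survey$vote)

# ftable() lays the cube flat: gender/age pairs as rows, vote as columns
flat <- ftable(t3)

print(dim(t3))
print(flat)
```

By default ftable() puts the last variable in the columns and combines the others into the rows.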
The Complete Recipe
Raw Data -> subset -> transform -> with/within, then ask:
- Need to combine? Yes: merge, then go to counting. No: ask whether to compare sets.
- Compare sets? Yes: set operations, then go to counting. No: go straight to counting.
- Count things? Yes: table, then table manipulation, then beautiful results!
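Here is one way the whole recipe might look in a single script. Everything in it is invented for illustration: the students and clubs tables, the passing mark of 60, and the grade cutoff of 160.

```r
# Made-up ingredients
students <- data.frame(
  id      = c(1, 2, 3, 4),
  name    = c("Amy", "Bob", "Cat", "Dan"),
  math    = c(80, 90, 55, 70),
  science = c(70, 85, 60, 95)
)
clubs <- data.frame(
  id   = c(1, 2, 4),
  club = c("chess", "art", "chess")
)

# subset: keep students who passed math
passed <- subset(students, math >= 60)

# transform: add a total score
passed <- transform(passed, total = math + science)

# within: add a grade based on the total
passed <- within(passed, {
  grade <- ifelse(total >= 160, "A", "B")
})

# merge: attach each student's club
joined <- merge(passed, clubs, by = "id")

# table: count grades per club
counts <- table(joined$club, joined$grade)

print(joined)
print(counts)
```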
Quick Reference Card
| Task | Function | Example |
|---|---|---|
| Filter rows | subset() | subset(df, x > 5) |
| Add columns | transform() | transform(df, y = x * 2) |
| Work inside | with() | with(df, mean(x)) |
| Modify inside | within() | within(df, y <- x * 2) |
| Join tables | merge() | merge(a, b, by = "id") |
| Common items | intersect() | intersect(v1, v2) |
| All items | union() | union(v1, v2) |
| Difference | setdiff() | setdiff(v1, v2) |
| Count | table() | table(df$col) |
| Proportions | prop.table() | prop.table(tbl) |
You Did It!
You've just learned how to:
- Pick exactly what you need (subset)
- Transform and enhance data (transform)
- Work efficiently inside data (with/within)
- Combine data sources (merge)
- Compare groups (set operations)
- Count and summarize (contingency tables)
- Display beautifully (table manipulation)
You're now a Data Processing Chef! 👨‍🍳
"Data processing in R is like cooking: once you know your tools, you can create anything!"
