🚀 Advanced R Programming: Performance

Imagine your R code is a chef in a kitchen. A fast chef knows where every ingredient is, uses the right tools, and never wastes a single move. Today, we’ll teach your R code to cook like a pro!


🍳 The Kitchen Analogy

Think of your computer like a kitchen:

  • Memory = Your counter space (where you prep food)
  • Vectorization = Using a food processor instead of chopping by hand
  • Optimization = Finding the fastest recipe
  • Profiling = Using a timer to see what takes longest

Let’s make your R code a master chef!


📦 Memory Management

What is Memory?

Memory is like your kitchen counter. You only have so much space!

Simple Example:

  • If you put too many bowls on the counter, you can’t work
  • Your computer works the same way with data

Why Does Memory Matter?

# Bad: Making copies everywhere
x <- 1:1000000
y <- x      # Looks innocent...
y[1] <- 0   # R copies the WHOLE thing!

When you change y, R first makes a complete copy of the million-element vector so that x stays untouched (this is called copy-on-modify). That's like photocopying a whole cookbook just to fix one typo!
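
Want to catch R in the act? Base R's tracemem() (available in standard CRAN builds) prints a message the moment an object is duplicated. A minimal sketch, using rnorm() just to get an ordinary numeric vector:

# Watch copy-on-modify happen
x <- rnorm(1e6)   # an ordinary numeric vector
y <- x            # no copy yet: both names point at the same memory
tracemem(y)       # ask R to report whenever this object is duplicated
y[1] <- 0         # a "tracemem[...]" message appears: the full copy happens here
untracemem(y)     # stop tracing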

Smart Memory Tips

graph TD
  A["Create Data"] --> B{Need to Modify?}
  B -->|Yes| C["Modify In-Place"]
  B -->|No| D["Share Memory"]
  C --> E["Efficient!"]
  D --> E

Tip 1: Remove What You Don’t Need

# Free up space!
rm(big_data)
gc()  # Garbage collection = cleaning

Tip 2: Check Your Memory Usage

# How big is my data?
object.size(my_data)

# See all objects
ls()
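
A handy follow-up is to rank everything in your workspace by size. A rough sketch combining ls() and object.size() (mtcars below is just R's built-in example data set):

# List every object in the workspace, biggest first
sizes <- sapply(ls(), function(nm) object.size(get(nm)))
sort(sizes, decreasing = TRUE)

# Human-readable size for a single object
print(object.size(mtcars), units = "Kb")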

Tip 3: Pre-allocate Space

# Bad: Growing a vector
result <- c()
for(i in 1:1000) {
  result <- c(result, i)  # Slow!
}

# Good: Pre-allocate
result <- numeric(1000)
for(i in 1:1000) {
  result[i] <- i  # Fast!
}

🎯 Key Insight: Pre-allocation is like setting out all your bowls before cooking. No running to the cabinet mid-recipe!
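
You can time the difference yourself with system.time() (covered properly in the profiling section below). A rough sketch; the exact numbers depend on your machine, but the gap grows dramatically with n:

n <- 50000

# Growing: R allocates a new, longer vector and copies everything each time
system.time({
  grown <- c()
  for (i in 1:n) grown <- c(grown, i)
})

# Pre-allocated: one allocation, then values are filled in place
system.time({
  prealloc <- numeric(n)
  for (i in 1:n) prealloc[i] <- i
})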


⚡ Vectorization Benefits

What is Vectorization?

Imagine you need to peel 100 potatoes:

  • Loop way: Peel one, put down, pick up next, peel…
  • Vectorized way: Machine peels all 100 at once!

R is AMAZING at doing things all at once.

The Magic of Vectors

# Slow loop way (peeling one by one)
numbers <- 1:1000000
result <- numeric(1000000)
for(i in 1:1000000) {
  result[i] <- numbers[i] * 2
}

# Fast vectorized way (all at once!)
result <- numbers * 2

The vectorized way is typically 10-100x faster!

Why Vectorization is Fast

graph TD
  A["Loop"] --> B["Check i"]
  B --> C["Get value"]
  C --> D["Calculate"]
  D --> E["Store"]
  E --> F["Repeat 1M times"]
  G["Vectorized"] --> H["Send all to CPU"]
  H --> I["CPU does all at once"]
  I --> J["Done!"]

Common Vectorized Functions

Instead of Loop      Use This
for + sum()          sum(x)
for + mean()         mean(x)
for + comparison     x > 5
for + math           x * 2

Real Example:

# Find all numbers > 50
numbers <- 1:100

# Loop (slow)
big_ones <- c()
for(n in numbers) {
  if(n > 50) big_ones <- c(big_ones, n)
}

# Vectorized (fast!)
big_ones <- numbers[numbers > 50]

🎯 Key Insight: If you’re writing a loop in R, ask yourself: “Can I do this all at once?”


🏎️ Performance Optimization

The Golden Rules

  1. Measure first, optimize second
  2. Don’t optimize code that runs once
  3. Make it work, then make it fast

Common Speed Killers

graph TD
  A["Slow Code"] --> B["Growing Objects"]
  A --> C["Unnecessary Copies"]
  A --> D["Loops over Vectors"]
  A --> E["Reading Files Repeatedly"]

Quick Wins

1. Use Built-in Functions

# Slow
my_sum <- 0
for(x in numbers) my_sum <- my_sum + x

# Fast (built-in!)
my_sum <- sum(numbers)

2. Avoid Growing Objects

# Bad: List grows each time
results <- list()
for(i in 1:1000) {
  results[[i]] <- do_something(i)
}

# Better: Use lapply
results <- lapply(1:1000, do_something)

3. Read Data Once

# Bad: Reading in a loop
for(i in 1:10) {
  data <- read.csv("file.csv")  # Re-reads!
}

# Good: Read once, use many
data <- read.csv("file.csv")
for(i in 1:10) {
  process(data)  # Uses cached data
}

The apply Family

Function   When to Use
lapply     Apply to each list item
sapply     Same, but simplify the result
vapply     Same, but you specify the output type
mapply     Multiple inputs

# Square each number
numbers <- list(1, 2, 3, 4, 5)
squares <- lapply(numbers, function(x) x^2)
# Result: list(1, 4, 9, 16, 25)
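
The other family members look almost identical; a quick sketch of each, reusing the numbers list above (inputs made up for illustration):

# sapply: like lapply, but simplifies the result to a vector when it can
sapply(numbers, function(x) x^2)
# Result: c(1, 4, 9, 16, 25)

# vapply: like sapply, but you declare the expected output (safer)
vapply(numbers, function(x) x^2, numeric(1))

# mapply: applies a function over several inputs at once
mapply(function(x, y) x + y, 1:3, 4:6)
# Result: c(5, 7, 9)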

🎯 Key Insight: R’s built-in functions are written in C. They’re MUCH faster than loops!


🔍 Profiling Code

What is Profiling?

Profiling = Finding the slow parts of your code

Like using a stopwatch to time each step of cooking!

The Simple Way: system.time()

# How long does this take?
system.time({
  result <- sum(1:10000000)
})
# user  system elapsed
# 0.05   0.00    0.05

  • user: CPU time for your code
  • system: CPU time for system tasks
  • elapsed: Real wall-clock time

The Pro Way: Rprof()

# Start profiling
Rprof("my_profile.out")

# Run your code
my_slow_function()

# Stop profiling
Rprof(NULL)

# See results
summaryRprof("my_profile.out")
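
my_slow_function() above is just a placeholder for whatever code you want to investigate. If you would like something concrete to try Rprof() on, here is a deliberately slow, made-up function:

# A hypothetical slow function: grows a vector inside a loop
my_slow_function <- function() {
  out <- c()
  for (i in 1:50000) {
    out <- c(out, sqrt(i))  # repeated copying makes this the hot spot
  }
  out
}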

Visual Profiling with profvis

# Install once
install.packages("profvis")

# Profile with pretty pictures!
library(profvis)
profvis({
  # Your code here
  data <- read.csv("big_file.csv")
  result <- process(data)
})

This shows a beautiful flame graph of where time is spent!

Reading Profile Results

graph TD
  A["Total Time: 10 sec"] --> B["read_data: 6 sec"]
  A --> C["process: 3 sec"]
  A --> D["save: 1 sec"]
  B --> E["Focus here first!"]

The 80/20 Rule: Usually 20% of your code takes 80% of the time. Find that 20%!

Benchmarking with microbenchmark

library(microbenchmark)

# Compare two approaches
microbenchmark(
  loop = {
    s <- 0
    for(i in 1:1000) s <- s + i
  },
  vectorized = sum(1:1000),
  times = 100
)

This runs each version 100 times and shows you statistics!

🎯 Key Insight: Never guess where your code is slow. Measure it!


🎓 Putting It All Together

The Performance Checklist

graph TD
  A["Slow Code?"] --> B["Profile First"]
  B --> C["Find Bottleneck"]
  C --> D{What's Slow?}
  D -->|Memory| E["Check Copies"]
  D -->|Loops| F["Try Vectorization"]
  D -->|Functions| G["Use Built-ins"]
  E --> H["Optimize"]
  F --> H
  G --> H
  H --> I["Profile Again"]
  I --> J{Fast Enough?}
  J -->|No| B
  J -->|Yes| K["Done!"]

Real-World Example

Before (Slow):

# Process 1 million rows
result <- c()
for(i in 1:nrow(data)) {
  if(data$value[i] > 100) {
    result <- c(result, data$value[i] * 2)
  }
}

After (Fast):

# Vectorized approach
result <- data$value[data$value > 100] * 2

Speed Improvement: often 100x or more on large data!
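
If you want to verify that kind of claim on your own machine, microbenchmark (introduced above) will do it. A rough sketch on made-up data; the exact speed-up depends on the number of rows:

library(microbenchmark)

data <- data.frame(value = runif(10000, 0, 200))  # made-up example data

microbenchmark(
  loop = {
    result <- c()
    for (i in 1:nrow(data)) {
      if (data$value[i] > 100) result <- c(result, data$value[i] * 2)
    }
  },
  vectorized = data$value[data$value > 100] * 2,
  times = 20
)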


🌟 Summary

Topic           Key Takeaway
Memory          Pre-allocate, remove unused, avoid copies
Vectorization   Do things all at once, not one by one
Optimization    Use built-ins, avoid growing objects
Profiling       Measure first, then optimize

Your New Superpowers

  1. ✅ You know how to check memory usage
  2. ✅ You can write vectorized code
  3. ✅ You understand the apply family
  4. ✅ You can profile and find slow code

“The best code is code that doesn’t waste a single CPU cycle—just like the best chef doesn’t waste a single ingredient!”

Go forth and write FAST R code! 🚀
