
Model Assessment in R: Your Recipe Tasting Journey 🍳

Imagine you’re a chef who just created a new cake recipe. How do you know if it’s actually good? You taste it, ask others to taste it, and maybe even enter it in a baking contest. That’s exactly what Model Assessment is for your R models!


The Big Picture: Why Check Your Models?

Think of building a model like making a paper airplane. You can fold it beautifully, but the real test is: will it fly?

Model assessment answers three questions:

  1. How confident are we? (Confidence Intervals)
  2. Which model is best? (Model Selection)
  3. Will it work on new data? (Cross-validation)

🎯 Confidence Intervals: “How Sure Are We?”

The Story

Imagine you measure your friend’s height 10 times. You get slightly different numbers each time (maybe they stood up a bit straighter on some tries!). The average is your best guess, but you’re not 100% sure.

A confidence interval is like saying: “I’m 95% sure my friend is between 165 cm and 170 cm tall.”
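Here’s that idea in miniature in R (the ten height values are made-up numbers, purely for illustration):

# Ten repeated (made-up) measurements of the same friend's height, in cm
heights <- c(167, 168, 166, 169, 167, 168, 165, 170, 167, 168)

# t.test() reports the sample mean along with its 95% confidence interval
t.test(heights)$conf.int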

In R Models

When you build a model, your predictions come with uncertainty. Confidence intervals tell you the range where the true answer probably lives.

# Build a simple model
model <- lm(mpg ~ wt, data = mtcars)

# Get predictions WITH confidence
predict(model,
        newdata = data.frame(wt = 3),
        interval = "confidence")

Output looks like:

     fit      lwr      upr
  21.25    20.12    22.38

This means: “We predict 21.25 mpg, and we’re 95% confident that the true average mpg for cars of this weight lies between 20.12 and 22.38.”

The Wider the Interval, The Less Sure We Are

graph TD A["Narrow Interval"] --> B["Very Confident!"] C["Wide Interval"] --> D["Less Certain..."] E["More Data"] --> A F["Less Data"] --> C

Quick Tip: Confidence vs Prediction Intervals

Type         What it measures        Width
Confidence   Average prediction      Narrower
Prediction   Individual prediction   Wider

# Prediction interval (wider)
predict(model,
        newdata = data.frame(wt = 3),
        interval = "prediction")

🏆 Model Selection: “Which Model Wins?”

The Story

You made three different cakes: chocolate, vanilla, and strawberry. Which one should you enter in the contest? You need fair ways to compare them!

The Challenge

A complex model might memorize your data perfectly but fail on new data. A simple model might miss important patterns. We need the Goldilocks model: not too simple, not too complex—just right!

Tool 1: AIC (Akaike Information Criterion)

AIC is like a score. Lower is better. It balances fit and simplicity.

# Compare models
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
model3 <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Check AIC scores
AIC(model1)  # Maybe: 166.0
AIC(model2)  # Maybe: 156.7 <- Winner!
AIC(model3)  # Maybe: 158.2
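If you prefer one table instead of three separate calls, base R’s AIC() accepts several models at once:

# One call, one table: df (parameter count) and AIC for each model
AIC(model1, model2, model3)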

Tool 2: BIC (Bayesian Information Criterion)

Similar to AIC, but punishes complexity more. Good when you want simpler models.

BIC(model1)  # Higher penalty for complexity
BIC(model2)
BIC(model3)
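If you’re curious where the stronger penalty comes from, both scores can be rebuilt from the log-likelihood; the only difference is the multiplier on the parameter count (2 for AIC, log(n) for BIC). A quick sketch:

fit <- lm(mpg ~ wt + hp, data = mtcars)
ll  <- logLik(fit)
k   <- attr(ll, "df")   # number of estimated parameters (including sigma)
n   <- nobs(fit)        # number of observations (32 for mtcars)

-2 * as.numeric(ll) + 2 * k        # same value as AIC(fit)
-2 * as.numeric(ll) + log(n) * k   # same value as BIC(fit); log(32) > 2, so a bigger penalty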

Tool 3: Adjusted R²

Regular R² always goes up when you add variables (even useless ones!). Adjusted R² only goes up if the new variable actually helps.

# Get adjusted R-squared
summary(model1)$adj.r.squared
summary(model2)$adj.r.squared
graph TD A["Start with Simple Model"] --> B{Add Variable} B --> C{Did Adj R² go UP?} C -->|Yes| D["Keep the Variable!"] C -->|No| E["Remove It!"] D --> B

The Golden Rule

Criterion   What to Look For    When to Use
AIC         Lower is better     General use
BIC         Lower is better     Want simpler models
Adj R²      Higher is better    Comparing nested models

🔄 Cross-Validation: “The Ultimate Test”

The Story

Imagine studying for a test using a practice exam. If you memorize the practice answers, you’ll ace that specific test—but fail any new questions!

Cross-validation is like having multiple practice exams. You train on some, test on others, and see how well you really learned.

Why Cross-Validate?

Your model might “cheat” by memorizing your training data. Cross-validation catches this by testing on data the model has never seen.

K-Fold Cross-Validation

Split your data into K equal parts (folds). Train on K-1 parts, test on the remaining 1. Repeat K times!

graph TD A["All Data"] --> B["Split into 5 Folds"] B --> C["Round 1: Train on 2,3,4,5 - Test on 1"] B --> D["Round 2: Train on 1,3,4,5 - Test on 2"] B --> E["Round 3: Train on 1,2,4,5 - Test on 3"] B --> F["Round 4: Train on 1,2,3,5 - Test on 4"] B --> G["Round 5: Train on 1,2,3,4 - Test on 5"] C --> H["Average All Test Scores"] D --> H E --> H F --> H G --> H

Doing It in R

library(caret)

# Set up 5-fold cross-validation
ctrl <- trainControl(
  method = "cv",
  number = 5
)

# Train with cross-validation
cv_model <- train(
  mpg ~ wt + hp,
  data = mtcars,
  method = "lm",
  trControl = ctrl
)

# See results
print(cv_model)

Leave-One-Out Cross-Validation (LOOCV)

The extreme version: train on all data except one point, test on that one point. Repeat for every point!

# LOOCV setup
ctrl_loo <- trainControl(
  method = "LOOCV"
)

loo_model <- train(
  mpg ~ wt + hp,
  data = mtcars,
  method = "lm",
  trControl = ctrl_loo
)

Which K to Choose?

K Value    Pros                            Cons
5-Fold     Fast, stable                    Slightly biased
10-Fold    Good balance, standard choice   Slower than 5-fold
LOOCV      Low bias                        Slow, high variance

🎪 Putting It All Together

Here’s how a data scientist thinks:

graph TD A["Build Multiple Models"] --> B["Check AIC/BIC Scores"] B --> C["Pick Top Candidates"] C --> D["Cross-Validate Each"] D --> E["Check Confidence Intervals"] E --> F["Choose Final Model!"]

Complete Example

library(caret)

# 1. Build candidate models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# 2. Compare with AIC
cat("Model 1 AIC:", AIC(m1), "\n")
cat("Model 2 AIC:", AIC(m2), "\n")
cat("Model 3 AIC:", AIC(m3), "\n")

# 3. Cross-validate the best
ctrl <- trainControl(method = "cv", number = 5)

cv_m2 <- train(
  mpg ~ wt + hp,
  data = mtcars,
  method = "lm",
  trControl = ctrl
)

print(cv_m2$results)

# 4. Check confidence intervals
confint(m2)
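As a final sanity check, you might also look at a prediction interval for a single new car (the wt = 3, hp = 110 values are made-up inputs, purely for illustration):

# 5. Prediction interval for one hypothetical new car
new_car <- data.frame(wt = 3, hp = 110)
predict(m2, newdata = new_car, interval = "prediction")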

🚀 Key Takeaways

Concept                     What It Does               Remember It As
Confidence Intervals        Shows uncertainty range    “I’m 95% sure it’s between X and Y”
Model Selection (AIC/BIC)   Picks best model           “Lower score wins”
Cross-Validation            Tests on unseen data       “Multiple practice exams”

🎯 The Chef’s Final Recipe

  1. Don’t just taste your own cake — test it on strangers (cross-validation)
  2. Know your uncertainty — give a range, not just one number (confidence intervals)
  3. Compare fairly — use scores that balance fit and simplicity (AIC/BIC)

Now you’re ready to assess your R models like a pro! Your models will be robust, trustworthy, and ready for the real world. 🎉
