Distribution Analysis

📊 Distribution Analysis in R: Your Data Detective Toolkit

Imagine you’re a detective. Your job? To understand the story hidden inside numbers. Today, we’ll learn three super-powers that help us crack the case: Quantiles, Correlation, and Scaling!


🎯 The Big Picture

Think of your data like a classroom of kids standing in a line from shortest to tallest.

  • Quantiles help you find who’s in the middle, who’s really tall, and who’s really short.
  • Correlation tells you if tall kids also tend to have big feet (do two things go together?).
  • Scaling is like converting everyone’s height to the same measuring tape so we can compare fairly.

📏 Part 1: Quantile Functions — Finding the Landmarks

What Are Quantiles?

Imagine 100 kids standing in a line from shortest to tallest. Quantiles are like signposts telling you where you are in the line.

  • 25th percentile (Q1): 25 kids are shorter than you
  • 50th percentile (Median): You’re right in the middle!
  • 75th percentile (Q3): 75 kids are shorter than you

🧪 R Example: Finding Quantiles

# Heights of 10 students (in cm)
heights <- c(120, 125, 130, 135, 140,
             145, 150, 155, 160, 165)

# Find the median (50th percentile)
median(heights)
# Result: 142.5

# Find Q1, Median, Q3
quantile(heights, c(0.25, 0.50, 0.75))
#    25%    50%    75%
# 131.25 142.50 153.75

📦 The quantile() Function

quantile(x, probs)
  • x = your data
  • probs = which percentiles you want (0 to 1)
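By default, quantile() interpolates in a straight line between neighbouring sorted values (R documents this as type = 7). Here is a minimal sketch of that rule, assuming numeric data with no missing values. The name manual_quantile is a hypothetical helper used only for illustration.

```r
# Hypothetical helper: mimics quantile(x, p, type = 7) for a single probability p.
# Assumes x is numeric and contains no NA values.
manual_quantile <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1                 # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])   # linear interpolation between neighbours
}

heights <- c(120, 125, 130, 135, 140,
             145, 150, 155, 160, 165)

manual_quantile(heights, 0.25)
# [1] 131.25
unname(quantile(heights, 0.25))
# [1] 131.25
```

Position 3.25 sits a quarter of the way between the 3rd value (130) and the 4th value (135), which gives 131.25.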

🎁 Special Quantiles You’ll Use Often

Name    | Percentile | R Code
Minimum | 0%         | quantile(x, 0)
Q1      | 25%        | quantile(x, 0.25)
Median  | 50%        | quantile(x, 0.50)
Q3      | 75%        | quantile(x, 0.75)
Maximum | 100%       | quantile(x, 1)

🔍 Quick Summary with fivenum()

fivenum(heights)
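# [1] 120.0 130.0 142.5 155.0 165.0

This gives you Min, lower hinge, Median, upper hinge, and Max, all at once! The hinges are Tukey's version of Q1 and Q3, so they can differ slightly from the quantile() results on small samples.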
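A quick side-by-side check, using the same ten heights, shows how fivenum() (Tukey's hinges) compares with quantile() (default interpolation). The variable names hinges and quartiles are chosen here for illustration.

```r
heights <- c(120, 125, 130, 135, 140,
             145, 150, 155, 160, 165)

hinges    <- fivenum(heights)   # Tukey's five-number summary
quartiles <- unname(quantile(heights, c(0, 0.25, 0.5, 0.75, 1)))

hinges
# [1] 120.0 130.0 142.5 155.0 165.0
quartiles
# [1] 120.00 131.25 142.50 153.75 165.00
```

The minimum, median, and maximum agree. Only the two middle landmarks differ, and the gap shrinks as the dataset grows.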


🔗 Part 2: Correlation and Covariance — Do Things Move Together?

The Story

Imagine two friends: Height and Shoe Size. When one friend gets bigger, does the other friend also get bigger? That’s what correlation measures!

Three Types of Relationships

  • Positive correlation: 📈 both go up together
  • Negative correlation: 📉 one goes up, the other goes down
  • No correlation: 🎲 no pattern

🧪 R Example: Correlation

# Heights and shoe sizes of 5 people
height <- c(150, 160, 170, 180, 190)
shoe_size <- c(6, 7, 8, 9, 10)

# Calculate correlation
cor(height, shoe_size)
# Result: 1 (perfect positive!)

Understanding Correlation Values

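Value | Meaning                     | Example
+1    | Perfect positive            | Height ↔ Shoe size
0     | No straight-line relationship | Shoe size ↔ Favorite color
-1    | Perfect negative            | Speed ↔ Travel time

Real data rarely lands exactly on +1 or -1; the examples show the direction of the relationship. A value of 0 means there is no straight-line pattern, which is not always the same as no relationship at all.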
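cor() measures straight-line relationships only. The made-up data below is a small sketch of a case where y depends completely on x, yet the correlation is 0 because the pattern is U-shaped.

```r
# y is fully determined by x, but the pattern is a U-shape, not a line
x <- c(-2, -1, 0, 1, 2)
y <- x^2

cor(x, y)
# [1] 0
```

A quick plot(x, y) is a good habit before trusting a correlation value on its own.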

🎯 Covariance: Correlation’s Cousin

Covariance also measures whether things move together, but its value depends on the units of both variables (here, cm × shoe sizes), which makes it harder to interpret.

# Calculate covariance
cov(height, shoe_size)
# Result: 25

# Correlation is easier to understand!
# It's always between -1 and +1

When to Use Which?

  • Correlation (cor()): When you want to know HOW STRONG the straight-line relationship is (always -1 to +1)
  • Covariance (cov()): When you need the unscaled value, which depends on the variables' units (used in advanced math)
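The two measures are directly linked: correlation is covariance divided by both standard deviations. A small sketch with the same height and shoe size data follows. The names by_hand and height_m are introduced here for illustration.

```r
height <- c(150, 160, 170, 180, 190)
shoe_size <- c(6, 7, 8, 9, 10)

# Covariance divided by both standard deviations gives the correlation
by_hand <- cov(height, shoe_size) / (sd(height) * sd(shoe_size))

all.equal(by_hand, cor(height, shoe_size))
# [1] TRUE

# Changing units changes the covariance but not the correlation
height_m <- height / 100        # centimetres to metres
cov(height_m, shoe_size)        # 100 times smaller than cov(height, shoe_size)
cor(height_m, shoe_size)        # same correlation as before
```

This is why correlation is easier to compare across datasets: the units cancel out.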

📊 Correlation Matrix: Many Variables at Once

# Three variables
age <- c(25, 30, 35, 40, 45)
income <- c(30, 45, 60, 75, 90)
savings <- c(5, 12, 20, 30, 42)

# Create a data frame
data <- data.frame(age, income, savings)

# See all correlations at once (rounded to 3 decimals)
round(cor(data), 3)
#           age income savings
# age     1.000  1.000   0.994
# income  1.000  1.000   0.994
# savings 0.994  0.994   1.000
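The result of cor() on a data frame is an ordinary matrix with row and column names, so a single pair can be looked up by name. A short sketch follows; cor_matrix is a name chosen here for illustration.

```r
age <- c(25, 30, 35, 40, 45)
income <- c(30, 45, 60, 75, 90)
savings <- c(5, 12, 20, 30, 42)
data <- data.frame(age, income, savings)

cor_matrix <- cor(data)

# Look up one pair by row name and column name
round(cor_matrix["age", "savings"], 3)
# [1] 0.994

# The matrix is symmetric: cor(a, b) equals cor(b, a)
isSymmetric(cor_matrix)
# [1] TRUE
```

The diagonal is always 1, because every variable is perfectly correlated with itself.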

⚖️ Part 3: Scaling Data — Making Fair Comparisons

The Problem

Imagine comparing:

  • A test score: 85 out of 100
  • A race time: 12 seconds
  • A weight: 50 kg

How do you compare these? They’re all in different units! Scaling puts everything on the same measuring stick.

Two Popular Scaling Methods

  • Z-Score / Standardization: Mean = 0, SD = 1
  • Min-Max Normalization: Range 0 to 1

🧪 Z-Score Scaling with scale()

The Z-score tells you: “How many steps away from average are you?” One step is one standard deviation, the typical distance of a value from the mean.

# Test scores
scores <- c(70, 80, 90, 100, 110)

# Scale the data
scaled_scores <- scale(scores)
print(scaled_scores)
#            [,1]
# [1,] -1.2649111
# [2,] -0.6324555
# [3,]  0.0000000
# [4,]  0.6324555
# [5,]  1.2649111
# attr(,"scaled:center")
# [1] 90
# attr(,"scaled:scale")
# [1] 15.81139
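Under the hood, a Z-score is just "subtract the mean, divide by the standard deviation". The sketch below does it by hand and checks it against scale(). The names z_by_hand and z_scale are introduced here for illustration.

```r
scores <- c(70, 80, 90, 100, 110)

# Z-score by hand: subtract the mean, divide by the standard deviation
z_by_hand <- (scores - mean(scores)) / sd(scores)
round(z_by_hand, 4)
# [1] -1.2649 -0.6325  0.0000  0.6325  1.2649

# scale() returns a one-column matrix; as.vector() turns it into a plain vector
z_scale <- as.vector(scale(scores))
all.equal(z_by_hand, z_scale)
# [1] TRUE
```

Wrapping scale() in as.vector() is handy when you want to store the result as a regular column.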

Understanding Z-Scores

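Z-Score | What It Means
0       | Exactly average
+1      | One standard deviation above average
-1      | One standard deviation below average
+2      | Two standard deviations above average: very high (rare!)
-2      | Two standard deviations below average: very low (rare!)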

🎯 Min-Max Scaling: 0 to 1

This squishes all values between 0 and 1.

# Custom min-max function
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply it
scores <- c(70, 80, 90, 100, 110)
min_max(scores)
# [1] 0.00 0.25 0.50 0.75 1.00
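The simple min_max() function divides by zero when every value is the same. Here is a hedged sketch of a slightly more defensive variant that can also target a range other than 0 to 1. The name rescale_range and its arguments are hypothetical, and mapping constant input to the lower bound is an assumption, not a standard rule.

```r
# Hypothetical variant of min_max(): rescales to [new_min, new_max]
# and guards against a zero range (all values identical).
rescale_range <- function(x, new_min = 0, new_max = 1) {
  rng <- max(x) - min(x)
  if (rng == 0) {
    # Assumption: map constant input to the lower bound instead of returning NaN
    return(rep(new_min, length(x)))
  }
  new_min + (x - min(x)) / rng * (new_max - new_min)
}

scores <- c(70, 80, 90, 100, 110)
rescale_range(scores, 0, 10)
# [1]  0.0  2.5  5.0  7.5 10.0
rescale_range(c(5, 5, 5))
# [1] 0 0 0
```

With the default arguments, rescale_range(scores) gives the same 0 to 1 result as min_max(scores).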

When to Use Each?

Method  | Best For                           | Example
Z-Score | Comparing how unusual values are   | “How far from average?”
Min-Max | When you need 0-1 range            | Machine learning inputs
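As a small illustration of the Z-score use case, the sketch below compares one student's results in two subjects marked on different scales. All numbers are made up for this example.

```r
# Hypothetical class data: math is marked out of 100, art out of 20
math <- c(55, 65, 75, 85, 95)
art  <- c(10, 12, 14, 18, 16)

math_z <- as.vector(scale(math))
art_z  <- as.vector(scale(art))

# Student 4 scored 85 in math and 18 in art
round(math_z[4], 2)
# [1] 0.63
round(art_z[4], 2)
# [1] 1.26
```

Compared with classmates, student 4 stands out more in art than in math, even though 85 looks like the bigger number.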

🎯 Quick Reference: All Functions

# QUANTILES
quantile(x, 0.5)    # Get median
quantile(x, c(0.25, 0.75))  # Get Q1 and Q3
fivenum(x)          # Min, lower hinge, Median, upper hinge, Max

# CORRELATION & COVARIANCE
cor(x, y)           # Correlation (-1 to +1)
cov(x, y)           # Covariance (depends on units)
cor(data_frame)     # Correlation matrix

# SCALING
scale(x)            # Z-score standardization
# Custom min-max: (x - min) / (max - min)

🏆 You Did It!

You just learned three powerful detective tools:

  1. Quantiles: Find where values stand in the lineup
  2. Correlation: Discover if things move together
  3. Scaling: Put everything on the same measuring stick

Now you can analyze any dataset like a pro! 🎉


🧠 Remember This Story

A teacher wanted to understand her class better. She used quantiles to find who scored in the top 25%. She checked correlation to see if study hours predicted test scores. Finally, she used scaling to fairly compare math and art scores. The end!
