Statistical Measures

Descriptive Statistics in R: Your Number Detective Toolkit

The Story of the Number Detective

Imagine you’re a detective, but instead of solving mysteries with clues, you solve puzzles with numbers! You have a big bag of marbles, and you want to describe them to your friend who can’t see them. How many? What’s the most common color? How spread out are the sizes?

That’s exactly what Descriptive Statistics does. It helps us describe and understand a bunch of numbers quickly!


🎯 Central Tendency Measures: Finding the “Middle” of Your Data

Think of Central Tendency like finding the center of a playground. Where do most kids hang out?

Mean (The Fair Share)

Imagine you and 4 friends have candies: 2, 4, 6, 8, 10.

If you shared them equally, how many would each person get?

candies <- c(2, 4, 6, 8, 10)
mean(candies)
# Result: 6

The mean is 6! Everyone gets 6 candies if we share fairly.
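To see where that 6 comes from, the fair-share idea can be written out by hand. This is only a small sketch that reuses the candies vector from above; total_candies, people, and manual_mean are names made up for this illustration.

```r
candies <- c(2, 4, 6, 8, 10)

# Add up every candy, then split the total among the 5 people
total_candies <- sum(candies)     # 30 candies in all
people <- length(candies)         # 5 people
manual_mean <- total_candies / people

manual_mean
# Result: 6

# Same answer as the built-in function
manual_mean == mean(candies)
# Result: TRUE
```

So mean() is just "add everything up, then divide by how many there are".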

Median (The Middle Kid)

Line up your numbers from smallest to biggest. The median is the one standing right in the middle!

ages <- c(5, 8, 12, 15, 20)
median(ages)
# Result: 12

12 is the median! It’s the middle number when they’re lined up.

What if there are 6 numbers? We take the average of the two middle ones!

scores <- c(10, 20, 30, 40, 50, 60)
median(scores)
# Result: 35
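If you want to check the "average of the two middle ones" rule yourself, here is one possible sketch. The names sorted_scores, n, and middle_two are made up for this example, and it assumes the vector has an even number of values.

```r
scores <- c(10, 20, 30, 40, 50, 60)

sorted_scores <- sort(scores)      # line the numbers up, smallest first
n <- length(sorted_scores)         # 6, an even count

# With an even count, the two middle positions are n/2 and n/2 + 1
middle_two <- sorted_scores[c(n / 2, n / 2 + 1)]
middle_two
# Result: 30 40

mean(middle_two)
# Result: 35
```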

Mode (The Popular Kid)

The mode is the number that shows up the most often. It’s the most popular!

votes <- c(1, 2, 2, 3, 2, 4, 2)
# Which number appears most?
# 2 appears 4 times - it's the mode!

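# R's built-in mode() reports how a value is stored, not the most common value
# Here's a simple way (if there's a tie, it returns the first tied value):
names(which.max(table(votes)))
# Result: "2" (a text label; wrap it in as.numeric() to get the number 2)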
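Sometimes two numbers are equally popular. The sketch below shows one way to return every winner instead of just the first. stat_mode is a made-up helper name (it is not part of R), and the sketch assumes the input holds numbers.

```r
# stat_mode is a hypothetical helper, not a built-in R function
stat_mode <- function(x) {
  counts <- table(x)                               # how often each value appears
  winners <- names(counts)[counts == max(counts)]  # keep every value tied for first
  as.numeric(winners)                              # turn the text labels back into numbers
}

stat_mode(c(1, 2, 2, 3, 2, 4, 2))
# Result: 2

stat_mode(c(1, 1, 2, 2, 3))
# Result: 1 2
```

In the second call, 1 and 2 both appear twice, so both come back.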

📊 Dispersion Measures: How Spread Out Are Your Numbers?

Central Tendency tells us the center. Dispersion tells us how scattered the numbers are around that center.

Think of it like this: Are all the sheep standing close together, or are they spread across the whole field?

Variance (The Spread Squared)

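Variance takes the distance of each number from the mean, squares it (so every distance counts as positive), and then averages those squared distances. R's var() computes the sample variance, so it divides by n - 1 instead of n.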

heights <- c(4, 6, 8, 10, 12)
var(heights)
# Result: 10

Higher variance = more spread out!
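Here is a step-by-step sketch of what var() works out for the heights above. The names distances, squared, and manual_var are made up for this illustration.

```r
heights <- c(4, 6, 8, 10, 12)

distances <- heights - mean(heights)   # -4 -2 0 2 4
squared <- distances^2                 # 16 4 0 4 16

# var() divides by (n - 1), not n: this is the "sample" variance
manual_var <- sum(squared) / (length(heights) - 1)
manual_var
# Result: 10
```

The squared distances add up to 40, and 40 divided by 4 (one less than the 5 heights) is 10.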

Standard Deviation (The Spread in Real Units)

Standard deviation is the square root of variance. It’s easier to understand because it’s in the same units as your data!

heights <- c(4, 6, 8, 10, 12)
sd(heights)
# Result: 3.162278

Think of it like this:

  • Small SD = Numbers are close together (like a tight group of friends)
  • Big SD = Numbers are far apart (like kids scattered on a big playground)
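A quick way to feel the difference is to compare two groups with the same mean. The numbers in tight_group and wide_group are invented for this sketch.

```r
tight_group <- c(9, 10, 10, 11)   # hypothetical values, close together
wide_group  <- c(2, 6, 14, 18)    # hypothetical values, far apart

mean(tight_group)
# Result: 10
mean(wide_group)
# Result: 10

# Same center, but very different spread
sd(tight_group) < sd(wide_group)
# Result: TRUE

# sd() is the square root of var()
all.equal(sd(wide_group), sqrt(var(wide_group)))
# Result: TRUE
```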

📏 Range Functions: The Biggest and Smallest

Range (The Span)

The range is the gap between the biggest and smallest number. In R, range() returns those two numbers (smallest first, then biggest), and diff() turns them into the gap.

temperatures <- c(15, 22, 18, 30, 12)
range(temperatures)
# Result: 12 30

# The actual range (difference):
diff(range(temperatures))
# Result: 18

The temperatures span from 12 to 30 — a range of 18 degrees!
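The two numbers that range() returns can also be pulled apart and subtracted by hand. A small sketch, where span is a name made up for this example:

```r
temperatures <- c(15, 22, 18, 30, 12)

span <- range(temperatures)   # smallest first, then largest
span[1]
# Result: 12
span[2]
# Result: 30

# Subtracting the ends gives the same number as diff(range(temperatures))
span[2] - span[1]
# Result: 18
```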

Min and Max (The Extremes)

scores <- c(45, 78, 92, 56, 88)

min(scores)
# Result: 45 (lowest score)

max(scores)
# Result: 92 (highest score)
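min() and max() tell you the extreme values, but not where they sit. If you also want the position, which.min() and which.max() can help, as in this short sketch:

```r
scores <- c(45, 78, 92, 56, 88)

which.min(scores)
# Result: 1 (the lowest score is the 1st one)

which.max(scores)
# Result: 3 (the highest score is the 3rd one)

# Use the position to look the score up again
scores[which.max(scores)]
# Result: 92
```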

➕ Summation Functions: Adding Things Up

Sum (Total of Everything)

Like counting all your coins in a piggy bank!

coins <- c(5, 10, 25, 10, 5)
sum(coins)
# Result: 55

You have 55 cents total!

Prod (Multiply Everything)

Instead of adding, we multiply all numbers together.

numbers <- c(2, 3, 4)
prod(numbers)
# Result: 24 (because 2 × 3 × 4 = 24)
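One way to picture what prod() does is a loop that multiplies one number at a time. This is only an illustration (running and value are made-up names); in real code prod() is the simpler choice.

```r
numbers <- c(2, 3, 4)

running <- 1                   # start at 1, because multiplying by 1 changes nothing
for (value in numbers) {
  running <- running * value   # 2, then 6, then 24
}

running
# Result: 24
```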

📈 Cumulative Functions: Running Totals

Cumulative means keeping a running total as you go along.

Cumsum (Running Sum)

Imagine you’re counting coins one by one:

coins <- c(5, 10, 15, 20)
cumsum(coins)
# Result: 5 15 30 50

Breakdown:

  • After 1st coin: 5
  • After 2nd coin: 5 + 10 = 15
  • After 3rd coin: 15 + 15 = 30
  • After 4th coin: 30 + 20 = 50
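A running total is handy for questions like "when did I reach my goal?". In this sketch, running_total is a made-up name and the goal of 30 is an arbitrary number chosen for the example.

```r
coins <- c(5, 10, 15, 20)
running_total <- cumsum(coins)   # 5 15 30 50

# The last running total is the same as the grand total
tail(running_total, 1)
# Result: 50
sum(coins)
# Result: 50

# Which coin first brings the total to 30 or more?
which(running_total >= 30)[1]
# Result: 3
```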

Cumprod (Running Multiplication)

numbers <- c(2, 3, 2)
cumprod(numbers)
# Result: 2 6 12

Breakdown:

  • After 1st: 2
  • After 2nd: 2 × 3 = 6
  • After 3rd: 6 × 2 = 12

Cummax (Running Maximum)

Keep track of the highest value so far!

scores <- c(5, 8, 3, 10, 7)
cummax(scores)
# Result: 5 8 8 10 10

Each position shows the highest score seen up to that point!
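The running maximum can also point out when a new record was set. Here is one possible sketch; previous_best and is_record are names invented for this example.

```r
scores <- c(5, 8, 3, 10, 7)

# The best score BEFORE each position (nothing comes before the first, so use -Inf)
previous_best <- c(-Inf, head(cummax(scores), -1))

# A score is a new record when it beats everything before it
is_record <- scores > previous_best
is_record
# Result: TRUE TRUE FALSE TRUE FALSE
```

The 1st, 2nd, and 4th scores (5, 8, and 10) each set a new record.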

Cummin (Running Minimum)

Keep track of the lowest value so far!

temps <- c(30, 25, 28, 20, 22)
cummin(temps)
# Result: 30 25 25 20 20

🔄 Difference Function: What Changed?

The diff() function tells you how much changed between each step.

Diff (The Change)

Imagine tracking your height each year:

height_cm <- c(100, 105, 112, 118)
diff(height_cm)
# Result: 5 7 6

Breakdown:

  • Year 1 to 2: You grew 5 cm
  • Year 2 to 3: You grew 7 cm
  • Year 3 to 4: You grew 6 cm

100 cm → (+5) → 105 cm → (+7) → 112 cm → (+6) → 118 cm
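Notice that diff() gives back one fewer number than you put in: 4 heights make 3 changes. The sketch below shows that the changes, added back up, rebuild the original heights. growth and rebuilt are names made up for this example.

```r
height_cm <- c(100, 105, 112, 118)
growth <- diff(height_cm)          # 5 7 6

length(height_cm)
# Result: 4
length(growth)
# Result: 3

# Starting height plus the running total of growth gives the heights back
rebuilt <- c(height_cm[1], height_cm[1] + cumsum(growth))
rebuilt
# Result: 100 105 112 118
```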

Lag Differences

You can also find differences between values that are further apart:

values <- c(10, 20, 30, 40, 50)
diff(values, lag = 2)
# Result: 20 20 20

This compares each value with the one 2 steps back!
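To see the "2 steps back" idea without diff(), you can line up the later values against the earlier ones and subtract. In this sketch, later and earlier are made-up names.

```r
values <- c(10, 20, 30, 40, 50)

later   <- values[3:5]   # 30 40 50
earlier <- values[1:3]   # 10 20 30 (each one is 2 steps before its partner)

later - earlier
# Result: 20 20 20
```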


🧙‍♂️ Quick Summary: Your Statistical Toolkit

Category      Function      What It Does
Central       mean()        Average (fair share)
Central       median()      Middle value
Dispersion    var()         Spread squared
Dispersion    sd()          Spread in real units
Range         range()       Min and max
Range         min()/max()   Smallest/largest
Summation     sum()         Add all together
Summation     prod()        Multiply all together
Cumulative    cumsum()      Running total
Cumulative    cumprod()     Running product
Cumulative    cummax()      Running highest
Cumulative    cummin()      Running lowest
Difference    diff()        Change between values

🎮 Putting It All Together

Let’s be a Number Detective with a real example!

# Daily steps for a week
steps <- c(5000, 7500, 6000, 8000,
           4500, 9000, 7000)

# Central Tendency
mean(steps)    # 6714 average steps
median(steps)  # 7000 middle value

# Spread
sd(steps)      # about 1629 steps of variation

# Range
range(steps)   # 4500 to 9000

# Total
sum(steps)     # 47000 steps all week!

# Running total
cumsum(steps)  # Daily accumulated steps

# Daily change
diff(steps)    # How steps changed each day

You’re now a Number Detective! You can describe any set of numbers like a pro!
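If you find yourself running the same checks on every new set of numbers, you could bundle them into one helper. This is just a sketch: describe_numbers is a made-up name, not a built-in R function, and the measures it returns are the ones covered above.

```r
# describe_numbers is a hypothetical helper for this illustration
describe_numbers <- function(x) {
  c(mean   = mean(x),     # fair share
    median = median(x),   # middle value
    sd     = sd(x),       # spread in real units
    min    = min(x),      # smallest
    max    = max(x),      # largest
    total  = sum(x))      # everything added up
}

steps <- c(5000, 7500, 6000, 8000, 4500, 9000, 7000)

# round() trims the decimals so the answers are easier to read
round(describe_numbers(steps))
```

The result is a named vector, so each number comes with a label telling you which measure it is.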


🚀 Remember This!

  • Mean = Share everything equally
  • Median = Find the middle kid
  • SD = How spread out from the center
  • Range = Gap between smallest and biggest
  • Sum = Count everything up
  • Cumulative = Keep a running total
  • Diff = Track the changes

Now go explore your data and tell its story!
