What is the mean function in R?

The mean() function calculates the average by adding up all the values and dividing by how many there are. It's like sharing candies equally among friends.

What's the difference between variance and standard deviation in R?

Variance, var(), measures spread in squared units: roughly the average squared distance from the mean. Standard deviation, sd(), is its square root, giving spread in the same units as your data.

What do cumulative functions do in R?

Cumulative functions keep running totals. cumsum() tracks running sums, cummax() tracks highest values so far, cummin() tracks lowest.

Descriptive Statistics in R: Your Number Detective Toolkit

The Story of the Number Detective

Imagine you’re a detective, but instead of solving mysteries with clues, you solve puzzles with numbers! You have a big bag of marbles, and you want to describe them to your friend who can’t see them. How many? What’s the most common color? How spread out are the sizes?

That’s exactly what Descriptive Statistics does. It helps us describe and understand a bunch of numbers quickly!

🎯 Central Tendency Measures: Finding the “Middle” of Your Data

Think of Central Tendency like finding the center of a playground. Where do most kids hang out?

Mean (The Fair Share)

Imagine you and 4 friends have candies: 2, 4, 6, 8, 10.

If you shared them equally, how many would each person get?

candies <- c(2, 4, 6, 8, 10)
mean(candies)
# Result: 6

The mean is 6! Everyone gets 6 candies if we share fairly.
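Here is a small sketch of the same fair share worked out by hand, plus what happens when a value is missing. The `with_gap` vector is made up for illustration.

```r
candies <- c(2, 4, 6, 8, 10)

# The "fair share" by hand: total candies divided by number of friends
manual_mean <- sum(candies) / length(candies)
manual_mean == mean(candies)   # TRUE

# A missing value (NA) makes the mean NA unless you ask R to skip it
with_gap <- c(2, 4, NA)        # hypothetical data with one unknown value
mean(with_gap)                 # NA
mean(with_gap, na.rm = TRUE)   # 3
```

The `na.rm = TRUE` argument tells R to drop missing values before averaging.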

Median (The Middle Kid)

Line up your numbers from smallest to biggest. The median is the one standing right in the middle!

ages <- c(5, 8, 12, 15, 20)
median(ages)
# Result: 12

12 is the median! It’s the middle number when they’re lined up.

What if there are 6 numbers? We take the average of the two middle ones!

scores <- c(10, 20, 30, 40, 50, 60)
median(scores)
# Result: 35
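To see why the middle kid is useful, here is a rough sketch with made-up data: one giant value drags the mean far away, while the median barely notices. The second half finds the median of six numbers by hand.

```r
ages <- c(5, 8, 12, 15, 200)   # hypothetical data: 200 is an obvious outlier

mean(ages)     # 48, dragged upward by the outlier
median(ages)   # 12, still the middle value

# Median of six numbers by hand: sort, then average the two in the middle
scores <- c(60, 10, 40, 20, 50, 30)   # the same six scores, shuffled
sorted <- sort(scores)
middle_two <- sorted[c(3, 4)]         # 30 and 40
mean(middle_two)                      # 35, same as median(scores)
```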

Mode (The Popular Kid)

The mode is the number that shows up the most often. It’s the most popular!

votes <- c(1, 2, 2, 3, 2, 4, 2)
# Which number appears most?
# 2 appears 4 times - it's the mode!

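# R doesn't have a built-in function for the statistical mode
# (the built-in mode() reports how an object is stored instead)
# Here's a simple way:
names(which.max(table(votes)))
# Result: "2" (returned as text; if there is a tie, only the first winner is shown)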
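If you want every winner when there is a tie, one possible approach is a small helper like the one below. The name `find_modes` is invented for this sketch; it is not part of R.

```r
# find_modes() is a made-up helper name, not a built-in R function
find_modes <- function(x) {
  counts <- table(x)                              # how often each value appears
  is_top <- as.vector(counts) == max(counts)      # TRUE for every value with the top count
  top <- names(counts)[is_top]
  if (is.numeric(x)) as.numeric(top) else top     # turn "2" back into 2 for numbers
}

find_modes(c(1, 2, 2, 3, 2, 4, 2))   # 2
find_modes(c(1, 2, 2, 3, 3))         # 2 3 (a tie, so two modes)
```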

📊 Dispersion Measures: How Spread Out Are Your Numbers?

Central Tendency tells us the center. Dispersion tells us how scattered the numbers are around that center.

Think of it like this: Are all the sheep standing close together, or are they spread across the whole field?

Variance (The Spread Squared)

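Variance measures how far the numbers are from the mean. It squares each distance (so negative and positive distances don't cancel out) and then averages the squares. R's var() gives the sample variance, which divides by n - 1 instead of n.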

heights <- c(4, 6, 8, 10, 12)
var(heights)
# Result: 10

Higher variance = more spread out!
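Here is a step-by-step sketch of the same calculation by hand, so you can see where the 10 comes from. The variable names are just for illustration.

```r
heights <- c(4, 6, 8, 10, 12)

distances <- heights - mean(heights)   # -4 -2  0  2  4
squared   <- distances^2               # 16  4  0  4 16

# Sample variance: add the squares, divide by one less than the count
manual_var <- sum(squared) / (length(heights) - 1)   # 40 / 4

manual_var     # 10
var(heights)   # 10
```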

Standard Deviation (The Spread in Real Units)

Standard deviation is the square root of variance. It’s easier to understand because it’s in the same units as your data!

heights <- c(4, 6, 8, 10, 12)
sd(heights)
# Result: 3.162278

Think of it like this:

Small SD = Numbers are close together (like a tight group of friends)
Big SD = Numbers are far apart (like kids scattered on a big playground)
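As a quick illustration, here are two made-up groups with the same center but very different spread. Both vectors are hypothetical.

```r
tight  <- c(7, 8, 8, 8, 9)     # hypothetical: friends huddled together
spread <- c(2, 5, 8, 11, 14)   # hypothetical: kids all over the playground

mean(tight) == mean(spread)    # TRUE, both centers are 8
sd(tight) < sd(spread)         # TRUE, the second group is more scattered

# sd() is the square root of var()
all.equal(sd(spread), sqrt(var(spread)))   # TRUE
```

The mean alone cannot tell these two groups apart, which is why it helps to report a spread measure next to it.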

📏 Range Functions: The Biggest and Smallest

Range (The Span)

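In everyday language, the range is the difference between the biggest and smallest number. In R, range() returns those two numbers (smallest first), and diff() turns them into the gap.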

temperatures <- c(15, 22, 18, 30, 12)
range(temperatures)
# Result: 12 30

# The actual range (difference):
diff(range(temperatures))
# Result: 18

The temperatures span from 12 to 30 — a range of 18 degrees!

Min and Max (The Extremes)

scores <- c(45, 78, 92, 56, 88)

min(scores)
# Result: 45 (lowest score)

max(scores)
# Result: 92 (highest score)
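To connect the pieces, this short sketch checks that range() is just min() and max() glued together, and that the span is the gap between them.

```r
temperatures <- c(15, 22, 18, 30, 12)

coldest_and_hottest <- c(min(temperatures), max(temperatures))
identical(range(temperatures), coldest_and_hottest)   # TRUE

span <- max(temperatures) - min(temperatures)
span == diff(range(temperatures))                     # TRUE, both are 18
```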

➕ Summation Functions: Adding Things Up

Sum (Total of Everything)

Like counting all your coins in a piggy bank!

coins <- c(5, 10, 25, 10, 5)
sum(coins)
# Result: 55

You have 55 cents total!

Prod (Multiply Everything)

Instead of adding, we multiply all numbers together.

numbers <- c(2, 3, 4)
prod(numbers)
# Result: 24 (because 2 × 3 × 4 = 24)
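Two small things worth trying with prod(), shown as a sketch: multiplying 1 through 5 gives the same answer as R's factorial(), and a single zero turns the whole product into zero.

```r
# 1 × 2 × 3 × 4 × 5
prod(1:5)             # 120
factorial(5)          # 120

# A single zero wipes out the whole product
prod(c(2, 3, 0, 4))   # 0
```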

📈 Cumulative Functions: Running Totals

Cumulative means keeping a running total as you go along.

Cumsum (Running Sum)

Imagine you’re counting coins one by one:

coins <- c(5, 10, 15, 20)
cumsum(coins)
# Result: 5 15 30 50

Breakdown:

After 1st coin: 5
After 2nd coin: 5 + 10 = 15
After 3rd coin: 15 + 15 = 30
After 4th coin: 30 + 20 = 50
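The breakdown above suggests two handy tricks, sketched here. The savings goal of 30 is a made-up number for illustration.

```r
coins <- c(5, 10, 15, 20)
running <- cumsum(coins)          # 5 15 30 50

# The last running total is the same as sum()
tail(running, 1) == sum(coins)    # TRUE

# Hypothetical goal: which coin first takes us to 30 or more?
goal <- 30
which(running >= goal)[1]         # 3
```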

Cumprod (Running Multiplication)

numbers <- c(2, 3, 2)
cumprod(numbers)
# Result: 2 6 12

Breakdown:

After 1st: 2
After 2nd: 2 × 3 = 6
After 3rd: 6 × 2 = 12
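If you are curious what cumprod() is doing behind the scenes, the breakdown above can be sketched as a simple loop. This is only an illustration of the idea, not how R implements it internally.

```r
numbers <- c(2, 3, 2)

running <- numeric(length(numbers))   # empty slots for the answers
so_far <- 1                           # multiplying by 1 changes nothing
for (i in seq_along(numbers)) {
  so_far <- so_far * numbers[i]       # multiply in the next number
  running[i] <- so_far                # remember the product so far
}

running            # 2 6 12
cumprod(numbers)   # 2 6 12
```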

Cummax (Running Maximum)

Keep track of the highest value so far!

scores <- c(5, 8, 3, 10, 7)
cummax(scores)
# Result: 5 8 8 10 10

Notice that 3 and 7 never show up in the result: the running maximum only changes when a new high score arrives.

Cummin (Running Minimum)

Keep track of the lowest value so far!

temps <- c(30, 25, 28, 20, 22)
cummin(temps)
# Result: 30 25 25 20 20
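One way to use cummax() and cummin() is to spot the moments when a record was set. In this sketch, a value counts as a record when it matches the running highest (or lowest) so far.

```r
scores <- c(5, 8, 3, 10, 7)
temps  <- c(30, 25, 28, 20, 22)

new_high <- scores == cummax(scores)   # TRUE TRUE FALSE TRUE FALSE
new_low  <- temps == cummin(temps)     # TRUE TRUE FALSE TRUE FALSE

which(new_high)   # 1 2 4 (positions where a high score was set)
which(new_low)    # 1 2 4 (positions where a low temperature was set)
```

Note that a value that only ties the current record would also be marked TRUE with this check.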

🔄 Difference Function: What Changed?

The diff() function tells you how much changed between each step.

Diff (The Change)

Imagine tracking your height each year:

height_cm <- c(100, 105, 112, 118)
diff(height_cm)
# Result: 5 7 6

Breakdown:

Year 1 to 2: You grew 5 cm
Year 2 to 3: You grew 7 cm
Year 3 to 4: You grew 6 cm

graph TD
    A["100 cm"] -->|+5| B["105 cm"]
    B -->|+7| C["112 cm"]
    C -->|+6| D["118 cm"]
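Here is a small sketch of what diff() does under the hood: it subtracts each value from the one after it. The second half shows that cumsum() can undo it if you remember the starting height.

```r
height_cm <- c(100, 105, 112, 118)

later   <- height_cm[-1]                   # 105 112 118
earlier <- height_cm[-length(height_cm)]   # 100 105 112
later - earlier                            # 5 7 6

# Going back: starting height plus the running sum of the changes
rebuilt <- c(height_cm[1], height_cm[1] + cumsum(diff(height_cm)))
rebuilt                                    # 100 105 112 118
```

The result of diff() is always one value shorter than the input, because the first measurement has nothing before it to compare with.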

Lag Differences

You can also find differences between values that are further apart:

values <- c(10, 20, 30, 40, 50)
diff(values, lag = 2)
# Result: 20 20 20

This compares each value with the one 2 steps back!
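The same comparison can be sketched by hand with indexing, which also shows why the answer is shorter than the input.

```r
values <- c(10, 20, 30, 40, 50)
n <- length(values)

# Each value minus the one 2 steps back
two_back <- values[3:n] - values[1:(n - 2)]
two_back                         # 20 20 20

length(diff(values, lag = 2))    # 3, which is n - 2
```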

🧙‍♂️ Quick Summary: Your Statistical Toolkit

Category	Function	What It Does
Central	`mean()`	Average (fair share)
Central	`median()`	Middle value
Dispersion	`var()`	Spread squared
Dispersion	`sd()`	Spread in real units
Range	`range()`	Min and max
Range	`min()`/`max()`	Smallest/largest
Summation	`sum()`	Add all together
Summation	`prod()`	Multiply all together
Cumulative	`cumsum()`	Running total
Cumulative	`cumprod()`	Running product
Cumulative	`cummax()`	Running highest
Cumulative	`cummin()`	Running lowest
Difference	`diff()`	Change between values

🎮 Putting It All Together

Let’s be a Number Detective with a real example!

# Daily steps for a week
steps <- c(5000, 7500, 6000, 8000,
           4500, 9000, 7000)

# Central Tendency
mean(steps)    # about 6714 average steps
median(steps)  # 7000 middle value

# Spread
sd(steps)      # about 1629 steps of typical variation

# Range
range(steps)   # 4500 to 9000

# Total
sum(steps)     # 47000 steps all week!

# Running total
cumsum(steps)  # Daily accumulated steps

# Daily change
diff(steps)    # How steps changed each day
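If you find yourself typing the same handful of calls for every data set, one option is to bundle them into a helper. The function below is a sketch with an invented name, `describe_numbers`; it is not part of R.

```r
# describe_numbers() is a made-up helper, not a built-in R function
describe_numbers <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x),
    total  = sum(x))
}

steps <- c(5000, 7500, 6000, 8000, 4500, 9000, 7000)
report <- describe_numbers(steps)
round(report)   # rounded to whole steps for easier reading
```

Returning a named vector keeps each number labeled, so you can pick one out with, for example, report[["median"]].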

You’re now a Number Detective! You can describe any set of numbers like a pro!

🚀 Remember This!

Mean = Share everything equally
Median = Find the middle kid
SD = How spread out from the center
Range = Gap between smallest and biggest
Sum = Count everything up
Cumulative = Keep a running total
Diff = Track the changes

Now go explore your data and tell its story!
