🌐 Tidyverse Extended: Bringing Outside Data Into R
The Story of the Data Explorer
Imagine you’re an explorer with a magical backpack. Inside R, you have amazing tools to sort, clean, and understand treasures. But what if your treasures are locked in different boxes outside—Excel files, websites, databases? Today, you’ll learn to open any box and bring data home!
🗂️ Excel File Operations
What Are Excel Files?
Excel files are like fancy notebooks with rows and columns. People use them everywhere—schools, offices, stores. They have names ending in .xlsx or .xls.
How to Read Excel Files
We use a special helper called readxl:
# First, get the helper
install.packages("readxl")
library(readxl)
# Read your Excel file
my_data <- read_excel("students.xlsx")
# See what's inside
head(my_data)
Reading Specific Sheets
Excel files can have multiple pages (sheets). Like chapters in a book!
# Read the second sheet
grades <- read_excel(
"school.xlsx",
sheet = "Grades"
)
# Or use sheet number
grades <- read_excel(
"school.xlsx",
sheet = 2
)
Writing to Excel
To save your work back to Excel, use writexl:
install.packages("writexl")
library(writexl)
# Save your data
write_xlsx(my_data, "results.xlsx")
Simple Example:
- You have a class list in Excel
- R reads it, adds test scores
- R saves a new Excel file with everything!
📋 JSON Operations
What is JSON?
JSON is like a labeled container system. Everything has a name and a value. Websites love JSON because it’s easy to read and send.
Think of it like this:
📦 Name: "Emma"
📦 Age: 10
📦 Hobbies: ["reading", "soccer"]
Reading JSON
install.packages("jsonlite")
library(jsonlite)
# From a file
pets <- fromJSON("pets.json")
# From a website
weather <- fromJSON(
"https://api.weather.com/today"
)
Writing JSON
Turn your R data into JSON:
# Create simple data
my_info <- list(
name = "Alex",
age = 12,
likes = c("pizza", "games")
)
# Convert to JSON text
json_text <- toJSON(my_info, pretty = TRUE)
print(json_text)
# Save to file
write_json(my_info, "alex.json")
Real Life Example:
- Weather apps send JSON data
- R reads it and shows you: “Today: 22°C, Sunny!”
🔤 Encoding Handling
What is Encoding?
Encoding is like a secret code for letters. Different countries use different codes. Sometimes words look like “café” but appear as “café” if the code is wrong!
Common Encodings
| Encoding | Use Case |
|---|---|
| UTF-8 | Works everywhere (best choice!) |
| Latin-1 | Old European files |
| Windows-1252 | Old Windows files |
Fixing Encoding Problems
# Read with correct encoding
data <- read_csv(
"french_menu.csv",
locale = locale(encoding = "UTF-8")
)
# Or try Latin-1 for old files
data <- read_csv(
"old_file.csv",
locale = locale(encoding = "Latin1")
)
Checking and Converting
# Check what encoding a file uses
guess_encoding("mystery_file.csv")
# Convert text encoding
library(stringi)
fixed_text <- stri_encode(
weird_text,
from = "Latin1",
to = "UTF-8"
)
Think of it like:
- French uses é, German uses ü, Japanese uses 日本
- UTF-8 is a universal translator that understands ALL!
🗄️ Database Connections
What is a Database?
A database is like a giant organized warehouse for data. Instead of thousands of Excel files, everything lives in one smart system.
Connecting to Databases
install.packages("DBI")
install.packages("RSQLite") # For SQLite
library(DBI)
# Connect to a database
con <- dbConnect(
RSQLite::SQLite(),
"my_database.sqlite"
)
Other Database Types
# PostgreSQL
install.packages("RPostgres")
con <- dbConnect(
RPostgres::Postgres(),
dbname = "shop",
host = "localhost",
user = "admin",
password = "secret123"
)
# MySQL
install.packages("RMySQL")
con <- dbConnect(
RMySQL::MySQL(),
dbname = "school",
host = "server.com",
user = "teacher",
password = "pass123"
)
Basic Operations
# List all tables
dbListTables(con)
# Read entire table
students <- dbReadTable(con, "students")
# Always disconnect when done!
dbDisconnect(con)
Like a Library:
- Connect = Get your library card
- Tables = Different book sections
- Disconnect = Return your card
🔍 SQL Queries in R
What is SQL?
SQL (say “sequel”) is the language databases understand. It’s like giving instructions to a librarian: “Find all books by Dr. Seuss published after 2000.”
Basic Queries
# Get all data from a table
result <- dbGetQuery(con,
"SELECT * FROM students"
)
# Get specific columns
names <- dbGetQuery(con,
"SELECT name, age FROM students"
)
# Filter with WHERE
teens <- dbGetQuery(con,
"SELECT * FROM students
WHERE age >= 13"
)
More Query Examples
# Sort results
sorted <- dbGetQuery(con,
"SELECT name, score
FROM students
ORDER BY score DESC"
)
# Count records
count <- dbGetQuery(con,
"SELECT COUNT(*) as total
FROM students"
)
# Group and summarize
by_class <- dbGetQuery(con,
"SELECT class,
AVG(score) as avg_score
FROM students
GROUP BY class"
)
Using dplyr with Databases
The coolest part? Use dplyr commands on databases!
library(dplyr)
library(dbplyr)
# Connect to table
students_db <- tbl(con, "students")
# Use familiar dplyr!
smart_kids <- students_db %>%
filter(score > 90) %>%
select(name, score) %>%
arrange(desc(score)) %>%
collect() # Bring to R
⚡ data.table Basics
What is data.table?
data.table is like R on a racing bike. When you have millions of rows, data.table zooms through them super fast!
Creating a data.table
install.packages("data.table")
library(data.table)
# Convert existing data
DT <- as.data.table(my_data)
# Read files fast
DT <- fread("huge_file.csv") # Super fast!
The Magic Syntax
data.table uses: DT[i, j, by]
- i = Which rows? (filter)
- j = What to do? (select/calculate)
- by = Group by what?
# Create example data
DT <- data.table(
name = c("Ana", "Bob", "Ana", "Bob"),
item = c("apple", "banana", "orange", "apple"),
price = c(1.5, 0.8, 2.0, 1.5)
)
Filtering (the i part)
# Get Ana's purchases
DT[name == "Ana"]
# Get items over $1
DT[price > 1]
Selecting & Calculating (the j part)
# Select columns
DT[, .(name, price)]
# Calculate
DT[, .(total = sum(price))]
# Create new column
DT[, expensive := price > 1]
Grouping (the by part)
# Total per person
DT[, .(total = sum(price)), by = name]
# Count per person
DT[, .N, by = name]
Why data.table is Amazing
| Task | data.frame | data.table |
|---|---|---|
| Read 1GB file | 60 seconds | 5 seconds |
| Group by | 30 seconds | 1 second |
| Memory used | 8 GB | 2 GB |
Think of it like:
- Regular R = Walking
- data.table = Sports car
🗺️ Putting It All Together
graph TD A["Outside World"] --> B{Data Source} B --> C["Excel Files"] B --> D["JSON/APIs"] B --> E["Databases"] C --> F["readxl"] D --> G["jsonlite"] E --> H["DBI + SQL"] F --> I["R Data Frame"] G --> I H --> I I --> J["Analyze with Tidyverse"] I --> K["Speed up with data.table"]
🌟 Quick Reference
| Task | Package | Main Function |
|---|---|---|
| Read Excel | readxl | read_excel() |
| Write Excel | writexl | write_xlsx() |
| Read JSON | jsonlite | fromJSON() |
| Write JSON | jsonlite | toJSON() |
| Fix encoding | readr | locale(encoding=) |
| Connect DB | DBI | dbConnect() |
| Query DB | DBI | dbGetQuery() |
| Fast data | data.table | fread(), DT[i,j,by] |
🎉 You Did It!
Now you can:
- ✅ Read and write Excel files
- ✅ Work with JSON data from anywhere
- ✅ Handle tricky text encodings
- ✅ Connect to real databases
- ✅ Write SQL queries in R
- ✅ Use data.table for lightning speed
You’re no longer limited to data inside R. The whole world of data is now yours to explore! 🚀
