Gleb Satyukov
Senior Research Engineer | Data Science Instructor
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
Changed the order of chapters (slightly)
Additional resources for outside of class
We updated the schedule to match Fall 24 class
Check the syllabus!!
We will be randomly(?) adding folks to teams
There will be 5 teams in total
Roughly 20 people in each team
There is a checklist.pdf
Use offline checklists to stay on track
It is rewarding to check things off
It is a To-Do as well as Ta-Da! 🙌
R vs R Studio
R is the programming language
R Studio is the frontend or IDE
Yes, R Studio has different themes!
Each OS organizes files in the same kind of folder hierarchy (the file system, FS)
UNIX-based systems (macOS, Linux) use forward slashes ('/')
Windows uses backslashes ('\')
Please, create a designated folder for this class
In other words, DO NOT save class-related files in your Downloads folder (or on the Desktop)
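A minimal sketch of building a path inside a designated class folder. The folder and file names (environ-175, data, air_quality.csv, C:\Users\me) are hypothetical. file.path() joins the pieces with forward slashes, which R also accepts on Windows; a literal backslash inside an R string has to be doubled.

```r
# Hypothetical folder and file names, for illustration only
class_dir <- file.path("~", "environ-175")
data_file <- file.path(class_dir, "data", "air_quality.csv")
print(data_file)

# A backslash must be escaped (doubled) inside an R string
windows_style <- "C:\\Users\\me\\environ-175"

# Swap backslashes for forward slashes; fixed = TRUE treats "\\" as plain text
forward_style <- gsub("\\", "/", windows_style, fixed = TRUE)
print(forward_style)
```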
Space out your code to make it more readable
Use comments to clarify your code
Use proper variable names
Be consistent with your quotes (' vs ")
More on this later in this lecture
help()
or
?help
for example:
?read_csv
or:
?summary
Pre-Installed libraries
User Contributed libraries
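library(dplyr)
-> loads the package; stops with an error if it is not installed
require(dplyr)
-> also loads the package, but returns TRUE or FALSE instead of stopping
Quotes around the package name are optional in library() and require()
detach('package:dplyr')
-> detaches the package (add unload = TRUE to also unload it)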
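A small sketch of how loading and detaching behave in practice. It uses splines, a package that ships with R, so nothing needs to be installed; notARealPackage123 is a made-up name used only to show the failure case.

```r
pkg <- "splines" # ships with R, so no install is needed

# require() returns TRUE or FALSE, so it can be used in a condition
loaded <- require(pkg, character.only = TRUE)

# A package that does not exist gives FALSE (plus a warning), not an error
missing_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

# The loaded package now shows up on the search path ...
on_path_before <- "package:splines" %in% search()

# ... and detach() takes it off again
detach("package:splines")
on_path_after <- "package:splines" %in% search()
```

In a script, library() is usually the safer choice, because a missing package stops the run right away instead of failing later.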
How do you clear the environment?
rm(list=ls())
How do you clear the console?
cat("\014")
How do you clear any plots?
dev.off()
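# ────────────────────────────────────────────────
# Clears Out Everything
# ────────────────────────────────────────────────
clear_all <- function() {
  # ls() inside a function lists the function's own variables,
  # so point both ls() and rm() at the global environment
  rm(list = ls(envir = globalenv()), envir = globalenv()) # clears the environment
  cat('\014') # clears the console (in R Studio)
  if (!is.null(dev.list())) dev.off() # clears the plots, if any are open
}
# calling the function
clear_all()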
Use lower case variable names
Separate words with underscores
This is known as snake_case
ALLCAPS
dot.case
camelCase
kebab-case
first_name <- "Alice"
✅ OK
temperature_2025 <- 76.5
✅ OK
2025_temperature <- 112.7
❌ NOT OK
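One way to check a name before using it, sketched with base R. make.names() returns a name unchanged only when it is already valid. The helpers is_valid_name() and to_snake_case() are hypothetical, written for this example.

```r
# make.names() leaves a name untouched only if it is already valid
is_valid_name <- function(x) make.names(x) == x

is_valid_name("first_name")        # TRUE
is_valid_name("temperature_2025")  # TRUE
is_valid_name("2025_temperature")  # FALSE: a name cannot start with a digit

# Hypothetical helper: convert other naming styles to snake_case
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x) # camelCase -> camel_Case
  x <- gsub("[.-]", "_", x)                    # dots and hyphens -> underscores
  tolower(x)
}

converted <- to_snake_case(c("camelCase", "dot.case", "kebab-case", "ALLCAPS"))
print(converted)
```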
Make sure that you have empty lines between meaningful blocks of code
This makes your code much easier to understand
Especially by someone else reading it!
Or even you yourself a year later 📅
library(dplyr)
library(ggplot2)
data <- mtcars
summary <- data %>% group_by(cyl) %>% summarise(avg_mpg = mean(mpg))
plot <- ggplot(summary, aes(x = factor(cyl), y = avg_mpg)) + geom_col()
print(plot)
# ────────────────────────────────────────────────
# Load libraries
# ────────────────────────────────────────────────
library(dplyr)
library(ggplot2)
# ────────────────────────────────────────────────
# Prepare data
# ────────────────────────────────────────────────
data <- mtcars
grouped <- group_by(data, cyl)
summary <- summarise(grouped, avg_mpg = mean(mpg), .groups = "drop")
# ────────────────────────────────────────────────
# Create plot
# ────────────────────────────────────────────────
plot <- ggplot(summary, aes(x = factor(cyl), y = avg_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Average MPG")
# ────────────────────────────────────────────────
# Display plot
# ────────────────────────────────────────────────
print(plot)
We use the # symbol for comments in R
Make sure that the comments are clarifying
Avoid comments that don't add anything new
Avoid very long comments that go off screen
speed <- 15 # mph
✅ OK
value <- 10 # assign 10
❌ NOT OK
color <- "turquoise" # make sure the color is turquoise because that is my favorite color and i love everything turquoise
❌ NOT OK
Grouping
Condensing
Summarizing
Aggregating
You want to summarize large datasets
You need to prepare data for visualizations
You want to reduce complexity for statistical analysis
You want to understand patterns instead of individual data points
# Step 1: Load libraries
library(readr)
library(dplyr)
library(ggplot2)
library(collapse)
Used for data manipulation, aggregation, and transformation
collap()
Main function for collapsing/aggregating data
GRP(), collap(), fmean(), and fsum()
🏢 Downtown LA
🏡 San Fernando Valley
🛫 LAX (the airport)
🚢 Port of Los Angeles
Date | Station | PM2.5 | Ozone |
---|---|---|---|
2025-01-01 | Downtown | 12.5 | 0.040 |
2025-01-01 | Valley | 20.3 | 0.050 |
2025-01-01 | LAX | 15.0 | 0.045 |
2025-01-01 | Port of LA | 22.1 | 0.055 |
2025-01-02 | Downtown | 14.2 | 0.039 |
Station | Month | Avg PM2.5 | Avg Ozone |
---|---|---|---|
Downtown | 2025-01 | 13.4 | 0.0395 |
Valley | 2025-01 | 20.3 | 0.0500 |
LAX | 2025-01 | 15.0 | 0.0450 |
Port of LA | 2025-01 | 22.1 | 0.0550 |
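The step from the daily table to the monthly table can be sketched as below. This version uses base R's aggregate() rather than collapse, so it runs without extra packages. The column names (pm25, ozone, month) are made up for the example, and the last reading is assumed to be dated 2025-01-02 so that the Downtown averages line up with the summary table.

```r
# Daily readings copied from the table above.
# Assumption: the last row is dated 2025-01-02.
daily <- data.frame(
  date    = as.Date(c("2025-01-01", "2025-01-01", "2025-01-01",
                      "2025-01-01", "2025-01-02")),
  station = c("Downtown", "Valley", "LAX", "Port of LA", "Downtown"),
  pm25    = c(12.5, 20.3, 15.0, 22.1, 14.2),
  ozone   = c(0.040, 0.050, 0.045, 0.055, 0.039)
)

# Derive the month (for example "2025-01") to group on
daily$month <- format(daily$date, "%Y-%m")

# Collapse: one row per station and month, averaging both pollutants
monthly <- aggregate(cbind(pm25, ozone) ~ station + month,
                     data = daily, FUN = mean)
print(monthly)
```

Five daily rows become four monthly rows, one per station; only Downtown has two readings to average.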
How did our parents' education affect our own educational outcomes?
childid | child_gpa | mom_hsgrad | mom_schoolyrs |
---|---|---|---|
102 | 309 | No | 8 |
204 | 217 | Yes | 15 |
307 | 253 | Yes | 12 |
511 | 162 | Yes | 12 |
1124 | 234 | No | 6 |
1433 | 300 | Yes | 15 |
Collapsing data means summarizing individual records into group-level statistics.
Average GPA by Mother's High School Graduation
mom_hsgrad | mean_gpa | n (children) |
---|---|---|
No | 238.1 | 14 |
Yes | 286.8 | 85 |
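A minimal sketch of this collapse with the collapse package, assuming it is installed. It uses only the six sample rows shown earlier, not the full NLSY data, so the group means differ from the table above. The data frame name kids is made up for the example.

```r
library(collapse)

# The six sample rows shown earlier (not the full NLSY data)
kids <- data.frame(
  childid       = c(102, 204, 307, 511, 1124, 1433),
  child_gpa     = c(309, 217, 253, 162, 234, 300),
  mom_hsgrad    = c("No", "Yes", "Yes", "Yes", "No", "Yes"),
  mom_schoolyrs = c(8, 15, 12, 12, 6, 15)
)

# Collapse child_gpa to one mean per level of mom_hsgrad
gpa_by_mom <- collap(kids, child_gpa ~ mom_hsgrad, FUN = fmean)
print(gpa_by_mom)

# The same group means as a named vector, using fmean() directly
gpa_means <- fmean(kids$child_gpa, g = kids$mom_hsgrad)

# Number of children per group
n_children <- table(kids$mom_hsgrad)
```

The formula reads "child_gpa by mom_hsgrad": the left side names what to summarize, the right side names the groups.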
Missing values
Weighting data
(out of scope)
Step 1. Load the necessary packages
Step 2. Use original NLSY data
Step 3. Collapse GPA by Mom's education
Step 4. Visualize results with a bar chart
Step 5. Save and submit your assignment-2.R
Step 6. ....... profit?
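Step 4 could look roughly like the sketch below, assuming ggplot2 is installed. It plots the two group means from the summary table in this lecture rather than values computed from the NLSY file. The object names and axis labels are made up for the example.

```r
library(ggplot2)

# Group means copied from the summary table in this lecture
gpa_summary <- data.frame(
  mom_hsgrad = c("No", "Yes"),
  mean_gpa   = c(238.1, 286.8)
)

# One bar per group; geom_col() draws bar heights straight from mean_gpa
gpa_plot <- ggplot(gpa_summary, aes(x = mom_hsgrad, y = mean_gpa)) +
  geom_col() +
  labs(x = "Mother graduated high school", y = "Average child GPA")

print(gpa_plot)
```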
Correct file paths
Clean environment
Use code spacing
Use comments
Be precise
Published after the class today
Assignment is going to be due this Friday
Friday April 11, 2025 at 11:59 pm PT