Gleb Satyukov
Senior Research Engineer | Data Science Instructor
Wednesdays between 3pm and 4pm on Zoom
Fridays between 3pm and 4pm on Zoom
Gleb’s personal Zoom link is: https://ucla.zoom.us/j/6935808910
This Wednesday between 11am and 12pm (noon) on Zoom
Other Wednesdays between 3pm and 4pm on Zoom
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
Submit your project R script on Canvas, just like the data assignments, i.e. project-1.R
We will not be answering project-related questions
You can discuss it with your classmates on Slack
You will have 48 hours to complete the project
Make sure you follow all of the best practices
Make sure your code runs from start to finish without any errors or interruptions
Attention to detail
Clean your environment
Use proper file paths
Use proper code spacing
Use inline and block comments!!
Use correct variable names (lowercase)
Save charts programmatically with ggsave
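As a rough illustration, the practices above might look like this in one short script. This is only a sketch: the data frame, column names, and output file name are made up, and it assumes the ggplot2 package is installed.

```r
# project-1.R
# Block comment: say what the script does, who wrote it, and when.

# Clean the environment so the script never depends on leftover objects
rm(list = ls())

# Load packages at the top of the script
library(ggplot2)

# Hypothetical data, standing in for a file read from a proper file path
survey_data <- data.frame(
  age = c(21, 34, 45, 52, 68),
  expense = c(120, 80, 200, 150, 310)
)

# Lowercase variable names, spaces around operators and after commas
scatter_plot <- ggplot(data = survey_data) +
  geom_point(aes(x = age, y = expense))  # inline comment: one point per person

# Save the chart with ggsave instead of exporting it by hand
# (tempdir() is used here only so the sketch runs on any computer)
plot_path <- file.path(tempdir(), "scatter_plot.png")
ggsave(plot_path, plot = scatter_plot, height = 4, width = 6)
```

Running the whole script in a fresh R session is a quick way to check that it works from start to finish.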
New function!
sample <- filter(data, variable == "keyword")
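A small example of the same pattern, assuming the dplyr package is installed. The data frame and its columns are invented for illustration.

```r
library(dplyr)

# Hypothetical data: one row per student
data <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dev"),
  major = c("biology", "history", "biology", "physics")
)

# Keep only the rows where major is exactly "biology"
sample <- filter(data, major == "biology")

nrow(sample)  # 2 of the 4 rows match
```

The first argument is the data frame and the second is the condition, so filter keeps the rows where the condition is TRUE.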
Logical Operators
Data Assignment
Best Practices
Project 1 Info
meps_data << read_csv("/Users/gleb/ENVIRON-175/meps_2019.csv")
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
10_age <- mutate(meps_data, 10_age = floor(age / 10) * 10)
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
new_data <- mutate(data, new_var == var_1 / var_2)
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
ggplot(data="expense_by_age10") * geom_point(aes(x = age10, y = emergency)
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
gsave("~/Documents/scatter_plot.png", height = 4, width = 6)
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
==   Equal to
!=   Not equal to
>    Greater than
<    Less than
>=   Greater than or equal to
<=   Less than or equal to
We want a sample where the city is equal to Los Angeles, but the code has an error. Where is it?
la_data <- filter(data, city == Los Angeles )
^ ^ ^ ^ ^
| | | | |
(1) (2) (3) (4) (5)
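To see what the six comparison operators return, here is a short base R sketch on a made-up vector of GPA scores. Each comparison gives one TRUE or FALSE per value.

```r
# Hypothetical GPA scores
gpa <- c(150, 200, 250)

gpa == 200  # FALSE  TRUE FALSE
gpa != 200  #  TRUE FALSE  TRUE
gpa > 200   # FALSE FALSE  TRUE
gpa < 200   #  TRUE FALSE FALSE
gpa >= 200  # FALSE  TRUE  TRUE
gpa <= 200  #  TRUE  TRUE FALSE
```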
Write the logical condition for this filter:
Correct Answer:
mom_hs == "Yes"
mom_hs != "No"
Write the logical condition for this filter:
Correct Answer:
gpa <= 199
gpa < 200
Write the logical condition for this filter:
Correct Answer:
mom_hs == dad_hs
dad_hs == mom_hs
AND Operator
gpa > 200 & gpa < 300
OR Operator
state == "Florida" | state == "California"
Correct way of combining comparison operators:
gpa > 200 & gpa < 300
✅ OK
200 < gpa < 300
❌ NOT OK
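The following base R sketch shows both operators on made-up values, and then checks that the chained form is rejected. The vectors are hypothetical. The last step uses parse and try only to show the error without stopping the script.

```r
# Hypothetical values
gpa <- c(150, 250, 350)
state <- c("Florida", "Texas", "California")

# AND: both conditions must be TRUE
gpa > 200 & gpa < 300                       # FALSE  TRUE FALSE

# OR: at least one condition must be TRUE
state == "Florida" | state == "California"  #  TRUE FALSE  TRUE

# The chained form is not valid R, so parsing it raises an error
chained <- try(parse(text = "200 < gpa < 300"), silent = TRUE)
inherits(chained, "try-error")              # TRUE
```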
install.packages("forcats")
Factors are how R stores categorical data, e.g.
c("Low", "Medium", "High")
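A minimal base R sketch of what that means in practice. The variable names are made up. Note that, unless told otherwise, R sorts the levels alphabetically rather than by meaning.

```r
sizes <- factor(c("Low", "Medium", "High"))

# By default the levels are sorted alphabetically
levels(sizes)          # "High" "Low" "Medium"

# Passing levels = ... stores the categories in a meaningful order
sizes_ordered <- factor(
  c("Low", "Medium", "High"),
  levels = c("Low", "Medium", "High")
)
levels(sizes_ordered)  # "Low" "Medium" "High"
```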
With the forcats package you can count, reorder, and collapse the levels of a factor.
forcats
Imagine that our data is survey responses like this:
# Load forcats first (install it once with install.packages)
library(forcats)

responses <- factor(c(
  "High", "Extreme", "Low", "Very High", "Extreme",
  "Low", "Medium", "High", "Very High", "Extreme",
  "High", "Low", "Extreme", "Very High", "Medium"
))
# Get a table with counts
fct_count(responses)
# Set a meaningful order
ordered_responses <- fct_relevel(responses,
  "Low", "Medium", "High", "Very High", "Extreme"
)
# Check the levels
levels(ordered_responses)
# Combine the two highest levels into one group
grouped <- fct_collapse(responses,
  "High+" = c("Very High", "Extreme")
)
Drinking water contaminants pose a harm to public health
16 million cases of acute gastroenteritis occur each year
9–45 million people are possibly affected
Relatively few community water systems (3–10%) incur health-based violations
Improved compliance is needed to ensure safe drinking water nationwide
1. Load packages
2. Import fixed-width data
3. Clean up variable names
4. Filter to rural southern counties
5. Collapse down violations by state
6. Round violations for the graph labels
7. Make a bar plot (and save/export it!)
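The steps above can be sketched end to end on a tiny made-up file. Everything specific here is an assumption for illustration, not the real assignment data: the file contents, the column widths and names, the two states standing in for the South, and the choice of the mean when collapsing. It assumes the readr, dplyr, and ggplot2 packages are installed.

```r
# 1. Load packages
library(readr)
library(dplyr)
library(ggplot2)

# 2. Import fixed-width data (a hypothetical file written to a temp folder)
fwf_path <- file.path(tempdir(), "violations.txt")
writeLines(c(
  "ALrural 12",
  "ALrural 15",
  "ALurban  7",
  "GArural  5",
  "GArural  9",
  "GArural  9",
  "CArural 20"
), fwf_path)

raw_data <- read_fwf(
  fwf_path,
  col_positions = fwf_widths(
    c(2, 5, 3),
    col_names = c("STATE", "TYPE", "VIOLATIONS")
  )
)

# 3. Clean up variable names (make them lowercase)
clean_data <- rename_with(raw_data, tolower)

# 4. Filter to rural southern counties (AL and GA stand in for the South)
rural_south <- filter(
  clean_data,
  type == "rural" & (state == "AL" | state == "GA")
)

# 5. Collapse violations by state (the mean is an assumption)
by_state <- summarise(
  group_by(rural_south, state),
  violations = mean(violations)
)

# 6. Round violations for the graph labels
by_state <- mutate(by_state, label = round(violations, 1))

# 7. Make a bar plot and save it with ggsave
bar_plot <- ggplot(data = by_state) +
  geom_col(aes(x = state, y = violations)) +
  geom_text(aes(x = state, y = violations, label = label), vjust = -0.5)

plot_path <- file.path(tempdir(), "violations_by_state.png")
ggsave(plot_path, plot = bar_plot, height = 4, width = 6)
```

In the real assignment the file path, column positions, and filter conditions come from the assignment instructions, so only the overall order of the steps carries over.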
The assignment is probably already published on Canvas
It is due this Friday
Due Date: Friday, April 25, 2025 at 11:59 pm PT