Gleb Satyukov
Senior Research Engineer | Data Science Instructor
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
Wednesdays between 3pm and 4pm on Zoom
Fridays between 3pm and 4pm on Zoom
Gleb’s personal Zoom link is: https://ucla.zoom.us/j/6935808910
Mondays between 12pm (noon) and 1pm
Wednesdays between 11am and 12pm (noon)
Kaitlynn’s personal Zoom link is: https://ucla.zoom.us/j/8321830416
Best Practices (again)
Global Variables
String Operations
If/Else Logic
New functions! Add this to best practices
head()
ifelse()
Attention to detail
Clean your environment
Use proper file paths
Use proper code spacing
Use inline and block comments!!
Use correct variable names (lowercase)
Save charts programmatically with ggsave
Using Global Variables
Set a directory using path_main
Inspecting the data using head()
Keep data in a dedicated data folder
Be consistent with your use of quotes (' vs ")
And more best practices coming soon!
Follow instructions in the assignments
Highly recommend checking it out!
https://style.tidyverse.org/files.html
The paste function concatenates (combines) objects together after converting them to character vectors
paste("hello", "world")
[1] "hello world"
sep=" " is a character string to separate the terms
# check the documentation
?paste
# paste function signature
paste(..., sep = " ", collapse = NULL, recycle0 = FALSE)
# paste0 is used when you don't want a separator
paste0(..., collapse = NULL, recycle0 = FALSE)
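As a small sketch of how the sep and collapse arguments differ (the vectors below are made up for illustration and are not part of the assignment): sep goes between the arguments, while collapse joins the resulting elements into a single string.

```r
# Hypothetical example vectors (not from the assignment)
letters_vec <- c("a", "b", "c")
numbers_vec <- c("1", "2", "3")

# sep is placed between the arguments, element by element
pasted_pairs <- paste(letters_vec, numbers_vec, sep = "-")
print(pasted_pairs)  # "a-1" "b-2" "c-3"

# collapse joins all resulting elements into one string
pasted_joined <- paste(letters_vec, numbers_vec, sep = "-", collapse = ", ")
print(pasted_joined)  # "a-1, b-2, c-3"

# paste0 behaves like paste with sep = ""
pasted_no_sep <- paste0(letters_vec, numbers_vec)
print(pasted_no_sep)  # "a1" "b2" "c3"
```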
We are going to build a path to our data files
Create a separate directory for this assignment
Locate the directory where you stored the data
Append filenames to the main directory path
Helps us keep variable definitions short
/Users/gleb/Documents/Environ-175/Advanced-1/
/Users/gleb/Documents/Environ-175/Advanced-2/
/Users/gleb/Documents/Environ-175/Advanced-3/
/Users/gleb/Documents/Environ-175/Advanced-4/
/Users/gleb/Documents/Environ-175/Advanced-5/
C:/Users/gleb/Documents/Environ-175/Advanced-1/
C:/Users/gleb/Documents/Environ-175/Advanced-2/
C:/Users/gleb/Documents/Environ-175/Advanced-3/
C:/Users/gleb/Documents/Environ-175/Advanced-4/
C:/Users/gleb/Documents/Environ-175/Advanced-5/
Path to the folder with data files for the assignment:
# Path leading to the main assignment directory
path_main <- "/Users/gleb/Dropbox/UCLA/ENVIRON-175/Advanced-2/"
# document 1 file path
document_path_1 <- paste0(path_main, "document_1.csv")
# document 2 file path
document_path_2 <- paste0(path_main, "document_2.csv")
Path to the folder with data in a dedicated data folder:
# Path leading to the main assignment directory
path_main <- "/Users/gleb/Dropbox/UCLA/ENVIRON-175/Advanced-2/"
# document 1 file path
document_path_1 <- paste0(path_main, "data/document_1.csv")
# document 2 file path
document_path_2 <- paste0(path_main, "data/document_2.csv")
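A possible alternative to paste0 for building paths is base R's file.path, which inserts the "/" separators for you. This is only a sketch: the variable names path_main_no_slash and document_path_1_alt are made up here, and the directory is the example path from the slides, so it will not exist on your machine.

```r
# Example directory from the slides; replace with your own assignment folder
path_main <- "/Users/gleb/Dropbox/UCLA/ENVIRON-175/Advanced-2/"

# Approach from the slides: paste0 appends the file name to the path
document_path_1 <- paste0(path_main, "data/document_1.csv")

# Alternative: file.path joins the pieces with "/"
# (no trailing slash on the directory here, to avoid a double slash)
path_main_no_slash <- "/Users/gleb/Dropbox/UCLA/ENVIRON-175/Advanced-2"
document_path_1_alt <- file.path(path_main_no_slash, "data", "document_1.csv")

# Both approaches build the same path
print(document_path_1 == document_path_1_alt)  # TRUE

# file.exists() reports whether the path points at a real file
print(file.exists(document_path_1))
```

Checking file.exists() before importing is a quick way to catch a typo in the path.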
Global Variables are set-up at the top
And reused throughout your script multiple times
# Setup Global Variables
main_color <- "blue"
accent_color <- "lightblue"
# Graph 1 with YVAR1
ggplot(DATA, aes(x=XVAR, y=YVAR1)) +
geom_point(color = main_color) +
geom_line(color = accent_color)
# Graph 2 with YVAR2
ggplot(DATA, aes(x=XVAR, y=YVAR2)) +
geom_point(color = main_color) +
geom_line(color = accent_color)
Other common string operations include:
Use nchar to find the length of a string, like this:
####################
# Length of a string
####################
nchar("Hello World") # 11
Use paste or paste0 to merge two strings together:
#####################
# Concatenate strings
#####################
paste("Hello", "World") # "Hello World"
paste("Hello", "World", sep = ", ") # "Hello, World"
paste0("Hello", "World") # "HelloWorld"
Split a string based on a split parameter, like so:
################
# Split a string
################
strsplit("apple,banana,kiwi", ",")[[1]]
# [1] "apple" "banana" "kiwi"
Find and replace specific parts of a string:
##################
# Find and replace
##################
gsub("dog", "cat", "The quick brown fox jumps over the lazy dog")
# "The quick brown fox jumps over the lazy cat"
This should be self-explanatory
####################################
# Uppercase / Lowercase / Capitalize
####################################
toupper("hello") # "HELLO"
tolower("HELLO") # "hello"
tools::toTitleCase("hello world") # "Hello World"
This is really useful when your imported data has any leading or trailing whitespace that you don't want:
#################
# Trim whitespace
#################
trimws(" no space pls ") # "no space pls"
Getting a specific slice of a string:
###################
# Substring / slice
###################
substr("Environment", 1, 7) # "Environ"
substring("Environment", 1, 7) # "Environ"
Regular expressions are used to find matching patterns in text
regexpr returns -1 when the specified pattern is not found in the text
#####################
# Regular Expressions
#####################
regexpr("@", "gleb@ucla.edu")
# [1] 5
############################
# Global Regular Expressions
############################
gregexpr("n", "Environment")
# [1] 2 7 10
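To illustrate the -1 return value mentioned above, here is a minimal sketch. The email variable and the idea of extracting the username are made up for illustration.

```r
# Pattern found: position of the first match
found <- as.integer(regexpr("@", "gleb@ucla.edu"))
print(found)      # 5

# Pattern not found: regexpr returns -1
not_found <- as.integer(regexpr("#", "gleb@ucla.edu"))
print(not_found)  # -1

# Use the match position to keep everything before the "@"
email <- "gleb@ucla.edu"
at_position <- as.integer(regexpr("@", email))
username <- substr(email, 1, at_position - 1)
print(username)   # "gleb"
```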
* grep stands for global regular expression print
# Check if there are any numbers in this text
grepl("[0-9]", "Environ 175") # TRUE
# Replace any number with the letter R
gsub("[0-9]", "R", "Environ 101") # Environ RRR
# Find index of all strings that start with the letter A
grep("^A", c("Apple", "Banana", "Avocado")) # 1, 3
# Find index of all strings that end in 'ing'
grep("ing$", c("Run", "Swimming", "Eating")) # 2, 3
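The three grep variants return different things for the same pattern. A short sketch, reusing the fruit vector from above (the variable names are made up):

```r
fruits <- c("Apple", "Banana", "Avocado")

# grep returns the index of each matching element
match_index <- grep("^A", fruits)
print(match_index)   # 1 3

# value = TRUE returns the matching strings themselves
match_values <- grep("^A", fruits, value = TRUE)
print(match_values)  # "Apple" "Avocado"

# grepl returns TRUE or FALSE for every element
match_flags <- grepl("^A", fruits)
print(match_flags)   # TRUE FALSE TRUE
```

Because grepl returns one TRUE/FALSE per element, it is the variant that fits inside filter() or ifelse().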
Check out the wikipedia page here:
https://en.wikipedia.org/wiki/Regular_expression
If you want to test your regex patterns:
If/Else logic is typically used when you need to set a value based on a certain condition
Conditional logic is one of the building blocks of any programming language
In R we can use it to create categorical variables in our data, for example if a value is over a certain threshold
We can use if, else if, and else to classify numeric values, for example:
# Temperature reading
temp <- 78
if (temp < 60) {
  category <- "Cold"
} else if (temp < 80) {
  category <- "Warm"
} else {
  category <- "Hot"
}
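Note that if () checks a single value at a time. One way to classify several readings is to wrap the chain in a function and apply it to each value. This is only a sketch: classify_temp and the temps vector are hypothetical names, not part of the assignment.

```r
# Hypothetical helper wrapping the if / else if / else chain
classify_temp <- function(temp) {
  if (temp < 60) {
    "Cold"
  } else if (temp < 80) {
    "Warm"
  } else {
    "Hot"
  }
}

# Single reading, same value as in the slide
single_category <- classify_temp(78)
print(single_category)  # "Warm"

# Several made-up readings: apply the helper to each one in turn
temps <- c(55, 78, 92)
categories <- sapply(temps, classify_temp)
print(categories)       # "Cold" "Warm" "Hot"
```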
Used when turning numeric data into categories
Recall how our filter function works:
####################
# FILTER BY DISTANCE
####################
data <- filter(data, distance < 50000) # 50km
Example of ifelse() being used as a function:
#############################
# IFELSE() USED AS A FUNCTION
#############################
ifelse(<SOME CONDITION>, <RESULT IF TRUE>, <RESULT IF FALSE>)
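Filled in with real values, the template could look like the sketch below. The distance vector is made up; ifelse() checks the condition for every element and returns a result of the same length.

```r
# Made-up distances in meters
distance <- c(12000, 75000, 49999, 50000)

# ifelse(condition, result if TRUE, result if FALSE)
dist_group <- ifelse(distance < 50000, "Close", "Far")
print(dist_group)  # "Close" "Far" "Close" "Far"

# A missing value in the condition gives NA in the result
with_missing <- ifelse(c(10, NA) < 50000, "Close", "Far")
print(with_missing)  # "Close" NA
```

Exactly 50000 meters counts as "Far" here because the condition uses < rather than <=.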
A monitor is considered to be "Close" if the distance to the power plant is less than 50km (50000 meters)
##################################
# ADD CATEGORIES BASED ON DISTANCE
##################################
data50km <- mutate(all_data,
                   dist_group = ifelse(distance < 50000, "Close", "Far"))
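To check that the new column looks right, you can inspect it with head() and count the categories. The sketch below uses a toy stand-in for all_data (monitor_id and all values are made up) and a base R assignment in place of mutate(), so it runs without loading any packages.

```r
# Toy stand-in for all_data; monitor_id and the values are made up
all_data <- data.frame(
  monitor_id = c("A1", "B2", "C3", "D4"),
  distance = c(12000, 75000, 49999, 120000),
  stringsAsFactors = FALSE
)

# Base R equivalent of the mutate() call above
data50km <- all_data
data50km$dist_group <- ifelse(data50km$distance < 50000, "Close", "Far")

# Inspect the first rows and count monitors per category
print(head(data50km))
group_counts <- table(data50km$dist_group)
print(group_counts)
```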
https://www.epa.gov/acidrain/acid-rain-program
ARP was enacted by the Federal Government in 1990
1995 - Phase 1 begins: ARP regulates the SO2 emissions of the 110 largest coal-fired power plants
We'll be using EPA data from 1992 to 1998 which includes Phase 1 power plants
Power plants were allowed a certain amount of permits
Each permit allowed them to emit 1 ton of SO2
Power plants that reached their limit could buy more permits from other plants that had leftover permits
ARP easily passed through the House and the Senate:
110 biggest SO2 polluting plants
1. Clean environment, load libraries
2. Import distance data and EPA data
3. Clean up data and substring to fix ID variable
4. Filter to only include SO2 readings
5. Join EPA and distance data by monitor ID
6. Create distance categories with if else logic
7. Collapse our SO2 data by year and distance category
8. Make a connected scatterplot (and save/export it!)
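The steps above can be sketched end to end on toy data. Everything in this sketch is an assumption made for illustration: the data frames, column names (raw_id, monitor_id, pollutant, reading), ID format, and values are all made up, and base R functions (subset, merge, aggregate) stand in for the dplyr functions used in class. Step 8 (the connected scatterplot and ggsave) is left out so the sketch runs without any packages.

```r
# Step 1: clean environment (no libraries needed for this base R sketch)
rm(list = ls())

# Step 2: toy stand-ins for the imported distance data and EPA data
dist_data <- data.frame(
  monitor_id = c("001", "002", "003"),
  distance = c(20000, 80000, 45000),
  stringsAsFactors = FALSE
)
epa_data <- data.frame(
  raw_id = c("ID-001", "ID-001", "ID-002", "ID-002",
             "ID-003", "ID-003", "ID-001"),
  year = c(1992, 1998, 1992, 1998, 1992, 1998, 1992),
  pollutant = c("SO2", "SO2", "SO2", "SO2", "SO2", "SO2", "NO2"),
  reading = c(10, 6, 8, 7, 12, 7, 30),
  stringsAsFactors = FALSE
)

# Step 3: substring to fix the ID variable (keep characters 4 to 6)
epa_data$monitor_id <- substr(epa_data$raw_id, 4, 6)

# Step 4: filter to only include SO2 readings
so2_data <- subset(epa_data, pollutant == "SO2")

# Step 5: join EPA and distance data by monitor ID
joined <- merge(so2_data, dist_data, by = "monitor_id")

# Step 6: create distance categories with ifelse logic
joined$dist_group <- ifelse(joined$distance < 50000, "Close", "Far")

# Step 7: collapse SO2 readings by year and distance category (mean)
collapsed <- aggregate(reading ~ year + dist_group, data = joined, FUN = mean)
print(collapsed)
```

The collapsed data frame has one row per year and distance category, which is the shape a connected scatterplot needs.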
The assignment will be published on Canvas today
The assignment is due next Monday
Due Date: Monday May 5, 2025 at 11:59 pm PT