Gleb Satyukov
Senior Research Engineer | Data Science Instructor
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
Use OR logic
Use AND logic
Date Formats
Appending data
New library and functions!
library(lubridate) # For working with dates
select()
as.Date()
bind_rows()
case_when()
Building on top of our ifelse() function
We specify what to do if a condition is true, and what to do if that condition is not true, for example:
#############################
# IFELSE() USED AS A FUNCTION
#############################
ifelse(<SOME CONDITION>, <RESULT IF TRUE>, <RESULT IF FALSE>)
ifelse(temperature < 70, "Cold", "Warm")
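Because ifelse() works on a whole vector at once, a small sketch with made-up readings may help. The temperature values below are hypothetical and only serve to show one "Cold"/"Warm" label coming back per element.

```r
# Hypothetical temperature readings in degrees Fahrenheit (made up for illustration)
temperature <- c(55, 72, 68, 90)

# ifelse() checks the condition for every element and returns one label per element
label <- ifelse(temperature < 70, "Cold", "Warm")
print(label)
```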
| symbol is used for OR logic
& symbol is used for AND logic
! symbol is used for NOT (negation) logic
e.g. != is used for NOT EQUALS TO conditions
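The operators above can be tried on a single value. This is a minimal sketch in base R; the variable names are chosen here for illustration only.

```r
x <- 5

either   <- (x < 3) | (x > 4)   # OR: TRUE if at least one side is TRUE
both     <- (x > 3) & (x < 10)  # AND: TRUE only if both sides are TRUE
negated  <- !(x > 3)            # NOT: flips TRUE to FALSE
not_five <- x != 5              # NOT EQUALS: FALSE because x is 5

print(c(either, both, negated, not_five))
```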
We use parentheses to specify the order of operations:
x <- 5
if ((x > 3) & (x < 10)) {
  print("x is between 3 and 10")
}
You can combine as many conditions as you want!
#####################
# STEP 7. EXTRACT DAY OF WEEK AND DROP WEEKENDS
#####################
#Create variable to determine the weekends
fbi_drop <- mutate(fbi_drop, dow=wday(fbi_date))
#Drop weekends
fbi_nowkd = filter(fbi_drop,
dow==2 | dow==3 | dow==4 | dow==5 | dow==6)
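A long chain of OR conditions can usually be shortened with %in%. The sketch below uses a small made-up data frame (not the FBI data) and relies on lubridate's default numbering, where wday() returns 1 for Sunday and 7 for Saturday.

```r
library(dplyr)
library(lubridate)

# Toy data: Saturday, Sunday, Monday, Tuesday (dates made up for illustration)
toy <- data.frame(fbi_date = ymd(c("2021-03-13", "2021-03-14",
                                   "2021-03-15", "2021-03-16")))

# wday() returns 1 = Sunday, 2 = Monday, ..., 7 = Saturday by default
toy <- mutate(toy, dow = wday(fbi_date))

# Keep Monday through Friday; same effect as dow==2 | dow==3 | ... | dow==6
weekdays_only <- filter(toy, dow %in% 2:6)
print(weekdays_only)
```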
Certain values may evaluate to TRUE or FALSE:
0 evaluates to FALSE
1 evaluates to TRUE
Be careful using as.logical() because it may not behave exactly the way you'd expect
It is much better to specify the conditions explicitly whenever possible
Try it yourself in the RStudio console
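A few calls you could paste into the console to see where as.logical() can surprise you. This is a small base R sketch; the inputs are arbitrary examples.

```r
zero     <- as.logical(0)       # FALSE
one      <- as.logical(1)       # TRUE
two      <- as.logical(2)       # TRUE: any non-zero number counts as TRUE
text_t   <- as.logical("TRUE")  # TRUE: this text is recognized
text_yes <- as.logical("yes")   # NA: this text is not recognized
text_0   <- as.logical("0")     # NA: the text "0" is not the number 0

# An explicit condition says exactly what you mean
x <- 2
is_one <- x == 1                # FALSE
```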
Person | Name | Age |
---|---|---|
A | Emma | 45 |
B | Emma | 30 |
C | Ryo | 30 |
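One way to use the table above is as practice data for OR and AND logic. The sketch below assumes the table is stored in a data frame called people (a name chosen here) and uses filter() from dplyr.

```r
library(dplyr)

people <- data.frame(
  Person = c("A", "B", "C"),
  Name   = c("Emma", "Emma", "Ryo"),
  Age    = c(45, 30, 30)
)

# OR: keep rows where the name is Emma or the age is 30 (all three rows match)
emma_or_30 <- filter(people, Name == "Emma" | Age == 30)

# AND: keep rows where the name is Emma and the age is 30 (only person B)
emma_and_30 <- filter(people, Name == "Emma" & Age == 30)

# NOT EQUALS: keep rows where the name is not Emma (only person C)
not_emma <- filter(people, Name != "Emma")
```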
A function that diverts to other functions
Or a function that returns different values in different scenarios, different results in different cases
The same effect can be achieved using multiple if/else statements, but these can get really complicated
Using case_when() you apply multiple conditions to create new variables, for example:
people <- mutate(people, age_group = case_when(
age < 18 ~ "Child",
age >= 18 & age < 65 ~ "Adult",
age >= 65 ~ "Senior"
))
The tilde ~ symbol separates a condition (on the left) from a value to return (on the right)
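A small sketch of how case_when() evaluates its conditions, using a made-up vector of ages. Conditions are checked top to bottom and the first match wins, so the second condition does not need to repeat age >= 18. The final TRUE ~ "Unknown" line is an addition for illustration: it acts as a catch-all for anything not matched above, such as a missing age.

```r
library(dplyr)

# Hypothetical ages, including one missing value
age <- c(12, 30, 70, NA)

age_group <- case_when(
  age < 18  ~ "Child",
  age < 65  ~ "Adult",    # only reached when age < 18 was not TRUE
  age >= 65 ~ "Senior",
  TRUE      ~ "Unknown"   # catch-all for everything else, e.g. NA
)
print(age_group)
```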
DST happens on a different day every year
#######################
# STEP 5. FORMAT DATE of DST
#######################
fbi_clean <- mutate(fbi_clean,
change_date=case_when(
year==2021 ~ "March 14 2021",
year==2022 ~ "March 13 2022"
)
)
In the U.S. we write dates as:
month/day/year
In Europe, you might write the date as:
day/month/year
as.character(...) -> "character" type
as.numeric(...) -> "numeric" type
as.Date(...) -> "function" type??
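To check what each conversion actually returns, class() can be called on the result. A minimal base R sketch; the variable names are chosen here for illustration.

```r
d <- as.Date("2021-03-14")

date_class <- class(d)                 # "Date"
text_class <- class(as.character(d))   # "character"
num_class  <- class(as.numeric(d))     # "numeric"
fun_class  <- class(as.Date)           # "function": as.Date itself, not its result

# Internally a Date is the number of days since 1970-01-01
one_day <- as.numeric(as.Date("1970-01-02"))
```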
Some examples of date formatting:
as.Date("03/14/2021", format = "%m/%d/%Y")
as.Date("14-Mar-21", format = "%d-%b-%y")
as.Date("Sunday, March 14, 2021", format = "%A, %B %d, %Y")
Note: as.Date() can only handle dates, not times!
MM/DD/YYYY: 03/14/2021 is formatted as %m/%d/%Y
DD/MM/YYYY: 14/03/2021 is formatted as %d/%m/%Y
YYYY-MM-DD: 2021-03-14 is formatted as %Y-%m-%d
YYYY年MM月DD日: 2021年03月14日 is formatted as %Y年%m月%d日
March, 14 2021 is formatted as %B, %d %Y
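Picking the wrong format either fails or, worse, silently produces a different date. The sketch below uses base R and an ambiguous example string chosen for illustration.

```r
# The same text read with two different formats gives two different dates
us_style <- as.Date("03/04/2021", format = "%m/%d/%Y")  # March 4, 2021
eu_style <- as.Date("03/04/2021", format = "%d/%m/%Y")  # April 3, 2021

# There is no month 14, so this cannot be parsed and R returns NA
impossible <- as.Date("03/14/2021", format = "%d/%m/%Y")

print(us_style)
print(eu_style)
print(impossible)
```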
We are converting a date from text to a Date object
(R's internal representation of a date)
# Converting text to Date objects
as.Date("March 14 2021", format = "%B %d %Y")
We need to tell R which format our dates are stored in
In this example the format is: "%B %d %Y"
strptime stands for string parse time
This is a very common way to format date and time
The origins go back to PWB/UNIX 1.0 released in 1977
https://en.wikipedia.org/wiki/C_date_and_time_functions#History
strptime Format Codes
Code | Meaning | Example |
---|---|---|
%Y | 4-digit year | 2021 |
%y | 2-digit year | 21 |
%m | 2-digit month | 03 |
%B | Full month name | March |
%b | Abbreviated month | Mar |
%d | 2-digit day | 14 |
%A | Full weekday name | Sunday |
%a | Abbreviated weekday | Sun |
%j | Day of year (001–366) | 073 |
%% | Literal percent sign | % |
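The same codes work in the other direction: format() turns a Date back into text. A minimal base R sketch with the date from the table; note that the name-based codes (%B, %b, %A, %a) depend on the language settings of your computer, so only the numeric codes are shown with their results.

```r
d <- as.Date("2021-03-14")

year4 <- format(d, "%Y")   # "2021"
year2 <- format(d, "%y")   # "21"
month <- format(d, "%m")   # "03"
day   <- format(d, "%d")   # "14"
yday  <- format(d, "%j")   # "073"

# Codes can be combined with any separators you like
us_text <- format(d, "%m/%d/%Y")  # "03/14/2021"

# Month and weekday names follow your locale settings
print(format(d, "%A, %B %d, %Y"))
```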
There is no year 0!
https://en.wikipedia.org/wiki/Year_zero
Y2K - Year 2000 problem
https://en.wikipedia.org/wiki/Year_2000_problem
The year 2038 - a problem?
https://en.wikipedia.org/wiki/Year_2038_problem
lubridate Date Parsing
Some example functions used to parse dates:
library(lubridate)
ymd("2021-03-14") # Year-Month-Day
dmy("14/03/2021") # Day-Month-Year
mdy("March 14, 2021") # Month-Day-Year
lubridate Date Parsing
The lubridate library can also handle time!
library(lubridate)
# Can also handle times
# With hours, minutes, seconds
ymd_hms("2021-03-14 09:45:00")
# Parsing multiple formats in a vector
parse_date_time(
c("14-03-2021", "2021/03/14"),
orders = c("dmy", "ymd")
)
And you don't have to specify format like %Y-%m-%d
There's the wday() to get the day of the week
year() to easily extract year from date
month() to easily extract month
day() to easily extract day of month
Note: you do need the variable to be in date format
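The extraction functions above can be tried on a single date. A short sketch using lubridate; the variable names are chosen here for illustration, and the weekday number assumes lubridate's default setting where the week starts on Sunday.

```r
library(lubridate)

# First convert the text to a date, then extract the pieces
dst_date <- ymd("2021-03-14")

dow        <- wday(dst_date)   # 1, because by default 1 = Sunday
the_year   <- year(dst_date)   # 2021
the_month  <- month(dst_date)  # 3
the_day    <- day(dst_date)    # 14

# label = TRUE returns the weekday name instead of a number
print(wday(dst_date, label = TRUE))
```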
There are benefits to storing data in separate files:
Latency optimization in download speeds
Space optimization through sharding
Offer users the choice of which data to select
Suppose we have some Great Blue Heron nest counts from two separate years, stored in two different tables:
library(tibble)
library(dplyr)
# Data from 2022
herons_2022 <- tibble(
location = c("Via Marina", "Palawan Way"),
nests = c(14, 9),
year = 2022
)
And the data collected in 2023:
# Data from 2023
herons_2023 <- tibble(
location = c("Via Marina", "Palawan Way"),
nests = c(10, 6),
year = 2023
)
bind_rows()
bind_rows() stacks the rows of the two data frames together, one on top of the other, for example:
# Combine data using bind_rows()
herons_all <- bind_rows(herons_2022, herons_2023)
print(herons_all)
Now let's say the 2023 data also includes the number of chicks observed, but we do not have that data for 2022:
# Updated 2023 data with an extra column
herons_2023 <- tibble(
location = c("Via Marina", "Palawan Way"),
nests = c(10, 6),
chicks = c(5, 3),
year = 2023
)
Rows from 2022 don't have a chicks column, so R inserts NA for those missing values
# Combine again
herons_all <- bind_rows(herons_2022, herons_2023)
print(herons_all)
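After appending, it can be worth checking where the NA values ended up. The sketch below rebuilds the two heron tables so it runs on its own, then uses is.na() to find the rows without a chick count; the names missing_chicks and with_chicks are chosen here for illustration.

```r
library(tibble)
library(dplyr)

herons_2022 <- tibble(
  location = c("Via Marina", "Palawan Way"),
  nests = c(14, 9),
  year = 2022
)
herons_2023 <- tibble(
  location = c("Via Marina", "Palawan Way"),
  nests = c(10, 6),
  chicks = c(5, 3),
  year = 2023
)

herons_all <- bind_rows(herons_2022, herons_2023)

# TRUE for the 2022 rows (no chicks column), FALSE for the 2023 rows
missing_chicks <- is.na(herons_all$chicks)

# Keep only the rows where a chick count was recorded
with_chicks <- filter(herons_all, !is.na(chicks))
print(with_chicks)
```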
Lead -> Health + Brain Development
Temperature -> Aggression
Crop Loss -> Civil Unrest
Sunlight -> Crime?
Rule used for daylight saving time (DST):
On the second Sunday in March we move our clocks forward one hour
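The rule above can be turned into a small helper that finds the date for any year. This is a sketch in base R, not part of the lab code; the function name second_sunday_in_march is made up for illustration. It relies on the fact that the second Sunday always falls within the first 14 days of the month.

```r
# Hypothetical helper: returns the second Sunday in March for a given year
second_sunday_in_march <- function(year) {
  # The first 14 days of March always contain exactly two Sundays
  days <- seq(as.Date(paste0(year, "-03-01")), by = "day", length.out = 14)
  # %w gives the weekday as a number, where 0 means Sunday
  sundays <- days[format(days, "%w") == "0"]
  sundays[2]
}

dst_2021 <- second_sunday_in_march(2021)
dst_2022 <- second_sunday_in_march(2022)
print(dst_2021)
print(dst_2022)
```

The results agree with the dates used for 2021 and 2022 in the lab (March 14 and March 13).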
Question:
After DST, does the sun set one hour later or one hour earlier than the day before?
PDT - Pacific Daylight Time
PST - Pacific Standard Time
1. Set up R (e.g. libraries, clear environment, directory)
2. Import two FBI data files
3. Append data files (new!)
4. Clean up data, i.e. paste year, month, and day together
5. Create date variables (new!)
6. Calculate days to DST
7. Extract day of week and drop weekends
8. Make two-line scatterplot (and export it!)
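Steps 4 to 6 could look roughly like the sketch below. It uses a tiny made-up data frame instead of the FBI files, base R instead of mutate(), and ISO-style date text so it does not depend on your computer's language settings. The names crimes, date_text, and days_to_dst are assumptions for illustration; fbi_date and change_date follow the names used elsewhere in the lab.

```r
# Toy data standing in for the FBI files (values made up for illustration)
crimes <- data.frame(
  year  = c(2021, 2021, 2022),
  month = c(3, 3, 3),
  day   = c(12, 15, 13)
)

# Step 4: paste year, month, and day together into one text value
crimes$date_text <- paste(crimes$year, crimes$month, crimes$day, sep = "-")

# Step 5: convert the text to Date objects
crimes$fbi_date <- as.Date(crimes$date_text, format = "%Y-%m-%d")
crimes$change_date <- as.Date(
  ifelse(crimes$year == 2021, "2021-03-14", "2022-03-13")
)

# Step 6: subtracting two dates gives the number of days between them
# (negative = before DST, positive = after DST)
crimes$days_to_dst <- as.numeric(crimes$fbi_date - crimes$change_date)
print(crimes)
```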
The assignment will be published on Canvas today
It is due next Monday
Due Date: Monday May 12, 2025 at 11:59 pm PT