Gleb Satyukov
Senior Research Engineer | Data Science Instructor
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
Reminder about the best practices
Troubleshooting / Debugging
Reading other types of files
Rounding numbers in R
MEPS Data Assignment
More about ggplot2
New function!
ggsave(filepath, height = 4, width = 6)
Class announcements are in #general
#classroom is used during class
#team-1 through #team-5
Use proper file paths
Clean your environment
Use proper code spacing
Use inline and block comments
Save charts programmatically
Attention to detail
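A minimal sketch of how these practices might look at the top of a script. The folder name `output`, the file name `age_chart.png`, and the sample ages are placeholders chosen for illustration, not part of the course materials.

```r
# Start every script from a clean environment
# (removes all objects from the current workspace).
rm(list = ls())

# Build file paths from their parts instead of typing long paths by hand.
# "output" and "age_chart.png" are placeholder names.
chart_path <- file.path("output", "age_chart.png")

# Block comment: consistent spacing around <-, =, and after commas
# makes the code easier to scan.
ages <- c(23, 35, 47)                  # inline comment: made-up sample ages
mean_age <- mean(ages, na.rm = TRUE)

print(chart_path)
print(mean_age)
```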
Review your raw or clean data
Inspect variables in the console
Read the error messages in the console
Inspect the variables in the Environment panel
Don't be afraid of error messages
Check the documentation with:
?help
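The inspection steps above can be sketched as follows. The `visits` data frame is made up for illustration, and `tryCatch()` is used here only so the error message can be captured and read instead of stopping the script.

```r
# Made-up data frame for illustration.
visits <- data.frame(
  age  = c(34, 51, NA),
  cost = c(120.5, 80, 310)
)

str(visits)          # column names and types
head(visits)         # first rows of the data
summary(visits$age)  # quick summary, including the number of NAs

# An operation that fails: adding text to a numeric column.
# tryCatch() captures the error so its message can be read calmly.
error_text <- tryCatch(
  visits$cost + "10",
  error = function(e) conditionMessage(e)
)
print(error_text)

# In the console, open the documentation with ?function_name, e.g. ?mean
```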
my_function <- function(arg1, arg2 = "default") {
# Do something with arg1 and arg2
}
my_function – the function name
arg1 – a required input
arg2 = "default" – an optional input with a default value
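As a small illustration of a required input and an optional input with a default, here is a sketch using a hypothetical function `greet()` (the name and its arguments are invented for this example).

```r
# name is required; greeting is optional and defaults to "Hello".
greet <- function(name, greeting = "Hello") {
  paste0(greeting, ", ", name, "!")
}

# Uses the default value for greeting.
default_result <- greet("Ada")

# Overrides the default by naming the argument.
custom_result <- greet("Ada", greeting = "Welcome")

print(default_result)  # "Hello, Ada!"
print(custom_result)   # "Welcome, Ada!"
```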
The signature is the part of the function that tells you:
– the function name
– the required inputs
– the optional inputs and their default values
mean(x, trim = 0, na.rm = FALSE)
This is the signature for the built-in mean() function (its default method).
It returns a numeric value, not necessarily an integer.
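To see what the optional arguments of mean() do, here is a short sketch with a made-up vector that contains a missing value and an outlier.

```r
# Made-up values: one missing value (NA) and one large outlier (100).
values <- c(2, 4, 6, NA, 100)

# With the defaults, a single NA makes the result NA.
default_mean <- mean(values)

# na.rm = TRUE drops the NA first: (2 + 4 + 6 + 100) / 4
clean_mean <- mean(values, na.rm = TRUE)

# trim = 0.25 also drops the lowest and highest 25% of the
# remaining values, leaving 4 and 6.
trimmed_mean <- mean(values, trim = 0.25, na.rm = TRUE)

print(default_mean)  # NA
print(clean_mean)    # 28
print(trimmed_mean)  # 5
```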
There are different ways to load different types of data
The right choice depends on the type and amount of data
readr package
library(readr)
data <- read_csv("data/environment.csv")
readr package
library(readr)
data <- read_delim("data/environment.txt", delim = "\t")
readxl package
library(readxl)
data <- read_excel("data/environment_data.xlsx", sheet = "Sheet1")
base R
data <- read.csv("data/environment.csv")
sf package
library(sf)
shape_data <- st_read("data/shapefile.shp")
raster package
library(raster)
raster_data <- raster("data/satellite.tif")
| Function | File Type | Package | Notes |
|---|---|---|---|
| read_csv() | .csv | readr | Usually faster than read.csv() |
| read_delim() | .txt, .tsv | readr | Custom delimiters |
| read_excel() | .xls, .xlsx | readxl | For Excel sheets |
| read.csv() | .csv | base R | Usually slower than read_csv() |
| st_read() | .shp | sf | Spatial (vector) data |
| raster() | .tif | raster | Raster data |
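A self-contained sketch of reading the same data in more than one way. The data values and temporary file names are made up so the example runs without the course files, and the readr calls only run if readr is installed.

```r
# Made-up data written to temporary files so the example runs anywhere.
sample_data <- data.frame(site = c("A", "B"), temp_c = c(18.5, 21.0))
csv_path <- file.path(tempdir(), "environment.csv")
tsv_path <- file.path(tempdir(), "environment.txt")
write.csv(sample_data, csv_path, row.names = FALSE)
write.table(sample_data, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Base R: always available.
from_base <- read.csv(csv_path)
print(from_base)

# readr: read_csv() for comma-separated files,
# read_delim() for other delimiters such as tabs.
if (requireNamespace("readr", quietly = TRUE)) {
  from_readr_csv <- readr::read_csv(csv_path, show_col_types = FALSE)
  from_readr_tsv <- readr::read_delim(tsv_path, delim = "\t",
                                      show_col_types = FALSE)
  print(from_readr_csv)
  print(from_readr_tsv)
}
```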
Publishing data samples every Tuesday since 2018
https://github.com/rfordatascience/tidytuesday/tree/main/data/2025/2025-04-15
The first parameter is always data
aes = aesthetic
geom = geometry
labs = labels
...
data from 2019
data in CSV format
filename: meps_2019.csv
https://meps.ahrq.gov/mepsweb/
Different categories of expenses
We will look at these categories by age
1. Clean the environment
2. Import packages/libraries
3. Review and clean up the original data using rename()
4. Add a new field using mutate()
5. Collapse the data using collap()
6. Visualize and save the chart with:
ggplot(data, aes(x, y, ...)) +
geom_point(...) +
geom_line(...) +
labs(...) +
theme...
ggsave(filepath, width = number, height = number)
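The plotting template above can be sketched as a runnable example. The numbers in `summary_data` are invented for illustration (they are not real MEPS values), and the labels, theme, and output file name are assumptions.

```r
library(ggplot2)

# Invented summary values for illustration only (not real MEPS numbers).
summary_data <- data.frame(
  age10     = c(20, 30, 40),
  inpatient = c(100, 150, 210),
  emergency = c(60, 70, 95)
)

# The first argument is the data; aes() maps columns to x, y, and color.
chart <- ggplot(summary_data, aes(x = age10)) +
  geom_line(aes(y = inpatient, color = "Inpatient")) +
  geom_point(aes(y = inpatient, color = "Inpatient")) +
  geom_line(aes(y = emergency, color = "Emergency")) +
  geom_point(aes(y = emergency, color = "Emergency")) +
  labs(
    title = "Average expenses by age group",
    x = "Age group (start of 10-year bin)",
    y = "Average expense",
    color = "Category"
  ) +
  theme_minimal()

# Save the chart to a file instead of exporting it by hand.
chart_path <- file.path(tempdir(), "expenses_by_age.png")
ggsave(chart_path, plot = chart, width = 6, height = 4)
```

Passing `plot = chart` explicitly avoids relying on whichever plot was displayed last.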
This is one object with two Y variables, inpatient and emergency.
The X variable is age, grouped into 9 bins.
Question: What are the dimensions of this data?
Answer: 3 columns and 9 rows
1. Add a new column using mutate()
2. Group the data using collap()
Formula Syntax:
[Summarize these columns] ~ [Group by]
Example:
inpatient + emergency ~ age10
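A minimal sketch of the formula syntax in use, assuming the dplyr and collapse packages are installed. The `people` data frame is invented for illustration, and `fmean` (the collapse package's fast mean) is chosen here as the summary function.

```r
library(dplyr)
library(collapse)

# Invented person-level records for illustration only.
people <- data.frame(
  age       = c(23, 27, 31, 38, 42, 45),
  inpatient = c(100, 200, 150, 250, 300, 500),
  emergency = c(10, 30, 20, 40, 50, 70)
)

# Step 1: add the grouping column (age rounded down to the decade).
people <- people %>%
  mutate(age10 = floor(age / 10) * 10)

# Step 2: summarize inpatient and emergency within each age10 group.
# Left of ~ : columns to summarize. Right of ~ : column(s) to group by.
by_age <- collap(people, inpatient + emergency ~ age10, FUN = fmean)

print(by_age)
print(dim(by_age))  # one row per age group, three columns
```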
We need to round each age down to the nearest multiple of 10
Rounding is usually easy for people, but computers need explicit instructions
If I asked you to round the number 42 down to the nearest 10, you'd know it is 40
Rounding is done with floor and ceiling functions
floor(4.2) = 4
ceiling(4.2) = 5
floor(42)
Answer: 42
floor(42/10)
Answer: 4
Multiply by 10 if we want the function to return 40
floor(42/10) * 10 = 40
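The same trick works on a whole vector of ages at once. The sketch below uses made-up ages and compares rounding down, rounding up, and base R's round() with a negative digits argument, which rounds to the nearest 10 rather than always down.

```r
# Made-up ages for illustration.
ages <- c(7, 42, 47, 89)

# Round down to the start of each 10-year bin.
rounded_down <- floor(ages / 10) * 10

# Round up to the next multiple of 10.
rounded_up <- ceiling(ages / 10) * 10

# round() with digits = -1 rounds to the NEAREST 10, so 47 becomes 50.
rounded_nearest <- round(ages, -1)

print(rounded_down)     # 0 40 40 80
print(rounded_up)       # 10 50 50 90
print(rounded_nearest)  # 10 40 50 90
```

For age bins, rounding down is the better fit because a 47-year-old belongs in the 40 to 49 group, not the 50s.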
The assignment is already published on Canvas
The assignment is due this Monday
Due Date: Monday, April 21, 2025 at 11:59 pm PT