Gleb Satyukov
Senior Research Engineer | Data Science Instructor
R Basics 1: https://environ-175.com/basics/1
R Basics 2: https://environ-175.com/basics/2
R Basics 3: https://environ-175.com/basics/3
R Basics 4: https://environ-175.com/basics/4
R Basics 5: https://environ-175.com/basics/5
R Advanced 1: https://environ-175.com/advanced/1
R Advanced 2: https://environ-175.com/advanced/2
R Advanced 3: https://environ-175.com/advanced/3
R Advanced 4: https://environ-175.com/advanced/4
R Advanced 5: https://environ-175.com/advanced/5
R Spatial 1: https://environ-175.com/spatial/1
R Spatial 2: https://environ-175.com/spatial/2
R Spatial 3: https://environ-175.com/spatial/3
R Spatial 4: https://environ-175.com/spatial/4
R Spatial 5: https://environ-175.com/spatial/5
ggplot
Debugging
Reshaping data
Hexadecimal Colors
New library and functions!
library(tidyr)
pivot_wider()
pivot_longer()
Attention to detail
Clean your environment
Use proper file paths
Use proper code spacing
Use inline and block comments!!
Use correct variable names (lowercase)
Save charts programmatically with ggsave()
Use global variables (e.g. set your main directory once in path_main)
Inspect the data using head()
Keep data in a dedicated data folder
Be consistent with your use of quotes (' vs ")
Follow instructions in the assignments exactly
And more best practices coming soon!
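Several of these practices belong at the top of every script. Here is a minimal sketch that assumes a hypothetical project folder. The path in path_main is a placeholder, and the data/ and output/ subfolder names are illustrative choices, not course requirements.

```r
# Clean the environment so old objects cannot leak into this run
rm(list = ls())

# Load libraries at the top of the script
library(ggplot2)

# Global variable: set the main project directory once (hypothetical path)
path_main <- "~/environ175"

# Build every other path from path_main with file.path()
path_data   <- file.path(path_main, "data")    # keep raw data here
path_output <- file.path(path_main, "output")  # save charts here

# Inspect a data set before using it (built-in mtcars as a stand-in)
head(mtcars, n = 3)
```

If you move the project, you only need to change path_main.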
geom_text()
Coercion refers to the implicit or explicit conversion of an object's type (class) to another, often to ensure compatibility with a function or operation.
For example, converting from Character to Numeric:
text <- "10"
number <- as.numeric(text)
class(number)
# [1] "numeric"
numbers <- c("1", "2", "3")
real_numbers <- as.numeric(numbers)
class(real_numbers)
# [1] "numeric"
If character values are not valid numbers, converting to numeric returns NA and shows a warning:
as.numeric("asdf")
# [1] NA
# Warning message:
# NAs introduced by coercion
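When a whole column fails to convert, you usually want to know which entries produced the NA values. Here is a small sketch using a made-up vector called raw_values. Wrapping the conversion in suppressWarnings() is safe here only because the next line checks the result explicitly.

```r
# Made-up character vector with one value that is not a valid number
raw_values <- c("1.5", "2", "n/a", "4")

# Convert, silencing the warning because we check the result ourselves
converted <- suppressWarnings(as.numeric(raw_values))

# Entries that became NA here (but were not NA before) failed to convert
failed <- is.na(converted) & !is.na(raw_values)
raw_values[failed]
# [1] "n/a"
```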
Converting from Numeric to Character:
numbers <- c(10, 20, 30)
text <- as.character(numbers)
text
# [1] "10" "20" "30"
as.character("asdf")
# [1] "asdf"
as.character(FALSE)
# [1] "FALSE"
as.character(NA)
# [1] NA
# (a missing value stays missing; it does not become the string "NA")
First mention of debugging:
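In ggplot(), the first argument is always the data
aes = aesthetic mappings (which columns go to x, y, color, and so on)
geom = geometry (the type of layer, e.g. geom_line() or geom_point())
labs = labels (axis and legend titles)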
...
R Basics 4: https://environ-175.com/basics/4/#/30
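This minimal plot shows those pieces in order: data, aes, geoms, then labs. The data frame toy_data and its values are made up for illustration.

```r
library(ggplot2)

# Small made-up data set (not real measurements)
toy_data <- data.frame(
  year  = 2010:2014,
  value = c(5, 4, 4.5, 3, 2.5)
)

# data first, then aesthetic mappings, then geometries, then labels
p <- ggplot(data = toy_data, aes(x = year, y = value)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Value")

p  # print the plot to display it
```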
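library(ggplot2)
library(dplyr)  # provides filter()
ggplot(data = so2_data, aes(x = year, y = value)) +
  geom_line(
    data = filter(so2_data, dist_group == "Close"),
    aes(color = "Close")
  ) +
  geom_point(
    data = filter(so2_data, dist_group == "Close"),
    aes(color = "Close")
  ) +
  labs(x = "Year", y = "SO2 Value")
Here we are mapping the literal string "Close" to the color aesthetic
"Close" is a string, not a variable, so ggplot treats it as a constant: every point in the layer gets the same color, and each group would need its own hand-written layer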
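ggplot(so2_data, aes(x = year, y = value, color = dist_group)) +
  geom_line(.....) +
  geom_point(.....) +
  labs(x = "Year", y = "Value")
Mapping color to the dist_group column lets ggplot draw one line per group and build the legend automatically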
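Here is a runnable version of the corrected pattern. It assumes so2_data has year, value, and dist_group columns. The values below are made up, and the second group name "Far" is hypothetical.

```r
library(ggplot2)

# Made-up stand-in for so2_data (hypothetical values, two distance groups)
so2_data <- data.frame(
  year       = rep(2010:2013, times = 2),
  value      = c(8, 7, 6, 5, 4, 4, 3.5, 3),
  dist_group = rep(c("Close", "Far"), each = 4)
)

# Map color to the dist_group column: one line and one legend entry per group
p <- ggplot(so2_data, aes(x = year, y = value, color = dist_group)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Value", color = "Distance group")

p
```

Adding a third group to the data needs no code changes, because the groups come from the column rather than from hand-written layers.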
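Hexadecimal (base 16) counting continues past 9 with the letters A through F:
0 1 2 3 4 5 6 7 8 9 A B C D E F
In a color code, 0 is the darkest (no intensity) and F is the lightest (full intensity)
Two hex digits give 16 * 16 = 256 possible values
1 bit = 0 or 1
1 byte = 8 bits
Each byte has 2 ^ 8 = 256 unique options, so two hex digits describe exactly one byte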
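Base R can convert between hexadecimal and decimal, which makes the counting above easy to check. strtoi() reads a string in a given base, and sprintf() with %X writes an integer back as hex.

```r
# Decimal values of some hexadecimal digits
digit_values <- strtoi(c("0", "9", "A", "F"), base = 16)
digit_values
# [1]  0  9 10 15

# "FF" is the largest two-digit hex number: 16 * 16 - 1
max_two_digits <- strtoi("FF", base = 16)
max_two_digits
# [1] 255

# Two hex digits and one byte (8 bits) allow the same number of options
16 * 16 == 2^8
# [1] TRUE

# Decimal back to a two-digit hex string
sprintf("%02X", 255L)
# [1] "FF"
```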
#000000 - black
#FFFFFF - white
#FF0000 - red
#00FF00 - green
#0000FF - blue
#2774AE - UCLA Blue
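Each color code has the form #RRGGBB: two hex digits (one byte, 0 to 255) each for red, green, and blue. Base R's col2rgb() and rgb() convert between the two notations. The example below uses the UCLA Blue code.

```r
# Split a hex color into its red, green, and blue bytes (each 0 to 255)
ucla_blue <- col2rgb("#2774AE")
ucla_blue[, 1]
# red = 39 (hex 27), green = 116 (hex 74), blue = 174 (hex AE)

# Build the hex code back from the three bytes
ucla_hex <- rgb(39, 116, 174, maxColorValue = 255)
ucla_hex
# [1] "#2774AE"
```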
Make sure you are using an appropriate palette
https://colormoods.co/
Brexit raised the cost of trading with other European countries
This should reduce exports from the UK to Europe
Brexit might also have reduced UK trade with non-EU countries, because the EU's trade agreements with those countries no longer covered the UK
1. Leads to specialization in labor markets
2. Has a big impact on the environment
Transporting goods over long distances adds to global pollution levels
Some countries have lax environmental regulations, so importing goods from abroad can "offshore" pollution to poorer countries
We will use WTO data for the years 2010 through 2022
This data includes exports for most countries
https://stats.wto.org/
This data is in a so-called long format:
country | year | exports |
---|---|---|
Italy | 2010 | 2.3 |
Italy | 2011 | 5.0 |
Italy | 2012 | 3.6 |
UK | 2010 | 4.4 |
UK | 2011 | 1.0 |
UK | 2012 | 2.9 |
This data is in a so-called wide format:
country | 2010 | 2011 | 2012 |
---|---|---|---|
Italy | 2.3 | 5.0 | 3.6 |
UK | 4.4 | 1.0 | 2.9 |
This data is also in a wide format:
year | Italy | UK | ... |
---|---|---|---|
2010 | 2.3 | 4.4 | ... |
2011 | 5.0 | 1.0 | ... |
2012 | 3.6 | 2.9 | ... |
You can analyze data in both long and wide format, but long format is almost always preferred: tools like ggplot2 and dplyr expect one row per observation
People often store data in wide format and convert it to long format after importing
Converting data between wide and long formats is known as "reshaping"
To reshape, we need to specify what the column names (not the values) represent
In this case, the column names 2010, 2011, and 2012 represent "year", and the values in those columns represent "exports"
country | 2010 | 2011 | 2012 |
---|---|---|---|
Italy | 2.3 | 5.0 | 3.6 |
UK | 4.4 | 1.0 | 2.9 |
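One way to do this reshape is tidyr's pivot_longer(). The names_to argument names what the column headers represent, and values_to names what the cells represent. This sketch rebuilds the small wide table by hand. The as.integer() step is needed because the years arrive as text column names.

```r
library(tidyr)

# The wide table; check.names = FALSE keeps "2010" etc. as column names
trade_wide <- data.frame(
  country = c("Italy", "UK"),
  `2010`  = c(2.3, 4.4),
  `2011`  = c(5.0, 1.0),
  `2012`  = c(3.6, 2.9),
  check.names = FALSE
)

# Column names become "year", cell values become "exports"
trade_long <- pivot_longer(
  trade_wide,
  cols      = -country,
  names_to  = "year",
  values_to = "exports"
)

# Column names are text, so coerce year to a number
trade_long$year <- as.integer(trade_long$year)

trade_long
```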
This is our data after reshaping it from wide to long:
country | year | exports |
---|---|---|
Italy | 2010 | 2.3 |
Italy | 2011 | 5.0 |
Italy | 2012 | 3.6 |
UK | 2010 | 4.4 |
UK | 2011 | 1.0 |
UK | 2012 | 2.9 |
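The reverse operation is pivot_wider(). This sketch rebuilds the long table and produces both wide layouts shown earlier. The names_from argument picks the column whose values become the new column headers, and values_from picks the column that fills the cells.

```r
library(tidyr)

# The long table
trade_long <- data.frame(
  country = rep(c("Italy", "UK"), each = 3),
  year    = rep(2010:2012, times = 2),
  exports = c(2.3, 5.0, 3.6, 4.4, 1.0, 2.9)
)

# One column per year: country | 2010 | 2011 | 2012
by_country <- pivot_wider(trade_long, names_from = year, values_from = exports)

# One column per country: year | Italy | UK
by_year <- pivot_wider(trade_long, names_from = country, values_from = exports)

by_country
by_year
```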
1. Clean the environment
2. Load all required libraries
3. Import international trade data
4. Reshape year columns from wide to long
5. Collapse-sum exports by country and year
6. Rescale the exports variable
7. Make a two-line scatterplot (and save it with ggsave!)
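The seven steps above can be sketched as a single script. This is an illustration under assumptions, not the assignment solution:

- The WTO import is replaced by a small made-up data frame, and the file name in the comment is hypothetical.
- Italy is only a placeholder comparison country.
- The sector column is invented so the collapse-sum has something to add up.
- The rescaling assumes the raw values are in dollars.

```r
# 1. Clean the environment
rm(list = ls())

# 2. Load all required libraries
library(tidyr)
library(dplyr)
library(ggplot2)

# 3. Import international trade data
# Real import would look something like (hypothetical file name):
# trade_raw <- read.csv(file.path(path_data, "wto_exports.csv"))
# Made-up stand-in with two countries, two sectors, and wide year columns:
trade_raw <- data.frame(
  country = c("UK", "UK", "Italy", "Italy"),
  sector  = c("Goods", "Services", "Goods", "Services"),
  `2010`  = c(400e9, 300e9, 500e9, 250e9),
  `2011`  = c(450e9, 320e9, 560e9, 260e9),
  `2012`  = c(440e9, 330e9, 550e9, 270e9),
  check.names = FALSE
)

# 4. Reshape year columns from wide to long
trade_long <- pivot_longer(
  trade_raw,
  cols      = -c(country, sector),
  names_to  = "year",
  values_to = "exports"
)
trade_long$year <- as.integer(trade_long$year)

# 5. Collapse-sum exports by country and year (adds up the sectors)
trade_sum <- trade_long |>
  group_by(country, year) |>
  summarise(exports = sum(exports), .groups = "drop")

# 6. Rescale exports (assumption: raw values are dollars; show billions)
trade_sum <- trade_sum |>
  mutate(exports_bn = exports / 1e9)

# 7. Two-line scatterplot, saved with ggsave()
p <- ggplot(trade_sum, aes(x = year, y = exports_bn, color = country)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Exports (billions)", color = "Country")

ggsave(file.path(tempdir(), "exports.pdf"), plot = p, width = 7, height = 4)
```

In a real project, the ggsave() path would be built from path_main rather than tempdir().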
Difference in differences
https://en.wikipedia.org/wiki/Difference_in_differences
We need to compare UK trade with a different country (the control group) that followed a similar trend before Brexit (the treatment)
The hard part is choosing the correct control group
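At its core, difference in differences is simple arithmetic: the UK's change over time minus the control group's change over the same period. The control's change stands in for what would have happened to the UK without Brexit, which is why a similar pre-Brexit trend matters. Here is a sketch with made-up numbers:

```r
# Made-up average exports before and after Brexit (not real data)
uk_before      <- 100
uk_after       <- 90
control_before <- 80
control_after  <- 85

# Change over time within each group
uk_change      <- uk_after - uk_before            # -10
control_change <- control_after - control_before  #   5

# Difference in differences: the UK's change beyond the control's change
did_estimate <- uk_change - control_change
did_estimate
# [1] -15
```

A negative estimate would suggest UK exports fell relative to what the control group's trend implies.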
The assignment will be published on Canvas today
It is due this Friday
Due Date: Friday May 9, 2025 at 11:59 pm PT