ENVIRON-175

Programming with Big Environmental Datasets



Gleb Satyukov
Senior Research Engineer | Data Science Instructor


https://app.sli.do/event/kPtPRhnZhMMjBjVWXPqWdU

Andorra


Population: ~87,486

Land Area: ~180 sq miles

Wikipedia: https://en.wikipedia.org/wiki/Andorra

Map: https://maps.app.goo.gl/E9FskZssZZZed9QQ9

Slides


R Basics 1: https://environ-175.com/basics/1

R Basics 2: https://environ-175.com/basics/2

R Basics 3: https://environ-175.com/basics/3

R Basics 4: https://environ-175.com/basics/4

R Basics 5: https://environ-175.com/basics/5

Slides


R Advanced 1: https://environ-175.com/advanced/1

R Advanced 2: https://environ-175.com/advanced/2

R Advanced 3: https://environ-175.com/advanced/3

R Advanced 4: https://environ-175.com/advanced/4

R Advanced 5: https://environ-175.com/advanced/5

Slides


R Spatial 1: https://environ-175.com/spatial/1

R Spatial 2: https://environ-175.com/spatial/2

R Spatial 3: https://environ-175.com/spatial/3

R Spatial 4: https://environ-175.com/spatial/4

R Spatial 5: https://environ-175.com/spatial/5

Schedule


Reminder about the best practices

Clean your environment

Use proper file paths, use data folder

Use proper code spacing, use even more spacing!

Use inline and block comments!!

Use correct variable names (lowercase and underscores)

Save charts programmatically with ggsave

Save final data programmatically with write_csv

More Best Practices

Using Global Variables

Set a directory using path_main

Keep your data in a dedicated data folder

Inspect the data after loading using head()

Be consistent with your use of quotes (' vs ")

Make sure to export both graphs and final data (using write_csv(data, path))

Follow instructions in the assignments exactly
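
A minimal sketch of how the top of a script might follow these practices. The folder name, the file name final_data.csv, and the toy data frame are assumptions for illustration only; path_main points at a temporary folder here so that the sketch runs on any machine.

```r
library(readr)

# Global variable: the main project folder
# (assumption: a temporary folder stands in for your real project path)
path_main <- file.path(tempdir(), "environ_175_demo")
path_data <- file.path(path_main, "data")

# Keep data in a dedicated data folder
dir.create(path_data, recursive = TRUE, showWarnings = FALSE)

# Toy data standing in for a real dataset (hypothetical values)
island_data <- data.frame(
  island_name = c("Oahu", "Maui", "Kauai"),
  sample_count = c(12, 7, 5)
)

# Inspect the data after loading
head(island_data)

# Save the final data programmatically
path_output <- file.path(path_data, "final_data.csv")
write_csv(island_data, path_output)

# Read the file back to confirm the export worked
check_data <- read_csv(path_output, show_col_types = FALSE)
head(check_data)
```

Building every path from path_main with file.path() means only one line has to change when the project moves to another computer.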

Agenda for today


Import raster data

Import vector data

Coordinate Reference Systems

Plot maps with rasters and vectors

New libraries!


library(terra)
library(tidyterra)
library(sf)
            

Learning Objectives


We will learn how to combine and manipulate different spatial data objects

We will also combine spatial objects with tables and other rectangular data

ArcGIS (or QGIS)


Pros:
- Drawing elements, e.g. roads
- Viewing different layers
- Clicking on cells or shapes


Cons:
- ArcGIS is proprietary software
- GUI-Based

R Tools for GIS


Free and Open Source
- Easier for statistical analysis
- Code-driven, familiar tools
- Transparency, reproducibility, and automation
- Great for building data science workflows

Spatial Data


There are 2 distinct classes of spatial data:

Raster data

Vector data


Each type of data will need to be treated differently!

We will learn a new set of operations for spatial data

Raster files

Windy

Vector files

Vectors are a series of points

Rasters vs Vectors


One important difference between rasters and vectors is that rasters give a value for every pixel on the map

Conversely, vector points, lines, and polygons don't usually indicate a value
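
The difference can be seen with two toy objects built in memory. This is only a sketch: the extent, the cell values, and the object names toy_raster and toy_polygon are made up for illustration.

```r
library(terra)
library(sf)

# A tiny raster: 3 rows x 4 columns over a made-up extent
toy_raster <- rast(
  nrows = 3, ncols = 4,
  xmin = 0, xmax = 4, ymin = 0, ymax = 3,
  crs = "EPSG:4326"
)

# Give every cell a value (hypothetical values)
values(toy_raster) <- 1:12

# The raster holds one value per cell
ncell(toy_raster)
values(toy_raster)[, 1]

# A vector polygon covering the same area: only a shape, no values
toy_polygon <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(4, 0), c(4, 3), c(0, 3), c(0, 0)))),
  crs = 4326
)
toy_polygon
```

Any values attached to a vector object live in a separate attribute table, one row per shape, rather than one value per pixel.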

Point Data


a type of vector data

contains lat and long

can be stored as a csv
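
A minimal sketch of turning such a table into point data with sf. The site names and coordinates are hypothetical, and the sketch assumes the coordinates are WGS84 latitude/longitude (EPSG:4326).

```r
library(sf)

# Toy table, as it might look after reading a csv (hypothetical values)
sample_points <- data.frame(
  site_name = c("site_a", "site_b", "site_c"),
  long = c(-157.8, -156.3, -159.5),
  lat = c(21.3, 20.8, 22.0)
)

# Convert the table to spatial points
# coords takes x (long) first, then y (lat)
points_sf <- st_as_sf(
  sample_points,
  coords = c("long", "lat"),
  crs = 4326
)

# Inspect the result
head(points_sf)
```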






Shape files


Shape files are a common format for storing vector data

Shape files come with a set of different files

Typically they come in a single zipped bundle

Shape files


borders.zip

    - borders.cpg
    - borders.dbf
    - borders.prj
    - borders.shp
    - borders.shx

What are these different files?


File          Extension   Description
borders.shp   .shp        Main file: stores the actual shapes (geometry)
borders.shx   .shx        Index file: helps software locate features quickly
borders.dbf   .dbf        Attribute table: stores data about each shape (like country names, population, etc.)
borders.prj   .prj        Projection info: defines the coordinate reference system (CRS: for example WGS84 for latitude/longitude)
borders.cpg   .cpg        (Optional) Character encoding for text data

🌐 Coordinate Reference System


Some of the more common CRS are the World Geodetic System (WGS84), the North American Datum 1983 (NAD83), and Universal Transverse Mercator (UTM)


Overview of Coordinate Reference Systems in R

How to Read a Shapefile in R


You can load the shapefile using the sf package:


        library(sf)
        hawaii_borders <- st_read("path/to/folder/borders.shp")
          

  • R will automatically load all of the associated files
  • Make sure all files stay in the same folder and have the same base name (e.g. borders.*)
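
To see the companion files in action, the sketch below writes a toy shapefile to a temporary folder, lists the files that appear, and reads the layer back. The polygon, the folder, and the object names are assumptions for illustration; real border data would come from a download rather than being built by hand.

```r
library(sf)

# Toy polygon layer (hypothetical shape and name)
toy_borders <- st_sf(
  name = "toy_island",
  geometry = st_sfc(
    st_polygon(list(rbind(
      c(-158, 21), c(-157, 21), c(-157, 22), c(-158, 22), c(-158, 21)
    ))),
    crs = 4326
  )
)

# Fresh temporary folder standing in for the data folder
path_shape <- tempfile("toy_borders_")
dir.create(path_shape)
path_shp <- file.path(path_shape, "borders.shp")

# Writing the .shp also creates the companion files next to it
st_write(toy_borders, path_shp, quiet = TRUE)
list.files(path_shape)

# Reading only needs the .shp path; the other files are found by base name
borders_check <- st_read(path_shp, quiet = TRUE)
head(borders_check)
```

If the files arrive as a zipped bundle, base R's unzip() can extract them into the data folder before calling st_read().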

Viewing imported data


Note that instead of our usual View() function, we now use the base R plot() function to inspect our shapefile data


#####################
# Inspecting the data
#####################
head(hawaii_borders)
plot(hawaii_borders)
          

Best Practice


Our new best practice is to check which CRS is used

CRS has to match across different spatial objects


#######################
# STEP 4. CHECK / FIX CRS
#######################

crs(plastics, describe = TRUE)
crs(hawaii_borders, describe = TRUE)
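
If the two CRS do not match, one object has to be reprojected. The sketch below shows one possible way to do this with sf's st_crs() and st_transform(); it is not necessarily the approach used in the course script. The objects toy_plastics and toy_borders are made-up stand-ins for the real data.

```r
library(terra)
library(sf)

# Toy raster in WGS84, standing in for the plastics raster
toy_plastics <- rast(
  nrows = 10, ncols = 10,
  xmin = -180, xmax = -120, ymin = 10, ymax = 50,
  crs = "EPSG:4326"
)

# Toy vector point in NAD83, standing in for the border data
toy_borders <- st_sfc(st_point(c(-157.8, 21.3)), crs = 4269)

# Compare the two CRS
raster_crs <- st_crs(crs(toy_plastics))
st_crs(toy_borders) == raster_crs

# Fix: reproject the vector data to the CRS of the raster
toy_borders_fixed <- st_transform(toy_borders, raster_crs)

# Check again after the fix
st_crs(toy_borders_fixed) == raster_crs
```

Reprojecting the vector rather than the raster is a design choice in this sketch: a raster has to be resampled onto a new grid when it is reprojected, while vector coordinates can simply be converted.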
          

Shape files


Cartographic Boundary Files


We'll need to use Hawaii borders data from the Census:

https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html

🌐 Geographic Coordinate System


One of the most common systems is based on latitude and longitude:

- Latitude tells you how far north or south a place is from the Equator — the imaginary line that circles the Earth halfway between the poles
- Longitude tells you how far east or west a place is from the prime meridian, which runs from pole to pole through Greenwich, England

Map Projections


The Mercator projection preserves direction and is useful for navigation, but distances and areas are distorted, especially near the polar regions: https://en.wikipedia.org/wiki/Mercator_projection

Gall-Peters projection: https://en.wikipedia.org/wiki/Gall%E2%80%93Peters_projection

Other projections: https://en.wikipedia.org/wiki/List_of_map_projections

Is there a perfect map projection?

There is no perfect map projection, because representing the 3D surface of our spherical Earth on a 2D surface will always introduce distortion

Each projection must sacrifice accuracy in at least one of these areas:

- Shape
- Area
- Distance
- Direction
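
The area distortion of the Mercator projection can be measured directly. The sketch below compares the area of a 1 x 1 degree box on the globe with its area on a Web Mercator map (EPSG:3857), once at the Equator and once at 60 degrees north. The helper functions make_box() and area_ratio() are hypothetical names written for this example.

```r
library(sf)

# Hypothetical helper: a 1 x 1 degree box with its lower edge at lat_south
make_box <- function(lat_south) {
  corners <- rbind(
    c(0, lat_south),
    c(1, lat_south),
    c(1, lat_south + 1),
    c(0, lat_south + 1),
    c(0, lat_south)
  )
  st_sfc(st_polygon(list(corners)), crs = 4326)
}

# Hypothetical helper: area on the Mercator map divided by area on the globe
area_ratio <- function(box) {
  globe_area <- as.numeric(st_area(box))
  map_area <- as.numeric(st_area(st_transform(box, 3857)))
  map_area / globe_area
}

# Compare a box at the Equator with a box at 60 degrees north
ratio_equator <- area_ratio(make_box(0))
ratio_north <- area_ratio(make_box(60))

ratio_equator
ratio_north
```

The ratio should stay close to 1 at the Equator and grow to around 4 at 60 degrees north, which is why high-latitude regions look so large on Mercator maps.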

Tracking

Plastics










Tethered caps


Mandatory in the European Union since July 2024

https://en.wikipedia.org/wiki/Bottle_cap#Tethered_Caps

EU restrictions on certain single-use plastics

Great Pacific Garbage Patch


The aim of this project is to locate the Great Pacific Garbage Patch

It is an accumulation of marine debris, primarily consisting of plastics

This garbage patch is located somewhere in the North Pacific Ocean

https://en.wikipedia.org/wiki/Great_Pacific_Garbage_Patch

Yikes!


The Great Pacific Garbage Patch poses significant environmental threats

Animals can ingest the plastics, which harms marine life

Plastics introduce harmful chemicals into the marine food chain

Animals can get tangled up in the plastics and die

How big is it?


The patch is estimated to be larger than the size of Texas, though its exact size and boundaries can vary due to factors such as wind and ocean currents

We are going to see if we can identify its size and location on June 1, 2017, using NASA estimates derived from satellite data

Microplastics


The estimates of microplastic concentrations come from NASA's CYGNSS project


Scientists Use NASA Satellite Data to Track Ocean Microplastics From Space: https://www.nasa.gov/centers-and-facilities/goddard/scientists-use-nasa-satellite-data-to-track-ocean-microplastics-from-space/

Tracking Plastics


Emergence of a neopelagic community through the establishment of coastal species on the high seas


Final Result

Interpreting the map


The varying shades of red illustrate the concentration of microplastics

There are about 4 million microplastic particles (about 1mm in size) per square kilometer in the worst spots

Note: this data is not an aerial photograph or a satellite image as you might see from space

Workplan / Checklist


1. Clean up environment

2. Load required libraries

3. Import raster plastics data

4. Import vector border data

5. Check the Coordinate Reference System

6. Plot our map with ggplot()

7. Export plot with ggsave()
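
The checklist above can be sketched as one short script. Everything named toy_ below is made-up data built in memory so the sketch runs without any downloads; a real script would import the plastics raster and the border shapefile from the data folder instead. The output file name and the color scale are also assumptions.

```r
# 1. Clean up environment
rm(list = ls())

# 2. Load required libraries
library(terra)
library(tidyterra)
library(sf)
library(ggplot2)

# 3. Import raster plastics data
# (assumption: a toy raster with made-up values replaces the real file)
toy_plastics <- rast(
  nrows = 20, ncols = 30,
  xmin = -180, xmax = -120, ymin = 10, ymax = 50,
  crs = "EPSG:4326"
)
values(toy_plastics) <- seq_len(ncell(toy_plastics))
names(toy_plastics) <- "concentration"

# 4. Import vector border data
# (assumption: a toy square replaces the real borders)
toy_borders <- st_sf(
  name = "toy_island",
  geometry = st_sfc(
    st_polygon(list(rbind(
      c(-160, 19), c(-155, 19), c(-155, 23), c(-160, 23), c(-160, 19)
    ))),
    crs = 4326
  )
)

# 5. Check the Coordinate Reference System
crs(toy_plastics, describe = TRUE)
st_crs(toy_borders)

# 6. Plot our map with ggplot()
plastics_map <- ggplot() +
  geom_spatraster(data = toy_plastics) +
  geom_sf(data = toy_borders, fill = NA, color = "black") +
  scale_fill_gradient(low = "white", high = "red", na.value = "transparent") +
  labs(fill = "Toy concentration")

# 7. Export plot with ggsave()
path_plot <- file.path(tempdir(), "toy_plastics_map.png")
ggsave(path_plot, plot = plastics_map, width = 8, height = 5)
```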

R Spatial 1: Assignment 11


Will be published on Canvas today

The assignment is due this Friday

Due Date: Friday May 23, 2025 at 11:59 pm PT