ENVIRON-175

Programming with Big Environmental Datasets



Gleb Satyukov
Senior Research Engineer | Data Science Instructor


Join Our Slack Community

click here to join

Slides


R Basics 1: https://environ-175.com/basics/1

R Basics 2: https://environ-175.com/basics/2

R Basics 3: https://environ-175.com/basics/3

R Basics 4: https://environ-175.com/basics/4

R Basics 5: https://environ-175.com/basics/5

Slides


R Advanced 1: https://environ-175.com/advanced/1

R Advanced 2: https://environ-175.com/advanced/2

R Advanced 3: https://environ-175.com/advanced/3

R Advanced 4: https://environ-175.com/advanced/4

R Advanced 5: https://environ-175.com/advanced/5

Slides


R Spatial 1: https://environ-175.com/spatial/1

R Spatial 2: https://environ-175.com/spatial/2

R Spatial 3: https://environ-175.com/spatial/3

R Spatial 4: https://environ-175.com/spatial/4

R Spatial 5: https://environ-175.com/spatial/5

Slido


https://app.sli.do/event/mafjaRPZwqeoqcL5xmTMoP

Schedule


Updated Syllabus


Updated additional resources list!

https://environ-175.com/syllabus

Agenda


🔥 File Paths

🔥 Saving in the Cloud

🔥 Creating New Variables

🔥 Function Signature Decomposition

🔥 File Formats: TXT vs CSV vs TSV

🔥 Mutate Function and more on ggplot

🔥 R Basics 3: Assignment Stuff

Filepaths


Each OS has its own file system (FS) conventions

UNIX-based systems use forward slashes ('/')

Windows uses backslashes ('\')

Paths


Absolute vs Relative paths

Example:



/Users/gleb/Documents/Environ-175/Basics-2/assignment-2.R
          

Example:



/Users/gleb/Documents/Environ-175/Basics-1/assignment-1.R
          

/Users/gleb/Documents/Environ-175/Basics-2/assignment-2.R
          

/Users/gleb/Documents/Environ-175/Basics-3/assignment-3.R
          

/Users/gleb/Documents/Environ-175/Basics-4/assignment-4.R
          

/Users/gleb/Documents/Environ-175/Basics-5/assignment-5.R
          

Example:



C:/Users/gleb/Documents/Environ-175/Basics-1/assignment-1.R
          

C:/Users/gleb/Documents/Environ-175/Basics-2/assignment-2.R
          

C:/Users/gleb/Documents/Environ-175/Basics-3/assignment-3.R
          

C:/Users/gleb/Documents/Environ-175/Basics-4/assignment-4.R
          

C:/Users/gleb/Documents/Environ-175/Basics-5/assignment-5.R
          

Examples on Windows:


Preferred way in R on Windows:


read_csv("C:/Users/johndoe/Documents/Environ-175/Basics-1/air_quality.csv")
          

Or double backslashes (less common):


read_csv("C:\\Users\\johndoe\\Documents\\Environ-175\\Basics-1\\air_quality.csv")
          

Examples on Unix-Based machines:


Example file path on macOS (Linux systems such as Ubuntu/CentOS use /home/ instead of /Users/):



read_csv("/Users/janedoe/Documents/Environ-175/Basics-1/air-quality.csv")
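A minimal sketch of two base R helpers that keep paths consistent across operating systems. The folder and file names are reused from the examples above purely for illustration; nothing here needs the files to exist.

```r
# file.path() joins the pieces with forward slashes on every OS
course_folder <- file.path("Documents", "Environ-175", "Basics-1")
data_path <- file.path(course_folder, "air_quality.csv")
print(data_path)

# In an R string, "\\" stands for ONE backslash character
windows_style <- "C:\\Users\\johndoe"
number_of_characters <- nchar(windows_style)

# Swap backslashes for forward slashes (fixed = TRUE means "no regex")
forward_style <- gsub("\\", "/", windows_style, fixed = TRUE)
print(forward_style)
```

Building paths with file.path() means you never have to type a slash by hand.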
          

Absolute Paths


Start at the root of the filesystem

The root is written as / (on Windows, a drive letter such as C:/)

Relative Paths


🔥 Start at the "current" folder

🔥 In R, the "current" folder is the working directory (check it with getwd())

🔥 This is often, but not always, the folder of the file being executed right now
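A small sketch of how R resolves a relative path, assuming a throwaway folder and file created just for this demo (the names "Environ-175-demo" and "air_quality.csv" are made up here). R looks up relative paths starting from the working directory.

```r
# Create a temporary folder with one small file in it
course_folder <- file.path(tempdir(), "Environ-175-demo")
dir.create(course_folder, showWarnings = FALSE)
writeLines("station,pm25", file.path(course_folder, "air_quality.csv"))

# setwd() changes the working directory and returns the previous one
previous_folder <- setwd(course_folder)

# Relative paths are looked up starting from getwd()
found_with_relative_path <- file.exists("air_quality.csv")
found_with_dot_slash <- file.exists("./air_quality.csv")

# Go back to where we started
setwd(previous_folder)
```

If a relative path "cannot be found", checking getwd() is usually the first thing to try.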

Shortcuts


.          is the current directory

..       is one level up ↑

Shortcuts


./          is the current directory

../       is one level up ↑

❌ NOT OK (In This Class)


a/b

a//b

a///////b

a/./b

a/../a/b
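All of the spellings above point to the same folder, which is why the messy ones are discouraged. A sketch that checks this with normalizePath(), assuming a macOS/Linux machine and a temporary folder tree made up for the demo:

```r
# Build a small folder tree a/b inside a temporary directory
demo_root <- file.path(tempdir(), "path-demo")
dir.create(file.path(demo_root, "a", "b"),
           recursive = TRUE, showWarnings = FALSE)

messy_paths <- c("a/b", "a//b", "a///////b", "a/./b", "a/../a/b")

# normalizePath() rewrites each existing path in its canonical absolute form
clean_paths <- normalizePath(file.path(demo_root, messy_paths))

# Count how many different folders the five spellings refer to
number_of_distinct_folders <- length(unique(clean_paths))
print(number_of_distinct_folders)
```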

When In Doubt


🔥 Absolute paths

🔥 Easier to understand

🔥 Explicit is better than implicit
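One way to make a relative path explicit is to turn it into an absolute one. A minimal sketch using base R; the "Basics-1/air_quality.csv" part is only an illustrative name and the file does not need to exist.

```r
# "." is the current (working) directory; normalizePath() spells it out in full
absolute_here <- normalizePath(".", winslash = "/")

# It matches the working directory that getwd() reports
same_as_working_directory <-
  identical(absolute_here, normalizePath(getwd(), winslash = "/"))

# Build a full path from the absolute starting point
absolute_data_path <- file.path(absolute_here, "Basics-1", "air_quality.csv")
print(absolute_data_path)
```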

Above the Cloud










Options

Microsoft OneDrive

Google Drive

Dropbox

iCloud

...

NordLocker

BackBlaze

Proton

Cloud Reliance


It's better than nothing?

It will affect your file paths and folder structure

Choices: What do you save?


You can't back up the entire world

So which folders are you going to save?

Variables

Variable Names



first_name <- "Alice"
          


temperature_2025 <- 76.5
          

Variable Names


No abbreviations, please

Spell out whatever you are trying to say

Specificity


Make sure your variable names are specific

This way you avoid accidentally overwriting them


"There are only two hard things in computer science: cache invalidation and naming things." (Phil Karlton)

Example:


What does this code do?

(if you had to guess)


a1581f2a5 <- 20
bcc1f489b <- 35
cee4d0fe2 <- a1581f2a5 * bcc1f489b

print(cee4d0fe2)

          

Example:


What does this code do?



a <- 20
b <- 35
c <- a * b

print(c)

          

Example:


What does this code do?



hourly_wages <- 20
hours_worked <- 35
total_wages <- hourly_wages * hours_worked

print(total_wages)
          

Really Bad Example:


line 25:


data <- read_csv("/Users/Jim/Docs/ENV175/Basics-2/nlsy.csv")
          

line 26:


data <- collap(data, child_gpa ~ mom_hsgrad, FUN=fmean)
          

REALLY NOT OK
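A sketch of one possible fix: give the raw data and the summary their own specific names, so neither overwrites the other. The tiny data frame below is invented for illustration (it is not the real NLSY file), and base R's aggregate() stands in for collap().

```r
# Made-up stand-in for the raw survey data
nlsy_raw <- data.frame(
  mom_hsgrad = c(0, 0, 1, 1),
  child_gpa  = c(2.5, 3.0, 3.2, 3.6)
)

# The summary gets its own name, so nlsy_raw is still available afterwards
gpa_by_mom_hsgrad <- aggregate(child_gpa ~ mom_hsgrad,
                               data = nlsy_raw, FUN = mean)

print(gpa_by_mom_hsgrad)
print(nrow(nlsy_raw)) # the raw data still has all 4 rows
```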

Slido

What kind of variable names are best?

Clearing your environment

Clearing "Programmatically"

How do you clear the environment?


rm(list=ls())
          

How do you clear the console?


cat("\014")
          

How do you clear any plots?


dev.off()
          

Functions




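# ────────────────────────────────────────────────
# Clears Out Everything
# ────────────────────────────────────────────────
clear_all <- function() {
  # ls() inside a function only sees the function's own (empty) environment,
  # so point both ls() and rm() at the global environment explicitly
  rm(list = ls(envir = globalenv()), envir = globalenv()) # clears the environment
  cat('\014') # clears the console
  graphics.off() # clears the plots (dev.off() errors when no plot is open)
}

# calling the function
clear_all()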
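Inside a function, ls() and rm() work on the function's own environment unless told otherwise. A minimal sketch of that idea, using a scratch environment so nothing in your real workspace gets deleted; clear_environment() and scratch are names made up for this example.

```r
# Removes every object from the environment it is given
clear_environment <- function(target_environment = globalenv()) {
  rm(list = ls(envir = target_environment), envir = target_environment)
}

# A scratch environment standing in for the global one
scratch <- new.env()
assign("hourly_wages", 20, envir = scratch)
assign("hours_worked", 35, envir = scratch)

names_before <- ls(envir = scratch)
clear_environment(scratch)
names_after <- ls(envir = scratch)

print(names_before)
print(length(names_after))
```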

          

What Is a Function?


A function is a reusable block of code that does something specific.

The key words here are reusable and specific

It's like a recipe that you can use again and again! 👨‍🍳

Function Decomposition


my_function <- function(arg1, arg2 = "default") {
  # Do something with arg1 and arg2
}
          

my_function – the function name

arg1 – a required input

arg2 = "default" – an optional input with a default value
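The same anatomy in a small working function. This is only a sketch: convert_temperature() and its arguments are made up for illustration.

```r
# fahrenheit is required; digits is optional and defaults to 1
convert_temperature <- function(fahrenheit, digits = 1) {
  celsius <- (fahrenheit - 32) * 5 / 9
  round(celsius, digits)
}

# Using the default for digits
default_rounding <- convert_temperature(76.5)
print(default_rounding) # 24.7

# Overriding the default
whole_degrees <- convert_temperature(76.5, digits = 0)
print(whole_degrees) # 25
```

Calling convert_temperature() with no arguments at all fails, because fahrenheit has no default.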

What is a Function Signature?

It is the part of the function that tells you:

  • What the function is called
  • What inputs (arguments) are expected
  • What the function returns (if anything) ➡️

        mean(x, trim = 0, na.rm = FALSE)
          

This, for example, is the signature for the built-in mean() function.

For Example:


        mean(x, trim = 0, na.rm = FALSE) -> Numeric
          

This is the signature for the built-in mean() function, annotated with the type of value it returns.

It tells you:

  • What the function is called
  • What necessary inputs (arguments) it expects
  • What optional inputs (arguments) it expects
  • What it returns (most of the time)
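You can ask R for a signature instead of memorizing it. A short sketch using base R's formals() on mean.default (the method mean() uses for plain numbers), plus a quick look at what comes back:

```r
# The argument names, in order
argument_names <- names(formals(mean.default))
print(argument_names)

# The default value of the optional trim argument
default_trim <- formals(mean.default)$trim

# na.rm is optional too: leave it at FALSE and one NA spoils the result
average_with_gap <- mean(c(1, NA, 3))
average_without_gap <- mean(c(1, NA, 3), na.rm = TRUE)

# Even for whole-number input, the result is numeric rather than integer
return_type <- class(mean(c(1L, 2L, 4L)))
print(return_type)
```

R signatures do not state a return type, so checking with class() is a handy habit.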

Pure Functions

What Is a Pure Function?

A pure function is a function that:

  • Doesn't change anything outside itself (no side effects)
  • Always returns the same output for the same input

        add_numbers <- function(x, y) {
          return(x + y)
        }
          

This is a pure function: no surprises
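For contrast, a sketch of a function that is not pure. add_to_running_total() and running_total are names invented for this example; the <<- operator reaches outside the function and changes a variable there.

```r
# Pure: the result depends only on the inputs
add_numbers <- function(x, y) {
  return(x + y)
}

# Not pure: reads and changes a variable that lives outside the function
running_total <- 0
add_to_running_total <- function(x) {
  running_total <<- running_total + x # side effect
  return(running_total)
}

first_call <- add_to_running_total(5)
second_call <- add_to_running_total(5)

print(add_numbers(2, 3)) # 5, every time
print(first_call)        # 5
print(second_call)       # 10: same input, different output
```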

The mutate() Function in R

What Does mutate() Do?

mutate() adds new columns to a data frame

It comes from the dplyr package.


  library(dplyr)
  new_data <- mutate(mtcars, km_per_liter = mpg * 0.425144)
          

Here, we're converting miles per gallon (mpg) to kilometers per liter (km/L) 🚗

Try Yourself: Add a New Column

Let's add a new column that estimates fuel cost per 100 miles:


        mutate(mtcars,
               cost_per_100mi = (100 / mpg) * 4.50)
          
  • Assuming gas is $4.50 per gallon
  • We're using the car's MPG to calculate cost

Add Multiple Columns


        mutate(mtcars,
               km_per_liter = mpg * 0.425144,
               cost_per_100mi = (100 / mpg) * 4.50)
          

Each column can build on others you just created in the same mutate() call
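A sketch of that idea, assuming the dplyr package is installed. gallons_per_100mi and fuel_costs are names made up for this example; the second new column uses the first one right away.

```r
library(dplyr)

fuel_costs <- mutate(mtcars,
                     # step 1: how many gallons a 100-mile trip takes
                     gallons_per_100mi = 100 / mpg,
                     # step 2: uses the column created in step 1
                     cost_per_100mi = gallons_per_100mi * 4.50)

head(fuel_costs[, c("mpg", "gallons_per_100mi", "cost_per_100mi")])
```

The order matters: a column can only be used after the line that creates it.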

🌽 What's a Bushel?

A bushel is an old-school unit of measurement used mainly for dry stuff like grain, fruit, or veggies.

📏 It's equal to about 9.3 US liquid gallons (around 35 liters).

That's like 2 full backpacks' worth of apples! 🍎🍎

Bushel

Why Should You Care?

  • Farmers use it to sell crops 🌾
  • Food data often reports in bushels 📊
  • It's part of understanding environmental and agricultural datasets 🧑‍🌾

So when we say "U.S. corn yield was 170 bushels per acre"… now you know! 🌽
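A quick sketch of the conversion in R, treating a bushel as a volume. The rounded constants (35.239 liters per US bushel, 3.785 liters per US liquid gallon) are assumptions typed in by hand, not values from a dataset.

```r
liters_per_bushel <- 35.239 # US bushel (volume), rounded
liters_per_gallon <- 3.785  # US liquid gallon, rounded

# How many gallons fit in one bushel?
gallons_per_bushel <- liters_per_bushel / liters_per_gallon
print(round(gallons_per_bushel, 1)) # 9.3

# The corn yield from the slide, expressed in liters per acre
corn_yield_bushels_per_acre <- 170
corn_yield_liters_per_acre <- corn_yield_bushels_per_acre * liters_per_bushel
print(corn_yield_liters_per_acre)
```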

R Basics 3: Assignment


Already published

The assignment is due tonight

Tuesday April 15, 2025 at 11:59 pm PT