Getting Started with the R Programming Language

R is a versatile, open source programming language that was specifically designed for data analysis. R is extremely useful for data management, statistics, and data analysis.

This tutorial should be seen more as a reference on the basics of R than as a lesson for learning to use R. We define many of the basics here; however, this can be overwhelming if you are brand new to R.

Learning Objectives

After completing this tutorial, you will be able to:

  • Use basic R syntax
  • Explain the concepts of objects and assignment
  • Explain the concepts of vector and data types
  • Describe why you would or would not use factors
  • Use a few basic functions

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.

The Very Basics of R

R is a versatile, open source programming language that was specifically designed for data analysis. R is extremely useful for data management, statistics, and data analysis.

**Cool Fact:** R was inspired by the programming language S.

R is:

  • Open source software under a GNU General Public License (GPL).
  • A good alternative to commercial analysis tools. R has over 5,000 user-contributed packages (as of 2014) and is widely used both in academia and industry.
  • Available on all platforms.
  • Not just for statistics, but also general purpose programming.
  • Supported by a large and growing community of peers.

Introduction to R

You can use R alone or with a user interface like RStudio to write your code. Some people prefer RStudio, as it provides a graphical interface where you can see which objects have been created, and you can also set options like your working directory using menu options.

Learn more about RStudio with their online learning materials.

We want to use R to create code and a workflow that is more reproducible. We can document everything that we do. Our end goal is not just to "do stuff" but to do it in a way that anyone can easily and exactly replicate our workflow and results -- this includes ourselves in 3 months when the paper reviews come back!

Code & Comments in R

Everything you type into an R script is code, unless you mark it otherwise.

Anything to the right of a # is ignored by R. Use these comments within the code to describe what your code is doing. Comment liberally in your R scripts. This will help you when you return to them and will also help others understand your scripts and analyses.

# this is a comment. It allows text that is ignored by the program.
# for clean, easy to read comments, use a space between the # and text. 

# there is a line of code below this comment
a <- 1 + 2

Basic Operations in R

Let's take a few moments to play with R. You can get output from R simply by typing in math:

# basic math
3 + 5

## [1] 8

12 / 7

## [1] 1.714286

or by typing words, with the command writeLines(). Words that you want to be recognized as text (as opposed to a field name or other text that signifies an object) must be enclosed within quotes.

# have R write words

writeLines("Hello World")

## Hello World

We can assign our results to an object and name the object. Object names cannot contain spaces.

# assigning values to objects 
secondsPerHour <- 60 * 60

hoursPerYear <- 365 * 24


# object names can't contain spaces.  Use a period, underscore, or camelCase to 
# create longer names
temp_HARV <- 90
par.OSBS <- 180

We can then return the value of an object we created.

secondsPerHour

## [1] 3600

hoursPerYear

## [1] 8760

Or create a new object with existing ones.

secondsPerYear <- secondsPerHour * hoursPerYear

secondsPerYear

## [1] 31536000

The result of the operation on the right hand side of <- is assigned to an object with the name specified on the left hand side of <-. The result could be any type of R object, including your own functions (see the Build & Work With Functions in R tutorial).

Assignment Operator: Drop the Equals Sign

The assignment operator is <-. It assigns values on the right to objects on the left. It is similar to = but there are some subtle differences. Learn to use <- as it is good programming practice. Using = in place of <- can lead to issues down the line.

# this is preferred syntax
a <- 1 + 2 

# this is NOT preferred syntax
a = 1 + 2 

**Typing Tip:** If you are using RStudio, you can use a keyboard shortcut for the assignment operator: **Windows/Linux: "Alt" + "-"** or **Mac: "Option" + "-"**.

List All Objects in the Environment

Some functions are the same as in other languages. These might be familiar from the command line.

  • ls(): to list objects in your current environment.
  • rm(): remove objects from your current environment.

Now try them in the console.

# assign value "5" to object "x"
x <- 5
ls()
    
# remove x
rm(x)

# what is left?
ls()
    
# remove all objects
rm(list = ls())

ls()

Using rm(list = ls()), you combine several functions to remove all objects. If you type x in the console now, you will get Error: object 'x' not found.

Data Types and Structures

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those. These are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.

First, everything in R is an object. But there are different types of objects. One of the basic differences is in the data structures, which are the different ways data are stored.

R has many different data structures. These include:

  • atomic vector
  • list
  • matrix
  • data frame
  • array

These data structures vary by the dimensionality of the data and whether they can handle data elements of a single type (homogeneous) or multiple types (heterogeneous).

Dimensions | Homogeneous   | Heterogeneous
1-D        | atomic vector | list
2-D        | matrix        | data frame
n-D        | array         |
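
A quick, illustrative way to check these structure classes in the console:

# the class of each basic data structure
class(c(1, 2, 3))             ## "numeric" (an atomic vector)
class(list(1, "a"))           ## "list"
class(matrix(1:4, nrow = 2))  ## "matrix" "array"
class(data.frame(x = 1:2))    ## "data.frame"
class(array(1:8, c(2, 2, 2))) ## "array"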

Vectors

A vector is the most common and basic data structure in R and is the workhorse of R. Technically, vectors can be one of two types:

  • atomic vectors
  • lists

although the term "vector" most commonly refers to the atomic types, not to lists.

Atomic Vectors

R has 6 atomic vector types.

  • character
  • numeric (real or decimal)
  • integer
  • logical
  • complex
  • raw (not discussed in this tutorial)

By atomic, we mean the vector only holds data of a single type.

  • character: "a", "swc"
  • numeric: 2, 15.5
  • integer: 2L (the L tells R to store this as an integer)
  • logical: TRUE, FALSE
  • complex: 1+4i (complex numbers with real and imaginary parts)

R provides many functions to examine features of vectors and other objects, for example

  1. typeof() - what is it?
  2. length() - how long is it? What about two dimensional objects?
  3. attributes() - does it have any metadata?

Let's look at some examples:

# assign the word "april" to x
x <- "april"

# return the class of the object
class(x)

## [1] "character"

# does x have any attributes?
attributes(x)

## NULL

# assign all integers 1 to 10 as an atomic vector to the object y
y <- 1:10
y

##  [1]  1  2  3  4  5  6  7  8  9 10

class(y)

## [1] "integer"

# how many values does the vector y contain?
length(y)

## [1] 10

# coerce the integer vector y to a numeric vector
# store the result in the object z
z <- as.numeric(y)
z

##  [1]  1  2  3  4  5  6  7  8  9 10

class(z)

## [1] "numeric"

A vector is a collection of elements that are most commonly character, logical, integer or numeric.

You can create an empty vector with vector(). (By default the mode is logical. You can be more explicit as shown in the examples below.) It is more common to use direct constructors such as character(), numeric(), etc.

x <- vector()
    
# Create vector with a length and type
vector("character", length = 10)

##  [1] "" "" "" "" "" "" "" "" "" ""

# create character vector with length of 5
character(5)

## [1] "" "" "" "" ""

# numeric vector length=5
numeric(5)

## [1] 0 0 0 0 0

# logical vector length=5
logical(5)

## [1] FALSE FALSE FALSE FALSE FALSE

# create a list or vector with combine `c()`
# this is the function used to create vectors and lists most of the time
x <- c(1, 2, 3)
x

## [1] 1 2 3

length(x)

## [1] 3

class(x)

## [1] "numeric"

x is a numeric vector. These are the most common kind. They are numeric objects and are treated as double precision real numbers (they can store decimal points). To explicitly create integers (no decimal points), add an L to each (or coerce to the integer type using as.integer()).

# a numeric vector with integers (L)
x1 <- c(1L, 2L, 3L)
x1

## [1] 1 2 3

class(x1)

## [1] "integer"

# or using as.integer()
x2 <- as.integer(x)
class(x2)

## [1] "integer"

You can also have logical vectors.

# logical vector 
y <- c(TRUE, TRUE, FALSE, FALSE)
y

## [1]  TRUE  TRUE FALSE FALSE

class(y)

## [1] "logical"

Finally, you can have character vectors.

# character vector
z <- c("Sarah", "Tracy", "Jon")
z

## [1] "Sarah" "Tracy" "Jon"

# what class is it?
class(z)

## [1] "character"

# how many elements does it contain?
length(z)

## [1] 3

# what is the structure?
str(z)

##  chr [1:3] "Sarah" "Tracy" "Jon"

You can also add elements to a vector or list:

# c function combines z and "Annette" into a single vector
# store result back to z
z <- c(z, "Annette")
z

## [1] "Sarah"   "Tracy"   "Jon"     "Annette"

More examples of how to create vectors:

  • x <- c(0.5, 0.7)
  • x <- c(TRUE, FALSE)
  • x <- c("a", "b", "c", "d", "e")
  • x <- 9:100
  • x <- c(1 + (0 + 0i), 2 + (0 + 4i))

You can also create vectors as a sequence of numbers.

# simple series 
1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

# use seq() 'sequence'
seq(10)

##  [1]  1  2  3  4  5  6  7  8  9 10

# specify values for seq()
seq(from = 1, to = 10, by = 0.1)

##  [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4  2.5
## [17]  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.0  4.1
## [33]  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4  5.5  5.6  5.7
## [49]  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9  7.0  7.1  7.2  7.3
## [65]  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9
## [81]  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9 10.0

You can also get non-numeric outputs.

  • Inf is infinity. You can have either positive or negative infinity.
  • NaN means Not a Number. It's an undefined value.

Try it out in the console.

# infinity return
1/0

## [1] Inf

# non numeric return
0/0

## [1] NaN

Indexing

Vectors have positions; these positions are ordered and can be called using object[index].

# index
z[2]

## [1] "Tracy"

# to call multiple items (a subset of our data), we can put a vector of which 
# items we want in the brackets
group1 <- c(1, 4)
z[group1]

## [1] "Sarah"   "Annette"

# this is especially useful with a sequence vector
z[1:3]

## [1] "Sarah" "Tracy" "Jon"

Objects can have attributes. Attributes are part of the object. These include:

  • names: the field or variable names within the object
  • dimnames: names for the dimensions (e.g., row and column names)
  • dim: the dimensions of the object (e.g., rows and columns of a matrix)
  • class: the class of the object
  • attributes: this contains metadata
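
For example, naming the elements of a vector creates a names attribute (a quick sketch using a throwaway vector v):

# element names are stored as an attribute of the vector
v <- c(a = 1, b = 2)
attributes(v)

## $names
## [1] "a" "b"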

You can also glean other attribute-like information such as length() (works on vectors and lists) or number of characters nchar() (for character strings).

# length of an object
length(1:10)

## [1] 10

length(x)

## [1] 3

# number of characters in a text string
nchar("NEON Data Skills")

## [1] 16

Heterogeneous Data - Mixing Types?

When you mix types, R will create a resulting vector that is the least common denominator: coercion moves toward the type that is easiest to coerce to (logical → integer → numeric → character).

Guess what the following do:

  • n <- c(1.7, "a")
  • o <- c(TRUE, 2)
  • p <- c("a", TRUE)

Were you correct?

n <- c(1.7, "a")
n

## [1] "1.7" "a"

o <- c(TRUE, 2)
o

## [1] 1 2

p <- c("a", TRUE)
p

## [1] "a"    "TRUE"

This is called implicit coercion. You can also coerce vectors explicitly using the as.<class_name>() functions.

# making values numeric
as.numeric("1")

## [1] 1

# make values character
as.character(1)

## [1] "1"

# make values factors
as.factor(c("male", "female"))

## [1] male   female
## Levels: female male

Matrix

In R, matrices are an extension of the numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions: the number of rows and columns.

# create an empty matrix that is 2x2
m <- matrix(nrow = 2, ncol = 2)
m

##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA

# what are the dimensions of m
dim(m)

## [1] 2 2

Matrices in R are by default filled column-wise. You can also use the byrow argument to specify how the matrix is filled.

# create a matrix. Notice R fills them by columns by default
m2 <- matrix(1:6, nrow = 2, ncol = 3)
m2

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# set the byrow argument to TRUE to fill by rows
m2_row <- matrix(c(1:6), nrow = 2, ncol = 3, byrow = TRUE)
m2_row

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Assigning dim() to a vector transforms it into a matrix; below, dim(m3) <- c(2, 5) turns a 10-element vector into a matrix with 2 rows and 5 columns. Another way to shape your matrix is to bind columns with cbind() or rows with rbind().

# create vector with 1:10
m3 <- 1:10
m3

##  [1]  1  2  3  4  5  6  7  8  9 10

class(m3)

## [1] "integer"

# set the dimensions so it becomes a matrix
dim(m3) <- c(2, 5)
m3

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

class(m3)

## [1] "matrix" "array"

# create matrix from two vectors
x <- 1:3
y <- 10:12

# cbind will bind the two by column
cbind(x, y)

##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12

# rbind will bind the two by row
rbind(x, y)

##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12

Matrix Indexing

We can call elements of a matrix with square brackets just like a vector, except now we must specify a row and a column.

z <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 3, ncol = 2)
z

##      [,1] [,2]
## [1,] "a"  "d" 
## [2,] "b"  "e" 
## [3,] "c"  "f"

# call element in the third row, second column
z[3, 2]

## [1] "f"

# leaving the row blank will return contents of the whole column
# note: the column's contents are displayed as a vector (horizontally)
z[, 2]

## [1] "d" "e" "f"

class(z[, 2])

## [1] "character"

# return the contents of the second row
z[2, ]

## [1] "b" "e"

List

In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be of any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.

A list is different from an atomic vector because each element can be a different type -- it can contain heterogeneous data types.

Create lists using list() or coerce other objects using as.list(). An empty list of the required length can be created using vector().

x <- list(1, "a", TRUE, 1 + (0 + 4i))
x

## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 1+4i

class(x)

## [1] "list"

x <- vector("list", length = 5)  ## empty list
length(x)

## [1] 5

#call the 1st element of list x
x[[1]]

## NULL

x <- 1:10
x <- as.list(x)

Questions:

  1. What is the class of x[1]?
  2. What about x[[1]]?

Try it out.

We can also give the elements of our list names, then call those elements with the $ operator.

# note 'iris' is an example data frame included with R
# the head() function simply calls the first 6 rows of the data frame
xlist <- list(a = "Karthik Ram", b = 1:10, data = head(iris))
xlist

## $a
## [1] "Karthik Ram"
## 
## $b
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $data
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

# see names of our list elements
names(xlist)

## [1] "a"    "b"    "data"

# call individual elements by name
xlist$a

## [1] "Karthik Ram"

xlist$b

##  [1]  1  2  3  4  5  6  7  8  9 10

xlist$data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Questions:

  1. What is the length of this object? What about its structure?

A few more notes on lists (see the indexing example below):

  • Lists can be extremely useful inside functions. You can “staple” together lots of different kinds of results into a single object that a function can return.
  • A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
  • Elements are indexed by double brackets. Single brackets will still return a(nother) list.
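
For example, compare single- and double-bracket indexing on the xlist object from above:

# single brackets return a list containing the selected element
class(xlist[1])

## [1] "list"

# double brackets return the element itself
class(xlist[[1]])

## [1] "character"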

Factors

Factors are special vectors that represent categorical data. Factors can be ordered or unordered and are important for modelling functions such as lm() and glm() and also in plot() methods. Once created, factors can only contain a pre-defined set of values, known as levels.

Factors are stored as integers that have labels associated with the unique integers. While factors look (and often behave) like character vectors, they are actually integers under the hood. You need to be careful when treating them like strings. Some string methods will coerce factors to strings, while others will throw an error.

  • Sometimes factors can be left unordered. Example: male, female.
  • Other times you might want factors to be ordered (or ranked). Example: low, medium, high.
  • Underneath, the levels are represented by the numbers 1, 2, 3.
  • Factors are better than simple integer labels because they are self-describing: male and female is more descriptive than 1s and 2s. This is helpful when there is no additional metadata.

Which is male? 1 or 2? You wouldn't be able to tell with just integer data. Factors have this information built in.

Factors can be created with factor(). Input is often a character vector.

x <- factor(c("yes", "no", "no", "yes", "yes"))
x

## [1] yes no  no  yes yes
## Levels: no yes

table(x) will return a frequency table counting the number of elements in each level.
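
For example, with the factor x defined above:

# frequency table of the factor's levels
table(x)

## x
##  no yes 
##   2   3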

If you need to convert a factor to a character vector, simply use

as.character(x)

## [1] "yes" "no"  "no"  "yes" "yes"

To see the integer version of the factor levels, use as.numeric():

as.numeric(x)

## [1] 2 1 1 2 2

To convert a factor to a numeric vector, go via a character vector. Compare:

fac <- factor(c(1, 5, 5, 10, 2, 2, 2))

levels(fac)       ## returns just the four levels present in our factor

## [1] "1"  "2"  "5"  "10"

as.numeric(fac)   ## wrong! returns the assigned integer for each level

## [1] 1 3 3 4 2 2 2

                ## the integer corresponds to the position of that number in levels(fac)

as.character(fac) ## returns a character string of each number

## [1] "1"  "5"  "5"  "10" "2"  "2"  "2"

as.numeric(as.character(fac)) ## coerce the character strings to numbers

## [1]  1  5  5 10  2  2  2

In modeling functions, it is important to know what the 'baseline' level is. This is the first level of the factor, and by default the ordering of levels is determined by the alphanumerical order of the elements. You can change this by specifying the levels (another option is to use the function relevel()).

# the default result (because N comes before Y alphabetically)
x <- factor(c("yes", "no", "yes"))
x

## [1] yes no  yes
## Levels: no yes

# now let's try again, this time specifying the order of our levels
x <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
x

## [1] yes no  yes
## Levels: yes no

Data Frames

A data frame is a very important data type in R. It's pretty much the de facto data structure for most tabular data and what we use for statistics.

  • A data frame is a special type of list where every element of the list has the same length.
  • Data frames can have additional attributes such as rownames(), which can be useful for annotating data, like subject_id or sample_id. But most of the time they are not used.

Some additional information on data frames:

  • Usually created by read.csv() and read.table().
  • Can convert to a matrix with data.matrix() (preferred) or as.matrix().
  • Coercion will be forced and is not always what you expect.
  • Can also create with data.frame() function.
  • Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
  • Rownames are usually 1, 2, ..., n.

Manually Create Data Frames

You can manually create a data frame using data.frame.

# create a dataframe
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat

##    id  x  y
## 1   a  1 11
## 2   b  2 12
## 3   c  3 13
## 4   d  4 14
## 5   e  5 15
## 6   f  6 16
## 7   g  7 17
## 8   h  8 18
## 9   i  9 19
## 10  j 10 20

Useful Data Frame Functions

  • head() - shows first 6 rows
  • tail() - show last 6 rows
  • dim() - returns the dimensions
  • nrow() - number of rows
  • ncol() - number of columns
  • str() - structure of each column
  • names() - shows the names attribute for a data frame, which gives the column names.

See that it is actually a special type of list:

list() 

## list()

is.list(iris)

## [1] TRUE

class(iris)

## [1] "data.frame"

Instead of a list of single items, a data frame is a list of vectors!

# see the class of a single variable column within iris: "Sepal.Length"
class(iris$Sepal.Length)

## [1] "numeric"

A recap of the different data types

Dimensions | Homogeneous   | Heterogeneous
1-D        | atomic vector | list
2-D        | matrix        | data frame
n-D        | array         |

Functions

A function is an R object that takes inputs to perform a task. Functions take in information and may return desired outputs.

output <- name_of_function(inputs)

# create a vector of the numbers 1 to 10
x <- 1:10 

# the sum of all x
y <- sum(x)
y

## [1] 55

Help

All functions come with a help screen. It is critical that you learn to read the help screens, since they provide important information on what the function does, how it works, and usually example code at the very bottom. You can open the help page for a known function with help(function) or ?function, and search all help files with ??function.

# call up a help search
help.start()

# help (documentation) for a package
??ggplot2

# help for a function
?sum

You can't ever learn all of R as it is ever changing with new packages and new tools, but once you have the basics and know how to find help to do the things that you want to do, you'll be able to use R in your science.

Sample Data

R comes with sample datasets. You will often find these used as the data sets in documentation files or in responses to inquiries on public forums like StackOverflow. To see all available sample datasets, you can type data() into the console.
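
For example:

# list the sample datasets available in your R installation
data()

# preview one of the built-in datasets
head(mtcars)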

Packages in R

R comes with a set of functions or commands that perform particular sets of calculations. For example, in the equation 1+2, R knows that the "+" means to add the two numbers, 1 and 2 together. However, you can expand the capability of R by installing packages that contain suites of functions and compiled code that you can also use in your code.
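
For example, to install and then load a package (here using the widely used ggplot2 package as an illustration):

# install a package - this only needs to be done once per computer
install.packages("ggplot2")

# load the package into your current R session - do this once per session
library(ggplot2)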

The Relationship Between Raster Resolution, Spatial Extent & Number of Pixels

Learning Objectives:

After completing this activity, you will be able to:

  • Explain the key attributes required to work with raster data including: spatial extent, coordinate reference system and spatial resolution.
  • Describe what a spatial extent is and how it relates to resolution.
  • Explain the basics of coordinate reference systems.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • raster: install.packages("raster")
  • rgdal: install.packages("rgdal")

Data to Download

NEON Teaching Data Subset: Field Site Spatial Data

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

The LiDAR and imagery data used to create the rasters in this dataset were collected over the San Joaquin field site located in California (NEON Domain 17) and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON website.

This data download contains several files used in related tutorials. The path to the files we will be using in this tutorial is: NEON-DS-Field-Site-Spatial-Data/SJER/.
You should set your working directory to the parent directory of the downloaded data to follow the code exactly.

This tutorial will overview the key attributes of a raster object, including spatial extent, resolution, and coordinate reference system. When working within a GIS application, these attributes are often handled for you. However, it is important to be familiar with them when working in non-GUI environments such as R or even Python.

In order to correctly spatially reference a raster that is not already georeferenced, you will also need to identify:

  1. The lower left hand corner coordinates of the raster.
  2. The number of columns and rows that the raster dataset contains.

Spatial Resolution

A raster consists of a series of pixels, each with the same dimensions and shape. In the case of rasters derived from airborne sensors, each pixel represents an area of space on the Earth's surface. The size of the area on the surface that each pixel covers is known as the spatial resolution of the image. For instance, an image that has a 1 m spatial resolution means that each pixel in the image represents a 1 m x 1 m area.

The spatial resolution of a raster refers to the size of each cell in meters. This size in turn relates to the area on the ground that the pixel represents. Source: National Ecological Observatory Network (NEON)

A raster at the same extent with more pixels will have a higher resolution (it looks more "crisp"). A raster that is stretched over the same extent with fewer pixels will look more blurry and will be of lower resolution. Source: National Ecological Observatory Network (NEON)

Load the Data

Let's open up a raster in R to see how the attributes are stored. We are going to work with a Digital Terrain Model from the San Joaquin Experimental Range in California.

# load packages 
library(raster)  
library(rgdal)

# set working directory to data folder
#setwd("pathToDirHere")
wd <- ("~/Git/data/")
setwd(wd)

# Load raster in an R object called 'DEM'
DEM <- raster(paste0(wd, "NEON-DS-Field-Site-Spatial-Data/SJER/DigitalTerrainModel/SJER2013_DTM.tif"))  


# View raster attributes 
DEM

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : /Users/olearyd/Git/data/NEON-DS-Field-Site-Spatial-Data/SJER/DigitalTerrainModel/SJER2013_DTM.tif 
## names      : SJER2013_DTM

Note that this raster (in GeoTIFF format) already has an extent, resolution, and CRS defined. The resolution in both x and y directions is 1. The CRS tells us that the x,y units of the data are meters (m).
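
You can also pull these attributes out individually with accessor functions from the raster package (a quick check using the DEM object loaded above):

# resolution in the x and y directions
res(DEM)

## [1] 1 1

# the coordinate reference system
crs(DEM)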

Spatial Extent

The spatial extent of a raster represents the X,Y coordinates of the corners of the raster in geographic space. This information, in addition to the cell size or spatial resolution, tells the program how to place or render each pixel in 2-dimensional space. Tools like R, using supporting packages such as rgdal and the associated raster tools, have functions that allow you to view and define the extent of a new raster.

# View the extent of the raster
DEM@extent

## class      : Extent 
## xmin       : 254570 
## xmax       : 258869 
## ymin       : 4107302 
## ymax       : 4112362

If you double the extent value of a raster, the pixels will be stretched over the larger area, making it look more "blurry". Source: National Ecological Observatory Network (NEON)

Calculating Raster Extent

Extent and spatial resolution are closely connected. To calculate the extent of a raster, we first need the bottom left hand (X,Y) coordinate of the raster. In the case of the UTM coordinate system, which is in meters, we can then compute the raster's extent by adding the number of columns and rows, multiplied by the resolution (the pixel size), to that corner coordinate: xmax = xmin + (ncols * resolution) and ymax = ymin + (nrows * resolution).

To be located geographically, a raster's location needs to be defined in geographic space (i.e., on a spatial grid). The spatial extent defines the four corners of a raster within a given coordinate reference system. Source: National Ecological Observatory Network.

Let's explore that next, using a blank raster that we create.

# create a raster from the matrix - a "blank" raster of 4x4
myRaster1 <- raster(nrow=4, ncol=4)

# assign "data" to raster: 1 to n based on the number of cells in the raster
myRaster1[]<- 1:ncell(myRaster1)

# view attributes of the raster
myRaster1

## class      : RasterLayer 
## dimensions : 4, 4, 16  (nrow, ncol, ncell)
## resolution : 90, 45  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : 1, 16  (min, max)

# is the CRS defined?
myRaster1@crs

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

Wait, why is the CRS defined on this new raster? These are the default values for anything created with the raster() function when nothing else is specified.

Let's get back to looking at more attributes.

# what is the raster extent?
myRaster1@extent

## class      : Extent 
## xmin       : -180 
## xmax       : 180 
## ymin       : -90 
## ymax       : 90

# plot raster
plot(myRaster1, main="Raster with 16 pixels")

Here we see our raster with the values 1 to 16, one in each pixel.

We can resample the raster as well to adjust the resolution. If we want a higher resolution raster, we will apply a grid with more pixels within the same extent. If we want a lower resolution raster, we will apply a grid with fewer pixels within the same extent.

One way to do this is to create a raster of the resolution you want and then resample() your original raster. The resampling will be done for either nearest neighbor assignments (for categorical data) or bilinear interpolation (for numerical data).

## HIGHER RESOLUTION
# Create 32 pixel raster
myRaster2 <- raster(nrow=8, ncol=8)

# resample 16 pix raster with 32 pix raster
# use bilinear interpolation with our numeric data
myRaster2 <- resample(myRaster1, myRaster2, method='bilinear')

# notice new dimensions, resolution, & min/max 
myRaster2

## class      : RasterLayer 
## dimensions : 8, 8, 64  (nrow, ncol, ncell)
## resolution : 45, 22.5  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : -0.25, 17.25  (min, max)

# plot 
plot(myRaster2, main="Raster with 32 pixels")

## LOWER RESOLUTION
myRaster3 <- raster(nrow=2, ncol=2)
myRaster3 <- resample(myRaster1, myRaster3, method='bilinear')
myRaster3

## class      : RasterLayer 
## dimensions : 2, 2, 4  (nrow, ncol, ncell)
## resolution : 180, 90  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : 3.5, 13.5  (min, max)

plot(myRaster3, main="Raster with 4 pixels")

## SINGLE PIXEL RASTER
myRaster4 <- raster(nrow=1, ncol=1)
myRaster4 <- resample(myRaster1, myRaster4, method='bilinear')
myRaster4

## class      : RasterLayer 
## dimensions : 1, 1, 1  (nrow, ncol, ncell)
## resolution : 360, 180  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : 7.666667, 7.666667  (min, max)

plot(myRaster4, main="Raster with 1 pixel")

To more easily compare them, let's create a graphic layout with 4 rasters in it. Notice that each raster has the same extent but a different resolution, because each has a different number of pixels spread out over the same extent.

# change graphical parameter to 2x2 grid
par(mfrow=c(2,2))

# arrange plots in order you wish to see them
plot(myRaster2, main="Raster with 32 pixels")
plot(myRaster1, main="Raster with 16 pixels")
plot(myRaster3, main="Raster with 4 pixels")
plot(myRaster4, main="Raster with 1 pixel")

# change graphical parameter back to 1x1 
par(mfrow=c(1,1))

Extent & Coordinate Reference Systems

The X and Y min and max values relate to the coordinate system that the file is in; see below.

Coordinate Reference System & Projection Information

A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities. -- Wikipedia

The earth is round. This is not a new concept by any means; however, we need to remember this when we talk about coordinate reference systems associated with spatial data. When we make maps on paper or on a computer screen, we are moving from a 3-dimensional space (the globe) to 2 dimensions (our computer screens or a piece of paper). To keep this short, the projection of a dataset relates to how the data are "flattened" in geographic space so our human eyes and brains can make sense of the information in 2 dimensions.

The projection refers to the mathematical calculations performed to "flatten the data" into 2D space. The coordinate system refers to the x and y coordinate space that is associated with the projection used to flatten the data. If you have the same dataset saved in two different projections, the two files won't line up correctly when rendered together.

Maps of the United States in different projections. Notice the differences in shape associated with each different projection. These differences are a direct result of the calculations used to "flatten" the data onto a 2 dimensional map. Source: M. Corey, opennews.org

Read more about projections.

How Map Projections Can Fool the Eye

Check out this short video, by Buzzfeed, highlighting how map projections can make continents seem proportionally larger or smaller than they actually are!

What Makes Spatial Data Line Up On A Map?

There are lots of great resources that describe coordinate reference systems and projections in greater detail. However, for the purposes of this activity, what is important to understand is that data from the same location but saved in different projections will not line up in any GIS or other program. Thus it's important when working with spatial data in a program like R or Python to identify the coordinate reference system applied to the data, and to grab that information and retain it when you process / analyze the data.

For a library of CRS information: A great online library of CRS information.

CRS proj4 Strings

The rgdal package has all the common EPSG codes with proj4string built in. We can see them by creating an object with the function make_EPSG().

# make sure you loaded rgdal package at the top of your script

# create an object with all EPSG codes
epsg = make_EPSG()

# use View(epsg) to see the full table - doesn't render on website well
#View(epsg)

# View top 5 entries
head(epsg, 5)

##   code   note                                                   prj4
## 1 3819 HD1909         +proj=longlat +ellps=bessel +no_defs +type=crs
## 2 3821  TWD67        +proj=longlat +ellps=aust_SA +no_defs +type=crs
## 3 3822  TWD97 +proj=geocent +ellps=GRS80 +units=m +no_defs +type=crs
## 4 3823  TWD97          +proj=longlat +ellps=GRS80 +no_defs +type=crs
## 5 3824  TWD97          +proj=longlat +ellps=GRS80 +no_defs +type=crs
##   prj_method
## 1     (null)
## 2     (null)
## 3     (null)
## 4     (null)
## 5     (null)

Define the extent

In the above raster example, we created several simple raster objects in R. R defaulted to a global lat/long extent. We can define the exact extent that we need to use too.

Let's create a new raster with the same projection as our original DEM. We know that our data are in UTM zone 11N. For the sake of this exercise, let's say we want to create a raster with the lower left hand corner coordinate at:

  • xmin = 254570
  • ymin = 4107302

The resolution of this new raster will be 1 meter and we will be working in UTM (meters). First, let's set up the raster.

# create a 10 x 20 matrix with the values 1-8, recycled to fill all 200 cells.
newMatrix  <- (matrix(1:8, nrow = 10, ncol = 20))

# convert to raster
rasterNoProj <- raster(newMatrix)
rasterNoProj

## class      : RasterLayer 
## dimensions : 10, 20, 200  (nrow, ncol, ncell)
## resolution : 0.05, 0.1  (x, y)
## extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : layer 
## values     : 1, 8  (min, max)

Now we can define the new raster's extent by defining the lower left corner of the raster.

## Define the xmin and ymin (the lower left hand corner of the raster)

# 1. define xMin & yMin objects.
xMin = 254570
yMin = 4107302

# 2. grab the cols and rows for the raster using @ncols and @nrows
rasterNoProj@ncols

## [1] 20

rasterNoProj@nrows

## [1] 10

# 3. raster resolution
res <- 1.0

# 4. add the number of cols and rows (multiplied by the resolution) to the 
# x,y corner location; the result is the bounds of our raster extent. 
xMax <- xMin + (rasterNoProj@ncols * res)
yMax <- yMin + (rasterNoProj@nrows * res)

# 5. create a raster Extent object
rasExt <- extent(xMin,xMax,yMin,yMax)
rasExt

## class      : Extent 
## xmin       : 254570 
## xmax       : 254590 
## ymin       : 4107302 
## ymax       : 4107312

# 6. apply the extent to our raster
rasterNoProj@extent <- rasExt

# Did it work? 
rasterNoProj

## class      : RasterLayer 
## dimensions : 10, 20, 200  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 254590, 4107302, 4107312  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : layer 
## values     : 1, 8  (min, max)

# or view extent only
rasterNoProj@extent

## class      : Extent 
## xmin       : 254570 
## xmax       : 254590 
## ymin       : 4107302 
## ymax       : 4107312

Now we have an extent associated with our raster which places it in space!

# plot new raster
plot(rasterNoProj, main="Raster in UTM coordinates, 1 m resolution")

Notice that the coordinates show up on our plot now.

Challenges: Resample Rasters

Now apply your skills in a new way!

  • Resample rasterNoProj from 1 meter to 10 meter resolution. Plot it next to the 1 m resolution raster. Use: par(mfrow=c(1,2)) to create side by side plots.
  • What happens to the extent if you change the resolution to 1.5 when calculating the raster's extent properties?

Define Projection of a Raster

We can define the projection of a raster whose CRS is known but not yet attached. Sometimes we download data that are in a known projection, but the CRS is not defined in the GeoTIFF tags or in the raster object itself. If this is the case, we can simply assign the raster the correct projection.

Be careful doing this - it is not the same thing as reprojecting your data.

Let's define the projection for our newest raster using the DEM raster that already has a defined CRS. NOTE: in this case we have to know that our raster is in this projection already, so we don't run the risk of assigning the wrong projection to the data.

# view CRS from raster of interest
rasterNoProj@crs

## CRS arguments: NA

# view the CRS of our DEM object.
DEM@crs

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs

# define the CRS using a CRS of another raster
rasterNoProj@crs <- DEM@crs

# look at the attributes
rasterNoProj

## class      : RasterLayer 
## dimensions : 10, 20, 200  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 254590, 4107302, 4107312  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : 1, 8  (min, max)

# view just the crs
rasterNoProj@crs

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs

IMPORTANT: the above code does not reproject the raster. It simply defines the Coordinate Reference System based upon the CRS of another raster. If you want to actually change the CRS of a raster, you need to use the projectRaster function.

Challenge: Assign CRS

You can set the CRS and extent of a raster using the syntax rasterWithoutReference@crs <- rasterWithReference@crs and rasterWithoutReference@extent <- rasterWithReference@extent. Using this information:

  • open band90.tif in the rasterLayers_tif folder and plot it. (You could consider looking at it in QGIS first to compare it to the other rasters.)
  • Does it line up with our DEM? Look closely at the extent and pixel size. Does anything look off?
  • Fix what is missing.
  • (Advanced step) Export a new GeoTIFF. Do things line up in QGIS?

The code below creates a raster and seeds it with some data. Experiment with the code.

  • What happens to the resulting raster's resolution when you change the range of lat and long values to 5 instead of 10? Try 20, 50, and 100.
  • What is the relationship between the extent and the raster resolution?

Challenge Example Code

# set latLong
latLong <- data.frame(longitude=seq( 0,10,1), latitude=seq( 0,10,1))

# make spatial points dataframe, which will have a spatial extent
sp <- SpatialPoints( latLong[ c("longitude" , "latitude") ], proj4string = CRS("+proj=longlat +datum=WGS84") )

# make raster based on the extent of your data
r <- raster(nrow=5, ncol=5, extent( sp ) )

r[]  <- 1
r[]  <- sample(0:50,25)
r

## class      : RasterLayer 
## dimensions : 5, 5, 25  (nrow, ncol, ncell)
## resolution : 2, 2  (x, y)
## extent     : 0, 10, 0, 10  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : layer 
## values     : 3, 50  (min, max)

Reprojecting Data

If you run into multiple spatial datasets with varying projections, you can always reproject the data so that they are all in the same projection. Python and R both have reprojection tools that perform this task.

# reproject raster data from UTM to CRS of Lat/Long WGS84
reprojectedData1 <- projectRaster(rasterNoProj, 
                                 crs="+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs ")

# note that the extent has been adjusted to account for the NEW crs
reprojectedData1@crs

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

reprojectedData1@extent

## class      : Extent 
## xmin       : -119.761 
## xmax       : -119.7607 
## ymin       : 37.07988 
## ymax       : 37.08

# note the range of values in the output data
reprojectedData1

## class      : RasterLayer 
## dimensions : 13, 22, 286  (nrow, ncol, ncell)
## resolution : 1.12e-05, 9e-06  (x, y)
## extent     : -119.761, -119.7607, 37.07988, 37.08  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : 0.64765, 8.641957  (min, max)

# use nearest neighbor interpolation method to ensure that the values stay the same
reprojectedData2 <- projectRaster(rasterNoProj, 
                                 crs="+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs ", 
                                 method = "ngb")


# note that the min and max values have now been forced to stay within the same range.
reprojectedData2

## class      : RasterLayer 
## dimensions : 13, 22, 286  (nrow, ncol, ncell)
## resolution : 1.12e-05, 9e-06  (x, y)
## extent     : -119.761, -119.7607, 37.07988, 37.08  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : layer 
## values     : 1, 8  (min, max)

Create A Square Buffer Around a Plot Centroid in R

Want to use plot centroid values (marking the center of a plot) in x,y format to get the plot boundaries of a certain size around the centroid? This tutorial is for you!

If the plot is a circle, we can generate the plot boundary using a buffer function in R or a GIS program. However, creating a square boundary around a centroid requires an alternate approach. This tutorial presents a way to create square polygons of a given radius (referring to half of the plot's width) for each plot centroid location in a dataset.

Special thanks to jbaums from StackOverflow for helping with the SpatialPolygons code!

Learning Objectives

After completing this activity, you will be able to:

  • Create square polygons around a centroid point.
  • Export shapefiles from R using the writeOGR() function.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • rgdal: install.packages("rgdal")
  • sp: install.packages("sp")

More on Packages in R – Adapted from Software Carpentry.

Download Data

NEON Teaching Data Subset: Field Site Spatial Data

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

This data download contains several files. You will only need the SJERPlotCentroids.csv file for this tutorial. The path to this file is: NEON-DS-Field-Site-Spatial-Data/SJER/PlotCentroids/SJERPlotCentroids.csv. The other data files in the downloaded data directory are used for related tutorials. You should set your working directory to the parent directory of the downloaded data to follow the code exactly.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.

Our x,y coordinate centroids come in a ".csv" (Comma Separated Value) file with the plot ID that goes with the data. The data we are using today were collected at the National Ecological Observatory Network field site at the San Joaquin Experimental Range (SJER) in California.

Load .csv, Setup Plots

To work with our spatial data in R, we can use the rgdal package and the sp package. Once we've loaded these packages and set the working directory to where our .csv file is located, we can load our data.

# load the sp and rgdal packages

library(sp)
library(rgdal)

# set working directory to data folder
#setwd("pathToDirHere")
wd <- ("~/Git/data/")
setwd(wd)

# read in the NEON plot centroid data 
# `stringsAsFactors=F` ensures character strings don't import as factors
centroids <- read.csv(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/PlotCentroids/SJERPlotCentroids.csv"), stringsAsFactors=FALSE)

Let's look at our data. This can be done several ways, but one way is to view the structure (str()) of the data.

# view data structure
str(centroids)

## 'data.frame':	18 obs. of  5 variables:
##  $ Plot_ID : chr  "SJER1068" "SJER112" "SJER116" "SJER117" ...
##  $ Point   : chr  "center" "center" "center" "center" ...
##  $ northing: num  4111568 4111299 4110820 4108752 4110476 ...
##  $ easting : num  255852 257407 256839 256177 255968 ...
##  $ Remarks : logi  NA NA NA NA NA NA ...

We can see that our data consists of five distinct types of data:

  • Plot_ID: denotes the plot
  • Point: denotes where the point is taken -- all are centroids
  • northing: northing coordinate for point
  • easting: easting coordinate for point
  • Remarks: any other remarks from those collecting the data

It would be nice to have a metadata file with this .csv to confirm the coordinate reference system (CRS) that the points are in; however, without one, based on the numbers, we can assume the data are in Universal Transverse Mercator (UTM). And since we know the data are from the San Joaquin Experimental Range, that is UTM zone 11N.

Part 1: Create Plot Boundary

Now that we understand our centroid data file, we need to set how large our plots are going to be. The next piece of code sets the "radius" for the plots. This radius will later be used to calculate vertex locations that define the plot perimeter.

In this case, let's use a radius of 20m. This means that the edge of each plot (not the corner) is 20m from the centroid. Overall this will create a 40 m x 40 m square plot.

Units: Radius is in meters, matching the UTM CRS. If your coordinates were in lat/long or some other CRS, then you'd need to modify the code.

Plot Orientation: Our code is based on simple geometry and assumes that plots are oriented North-South. If you wanted a different orientation, adjust the math accordingly to find the corners.

# set the radius for the plots
radius <- 20 # radius in meters

# define the plot edges based upon the plot radius. 

yPlus <- centroids$northing+radius
xPlus <- centroids$easting+radius
yMinus <- centroids$northing-radius
xMinus <- centroids$easting-radius

When combining the coordinates for the vertices, it is important to close the polygon. This means that a square will have 5 vertices instead of 4: the fifth vertex is identical to the first. Thus, by repeating the first vertex coordinate (xMinus, yPlus), the polygon is closed.

The cbind() function allows us to combine or bind together data by column. Make sure to create the vertices in an order that makes sense; the code below starts at the NW corner and proceeds clockwise.

# calculate polygon coordinates for each plot centroid. 
square=cbind(xMinus,yPlus,  # NW corner
	xPlus, yPlus,  # NE corner
	xPlus,yMinus,  # SE corner
	xMinus,yMinus, # SW corner
	xMinus,yPlus)  # NW corner again - close polygon

Next, we will associate the centroid plot ID, from the .csv file, with the plot perimeter polygon that we create below. First, we extract the Plot_ID from our data. Note that because we set stringsAsFactors to FALSE when importing, we can extract the Plot_IDs using the code below. If we hadn't done that, our IDs would have come in as factors and we'd have to use the code ID=as.character(centroids$Plot_ID).

# Extract the plot ID information
ID=centroids$Plot_ID

We are now left with two key "groups" of data:

  • a dataframe square which has the vertex coordinates for our new 40x40 m plots
  • a vector ID with the Plot_IDs for each new 40x40 m plot

If all we wanted to do was get these points, we'd be done. But we want to be able to create maps with our new plots as polygons and to have them as spatial data objects for later analyses.

Part 2: Create Spatial Polygons

Now we need to convert our matrix square into a SpatialPolygons object. This particular step is somewhat confusing. Please consider reading up on the SpatialPolygons object in R in the sp package documentation (pg 86) or check out this StackOverflow thread.

Two general considerations:

First, spatial polygons require a list of lists. Each list contains the xy coordinates of each vertex in the polygon - in order. It is always important to include the closing vertex as we discussed above -- you'll have to repeat the first vertex coordinate.

Second, we need to specify the CRS string for our new polygon. We will do this with a proj4string. We can either type in the proj4string (as we do below) or we can grab the string from another file that has CRS information. To do this, we'd use the syntax:

proj4string = CRS(as.character(FILE-NAME@crs))

For example, if we imported a GeoTIFF file called "canopy" that was in a UTM coordinate system, we could type proj4string=CRS(as.character(canopy@crs)).

Method 1: mapply function

We'll do this in two different ways. The first, using the mapply() function, is far more efficient. However, the function hides a bit of what is going on, so afterwards we'll show how it is done without mapply() so you understand the process.

# create spatial polygons from coordinates
polys <- SpatialPolygons(mapply(function(poly, id) {
	  xy <- matrix(poly, ncol=2, byrow=TRUE)
	  Polygons(list(Polygon(xy)), ID=id)
	  }, 
	split(square, row(square)), ID),
	proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

Let's create a simple plot to see our new SpatialPolygon data.

# plot the new polygons
plot(polys)

Yay! We created polygons for all of our plots!

Method 2: Using loops

Let's do the process again with simpler R code so that we understand how the process works. Keep in mind that loops are less efficient at processing your data but don't hide as much under the hood. Once you understand how this works, we recommend the mapply() approach for your actual data processing.

# First, initialize a list that will later be populated
# with one Polygons object per centroid
a <- vector('list', nrow(centroids))

# loop through each centroid value and create a polygon
# this is where we match the ID to the new plot coordinates
for (i in 1:nrow(centroids)) {  # for each row in object centroids
	  a[[i]] <- Polygons(list(Polygon(matrix(square[i, ], ncol=2, byrow=TRUE))), ID[i]) 
	  # make it a Polygons object with the Plot_ID from object ID
	}

# convert a to SpatialPolygon and assign CRS
polysB <- SpatialPolygons(a, proj4string=CRS(as.character("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")))

Let's see if it worked with another simple plot.

# plot the new polygons
plot(polysB)

Good. The two methods return the same plots. We now have our new plots saved as a SpatialPolygon but how do we share that with our colleagues? One way is to turn them into shapefiles, which can be read into R, Python, QGIS, ArcGIS, and many other programs.

Part 3: Export to Shapefile

Before you can export a shapefile, you need to convert the SpatialPolygons object to a SpatialPolygonsDataFrame. Note that in this step you could add additional attribute data if you wanted to!

# Create SpatialPolygonDataFrame -- this step is required to output multiple polygons.
polys.df <- SpatialPolygonsDataFrame(polys, data.frame(id=ID, row.names=ID))
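
For instance, we could store the radius used to build each polygon as an extra attribute (radius_m is a made-up column name, just to illustrate):

# add an optional attribute column -- here, the plot radius in meters
polys.df$radius_m <- 20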

Let's check out the results before we export. And we can add color this time.

plot(polys.df, col=rainbow(50, alpha=0.5))

When we want to export a spatial object from R as a shapefile, writeOGR() is a nice function. It writes not only the shapefile, but also the associated Coordinate Reference System (CRS) information as long as it is associated with the spatial object (e.g., if it was identified when creating the SpatialPolygons object).

To do this we need the following arguments:

  1. the name of the spatial object (polys.df)
  2. the file path from the current working directory to the directory where we want to save our shapefile (to use the current directory, simply use '.')
  3. the name of the new shapefile (2014Plots_SJER)
  4. the driver, which specifies the file format (ESRI Shapefile)

We can now export the spatial object as a shapefile.

# write the shapefiles 
writeOGR(polys.df, '.', '2014Plots_SJER', 'ESRI Shapefile')

And there you have it -- a shapefile with a square plot boundary around your centroids. Bring this shapefile into QGIS or whatever GIS package you prefer and have a look!

For more on working with shapefiles in R, check out our Working with Vector Data in R series.

Installing & Updating Packages in R

This tutorial provides the basics of installing and working with packages in R.

Learning Objectives

After completing this tutorial, you will be able to:

  • Describe the basics of an R package
  • Install a package in R
  • Call (use) an installed R package
  • Update a package in R
  • View the packages installed on your computer

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Additional Resources

  • More on packages from Quick-R.
  • Article on R-bloggers about installing packages in R.

About Packages in R

Packages are collections of R functions, data, and compiled code in a well-defined format. When you install a package it gives you access to a set of commands that are not available in the base R set of functions. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.

Installing Packages in R

To install a package you have to know where to get the package. Most established packages are available from "CRAN" or the Comprehensive R Archive Network.

Packages download from specific CRAN "mirrors"" where the packages are saved (assuming that a binary, or set of installation files, is available for your operating system). If you have not set a preferred CRAN mirror in your options(), then a menu will pop up asking you to choose a location from which you'd like to install your packages.

To install any package from CRAN, you use install.packages(). You only need to install packages the first time you use R (or after updating to a new version).

# install the ggplot2 package
install.packages("ggplot2") 
**R Tip:** You can simply type this into the command line of R to install each package. Once a package is installed, you don't have to install it again while using that version of R!

Use a Package

Once a package is installed (basically, the functions are downloaded to your computer), you need to "call" the package into the current session of R. This is essentially like saying, "Hey R, I will be using these functions now, please have them ready to go". You have to do this every time you start a new R session, so this should be at the top of your script.

When you want to call a package, use library(PackageNameHere). You may also see some people using require() -- while that works in most cases, it does function slightly differently, and best practice is to use library().

# load the package
library(ggplot2)
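
The difference is easy to see in a minimal sketch:

# library() stops with an error if a package is missing,
# while require() returns FALSE with a warning and lets the script continue
ok <- require(ggplot2)
ok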

What Packages are Installed Now?

If you want to use a package, but aren't sure if you've installed it before, you can check! In code, you can use installed.packages().

# check installed packages
installed.packages()
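
Since installed.packages() returns a matrix with one row per installed package, a quick way to check for a single package (ggplot2 here is just an example) is:

# check whether a specific package is installed
"ggplot2" %in% rownames(installed.packages())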

If you are using RStudio, you can also check out the Packages tab. It will list all the currently installed packages, with a check mark next to the ones that are currently loaded and ready to use. You can also update and install packages from this tab. While you can "call" a package from here too by checking its box, we wouldn't recommend it: the call to the package isn't in your script, which could trip you up when you run the script again!

Updating Packages

Sometimes packages are updated by the users who created them. Updating packages can sometimes make changes both to the package and to how your code runs. **If you already have a lot of code using a package, be cautious about updating packages, as some functionality may change or disappear.**

Otherwise, go ahead and update old packages so things are up to date.

In code, you can use old.packages() to check to see which packages are out of date.

update.packages() will update all packages in the known libraries interactively. This can take a while if you haven't done it recently! To update everything without any user intervention, use the ask = FALSE argument.

If you only want to update a single package, the best way to do it is using install.packages() again.

# list all packages where an update is available
old.packages()

# update all available packages
update.packages()

# update, without prompts for permission/clarification
update.packages(ask = FALSE)

# update only a specific package use install.packages()
install.packages("plotly")

In RStudio, you can also manage packages using Tools -> Install Packages.

Challenge: Installing Packages

Check to see if you can install the dplyr package or a package of interest to you.

  1. Check to see if the dplyr package is installed on your computer.
  2. If it is not installed, install the "dplyr" package in R.
  3. If installed, is it up to date?

What is a CHM, DSM and DTM? About Gridded, Raster LiDAR Data

LiDAR Point Clouds

Each point in a LiDAR dataset has an X, Y, Z value and other attributes. The points may be located anywhere in space and are not aligned within any particular grid.

Representative point cloud data. Source: National Ecological Observatory Network (NEON)

LiDAR point clouds are typically available in a .las file format. The .las file format is a binary format that can efficiently handle the millions of points that are often associated with LiDAR data point clouds.

Common LiDAR Data Products

The Digital Terrain Model (DTM) product represents the elevation of the ground, while the Digital Surface Model (DSM) product represents the elevation of the tallest surfaces at that point. Imagine draping a sheet over the canopy of a forest: the DSM contours with the heights of the trees where there are trees, but drops to the elevation of the ground where there is a clearing in the forest.

DSM and DTM Visualizations

The Canopy Height Model (CHM) represents the difference between the Digital Surface Model and the Digital Terrain Model (DSM - DTM = CHM) and gives you the height of the objects (in a forest, the trees) that are on the surface of the earth.

DSM, DTM, and CHM

Free Point Cloud Viewers for LiDAR Point Clouds

  • Fusion: US Forest Service
  • CloudCompare
  • Plas.io website

For more on viewing LiDAR point cloud data using the Plas.io online viewer, see our tutorial Plas.io: Free Online Data Viz to Explore LiDAR Data.

Check out our Structural Diversity tutorial, Calculating Forest Structural Diversity Metrics from NEON LiDAR Data, for another useful LiDAR point cloud viewer available through RStudio.

3D Models of NEON Site: SJER (San Joaquin Experimental Range)

Click on the images to view interactive 3D models of San Joaquin Experimental Range site.

3D models derived from LiDAR Data. Left: Digital Terrain Model (DTM), Middle: Digital Surface Model (DSM), Right: Canopy Height Model (CHM). Source: National Ecological Observatory Network (NEON)

Gridded, or Raster, LiDAR Data Products

LiDAR data products are most often provided in a gridded or raster data format. A raster file is a regular grid of cells, all of which are the same size.

A few notes about rasters:

  • Each cell is called a pixel.
  • Each pixel represents an area on the ground.
  • The resolution of the raster is the area that each pixel covers on the ground. For instance, if the raster is 1 m resolution, each pixel represents a 1 m by 1 m area on the ground.
Raster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. Source: National Ecological Observatory Network (NEON)

Raster data can have attributes associated with them as well. For instance in a LiDAR-derived digital elevation model (DEM), each cell might represent a particular elevation value. In a LIDAR-derived intensity image, each cell represents a LIDAR intensity value.

LiDAR Related Metadata

In short, when you go to download LiDAR data, the first question you should ask is what format the data are in. Are you downloading point clouds that you might have to process? Or rasters that are already processed for you? How do you know?

  1. Check out the metadata!
  2. Look at the file format -- if you are downloading a .las file, then you are getting points. If it is .tif, then it is a post-processed raster file.

Create Useful Data Products from LiDAR Data

Classify LiDAR Point Clouds

LiDAR data points are vector data. LiDAR point clouds are useful because they tell us something about the heights of objects on the ground. However, how do we know whether a point reflected off of a tree, a bird, a building or the ground? In order to develop products like elevation models and canopy height models, we need to classify individual LiDAR points. We might classify LiDAR points into classes including:

  • Ground
  • Vegetation
  • Buildings

LiDAR point cloud classification is often already done when you download LiDAR point clouds, but know that it's not to be taken for granted! Programs such as LAStools, FUSION, and TerraScan are often used to perform this classification. Once the points are classified, they can be used to derive various LiDAR data products.

Create A Raster From LiDAR Point Clouds

There are different ways to create a raster from LiDAR point clouds.

Point to Raster Methods - Basic Gridding

Let's look at one of the most basic ways to create a raster file from points -- basic gridding. When you perform a gridding algorithm, you are simply calculating a value, using point data, for each pixel in your raster dataset.

  1. To begin, a grid is placed on top of the LiDAR data in space. Each cell in the grid has the same spatial dimensions. These dimensions represent that particular area on the ground. If we want to derive a 1 m resolution raster from the LiDAR data, we overlay a 1m by 1m grid over the LiDAR data points.
  2. Within each 1m x 1m cell, we calculate a value to be applied to that cell, using the LiDAR points found within that cell. The simplest method of doing this is to take the max, min, or mean height value of all lidar points found within the 1m cell. If we use this approach, we might have cells in the raster that don't contain any lidar points. These cells will have a "no data" value if we process our raster in this way. A minimal code sketch of this gridding approach appears after the figure below.
Animation showing the general process of taking LiDAR point clouds and converting them to a raster format. Source: Tristan Goulden, National Ecological Observatory Network (NEON)
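
Here is a minimal sketch of basic gridding using the terra R package (used later in this series). The pts data frame of lidar point coordinates is entirely hypothetical:

# grid hypothetical lidar points onto a 1 m raster, taking the max height per cell
library(terra)
pts <- data.frame(x=runif(5000, 257000, 257100),
                  y=runif(5000, 4112000, 4112100),
                  z=runif(5000, 0, 30))
v <- vect(pts, geom=c("x", "y"), crs="EPSG:32611")
template <- rast(ext(v), resolution=1, crs="EPSG:32611")
grid_max <- rasterize(v, template, field="z", fun="max")  # cells with no points get NA
plot(grid_max)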

Point to Raster Methods - Interpolation

A different approach is to interpolate the value for each cell.

  1. In this approach we still start with placing the grid on top of the LiDAR data in space.
  2. Interpolation considers the values of points outside of the cell, in addition to points within the cell, to calculate a value. Interpolation is useful because it can provide us with some ability to predict or calculate cell values in areas where there are no data (or no points), and to quantify the error associated with those predictions, which is useful to know if you are doing research.

For learning more on how to work with LiDAR and Raster data more generally in R, please refer to the Data Carpentry's Introduction to Geospatial Raster and Vector Data with R lessons.

Plas.io: Free Online Data Viz to Explore LiDAR Data

In this tutorial, we will explore LiDAR point cloud data using the free, online Plas.io viewer.

Learning Objectives

At the end of this tutorial, you will be able to:

  • Visualize lidar point cloud data using the free online data viewer plas.io
  • Describe some of the attributes associated with discrete return lidar points, including intensity, classification and RGB values.
  • Explain the use of and difference between the .las and .laz lidar file formats (standard lidar point cloud formats).

Things You’ll Need To Complete This Tutorial

  • Access to the internet so you can access the plas.io website.

Download Data

NEON Teaching Data Subset: Sample LiDAR Point Cloud Data (.las)

This .las file contains sample LiDAR point cloud data collected by National Ecological Observatory Network's Airborne Observation Platform group. The .las file format is a commonly used file format to store LIDAR point cloud data. NEON Discrete Return LiDAR Point Cloud Data are available on the NEON Data Portal.

Download NEON Teaching Data Subset: Sample LiDAR Point Cloud Data (.las)

Example visualization of LiDAR data

LiDAR data collected over Grand Mesa, Colorado as a part of instrument testing and calibration by the National Ecological Observatory Network 's Airborne Observation Platform (NEON AOP). Source: National Ecological Observatory Network (NEON)

LiDAR File Formats

LiDAR data are most often available as discrete points, although remember that the lidar instrument can collect these data in either discrete or full waveform formats. A collection of discrete return LiDAR points is known as a LiDAR point cloud.

.las is the commonly used file format to store LIDAR point cloud data. This format is supported by the American Society for Photogrammetry and Remote Sensing (ASPRS). The .laz format was developed by Martin Isenburg of LAStools; .laz is a highly compressed version of .las.

In this tutorial, you will open a .las file, in the plas.io free online lidar data viewer. You will then explore some of the attributes associated with a lidar data point cloud.

LiDAR Attribute Data

Remember that not all lidar data are created equally. Different lidar data may have different attributes. In this tutorial, we will look at data that contain both intensity values and a ground vs non ground classification.

Plas.io Viewer

We will use the plas.io website in this tutorial. As described on the plas.io GitHub page:

Plasio is a project by Uday Verma and Howard Butler that implements point cloud rendering capability in a browser. Specifically, it provides a functional implementation of the ASPRS LAS format, and it can consume LASzip-compressed data using LASzip NaCl module. Plasio is Chrome-only at this time, but it is hoped that other contributors can step forward to bring it to other browsers.

It is expected that most WebGL-capable browsers should be able to support plasio, and it contains nothing that is explicitly Chrome-specific beyond the optional NaCL LASzip module.

This tool is useful because you don't need to install anything to use it! Drag and drop your lidar data directly into the tool and begin to play! The website also provides access to some prepackaged datasets if you want to experiment on your own.

Enough reading, let's open some NEON LiDAR data!

1. Open a .las file in plas.io

  1. Download the NEON prepackaged lidar dataset (above in Download the Data) if you haven't already.
  2. The file is named: NEON-DS-Sample-LiDAR-Point-Cloud.las
  3. When the download is complete, drag the file NEON-DS-Sample-LiDAR-Point-Cloud.las into the plas.io website window.
  4. Zoom and pan around the data
  5. Use the particle size slider to adjust the size of each individual lidar point. NOTE: the particle size slider is located a little more than halfway down the plas.io toolbar in the "Data" section.

NICE! You should see something similar to the screenshot below:

NEON lidar data in the plas.io online tool.

Navigation in Plas.io

You might prefer to use a mouse to explore your data in plas.io. Let's test the navigation out.

  1. Left click on the screen and drag the data on the screen. Notice that this tilts the data up and down.
  2. Right click on the screen and drag, noticing that this moves the entire dataset around.
  3. Use the scroll wheel on your mouse to zoom in and out.

How The Points are Colored

Why is everything grey when the data are loaded?

Notice that the data, upon initial view, are colored in a black-to-white color scheme. These colors represent the data's intensity values. Remember that the intensity value for each LiDAR point represents the amount of light energy that reflected off of an object and returned to the sensor. In this case, darker colors represent LESS light energy returned and lighter colors represent MORE light returned.

Lidar intensity values represent the amount of light energy that reflected off of an object and returned to the sensor.

2. Adjust the intensity threshold

Next, scroll down through the tools in plas.io. Look for the Intensity Scaling slider. The intensity scaling slider allows you to define the thresholds of light to dark intensity values displayed in the image (similar to stretching values in an image processing software or even in Photoshop).

Drag the slider back and forth. Notice that you can brighten up the data using the slider.

The intensity scaling slider is located below the color map tool so it's easy to miss. Drag the slider back and forth to adjust the range of intensity values and to brighten up the lidar point clouds.

3. Change the lidar point cloud color options to Classification

In addition to intensity values, these lidar data also have a classification value. Lidar data classification values are numeric, ranging from 0-20 or higher. Some common classes include:

  • 0 Not classified
  • 1 Unassigned
  • 2 Ground
  • 3 Low vegetation
  • 4 Medium vegetation
  • 5 High Vegetation
  • 6 Building
Blue and Orange gradient color scheme submitted by Kendra Sand. What color scheme is your favorite?

In this case, these data are classified as either ground or non-ground. To view the points, colored by class:

  • Change the "colorization" setting to "Classification"
  • Change the intensity blending slider to "All Color"
  • For kicks - play with the various colormap options to change the colors of the points.
Set the colorization to 'classified' and then adjust the intensity blending to view the points, colored by ground and non-ground classification.

4. Spend Some Time Exploring - Do you See Any Trees?

Finally, spend some time exploring the data. What features do you see in this dataset? What does the topography look like? Is the site flat? Hilly? Mountainous? What do the lidar data tell you, just upon initial inspection?

Summary

  • The plas.io online point cloud viewer allows you to quickly view and explore lidar data point clouds.
  • Each lidar data point will have an associated set of attributes. You can check the metadata to determine which attributes the dataset contains. NEON data, provided above, contain both classification and intensity values.
  • Classification values represent the type of object that the light energy reflected off of. Classification values are often ground vs non ground. Some lidar data files might have buildings, water bodies and other natural and man-made elements classified.
  • LiDAR data often have an intensity value associated with each point. This represents the amount of light energy that reflected off an object and returned to the sensor.

Additional Resources:

  • What is .las? From laspy - the las Python library
  • LAS v1.4 specifications

The Basics of LiDAR - Light Detection and Ranging - Remote Sensing

LiDAR, or Light Detection and Ranging, is an active remote sensing system that can be used to measure vegetation height across wide areas. This page will introduce fundamental LiDAR (or lidar) concepts including:

  1. What LiDAR data are.
  2. The key attributes of LiDAR data.
  3. How LiDAR data are used to measure trees.

The Story of LiDAR

Key Concepts

Why LiDAR

Scientists often need to characterize vegetation over large regions to answer research questions at the ecosystem or regional scale. Therefore, we need tools that can estimate key characteristics over large areas because we don’t have the resources to measure each and every tree or shrub.

Conventional, on-the-ground methods to measure trees are resource intensive and limit the amount of vegetation that can be characterized! Source: National Geographic

Remote sensing means that we aren’t actually physically measuring things with our hands. We are using sensors which capture information about a landscape and record things that we can use to estimate conditions and characteristics. To measure vegetation or other data across large areas, we need remote sensing methods that can take many measurements quickly, using automated sensors.

LiDAR data collected at the Soaproot Saddle site by the National Ecological Observatory Network's Airborne Observation Platform (NEON AOP).

LiDAR, or Light Detection And Ranging (sometimes also referred to as active laser scanning), is one remote sensing method that can be used to map structure including vegetation height, density, and other characteristics across a region. LiDAR directly measures the height and density of vegetation on the ground, making it an ideal tool for scientists studying vegetation over large areas.

How LiDAR Works

How Does LiDAR Work?

LiDAR is an active remote sensing system. An active system means that the system itself generates energy - in this case, light - to measure things on the ground. In a LiDAR system, light is emitted from a rapidly firing laser. You can imagine light quickly strobing (or pulsing) from a laser light source. This light travels to the ground and reflects off of things like buildings and tree branches. The reflected light energy then returns to the LiDAR sensor where it is recorded.

A LiDAR system measures the time it takes for emitted light to travel to the ground and back, called the two-way travel time. That time is used to calculate distance traveled. Distance traveled is then converted to elevation. These measurements are made using the key components of a lidar system including a GPS that identifies the X,Y,Z location of the light energy and an Inertial Measurement Unit (IMU) that provides the orientation of the plane in the sky (roll, pitch, and yaw).
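
As a back-of-the-envelope illustration (the travel time below is made up), the distance falls out of the two-way travel time like this:

# distance = (speed of light * two-way travel time) / 2
c_mps <- 299792458       # speed of light in m/s
t_s <- 6.67e-6           # hypothetical two-way travel time in seconds
distance_m <- (c_mps * t_s) / 2
distance_m               # roughly 1000 m from sensor to target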

How Light Energy Is Used to Measure Trees

Light energy is a collection of photons. As the photons that make up light move towards the ground, they hit objects such as branches on a tree. Some of the light reflects off of those objects and returns to the sensor. If the object is small, and there are gaps surrounding it that allow light to pass through, some light continues down towards the ground. Because some photons reflect off of things like branches while others continue down towards the ground, multiple reflections (or "returns") may be recorded from one pulse of light.

LiDAR waveforms

The distribution of energy that returns to the sensor creates what we call a waveform. The amount of energy that returned to the LiDAR sensor is known as "intensity". The areas where more photons or more light energy return to the sensor create peaks in the distribution of energy. These peaks in the waveform often represent objects on the ground like a branch, a group of leaves, or a building.

An example LiDAR waveform returned from two trees and the ground. Source: NEON .

How Scientists Use LiDAR Data

There are many different uses for LiDAR data.

  • LiDAR data classically have been used to derive high resolution elevation data models
LiDAR data have historically been used to generate high resolution elevation datasets. Source: National Ecological Observatory Network .
  • LiDAR data have also been used to derive information about vegetation structure including:
    • Canopy Height
    • Canopy Cover
    • Leaf Area Index
    • Vertical Forest Structure
    • Species identification (in less dense forests, given high point density LiDAR)
Cross section showing LiDAR point cloud data superimposed on the corresponding landscape profile. Source: National Ecological Observatory Network.

Discrete vs. Full Waveform LiDAR

A waveform or distribution of light energy is what returns to the LiDAR sensor. However, this return may be recorded in two different ways.

  1. A Discrete Return LiDAR System records individual (discrete) points for the peaks in the waveform curve. Discrete return LiDAR systems identify peaks and record a point at each peak location in the waveform curve. These discrete or individual points are called returns. A discrete system may record 1-11+ returns from each laser pulse.
  2. A Full Waveform LiDAR System records a distribution of returned light energy. Full waveform LiDAR data are thus more complex to process, however they can often capture more information compared to discrete return LiDAR systems. One example research application for full waveform LiDAR data includes mapping or modelling the understory of a canopy.

LiDAR File Formats

Whether it is collected as discrete points or full waveform, LiDAR data are most often made available as discrete points. A collection of discrete return LiDAR points is known as a LiDAR point cloud.

The commonly used file format to store LIDAR point cloud data is called ".las", a format supported by the American Society for Photogrammetry and Remote Sensing (ASPRS). More recently, the .laz format was developed by Martin Isenburg of LAStools; the difference is that .laz is a highly compressed version of .las.

Data products derived from LiDAR point cloud data are often raster files that may be in GeoTIFF (.tif) formats.

LiDAR Data Attributes: X, Y, Z, Intensity and Classification

LiDAR data attributes can vary, depending upon how the data were collected and processed. You can determine what attributes are available for each lidar point by looking at the metadata. All lidar data points will have an associated X,Y location and Z (elevation) values. Most lidar data points will have an intensity value, representing the amount of light energy recorded by the sensor.

Some LiDAR data will also be "classified" -- not top secret, but with information about what the data represent. Classification of LiDAR point clouds is an additional processing step. Classification simply represents the type of object that the laser return reflected off of. So if the light energy reflected off of a tree, it might be classified as a "vegetation" point. And if it reflected off of the ground, it might be classified as a "ground" point.

Some LiDAR products will be classified as "ground/non-ground". Some datasets will be further processed to determine which points reflected off of buildings and other infrastructure. Some LiDAR data will be classified according to the vegetation type.
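
If you'd like to inspect these attributes yourself in R, the lidR package (not otherwise used in this series, so treat this as an optional aside) can read a .las file. Here we assume the sample file from the Plas.io tutorial above:

# read a .las file and peek at the point attributes
library(lidR)
las <- readLAS("NEON-DS-Sample-LiDAR-Point-Cloud.las")
head(las@data[, c("X", "Y", "Z", "Intensity", "Classification")])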

Exploring 3D LiDAR data in a free Online Viewer

Check out our tutorial on viewing LiDAR point cloud data using the Plas.io online viewer: Plas.io: Free Online Data Viz to Explore LiDAR Data. The Plas.io viewer used in that tutorial was developed by Uday Verma and Howard Butler.

Summary

  • A LiDAR system uses a laser, a GPS and an IMU to estimate the heights of objects on the ground.
  • Discrete LiDAR data are generated from waveforms -- each point represents a peak in the returned energy waveform.
  • Discrete LiDAR points contain an x, y and z value. The z value is what is used to generate height.
  • LiDAR data can be used to estimate tree height and even canopy cover using various methods.

Additional Resources

  • What is the LAS format?
  • Using .las with Python? las: python ingest
  • Specifications for las v1.3

Create a Canopy Height Model from Lidar-derived rasters in R

A common analysis using lidar data is to derive top of the canopy height values from the lidar data. These values are often used to track changes in forest structure over time, to calculate biomass, and even to estimate leaf area index (LAI). Let's dive into the basics of working with raster formatted lidar data in R!

Learning Objectives

After completing this tutorial, you will be able to:

  • Work with digital terrain model (DTM) & digital surface model (DSM) raster files.
  • Create a canopy height model (CHM) raster from DTM & DSM rasters.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • terra: install.packages("terra")
  • neonUtilities: install.packages("neonUtilities")

More on Packages in R - Adapted from Software Carpentry.

Download Data

Lidar elevation raster data are downloaded using the R neonUtilities::byTileAOP function in the script.

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire datasets can be accessed from the NEON Data Portal.

This tutorial is designed for you to set your working directory to the directory created by unzipping this file.


Set Working Directory: This lesson will walk you through setting the working directory before downloading the datasets from neonUtilities.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges to reinforce skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Recommended Reading

What is a CHM, DSM and DTM? About Gridded, Raster LiDAR Data

Create a lidar-derived Canopy Height Model (CHM)

The National Ecological Observatory Network (NEON) provides lidar-derived data products as part of its many free ecological data products. These products come in the GeoTIFF format, which is a .tif raster format that is spatially located on the earth.

In this tutorial, we create a Canopy Height Model. The Canopy Height Model (CHM) represents the heights of the trees on the ground. We can derive the CHM by subtracting the ground elevation (DTM) from the elevation of the top of the surface (DSM), i.e. the tops of the trees.

We will use the terra R package to work with the lidar-derived Digital Surface Model (DSM) and the Digital Terrain Model (DTM).

# Load needed packages
library(terra)
library(neonUtilities)

Set the working directory so you know where to download data.

wd <- "~/data/" # This will depend on your local environment
setwd(wd)

We can use the neonUtilities function byTileAOP to download a single DTM and DSM tile at SJER. Both the DTM and DSM are delivered under the Elevation - LiDAR (DP3.30024.001) data product.

You can run help(byTileAOP) to see more details on what the various inputs are. For this exercise, we'll specify the UTM Easting and Northing to be (257500, 4112500), which will download the tile with the lower left corner (257000, 4112000). By default, the function will check the total size of the download and ask whether you wish to proceed (y/n). You can set check.size=FALSE if you want to download without a prompt. This example will not be very large (~8MB), since it is only downloading two single-band rasters (plus some associated metadata).

byTileAOP(dpID='DP3.30024.001',
          site='SJER',
          year='2021',
          easting=257500,
          northing=4112500,
          check.size=TRUE, # set to FALSE if you don't want to enter y/n
          savepath = wd)

The files will be downloaded into a nested subdirectory under the ~/data folder, inside a folder named DP3.30024.001 (the Data Product ID). The files should show up in these locations: ~/data/DP3.30024.001/neon-aop-products/2021/FullSite/D17/2021_SJER_5/L3/DiscreteLidar/DSMGtif/NEON_D17_SJER_DP3_257000_4112000_DSM.tif and ~/data/DP3.30024.001/neon-aop-products/2021/FullSite/D17/2021_SJER_5/L3/DiscreteLidar/DTMGtif/NEON_D17_SJER_DP3_257000_4112000_DTM.tif.

Now we can read in the files. You can move the files to a different location (e.g. shorten the path), but make sure to change the path that points to the file accordingly.

# Define the DSM and DTM file names, including the full path
dsm_file <- paste0(wd,"DP3.30024.001/neon-aop-products/2021/FullSite/D17/2021_SJER_5/L3/DiscreteLidar/DSMGtif/NEON_D17_SJER_DP3_257000_4112000_DSM.tif")
dtm_file <- paste0(wd,"DP3.30024.001/neon-aop-products/2021/FullSite/D17/2021_SJER_5/L3/DiscreteLidar/DTMGtif/NEON_D17_SJER_DP3_257000_4112000_DTM.tif")

First, we will read in the Digital Surface Model (DSM). The DSM represents the elevation of the top of the objects on the ground (trees, buildings, etc).

# assign raster to object
dsm <- rast(dsm_file)

# view info about the raster
dsm

## class       : SpatRaster 
## dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 257000, 258000, 4112000, 4113000  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source      : NEON_D17_SJER_DP3_257000_4112000_DSM.tif 
## name        : NEON_D17_SJER_DP3_257000_4112000_DSM

# plot the DSM
plot(dsm, main="Lidar Digital Surface Model \n SJER, California")

Note the resolution, extent, and coordinate reference system (CRS) of the raster. To do later steps, our DTM will need to be the same.

Next, we will import the Digital Terrain Model (DTM) for the same area. The DTM represents the ground (terrain) elevation.

# import the digital terrain model
dtm <- rast(dtm_file)

plot(dtm, main="Lidar Digital Terrain Model \n SJER, California")
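
Before doing raster math, it can't hurt to confirm that the two rasters line up. A quick, optional check with terra:

# verify the DSM and DTM share the same extent, resolution, and CRS
compareGeom(dsm, dtm)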

With both of these rasters now loaded, we can create the Canopy Height Model (CHM). The CHM represents the difference between the DSM and the DTM or the height of all objects on the surface of the earth.

To do this we perform some basic raster math to calculate the CHM. You can perform the same raster math in a GIS program like QGIS.

When you do the math, make sure to subtract the DTM from the DSM or you'll get trees with negative heights!

# use raster math to create CHM
chm <- dsm - dtm

# view CHM attributes
chm

## class       : SpatRaster 
## dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 257000, 258000, 4112000, 4113000  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source(s)   : memory
## varname     : NEON_D17_SJER_DP3_257000_4112000_DSM 
## name        : NEON_D17_SJER_DP3_257000_4112000_DSM 
## min value   :                                 0.00 
## max value   :                                24.13

plot(chm, main="Lidar CHM - SJER, California")

We've now created a CHM from our DSM and DTM. What do you notice about the canopy cover at this location in the San Joaquin Experimental Range?

Challenge: Basic Raster Math

Convert the CHM from meters to feet and plot it.

We can write out the CHM as a GeoTiff using the writeRaster() function.

# write out the CHM in GeoTIFF format (terra infers the format from the .tif extension)
writeRaster(chm, paste0(wd,"CHM_SJER.tif"), overwrite=TRUE)

We've now successfully created a canopy height model using basic raster math -- in R! We can bring the CHM_SJER.tif file into QGIS (or any GIS program) and look at it.


Consider checking out the tutorial Compare tree height measured from the ground to a Lidar-based Canopy Height Model to compare a LiDAR-derived CHM with ground-based observations!

Introduction to the National Ecological Observatory Network (NEON)

Here we will provide an overview of the National Ecological Observatory Network (NEON). Please carefully read through these materials and links that discuss NEON’s mission and design.

Learning Objectives

At the end of this activity, you will be able to:

  • Explain the mission of the National Ecological Observatory Network (NEON).
  • Explain how sites are located within the NEON project design.
  • Explain the different types of data that will be collected and provided by NEON.

The NEON Project Mission & Design

To capture ecological heterogeneity across the United States, NEON’s design divides the continent into 20 statistically different eco-climatic domains. Each NEON field site is located within an eco-climatic domain.

The Science and Design of NEON

To gain a better understanding of the broad scope of NEON, watch this 4-minute video.

Please read the following page about NEON's mission.

Data Institute Participants -- Thought Question: How might/does the NEON project intersect with your current research or future career goals?

NEON's Spatial Design

The Spatial Design of NEON

Watch this 4:22 minute video exploring the spatial design of NEON field sites.

Please read the following page about NEON's Spatial Design:

Read this primer on NEON's Sampling Design

Read about the different types of field sites - core and relocatable

NEON Field Site Locations

Explore the NEON Field Site map, taking note of the locations of:

  1. Aquatic & terrestrial field sites.
  2. Core & relocatable field sites.
Click here to view the NEON Field Site Map

Explore the NEON field site map. Do the following:

  • Zoom in on a study area of interest to see if there are any NEON field sites that are nearby.
  • Use the menu below the map to filter sites by name, type, domain, or state.
  • Select one field site of interest.
    • Click on the marker in the map.
    • Then click on Site Details to jump to the field site landing page.

Data Institute Participant -- Thought Questions: Use the map above to answer these questions. Consider the research question that you may explore as your Capstone Project at the Institute or about a current project that you are working on and answer the following questions:

  • Are there NEON field sites that are in study regions of interest to you?
  • What domains are the sites located in?
  • What NEON field sites do your current research or Capstone Project ideas coincide with?
  • Is the site(s) core or relocatable?
  • Is it/are they terrestrial or aquatic?
  • Are there data available for the NEON field site(s) that you are most interested in? What kind of data are available?

Data Tip: You can download maps, kmz, or shapefiles of the field sites here.

NEON Data

How NEON Collects Data

Watch this 3:06 minute video exploring the data that NEON collects.

Read the Data Collection Methods page to learn more about the different types of data that NEON collects and provides. Then, follow the links below to learn more about each collection method:

  • Aquatic Observation System (AOS)
  • Aquatic Instrument System (AIS)
  • Terrestrial Instrument System (TIS) -- Flux Tower
  • Terrestrial Instrument System (TIS) -- Soil Sensors and Measurements
  • Terrestrial Organismal System (TOS)
  • Airborne Observation Platform (AOP)

All data collection protocols and processing documents are publicly available. Read more about the standardized protocols and how to access these documents.

Specimens & Samples

NEON also collects samples and specimens from which the other data products are based. These samples are also available for research and education purposes. Learn more: NEON Biorepository.

Airborne Remote Sensing

Watch this 5 minute video to better understand the NEON Airborne Observation Platform (AOP).

Data Institute Participant – Thought Questions: Consider either your current or future research or the question you’d like to address at the Institute.

  • Which types of NEON data may be more useful to address these questions?
  • What non-NEON data resources could be combined with NEON data to help address your question?
  • What challenges, if any, could you foresee when beginning to work with these data?

Data Tip: NEON also provides support for your own research, including proposals to fly the AOP over other study sites, a mobile tower/instrumentation setup, and others. Learn more about the Assignable Assets programs.

Access NEON Data

NEON data are processed and go through quality assurance and quality control (QA/QC) checks at NEON headquarters in Boulder, CO. NEON carefully documents every aspect of sampling design, data collection, processing, and delivery. This documentation is freely available through the NEON data portal.

  • Visit the NEON Data Portal - data.neonscience.org
  • Read more about the quality assurance and quality control processes for NEON data and how the data are processed from raw data to higher level data products.
  • Explore NEON Data Products. On the page for each data product in the catalog you can find the basic information about the product, find the data collection and processing protocols, and link directly to downloading the data.
  • Additionally, some types of NEON data are also available through the data portals of other organizations. For example, NEON Terrestrial Insect DNA Barcoding Data is available through the Barcode of Life Datasystem (BOLD). Or NEON phenocam images are available from the Phenocam network site. More details on where else the data are available from can be found in the Availability and Download section on the Product Details page for each data product (visit Explore Data Products to access individual Product Details pages).

Pathways to access NEON Data

There are several ways to access data from NEON:

  1. Via the NEON data portal. Explore and download data. Note that much of the tabular data is available in zipped .csv files for each month and site of interest. To combine these files, use the neonUtilities package (R tutorial, Python tutorial).
  2. Use R or Python to programmatically access the data. NEON and community members have created code packages to directly access the data through an API; a minimal R example appears after this list. Learn more about the available resources by reading the Code Resources page or visiting the NEONScience GitHub repo.
  3. Using the NEON API. Access NEON data directly using a custom API call.
  4. Access NEON data through partner's portals. Where NEON data directly overlap with other community resources, NEON data can be accessed through the portals. Examples include Phenocam, BOLD, Ameriflux, and others. You can learn more in the documentation for individual data products.
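
For instance, a minimal sketch using the neonUtilities R package (the product ID, site, and dates below are purely illustrative):

# download and stack a data product table directly from the NEON API
library(neonUtilities)
bird <- loadByProduct(dpID="DP1.10003.001",  # breeding landbird point counts
                      site="SJER",
                      startdate="2021-01",
                      enddate="2021-12",
                      check.size=FALSE)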

Data Institute Participant – Thought Questions: Use the Data Portal tools to investigate the data availability for the field sites you’ve already identified in the previous Thought Questions.

  • What types of aquatic/terrestrial data are currently available? Remote sensing data?
  • Of these, what type of data are you most interested in working with for your project while at the Institute?
  • What time period do the data cover?
  • What format is the downloadable file available in?
  • Where is the metadata to support this data?

Data Institute Participants: Intro to NEON Culmination Activity

Write up a brief summary of a project that you might want to explore while at the Data Institute in Boulder, CO. Include the types of NEON (and other data) that you will need to implement this project. Save this summary as you will be refining and adding to your ideas over the next few weeks.

The goal of this activity is for you to begin to think about a Capstone Project that you wish to work on at the end of the Data Institute. This project will ideally be performed in groups, so over the next few weeks you'll have a chance to view the other project proposals and merge projects to collaborate with your colleagues.

Set up GitHub Working Directory - Quick Intro to Bash

Checklist

Once you have Git and Bash installed, you are ready to configure Git.

On this page you will:

  • Create a directory for all future GitHub repositories created on your computer

To ensure Git is properly installed and to create a working directory for GitHub, you will need to know a bit of shell -- brief crash course below.

Crash Course on Shell

The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing” supercomputers).

This section is an abbreviated form of Software Carpentry's The Unix Shell for Novices workshop lesson series. Content and wording (including all the above) are heavily copied and credit is due to those creators (full author list).

Our goal with shell is to:

  • Set up the directory where we will store all of the GitHub repositories during the Institute,
  • Make sure Git is installed correctly, and
  • Gain comfort using bash so that we can use it to work with Git & GitHub.

Accessing Shell

How one accesses the shell depends on the operating system being used.

  • OS X: The bash program is called Terminal. You can search for it in Spotlight.
  • Windows: Git Bash came with your download of Git for Windows. Search Git Bash.
  • Linux: Default is usually bash, if not, type bash in the terminal.

Bash Commands

$ 

The dollar sign is a prompt, which shows us that the shell is waiting for input; your shell may use a different character as a prompt and may add information before the prompt.

When typing commands, either from these tutorials or from other sources, do not type the prompt ($), only the commands that follow it. In these tutorials, subsequent lines that follow a prompt and do not start with $ are the output of the command.

Print working directory -- pwd

Next, let's find out where we are by running a command called pwd -- print working directory. At any moment, our current working directory is our current default directory. I.e., the directory that the computer assumes we want to run commands in unless we explicitly specify something else. Here, the computer's response is /Users/neon, which is NEON’s home directory:

$ pwd

/Users/neon
**Data Tip:** Home Directory Variation - The home directory path will look different on different operating systems. On Linux it may look like `/home/neon`, and on Windows it will be similar to `C:\Documents and Settings\neon` or `C:\Users\neon`. (It may look slightly different for different versions of Windows.) In future examples, we've used Mac output as the default, Linux and Windows output may differ slightly, but should be generally similar.

If you are not, by default, in your home directory, you get there by typing:


$ cd ~

Listing contents -- ls

Now let's learn the command that will let us see the contents of our own file system. We can see what's in our home directory by running ls, short for "listing".

$ ls

Applications   Documents   Library   Music   Public
Desktop        Downloads   Movies    Pictures

(Again, your results may be slightly different depending on your operating system and how you have customized your filesystem.)

ls prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns.

**Data Tip:** What is a directory? That is a folder! Read the section on Directory vs. Folder if you find the wording confusing.

Change directory -- cd

Now we want to move into our Documents directory where we will create a directory to host our GitHub repository (to be created in Week 2). The command to change locations is cd followed by a directory name if it is a sub-directory in our current working directory or a file path if not. cd stands for "change directory", which is a bit misleading: the command doesn't change the directory, it changes the shell's idea of what directory we are in.

To move to the Documents directory, we can use the following series of commands to get there:

$ cd Documents

These commands will move us from our home directory into our Documents directory. cd doesn't print anything, but if we run pwd after it, we can see that we are now in /Users/neon/Documents.

If we run ls now, it lists the contents of /Users/neon/Documents, because that's where we now are:

$ pwd

/Users/neon/Documents
$ ls


data/  elements/  animals.txt  planets.txt  sunspot.txt

To use cd, you need to be familiar with paths; if not, read the section on Full, Base, and Relative Paths.

Make a directory -- mkdir

Now we can create a new directory called GitHub that will contain our GitHub repositories when we create them later. We can use the command mkdir NAME -- "make directory".

$ mkdir GitHub

There is no output.

Since GitHub is a relative path (i.e., doesn't have a leading slash), the new directory is created in the current working directory:

$ ls

data/  elements/  GitHub/  animals.txt  planets.txt  sunspot.txt
**Data Tip:** This material is a much abbreviated form of the Software Carpentry Unix Shell for Novices workshop. Want a better understanding of shell? Check out the full series!

Is Git Installed Correctly?

All of the above commands are bash commands, not Git-specific commands. We still need to check to make sure Git installed correctly. One of the easiest ways is to check which version of Git we have installed.

Git commands start with git. We can use git --version to see which version of Git is installed.

$ git --version

git version 2.5.4 (Apple Git-61)

If you get a git version number, then Git is installed!

If you get an error, Git isn’t installed correctly. Reinstall and repeat.

Setup Git Global Configurations

Now that we know Git is correctly installed, we can set it up.

The text below is modified slightly from Software Carpentry's Setting up Git lesson.

When we use Git on a new computer for the first time, we need to configure a few things. Below are a few examples of configurations we will set as we get started with Git:

  • our name and email address,
  • to colorize our output,
  • what our preferred text editor is,
  • and that we want to use these settings globally (i.e. for every project)

On a command line, Git commands are written as git verb, where verb is what we actually want to do.

Set up your own Git with the following commands, using your own information instead of NEON's.

$ git config --global user.name "NEON Science"
$ git config --global user.email "neon@BattelleEcology.org"
$ git config --global color.ui "auto"

Then set up your favorite text editor following this table:

Editor | Configuration command
nano | $ git config --global core.editor "nano -w"
Text Wrangler | $ git config --global core.editor "edit -w"
Sublime Text (Mac) | $ git config --global core.editor "subl -n -w"
Sublime Text (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w"
Sublime Text (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"
Notepad++ (Win) | $ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
Kate (Linux) | $ git config --global core.editor "kate"
Gedit (Linux) | $ git config --global core.editor "gedit -s -w"
emacs | $ git config --global core.editor "emacs"
vim | $ git config --global core.editor "vim"

These configuration commands only need to be run once: the flag --global tells Git to use the settings for every project in your user account on this computer.

You can check your settings at any time:

$ git config --list

You can change your configuration as many times as you want; just use the same commands to choose another editor or update your email address.

Now that Git is set up, you will be ready to start the Week 2 materials to learn about version control and how Git & GitHub work.

**Data Tip:** GitHub Desktop is a GUI (one of many) for using GitHub that is free and available for both Mac and Windows operating systems. NEON Data Skills workshops & Data Institutes will only teach how to use Git through the command line and will not support use of GitHub Desktop (or any other GUI); however, you are welcome to check it out and use it if you would like to.
