
Series

Introduction to Light Detection and Ranging (LiDAR) – Explore Point Clouds and Work with LiDAR Raster Data in R

The tutorials in this series introduce Light Detection and Ranging (LiDAR). Concepts covered include how LiDAR data are collected, LiDAR as gridded raster data, and an introduction to the digital models derived from LiDAR data: Canopy Height Models (CHM), Digital Surface Models (DSM), and Digital Terrain Models (DTM). The series introduces these concepts through videos, graphical examples, and text, and continues with visualization of LiDAR-derived raster data using plas.io, plot.ly, and R, three free, open-source tools.

Data used in this series are from the National Ecological Observatory Network (NEON) and are in .las, GeoTIFF and .csv formats.

Series Objectives

After completing the series you will:

  • Know what LiDAR data are
  • Understand key attributes of LiDAR data
  • Know what LiDAR-derived DTM, DSM, and CHM digital models are
  • Be able to visualize LiDAR-derived data in .las format using plas.io
  • Be able to create a Canopy Height Model in R
  • Be able to create an interactive plot.ly map of LiDAR-derived data

Things You’ll Need To Complete This Series

Setup RStudio

To complete some of the tutorials in this series, you will need an updated version of R and, preferably, RStudio installed on your computer.

R is a programming language that specializes in statistical computing. It is a powerful tool for exploratory data analysis. To interact with R, we strongly recommend RStudio, an interactive development environment (IDE).

Download Data

Data are available for download in those tutorials that focus on teaching data skills.

The Basics of LiDAR - Light Detection and Ranging - Remote Sensing

Authors: Leah A. Wasser

Last Updated: Oct 7, 2020

LiDAR or Light Detection and Ranging is an active remote sensing system that can be used to measure vegetation height across wide areas. This page will introduce fundamental LiDAR (or lidar) concepts including:

  1. What LiDAR data are.
  2. The key attributes of LiDAR data.
  3. How LiDAR data are used to measure trees.

The Story of LiDAR

Video: The Story of LiDAR (https://www.youtube.com/watch?v=m7SXoFv6Sdc)

Key Concepts

Why LiDAR

Scientists often need to characterize vegetation over large regions to answer research questions at the ecosystem or regional scale. Therefore, we need tools that can estimate key characteristics over large areas because we don’t have the resources to measure each and every tree or shrub.

Conventional, on-the-ground methods to measure trees are resource intensive and limit the amount of vegetation that can be characterized! Source: National Geographic

Remote sensing means that we aren’t actually physically measuring things with our hands. We are using sensors which capture information about a landscape and record things that we can use to estimate conditions and characteristics. To measure vegetation or other data across large areas, we need remote sensing methods that can take many measurements quickly, using automated sensors.

LiDAR data collected at the Soaproot Saddle site by the National Ecological Observatory Network's Airborne Observation Platform (NEON AOP).

LiDAR, or light detection and ranging (sometimes also referred to as active laser scanning), is one remote sensing method that can be used to map structure, including vegetation height, density, and other characteristics, across a region. LiDAR directly measures the height and density of vegetation on the ground, making it an ideal tool for scientists studying vegetation over large areas.

How LiDAR Works

How Does LiDAR Work?

Video: How Does LiDAR Work? (https://www.youtube.com/watch?v=EYbhNSUnIdU)

LiDAR is an active remote sensing system. An active system means that the system itself generates energy - in this case, light - to measure things on the ground. In a LiDAR system, light is emitted from a rapidly firing laser. You can imagine light quickly strobing from a laser light source. This light travels to the ground and reflects off of things like buildings and tree branches. The reflected light energy then returns to the LiDAR sensor where it is recorded.

A LiDAR system measures the time it takes for emitted light to travel to the ground and back. That time is used to calculate the distance traveled, and distance traveled is then converted to elevation. These measurements are made using the key components of a lidar system, including a GPS that identifies the X,Y,Z location of the light energy and an Inertial Measurement Unit (IMU) that provides the orientation of the plane in the sky.
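The time-to-distance conversion described above can be sketched in a few lines of R. This is a minimal illustration of the principle only (not NEON's processing code), and the travel time used is a made-up value.

```r
# Speed of light in meters per second
c_mps <- 299792458

# Two-way travel time of one laser pulse, in seconds (hypothetical value)
travel_time_s <- 6.67e-6

# The pulse travels to the target and back, so halve the round-trip distance
distance_m <- (travel_time_s * c_mps) / 2

distance_m  # roughly 1000 m: the sensor-to-target range
```

In essence, subtracting this range from the aircraft's GPS-derived altitude is what yields an elevation for the reflecting surface.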

How Light Energy Is Used to Measure Trees

Light energy is a collection of photons. As the photons that make up light move towards the ground, they hit objects such as branches on a tree. Some of the light reflects off of those objects and returns to the sensor. If the object is small, and there are gaps surrounding it that allow light to pass through, some light continues down towards the ground. Because some photons reflect off of things like branches while others continue down towards the ground, multiple reflections may be recorded from one pulse of light.

LiDAR waveforms

Video: LiDAR Waveforms (https://www.youtube.com/watch?v=uSESVm59uDQ)

The distribution of energy that returns to the sensor creates what we call a waveform. The amount of energy that returned to the LiDAR sensor is known as "intensity". The areas where more photons or more light energy returns to the sensor create peaks in the distribution of energy. These peaks in the waveform often represent objects on the ground: a branch, a group of leaves, or a building.

An example LiDAR waveform returned from two trees and the ground. Source: NEON.

How Scientists Use LiDAR Data

There are many different uses for LiDAR data.

  • LiDAR data classically have been used to derive high resolution elevation data.
LiDAR data have historically been used to generate high resolution elevation datasets. Source: National Ecological Observatory Network.
  • LiDAR data have also been used to derive information about vegetation structure, including:
    • Canopy Height
    • Canopy Cover
    • Leaf Area Index
    • Vertical Forest Structure
    • Species identification (in less dense forests, using high point density LiDAR)
Cross section showing LiDAR point cloud data superimposed on the corresponding landscape profile. Source: National Ecological Observatory Network.

Discrete vs. Full Waveform LiDAR

A waveform or distribution of light energy is what returns to the LiDAR sensor. However, this return may be recorded in two different ways.

  1. A Discrete Return LiDAR System records individual (discrete) points for the peaks in the waveform curve. Discrete return LiDAR systems identify peaks and record a point at each peak location in the waveform curve. These discrete or individual points are called returns. A discrete system may record 1-4 (and sometimes more) returns from each laser pulse.
  2. A Full Waveform LiDAR System records a distribution of returned light energy. Full waveform LiDAR data are thus more complex to process; however, they can often capture more information compared to discrete return LiDAR systems.
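The distinction can be illustrated with a toy waveform in R: a full waveform system would store the entire intensity curve, while a discrete return system keeps only the peaks. This is a simplified sketch using simulated numbers, not a real waveform-processing algorithm.

```r
# Simulated returned-energy waveform: intensity sampled along the pulse path
intensity <- c(1, 2, 8, 3, 1, 1, 6, 2, 1, 1, 2, 9, 4, 1)

# A sample is a peak if it exceeds both of its neighbors
n <- length(intensity)
is_peak <- c(FALSE, intensity[2:(n-1)] > intensity[1:(n-2)] &
                    intensity[2:(n-1)] > intensity[3:n], FALSE)

# "Discrete returns": only the positions and intensities at the peaks are kept
which(is_peak)      # positions of the returns along the waveform
intensity[is_peak]  # 8 6 9 -- one value per discrete return
```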

LiDAR File Formats

Whether they are collected as discrete points or as full waveform data, most often LiDAR data are made available as discrete points. A collection of discrete return LiDAR points is known as a LiDAR point cloud.

The commonly used file format to store LiDAR point cloud data is called .las, a format supported by the American Society for Photogrammetry and Remote Sensing (ASPRS). More recently, the .laz format was developed by Martin Isenburg of LAStools. The difference is that .laz is a highly compressed version of .las.

Data products derived from LiDAR point cloud data are often raster files that may be in GeoTIFF (.tif) formats.

LiDAR Data Attributes: X, Y, Z, Intensity and Classification

LiDAR data attributes can vary, depending upon how the data were collected and processed. You can determine what attributes are available for each lidar point by looking at the metadata. All lidar data points will have an associated X,Y location and Z (elevation) values. Most lidar data points will have an intensity value, representing the amount of light energy recorded by the sensor.

Some LiDAR data will also be "classified" -- not top secret, but with specifications about what the data are. Classification of LiDAR point clouds is an additional processing step. Classification simply represents the type of object that the laser return reflected off of. So if the light energy reflected off of a tree, it might be classified as "vegetation". And if it reflected off of the ground, it might be classified as "ground".

Some LiDAR products will be classified as "ground/non-ground". Some datasets will be further processed to determine which points reflected off of buildings and other infrastructure. Some LiDAR data will be classified according to the vegetation type.

Exploring 3D LiDAR data in a free Online Viewer

Check out our tutorial on viewing LiDAR point cloud data using the Plas.io online viewer: Plas.io: Free Online Data Viz to Explore LiDAR Data. The Plas.io viewer used in this tutorial was developed by Martin Isenberg of Las Tools and his colleagues.

Summary

  • A LiDAR system uses a laser, a GPS and an IMU to estimate the heights of objects on the ground.
  • Discrete LiDAR data are generated from waveforms; each point represents a peak in the returned energy.
  • Discrete LiDAR points contain an x, y and z value. The z value is what is used to generate height.
  • LiDAR data can be used to estimate tree height and even canopy cover using various methods.

Additional Resources

  • What is the LAS format?
  • Using .las with Python? las: python ingest
  • Specifications for las v1.3

What is a CHM, DSM and DTM? About Gridded, Raster LiDAR Data

Authors: Leah A. Wasser

Last Updated: Oct 7, 2020

LiDAR Point Clouds

Each point in a LiDAR dataset has an X, Y, Z value and other attributes. The points may be located anywhere in space and are not aligned within any particular grid.

Representative point cloud data. Source: National Ecological Observatory Network (NEON)

LiDAR point clouds are typically available in a .las file format. The .las file format is an efficient binary format that can better handle the millions of points that are often associated with LiDAR point clouds.

Common LiDAR Data Products

The Digital Terrain Model (DTM) product represents the elevation of the ground, while the Digital Surface Model (DSM) product represents the elevation of the tallest surfaces at that point. Imagine draping a sheet over the canopy of a forest: the DSM contours with the heights of the trees where there are trees, but drops to the elevation of the ground where there is a clearing in the forest.

  • DSM and DTM

The Canopy Height Model (CHM) represents the difference between the Digital Surface Model and the Digital Terrain Model (DSM - DTM = CHM) and gives you the height of the objects (in a forest, the trees) that are on the surface of the earth.

DSM, DTM and CHM

Free Point Cloud Viewers for LiDAR Point Clouds

  • Fusion: US Forest Service RSAC
  • Cloud compare
  • Plas.io website

For more on viewing LiDAR point cloud data using the Plas.io online viewer, see our tutorial Plas.io: Free Online Data Viz to Explore LiDAR Data.

Check out our Structural Diversity tutorial for another useful LiDAR point cloud viewer available through RStudio, Calculating Forest Structural Diversity Metrics from NEON LiDAR Data

3D Models of NEON Site: SJER (San Joaquin Experimental Range)

Click on the images to view interactive 3D models of the San Joaquin Experimental Range site.

3D models of a LiDAR-derived Digital Terrain Model (DTM; top), Digital Surface Model (DSM; middle), and Canopy Height Model (CHM; bottom).

Gridded, or Raster, LiDAR Data Products

LiDAR data products are most often worked with in a gridded, or raster, data format. A raster file is a regular grid of cells, all of which are the same size.

A few notes about rasters:

  • Each cell is called a pixel.
  • Each pixel represents an area on the ground.
  • The resolution of the raster describes the area that each pixel represents on the ground. So, for instance, if the raster is 1 m resolution, that simply means that each pixel represents a 1 m by 1 m area on the ground.
Raster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. Source: National Ecological Observatory Network (NEON)

Raster data can have attributes associated with them as well. For instance, in a LiDAR-derived digital elevation model (DEM), each cell might represent a particular elevation value. In a LiDAR-derived intensity image, each cell represents a LiDAR intensity value.

LiDAR Related Metadata

In short, when you go to download LiDAR data, the first question you should ask is what format the data are in. Are you downloading point clouds that you might have to process, or rasters that are already processed for you? How do you know?

  1. Check out the metadata!
  2. Look at the file format: if you are downloading a .las file, then you are getting points. If it is .tif, then it is a post-processed raster file.
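The second check is easy to automate. Here is a small base-R helper that classifies a download by its file extension; the filenames are hypothetical.

```r
# Classify a LiDAR download by its file extension
lidar_type <- function(filename) {
  ext <- tolower(tools::file_ext(filename))
  switch(ext,
         las = "point cloud (uncompressed)",
         laz = "point cloud (compressed)",
         tif = "processed raster",
         "unknown format - check the metadata")
}

lidar_type("SJER2013_DSM.tif")   # "processed raster"
lidar_type("sample_points.las")  # "point cloud (uncompressed)"
```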

Create Useful Data Products from LiDAR Data

Classify LiDAR Point Clouds

LiDAR data points are vector data. LiDAR point clouds are useful because they tell us something about the heights of objects on the ground. However, how do we know whether a point reflected off of a tree, a bird, a building or the ground? In order to develop products like elevation models and canopy height models, we need to classify individual LiDAR points. We might classify LiDAR points into classes including:

  • Ground
  • Vegetation
  • Buildings

LiDAR point cloud classification is often already done when you download LiDAR point clouds, but just know that it's not to be taken for granted! Programs such as LAStools, FUSION, and TerraScan are often used to perform this classification. Once the points are classified, they can be used to derive various LiDAR data products.

Create A Raster From LiDAR Point Clouds

There are different ways to create a raster from LiDAR point clouds.

Point to Raster Methods - Basic Gridding

Let's look at one of the most basic ways to create a raster from points: basic gridding. When you perform a gridding algorithm, you are simply calculating a value, using the point data, for each pixel in your raster dataset.

  1. To begin, a grid is placed on top of the LiDAR data in space. Each cell in the grid has the same spatial dimensions. These dimensions represent that particular area on the ground. If we want to derive a 1 m resolution raster from the LiDAR data, we overlay a 1m by 1m grid over the LiDAR data points.
  2. Within each 1m x 1m cell, we calculate a value to be applied to that cell, using the LiDAR points found within it. The simplest method of doing this is to take the max, min, or mean height value of all lidar points found within the 1m cell. If we use this approach, we might have cells in the raster that don't contain any lidar points. These cells will have a "no data" value if we process our raster in this way.
Animation showing the general process of taking LiDAR point clouds and converting them to a raster format. Source: Tristan Goulden, National Ecological Observatory Network (NEON)
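The two steps above can be sketched in base R with a handful of simulated points. (The raster package's rasterize() function does this at scale; the sketch below just shows the core idea of binning points into cells and summarizing them.)

```r
# Simulated lidar points: x/y location (meters) and z height
pts <- data.frame(
  x = c(0.2, 0.7, 0.4, 1.3, 1.8, 2.5),
  y = c(0.5, 0.9, 0.1, 0.6, 0.2, 0.8),
  z = c(2.1, 5.3, 1.0, 7.8, 6.4, 0.3)
)

# Step 1: overlay a 1 m grid -- assign each point to a cell
cell_x <- floor(pts$x)  # column index of the 1m x 1m cell
cell_y <- floor(pts$y)  # row index of the 1m x 1m cell

# Step 2: take the max height of the points falling in each cell
tapply(pts$z, list(cell_x, cell_y), max)
# Cells containing no points would be NA ("no data") in a full grid
```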

Point to Raster Methods - Interpolation

A different approach is to interpolate the value for each cell.

  1. In this approach we still start with placing the grid on top of the LiDAR data in space.
  2. Interpolation considers the values of points outside the cell, in addition to points within the cell, to calculate a value. Interpolation is useful because it can give us some ability to predict cell values in areas where there are no data (or no points), and to quantify the error associated with those predictions, which is useful to know if you are doing research.
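One of the simplest interpolation methods, inverse distance weighting (IDW), can be written in a few lines of base R. This is a didactic sketch, not the method used for NEON products; real workflows typically use interpolation routines from packages such as gstat. The point locations and elevations below are made up.

```r
# Inverse distance weighted (IDW) estimate of elevation at one grid cell,
# using nearby points both inside and outside the cell
idw <- function(x0, y0, px, py, pz, power = 2) {
  d <- sqrt((px - x0)^2 + (py - y0)^2)
  w <- 1 / d^power      # closer points get larger weights
  sum(w * pz) / sum(w)  # weighted mean of the point elevations
}

# Hypothetical point elevations around an empty cell centered at (1, 1)
px <- c(0, 2, 1, 1)
py <- c(1, 1, 0, 3)
pz <- c(10, 12, 11, 20)

idw(1, 1, px, py, pz)  # estimate dominated by the three closest points
```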

For learning more on how to work with LiDAR data, consider going through the Introduction to Working With Raster Data in R tutorial series.

Plas.io: Free Online Data Viz to Explore LiDAR Data

Authors: Leah A. Wasser

Last Updated: Apr 8, 2021

In this tutorial, we will explore LiDAR point cloud data using the free, online Plas.io viewer.

Learning Objectives

At the end of this tutorial, you will be able to:

  • visualize lidar point cloud data using the free online data viewer plas.io.
  • describe some of the attributes associated with discrete return lidar points, including intensity, classification, and RGB values.
  • explain the use of, and the difference between, the .las and .laz lidar file formats (standard lidar point cloud formats).

Things You’ll Need To Complete This Tutorial

  • Access to the internet so you can access the plas.io website.

Download Data

NEON Teaching Data Subset: Sample LiDAR Point Cloud Data (.las)

This .las file contains sample LiDAR point cloud data collected by National Ecological Observatory Network's Airborne Observation Platform group. The .las file format is a commonly used file format to store LIDAR point cloud data. NEON data are available on the NEON data portal.

Download NEON Teaching Data Subset: Sample LiDAR Point Cloud Data (.las)

Example visualization of LiDAR data

LiDAR data collected over Grand Mesa, Colorado as a part of instrument testing and calibration by the National Ecological Observatory Network Airborne Observation Platform (NEON AOP). Source: National Ecological Observatory Network (NEON)

LiDAR File Formats

LiDAR data are most often available as discrete points, although remember that these data can be collected by the lidar instrument in either discrete return or full waveform formats. A collection of discrete return LiDAR points is known as a LiDAR point cloud.

".las" is the commonly used file format to store LiDAR point cloud data. This format is supported by the American Society for Photogrammetry and Remote Sensing (ASPRS). More recently, the .laz format was developed by Martin Isenburg of LAStools; .laz is a highly compressed version of .las.

In this tutorial, you will open a .las file, in the plas.io free online lidar data viewer. You will then explore some of the attributes associated with a lidar data point cloud.

LiDAR Attribute Data

Remember that not all lidar data are created equally. Different lidar data may have different attributes. In this tutorial, we will look at data that contain both intensity values and a ground vs non ground classification.

Plas.io Viewer

We will use the plas.io website in this tutorial. As described on the plas.io GitHub page:

Plasio is a project by Uday Verma and Howard Butler that implements point cloud rendering capability in a browser. Specifically, it provides a functional implementation of the ASPRS LAS format, and it can consume LASzip-compressed data using LASzip NaCl module. Plasio is Chrome-only at this time, but it is hoped that other contributors can step forward to bring it to other browsers.

It is expected that most WebGL-capable browsers should be able to support plasio, and it contains nothing that is explicitly Chrome-specific beyond the optional NaCL LASzip module.

This tool is useful because you don't need to install anything to use it! Drag and drop your lidar data directly into the tool and begin to play! The website also provides access to some prepackaged datasets if you want to experiment on your own.

Enough reading, let's open some NEON LiDAR data!

1. Open a .las file in plas.io

  1. Download the NEON prepackaged lidar dataset (above in Download the Data) if you haven't already.
  2. The file is named: NEON-DS-Sample-LiDAR-Point-Cloud.las
  3. When the download is complete, drag the file NEON-DS-Sample-LiDAR-Point-Cloud.las into the plas.io website window.
  4. Zoom and pan around the data.
  5. Use the particle size slider to adjust the size of each individual lidar point. NOTE: the particle size slider is located a little more than halfway down the plas.io toolbar, in the "Data" section.

NICE! You should see something similar to the screenshot below:

NEON lidar data in the plas.io online tool.

Navigation in Plas.io

You might prefer to use a mouse to explore your data in plas.io. Let's test the navigation out.

  1. Left click on the screen and drag the data. Notice that this tilts the data up and down.
  2. Right click on the screen and drag, noticing that this moves the entire dataset around.
  3. Use the scroll wheel on your mouse to zoom in and out.

How The Points are Colored

Why is everything grey when the data are loaded?

Notice that the data, upon initial view, are colored in a black to white color scheme. These colors represent the data's intensity values. Remember that the intensity value for each LiDAR point represents the amount of light energy that reflected off of an object and returned to the sensor. In this case, darker colors represent LESS light energy returned; lighter colors represent MORE light returned.

Lidar intensity values represent the amount of light energy that reflected off of an object and returned to the sensor.

2. Adjust the intensity threshold

Next, scroll down through the tools in plas.io. Look for the Intensity Scaling slider. The intensity scaling slider allows you to define the thresholds of light to dark intensity values displayed in the image (similar to stretching values in an image processing software or even in Photoshop).

Drag the slider back and forth. Notice that you can brighten up the data using the slider.

The intensity scaling slider is located below the color map tool so it's easy to miss. Drag the slider back and forth to adjust the range of intensity values and to brighten up the lidar point clouds.
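Conceptually, the stretching the slider performs is a linear rescale of intensity values between two thresholds, with values outside the thresholds clipped. A minimal base-R version of that idea, using made-up intensity values:

```r
# Linearly stretch intensities so [lo, hi] maps to [0, 1];
# values outside the thresholds are clipped to the endpoints
stretch <- function(intensity, lo, hi) {
  scaled <- (intensity - lo) / (hi - lo)
  pmin(pmax(scaled, 0), 1)
}

intensity <- c(5, 40, 120, 200, 250)
stretch(intensity, lo = 40, hi = 200)  # 0.0 0.0 0.5 1.0 1.0
```

Narrowing the [lo, hi] window is what "brightens up" dim returns: mid-range intensities get pushed toward the white end of the display scale.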

3. Change the lidar point cloud color options to Classification

In addition to intensity values, these lidar data also have a classification value. Lidar data classification values are numeric, ranging from 0-20 or higher. Some common classes include:

  • 0 Not classified
  • 1 Unassigned
  • 2 Ground
  • 3 Low vegetation
  • 4 Medium vegetation
  • 5 High Vegetation
  • 6 Building
Blue and Orange gradient color scheme submitted by Kendra Sand. What color scheme is your favorite?
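These numeric codes follow the ASPRS LAS classification scheme, so they can be mapped to readable labels with a simple named lookup vector in R; the example classification values below are made up.

```r
# ASPRS standard classification codes (subset) mapped to labels
class_labels <- c("0" = "Not classified",
                  "1" = "Unassigned",
                  "2" = "Ground",
                  "3" = "Low vegetation",
                  "4" = "Medium vegetation",
                  "5" = "High Vegetation",
                  "6" = "Building")

# Hypothetical classification values from a point cloud
codes <- c(2, 2, 5, 6, 2)
class_labels[as.character(codes)]
# "Ground" "Ground" "High Vegetation" "Building" "Ground"
```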

In this case, these data are classified as either ground, or non-ground. To view the points, colored by class:

  • Change the "colorization" setting to "Classification".
  • Change the intensity blending slider to "All Color".
  • For kicks, play with the various colormap options to change the colors of the points.
Set the colorization to 'classified' and then adjust the intensity blending to view the points, colored by ground and non-ground classification.

4. Spend Some Time Exploring - Do you See Any Trees?

Finally, spend some time exploring the data. What features do you see in this dataset? What does the topography look like? Is the site flat? Hilly? Mountainous? What do the lidar data tell you, just upon initial inspection?

Summary

  • The plas.io online point cloud viewer allows you to quickly view and explore lidar data point clouds.
  • Each lidar data point will have an associated set of attributes. You can check the metadata to determine which attributes the dataset contains. NEON data, provided above, contain both classification and intensity values.
  • Classification values represent the type of object that the light energy reflected off of. Classification values are often ground vs non ground. Some lidar data files might have buildings, water bodies and other natural and man made elements classified.
  • LiDAR data often have an intensity value associated with them. This represents the amount of light energy that reflected off an object and returned to the sensor.

Additional Resources:

  • What is .las? From laspy - the las Python library
  • las v1.3 specifications

Create a Canopy Height Model from lidar-derived Rasters in R

Authors: Edmund Hart, Leah A. Wasser

Last Updated: Apr 8, 2021

A common analysis using lidar data is to derive top-of-canopy height values. These values are often used to track changes in forest structure over time, to calculate biomass, and even to estimate leaf area index (LAI). Let's dive into the basics of working with raster formatted lidar data in R!

Learning Objectives

After completing this tutorial, you will be able to:

  • Work with digital terrain model (DTM) & digital surface model (DSM) raster files.
  • Create a canopy height model (CHM) raster from DTM & DSM rasters.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • raster: install.packages("raster")
  • rgdal: install.packages("rgdal")

More on Packages in R - Adapted from Software Carpentry.

Download Data

NEON Teaching Data Subset: Field Site Spatial Data

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

This tutorial is designed for you to set your working directory to the directory created by unzipping this file.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Recommended Reading

What is a CHM, DSM and DTM? About Gridded, Raster lidar Data

Create a lidar-derived Canopy Height Model (CHM)

The National Ecological Observatory Network (NEON) will provide lidar-derived data products as one of its many free ecological data products. These products will come in the GeoTIFF format, which is a .tif raster format that is spatially located on the earth.

In this tutorial, we create a Canopy Height Model. The Canopy Height Model (CHM), represents the heights of the trees on the ground. We can derive the CHM by subtracting the ground elevation from the elevation of the top of the surface (or the tops of the trees).

We will use the raster R package to work with the lidar-derived digital surface model (DSM) and the digital terrain model (DTM).

# Load needed packages
library(raster)
library(rgdal)

# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd="~/Git/data/" #This will depend on your local environment
setwd(wd)

First, we will import the Digital Surface Model (DSM). The DSM represents the elevation of the top of the objects on the ground (trees, buildings, etc).

# assign raster to object
dsm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif"))

# view info about the raster.
dsm

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : /Users/olearyd/Git/data/NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif 
## names      : SJER2013_DSM

# plot the DSM
plot(dsm, main="Lidar Digital Surface Model \n SJER, California")

Note the resolution, extent, and coordinate reference system (CRS) of the raster. For the raster math in later steps to work, our DTM will need to have the same resolution, extent, and CRS.

Next, we will import the Digital Terrain Model (DTM) for the same area. The DTM represents the ground (terrain) elevation.

# import the digital terrain model
dtm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalTerrainModel/SJER2013_DTM.tif"))

plot(dtm, main="Lidar Digital Terrain Model \n SJER, California")

With both of these rasters now loaded, we can create the Canopy Height Model (CHM). The CHM represents the difference between the DSM and the DTM or the height of all objects on the surface of the earth.

To do this we perform some basic raster math to calculate the CHM. You can perform the same raster math in a GIS program like QGIS.

When you do the math, make sure to subtract the DTM from the DSM or you'll get trees with negative heights!

# use raster math to create CHM
chm <- dsm - dtm

# view CHM attributes
chm

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

plot(chm, main="Lidar Canopy Height Model \n SJER, California")

We've now created a CHM from our DSM and DTM. What do you notice about the canopy cover at this location in the San Joaquin Experimental Range?

### Challenge: Basic Raster Math

Convert the CHM from meters to feet. Plot it.
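If you get stuck, here is one possible approach, assuming the CHM units are meters as in the NEON data above:

```r
# one possible solution: 1 m = 3.28084 ft, so multiply every pixel
# by the conversion factor using raster math
chm_ft <- chm * 3.28084

# plot the converted CHM
plot(chm_ft, main="Lidar Canopy Height Model (feet) \n SJER, California")
```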

If, in your work you need to create lots of CHMs from different rasters, an efficient way to do this would be to create a function to create your CHMs.

# Create a function that subtracts one raster from another
canopyCalc <- function(DTM, DSM) {
  return(DSM - DTM)
}
    
# use the function to create the final CHM
# note the argument order: DTM first, then DSM
chm2 <- canopyCalc(dtm, dsm)
chm2

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

# or use the overlay function, which passes the rasters to canopyCalc in order
chm3 <- overlay(dtm, dsm, fun = canopyCalc) 
chm3 

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

As with any raster, we can write out the CHM as a GeoTIFF using the writeRaster() function.

# write out the CHM in tiff format. 
writeRaster(chm,paste0(wd,"chm_SJER.tif"),"GTiff")

We've now successfully created a canopy height model using basic raster math -- in R! We can bring the chm_SJER.tif file into QGIS (or any GIS program) and look at it.


Consider going on to the next tutorial, Extract Values from a Raster in R, to compare this lidar-derived CHM with ground-based observations!

Get Lesson Code

create-canopy-height-model-in-R.R

Extract Values from a Raster in R

Authors: Edmund Hart, Leah A. Wasser, Donal O'Leary

Last Updated: May 13, 2021

In this tutorial, we go through three methods for extracting data from a raster in R:

  • from circular buffers around points,
  • from square buffers around points, and
  • from shapefiles.

In doing so, we will also learn to convert x,y locations in tabular format (.csv, .xls, .txt) into SpatialPointsDataFrames, which can be used with other spatial data.

Learning Objectives

After completing this activity, you will be able to:

  • Convert x,y point locations to SpatialPointsDataFrames
  • Assign a Coordinate Reference System (CRS) to a SpatialPointsDataFrame
  • Extract values from raster files.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • raster: install.packages("raster")
  • sp: install.packages("sp")
  • rgdal: install.packages("rgdal")
  • maptools: install.packages("maptools")
  • rgeos: install.packages("rgeos")
  • dplyr: install.packages("dplyr")
  • ggplot2: install.packages("ggplot2")
More on Packages in R - Adapted from Software Carpentry

Download Data

NEON Teaching Data Subset: Field Site Spatial Data

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

This tutorial is designed for you to set your working directory to the directory created by unzipping this file.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Recommended Reading

What is a CHM, DSM and DTM? About Gridded, Raster lidar Data

Let's say we are studying canopy structure at the San Joaquin Experimental Range in California. Last year we laboriously collected field-measured heights of several trees surrounding each of several randomly selected points. It took many sweaty days to complete. Now we find out that NEON is collecting lidar data over this same area and will continue to do so for the duration of our study! Using these data will save us tons of time and money -- but how do the data compare?

Let's extract the data from the NEON provided raster (learning three different methods) and then compare them to our ground measured tree heights.

Convert x,y Locations to Spatial Data

Let's say we have our insitu data in two separate .csv (comma-separated value) files:

  • SJER/VegetationData/D17_2013_vegStr.csv: contains our vegetation structure data for each plot.
  • SJER/PlotCentroids/SJERPlotCentroids.csv: contains the plot centroid location information (x,y) where we measured trees.

Let's start by plotting the plot locations where we measured trees (in red) on a map.

We will need to convert the plot centroids to a spatial points dataset in R. This is why we load two more packages -- the spatial package sp and the data manipulation package dplyr -- in addition to the raster package.

NOTE: the sp library typically installs when you install the raster package.

install.packages("maptools")

## 
## The downloaded binary packages are in
## 	/var/folders/bn/w43q_t8s3_xckn5j4plhb289fqhhfx/T//Rtmp6ITj5Y/downloaded_packages

# Load needed packages
library(raster)
library(rgdal)
library(dplyr)

# Method 3:shapefiles
library(maptools)

# plotting
library(ggplot2)

# set working directory to ensure R can find the file we wish to import and where
wd <- "~/Git/data/" #This will depend on your local environment
setwd(wd)

Let's get started with the insitu vegetation data!

# import the centroid data and the vegetation structure data
# set stringsAsFactors to FALSE so strings come in as character, not factors
options(stringsAsFactors=FALSE)

# read in plot centroids
centroids <- read.csv(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/PlotCentroids/SJERPlotCentroids.csv"))
str(centroids)

## 'data.frame':	18 obs. of  5 variables:
##  $ Plot_ID : chr  "SJER1068" "SJER112" "SJER116" "SJER117" ...
##  $ Point   : chr  "center" "center" "center" "center" ...
##  $ northing: num  4111568 4111299 4110820 4108752 4110476 ...
##  $ easting : num  255852 257407 256839 256177 255968 ...
##  $ Remarks : logi  NA NA NA NA NA NA ...

# read in vegetation heights
vegStr <- read.csv(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/VegetationData/D17_2013_vegStr.csv"))
str(vegStr)

## 'data.frame':	362 obs. of  26 variables:
##  $ siteid               : chr  "SJER" "SJER" "SJER" "SJER" ...
##  $ sitename             : chr  "San Joaquin" "San Joaquin" "San Joaquin" "San Joaquin" ...
##  $ plotid               : chr  "SJER128" "SJER2796" "SJER272" "SJER112" ...
##  $ easting              : num  257086 256048 256723 257421 256720 ...
##  $ northing             : num  4111382 4111548 4112170 4111308 4112177 ...
##  $ taxonid              : chr  "PISA2" "ARVI4" "ARVI4" "ARVI4" ...
##  $ scientificname       : chr  "Pinus sabiniana" "Arctostaphylos viscida" "Arctostaphylos viscida" "Arctostaphylos viscida" ...
##  $ indvidualid          : int  1485 1622 1427 1511 1431 1507 1433 1620 1425 1506 ...
##  $ pointid              : chr  "center" "NE" "center" "center" ...
##  $ individualdistance   : num  9.7 5.8 6 17.2 9.9 15.1 6.8 10.5 2.6 15.9 ...
##  $ individualazimuth    : num  135.6 31.4 65.9 57.1 17.7 ...
##  $ dbh                  : num  67.4 NA NA NA 17.1 NA NA 18.6 NA NA ...
##  $ dbhheight            : num  130 130 130 130 10 130 130 1 130 130 ...
##  $ basalcanopydiam      : int  0 43 23 22 0 105 107 0 73 495 ...
##  $ basalcanopydiam_90deg: num  0 31 14 12 0 43 66 0 66 126 ...
##  $ maxcanopydiam        : num  15.1 5.7 5.9 2.5 5.2 8.5 3.3 6.5 3.3 7.5 ...
##  $ canopydiam_90deg     : num  12.4 4.8 4.3 2.1 4.6 6.1 2.5 5.2 2.1 6.9 ...
##  $ stemheight           : num  18.2 3.3 1.7 2.1 3 3.1 1.7 3.8 1.4 3.1 ...
##  $ stemremarks          : chr  "" "3 stems" "2 stems" "6 stems" ...
##  $ stemstatus           : chr  "" "" "" "" ...
##  $ canopyform           : chr  "" "Hemisphere" "Hemisphere" "Sphere" ...
##  $ livingcanopy         : int  100 70 35 70 80 85 0 85 85 55 ...
##  $ inplotcanopy         : int  100 100 100 100 100 100 100 100 100 100 ...
##  $ materialsampleid     : chr  "" "f095" "" "f035" ...
##  $ dbhqf                : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stemmapqf            : int  0 0 0 0 0 0 0 0 0 0 ...

Now let's load the Canopy Height Model raster. Note: if you completed the Create a Canopy Height Model from lidar-derived Rasters in R tutorial, this is the same object chm you created there, and you do not need to reload the data.

# import the canopy height model
chm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/CHM_SJER.tif"))

# plot raster
plot(chm, main="Lidar Canopy Height Model \n SJER, California")

Since both files have eastings and northings we can use this data to plot onto our existing raster.

## overlay the centroid points and the stem locations on the CHM plot
# plot the chm
myCol <- terrain.colors(6)
plot(chm,col=myCol, main="Plot & Tree Locations", breaks=c(-2,0,2,10,40))

## plotting details: cex = point size, pch 0 = square
# plot square around the centroid
points(centroids$easting,centroids$northing, pch=0, cex = 2 )
# plot location of each tree measured
points(vegStr$easting,vegStr$northing, pch=19, cex=.5, col = 2)

Now we have a plot of our CHM showing trees of different (categorical) heights. Why might we have chosen these breaks?

On this CHM plot we've marked the locations of the plot centers. Note that the black box isn't the plot boundary; its size is determined by the plot marker we chose, so that we can see the centroids that would otherwise be "under" the tree height points. We've also plotted the locations of individual trees we measured (red overlapping circles).

Plotting Tips: use help(points) to read about the options for plotting points. Or to see a list of pch values (symbols), check out this website.

Spatial Data Need a Coordinate Reference System

We plotted the easting and northing of the points accurately on the map, but our data don't yet have a specific Coordinate Reference System (CRS) attached. The CRS is information that allows a program like QGIS to determine where in the world the data are located. Read more about CRS here

We need to assign a Coordinate Reference System to our insitu data. In this case, we know these data are all in the same projection as our original CHM. We can quickly figure out what projection an object is in, using object@crs.

# check CHM CRS
chm@crs

## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs

So our data are in UTM Zone 11, which is correct for California. We can use this CRS to make our data points into a SpatialPointsDataFrame, which then allows the points to be treated as spatial objects.

## create SPDF: SpatialPointsDataFrame()
# specify the easting (column 4) & northing (column 3) in that order
# specify CRS proj4string: borrow CRS from chm 
# specify the data.frame that holds the attributes
centroid_spdf <- SpatialPointsDataFrame(
  centroids[,4:3], proj4string=chm@crs, centroids)

# check centroid CRS
# note SPDFs don't have a crs slot so `object@crs` won't work
centroid_spdf

## class       : SpatialPointsDataFrame 
## features    : 18 
## extent      : 254738.6, 258497.1, 4107527, 4112168  (xmin, xmax, ymin, ymax)
## crs         : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## variables   : 5
## names       :  Plot_ID,  Point,    northing,    easting, Remarks 
## min values  : SJER1068, center, 4107527.074, 254738.618,      NA 
## max values  :  SJER952, center, 4112167.778, 258497.102,      NA

We now have our centroid data as a spatial points data frame. This will allow us to work with them as spatial data along with other spatial data -- like rasters.

Extract CHM Data from Buffer Area

To compare the CHM with our ground data, we want to extract CHM heights where we measured trees. To do this, we will create a boundary region (called a buffer) representing the spatial extent of each plot (where trees were measured). We will then extract all CHM pixels that fall within the plot boundary to estimate tree height for that plot.

When a circular buffer is applied to a raster, some pixels fall fully within the buffer while others are only partially included; by default, values are extracted for all pixels that intersect the buffer.

There are a few ways to go about this task. Because our plots are circular, we'll use the extract() function in R, which allows you to specify a circular buffer with a given radius around an x,y point location. In this case, we can tell R to extract the maximum value of all pixels within the buffer using the fun=max argument.

Method 1: Extract Data From a Circular Buffer

In the first example, we'll presume our insitu sampling took place within a circular plot with a 20 m radius. Therefore, we will use a buffer of 20 m.

When we use the extract() function with fun=max, R returns a dataframe containing the max height calculated from all pixels in the buffer for each plot.

There are a few other popular packages that have a function called extract(), so we will specify to use the function from the raster package using the "::" notation.

# extract circular, 20m buffer

cent_max <- raster::extract(chm,             # raster layer
	centroid_spdf,   # SPDF with centroids for buffer
	buffer = 20,     # buffer size, units depend on CRS
	fun=max,         # what value to extract
	df=TRUE)         # return a dataframe? 

# view
cent_max

##    ID  CHM_SJER
## 1   1 18.940002
## 2   2 24.189972
## 3   3 13.299988
## 4   4 10.989990
## 5   5  5.690002
## 6   6 19.079987
## 7   7 16.299988
## 8   8 11.959991
## 9   9 19.120026
## 10 10 11.149994
## 11 11  9.290009
## 12 12 18.329987
## 13 13 11.080017
## 14 14  9.140015
## 15 15  2.619995
## 16 16 24.250000
## 17 17 18.250000
## 18 18  6.019989

Ack! We've lost our PlotIDs, how will we match them up?

# grab the names of the plots from the centroid_spdf
cent_max$plot_id <- centroid_spdf$Plot_ID

#fix the column names
names(cent_max) <- c('ID','chmMaxHeight','plot_id')

# view again - we have plot_ids
cent_max

##    ID chmMaxHeight  plot_id
## 1   1    18.940002 SJER1068
## 2   2    24.189972  SJER112
## 3   3    13.299988  SJER116
## 4   4    10.989990  SJER117
## 5   5     5.690002  SJER120
## 6   6    19.079987  SJER128
## 7   7    16.299988  SJER192
## 8   8    11.959991  SJER272
## 9   9    19.120026 SJER2796
## 10 10    11.149994 SJER3239
## 11 11     9.290009   SJER36
## 12 12    18.329987  SJER361
## 13 13    11.080017   SJER37
## 14 14     9.140015    SJER4
## 15 15     2.619995    SJER8
## 16 16    24.250000  SJER824
## 17 17    18.250000  SJER916
## 18 18     6.019989  SJER952

# merge the chm data into the centroids data.frame
centroids <- merge(centroids, cent_max, by.x = 'Plot_ID', by.y = 'plot_id')

# have a look at the centroids dataFrame
head(centroids)

##    Plot_ID  Point northing  easting Remarks ID chmMaxHeight
## 1 SJER1068 center  4111568 255852.4      NA  1    18.940002
## 2  SJER112 center  4111299 257407.0      NA  2    24.189972
## 3  SJER116 center  4110820 256838.8      NA  3    13.299988
## 4  SJER117 center  4108752 256176.9      NA  4    10.989990
## 5  SJER120 center  4110476 255968.4      NA  5     5.690002
## 6  SJER128 center  4111389 257078.9      NA  6    19.079987

Excellent. We now have the maximum "tree" height for each plot based on the CHM.

Extract All Pixel Heights

If we want to explore the distribution of pixel height values in each plot, we can drop the fun=max argument; extract() then returns a list containing all pixel values for each plot.

It's good practice to look at the distribution of values we've extracted for each plot. You can generate a histogram for a single plot with hist(cent_heightList[[2]]), or loop through several plots and create histograms using a for loop.

# extract all
cent_heightList <- raster::extract(chm,centroid_spdf,buffer = 20)

# create histograms for the first 5 plots of data
# using a for loop

for (i in 1:5) {
  hist(cent_heightList[[i]], main=(paste("plot",i)))
  }

Looking at these distributions, the area has some pretty short trees -- plot 5 (really, SJER120 since we didn't match up the plotIDs) looks almost bare!

### Challenge: For Loops & Plotting Parameters

Seeing as we have 18 plots in our `cent_heightList`, it would be nice to show all 18 histograms organized into 6 rows of 3 plots each. Modify the `for` loop above to plot all 18 histograms.

Improve upon the plot's final appearance to make a readable final figure. Hint: one way to set up a layout with multiple plots in R is par(mfrow=c(2,3)), which gives a layout of 2 rows and 3 columns.
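One possible solution sketch, extending the loop above (the margin settings are an assumption to help all 18 panels fit; adjust as needed for your screen):

```r
# set up a 6-row by 3-column layout and shrink the margins so
# all 18 panels fit in a single figure
par(mfrow=c(6,3), mar=c(2,2,2,1))

# loop through all 18 plots
for (i in 1:18) {
  hist(cent_heightList[[i]], main=paste("plot", i), xlab="Height (m)")
}

# reset the layout so later plots use a single panel
par(mfrow=c(1,1))
```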

Method 2: Square Plots

To complete this next method, you need to first create square plots around each point, producing an R object called polys. Directions for how to do this are contained in this tutorial:

Create A Square Buffer Around a Plot Centroid in R.
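If you have not completed that tutorial, one alternative sketch (not necessarily the method used there) creates square buffers directly with gBuffer() from the rgeos package:

```r
# a sketch of one way to create square plot boundaries around the centroids;
# capStyle="SQUARE" with quadsegs=1 buffers each point with a square,
# and byid=TRUE keeps one polygon per centroid
library(rgeos)

polys <- gBuffer(centroid_spdf, width=20, quadsegs=1,
                 capStyle="SQUARE", byid=TRUE)
```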

Once you have the SpatialPolygon object polys, you can use the same extract() function as we did for the circular plots, but this time with no buffer since we already have a polygon to use.

# will need the sp package loaded: library(sp)
square_max <- raster::extract(chm,       # raster layer
	polys,           # spatial polygons for extraction
	fun=max,         # what value to extract
	df=TRUE)         # return a dataframe? 

However, if you're going this route with your data, we recommend using the next method!

Method 3: Extract Values Using a Shapefile

If our plot boundaries are saved in a shapefile, we can use them to extract the data.

In our data, we have two different shapefiles (in SJER/PlotCentroids) for this area:

  • SJERPlotCentroids_Buffer
  • SJERPlotCentroids_BuffSquare

To import a shapefile into R we can use the readOGR() function from the rgdal package, which we loaded earlier.

# load shapefile data
centShape <- readOGR(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/PlotCentroids/SJERPlotCentroids_Buffer.shp"))

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/olearyd/Git/data/NEON-DS-Field-Site-Spatial-Data/SJER/PlotCentroids/SJERPlotCentroids_Buffer.shp", layer: "SJERPlotCentroids_Buffer"
## with 18 features
## It has 6 fields

plot(centShape)

Then we can simply use the extract() function again. Here we specify weights=FALSE (so values aren't weighted by the fraction of each pixel covered) and add the data directly to our centroids data frame, instead of creating a separate data frame that we later have to match up.

# extract max from chm for shapefile buffers
centroids$chmMaxShape <- raster::extract(chm, centShape, weights=FALSE, fun=max)

# view
head(centroids)

##    Plot_ID  Point northing  easting Remarks ID chmMaxHeight
## 1 SJER1068 center  4111568 255852.4      NA  1    18.940002
## 2  SJER112 center  4111299 257407.0      NA  2    24.189972
## 3  SJER116 center  4110820 256838.8      NA  3    13.299988
## 4  SJER117 center  4108752 256176.9      NA  4    10.989990
## 5  SJER120 center  4110476 255968.4      NA  5     5.690002
## 6  SJER128 center  4111389 257078.9      NA  6    19.079987
##   chmMaxShape
## 1   18.940002
## 2   24.189972
## 3   13.299988
## 4   10.989990
## 5    5.690002
## 6   19.079987

Which was faster: extracting from a SpatialPolygon object (polys) or extracting with a SpatialPolygonsDataFrame (centShape)? Keep this in mind when doing future work -- the SpatialPolygonsDataFrame is more efficient.

### Challenge: Square Shapefile Plots

Compare the values from cent_max and square_max. Are they the same? Why might they differ?
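One way to start this comparison, assuming square_max keeps the default extracted-value column name (CHM_SJER, the raster layer name) and that both data frames list the plots in the same order:

```r
# combine the two sets of extracted max heights side by side
comparison <- data.frame(plot_id   = cent_max$plot_id,
                         circleMax = cent_max$chmMaxHeight,
                         squareMax = square_max$CHM_SJER)

# difference between square- and circle-buffer extractions
comparison$diff <- comparison$squareMax - comparison$circleMax
head(comparison)
```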

Extract Summary Data from Ground Measures

In our final step, we will extract summary height values from our field data (vegStr). We can do this using the base R packages (Method 1) or more efficiently, using the dplyr package (Method 2).

Method 1: Use Base R

We'll start by finding the maximum ground-measured stem height value for each plot. We will compare this value to the max CHM value.

First, use the aggregate() function to summarize our variable of interest, stemheight. Its arguments are:

  • a formula: the data to summarize ~ the grouping variable
  • the FUNction we want to apply to the data

Then we'll assign cleaner names to the new data.

# find max stemheight
maxStemHeight <- aggregate( vegStr$stemheight ~ vegStr$plotid, 
														FUN = max )  

# view
head(maxStemHeight)

##   vegStr$plotid vegStr$stemheight
## 1      SJER1068              19.3
## 2       SJER112              23.9
## 3       SJER116              16.0
## 4       SJER117              11.0
## 5       SJER120               8.8
## 6       SJER128              18.2

#Assign cleaner names to the columns
names(maxStemHeight) <- c('plotid','insituMaxHeight')

# view
head(maxStemHeight)

##     plotid insituMaxHeight
## 1 SJER1068            19.3
## 2  SJER112            23.9
## 3  SJER116            16.0
## 4  SJER117            11.0
## 5  SJER120             8.8
## 6  SJER128            18.2

Bonus: Add in 95% height, while combining the above steps into one line of code.

# add the max and 95th percentile height value for all trees within each plot
insitu <- cbind(maxStemHeight,'quant'=tapply(vegStr$stemheight, 
	vegStr$plotid, quantile, prob = 0.95))

# view
head(insitu)

##            plotid insituMaxHeight  quant
## SJER1068 SJER1068            19.3  8.600
## SJER112   SJER112            23.9 19.545
## SJER116   SJER116            16.0 13.300
## SJER117   SJER117            11.0 10.930
## SJER120   SJER120             8.8  8.680
## SJER128   SJER128            18.2 12.360

Method 2: Extract using dplyr

You can also achieve the same results in a more efficient manner using the R package dplyr. Additionally, the dplyr workflow is more similar to a typical database approach.

For more on using the dplyr package see our tutorial, Filter, Piping and GREPL Using R DPLYR - An Intro.

# find the max stem height for each plot
maxStemHeight_d <- vegStr %>% 
  group_by(plotid) %>% 
  summarise(max = max(stemheight))

# view
head(maxStemHeight_d)

## # A tibble: 6 x 2
##   plotid     max
##   <chr>    <dbl>
## 1 SJER1068  19.3
## 2 SJER112   23.9
## 3 SJER116   16  
## 4 SJER117   11  
## 5 SJER120    8.8
## 6 SJER128   18.2

# fix names
names(maxStemHeight_d) <- c("plotid","insituMaxHeight")
head(maxStemHeight_d)

## # A tibble: 6 x 2
##   plotid   insituMaxHeight
##   <chr>              <dbl>
## 1 SJER1068            19.3
## 2 SJER112             23.9
## 3 SJER116             16  
## 4 SJER117             11  
## 5 SJER120              8.8
## 6 SJER128             18.2

And the bonus code with dplyr.

# one line of nested commands, 95% height value
insitu_d <- vegStr %>%
	filter(plotid %in% centroids$Plot_ID) %>% 
	group_by(plotid) %>% 
	summarise(max = max(stemheight), quant = quantile(stemheight,.95))

# view
head(insitu_d)

## # A tibble: 6 x 3
##   plotid     max quant
##   <chr>    <dbl> <dbl>
## 1 SJER1068  19.3  8.6 
## 2 SJER112   23.9 19.5 
## 3 SJER116   16   13.3 
## 4 SJER117   11   10.9 
## 5 SJER120    8.8  8.68
## 6 SJER128   18.2 12.4

Combine Ground & Remote Sensed Data

Once we have our summarized insitu data, we can merge it into the centroids data.frame. The merge() function requires two data.frames and the names of the columns containing the unique ID that we will merge the data on. In this case, we will merge the data on the Plot ID (plotid, Plot_ID) column. Notice that it's spelled slightly differently in both data.frames so we'll need to tell R what it's called in each data.frame.

If you plan your data collection, entry, and analyses ahead of time you can standardize your names to avoid potential confusion like this!

# merge the insitu data into the centroids data.frame
centroids <- merge(centroids, maxStemHeight, by.x = 'Plot_ID', by.y = 'plotid')

# view
head(centroids)

##    Plot_ID  Point northing  easting Remarks ID chmMaxHeight
## 1 SJER1068 center  4111568 255852.4      NA  1    18.940002
## 2  SJER112 center  4111299 257407.0      NA  2    24.189972
## 3  SJER116 center  4110820 256838.8      NA  3    13.299988
## 4  SJER117 center  4108752 256176.9      NA  4    10.989990
## 5  SJER120 center  4110476 255968.4      NA  5     5.690002
## 6  SJER128 center  4111389 257078.9      NA  6    19.079987
##   chmMaxShape chmMaxSquareShape     diff insituMaxHeight
## 1   18.940002         18.940002 0.000000            19.3
## 2   24.189972         24.189972 0.000000            23.9
## 3   13.299988         13.299988 0.000000            16.0
## 4   10.989990         10.989990 0.000000            11.0
## 5    5.690002          7.380005 1.690002             8.8
## 6   19.079987         19.079987 0.000000            18.2

Plot Remote Sensed vs Ground Data

Now we can create a plot that illustrates the relationship between in situ measured tree height values and lidar-derived max canopy height values.

We can make a simple plot using the base R plot() function:

#create basic plot
plot(x = centroids$chmMaxShape, y=centroids$insituMaxHeight)

Or we can use the ggplot() function from the ggplot2 package. For more on using the ggplot2 package see our tutorial, Plot Time Series with ggplot2 in R.

In reality, we know that the trees in these plots are the same height regardless of whether we measure them with lidar or from the ground. However, there may be certain biases in our measurements, and it will be interesting to see if one method measures the trees as taller than the other. To make this comparison, we will add what is called a "1:1" line, i.e., the line on which all of the points would fall if both methods measured the trees as exactly the same height. Let's make this "1:1" line dashed and slightly transparent so that it doesn't obscure any of our points.

# create plot

ggplot(centroids,aes(x=chmMaxShape, y =insituMaxHeight )) + 
  geom_abline(slope=1, intercept = 0, alpha=.5, lty=2)+ # plotting our "1:1" line
  geom_point() + 
  theme_bw() + 
  ylab("Maximum measured height") + 
  xlab("Maximum lidar pixel")

We can also add a regression fit to our plot. Explore the ggplot() options and customize your plot.

# plot with regression

ggplot(centroids, aes(x=chmMaxShape, y=insituMaxHeight)) +
  geom_abline(slope=1, intercept=0, alpha=.5, lty=2) + #plotting our "1:1" line
  geom_point() +
  geom_smooth(method = lm) + # add regression line and confidence interval
  ggtitle("Lidar CHM-derived vs. Measured Tree Height") +
  ylab("Maximum Measured Height") +
  xlab("Maximum Lidar Height") +
  theme(panel.background = element_rect(colour = "grey"),
        plot.title = element_text(family="sans", face="bold", size=20, vjust=1.19),
        axis.title.x = element_text(family="sans", face="bold", size=14, angle=00, hjust=0.54, vjust=-.2),
        axis.title.y = element_text(family="sans", face="bold", size=14, angle=90, hjust=0.54, vjust=1))

## `geom_smooth()` using formula 'y ~ x'

You have now successfully compared lidar-derived vegetation height, within plots, to actual measured tree height data! By comparing the regression line against the 1:1 line, it appears as though lidar underestimates tree height for shorter trees and overestimates it for taller trees. Or could it be that human observers underestimate the height of very tall trees because it's hard to see the crown from the ground? Or perhaps the lidar-based method misjudges the elevation of the ground, which would throw off the accuracy of the CHM? As you can see, there are many potential factors leading to this disagreement in height between observation methods, which the savvy researcher would be sure to investigate if tree height is important for their particular pursuits.

If you want to make this an interactive plot, you could use Plotly to do so. For more on using the plotly package to create interactive plots, see our tutorial Interactive Data Visualization with R and Plotly.

### Challenge: Plot Data

Create a plot of lidar 95th percentile value vs insitu max height. Or lidar 95th percentile vs insitu 95th percentile.

Compare this plot to the previous one with max height. Which would you prefer to use for your analysis? Why?
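One possible sketch for the first plot, using a small wrapper function to extract the 95th percentile from the CHM (an assumption; you could also extract all pixels and summarize them yourself):

```r
# extract the 95th percentile CHM value within each shapefile buffer;
# the wrapper lets quantile() receive the probs argument
centroids$chm95 <- raster::extract(chm, centShape,
    fun=function(x, ...) quantile(x, probs=0.95, na.rm=TRUE))

# lidar 95th percentile vs insitu max height
ggplot(centroids, aes(x=chm95, y=insituMaxHeight)) +
  geom_abline(slope=1, intercept=0, alpha=.5, lty=2) +
  geom_point() +
  theme_bw() +
  ylab("Maximum measured height") +
  xlab("95th percentile lidar height")
```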


Get Lesson Code

extract-raster-data-R.R

Going On The Grid -- An Intro to Gridding & Spatial Interpolation

Authors: Leah Wasser, Tristan Goulden

Last Updated: Apr 8, 2021

This tutorial was originally created for an ESA brown-bag workshop. Here we present the main graphics and topics covered in the workshop.

Additional Resources

  • Learn more about LiDAR data in our tutorial The Basics of LiDAR - Light Detection and Ranging - Remote Sensing.
  • Learn more about Digital Surface Models, Digital Terrain Models and Canopy Height Models in our tutorial What is a CHM, DSM and DTM? About Gridded, Raster LiDAR Data.

How Does LiDAR Work?

<iframe width="640" height="360" src="//www.youtube.com/embed/EYbhNSUnIdU?rel=0" frameborder="0" allowfullscreen></iframe>

About Rasters

For a full tutorial on rasters & raster data, please go through the Intro to Raster Data in R tutorial, which provides foundational concepts even if you aren't working with R.

A raster is a dataset made up of cells or pixels. Each pixel represents a value associated with a region on the earth’s surface.

The spatial resolution of a raster refers to the size of each cell in meters. This size in turn relates to the area on the ground that the pixel represents. Source: National Ecological Observatory Network

For more on raster resolution, see our tutorial on The Relationship Between Raster Resolution, Spatial Extent & Number of Pixels.

Creating Surfaces (Rasters) From Points

There are several ways that we can get from the point data collected by lidar to the raster surfaces, such as Digital Elevation Models, that we use in analyses and mapping.

Basic gridding does not estimate values for any area where data are missing. Interpolation (including Triangulated Irregular Network (TIN) interpolation) fills these gaps so that there are no holes in the resulting raster surface.

Interpolation can be done in a number of different ways, some of which are deterministic and some are probabilistic.

When converting a set of sample points to a grid, there are many different approaches that should be considered. Source: National Ecological Observatory Network

Gridding Points

When creating a raster, you may choose to perform a direct gridding of the data. This means that you calculate one value for every cell in the raster where there are sample points. This value may be a mean of all points, a max, a min, or some other mathematical function. All other cells will then have no data values associated with them. This means you may have gaps in your data if the point spacing is not well distributed, with at least one data point within the spatial coverage of each raster cell.

When you directly grid a dataset, values will only be calculated for cells that overlap with data points. Thus, data gaps will not be filled. Source: National Ecological Observatory Network

We can create a raster from points through a process called gridding. Gridding is the process of taking a set of points and using them to create a surface composed of a regular grid.

Animation showing the general process of taking lidar point clouds and converting them to a raster format. Source: Tristan Goulden, National Ecological Observatory Network
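The direct-gridding idea above can be sketched in a few lines. This is a minimal illustration (shown in Python rather than the tutorial's R) with made-up points and cell size, not NEON's actual processing code:

```python
import numpy as np

def grid_points(x, y, z, cell_size, func=np.mean):
    """Directly grid scattered points: one value per cell, NaN where no points fall."""
    col = ((x - x.min()) // cell_size).astype(int)   # cell column for each point
    row = ((y - y.min()) // cell_size).astype(int)   # cell row for each point
    grid = np.full((row.max() + 1, col.max() + 1), np.nan)
    for r, c in set(zip(row, col)):                  # only cells that contain points
        in_cell = (row == r) & (col == c)
        grid[r, c] = func(z[in_cell])                # mean, max, min, ...
    return grid

# Four made-up lidar returns on a 1 m grid; one cell receives no points
x = np.array([0.2, 0.8, 1.5, 0.3])
y = np.array([0.1, 0.4, 0.2, 1.7])
z = np.array([10.0, 12.0, 8.0, 15.0])
grid = grid_points(x, y, z, cell_size=1.0)
print(grid)  # the empty cell stays NaN
```

Note that the cell with no sample points stays NaN: direct gridding leaves the gap, which is exactly what interpolation (next section) is designed to fill.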

Spatial Interpolation

Spatial interpolation involves calculating the value for a query point (or a raster cell) with an unknown value from a set of known sample point values that are distributed across an area. The general assumption is that points closer to the query point are more strongly related to that cell than those farther away; however, this assumption is applied differently by different interpolation functions.

Interpolation methods will estimate values for cells where no known values exist.

Deterministic & Probabilistic Interpolators

There are two main types of interpolation approaches:

  • Deterministic: create surfaces directly from measured points using a weighted distance or area function.
  • Probabilistic (Geostatistical): utilize the statistical properties of the measured points. Probabilistic techniques quantify the spatial auto-correlation among measured points and account for the spatial configuration of the sample points around the prediction location.

Different methods of interpolation are better for different datasets. This table lays out the strengths of some of the more common interpolation methods.

Abbreviations: TIN=Triangulated Irregular Network, IDW=Inverse Distance Weighted interpolation. Source: National Ecological Observatory Network

We will focus on deterministic methods in this tutorial.

Deterministic Interpolation Methods

Let's look at a few different deterministic interpolation methods to understand how different methods can affect an output raster.

Inverse Distance Weighted (IDW)

Inverse distance weighted (IDW) interpolation calculates the values of a query point (a cell with an unknown value) using a linearly weighted combination of values from nearby points.

IDW interpolation calculates the value of an unknown cell center value (a query point) using surrounding points with the assumption that closest points will be the most similar to the value of interest. Source: QGIS

Key Attributes of IDW Interpolation

  • The raster is derived based upon an assumed linear relationship between the location of interest and the distance to surrounding sample points. In other words, sample points closest to the cell of interest are assumed to be more related to its value than those farther away.
  • Bounded/exact estimation: IDW cannot interpolate beyond the min/max range of the sample point values. Estimating only within the range of existing values can yield "flattened" peaks and valleys -- especially if the data didn't capture those high and low points.
  • Interpolated values are weighted averages of the surrounding sample values.
  • Good for point data that are dense and equally distributed. Assumes a consistent trend or relationship between points and does not accommodate trends within the data (e.g., east to west, elevational, etc.).

IDW interpolation looks at the linear distance between the unknown value and surrounding points. Source: J. Abrecht, CUNY

Power

The power value changes the "weighting" of the IDW interpolation by specifying how strongly the influence of sample points falls off with distance from the query point. Power values range from 0 to 3+, with the default generally being 2. A larger power value produces a more localized result: values farther from the cell have less impact on its calculated value, and values closer to the cell have more. A smaller power value produces a more averaged result, in which sample points farther from the cell have a greater impact on the cell's calculated value.

The solid line represents a greater power and the dashed line a lower power. The greater the power, the more localized an effect a sample point's value has on the resulting surface. A smaller power value yields a smoother, more averaged surface. Source: J. Abrecht, CUNY

The impacts of power:

  • Lower power values: a more averaged result, with the potential for a smoother surface. As power decreases, the influence of distant sample points grows, yielding a smoother, more averaged surface.

  • Greater power values: a more localized result, with the potential for more peaks and valleys around sample point locations. As power increases, the influence of sample points falls off more rapidly with distance, so query cell values become more localized and less averaged.
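The IDW calculation itself is simple enough to sketch directly. Below is a minimal Python illustration (the sample points and values are made up; this is not a production implementation):

```python
import numpy as np

def idw(query, pts, vals, power=2.0):
    """Inverse distance weighted estimate at a single query point."""
    d = np.linalg.norm(pts - query, axis=1)  # distances to the sample points
    if np.any(d == 0):                       # query sits exactly on a sample point
        return vals[np.argmin(d)]
    w = d ** -power                          # inverse-distance weights
    return np.sum(w * vals) / np.sum(w)

# Four made-up sample points at the corners of a unit square
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([10.0, 20.0, 30.0, 40.0])
center = np.array([0.5, 0.5])

# The center is equidistant from all samples, so every weight is equal
print(round(idw(center, pts, vals, power=2), 6))  # 25.0 -- a plain mean
```

Because the result is a weighted average of the sample values, the estimate can never fall outside their min/max range -- the "bounded estimation" property noted above.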

IDW Take Home Points

IDW is good for:

  • Data whose distribution is strongly (and linearly) correlated with distance. For example, noise falls off very predictably with distance.
  • Providing explicit control over the influence of distance (compared to Spline or Kriging).

IDW is not so good for:

  • Data whose distribution depends on more complex sets of variables because it can account only for the effects of distance.

Other features:

  • You can create a smoother surface by decreasing the power, increasing the number of sample points used or increasing the search (sample points) radius.
  • You can create a surface that more closely represents the peaks and dips of your sample points by decreasing the number of sample points used, decreasing the search radius or increasing the power.
  • You can increase IDW surface accuracy by adding breaklines to the interpolation process that serve as barriers. Breaklines represent abrupt changes in elevation, such as cliffs.

Spline

Spline interpolation fits a curved surface through the sample points of your dataset. Imagine stretching a rubber sheet across your points and gluing it to each sample point along the way -- what you get is a smooth stretched sheet with bumps and valleys. Unlike IDW, spline can estimate values above and below the min and max values of your sample points. Thus it is good for estimating high and low values not already represented in your data.

Spline interpolation fits a surface between the sample points of known values to estimate a value for the unknown cell. Source: J. Abrecht, CUNY
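The key contrast with IDW -- that a spline can estimate values beyond the min/max of the samples -- can be shown with a small 1D natural cubic spline. This Python sketch (made-up data, a hand-rolled textbook spline rather than any particular GIS tool's implementation) fits a curve through a sharp peak:

```python
import numpy as np

# Natural cubic spline through 5 samples with a sharp peak in the middle
x = np.arange(5.0)                      # 0, 1, 2, 3, 4 (uniform spacing h = 1)
y = np.array([0.0, 0.0, 5.0, 0.0, 0.0])

# Solve the standard tridiagonal system for the second derivatives M
# (natural boundary conditions: M[0] = M[-1] = 0)
n = len(x)
A = np.zeros((n, n))
b = np.zeros(n)
A[0, 0] = A[-1, -1] = 1.0
for i in range(1, n - 1):
    A[i, i - 1 : i + 2] = [1.0, 4.0, 1.0]
    b[i] = 6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1])
M = np.linalg.solve(A, b)

def spline(q):
    """Evaluate the natural cubic spline at query point q (h = 1)."""
    i = min(int(q), n - 2)              # interval index
    t = q - x[i]
    return (M[i] / 6) * (1 - t) ** 3 + (M[i + 1] / 6) * t ** 3 \
        + (y[i] - M[i] / 6) * (1 - t) + (y[i + 1] - M[i + 1] / 6) * t

qs = np.linspace(0, 4, 401)
vals = np.array([spline(q) for q in qs])
# The curve dips below the sample minimum (0) on both sides of the peak --
# something a bounded interpolator like IDW can never do.
print(vals.min() < y.min())             # True
```

The surface still passes exactly through every sample point; it is only *between* samples that the curve overshoots and undershoots their range.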

Regularized & Tension Spline

There are two types of curved surfaces that can be fit when using spline interpolation:

  1. Tension spline: a flatter surface that forces estimated values to stay closer to the known sample points.

  2. Regularized spline: a more elastic surface that is more likely to estimate above and below the known sample points.

<figcaption>
Regularized vs. tension spline.
</figcaption>

For more on spline interpolation, see ESRI's How Splines Work background materials.

Spline Take Home Points

Spline is good for:

  • Estimating values outside of the range of sample input data.
  • Creating a smooth continuous surface.

Spline is not so good for:

  • Points that are close together and have large value differences. Slope calculations can yield over and underestimation.
  • Data with large, sudden changes in values that need to be represented (e.g., fault lines, extreme vertical topographic changes, etc.). NOTE: some tools, like ArcGIS, have introduced a Spline with Barriers function in recent years.

Natural Neighbor Interpolation

Natural neighbor interpolation finds the closest subset of data points to the query point of interest. It then applies weights to those points to calculate an average estimated value based upon their proportionate areas derived from their corresponding Voronoi polygons (see the figure below for a definition). The natural neighbor interpolator adapts locally to the input data using points surrounding the query point of interest. Thus there is no radius, number of points, or other setting needed when using this approach.

This interpolation method works equally well on regular and irregularly spaced data.

A Voronoi diagram is created by taking pairs of points that are close together and drawing a line that is equidistant between them and perpendicular to the line connecting them. Source: Wikipedia

Natural neighbor interpolation uses the area of each Voronoi polygon associated with the surrounding points to derive a weight that is then used to calculate an estimated value for the query point of interest.

To calculate the weighted area around a query point, a secondary Voronoi diagram is created using the immediately neighboring points and mapped on top of the original Voronoi diagram created using the known sample points (image below).

A secondary Voronoi diagram is created using the immediately neighboring points and mapped on top of the original Voronoi diagram created using the known sample points to create a weighted natural neighbor interpolated raster. Image Source: ESRI
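The "stolen area" idea can be approximated numerically by rasterizing the Voronoi diagram on a fine grid. This Python sketch is a discrete approximation of the (Sibson) natural-neighbor weights with made-up sample points, not how any particular GIS package implements it:

```python
import numpy as np

# Made-up sample points (x, y) and their values
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.9]])
vals = np.array([10.0, 20.0, 30.0, 40.0, 35.0])

def natural_neighbor(query, pts, vals, res=200):
    """Discrete (rasterized) natural-neighbor estimate at one query point.

    Inserting the query into the Voronoi diagram 'steals' area from its
    neighbors' cells; each neighbor's value is weighted by the area stolen.
    """
    xs = np.linspace(-0.5, 1.5, res)
    gx, gy = np.meshgrid(xs, xs)
    cells = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Original Voronoi diagram: the nearest sample point owns each grid cell
    d = np.linalg.norm(cells[:, None, :] - pts[None, :, :], axis=2)
    owner = d.argmin(axis=1)
    # Cells now closer to the query than to any sample are 'stolen' from owners
    stolen = np.linalg.norm(cells - query, axis=1) < d.min(axis=1)
    area = np.bincount(owner[stolen], minlength=len(pts))  # stolen area per neighbor
    return area @ vals / area.sum()

est = natural_neighbor(np.array([0.5, 0.5]), pts, vals)
print(vals.min() <= est <= vals.max())  # True: area weights keep the estimate bounded
```

Because the weights are areas (always non-negative), the estimate stays within the range of the sample values -- matching the "surface passes through input samples; not above or below" point listed below.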

Natural Neighbor Take Home Points

Natural Neighbor is good for:

  • Data where spatial distribution is variable (and data that are equally distributed).
  • Categorical data.
  • Providing a smoother output raster.

Natural Neighbor is not as good for:

  • Data where the interpolation needs to be spatially constrained (to a particular number of points or distance).
  • Data where sample points further away from or beyond the immediate "neighbor points" need to be considered in the estimation.

Other features:

  • Good as a local interpolator
  • Interpolated values fall within the range of values of the sample data
  • Surface passes through input samples; not above or below
  • Supports breaklines

Triangulated Irregular Network (TIN)

The Triangulated Irregular Network (TIN) is a vector based surface where sample points (nodes) are connected by a series of edges creating a triangulated surface. The TIN format remains the most true to the point distribution, density and spacing of a dataset. It also may yield the largest file size!

A TIN created from LiDAR data collected by the NEON AOP over the NEON San Joaquin (SJER) field site.

  • For more on the TIN process see this information from ESRI: Overview of TINs
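Within each triangle of a TIN, a surface value is found by linearly blending the three node values (barycentric weighting). A minimal Python sketch of one facet, with made-up coordinates and elevations:

```python
import numpy as np

# One facet of a TIN: three lidar points (x, y) with elevations z
tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
z = np.array([100.0, 104.0, 109.0])

def tin_facet_height(q, tri, z):
    """Linearly interpolate elevation inside one TIN triangle (barycentric weights)."""
    # Solve for (w1, w2) with q = tri[0] + w1*(tri[1]-tri[0]) + w2*(tri[2]-tri[0])
    T = np.column_stack([tri[1] - tri[0], tri[2] - tri[0]])
    w1, w2 = np.linalg.solve(T, q - tri[0])
    w = np.array([1.0 - w1 - w2, w1, w2])
    if np.any(w < -1e-12):
        raise ValueError("query point lies outside this facet")
    return w @ z  # weighted average of the three node elevations

print(round(tin_facet_height(np.array([1.0, 1.0]), tri, z), 6))  # 104.0
```

A full TIN repeats this over every triangle in the triangulation; the surface is exact at every sample point and piecewise planar between them, which is why the TIN stays so true to the original point distribution.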

Interpolation in R, GRASS GIS, or QGIS

These additional resources point to tools and information for gridding in R, GRASS GIS and QGIS.

R functions

The following packages and functions may be useful when creating grids in R.

  • gstat::idw()
  • stats::loess()
  • akima::interp()
  • fields::Tps()
  • fields::splint()
  • spatial::surf.ls()
  • geospt::rbf()

QGIS tools

  • Check out the documentation on different interpolation plugins Interpolation
  • Check out the documentation on how to convert a vector file to a raster: Rasterize (vector to raster).

The QGIS processing toolbox provides easy access to GRASS commands.

GRASS GIS commands

  • Check out the documentation on GRASS GIS Integration Starting the GRASS plugin

The following commands may be useful if you are working with GRASS GIS.

  • v.surf.idw - Surface interpolation from vector point data by Inverse Distance Squared Weighting
  • v.surf.bspline - Bicubic or bilinear spline interpolation with Tykhonov regularization
  • v.surf.rst - Spatial approximation and topographic analysis using regularized spline with tension
  • v.to.rast.value - Converts (rasterize) a vector layer into a raster layer
  • v.delaunay - Creates a Delaunay triangulation from an input vector map containing points or centroids
  • v.voronoi - Creates a Voronoi diagram from an input vector layer containing points
Copyright © Battelle, 2019-2020

The National Ecological Observatory Network is a major facility fully funded by the National Science Foundation.

Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the National Science Foundation.