Series
Introduction to Working with Raster Data in R
The tutorials in this series cover how to open, work with and plot raster-format spatial data in R. Additional topics include working with spatial metadata (extent and coordinate reference system), reprojecting spatial data and working with raster time series data.
Data used in this series cover NEON Harvard Forest and San Joaquin Experimental Range field sites and are in GeoTIFF and .csv formats.
Learning Objectives
After completing the series you will:
-
Raster 00
- Understand what a raster dataset is and its fundamental attributes.
- Know how to explore raster attributes in
R
. - Be able to import rasters into
R
using theraster
package. - Be able to quickly plot a raster file in
R
. - Understand the difference between single- and multi-band rasters.
-
Raster 01
- Know how to plot a single band raster in R.
- Know how to layer a raster dataset on top of a hillshade to create an elegant basemap.
-
Raster 02
- Be able to reproject a raster in R.
-
Raster 03
- Be able to to perform a subtraction (difference) between two rasters using raster math.
- Know how to perform a more efficient subtraction (difference) between two
rasters using the raster
overlay()
function in R.
-
Raster 04
- Know how to identify a single vs. a multi-band raster file.
- Be able to import multi-band rasters into
R
using theraster
package. - Be able to plot multi-band color image rasters in
R
usingplotRGB
. - Understand what a
NoData
value is in a raster.
-
Raster 05
- Understand the format of a time series raster dataset.
- Know how to work with time series rasters.
- Be able to efficiently import a set of rasters stored in a single directory.
- Be able to plot and explore time series raster data using the
plot()
function inR
.
-
Raster 06
- Be able to assign custom names to bands in a RasterStack for prettier plotting.
- Understand advanced plotting of rasters using the
rasterVis
package andlevelplot
.
-
Raster 07
- Be able to extract summary pixel values from a raster.
- Know how to save summary values to a .csv file.
- Be able to plot summary pixel values using
ggplot()
. - Have experience comparing NDVI values between two different sites.
Things You’ll Need To Complete This Series
Setup RStudio
To complete the tutorial series you will need an updated version of R
and,
preferably, RStudio installed on your computer.
R
is a programming language that specializes in statistical computing. It is a
powerful tool for exploratory data analysis. To interact with R
, we strongly
recommend
RStudio,
an interactive development environment (IDE).
Install R Packages
You can chose to install packages with each lesson or you can download all
of the necessary R
Packages now.
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
rasterVis:
install.packages("rasterVis")
-
ggplot2:
install.packages("ggplot2")
-
More on Packages in R – Adapted from Software Carpentry.
Working with Raster Data in R Tutorial Series: This tutorial series is part of a
Data Carpentry workshop
on using spatio-temporal in R.
Other related series include:
intro to spatio-temporal data and data management,
working with vector data in R,
and
working with tabular time series data in R.
Raster 00: Intro to Raster Data in R
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
In this tutorial, we will review the fundamental principles, packages and metadata/raster attributes that are needed to work with raster data in R. We discuss the three core metadata elements that we need to understand to work with rasters in R: CRS, extent and resolution. It also explores missing and bad data values as stored in a raster and how R handles these elements. Finally, it introduces the GeoTiff file format.
Learning Objectives
After completing this tutorial, you will be able to:
- Understand what a raster dataset is and its fundamental attributes.
- Know how to explore raster attributes in R.
- Be able to import rasters into R using the
raster
package. - Be able to quickly plot a raster file in R.
- Understand the difference between single- and multi-band rasters.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Download Data
NEON Teaching Data Subset: Airborne Remote Sensing Data
The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network's Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Read more about the raster
package in R.
About Raster Data
Raster or "gridded" data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth's surface.

Raster Data in R
Let's first import a raster dataset into R and explore its metadata.
To open rasters in R, we will use the raster
and rgdal
packages.
# load libraries
library(raster)
library(rgdal)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment
# be sure that the downloaded file is in this directory
setwd(wd)
Open a Raster in R
We can use the raster("path-to-raster-here")
function to open a raster in R.
# Load raster into R
DSM_HARV <- raster(paste0(wd, "NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
# View raster structure
DSM_HARV
## class : RasterLayer
## dimensions : 1367, 1697, 2319799 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 731453, 733150, 4712471, 4713838 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif
## names : HARV_dsmCrop
## values : 305.07, 416.07 (min, max)
# plot raster
# note \n in the title forces a line break in the title
plot(DSM_HARV,
main="NEON Digital Surface Model\nHarvard Forest")
Types of Data Stored in Raster Format
Raster data can be continuous or categorical. Continuous rasters can have a range of quantitative values. Some examples of continuous rasters include:
- Precipitation maps.
- Maps of tree height derived from LiDAR data.
- Elevation values for a region.
The raster we loaded and plotted earlier was a digital surface model, or a map of the elevation for Harvard Forest derived from the NEON AOP LiDAR sensor. Elevation is represented as a continuous numeric variable in this map. The legend shows the continuous range of values in the data from around 300 to 420 meters.
# render DSM for tutorial content background
DSM_HARV <- raster(paste0(wd, "NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
# code output here - DEM rendered on the screen
plot(DSM_HARV, main="Continuous Elevation Map\n NEON Harvard Forest Field Site")
Some rasters contain categorical data where each pixel represents a discrete class such as a landcover type (e.g., "forest" or "grassland") rather than a continuous value such as elevation or temperature. Some examples of classified maps include:
- Landcover/land-use maps.
- Tree height maps classified as short, medium, tall trees.
- Elevation maps classified as low, medium and high elevation.
Categorical Landcover Map for the United States

Categorical Elevation Map of the NEON Harvard Forest Site
The legend of this map shows the colors representing each discrete class.
# Demonstration image for the tutorial
# add a color map with 5 colors
col=terrain.colors(3)
# add breaks to the colormap (4 breaks = 3 segments)
brk <- c(250,350, 380,500)
# Expand right side of clipping rect to make room for the legend
par(xpd = FALSE,mar=c(5.1, 4.1, 4.1, 4.5))
# DEM with a custom legend
plot(DSM_HARV,
col=col,
breaks=brk,
main="Classified Elevation Map\nNEON Harvard Forest Field Site",
legend = FALSE
)
# turn xpd back on to force the legend to fit next to the plot.
par(xpd = TRUE)
# add a legend - but make it appear outside of the plot
legend( par()$usr[2], 4713700,
legend = c("High Elevation", "Middle","Low Elevation"),
fill = rev(col))
What is a GeoTIFF??
Raster data can come in many different formats. In this tutorial, we will use the
geotiff format which has the extension .tif
. A .tif
file stores metadata
or attributes about the file as embedded tif tags
. For instance, your camera
might
store a tag that describes the make and model of the camera or the date the
photo was taken when it saves a .tif
. A GeoTIFF is a standard .tif
image
format with additional spatial (georeferencing) information embedded in the file
as tags. These tags can include the following raster metadata:
- A Coordinate Reference System (
CRS
) - Spatial Extent (
extent
) - Values that represent missing data (
NoDataValue
) - The
resolution
of the data
In this tutorial we will discuss all of these metadata tags.
More about the .tif
format:
Coordinate Reference System
The Coordinate Reference System or CRS
tells R where the raster is located
in geographic space. It also tells R what method should be used to "flatten"
or project the raster in geographic space.

What Makes Spatial Data Line Up On A Map?
There are lots of great resources that describe coordinate reference systems and projections in greater detail (read more, below). For the purposes of this activity, what is important to understand is that data from the same location but saved in different projections will not line up in any GIS or other program. Thus, it's important when working with spatial data in a program like R to identify the coordinate reference system applied to the data and retain it throughout data processing and analysis.
Read More:
- A comprehensive online library of CRS information.
- QGIS Documentation - CRS Overview.
- Choosing the Right Map Projection.
- NCEAS Overview of CRS in R.
How Map Projections Can Fool the Eye
Check out this short video, from Buzzfeed, highlighting how map projections can make continents seems proportionally larger or smaller than they actually are!
View Raster Coordinate Reference System (CRS) in R
We can view the CRS
string associated with our R object using thecrs()
method. We can assign this string to an R object, too.
# view resolution units
crs(DSM_HARV)
## CRS arguments:
## +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
# assign crs to an object (class) to use for reprojection and other tasks
myCRS <- crs(DSM_HARV)
myCRS
## CRS arguments:
## +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
The CRS
of our DSM_HARV
object tells us that our data are in the UTM
projection.

The CRS in this case is in a PROJ
format. This means that the projection
information is strung together as a series of text elements, each of which
begins with a +
sign.
+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
We'll focus on the first few components of the CRS in this tutorial.
-
+proj=utm
The projection of the dataset. Our data are in Universal Transverse Mercator (UTM). -
+zone=18
The UTM projection divides up the world into zones, this element tells you which zone the data are in. Harvard Forest is in Zone 18. -
+datum=WGS84
The datum was used to define the center point of the projection. Our raster uses theWGS84
datum. -
+units=m
This is the horizontal units that the data are in. Our units are meters.
Extent
The spatial extent is the geographic area that the raster data covers.

The spatial extent of an R spatial object represents the geographic "edge" or
location that is the furthest north, south, east and west. In other words, extent
represents the overall geographic coverage of the spatial object.
Resolution
A raster has horizontal (x and y) resolution. This resolution represents the area on the ground that each pixel covers. The units for our data are in meters. Given our data resolution is 1 x 1, this means that each pixel represents a 1 x 1 meter area on the ground.

The best way to view resolution units is to look at the
coordinate reference system string crs()
. Notice our data contains: +units=m
.
crs(DSM_HARV)
## CRS arguments:
## +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
Calculate Raster Min and Max Values
It is useful to know the minimum or maximum values of a raster dataset. In this case, given we are working with elevation data, these values represent the min/max elevation range at our site.
Raster statistics are often calculated and embedded in a geotiff
for us.
However if they weren't already calculated, we can calculate them using the
setMinMax()
function.
# This is the code if min/max weren't calculated:
# DSM_HARV <- setMinMax(DSM_HARV)
# view the calculated min value
minValue(DSM_HARV)
## [1] 305.07
# view only max value
maxValue(DSM_HARV)
## [1] 416.07
We can see that the elevation at our site ranges from 305.07m to 416.07m.
NoData Values in Rasters
Raster data often has a NoDataValue
associated with it. This is a value
assigned to pixels where data are missing or no data were collected.
By default the shape of a raster is always square or rectangular. So if we
have a dataset that has a shape that isn't square or rectangular, some pixels
at the edge of the raster will have NoDataValue
s. This often happens when the
data were collected by an airplane which only flew over some part of a defined
region.
In the image below, the pixels that are black have NoDataValue
s.
The camera did not collect data in these areas.
# no data demonstration code - not being taught
# Use stack function to read in all bands
RGB_stack <-
stack(paste0(wd, "NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif"))
# Create an RGB image from the raster stack
par(col.axis="white",col.lab="white",tck=0)
plotRGB(RGB_stack, r = 1, g = 2, b = 3,
axes=TRUE, main="Raster With NoData Values\nRendered in Black")
In the next image, the black edges have been assigned NoDataValue
. R doesn't
render pixels that contain a specified NoDataValue
. R assigns missing data
with the NoDataValue
as NA
.
# reassign cells with 0,0,0 to NA
# this is simply demonstration code - we will not teach this.
func <- function(x) {
x[rowSums(x == 0) == 3, ] <- NA
x}
newRGBImage <- calc(RGB_stack, func)
par(col.axis="white",col.lab="white",tck=0)
# Create an RGB image from the raster stack
plotRGB(newRGBImage, r = 1, g = 2, b = 3,
axes=TRUE, main="Raster With No Data Values\nNoDataValue= NA")
NoData Value Standard
The assigned NoDataValue
varies across disciplines; -9999
is a common value
used in both the remote sensing field and the atmospheric fields. It is also
the standard used by the
National Ecological Observatory Network (NEON).
If we are lucky, our GeoTIFF file has a tag that tells us what is the
NoDataValue
. If we are less lucky, we can find that information in the
raster's metadata. If a NoDataValue
was stored in the GeoTIFF tag, when R
opens up the raster, it will assign each instance of the value to NA
. Values
of NA
will be ignored by R as demonstrated above.
Bad Data Values in Rasters
Bad data values are different from NoDataValue
s. Bad data values are values
that fall outside of the applicable range of a dataset.
Examples of Bad Data Values:
- The normalized difference vegetation index (NDVI), which is a measure of greenness, has a valid range of -1 to 1. Any value outside of that range would be considered a "bad" or miscalculated value.
- Reflectance data in an image will often range from 0-1 or 0-10,000 depending upon how the data are scaled. Thus a value greater than 1 or greater than 10,000 is likely caused by an error in either data collection or processing.
Find Bad Data Values
Sometimes a raster's metadata will tell us the range of expected values for a raster. Values outside of this range are suspect and we need to consider than when we analyze the data. Sometimes, we need to use some common sense and scientific insight as we examine the data - just as we would for field data to identify questionable values.
Create A Histogram of Raster Values
We can explore the distribution of values contained within our raster using the
hist()
function which produces a histogram. Histograms are often useful in
identifying outliers and bad data values in our raster data.
# view histogram of data
hist(DSM_HARV,
main="Distribution of Digital Surface Model Values\n Histogram Default: 100,000 pixels\n NEON Harvard Forest",
xlab="DSM Elevation Value (m)",
ylab="Frequency",
col="wheat")
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot =
## plot, ...): 4% of the raster cells were used. 100000 values used.
Notice that a warning is shown when R creates the histogram.
Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...): 4% of the raster cells were used. 100000 values used.
This warning is caused by the default maximum pixels value of 100,000 associated
with the hist
function. This maximum value is to ensure processing efficiency
as our data become larger!
- More on histograms in R from R-bloggers
We can define the max pixels to ensure that all pixel values are included in the histogram. USE THIS WITH CAUTION as forcing R to plot all pixel values in a histogram can be problematic when dealing with very large datasets.
# View the total number of pixels (cells) in is our raster
ncell(DSM_HARV)
## [1] 2319799
# create histogram that includes with all pixel values in the raster
hist(DSM_HARV,
maxpixels=ncell(DSM_HARV),
main="Distribution of DSM Values\n All Pixel Values Included\n NEON Harvard Forest Field Site",
xlab="DSM Elevation Value (m)",
ylab="Frequency",
col="wheat4")
Note that the shape of both histograms looks similar to the previous one that
was created using a representative 10,000 pixel subset of our raster data. The
distribution of elevation values for our Digital Surface Model (DSM)
looks
reasonable. It is likely there are no bad data values in this particular raster.
Raster Bands
The Digital Surface Model object (DSM_HARV
) that we've been working with
is a single band raster. This means that there is only one dataset stored in
the raster: surface elevation in meters for one time period.

A raster dataset can contain one or more bands. We can use the raster()
function
to import one single band from a single OR multi-band raster. We can view the number
of bands in a raster using the nlayers()
function.
# view number of bands
nlayers(DSM_HARV)
## [1] 1
However, raster data can also be multi-band meaning that one raster file
contains data for more than one variable or time period for each cell. By
default the raster()
function only imports the first band in a raster
regardless of whether it has one or more bands. Jump to the fourth tutorial in
this series for a tutorial on multi-band rasters:
Work with Multi-band Rasters: Images in R.
View Raster File Attributes
Remember that a GeoTIFF
contains a set of embedded tags that contain
metadata about the raster. So far, we've explored raster metadata after
importing it in R. However, we can use the GDALinfo("path-to-raster-here")
function to view raster metadata before we open a file in R.
# view attributes before opening file
GDALinfo(paste0(wd, "NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
## rows 1367
## columns 1697
## bands 1
## lower left origin.x 731453
## lower left origin.y 4712471
## res.x 1
## res.y 1
## ysign -1
## oblique.x 0
## oblique.y 0
## driver GTiff
## projection +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## file ~/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif
## apparent band summary:
## GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float64 TRUE -9999 1 1697
## apparent band statistics:
## Bmin Bmax Bmean Bsd
## 1 305.07 416.07 359.8531 17.83169
## Metadata:
## AREA_OR_POINT=Area
Notice a few things in the output:
- A projection is described using a string in the
proj4
format :+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
- We can identify a
NoDataValue
: -9999 - We can tell how many
bands
the file contains: 1 - We can view the x and y
resolution
of the data: 1 - We can see the min and max values of the data:
Bmin
andBmax
.
It is ideal to use GDALinfo
to explore your file before reading it into
R.
Challenge: Explore Raster Metadata
Without using the raster
function to read the file into R, determine the
following about the NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif
file:
- Does this file has the same
CRS
asDSM_HARV
? - What is the
NoDataValue
? - What is resolution of the raster data?
- How large would a 5x5 pixel area would be on the Earth's surface?
- Is the file a multi- or single-band raster?
Notice: this file is a hillshade
. We will learn about hillshades in
Work with Multi-band Rasters: Images in R.
Get Lesson Code
Raster 01: Plot Raster Data in R
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
This tutorial reviews how to plot a raster in R using the plot()
function. It also covers how to layer a raster on top of a hillshade to produce
an eloquent map.
Learning Objectives
After completing this tutorial, you will be able to:
- Know how to plot a single band raster in R.
- Know how to layer a raster dataset on top of a hillshade to create an elegant basemap.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Download Data
NEON Teaching Data Subset: Airborne Remote Sensing Data
The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network's Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Additional Resources
Plot Raster Data in R
In this tutorial, we will plot the Digital Surface Model (DSM) raster
for the NEON Harvard Forest Field Site. We will use the hist()
function as a
tool to explore raster values. And render categorical plots, using the breaks
argument to get bins that are meaningful representations of our data.
We will use the raster
and rgdal
packages in this tutorial. If you do not
have the DSM_HARV
object from the
Intro To Raster In R tutorial,
please create it now.
# if they are not already loaded
library(rgdal)
library(raster)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment environment
# be sure that the downloaded file is in this directory
setwd(wd)
# import raster
DSM_HARV <- raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
First, let's plot our Digital Surface Model object (DSM_HARV
) using the
plot()
function. We add a title using the argument main="title"
.
# Plot raster object
plot(DSM_HARV,
main="Digital Surface Model\nNEON Harvard Forest Field Site")
Plotting Data Using Breaks
We can view our data "symbolized" or colored according to ranges of values
rather than using a continuous color ramp. This is comparable to a "classified"
map. However, to assign breaks, it is useful to first explore the distribution
of the data using a histogram. The breaks
argument in the hist()
function
tells R to use fewer or more breaks or bins.
If we name the histogram, we can also view counts for each bin and assigned break values.
# Plot distribution of raster values
DSMhist<-hist(DSM_HARV,
breaks=3,
main="Histogram Digital Surface Model\n NEON Harvard Forest Field Site",
col="wheat3", # changes bin color
xlab= "Elevation (m)") # label the x-axis
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot =
## plot, ...): 4% of the raster cells were used. 100000 values used.
# Where are breaks and how many pixels in each category?
DSMhist$breaks
## [1] 300 350 400 450
DSMhist$counts
## [1] 32124 67392 484
Warning message!? Remember, the default for the histogram is to include only a subset of 100,000 values. We could force it to show all the pixel values or we can use the histogram as is and figure that the sample of 100,000 values represents our data well.
Looking at our histogram, R has binned out the data as follows:
- 300-350m, 350-400m, 400-450m
We can determine that most of the pixel values fall in the 350-400m range with a few pixels falling in the lower and higher range. We could specify different breaks, if we wished to have a different distribution of pixels in each bin.
We can use those bins to plot our raster data. We will use the
terrain.colors()
function to create a palette of 3 colors to use in our plot.
The breaks
argument allows us to add breaks. To specify where the breaks
occur, we use the following syntax: breaks=c(value1,value2,value3)
.
We can include as few or many breaks as we'd like.
# plot using breaks.
plot(DSM_HARV,
breaks = c(300, 350, 400, 450),
col = terrain.colors(3),
main="Digital Surface Model (DSM)\n NEON Harvard Forest Field Site")
Format Plot
If we need to create multiple plots using the same color palette, we can create
an R object (myCol
) for the set of colors that we want to use. We can then
quickly change the palette across all plots by simply modifying the myCol
object.
We can label the x- and y-axes of our plot too using xlab
and ylab
.
# Assign color to a object for repeat use/ ease of changing
myCol = terrain.colors(3)
# Add axis labels
plot(DSM_HARV,
breaks = c(300, 350, 400, 450),
col = myCol,
main="Digital Surface Model\nNEON Harvard Forest Field Site",
xlab = "UTM Westing Coordinate (m)",
ylab = "UTM Northing Coordinate (m)")
Or we can also turn off the axes altogether.
# or we can turn off the axis altogether
plot(DSM_HARV,
breaks = c(300, 350, 400, 450),
col = myCol,
main="Digital Surface Model\n NEON Harvard Forest Field Site",
axes=FALSE)
Create a plot of the Harvard Forest Digital Surface Model (DSM) that has:
- Six classified ranges of values (break points) that are evenly divided among the range of pixel values.
- Axis labels
- A plot title
Layering Rasters
We can layer a raster on top of a hillshade raster for the same area, and use a transparency factor to created a 3-dimensional shaded effect. A hillshade is a raster that maps the shadows and texture that you would see from above when viewing terrain.
# import DSM hillshade
DSM_hill_HARV <-
raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif"))
# plot hillshade using a grayscale color ramp that looks like shadows.
plot(DSM_hill_HARV,
col=grey(1:100/100), # create a color ramp of grey colors
legend=FALSE,
main="Hillshade - DSM\n NEON Harvard Forest Field Site",
axes=FALSE)
We can layer another raster on top of our hillshade using by using add=TRUE
.
Let's overlay DSM_HARV
on top of the hill_HARV
.
# plot hillshade using a grayscale color ramp that looks like shadows.
plot(DSM_hill_HARV,
col=grey(1:100/100), #create a color ramp of grey colors
legend=F,
main="DSM with Hillshade \n NEON Harvard Forest Field Site",
axes=FALSE)
# add the DSM on top of the hillshade
plot(DSM_HARV,
col=rainbow(100),
alpha=0.4,
add=T,
legend=F)
The alpha value determines how transparent the colors will be (0 being
transparent, 1 being opaque). Note that here we used the color palette
rainbow()
instead of terrain.color()
.
- More information in the R color palettes documentation.
Make sure to:
- include hillshade in the maps,
- label axes on the DSM map and exclude them from the DTM map,
- a title for the maps,
- experiment with various alpha values and color palettes to represent the data.
Get Lesson Code
Raster 02: When Rasters Don't Line Up - Reproject Raster Data in R
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
Sometimes we encounter raster datasets that do not "line up" when plotted or analyzed. Rasters that don't line up are most often in different Coordinate Reference Systems (CRS).
This tutorial explains how to deal with rasters in different, known CRSs. It
will walk though reprojecting rasters in R using the projectRaster()
function in the raster
package.
Learning Objectives
After completing this tutorial, you will be able to:
- Be able to reproject a raster in R.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Airborne Remote Sensing Data
The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network's Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Additional Resources
Read more about the raster
package in R.
Raster Projection in R
In the Plot Raster Data in R tutorial, we learned how to layer a raster file on top of a hillshade for a nice looking basemap. In this tutorial, all of our data were in the same CRS. What happens when things don't line up?
We will use the raster
and rgdal
packages in this tutorial.
# load raster package
library(raster)
library(rgdal)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment
# be sure that the downloaded file is in this directory
setwd(wd)
Let's create a map of the Harvard Forest Digital Terrain Model
(DTM_HARV
) draped or layered on top of the hillshade (DTM_hill_HARV
).
# import DTM
DTM_HARV <- raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif"))
# import DTM hillshade
DTM_hill_HARV <-
raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_DTMhill_WGS84.tif"))
# plot hillshade using a grayscale color ramp
plot(DTM_hill_HARV,
col=grey(1:100/100),
legend=FALSE,
main="DTM Hillshade\n NEON Harvard Forest Field Site")
# overlay the DTM on top of the hillshade
plot(DTM_HARV,
col=terrain.colors(10),
alpha=0.4,
add=TRUE,
legend=FALSE)
Our results are curious - the Digital Terrain Model (DTM_HARV
) did not plot on
top of our hillshade. The hillshade plotted just fine on it's own. Let's try to
plot the DTM on it's own to make sure there are data there.
# Plot DTM
plot(DTM_HARV,
col=terrain.colors(10),
alpha=1,
legend=F,
main="Digital Terrain Model\n NEON Harvard Forest Field Site")
Our DTM seems to contain data and plots just fine. Let's next check the Coordinate Reference System (CRS) and compare it to our hillshade.
# view crs for DTM
crs(DTM_HARV)
## CRS arguments:
## +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
# view crs for hillshade
crs(DTM_hill_HARV)
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
Aha! DTM_HARV
is in the UTM projection. DTM_hill_HARV
is in
Geographic WGS84
- which is represented by latitude and longitude values.
Because the two rasters are in different CRSs, they don't line up when plotted
in R. We need to reproject DTM_hill_HARV
into the UTM CRS. Alternatively,
we could project DTM_HARV
into WGS84.
Reproject Rasters
We can use the projectRaster
function to reproject a raster into a new CRS.
Keep in mind that reprojection only works when you first have a defined CRS
for the raster object that you want to reproject. It cannot be used if no
CRS is defined. Lucky for us, the DTM_hill_HARV
has a defined CRS.
To use the projectRaster
function, we need to define two things:
- the object we want to reproject and
- the CRS that we want to reproject it to.
The syntax is projectRaster(RasterObject,crs=CRSToReprojectTo)
We want the CRS of our hillshade to match the DTM_HARV
raster. We can thus
assign the CRS of our DTM_HARV
to our hillshade within the projectRaster()
function as follows: crs=crs(DTM_HARV)
.
# reproject to UTM
DTM_hill_UTMZ18N_HARV <- projectRaster(DTM_hill_HARV,
crs=crs(DTM_HARV))
# compare attributes of DTM_hill_UTMZ18N to DTM_hill
crs(DTM_hill_UTMZ18N_HARV)
## CRS arguments:
## +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
crs(DTM_hill_HARV)
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
# compare attributes of DTM_hill_UTMZ18N to DTM_hill
extent(DTM_hill_UTMZ18N_HARV)
## class : Extent
## xmin : 731397.3
## xmax : 733205.3
## ymin : 4712403
## ymax : 4713907
extent(DTM_hill_HARV)
## class : Extent
## xmin : -72.18192
## xmax : -72.16061
## ymin : 42.52941
## ymax : 42.54234
Notice in the output above that the crs()
of DTM_hill_UTMZ18N_HARV
is now
UTM. However, the extent values of DTM_hillUTMZ18N_HARV
are different from
DTM_hill_HARV
.
Deal with Raster Resolution
Let's next have a look at the resolution of our reprojected hillshade.
# compare resolution
res(DTM_hill_UTMZ18N_HARV)
## [1] 1.000 0.998
The output resolution of DTM_hill_UTMZ18N_HARV
is 1 x 0.998. Yet, we know that
the resolution for the data should be 1m x 1m. We can tell R to force our
newly reprojected raster to be 1m x 1m resolution by adding a line of code
(res=
).
# adjust the resolution
DTM_hill_UTMZ18N_HARV <- projectRaster(DTM_hill_HARV,
crs=crs(DTM_HARV),
res=1)
# view resolution
res(DTM_hill_UTMZ18N_HARV)
## [1] 1 1
Let's plot our newly reprojected raster.
# plot newly reprojected hillshade
plot(DTM_hill_UTMZ18N_HARV,
col=grey(1:100/100),
legend=F,
main="DTM with Hillshade\n NEON Harvard Forest Field Site")
# overlay the DTM on top of the hillshade
plot(DTM_HARV,
col=terrain.colors(100),
alpha=0.4,
add=T,
legend=F)
We have now successfully draped the Digital Terrain Model on top of our hillshade to produce a nice looking, textured map!
Reproject the data as necessary to make things line up!
## Warning in projectRaster(DTM_hill_SJER, crs = crs(DTM_SJER), res = 1):
## input and ouput crs are the same
Get Lesson Code
Raster 03: Raster Calculations in R - Subtract One Raster from Another and Extract Pixel Values For Defined Locations
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
We often want to combine values of and perform calculations on rasters to create
a new output raster. This tutorial covers how to subtract one raster from
another using basic raster math and the overlay()
function. It also covers how
to extract pixel values from a set of locations - for example a buffer region
around plot locations at a field site.
Learning Objectives
After completing this tutorial, you will be able to:
- Be able to to perform a subtraction (difference) between two rasters using raster math.
- Know how to perform a more efficient subtraction (difference) between two
rasters using the raster
overlay()
function in R.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Airborne Remote Sensing Data
The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network's Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.
TEST BUTTON CSS
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Additional Resources
Raster Calculations in R
We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.

- Check out more on LiDAR CHM, DTM and DSM in this NEON Data Skills overview tutorial:
Load the Data
We will need the raster
package to import and perform raster calculations. We
will use the DTM (NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif
)
and DSM (NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif
) from the
NEON Harvard Forest Field site.
# load raster package
library(raster)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment environment
# be sure that the downloaded file is in this directory
setwd(wd)
# view info about the dtm & dsm raster data that we will work with.
GDALinfo(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif"))
## rows 1367
## columns 1697
## bands 1
## lower left origin.x 731453
## lower left origin.y 4712471
## res.x 1
## res.y 1
## ysign -1
## oblique.x 0
## oblique.y 0
## driver GTiff
## projection +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## file ~/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif
## apparent band summary:
## GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float64 TRUE -9999 1 1697
## apparent band statistics:
## Bmin Bmax Bmean Bsd
## 1 304.56 389.82 344.8979 15.86147
## Metadata:
## AREA_OR_POINT=Area
GDALinfo(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
## rows 1367
## columns 1697
## bands 1
## lower left origin.x 731453
## lower left origin.y 4712471
## res.x 1
## res.y 1
## ysign -1
## oblique.x 0
## oblique.y 0
## driver GTiff
## projection +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## file ~/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif
## apparent band summary:
## GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float64 TRUE -9999 1 1697
## apparent band statistics:
## Bmin Bmax Bmean Bsd
## 1 305.07 416.07 359.8531 17.83169
## Metadata:
## AREA_OR_POINT=Area
As seen from the geoTiff
tags, both rasters have:
- the same CRS,
- the same resolution
- defined minimum and maximum values.
Let's load the data.
# load the DTM & DSM rasters
DTM_HARV <- raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif"))
DSM_HARV <- raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif"))
# create a quick plot of each to see what we're dealing with
plot(DTM_HARV,
main="Digital Terrain Model \n NEON Harvard Forest Field Site")
plot(DSM_HARV,
main="Digital Surface Model \n NEON Harvard Forest Field Site")
Two Ways to Perform Raster Calculations
We can calculate the difference between two rasters in two different ways:
- by directly subtracting the two rasters in R using raster math
or for more efficient processing - particularly if our rasters are large and/or the calculations we are performing are complex:
- using the
overlay()
function.
Raster Math & Canopy Height Models
We can perform raster calculations by simply subtracting (or adding, multiplying, etc) two rasters. In the geospatial world, we call this "raster math".
Let's subtract the DTM from the DSM to create a Canopy Height Model.
# Raster math example
CHM_HARV <- DSM_HARV - DTM_HARV
# plot the output CHM
plot(CHM_HARV,
main="Canopy Height Model - Raster Math Subtract\n NEON Harvard Forest Field Site",
axes=FALSE)
Let's have a look at the distribution of values in our newly created Canopy Height Model (CHM).
# histogram of CHM_HARV
hist(CHM_HARV,
col="springgreen4",
main="Histogram of Canopy Height Model\nNEON Harvard Forest Field Site",
ylab="Number of Pixels",
xlab="Tree Height (m) ")
Notice that the range of values for the output CHM is between 0 and 30 meters. Does this make sense for trees in Harvard Forest?
- What is the min and maximum value for the Harvard Forest Canopy
Height Model (
CHM_HARV
) that we just created? - What are two ways you can check this range of data in
CHM_HARV
? - What is the distribution of all the pixel values in the CHM?
- Plot a histogram with 8 bins instead of the default and change the color of the histogram.
- Plot the
CHM_HARV
raster using breaks that make sense for the data. Include a appropriate color palette for the data, plot title and no axes ticks / labels.
Overlay Function: Efficient Raster Calculations
Raster math, like we just did, is an appropriate approach to raster calculations if:
- The rasters we are using are small in size.
- The calculations we are performing are simple.
However, raster math is a less efficient approach as computation becomes more
complex or as file sizes become large.
The overlay()
function is more efficient when:
- The rasters we are using are larger in size.
- The rasters are stored as individual files.
- The computations performed are complex.
The overlay()
function takes two or more rasters and applies a function to
them using efficient processing methods. The syntax is
outputRaster <- overlay(raster1, raster2, fun=functionName)
Let's perform the same subtraction calculation that we calculated above using
raster math, using the overlay()
function.
CHM_ov_HARV<- overlay(DSM_HARV,
DTM_HARV,
fun=function(r1, r2){return(r1-r2)})
plot(CHM_ov_HARV,
main="Canopy Height Model - Overlay Subtract\n NEON Harvard Forest Field Site")
How do the plots of the CHM created with manual raster math and the overlay()
function compare?
Export a GeoTIFF
Now that we've created a new raster, let's export the data as a GeoTIFF
using
the writeRaster()
function.
When we write this raster object to a GeoTIFF
file we'll name it
chm_HARV.tiff
. This name allows us to quickly remember both what the data
contains (CHM data) and for where (HARVard Forest). The writeRaster()
function
by default writes the output file to your working directory unless you specify a
full file path.
# export CHM object to new GeotIFF
writeRaster(CHM_ov_HARV, paste0(wd,"chm_HARV.tiff"),
format="GTiff", # specify output format - GeoTIFF
overwrite=TRUE, # CAUTION: if this is true, it will overwrite an
# existing file
NAflag=-9999) # set no data value to -9999
writeRaster Options
The function arguments that we used above include:
-
format: specify that the format will be
GTiff
orGeoTiff
. - overwrite: If TRUE, R will overwrite any existing file with the same name in the specified directory. USE THIS SETTING WITH CAUTION!
-
NAflag: set the
geotiff
tag forNoDataValue
to -9999, the National Ecological Observatory Network's (NEON) standardNoDataValue
.
Data are often more interesting and powerful when we compare them across various locations. Let's compare some data collected over Harvard Forest to data collected in Southern California. The NEON San Joaquin Experimental Range (SJER) field site located in Southern California has a very different ecosystem and climate than the NEON Harvard Forest Field Site in Massachusetts.
Import the SJER DSM and DTM raster files and create a Canopy Height Model.
Then compare the two sites. Be sure to name your R objects and outputs
carefully, as follows: objectType_SJER (e.g. DSM_SJER
). This will help you
keep track of data from different sites!
- Import the DSM and DTM from the SJER directory (if not aready imported
in the
Plot Raster Data in R tutorial.)
Don't forget to examine the data for
CRS
, bad values, etc. - Create a
CHM
from the two raster layers and check to make sure the data are what you expect. - Plot the
CHM
from SJER. - Export the SJER CHM as a
GeoTIFF
. - Compare the vegetation structure of the Harvard Forest and San Joaquin Experimental Range.
Hint: plotting SJER and HARV data side-by-side is an effective way to compare both datasets!
What do these two histograms tell us about the vegetation structure at Harvard and SJER?
Get Lesson Code
Raster 04: Work With Multi-Band Rasters - Image Data in R
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
This tutorial explores how to import and plot a multi-band raster in
R. It also covers how to plot a three-band color image using the plotRGB()
function in R.
Learning Objectives
After completing this tutorial, you will be able to:
- Know how to identify a single vs. a multi-band raster file.
- Be able to import multi-band rasters into R using the
raster
package. - Be able to plot multi-band color image rasters in R using
plotRGB()
. - Understand what a
NoData
value is in a raster.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Airborne Remote Sensing Data
The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network's Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
The Basics of Imagery - About Spectral Remote Sensing Data
About Raster Bands in R
As discussed in the Intro to Raster Data tutorial, a raster can contain 1 or more bands.

To work with multi-band rasters in R, we need to change how we import and plot our data in several ways.
- To import multi band raster data we will use the
stack()
function. - If our multi-band data are imagery that we wish to composite, we can use
plotRGB()
(instead ofplot()
) to plot a 3 band raster image.
About Multi-Band Imagery
One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited creates the colors that we see in an image.

Getting Started with Multi-Band Data in R
To work with multi-band raster data we will use the raster
and rgdal
packages.
# work with raster data
library(raster)
# export GeoTIFFs and other core GIS functions
library(rgdal)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment environment
# be sure that the downloaded file is in this directory
setwd(wd)
In this tutorial, the multi-band data that we are working with is imagery collected using the NEON Airborne Observation Platform high resolution camera over the NEON Harvard Forest field site. Each RGB image is a 3-band raster. The same steps would apply to working with a multi-spectral image with 4 or more bands - like Landsat imagery. We can plot each band of a multi-band image individually.
Or we can composite all three bands together to make a color image.
In a multi-band dataset, the rasters will always have the same extent, CRS and resolution.
If we read a rasterStack
into R using the raster()
function, it only reads
in the first band. We can plot this band using the plot function.
# Read in multi-band raster with raster function.
# Default is the first band only.
RGB_band1_HARV <-
raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif"))
# create a grayscale color palette to use for the image.
grayscale_colors <- gray.colors(100, # number of different color levels
start = 0.0, # how black (0) to go
end = 1.0, # how white (1) to go
gamma = 2.2, # correction between how a digital
# camera sees the world and how human eyes see it
alpha = NULL) #Null=colors are not transparent
# Plot band 1
plot(RGB_band1_HARV,
col=grayscale_colors,
axes=FALSE,
main="RGB Imagery - Band 1-Red\nNEON Harvard Forest Field Site")
# view attributes: Check out dimension, CRS, resolution, values attributes, and
# band.
RGB_band1_HARV
## class : RasterLayer
## band : 1 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho
## values : 0, 255 (min, max)
Notice that when we look at the attributes of RGB_Band1, we see :
band: 1 (of 3 bands)
This is R telling us that this particular raster object has more bands (3) associated with it.
Image Raster Data Values
Let's next examine the raster's min and max values. What is the value range?
# view min value
minValue(RGB_band1_HARV)
## [1] 0
# view max value
maxValue(RGB_band1_HARV)
## [1] 255
This raster contains values between 0 and 255. These values represent degrees of brightness associated with the image band. In the case of a RGB image (red, green and blue), band 1 is the red band. When we plot the red band, larger numbers (towards 255) represent pixels with more red in them (a strong red reflection). Smaller numbers (towards 0) represent pixels with less red in them (less red was reflected). To plot an RGB image, we mix red + green + blue values into one single color to create a full color image - similar to the color image a digital camera creates.
Import A Specific Band
We can use the raster()
function to import specific bands in our raster object
by specifying which band we want with band=N
(N represents the band number we
want to work with). To import the green band, we would use band=2
.
# Can specify which band we want to read in
RGB_band2_HARV <-
raster(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif"),
band = 2)
# plot band 2
plot(RGB_band2_HARV,
col=grayscale_colors, # we already created this palette & can use it again
axes=FALSE,
main="RGB Imagery - Band 2- Green\nNEON Harvard Forest Field Site")
# view attributes of band 2
RGB_band2_HARV
## class : RasterLayer
## band : 2 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho
## values : 0, 255 (min, max)
Notice that band 2 is the second of 3 bands band: 2 (of 3 bands)
.
Other Types of Multi-band Raster Data
Multi-band raster data might also contain:
- Time series: the same variable, over the same area, over time. Check out Raster Time Series Data in R tutorial to learn more about time series stacks.
- Multi or hyperspectral imagery: image rasters that have 4 or more (multi-spectral) or more than 10-15 (hyperspectral) bands. Check out the NEON Data Skills Imaging Spectroscopy HDF5 in R tutorial tutorial for more about working with hyperspectral data cubes.
Raster Stacks in R
Next, we will work with all three image bands (red, green and blue) as an R
RasterStack
object. We will then plot a 3-band composite, or full color,
image.
To bring in all bands of a multi-band raster, we use thestack()
function.
# Use stack function to read in all bands
RGB_stack_HARV <-
stack(paste0(wd,"NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif"))
# view attributes of stack object
RGB_stack_HARV
## class : RasterStack
## dimensions : 2317, 3073, 7120141, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## names : HARV_RGB_Ortho.1, HARV_RGB_Ortho.2, HARV_RGB_Ortho.3
## min values : 0, 0, 0
## max values : 255, 255, 255
We can view the attributes of each band the stack using RGB_stack_HARV@layers
.
Or we if we have hundreds of bands, we can specify which band we'd like to view
attributes for using an index value: RGB_stack_HARV[[1]]
. We can also use the
plot()
and hist()
functions on the RasterStack
to plot and view the
distribution of raster band values.
# view raster attributes
RGB_stack_HARV@layers
## [[1]]
## class : RasterLayer
## band : 1 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho.1
## values : 0, 255 (min, max)
##
##
## [[2]]
## class : RasterLayer
## band : 2 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho.2
## values : 0, 255 (min, max)
##
##
## [[3]]
## class : RasterLayer
## band : 3 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho.3
## values : 0, 255 (min, max)
# view attributes for one band
RGB_stack_HARV[[1]]
## class : RasterLayer
## band : 1 (of 3 bands)
## dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell)
## resolution : 0.25, 0.25 (x, y)
## extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
## names : HARV_RGB_Ortho.1
## values : 0, 255 (min, max)
# view histogram of all 3 bands
hist(RGB_stack_HARV,
maxpixels=ncell(RGB_stack_HARV))
# plot all three bands separately
plot(RGB_stack_HARV,
col=grayscale_colors)
# revert to a single plot layout
par(mfrow=c(1,1))
# plot band 2
plot(RGB_stack_HARV[[2]],
main="Band 2\n NEON Harvard Forest Field Site",
col=grayscale_colors)
Create A Three Band Image
To render a final 3 band, color image in R, we use plotRGB()
.
This function allows us to:
- Identify what bands we want to render in the red, green and blue regions. The
plotRGB()
function defaults to a 1=red, 2=green, and 3=blue band order. However, you can define what bands you'd like to plot manually. Manual definition of bands is useful if you have, for example a near-infrared band and want to create a color infrared image. - Adjust the
stretch
of the image to increase or decrease contrast.
Let's plot our 3-band image.
# Create an RGB image from the raster stack
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3)
The image above looks pretty good. We can explore whether applying a stretch to
the image might improve clarity and contrast using stretch="lin"
or
stretch="hist"
.


# what does stretch do?
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3,
scale=800,
stretch = "lin")
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3,
scale=800,
stretch = "hist")
In this case, the stretch doesn't enhance the contrast our image significantly given the distribution of reflectance (or brightness) values is distributed well between 0 and 255.
- View the files attributes. Are there
NoData
values assigned for this file? - If so, what is the
NoData
Value? - How many bands does it have?
- Open the multi-band raster file in R.
- Plot the object as a true color image.
- What happened to the black edges in the data?
- What does this tell us about the difference in the data structure between
HARV_Ortho_wNA.tif
andHARV_RGB_Ortho.tif
(R objectRGB_stack
). How can you check?
Answer the questions above using the functions we have covered so far in this tutorial.
RasterStack vs RasterBrick in R
The R RasterStack
and RasterBrick
object types can both store multiple bands.
However, how they store each band is different. The bands in a RasterStack
are
stored as links to raster data that is located somewhere on our computer. A
RasterBrick
contains all of the objects stored within the actual R object.
In most cases, we can work with a RasterBrick
in the same way we might work
with a RasterStack
. However a RasterBrick
is often more efficient and faster
to process - which is important when working with larger files.
We can turn a RasterStack
into a RasterBrick
in R by using
brick(StackName)
. Let's use the object.size()
function to compare stack
and brick
R objects.
# view size of the RGB_stack object that contains our 3 band image
object.size(RGB_stack_HARV)
## 50104 bytes
# convert stack to a brick
RGB_brick_HARV <- brick(RGB_stack_HARV)
# view size of the brick
object.size(RGB_brick_HARV)
## 170898632 bytes
Notice that in the RasterBrick
, all of the bands are stored within the actual
object. Thus, the RasterBrick
object size is much larger than the
RasterStack
object.
You use plotRGB
to block a RasterBrick
too.
# plot brick
plotRGB(RGB_brick_HARV)
- What methods can be used to call on the
RGB_stack_HARV
object? - What methods are available for a single band within
RGB_stack_HARV
? - Why do you think there is a difference?
## Warning in .S3methods(generic.function, class, envir): 'class' is of
## length > 1; only the first element will be used
Get Lesson Code
Raster 05: Raster Time Series Data in R
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
This tutorial covers how to work with and plot a raster time series, using an
R RasterStack
object. It also covers practical assessment of data quality in
remote sensing derived imagery.
Learning Objectives
After completing this tutorial, you will be able to:
- Understand the format of a time series raster dataset.
- Know how to work with time series rasters.
- Be able to efficiently import a set of rasters stored in a single directory.
- Be able to plot and explore time series raster data using the
plot()
function in R.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Landsat-derived NDVI raster files
The imagery data used to create this raster teaching data subset were
collected over the
National Ecological Observatory Network's
Harvard Forest
and
San Joaquin Experimental Range
field sites.
The imagery was created by the U.S. Geological Survey (USGS) using a
multispectral scanner
on a Landsat Satellite.
The data files are Geographic Tagged Image-File Format (GeoTIFF).
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Additional Resources
Read more about the raster
package in R.
About Raster Time Series Data
A raster data file can contain one single band or many bands. If the raster data contains imagery data, each band may represent reflectance for a different wavelength (color or type of light) or set of wavelengths - for example red, green and blue. A multi-band raster may two or more bands or layers of data collected at different times for the same extent (region) and of the same resolution.

The raster data that we will use in this tutorial are located in the
(NEON-DS-Landsat-NDVI\HARV\2011\NDVI
) directory and cover part of the
NEON Harvard Forest field site.
In this tutorial, we will:
- Import NDVI data in
GeoTIFF
format. - Import, explore and plot NDVI data derived for several dates throughout the year.
- View the RGB imagery used to derived the NDVI time series to better understand unusual/outlier values.
NDVI Data
The Normalized Difference Vegetation Index or NDVI is a quantitative index of greenness ranging from 0-1 where 0 represents minimal or no greenness and 1 represents maximum greenness.
NDVI is often used for a quantitative proxy measure of vegetation health, cover and phenology (life cycle stage) over large areas. Our NDVI data are a Landsat derived single band product saved as a GeoTIFF for different times of the year.

More on NDVI from NASA
RGB Data
While the NDVI data are a single band product, the RGB images that contain the red band used to derive NDVI, contain 3 (of the 7) 30m resolution bands available from Landsat data. The RGB directory contains RGB images for each time period that NDVI is available.

Getting Started
In this tutorial, we will use the raster
and rgdal
libraries.
# load packages
library(raster)
library(rgdal)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment environment
# be sure that the downloaded file is in this directory
setwd(wd)
To begin, we will create a list of raster files using the list.files()
function in R. This list will be used to generate a RasterStack
. We will
only add files to our list with a .tif
extension using the syntax
pattern=".tif$"
.
If we specify full.names=TRUE
, the full path for each file will be added to
the list.
# Create list of NDVI file paths
# assign path to object = cleaner code
NDVI_HARV_path <- paste0(wd,"NEON-DS-Landsat-NDVI/HARV/2011/NDVI")
all_NDVI_HARV <- list.files(NDVI_HARV_path,
full.names = TRUE,
pattern = ".tif$")
# view list - note the full path, relative to our working directory, is included
all_NDVI_HARV
## [1] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/005_HARV_ndvi_crop.tif"
## [2] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/037_HARV_ndvi_crop.tif"
## [3] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/085_HARV_ndvi_crop.tif"
## [4] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/133_HARV_ndvi_crop.tif"
## [5] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/181_HARV_ndvi_crop.tif"
## [6] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/197_HARV_ndvi_crop.tif"
## [7] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/213_HARV_ndvi_crop.tif"
## [8] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/229_HARV_ndvi_crop.tif"
## [9] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/245_HARV_ndvi_crop.tif"
## [10] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/261_HARV_ndvi_crop.tif"
## [11] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/277_HARV_ndvi_crop.tif"
## [12] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/293_HARV_ndvi_crop.tif"
## [13] "/Users/olearyd/Git/data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI/309_HARV_ndvi_crop.tif"
Now we have a list of all GeoTIFF files in the NDVI
directory for Harvard
Forest. Next, we will create a RasterStack
from this list using the stack()
function.
# Create a raster stack of the NDVI time series
NDVI_HARV_stack <- stack(all_NDVI_HARV)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
We can explore the GeoTIFF tags (the embedded metadata) in a stack
using the
same syntax that we used on single-band raster objects in R including: crs()
(coordinate reference system), extent()
and res()
(resolution; specifically
yres()
and xres()
).
# view crs of rasters
crs(NDVI_HARV_stack)
## CRS arguments:
## +proj=utm +zone=19 +ellps=WGS84 +units=m +no_defs
# view extent of rasters in stack
extent(NDVI_HARV_stack)
## class : Extent
## xmin : 239415
## xmax : 239535
## ymin : 4714215
## ymax : 4714365
# view the y resolution of our rasters
yres(NDVI_HARV_stack)
## [1] 30
# view the x resolution of our rasters
xres(NDVI_HARV_stack)
## [1] 30
Notice that the CRS is +proj=utm +zone=19 +ellps=WGS84 +units=m +no_defs
. The
CRS is in UTM Zone 19. If you have completed the previous tutorials in
this
raster data in R series,
you may have noticed that the UTM zone for the NEON collected remote sensing
data was in Zone 18 rather than Zone 19. Why are the Landsat data in Zone 19?

The width of a Landsat scene is extremely wide - spanning over 170km north to
south and 180km east to west. This means that Landsat data often cover multiple
UTM zones. When the data are processed, the zone in which the majority of the
data cover, is the zone which is used for the final CRS. Thus, our field site at
Harvard Forest is located in UTM Zone 18, but the Landsat data are in a CRS
of
UTM Zone 19.
- What is the CRS?
- What is the x and y resolution of the data?
- What units is the above resolution in?
Plotting Time Series Data
Once we have created our RasterStack
, we can visualize our data. We can use
the plot()
command to quickly plot a RasterStack
.
# view a plot of all of the rasters
# 'nc' specifies number of columns (we will have 13 plots)
plot(NDVI_HARV_stack,
zlim = c(1500, 10000),
nc = 4)
Have a look at the range of NDVI values observed in the plot above. We know that the accepted values for NDVI range from 0-1. Why does our data range from 0 - 10,000?
Scale Factors
The metadata for this NDVI data specifies a scale factor: 10,000. A scale factor is sometimes used to maintain smaller file sizes by removing decimal places. Storing data in integer format keeps files sizes smaller.
Let's apply the scale factor before we go any further. Conveniently, we can quickly apply this factor using raster math on the entire stack as follows:
raster_stack_object_name / 10000
# apply scale factor to data
NDVI_HARV_stack <- NDVI_HARV_stack/10000
# plot stack with scale factor applied
# apply scale factor to limits to ensure uniform plottin
plot(NDVI_HARV_stack,
zlim = c(.15, 1),
nc = 4)
Take a Closer Look at Our Data
Let's take a closer look at the plots of our data. Note that Massachusetts, where the NEON Harvard Forest Field Site is located has a fairly consistent fall, winter, spring and summer season where vegetation turns green in the spring, continues to grow throughout the summer, and begins to change colors and senesce in the fall through winter. Do you notice anything that seems unusual about the patterns of greening and browning observed in the plots above?
Hint: the number after the "X" in each tile title is the Julian day which in this case represents the number of days into each year. If you are unfamiliar with Julian day, check out the NEON Data Skills Converting to Julian Day tutorial . tutorial.
View Distribution of Raster Values
In the above exercise, we viewed plots of our NDVI time series and noticed a few images seem to be unusually light. However this was only a visual representation of potential issues in our data. What is another way we can look at these data that is quantitative?
Next we will use histograms to explore the distribution of NDVI values stored in each raster.
# create histograms of each raster
hist(NDVI_HARV_stack,
xlim = c(0, 1))
It seems like things get green in the spring and summer like we expect, but the data at Julian days 277 and 293 are unusual. It appears as if the vegetation got green in the spring, but then died back only to get green again towards the end of the year. Is this right?
Explore Unusual Data Patterns
The NDVI data that we are using comes from 2011, perhaps a strong freeze around Julian day 277 could cause a vegetation to senesce early, however in the eastern United States, it seems unusual that it would proceed to green up again shortly thereafter.
Let's next view some temperature data for our field site to see whether there were some unusual fluctuations that may explain this pattern of greening and browning seen in the NDVI data.
There are no significant peaks or dips in the temperature during the late summer or early fall time period that might account for patterns seen in the NDVI data.
What is our next step?
Let's have a look at the source Landsat imagery that was partially used used to derive our NDVI rasters to try to understand what appears to be outlier NDVI values.
- View the imagery located in the
/NEON-DS-Landsat-NDVI/HARV/2011
directory. - Plot the RGB images for the Julian days 277 and 293 then plot and compare those images to jdays 133 and 197.
- Does the RGB imagery from these two days explain the low NDVI values observed on these days?
HINT: if you want to plot 4 images in a tiled set, you can use
par(mfrow=c(2,2))
to create a 2x2 tiled layout. When you are done, be sure to
reset your layout using: par(mfrow=c(1,1))
.
Explore The Data's Source
The third challenge question, "Does the RGB imagery from these two days explain the low NDVI values observed on these days?" highlights the importance of exploring the source of a derived data product. In this case, the NDVI data product was derived from (created using) Landsat imagery - specifically the red and near-infrared bands.
When we look at the RGB collected at Julian days 277 and 293 we see that most of the image is filled with clouds. The very low NDVI values resulted from cloud cover — a common challenge that we encounter when working with satellite remote sensing imagery.
Get Lesson Code
Raster 06: Plot Raster Time Series Data in R Using RasterVis and Levelplot
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
This tutorial covers how to improve plotting output using the rasterVis
package
in R. Specifically it covers using levelplot()
and adding meaningful custom
names to bands within a RasterStack
.
Learning Objectives
After completing this tutorial, you will be able to:
- Be able to assign custom names to bands in a RasterStack for prettier plotting.
- Understand advanced plotting of rasters using the
rasterVis
package andlevelplot
.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
rasterVis:
install.packages("rasterVis")
-
RColorBrewer:
install.packages("RColorBrewer")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Landsat-derived NDVI raster files
The imagery data used to create this raster teaching data subset were
collected over the
National Ecological Observatory Network's
Harvard Forest
and
San Joaquin Experimental Range
field sites.
The imagery was created by the U.S. Geological Survey (USGS) using a
multispectral scanner
on a Landsat Satellite.
The data files are Geographic Tagged Image-File Format (GeoTIFF).
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Get Started
In this tutorial, we are working with the same set of rasters used in the
Raster Time Series Data in R
tutorial. These data are derived from the Landsat satellite and stored in
GeoTIFF
format. Each raster covers the
NEON Harvard Forest field site.
If you have not already created the RasterStack, originally created in Raster Time Series Data in R , please create it now.
# import libraries
library(raster)
library(rgdal)
library(rasterVis)
## Loading required package: terra
## terra version 1.1.4
##
## Attaching package: 'terra'
## The following objects are masked from 'package:tidyr':
##
## expand, fill, pack, separate
## The following object is masked from 'package:zoo':
##
## time<-
## The following object is masked from 'package:grid':
##
## depth
## The following object is masked from 'package:scales':
##
## rescale
## The following object is masked from 'package:ggmap':
##
## inset
## The following object is masked from 'package:rgdal':
##
## project
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, near
## The following object is masked from 'package:knitr':
##
## spin
## Loading required package: lattice
## Loading required package: latticeExtra
##
## Attaching package: 'latticeExtra'
## The following object is masked from 'package:ggplot2':
##
## layer
library(RColorBrewer)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment environment
# be sure that the downloaded file is in this directory
setwd(wd)
# Create list of NDVI file paths
all_NDVI_HARV <- list.files(paste0(wd,"NEON-DS-Landsat-NDVI/HARV/2011/NDVI"), full.names = TRUE, pattern = ".tif$")
# Create a time series raster stack
NDVI_HARV_stack <- stack(all_NDVI_HARV)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
# apply scale factor
NDVI_HARV_stack <- NDVI_HARV_stack/10000
Plot Raster Time Series Data
We can use the plot
function to plot our raster time series data.
# view a plot of all of the rasters
# nc specifies number of columns
plot(NDVI_HARV_stack, nc = 4)
Our plot is nice however, it's missing some key elements including, easily
readable titles. It also contains a legend that is repeated for each image. We
can use levelplot
from the rasterVis
package to make our plot prettier!
The syntax for the levelplot()
function is similar to that for the plot()
function. We use main="TITLE"
to add a title to the entire plot series.
# create a `levelplot` plot
levelplot(NDVI_HARV_stack,
main="Landsat NDVI\nNEON Harvard Forest")
Adjust the Color Ramp
Next, let's adjust the color ramp used to render the rasters. First, we
can change the red color ramp to a green one that is more visually suited to our
NDVI (greenness) data using the colorRampPalette()
function in combination with
colorBrewer
.
# use colorbrewer which loads with the rasterVis package to generate
# a color ramp of yellow to green
cols <- colorRampPalette(brewer.pal(9,"YlGn"))
# create a level plot - plot
levelplot(NDVI_HARV_stack,
main="Landsat NDVI -- Improved Colors \nNEON Harvard Forest Field Site",
col.regions=cols)
The yellow to green color ramp visually represents NDVI well given it's a measure of greenness. Someone looking at the plot can quickly understand that pixels that are more green, have a higher NDVI value.
- For all of the
brewer_pal
ramp names see the
Refine Plot & Tile Labels
Next, let's label each raster in our plot with the Julian day that the raster
represents. The current names come from the band (layer names) stored in the
RasterStack
and first the part each name is the Julian day.
To create a more meaningful label we can remove the "x" and replace it with
"day" using the gsub()
function in R. The syntax is as follows:
gsub("StringToReplace","TextToReplaceIt", Robject)
.
First let's remove "_HARV_NDVI_crop" from each label.
# view names for each raster layer
names(NDVI_HARV_stack)
## [1] "X005_HARV_ndvi_crop" "X037_HARV_ndvi_crop" "X085_HARV_ndvi_crop"
## [4] "X133_HARV_ndvi_crop" "X181_HARV_ndvi_crop" "X197_HARV_ndvi_crop"
## [7] "X213_HARV_ndvi_crop" "X229_HARV_ndvi_crop" "X245_HARV_ndvi_crop"
## [10] "X261_HARV_ndvi_crop" "X277_HARV_ndvi_crop" "X293_HARV_ndvi_crop"
## [13] "X309_HARV_ndvi_crop"
# use gsub to modify label names.that we'll use for the plot
rasterNames <- gsub("X","Day ", names(NDVI_HARV_stack))
# view Names
rasterNames
## [1] "Day 005_HARV_ndvi_crop" "Day 037_HARV_ndvi_crop"
## [3] "Day 085_HARV_ndvi_crop" "Day 133_HARV_ndvi_crop"
## [5] "Day 181_HARV_ndvi_crop" "Day 197_HARV_ndvi_crop"
## [7] "Day 213_HARV_ndvi_crop" "Day 229_HARV_ndvi_crop"
## [9] "Day 245_HARV_ndvi_crop" "Day 261_HARV_ndvi_crop"
## [11] "Day 277_HARV_ndvi_crop" "Day 293_HARV_ndvi_crop"
## [13] "Day 309_HARV_ndvi_crop"
# Remove HARV_NDVI_crop from the second part of the string
rasterNames <- gsub("_HARV_ndvi_crop","",rasterNames)
# view names for each raster layer
rasterNames
## [1] "Day 005" "Day 037" "Day 085" "Day 133" "Day 181" "Day 197"
## [7] "Day 213" "Day 229" "Day 245" "Day 261" "Day 277" "Day 293"
## [13] "Day 309"
Once the names for each band have been reassigned, we can render our plot with
the new labels using names.attr=rasterNames
.
# use level plot to create a nice plot with one legend and a 4x4 layout.
levelplot(NDVI_HARV_stack,
layout=c(4, 4), # create a 4x4 layout for the data
col.regions=cols, # add a color ramp
main="Landsat NDVI - Julian Days \nHarvard Forest 2011",
names.attr=rasterNames)
We can adjust the columns of our plot too using layout=c(cols,rows)
. Below
we adjust the layout to be a matrix of 5 columns and 3 rows.
# use level plot to create a nice plot with one legend and a 4x4 layout.
levelplot(NDVI_HARV_stack,
layout=c(5, 3), # create a 5x3 layout for the data
col.regions=cols, # add a color ramp
main="Landsat NDVI - Julian Days \nHarvard Forest 2011",
names.attr=rasterNames)
Finally, scales
allows us to modify the x and y-axis scale. Let's simply
remove the axis ticks from the plot with scales =list(draw=FALSE)
.
# use level plot to create a nice plot with one legend and a 4x4 layout.
levelplot(NDVI_HARV_stack,
layout=c(5, 3), # create a 5x3 layout for the data
col.regions=cols, # add a color ramp
main="Landsat NDVI - Julian Days \nHarvard Forest 2011",
names.attr=rasterNames,
scales=list(draw=FALSE )) # remove axes labels & ticks
- Create a plot and label each tile "Julian Day" with the julian day value following.
- Change the colorramp to a divergent brown to green color ramp to represent the data. Hint: Use the brewerpal page to help choose a color ramp.
Questions: Does having a divergent color ramp represent the data better than a sequential color ramp (like "YlGn")? Can you think of other data sets where a divergent color ramp may be best?
Get Lesson Code
Raster 07: Extract NDVI Summary Values from a Raster Time Series
Authors: Leah A. Wasser, Megan A. Jones, Zack Brym, Kristina Riemer, Jason Williams, Jeff Hollister, Mike Smorul
Last Updated: Apr 8, 2021
In this tutorial, we will extract NDVI values from a raster time series dataset
in R and plot them using ggplot
.
Learning Objectives
After completing this tutorial, you will be able to:
- Be able to extract summary pixel values from a raster.
- Know how to save summary values to a .csv file.
- Be able to plot summary pixel values using
ggplot()
. - Have experience comparing NDVI values between two different sites.
Things You'll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
raster:
install.packages("raster")
-
rgdal:
install.packages("rgdal")
-
ggplot2:
install.packages("ggplot2")
-
More on Packages in R – Adapted from Software Carpentry.
Data to Download
NEON Teaching Data Subset: Landsat-derived NDVI raster files
The imagery data used to create this raster teaching data subset were
collected over the
National Ecological Observatory Network's
Harvard Forest
and
San Joaquin Experimental Range
field sites.
The imagery was created by the U.S. Geological Survey (USGS) using a
multispectral scanner
on a Landsat Satellite.
The data files are Geographic Tagged Image-File Format (GeoTIFF).
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Extract Summary Statistics From Raster Data
In science, we often want to extract summary values from raster data. For example, we might want to understand overall greeness across a field site or at each plot within a field site. These values can then be compared between different field sites and combined with other related metrics to support modeling and further analysis.
Get Started
In this tutorial, we will work with the same set of rasters used in the Raster Time Series Data in R and Plot Raster Time Series Data in R Using RasterVis and Levelplot tutorials. To begin, we will create a raster stack (also created in the previous tutorials so you may be able to skip this first step!).
library(raster)
library(rgdal)
library(ggplot2)
# set working directory to ensure R can find the file we wish to import
wd <- "~/Git/data/" # this will depend on your local environment
# be sure that the downloaded file is in this directory
setwd(wd)
# Create list of NDVI file paths
all_HARV_NDVI <- list.files(paste0(wd,"NEON-DS-Landsat-NDVI/HARV/2011/NDVI"),
full.names = TRUE,
pattern = ".tif$")
# Create a time series raster stack
NDVI_HARV_stack <- stack(all_HARV_NDVI)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
# apply scale factor
NDVI_HARV_stack <- NDVI_HARV_stack/10000
Calculate Average NDVI
Our goal in this tutorial, is to create a data.frame
that contains a single,
mean NDVI value for each raster in our time series. This value represents the
mean NDVI value for this area on a given day.
We can calculate the mean for each raster using the cellStats
function. The
cellStats
function produces a numeric array of values. We can then convert our
array format output to a data.frame using as.data.frame()
.
# calculate mean NDVI for each raster
avg_NDVI_HARV <- cellStats(NDVI_HARV_stack,mean)
# convert output array to data.frame
avg_NDVI_HARV <- as.data.frame(avg_NDVI_HARV)
# To be more efficient we could do the above two steps with one line of code
# avg_NDVI_HARV <- as.data.frame(cellStats(NDVI_stack_HARV,mean))
# view data
avg_NDVI_HARV
## avg_NDVI_HARV
## X005_HARV_ndvi_crop 0.365150
## X037_HARV_ndvi_crop 0.242645
## X085_HARV_ndvi_crop 0.251390
## X133_HARV_ndvi_crop 0.599300
## X181_HARV_ndvi_crop 0.878725
## X197_HARV_ndvi_crop 0.893250
## X213_HARV_ndvi_crop 0.878395
## X229_HARV_ndvi_crop 0.881505
## X245_HARV_ndvi_crop 0.850120
## X261_HARV_ndvi_crop 0.796360
## X277_HARV_ndvi_crop 0.033050
## X293_HARV_ndvi_crop 0.056895
## X309_HARV_ndvi_crop 0.541130
# view only the value in row 1, column 1 of the data frame
avg_NDVI_HARV[1,1]
## [1] 0.36515
We now have a data.frame
with row.names
based on the original file name and
a mean NDVI value for each file. Next, let's clean up the column names in our
data.frame to make it easier for colleagues to work with our code.
It is a bit confusing to have duplicate object & column names (e.g.
avg_NDVI_HARV
), additionally the "avg" does not clearly what the value in that
particular column is. Let's change the NDVI column name to MeanNDVI
.
# view column name slot
names(avg_NDVI_HARV)
## [1] "avg_NDVI_HARV"
# rename the NDVI column
names(avg_NDVI_HARV) <- "meanNDVI"
# view cleaned column names
names(avg_NDVI_HARV)
## [1] "meanNDVI"
By renaming the column, we lose the "HARV" in the header that reminds us what
site our data are from. While, we are only working with one site now, we
might want to compare several sites worth of data in the future. Let's add a
column to our data.frame
called "site". We can populate this column with the
site name - HARV. Let's also create a year column and populate it with 2011 -
the year our data were collected.
# add a site column to our data
avg_NDVI_HARV$site <- "HARV"
# add a "year" column to our data
avg_NDVI_HARV$year <- "2011"
# view data
head(avg_NDVI_HARV)
## meanNDVI site year
## X005_HARV_ndvi_crop 0.365150 HARV 2011
## X037_HARV_ndvi_crop 0.242645 HARV 2011
## X085_HARV_ndvi_crop 0.251390 HARV 2011
## X133_HARV_ndvi_crop 0.599300 HARV 2011
## X181_HARV_ndvi_crop 0.878725 HARV 2011
## X197_HARV_ndvi_crop 0.893250 HARV 2011
We now have data frame that contains a row for each raster file processed, and a
column for meanNDVI
, site
and year
.
Extract Julian Day from row.names
We'd like to produce a plot where Julian days (the numeric day of the year, 0 - 365/366) is on the x-axis and NDVI is on the y-axis. To create this plot, we'll need a column that contains the Julian day value.
One way to create a Julian day column is to use gsub
on the file name in each
row. We can replace both the X
and the _HARV_NDVI_crop
to extract the Julian
Day value:
X005_HARV_NDVI_crop
# note the use of the vertical bar character ( | ) is equivalent to "or". This
# allows us to search for more than one pattern in our text strings.
julianDays <- gsub(pattern = "X|_HARV_ndvi_crop", #the pattern to find
x = row.names(avg_NDVI_HARV), #the object containing the strings
replacement = "") #what to replace each instance of the pattern with
# alternately you can include the above code on one single line
# julianDays <- gsub("X|_HARV_NDVI_crop", "", row.names(avg_NDVI_HARV))
# make sure output looks ok
head(julianDays)
## [1] "005" "037" "085" "133" "181" "197"
# add julianDay values as a column in the data frame
avg_NDVI_HARV$julianDay <- julianDays
# what class is the new column
class(avg_NDVI_HARV$julianDay)
## [1] "character"
What class is our julianDay
column?
Convert Julian Day to Date Class
Currently, the values in the Julian day column are stored as a character
class.
Storing this data as a date object is better - for plotting, data subsetting and
working with our data. Let's convert.
For more information on date-time classes, see the NEON Data Skills tutorial Convert Date & Time Data from Character Class to Date-Time Class (POSIX) in R.
To convert a Julian Day number to a date class, we need to set the origin of the day which "counting" Julian Days began. Our data are from 2011, and we know that the USGS Landsat Team created Julian Day values for this year. Therefore, the first day or "origin" for our Julian day count is 01 January 2011. Once we set the Julian Day origin, we can add the Julian Day value (as an integer) to the origin date.
Since the origin date was originally set as a Date class object, the new Date
column is also stored as class Date
.
# set the origin for the julian date (1 Jan 2011)
origin <- as.Date("2011-01-01")
# convert "julianDay" from class character to integer
avg_NDVI_HARV$julianDay <- as.integer(avg_NDVI_HARV$julianDay)
# create a date column; -1 added because origin is the 1st.
# If not -1, 01/01/2011 + 5 = 01/06/2011 which is Julian day 6, not 5.
avg_NDVI_HARV$Date<- origin + (avg_NDVI_HARV$julianDay-1)
# did it work?
head(avg_NDVI_HARV$Date)
## [1] "2011-01-05" "2011-02-06" "2011-03-26" "2011-05-13" "2011-06-30"
## [6] "2011-07-16"
# What are the classes of the two columns now?
class(avg_NDVI_HARV$Date)
## [1] "Date"
class(avg_NDVI_HARV$julianDay)
## [1] "integer"
Note that when we convert our integer class julianDay
values to dates, we
subtracted 1 as follows:
avg_NDVI_HARV$Date <- origin + (avg_NDVI_HARV$julianDay-1)
This is because the origin day is 01 January 2011, so the extracted day is 01.
The Julian Day (or year day) for this is also 01. When we convert from the
integer 05 julianDay
value (indicating 5th of January), we cannot simply add
origin + julianDay
because 01 + 05 = 06
or 06 January 2011. To correct, this
error we then subtract 1 to get the correct day, January 05 2011.
For this challenge, compare NDVI values for the NEON Harvard Forest and San
Joaquin Experimental Range field sites. NDVI data for SJER are located in the
NEON-DS-Landsat-NDVI/SJER/2011/NDVI
directory.
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO",
## prefer_proj = prefer_proj): Discarded datum Unknown based on WGS84
## ellipsoid in CRS definition
Plot NDVI Using ggplot
We now have a clean data.frame with properly scaled NDVI and Julian days. Let's plot our data.
We will use the ggplot()
function within the ggplot2
package for this plot.
If you are unfamiliar with ggplot()
or would like to learn more about plotting
in ggplot()
see the tutorial on
Plotting Time Series with ggplot in R .
# plot NDVI
ggplot(avg_NDVI_HARV, aes(julianDay, meanNDVI), na.rm=TRUE) +
geom_point(size=4,colour = "PeachPuff4") +
ggtitle("Landsat Derived NDVI - 2011\n NEON Harvard Forest Field Site") +
xlab("Julian Days") + ylab("Mean NDVI") +
theme(text = element_text(size=20))
Challenge: Plot San Joaquin Experimental Range Data
Create a complementary plot for the SJER data. Plot the data points in a different color.
Compare NDVI from Two Different Sites in One Plot
Comparison of plots is often easiest when both plots are side by side. Or, even better, if both sets of data are plotted in the same plot. We can do this by binding the two datasets together. The date frames must have the same number of columns and exact same column names to be bound.
# Merge Data Frames
NDVI_HARV_SJER <- rbind(avg_NDVI_HARV,avg_NDVI_SJER)
# plot NDVI values for both sites
ggplot(NDVI_HARV_SJER, aes(julianDay, meanNDVI, colour=site)) +
geom_point(size=4,aes(group=site)) +
geom_line(aes(group=site)) +
ggtitle("Landsat Derived NDVI - 2011\n Harvard Forest vs San Joaquin \n NEON Field Sites") +
xlab("Julian Day") + ylab("Mean NDVI") +
scale_colour_manual(values=c("PeachPuff4", "SpringGreen4"))+
# scale_colour : match previous plots
theme(text = element_text(size=20))
Remove Outlier Data
As we look at these plots we see variation in greenness across the year. However, the pattern is interrupted by a few points where NDVI quickly drops towards 0 during a time period when we might expect the vegetation to have a larger greenness value. Is the vegetation truly senescent or gone or are these outlier values that should be removed from the data?
Let's look at the RGB images from Harvard Forest.
NOTE: the code below uses loops which we will not teach in this tutorial. However the code demonstrates one way to plot multiple RGB rasters in a grid.
# open up RGB imagery
rgb.allCropped <- list.files(paste0(wd,"NEON-DS-Landsat-NDVI/HARV/2011/RGB/"),
full.names=TRUE,
pattern = ".tif$")
# create a layout
par(mfrow=c(4,4))
# super efficient code
for (aFile in rgb.allCropped){
NDVI.rastStack <- stack(aFile)
plotRGB(NDVI.rastStack, stretch="lin")
}
# reset layout
par(mfrow=c(1,1))
Notice that the data points with very low NDVI values can be associated with images that are filled with clouds. Thus, we can attribute the low NDVI values to high levels of cloud cover.
Is the same thing happening at SJER?
# open up the cropped files
rgb.allCropped.SJER <- list.files(paste0(wd,"NEON-DS-Landsat-NDVI/SJER/2011/RGB/"),
full.names=TRUE,
pattern = ".tif$")
# create a layout
par(mfrow=c(5,4))
# Super efficient code
# note that there is an issue with one of the rasters
# NEON-DS-Landsat-NDVI/SJER/2011/RGB/254_SJER_landRGB.tif has a blue band with no range
# thus you can't apply a stretch to it. The code below skips the stretch for
# that one image. You could automate this by testing the range of each band in each image
for (aFile in rgb.allCropped.SJER)
{NDVI.rastStack <- stack(aFile)
if (aFile ==paste0(wd,"NEON-DS-Landsat-NDVI/SJER/2011/RGB//254_SJER_landRGB.tif"))
{plotRGB(NDVI.rastStack) }
else { plotRGB(NDVI.rastStack, stretch="lin") }
}
## Error in grDevices::rgb(RGB[, 1], RGB[, 2], RGB[, 3], alpha = alpha, max = scale): color intensity NA, not in 0:255
# reset layout
par(mfrow=c(1,1))
Without significant additional processing, we will not be able to retrieve a strong reflection from vegetation, from a remotely sensed image that is predominantly cloud covered. Thus, these points are likely bad data points. Let's remove them.
First, we will identify the good data points - that should be retained. One way to do this is by identifying a threhold value. All values below that threshold will be removed from our analysis. We will use 0.1 as an example for this tutorials. We can then use the subset function to remove outlier datapoints (below our identified threshold).
# retain only rows with meanNDVI>0.1
avg_NDVI_HARV_clean<-subset(avg_NDVI_HARV, meanNDVI>0.1)
# Did it work?
avg_NDVI_HARV_clean$meanNDVI<0.1
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Now we can create another plot without the suspect data.
# plot without questionable data
ggplot(avg_NDVI_HARV_clean, aes(julianDay, meanNDVI)) +
geom_point(size=4,colour = "SpringGreen4") +
ggtitle("Landsat Derived NDVI - 2011\n NEON Harvard Forest Field Site") +
xlab("Julian Days") + ylab("Mean NDVI") +
theme(text = element_text(size=20))
Now our outlier data points are removed and the pattern of "green-up" and "brown-down" makes a bit more sense.
Write NDVI data to a .csv File
We can write our final NDVI data.frame
out to a text format, to quickly share
with a colleague or to resuse for analysis or visualization purposes. We will
export in Comma Separated Value (.csv) file format given it is usable in many
different tools and across platforms (MAC, PC, etc).
We will use write.csv()
to write a specified data.frame
to a .csv
file.
Unless you designate a different directory, the output file will be saved in
your working directory.
Before saving our file, let's quickly view the format to make sure it is what we want as an output format.
# confirm data frame is the way we want it
head(avg_NDVI_HARV_clean)
## meanNDVI site year julianDay Date
## X005_HARV_ndvi_crop 0.365150 HARV 2011 5 2011-01-05
## X037_HARV_ndvi_crop 0.242645 HARV 2011 37 2011-02-06
## X085_HARV_ndvi_crop 0.251390 HARV 2011 85 2011-03-26
## X133_HARV_ndvi_crop 0.599300 HARV 2011 133 2011-05-13
## X181_HARV_ndvi_crop 0.878725 HARV 2011 181 2011-06-30
## X197_HARV_ndvi_crop 0.893250 HARV 2011 197 2011-07-16
It looks like we have a series of row.names
that we do not need given we have
this information stored in individual columns in our data.frame. Let's remove
the row names.
# create new df to prevent changes to avg_NDVI_HARV
NDVI_HARV_toWrite<-avg_NDVI_HARV_clean
# drop the row.names column
row.names(NDVI_HARV_toWrite)<-NULL
# check data frame
head(NDVI_HARV_toWrite)
## meanNDVI site year julianDay Date
## 1 0.365150 HARV 2011 5 2011-01-05
## 2 0.242645 HARV 2011 37 2011-02-06
## 3 0.251390 HARV 2011 85 2011-03-26
## 4 0.599300 HARV 2011 133 2011-05-13
## 5 0.878725 HARV 2011 181 2011-06-30
## 6 0.893250 HARV 2011 197 2011-07-16
# create a .csv of mean NDVI values being sure to give descriptive name
# write.csv(DateFrameName, file="NewFileName")
write.csv(NDVI_HARV_toWrite, file=paste0(wd,"meanNDVI_HARV_2011.csv"))
- Create a NDVI .csv file for the NEON SJER field site that is comparable with the one we just created for the Harvard Forest. Be sure to inspect for questionable values before writing any data to a .csv file.
- Create a NDVI .csv file that stacks data from both field sites.