Series
Get Started with NEON Data: A Series of Data Tutorials
This Data Tutorial Series is designed to provide you with an introduction to how to access and use NEON data. It includes both foundational skills for working with NEON data, and tutorials that focus on specific data types, which you can choose from based on your interests.
Foundational Skills and Tools to Access NEON Data
- Start with a short video guide to downloading data from the NEON Data Portal.
- The Download and Explore NEON Data tutorial will guide you through using the neonUtilities package in R to transform NEON data, and how to use the metadata that accompany data downloads to help you understand the data.
- Many more details about the neonUtilities package, and the input parameters available for its functions, are available in the neonUtilities cheat sheet.
- If you prefer to use Python, the Using neonUtilities in Python tutorial provides instructions for using the rpy2 package to run neonUtilities in a Python environment.
- Using an API token can make your downloads faster, and helps NEON by linking your user account to your downloads. See more information about API tokens here, and learn how to use a token with neonUtilities in this tutorial.
- Learn how to work with NEON location data, using examples from vegetation structure observations and soil temperature sensors.
Introductions to Working with Different Data Types
- Explore the intersection of sensor and observational data with the Plant Phenology & Temperature tutorial series (individual tutorials that make up the series are listed in the sidebar). This is also a good introduction for inexperienced R users.
- Get familiar with NEON sensor data flagging and data quality metrics, using aquatic instrument data as exemplar datasets.
- Use the neonOS package to join tables and flag duplicates in NEON observational data.
- Calculate biodiversity metrics from NEON aquatic macroinvertebrate data.
- For a quick introduction to working with remote sensing data, calculate a canopy height model from discrete return Lidar. NEON has an extensive catalog of tutorials about remote sensing principles and data; search the tutorials and tutorial series if you are interested in other topics.
- Connecting ground observations to remote sensing imagery is important to many NEON users; get familiar with the process, as well as some of the challenges of comparing these data sources, by comparing tree height observations to a canopy height model.
- Use the neonUtilities package to wrangle NEON surface-atmosphere exchange data (published in HDF5 format).
Download and Explore NEON Data
Authors: Claire K. Lunch
Last Updated: Nov 2, 2022
This tutorial covers downloading NEON data, using the Data Portal and the neonUtilities R package, as well as basic instruction in beginning to explore and work with the downloaded data, including guidance in navigating data documentation.
NEON data
There are 3 basic categories of NEON data:
- Remote sensing (AOP) - Data collected by the airborne observation platform, e.g. LIDAR, surface reflectance
- Observational (OS) - Data collected by a human in the field, or in an analytical laboratory, e.g. beetle identification, foliar isotopes
- Instrumentation (IS) - Data collected by an automated, streaming sensor, e.g. net radiation, soil carbon dioxide. This category also includes the eddy covariance (EC) data, which are processed and structured in a unique way, distinct from other instrumentation data (see Tutorial for EC data for details).
This lesson covers all three types of data. The download procedures are similar for all types, but data navigation differs significantly by type.
Objectives
After completing this activity, you will be able to:
- Download NEON data using the neonUtilities package.
- Understand downloaded data sets and load them into R for analyses.
Things You’ll Need To Complete This Tutorial
To complete this tutorial you will need the most current version of R and, preferably, RStudio loaded on your computer.
Install R Packages
- neonUtilities: Basic functions for accessing NEON data
- neonOS: Functions for common data wrangling needs for NEON observational data
- raster: Functions for working with gridded (raster) spatial data; needed for remote sensing data
All of these packages can be installed from CRAN:
install.packages("neonUtilities")
install.packages("neonOS")
install.packages("raster")
Additional Resources
- Tutorial for neonUtilities. Some overlap with this tutorial but goes into more detail about the neonUtilities package.
- Tutorial for using neonUtilities from a Python environment.
- GitHub repository for neonUtilities
- neonUtilities cheat sheet. A quick reference guide for users.
Getting started: Download data from the Portal and load packages
Go to the NEON Data Portal and download some data! Almost any IS or OS data product can be used for this section of the tutorial, but we will proceed assuming you've downloaded Photosynthetically Active Radiation (PAR) (DP1.00024.001) data. For optimal results, download three months of data from one site. The downloaded file should be a zip file named NEON_par.zip. For this tutorial, we will be using PAR data from the Wind River Experimental Forest (WREF) in Washington state from September-November 2019.
Now switch over to R and load all the packages installed above.
# load packages
library(neonUtilities)
library(neonOS)
library(raster)
# Set global option to NOT convert all character variables to factors
options(stringsAsFactors=F)
Stack the downloaded data files: stackByTable()
The stackByTable() function will unzip and join the files in the downloaded zip file.
# Modify the file path to match the path to your zip file
stackByTable("~/Downloads/NEON_par.zip")
In the same directory as the zipped file, you should now have an unzipped folder of the same name. When you open this you will see a new folder called stackedFiles, which should contain five files: PARPAR_30min.csv, PARPAR_1min.csv, sensor_positions_00024.csv, variables_00024.csv, and readme_00024.txt.
We'll look at these files in more detail below.
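If you want to confirm the stacking output from within R, a quick directory listing (assuming the same download location used above) shows the stacked files:
# list the contents of the stackedFiles folder to confirm the stack succeeded
list.files("~/Downloads/NEON_par/stackedFiles")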
Download files and load directly to R: loadByProduct()
In the section above, we downloaded a .zip file from the data portal to our downloads folder, then used the stackByTable() function to transform those data into a usable format. However, there is a faster way to load data directly into the R Global Environment using loadByProduct().
The most popular function in neonUtilities is loadByProduct().
This function downloads data from the NEON API, merges the site-by-month
files, and loads the resulting data tables into the R environment,
assigning each data type to the appropriate R class. This is a popular
choice because it ensures you're always working with the latest data,
and it ends with ready-to-use tables in R. However, if you use it in
a workflow you run repeatedly, keep in mind it will re-download the
data every time.
loadByProduct() works on most observational (OS) and sensor (IS) data, but not on surface-atmosphere exchange (SAE) data, remote sensing (AOP) data, and some of the data tables in the microbial data products. For functions that download AOP data, see the byFileAOP() and byTileAOP() sections in this tutorial. For functions that work with SAE data, see the NEON eddy flux data tutorial.
The inputs to loadByProduct() control which data to download and how to manage the processing (a sketch combining several of these inputs appears after this list):
- dpID: the data product ID, e.g. DP1.00002.001
- site: defaults to "all", meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g. c("HARV","CPER","ABBY").
- startdate and enddate: default to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g. 2017-06. Since NEON data are provided in monthly packages, finer-scale querying is not available. Both start and end date are inclusive.
- package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as chemical standards and quality flags. Not every data product has an expanded package; if the expanded package is requested but there isn't one, the basic package will be downloaded.
- timeIndex: defaults to "all", to download all data; or the number of minutes in the averaging interval. Only applicable to IS data.
- savepath: the file path you want to download to; defaults to the working directory.
- check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
- nCores: number of cores to use for parallel processing. Defaults to 1, i.e. no parallelization.
- forceParallel: if TRUE, forces parallel processing even when the data volume does not meet the usual minimum requirements to run in parallel.
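As an illustrative sketch, a more fully specified call might look like the one below. The site, date range, and averaging interval are placeholder choices matching the PAR download earlier in this tutorial, not requirements.
# example call combining several of the inputs described above;
# the site, dates, and averaging interval are illustrative choices
parList <- loadByProduct(dpID="DP1.00024.001",
                         site="WREF",
                         startdate="2019-09", enddate="2019-11",
                         package="basic",
                         timeIndex=30,
                         check.size=FALSE)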
The dpID is the data product identifier of the data you want to download. The DPID can be found on the Explore Data Products page. It will be in the form DP#.#####.###.
Here, we'll download aquatic plant chemistry data from three lake sites: Prairie Lake (PRLA), Suggs Lake (SUGG), and Toolik Lake (TOOK).
apchem <- loadByProduct(dpID="DP1.20063.001",
site=c("PRLA","SUGG","TOOK"),
package="expanded", check.size=T)
The object returned by loadByProduct() is a named list of data frames. To work with each of them, select them from the list using the $ operator.
names(apchem)
View(apchem$apl_plantExternalLabDataPerSample)
If you prefer to extract each table from the list and work with it as an independent object, you can use the list2env() function:
list2env(apchem, .GlobalEnv)
## <environment: R_GlobalEnv>
If you want to be able to close R and come back to these data without re-downloading, you'll want to save the tables locally. We recommend also saving the variables file, both so you'll have it to refer to, and so you can use it with readTableNEON() (see below).
write.csv(apl_clipHarvest,
"~/Downloads/apl_clipHarvest.csv",
row.names=F)
write.csv(apl_biomass,
"~/Downloads/apl_biomass.csv",
row.names=F)
write.csv(apl_plantExternalLabDataPerSample,
"~/Downloads/apl_plantExternalLabDataPerSample.csv",
row.names=F)
write.csv(variables_20063,
"~/Downloads/variables_20063.csv",
row.names=F)
But, if you want to save files locally and load them into R (or another platform) each time you run a script, instead of downloading from the API every time, you may prefer to use zipsByProduct() and stackByTable() instead of loadByProduct(), as we did in the first section above. Details can be found in our neonUtilities tutorial. You can also try out the community-developed neonstore package, which is designed for maintaining a local store of the NEON data you use.
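As a rough sketch of that two-step workflow, using the same aquatic plant chemistry product downloaded above: zipsByProduct() saves the site-by-month files to a filesToStack folder named after the product number, and that folder is then passed to stackByTable().
# download the site-by-month zip files without loading anything into R
zipsByProduct(dpID="DP1.20063.001",
              site=c("PRLA","SUGG","TOOK"),
              package="expanded",
              savepath="~/Downloads",
              check.size=TRUE)

# stack the downloaded files into one csv per data table
stackByTable("~/Downloads/filesToStack20063")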
Download remote sensing data: byFileAOP() and byTileAOP()
Remote sensing data files are very large, so downloading them can take a long time. byFileAOP() and byTileAOP() enable easier programmatic downloads, but be aware it can take a very long time to download large amounts of data.
Input options for the AOP functions are:
- dpID: the data product ID, e.g. DP1.00002.001
- site: the 4-letter code of a single site, e.g. HARV
- year: the 4-digit year to download
- savepath: the file path you want to download to; defaults to the working directory
- check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
- easting: byTileAOP() only. Vector of easting UTM coordinates whose corresponding tiles you want to download
- northing: byTileAOP() only. Vector of northing UTM coordinates whose corresponding tiles you want to download
- buffer: byTileAOP() only. Size in meters of buffer to include around coordinates when deciding which tiles to download
Here, we'll download one tile of Ecosystem structure (Canopy Height Model) (DP3.30015.001) from WREF in 2017.
byTileAOP("DP3.30015.001", site="WREF", year="2017", check.size = T,
easting=580000, northing=5075000, savepath="~/Downloads")
In the directory indicated in savepath, you should now have a folder named DP3.30015.001 with several nested subfolders, leading to a tif file of a canopy height model tile. We'll look at this in more detail below.
Navigate data downloads: IS
Let's take a look at the PAR data we downloaded earlier. We'll read in the 30-minute file using the function readTableNEON(), which uses the variables file to assign data types to each column of data:
par30 <- readTableNEON(
dataFile="~/Downloads/NEON_par/stackedFiles/PARPAR_30min.csv",
varFile="~/Downloads/NEON_par/stackedFiles/variables_00024.csv")
View(par30)
The first four columns are added by stackByTable() when it merges files across sites, months, and tower heights. The column publicationDate is the date-time stamp indicating when the data were published. This can be used as an indicator for whether data have been updated since the last time you downloaded them.
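For example, one quick way to check for recent updates is to look at the most recent publication date in the table; this is just a minimal illustration using the par30 table read in above.
# most recent publication date-time stamp in the stacked 30-minute PAR table
max(par30$publicationDate)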
The remaining columns are described by the variables file:
parvar <- read.csv("~/Downloads/NEON_par/stackedFiles/variables_00024.csv")
View(parvar)
The variables file shows you the definition and units for each column of data.
The Quick Start Guide is a pdf file containing basic information to get you started using this data product, such as the data quality information provided and common calculations many users will want to make.
Now that we know what we're looking at, let's plot PAR from the top tower level:
plot(PARMean~startDateTime,
data=par30[which(par30$verticalPosition=="080"),],
type="l")
Looks good! The sun comes up and goes down every day, and some days are cloudy. If you want to dig in a little deeper, try plotting PAR from lower tower levels on the same axes to see light attenuation through the canopy.
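A minimal sketch of that overlay is below. Note that vertical position codes vary by site, so the "010" used here for a lower measurement level is an assumption; check unique(par30$verticalPosition) for the codes present in your download.
# overlay PAR from a lower measurement level on the existing plot;
# "010" is assumed here to be a lower tower level at this site
par30.low <- par30[which(par30$verticalPosition=="010"),]
lines(par30.low$startDateTime, par30.low$PARMean, col="blue")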
Navigate data downloads: OS
Let's take a look at the aquatic plant data. OS data products are simple in that the data are generally tabular, and data volumes are lower than for the other NEON data types, but they are complex in that almost all consist of multiple tables containing information collected at different times and in different ways. For example, samples collected in the field may be shipped to a laboratory for analysis. Data associated with the field collection will appear in one data table, and the analytical results will appear in another. Complexity in working with OS data usually involves bringing data together from multiple measurements or scales of analysis.
As with the IS data, the variables file can tell you more about the data. OS data also come with a validation file, which contains information about the validation and controlled data entry that were applied to the data:
View(variables_20063)
View(validation_20063)
OS data products each come with a Data Product User Guide, which can be downloaded with the data, or accessed from the document library on the Data Portal, or the Product Details page for the data product. The User Guide is designed to give a basic introduction to the data product, including a brief summary of the protocol and descriptions of data format and structure.
To get started with the aquatic plant chemistry data, let's take a look at carbon isotope ratios in plants across the three sites we downloaded. The chemical analytes are reported in the apl_plantExternalLabDataPerSample table, and the table is in long format, with one record per sample per analyte, so we'll subset to only the carbon isotope analyte:
boxplot(analyteConcentration~siteID,
data=apl_plantExternalLabDataPerSample,
subset=analyte=="d13C",
xlab="Site", ylab="d13C")
We see plants at Suggs and Toolik are quite low in 13C, with more spread at Toolik than Suggs, and plants at Prairie Lake are relatively enriched. Clearly the next question is what species these data represent. But taxonomic data aren't present in the apl_plantExternalLabDataPerSample table; they're in the apl_biomass table. We'll need to join the two tables to get chemistry by taxon.
As mentioned above, each data product has a Quick Start Guide, and for OS products it includes a section describing how to join the tables in the data product. Since it's a pdf file, loadByProduct() doesn't bring it in, but you can view the Aquatic plant chemistry QSG on the Product Details page. The neonOS package uses the information from the QSGs to provide an automated table-joining function, joinTableNEON().
apct <- joinTableNEON(apl_biomass,
apl_plantExternalLabDataPerSample)
Using the merged data, now we can plot carbon isotope ratio for each taxon.
boxplot(analyteConcentration~scientificName,
data=apct, subset=analyte=="d13C",
xlab=NA, ylab="d13C",
las=2, cex.axis=0.7)
And now we can see most of the sampled plants have carbon isotope ratios around -30, with just two species accounting for most of the more enriched samples.
Navigate data downloads: AOP
To work with AOP data, the best bet is the raster package. It has functionality for most analyses you might want to do. We'll use it to read in the tile we downloaded:
chm <- raster("~/Downloads/DP3.30015.001/neon-aop-products/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif")
The raster package includes plotting functions:
plot(chm, col=topo.colors(6))
Now we can see canopy height across the downloaded tile; the tallest trees are over 60 meters, not surprising in the Pacific Northwest. There is a clearing or clear cut in the lower right corner.
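To go a step beyond the map, the sketch below summarizes canopy height across the tile; cellStats() and the hist() method for raster objects are both provided by the raster package.
# maximum canopy height in the tile, in meters
cellStats(chm, max)

# distribution of canopy heights across the tile
hist(chm, main="Canopy height model", xlab="Height (m)")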
Using neonUtilities in Python
Authors: Claire K. Lunch
Last Updated: Jan 4, 2023
The instructions below will guide you through using the neonUtilities R package in Python, via the rpy2 package. rpy2 creates an R environment you can interact with from Python.
The assumption in this tutorial is that you want to work with NEON data in
Python, but you want to use the handy download and merge functions provided by
the neonUtilities
R package to access and format the data for analysis. If
you want to do your analyses in R, use one of the R-based tutorials linked
below.
For more information about the neonUtilities
package, and instructions for
running it in R directly, see the Download and Explore tutorial
and/or the neonUtilities tutorial.
Install and set up
Before starting, you will need:
- Python 3 installed. It is probably possible to use this workflow in Python 2, but these instructions were developed and tested using 3.7.4.
- R installed. You don't need to have ever used it directly. We wrote this tutorial using R 4.1.1, but most other recent versions should also work.
- rpy2 installed. Run the line below from the command line; it won't run within a Python script. See Python documentation for more information on how to install packages. rpy2 often has install problems on Windows; see the "Windows Users" section below if you are running Windows.
- You may need to install pip before installing rpy2, if you don't have it installed already.
From the command line, run pip install rpy2
Windows users
The rpy2 package was built for Mac, and doesn't always work smoothly on Windows. If you have trouble with the install, try these steps.
- Add C:\Program Files\R\R-3.3.1\bin\x64 to the Windows Environment Variable “Path”
- Install rpy2 manually from https://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2
- Pick the correct version. At the download page, the portion of the file name with cp## relates to the Python version; e.g., rpy2-2.9.2-cp36-cp36m-win_amd64.whl is the correct download when 2.9.2 is the latest version of rpy2 and you are running Python 3.6 on 64-bit Windows (amd64).
- Save the .whl file, navigate to it in Windows, then run pip directly on the file: pip install rpy2-2.9.2-cp36-cp36m-win_amd64.whl
- Add an R_HOME Windows environment variable with the path C:\Program Files\R\R-3.4.3 (or whichever version you are running)
- Add an R_USER Windows environment variable with the path C:\Users\yourUserName\AppData\Local\Continuum\Anaconda3\Lib\site-packages\rpy2
Additional troubleshooting
If you're still having trouble getting R to communicate with Python, you can try pointing Python directly to your R installation path.
1. Run R.home() in R.
2. Run import os in Python.
3. Run os.environ['R_HOME'] = '/Library/Frameworks/R.framework/Resources' in Python, substituting the file path you found in step 1.
Load packages
Now open up your Python interface of choice (Jupyter notebook, Spyder, etc) and import rpy2
into your session.
import rpy2
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
Load the base R functionality, using the rpy2 function importr().
base = importr('base')
utils = importr('utils')
stats = importr('stats')
The basic syntax for running R code via rpy2 is package.function(inputs), where package is the R package in use, function is the name of the function within the R package, and inputs are the inputs to the function. In other words, it's very similar to running code in R as package::function(inputs). For example:
stats.rnorm(6, 0, 1)
FloatVector with 6 elements.
Suppress R warnings. This step can be skipped, but will result in messages getting passed through from R that Python will interpret as warnings.
from rpy2.rinterface_lib.callbacks import logger as rpy2_logger
import logging
rpy2_logger.setLevel(logging.ERROR)
Install the neonUtilities
R package. Here I've specified the RStudio
CRAN mirror as the source, but you can use a different one if you
prefer.
You only need to do this step once to use the package, but we update
the neonUtilities
package every few months, so reinstalling
periodically is recommended.
This installation step carries out the same steps in the same places on
your hard drive that it would if run in R directly, so if you use R
regularly and have already installed neonUtilities
on your machine,
you can skip this step. And be aware, this also means if you install
other packages, or new versions of packages, via rpy2
, they'll
be updated the next time you use R, too.
The semicolon at the end of the line (here, and in some other function calls below) can be omitted. It suppresses a note indicating the output of the function is null. The output is null because these functions download or modify files on your local drive, but none of the data are read into the Python or R environments.
utils.install_packages('neonUtilities', repos='https://cran.rstudio.com/');
The downloaded binary packages are in
/var/folders/_k/gbjn452j1h3fk7880d5ppkx1_9xf6m/T//Rtmpl5OpMA/downloaded_packages
Now load the neonUtilities
package. This does need to be run every time
you use the code; if you're familiar with R, importr()
is roughly
equivalent to the library()
function in R.
neonUtilities = importr('neonUtilities')
Join data files: stackByTable()
The function stackByTable()
in neonUtilities
merges the monthly,
site-level files the NEON Data Portal
provides. Start by downloading the dataset you're interested in from the
Portal. Here, we'll assume you've downloaded IR Biological Temperature.
It will download as a single zip file named NEON_temp-bio.zip
. Note the
file path it's saved to and proceed.
Run the stackByTable()
function to stack the data. It requires only one
input, the path to the zip file you downloaded from the NEON Data Portal.
Modify the file path in the code below to match the path on your machine.
For additional, optional inputs to stackByTable()
, see the R tutorial
for neonUtilities.
neonUtilities.stackByTable(filepath='/Users/Shared/NEON_temp-bio.zip');
Stacking operation across a single core.
Stacking table IRBT_1_minute
Stacking table IRBT_30_minute
Merged the most recent publication of sensor position files for each site and saved to /stackedFiles
Copied the most recent publication of variable definition file to /stackedFiles
Finished: Stacked 2 data tables and 3 metadata tables!
Stacking took 2.019079 secs
All unzipped monthly data folders have been removed.
Check the folder containing the original zip file from the Data Portal; you should now have a subfolder containing the unzipped and stacked files called stackedFiles. To import these data to Python, skip ahead to the "Read downloaded and stacked files into Python" section; to learn how to use neonUtilities to download data, proceed to the next section.
Download files to be stacked: zipsByProduct()
The function zipsByProduct() uses the NEON API to programmatically download data files for a given product. The files downloaded by zipsByProduct() can then be fed into stackByTable().
Run the downloader with these inputs: a data product ID (DPID), a set of 4-letter site IDs (or "all" for all sites), a download package (either basic or expanded), the filepath to download the data to, and an indicator to check the size of your download before proceeding or not (TRUE/FALSE).
The DPID is the data product identifier, and can be found in the data product box on the NEON Explore Data page. Here we'll download Breeding landbird point counts, DP1.10003.001.
There are two differences relative to running zipsByProduct() in R directly:
- check.size becomes check_size, because dots have programmatic meaning in Python.
- TRUE (or T) becomes 'TRUE', because the values TRUE and FALSE don't have special meaning in Python the way they do in R, so Python interprets them as variables if they're unquoted.
check_size='TRUE' does not work correctly in the Python environment. In R, it estimates the size of the download and asks you to confirm before proceeding, and the interactive question and answer don't work correctly outside R. Set check_size='FALSE' to avoid this problem, but be thoughtful about the size of your query since it will proceed to download without checking.
neonUtilities.zipsByProduct(dpID='DP1.10003.001',
site=base.c('HARV','BART'),
savepath='/Users/Shared',
package='basic',
check_size='FALSE');
Finding available files
|======================================================================| 100%
Downloading files totaling approximately 4.217543 MB
Downloading 18 files
|======================================================================| 100%
18 files successfully downloaded to /Users/Shared/filesToStack10003
The message output by zipsByProduct() indicates the file path where the files have been downloaded.
Now take that file path and pass it to stackByTable().
neonUtilities.stackByTable(filepath='/Users/Shared/filesToStack10003');
Unpacking zip files using 1 cores.
Stacking operation across a single core.
Stacking table brd_countdata
Stacking table brd_perpoint
Copied the most recent publication of validation file to /stackedFiles
Copied the most recent publication of categoricalCodes file to /stackedFiles
Copied the most recent publication of variable definition file to /stackedFiles
Finished: Stacked 2 data tables and 4 metadata tables!
Stacking took 0.4586661 secs
All unzipped monthly data folders have been removed.
Read downloaded and stacked files into Python
We've downloaded biological temperature and bird data, and merged the site by month files. Now let's read those data into Python so you can proceed with analyses.
First let's take a look at what's in the output folders.
import os
os.listdir('/Users/Shared/filesToStack10003/stackedFiles/')
['categoricalCodes_10003.csv',
'issueLog_10003.csv',
'brd_countdata.csv',
'brd_perpoint.csv',
'readme_10003.txt',
'variables_10003.csv',
'validation_10003.csv']
os.listdir('/Users/Shared/NEON_temp-bio/stackedFiles/')
['IRBT_1_minute.csv',
'sensor_positions_00005.csv',
'issueLog_00005.csv',
'IRBT_30_minute.csv',
'variables_00005.csv',
'readme_00005.txt']
Each data product folder contains a set of data files and metadata files. Here, we'll read in the data files and take a look at the contents; for more details about the contents of NEON data files and how to interpret them, see the Download and Explore tutorial.
There are a variety of modules and methods for reading tabular data into
Python; here we'll use the pandas
module, but feel free to use your own
preferred method.
First, let's read in the two data tables in the bird data: brd_countdata and brd_perpoint.
import pandas
brd_perpoint = pandas.read_csv('/Users/Shared/filesToStack10003/stackedFiles/brd_perpoint.csv')
brd_countdata = pandas.read_csv('/Users/Shared/filesToStack10003/stackedFiles/brd_countdata.csv')
And take a look at the contents of each file. For descriptions and units of each column, see the variables_10003 file.
brd_perpoint
uid | namedLocation | domainID | siteID | plotID | plotType | pointID | nlcdClass | decimalLatitude | decimalLongitude | ... | endRH | observedHabitat | observedAirTemp | kmPerHourObservedWindSpeed | laboratoryName | samplingProtocolVersion | remarks | measuredBy | publicationDate | release | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 32ab1419-b087-47e1-829d-b1a67a223a01 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | evergreenForest | 44.060146 | -71.315479 | ... | 56.0 | evergreen forest | 18.0 | 1.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vG | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
1 | f02e2458-caab-44d8-a21a-b3b210b71006 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | B1 | evergreenForest | 44.060146 | -71.315479 | ... | 56.0 | deciduous forest | 19.0 | 3.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vG | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
2 | 58ccefb8-7904-4aa6-8447-d6f6590ccdae | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | A1 | evergreenForest | 44.060146 | -71.315479 | ... | 56.0 | mixed deciduous/evergreen forest | 17.0 | 0.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vG | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
3 | 1b14ead4-03fc-4d47-bd00-2f6e31cfe971 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | A2 | evergreenForest | 44.060146 | -71.315479 | ... | 56.0 | deciduous forest | 19.0 | 0.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vG | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
4 | 3055a0a5-57ae-4e56-9415-eeb7704fab02 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | B2 | evergreenForest | 44.060146 | -71.315479 | ... | 56.0 | deciduous forest | 16.0 | 0.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vG | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1405 | 56d2f3b3-3ee5-41b9-ae22-e78a814d83e4 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | A2 | evergreenForest | 42.451400 | -72.250100 | ... | 71.0 | mixed deciduous/evergreen forest | 16.0 | 1.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vK | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
1406 | 8f61949b-d0cc-49c2-8b59-4e2938286da0 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | A3 | evergreenForest | 42.451400 | -72.250100 | ... | 71.0 | mixed deciduous/evergreen forest | 17.0 | 0.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vK | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
1407 | 36574bab-3725-44d4-b96c-3fc6dcea0765 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B3 | evergreenForest | 42.451400 | -72.250100 | ... | 71.0 | mixed deciduous/evergreen forest | 19.0 | 0.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vK | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
1408 | eb6dcb4a-cc6c-4ec1-9ee2-6932b7aefc54 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | A1 | evergreenForest | 42.451400 | -72.250100 | ... | 71.0 | deciduous forest | 19.0 | 2.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vK | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
1409 | 51ff3c20-397f-4c88-84e9-f34c2f52d6a8 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | evergreenForest | 42.451400 | -72.250100 | ... | 71.0 | evergreen forest | 19.0 | 3.0 | Bird Conservancy of the Rockies | NEON.DOC.014041vK | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
1410 rows × 31 columns
brd_countdata
uid | namedLocation | domainID | siteID | plotID | plotType | pointID | startDate | eventID | pointCountMinute | ... | vernacularName | observerDistance | detectionMethod | visualConfirmation | sexOrAge | clusterSize | clusterCode | identifiedBy | publicationDate | release | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4e22256f-5e86-4a2c-99be-dd1c7da7af28 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | 2015-06-14T09:23Z | BART_025.C1.2015-06-14 | 1 | ... | Black-capped Chickadee | 42.0 | singing | No | Male | 1.0 | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
1 | 93106c0d-06d8-4816-9892-15c99de03c91 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | 2015-06-14T09:23Z | BART_025.C1.2015-06-14 | 1 | ... | Red-eyed Vireo | 9.0 | singing | No | Male | 1.0 | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
2 | 5eb23904-9ae9-45bf-af27-a4fa1efd4e8a | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | 2015-06-14T09:23Z | BART_025.C1.2015-06-14 | 2 | ... | Black-and-white Warbler | 17.0 | singing | No | Male | 1.0 | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
3 | 99592c6c-4cf7-4de8-9502-b321e925684d | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | 2015-06-14T09:23Z | BART_025.C1.2015-06-14 | 2 | ... | Black-throated Green Warbler | 50.0 | singing | No | Male | 1.0 | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
4 | 6c07d9fb-8813-452b-8182-3bc5e139d920 | BART_025.birdGrid.brd | D01 | BART | BART_025 | distributed | C1 | 2015-06-14T09:23Z | BART_025.C1.2015-06-14 | 1 | ... | Black-throated Green Warbler | 12.0 | singing | No | Male | 1.0 | NaN | JRUEB | 20211222T013942Z | RELEASE-2022 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15378 | cffdd5e4-f664-411b-9aea-e6265071332a | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | 2022-06-12T13:31Z | HARV_021.B2.2022-06-12 | 3 | ... | Belted Kingfisher | 37.0 | calling | No | Unknown | 1.0 | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
15379 | 92b58b34-077f-420a-871d-116ac5b1c98a | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | 2022-06-12T13:31Z | HARV_021.B2.2022-06-12 | 5 | ... | Common Yellowthroat | 8.0 | calling | Yes | Male | 1.0 | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
15380 | 06ccb684-da77-4cdf-a8f7-b0d9ac106847 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | 2022-06-12T13:31Z | HARV_021.B2.2022-06-12 | 1 | ... | Ovenbird | 28.0 | singing | No | Unknown | 1.0 | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
15381 | 0254f165-0052-406e-b9ae-b76ef4109df1 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | 2022-06-12T13:31Z | HARV_021.B2.2022-06-12 | 2 | ... | Veery | 50.0 | calling | No | Unknown | 1.0 | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
15382 | 432c797d-c4ea-4bfd-901c-5c2481b845c4 | HARV_021.birdGrid.brd | D01 | HARV | HARV_021 | distributed | B2 | 2022-06-12T13:31Z | HARV_021.B2.2022-06-12 | 4 | ... | Pine Warbler | 29.0 | singing | No | Unknown | 1.0 | NaN | KKLAP | 20221129T224415Z | PROVISIONAL |
15383 rows × 24 columns
And now let's do the same with the 30-minute data table for biological temperature.
IRBT30 = pandas.read_csv('/Users/Shared/NEON_temp-bio/stackedFiles/IRBT_30_minute.csv')
IRBT30
domainID | siteID | horizontalPosition | verticalPosition | startDateTime | endDateTime | bioTempMean | bioTempMinimum | bioTempMaximum | bioTempVariance | bioTempNumPts | bioTempExpUncert | bioTempStdErMean | finalQF | publicationDate | release | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | D18 | BARR | 0 | 10 | 2021-09-01T00:00:00Z | 2021-09-01T00:30:00Z | 7.82 | 7.43 | 8.39 | 0.03 | 1800.0 | 0.60 | 0.00 | 0 | 20211219T025212Z | PROVISIONAL |
1 | D18 | BARR | 0 | 10 | 2021-09-01T00:30:00Z | 2021-09-01T01:00:00Z | 7.47 | 7.16 | 7.75 | 0.01 | 1800.0 | 0.60 | 0.00 | 0 | 20211219T025212Z | PROVISIONAL |
2 | D18 | BARR | 0 | 10 | 2021-09-01T01:00:00Z | 2021-09-01T01:30:00Z | 7.43 | 6.89 | 8.11 | 0.07 | 1800.0 | 0.60 | 0.01 | 0 | 20211219T025212Z | PROVISIONAL |
3 | D18 | BARR | 0 | 10 | 2021-09-01T01:30:00Z | 2021-09-01T02:00:00Z | 7.36 | 6.78 | 8.15 | 0.06 | 1800.0 | 0.60 | 0.01 | 0 | 20211219T025212Z | PROVISIONAL |
4 | D18 | BARR | 0 | 10 | 2021-09-01T02:00:00Z | 2021-09-01T02:30:00Z | 6.91 | 6.50 | 7.27 | 0.03 | 1800.0 | 0.60 | 0.00 | 0 | 20211219T025212Z | PROVISIONAL |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13099 | D18 | BARR | 3 | 0 | 2021-11-30T21:30:00Z | 2021-11-30T22:00:00Z | -14.62 | -14.78 | -14.46 | 0.00 | 1800.0 | 0.57 | 0.00 | 0 | 20211206T221914Z | PROVISIONAL |
13100 | D18 | BARR | 3 | 0 | 2021-11-30T22:00:00Z | 2021-11-30T22:30:00Z | -14.59 | -14.72 | -14.50 | 0.00 | 1800.0 | 0.57 | 0.00 | 0 | 20211206T221914Z | PROVISIONAL |
13101 | D18 | BARR | 3 | 0 | 2021-11-30T22:30:00Z | 2021-11-30T23:00:00Z | -14.56 | -14.65 | -14.45 | 0.00 | 1800.0 | 0.57 | 0.00 | 0 | 20211206T221914Z | PROVISIONAL |
13102 | D18 | BARR | 3 | 0 | 2021-11-30T23:00:00Z | 2021-11-30T23:30:00Z | -14.50 | -14.60 | -14.39 | 0.00 | 1800.0 | 0.57 | 0.00 | 0 | 20211206T221914Z | PROVISIONAL |
13103 | D18 | BARR | 3 | 0 | 2021-11-30T23:30:00Z | 2021-12-01T00:00:00Z | -14.45 | -14.57 | -14.32 | 0.00 | 1800.0 | 0.57 | 0.00 | 0 | 20211206T221914Z | PROVISIONAL |
13104 rows × 16 columns
Download remote sensing files: byFileAOP()
The function byFileAOP()
uses the NEON API
to programmatically download data files for remote sensing (AOP) data
products. These files cannot be stacked by stackByTable()
because they
are not tabular data. The function simply creates a folder in your working
directory and writes the files there. It preserves the folder structure
for the subproducts.
The inputs to byFileAOP()
are a data product ID, a site, a year,
a filepath to save to, and an indicator to check the size of the
download before proceeding, or not. As above, set check_size="FALSE"
when working in Python. Be especially cautious about download size
when downloading AOP data, since the files are very large.
Here, we'll download Ecosystem structure (Canopy Height Model) data from Hopbrook (HOPB) in 2017.
neonUtilities.byFileAOP(dpID='DP3.30015.001', site='HOPB',
year='2017', check_size='FALSE',
savepath='/Users/Shared');
Downloading files totaling approximately 147.930656 MB
Downloading 217 files
|======================================================================| 100%
Successfully downloaded 217 files to /Users/Shared/DP3.30015.001
Let's read one tile of data into Python and view it. We'll use the
rasterio
and matplotlib
modules here, but as with tabular data,
there are other options available.
import rasterio
CHMtile = rasterio.open('/Users/Shared/DP3.30015.001/neon-aop-products/2017/FullSite/D01/2017_HOPB_2/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D01_HOPB_DP3_718000_4709000_CHM.tif')
import matplotlib.pyplot as plt
from rasterio.plot import show
fig, ax = plt.subplots(figsize = (8,3))
show(CHMtile)
<AxesSubplot:>
fig
Using an API Token when Accessing NEON Data with neonUtilities
Authors: Claire K. Lunch
Last Updated: Nov 23, 2020
NEON data can be downloaded from either the NEON Data Portal or the NEON API. When downloading from the Data Portal, you can create a user account. Read about the benefits of an account on the User Account page. You can also use your account to create a token for using the API. Your token is unique to your account, so don't share it.
Using a token is optional! You can download data without a token, and without a user account. Using a token when downloading data via the API, including when using the neonUtilities package, links your downloads to your user account, as well as enabling faster download speeds. For more information about token usage and benefits, see the NEON API documentation page.
For now, in addition to faster downloads, using a token helps NEON to track data downloads. Using anonymized user information, we can then calculate data access statistics, such as which data products are downloaded most frequently, which data products are downloaded in groups by the same users, and how many users in total are downloading data. This information helps NEON to evaluate the growth and reach of the observatory, and to advocate for training activities, workshops, and software development.
Tokens can be used whenever you use the NEON API. In this tutorial, we'll focus on using tokens with the neonUtilities R package.
Objectives
After completing this activity, you will be able to:
- Create a NEON API token
- Use your token when downloading data with neonUtilities
Things You’ll Need To Complete This Tutorial
You will need a version of R (3.4.1 or higher) and, preferably, RStudio
loaded on your computer to complete this tutorial.
Install R Packages
- neonUtilities: install.packages("neonUtilities")
Additional Resources
If you've never downloaded NEON data using the neonUtilities package before, we recommend starting with the Download and Explore tutorial before proceeding with this tutorial.
In the next sections, we'll get an API token from the NEON Data Portal, and then use it in neonUtilities when downloading data.
Get a NEON API Token
The first step is to create a NEON user account, if you don't have one. Follow the instructions on the Data Portal User Accounts page. If you already have an account, go to the NEON Data Portal, sign in, and go to your My Account profile page.
Once you have an account, you can create an API token for yourself. At the bottom of the My Account page, you should see this bar:
[Screenshot: the API token section at the bottom of the My Account page]
Click the 'GET API TOKEN' button. After a moment, you should see this:
[Screenshot: the newly generated API token displayed on the My Account page]
Click on the Copy button to copy your API token to the clipboard:
[Screenshot: the Copy button next to the API token]
Use API token in neonUtilities
In the next section, we'll walk through saving your token somewhere secure but accessible to your code. But first let's try out using the token the easy way.
First, we need to load the neonUtilities
package and set the working
directory:
# install neonUtilities - can skip if already installed, but
# API tokens are only enabled in neonUtilities v1.3.4 and higher
# if your version number is lower, re-install
install.packages("neonUtilities")
# load neonUtilities
library(neonUtilities)
# set working directory
wd <- "~/data" # this will depend on your local machine
setwd(wd)
NEON API tokens are very long, so it would be annoying to keep pasting the entire text string into functions. Assign your token an object name:
NEON_TOKEN <- "PASTE YOUR TOKEN HERE"
Now we'll use the loadByProduct()
function to download data. Your
API token is entered as the optional token
input parameter. For
this example, we'll download Plant foliar traits (DP1.10026.001).
foliar <- loadByProduct(dpID="DP1.10026.001", site="all",
package="expanded", check.size=F,
token=NEON_TOKEN)
You should now have data saved in the foliar
object; the API
silently used your token. If you've downloaded data without a
token before, you may notice this is faster!
This format applies to all neonUtilities functions that involve downloading data or otherwise accessing the API; you can use the token input with all of them. For example, when downloading remote sensing data:
chm <- byTileAOP(dpID="DP3.30015.001", site="WREF",
year=2017, check.size=F,
easting=c(571000,578000),
northing=c(5079000,5080000),
savepath=wd,
token=NEON_TOKEN)
Token management for open code
Your API token is unique to your account, so don't share it!
If you're writing code that will be shared with colleagues or available publicly, such as in a GitHub repository or supplemental materials of a published paper, you can't include the line of code above where we assigned your token to NEON_TOKEN, since your token is fully visible in the code there. Instead, you'll need to save your token locally on your computer, and pull it into your code without displaying it. There are a few ways to do this; we'll show two options here.
- Option 1: Save the token in a local file, and source() that file at the start of every script. This is fairly simple but requires a line of code in every script.
- Option 2: Add the token to a .Renviron file to create an environment variable that gets loaded when you open R. This is a little harder to set up initially, but once it's done, it's done globally, and it will work in every script you run.
Option 1: Save token in a local file
Open a new, empty R script (.R). Put a single line of code in the script:
NEON_TOKEN <- "PASTE YOUR TOKEN HERE"
Save this file in a logical place on your machine, somewhere that won't be visible publicly. Here, let's call the file neon_token_source.R, and save it to the working directory. Then, at the start of every script where you're going to use the NEON API, you would run this line of code:
source(paste0(wd, "/neon_token_source.R"))
Then you'll be able to use token=NEON_TOKEN
when you run neonUtilities
functions, and you can share your code without accidentally sharing your
token.
Option 2: Save token to the R environment
To create a persistent environment variable, we use a .Renviron file. Before creating a file, check which directory R is using as your home directory:
# For Windows:
Sys.getenv("R_USER")
# For Mac/Linux:
Sys.getenv("HOME")
Check the home directory to see if you already have a .Renviron file, using the file browse pane in RStudio, or using another file browse method with hidden files shown. Files that begin with . are hidden by default, but RStudio recognizes files that begin with .R and displays them.
[Screenshot: the RStudio file browse pane showing a .Renviron file in the home directory]
If you already have a .Renviron
file, open it and follow the instructions
below to add to it. If you don't have one, create one using File -> New File
-> Text File in the RStudio menus.
Add one line to the text file. In this option, there are no quotes around the token value.
NEON_TOKEN=PASTE YOUR TOKEN HERE
Save the file as .Renviron, in the RStudio home directory identified above. Double-check the spelling; this will not work if you have a typo. Restart R to load the environment.
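To confirm the token loaded, a quick sanity check like the one below can be run after the restart; it simply returns FALSE if the environment variable was not found.
# after restarting R, check that the token is available to your session
nchar(Sys.getenv("NEON_TOKEN")) > 0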
Once your token is assigned to an environment variable, use the function Sys.getenv() to access it. For example, in loadByProduct():
foliar <- loadByProduct(dpID="DP1.10026.001", site="all",
package="expanded", check.size=F,
token=Sys.getenv("NEON_TOKEN"))
Access and Work with NEON Geolocation Data
Authors: Claire K. Lunch
Last Updated: Mar 21, 2023
This tutorial explores NEON geolocation data. The focus is on the locations of NEON observational sampling and sensor data; NEON remote sensing data are inherently spatial and have dedicated tutorials. If you are interested in connecting remote sensing with ground-based measurements, the methods in the vegetation structure and canopy height model tutorial can be generalized to other data products.
In planning your analyses, consider what level of spatial resolution is required. There is no reason to carefully map each measurement if precise spatial locations aren't required to address your hypothesis! For example, if you want to use the Vegetation structure data product to calculate a site-scale estimate of biomass and production, the spatial coordinates of each tree are probably not needed. If you want to explore relationships between vegetation and beetle communities, you will need to identify the sampling plots where NEON measures both beetles and vegetation, but finer-scale coordinates may not be needed. Finally, if you want to relate vegetation measurements to airborne remote sensing data, you will need very accurate coordinates for each measurement on the ground.
Learning Objectives
After completing this tutorial you will be able to:
- access NEON spatial data through data downloaded with the neonUtilities package.
- access and plot specific sampling locations for TOS data products.
- access and use sensor location data.
Things You’ll Need To Complete This Tutorial
R Programming Language
You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.
Setup R Environment
We'll need several R packages in this tutorial. Install the packages, if not already installed, and load the libraries for each.
# run once to get the package, and re-run if you need to get updates
install.packages("ggplot2") # plotting
install.packages("neonUtilities") # work with NEON data
install.packages("neonOS") # work with NEON observational data
install.packages("devtools") # to use the install_github() function
devtools::install_github("NEONScience/NEON-geolocation/geoNEON") # work with NEON spatial data
# run every time you start a script
library(ggplot2)
library(neonUtilities)
library(neonOS)
library(geoNEON)
options(stringsAsFactors=F)
Locations for observational data
Plot level locations
Both aquatic and terrestrial observational data downloads include spatial data in the downloaded files. The spatial data in the aquatic data files are the most precise locations available for the sampling events. The spatial data in the terrestrial data downloads represent the locations of the sampling plots. In some cases, the plot is the most precise location available, but for many terrestrial data products, more precise locations can be calculated for specific sampling events.
Here, we'll download the Vegetation structure (DP1.10098.001) data product, examine the plot location data in the download, then calculate the locations of individual trees. These steps can be extrapolated to other terrestrial observational data products; the specific sampling layout varies from data product to data product, but the methods for working with the data are similar.
First, let's download the vegetation structure data from one site, Wind River Experimental Forest (WREF).
If downloading data using the neonUtilities
package is new to you, check out
the Download and Explore tutorial.
# load veg structure data
vst <- loadByProduct(dpID="DP1.10098.001",
site="WREF",
check.size=F)
Data downloaded this way are stored in R as a large list. For this tutorial, we'll work with the individual dataframes within this large list. Alternatively, each dataframe can be assigned as its own object.
To find the spatial data for any given data product, view the variables files to figure out which data table the spatial data are contained in.
View(vst$variables_10098)
Looking through the variables, we can see that the spatial data (decimalLatitude and decimalLongitude, etc.) are in the vst_perplotperyear table. Let's take a look at the table.
View(vst$vst_perplotperyear)
As noted above, the spatial data here are at the plot level; the latitude and longitude represent the centroid of the sampling plot. We can map these plots on the landscape using the easting and northing variables; these are the UTM coordinates. At this site, tower plots are 40 m x 40 m, and distributed plots are 20 m x 20 m; we can use the symbols() function to draw boxes of the correct size.
We'll also use the treesPresent variable to subset to only those plots where trees were found and measured.
# start by subsetting data to plots with trees
vst.trees <- vst$vst_perplotperyear[which(
vst$vst_perplotperyear$treesPresent=="Y"),]
# make variable for plot sizes
plot.size <- numeric(nrow(vst.trees))
# populate plot sizes in new variable
plot.size[which(vst.trees$plotType=="tower")] <- 40
plot.size[which(vst.trees$plotType=="distributed")] <- 20
# create map of plots
symbols(vst.trees$easting,
vst.trees$northing,
squares=plot.size, inches=F,
xlab="Easting", ylab="Northing")
We can see where the plots are located across the landscape, and we can see the denser cluster of plots in the area near the micrometeorology tower.
For many analyses, this level of spatial data may be sufficient. Calculating the precise location of each tree is only required for certain hypotheses; consider whether you need these data when working with a data product with plot-level spatial data.
Looking back at the variables_10098 table, notice that there is a table in this data product called vst_mappingandtagging, suggesting we can find mapping data there. Let's take a look.
View(vst$vst_mappingandtagging)
Here we see data fields for stemDistance and stemAzimuth. Looking back at the variables_10098 file, we see these fields contain the distance and azimuth from a pointID to a specific stem. To calculate the precise coordinates of each tree, we would need to get the locations of the pointIDs, and then adjust the coordinates based on distance and azimuth. The Data Product User Guide describes how to carry out these steps, and can be downloaded from the Data Product Details page.
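The underlying geometry is simple trigonometry. The minimal sketch below shows the calculation for a single stem; the function name and the reference-point coordinates are made up for illustration, and in practice you would first have to retrieve the reference point's UTM easting and northing from the NEON location data.
# offset a reference point by a distance (m) and azimuth (degrees clockwise
# from north) to estimate a stem's UTM coordinates; hypothetical helper,
# for illustration only
stemCoords <- function(pointEasting, pointNorthing, stemDistance, stemAzimuth) {
  rad <- stemAzimuth * pi / 180
  data.frame(adjEasting = pointEasting + stemDistance * sin(rad),
             adjNorthing = pointNorthing + stemDistance * cos(rad))
}

# example: a stem 5 m from the reference point at an azimuth of 90 degrees (due east)
stemCoords(580000, 5075000, 5, 90)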
However, carrying out these calculations yourself is not the only option! The geoNEON package contains a function that can do this for you, for the TOS data products with location data more precise than the plot level.
Sampling locations
The getLocTOS()
function in the geoNEON
package uses the NEON API to
access NEON location data and then makes protocol-specific calculations
to return precise locations for each sampling effort. This function works for a
subset of NEON TOS data products. The list of tables and data products that can
be entered is in the
package documentation on GitHub.
For more information about the NEON API, see the API tutorial and the API web page. For more information about the location calculations used in each data product, see the Data Product User Guide for each product.
The getLocTOS()
function requires two inputs:
- A data table that contains spatial data from a NEON TOS data product
- The NEON table name of that data table
For vegetation structure locations, the function call looks like this. This function may take a while to download all the location data. For faster downloads, use an API token.
# calculate individual tree locations
vst.loc <- getLocTOS(data=vst$vst_mappingandtagging,
dataProd="vst_mappingandtagging")
What additional data are now available in the data obtained by getLocTOS()?
# print variable names that are new
names(vst.loc)[which(!names(vst.loc) %in%
names(vst$vst_mappingandtagging))]
## [1] "utmZone" "adjNorthing" "adjEasting"
## [4] "adjCoordinateUncertainty" "adjDecimalLatitude" "adjDecimalLongitude"
## [7] "adjElevation" "adjElevationUncertainty"
Now we have adjusted latitude, longitude, and elevation, and the corresponding easting and northing UTM data. We also have coordinate uncertainty data for these coordinates.
As we did with the plots above, we can use the easting and northing data to plot the locations of the individual trees.
plot(vst.loc$adjEasting, vst.loc$adjNorthing,
pch=".", xlab="Easting", ylab="Northing")
We can see the mapped trees in the same plots we mapped above.
We've plotted each individual tree as a single point (pch="."), so all we can see at this scale is the cluster of dots that makes up each plot. Let's zoom in on a single plot:
plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")],
vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")],
pch=20, xlab="Easting", ylab="Northing")
Now we can see the location of each tree within the sampling plot WREF_085. This is interesting, but it would be more interesting if we could see more information about each tree. How are species distributed across the plot, for instance?
We can plot the tree species at each location using the text() function and the vst.loc$taxonID field.
plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")],
vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")],
type="n", xlab="Easting", ylab="Northing")
text(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")],
vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")],
labels=vst.loc$taxonID[which(vst.loc$plotID=="WREF_085")],
cex=0.5)
Almost all of the mapped trees in this plot are either Pseudotsuga menziesii or Tsuga heterophylla (Douglas fir and Western hemlock), not too surprising at Wind River.
But suppose we want to map the diameter of each tree? This is a very common way to present a stem map, it gives a visual as if we were looking down on the plot from overhead and had cut off each tree at its measurement height.
Other than taxon, the attributes of the trees, such as diameter, height, growth form, and canopy position, are found in the vst_apparentindividual table, not in the vst_mappingandtagging table. We'll need to join the two tables to get the tree attributes together with their mapped locations. The neonOS package contains the function joinTableNEON(), which can be used to do this. See the tutorial for the neonOS package for more details about this function.
veg <- joinTableNEON(vst.loc,
vst$vst_apparentindividual,
name1="vst_mappingandtagging",
name2="vst_apparentindividual")
Now we can use the symbols()
function to plot the diameter of each tree,
at its spatial coordinates, to create a correctly scaled map of boles in
the plot. Note that stemDiameter
is in centimeters, while easting and
northing UTMs are in meters, so we divide by 100 to scale correctly.
symbols(veg$adjEasting[which(veg$plotID=="WREF_085")],
veg$adjNorthing[which(veg$plotID=="WREF_085")],
circles=veg$stemDiameter[which(veg$plotID=="WREF_085")]/100/2,
inches=F, xlab="Easting", ylab="Northing")
If you are interested in taking the vegetation structure data a step further, and connecting measurements of trees on the ground to remotely sensed Lidar data, check out the Vegetation Structure and Canopy Height Model tutorial.
If you are interested in working with other terrestrial observational (TOS) data products, the basic techniques used here to find precise sampling locations and join data tables can be adapted to other TOS data products. Consult the Data Product User Guide for each data product to find details specific to that data product.
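As a sketch of what that adaptation might look like, the workflow below swaps in the litterfall data product (DP1.10033.001) and its ltr_pertrap table purely as an illustration; check the package documentation linked above to confirm which tables getLocTOS() accepts before running it.
# hypothetical example: spatial data for litter traps (DP1.10033.001)
#ltr <- loadByProduct(dpID="DP1.10033.001", site="WREF", check.size=F)
#ltr.loc <- getLocTOS(data=ltr$ltr_pertrap, dataProd="ltr_pertrap")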
Locations for sensor data
Downloads of instrument system (IS) data include a file called sensor_positions.csv. The sensor positions file contains information about the coordinates of each sensor, relative to a reference location.
While the specifics vary, the techniques for working with sensor data and the sensor_positions.csv file are generalizable. For this tutorial, let's look at the sensor locations for soil temperature (DP1.00041.001) at the NEON Treehaven site (TREE) in July 2018. To reduce our file size, we'll use the 30 minute averaging interval. Our final product from this section is a depth profile of soil temperature in one soil plot.
If downloading data using the neonUtilities package is new to you, check out the neonUtilities tutorial. This function will download about 7 MB of data as written, so we have set check.size=F for ease of running the code.
# load soil temperature data of interest
soilT <- loadByProduct(dpID="DP1.00041.001", site="TREE",
startdate="2018-07", enddate="2018-07",
timeIndex=30, check.size=F)
## Attempting to stack soil sensor data. Note that due to the number of soil sensors at each site, data volume is very high for these data. Consider dividing data processing into chunks, using the nCores= parameter to parallelize stacking, and/or using a high-performance system.
Sensor positions file
Now we can specifically look at the sensor positions file.
# create object for sensor positions file
pos <- soilT$sensor_positions_00041
# view column names
names(pos)
## [1] "siteID" "HOR.VER"
## [3] "sensorLocationID" "sensorLocationDescription"
## [5] "positionStartDateTime" "positionEndDateTime"
## [7] "referenceLocationID" "referenceLocationIDDescription"
## [9] "referenceLocationIDStartDateTime" "referenceLocationIDEndDateTime"
## [11] "xOffset" "yOffset"
## [13] "zOffset" "pitch"
## [15] "roll" "azimuth"
## [17] "locationReferenceLatitude" "locationReferenceLongitude"
## [19] "locationReferenceElevation" "eastOffset"
## [21] "northOffset" "xAzimuth"
## [23] "yAzimuth" "publicationDate"
# view table
View(pos)
The sensor locations are indexed by the HOR.VER
variable - see the
file naming conventions
page for more details.
Using unique()
we can view all the location indices in this file.
unique(pos$HOR.VER)
## [1] "001.501" "001.502" "001.503" "001.504" "001.505" "001.506" "001.507" "001.508" "001.509" "002.501"
## [11] "002.502" "002.503" "002.504" "002.505" "002.506" "002.507" "002.508" "002.509" "003.501" "003.502"
## [21] "003.503" "003.504" "003.505" "003.506" "003.507" "003.508" "003.509" "004.501" "004.502" "004.503"
## [31] "004.504" "004.505" "004.506" "004.507" "004.508" "004.509" "005.501" "005.502" "005.503" "005.504"
## [41] "005.505" "005.506" "005.507" "005.508" "005.509"
Soil temperature data are collected in 5 instrumented soil plots inside the tower footprint. We see this reflected in the data where HOR = 001 to 005. Within each plot, temperature is measured at 9 depths, seen in VER = 501 to 509. At some sites, the number of depths may differ slightly.
The x, y, and z offsets in the sensor positions file are the relative distance, in meters, to the reference latitude, longitude, and elevation in the file.
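For example, since the offsets are relative to the reference location, you can derive each sensor's absolute elevation by adding the vertical offset to the reference elevation (a minimal sketch; the sensorElevation column name is just illustrative):
# absolute elevation of each sensor, in meters:
# reference elevation plus the vertical offset (negative for below-ground sensors)
pos$sensorElevation <- pos$locationReferenceElevation + pos$zOffset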
The HOR and VER indices in the sensor positions file correspond to the horizontalPosition and verticalPosition fields in soilT$ST_30_minute.
Note that there are two sets of position data for soil plot 001, and that one set has a positionEndDateTime date in the file. This indicates the sensors were moved or replaced; in this case there was a frost heave incident.
You can read about it in the issue log, which is displayed on the
Data Product Details page,
and also included as a table in the data download:
soilT$issueLog_00041[grep("TREE soil plot 1",
soilT$issueLog_00041$locationAffected),]
## id parentIssueID issueDate resolvedDate dateRangeStart dateRangeEnd
## 1: 9328 NA 2019-05-23T00:00:00Z 2019-05-23T00:00:00Z 2018-11-07T00:00:00Z 2019-04-19T00:00:00Z
## locationAffected
## 1: D05 TREE soil plot 1 measurement levels 1-9 (HOR.VER: 001.501, 001.502, 001.503, 001.504, 001.505, 001.506, 001.507, 001.508, 001.509)
## issue
## 1: Soil temperature sensors were pushed or pulled out of the ground by 3 cm over winter, presumably due to freeze-thaw action. The exact timing of this is unknown, but it occurred sometime between 2018-11-07 and 2019-04-19.
## resolution
## 1: Sensor depths were updated in the database with a start date of 2018-11-07 for the new depths.
Since we're working with data from July 2018, and the change in
sensor locations is dated Nov 2018, we'll use the original locations.
There are a number of ways to drop the later locations from the
table; here, we find the rows in which the positionEndDateTime
field is empty,
indicating no end date, and the rows corresponding to soil plot 001,
and drop all the rows that meet both criteria.
pos <- pos[-intersect(grep("001.", pos$HOR.VER),
which(pos$positionEndDateTime=="")),]
Our goal is to plot a time series of temperature, stratified by depth, so let's start by joining the data file and sensor positions file, to bring the depth measurements into the same data frame with the data.
# paste horizontalPosition and verticalPosition together
# to match HOR.VER in the sensor positions file
soilT$ST_30_minute$HOR.VER <- paste(soilT$ST_30_minute$horizontalPosition,
soilT$ST_30_minute$verticalPosition,
sep=".")
# left join to keep all temperature records
soilTHV <- merge(soilT$ST_30_minute, pos,
by="HOR.VER", all.x=T)
And now we can plot soil temperature over time for each depth.
We'll use ggplot
since it's well suited to this kind of
stratification. Each soil plot is its own panel, and each depth
is its own line:
gg <- ggplot(soilTHV,
aes(endDateTime, soilTempMean,
group=zOffset, color=zOffset)) +
geom_line() +
facet_wrap(~horizontalPosition)
gg
## Warning: Removed 1488 rows containing missing values (`geom_line()`).
We can see that as soil depth increases, temperatures
become much more stable, while the shallowest measurement
has a clear diurnal cycle. We can also see that
something has gone wrong with one of the sensors in plot
002. To remove those data, use only values where the final
quality flag passed, i.e. finalQF
= 0
gg <- ggplot(subset(soilTHV, finalQF==0),
aes(endDateTime, soilTempMean,
group=zOffset, color=zOffset)) +
geom_line() +
facet_wrap(~horizontalPosition)
gg
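To summarize these same data as an explicit depth profile, you can average the unflagged temperatures by depth within a single soil plot; a minimal base-R sketch (the prof object name is just illustrative):
# mean July temperature at each depth in soil plot 001, unflagged data only
prof <- aggregate(soilTempMean ~ zOffset,
                  data=subset(soilTHV, horizontalPosition=="001" & finalQF==0),
                  FUN=mean)
# plot temperature against depth offset (more negative = deeper)
plot(prof$soilTempMean, prof$zOffset,
     xlab="Mean soil temperature (C)", ylab="zOffset (m)")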
Get Lesson Code
Work With NEON's Plant Phenology Data
Authors: Megan A. Jones, Natalie Robinson, Lee Stanish
Last Updated: May 13, 2021
Many organisms, including plants, show patterns of change across seasons - the different stages of this observable change are called phenophases. In this tutorial we explore how to work with NEON plant phenophase data.
Objectives
After completing this activity, you will be able to:
- work with NEON Plant Phenology Observation data.
- use dplyr functions to filter data.
- plot time series data in a bar plot using the ggplot() function.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
neonUtilities:
install.packages("neonUtilities")
-
ggplot2:
install.packages("ggplot2")
-
dplyr:
install.packages("dplyr")
More on Packages in R – Adapted from Software Carpentry.
Download Data
This tutorial is designed to have you download data directly from the NEON
portal API using the neonUtilities package. However, you can also directly
download this data, prepackaged, from FigShare. This data set includes all the
files needed for the Work with NEON OS & IS Data - Plant Phenology & Temperature
tutorial series. The data are in the format you would receive if downloading them
using the zipsByProduct()
function in the neonUtilities package.
Additional Resources
- NEON data portal
- NEON Plant Phenology Observations data product user guide
- RStudio's data wrangling (dplyr/tidyr) cheatsheet
- NEONScience GitHub Organization
- nneo API wrapper on CRAN
Plants change throughout the year - these are phenophases. Why do they change?
Explore Phenology Data
The following sections provide a brief overview of the NEON plant phenology observation data. When designing a research project using this data, you need to consult the documents associated with this data product and not rely solely on this summary.
The following description of the NEON Plant Phenology Observation data is modified from the data product user guide.
NEON Plant Phenology Observation Data
NEON collects plant phenology data and provides it as NEON data product DP1.10055.001.
The plant phenology observations data product provides in-situ observations of the phenological status and intensity of tagged plants (or patches) during discrete observation events.
Sampling occurs at all terrestrial field sites at site- and season-specific intervals. During Phase I (dominant species) sampling (pre-2021), three species with 30 individuals each are sampled. In 2021, Phase II (community) sampling will begin, in which up to 20 species, with 5 or more individuals each, will be sampled.
Status-based Monitoring
NEON employs status-based monitoring, in which the phenological condition of an individual is reported any time that individual is observed. At every observation bout, records are generated for every phenophase that is occurring and for every phenophase not occurring. With this approach, events that may occur multiple times during a single year (such as leaf emergence in Mediterranean climates, or flowering in many desert species) can be captured. Continuous reporting of phenophase status enables quantification of the duration of phenophases, rather than just their date of onset, while also enabling explicit quantification of the uncertainty in phenophase transition dates that is introduced by monitoring in discrete temporal bouts.
Specific products derived from this sampling include the observed phenophase status (whether or not a phenophase is occurring) and the intensity of phenophases for individuals in which phenophase status = ‘yes’. Phenophases reported are derived from the USA National Phenology Network (USA-NPN) categories. The number of phenophases observed varies by growth form and ranges from 1 phenophase (cactus) to 7 phenophases (semi-evergreen broadleaf). In this tutorial we will focus only on the state of the phenophase, not the phenophase intensity data.
Phenology Transects
Plant phenology observations occur at all terrestrial NEON sites along an 800 meter square loop transect (primary) and within a 200 m x 200 m plot located within view of a canopy-level, tower-mounted phenology camera.

Timing of Observations
At each site, there are:
- ~50 observation bouts per year.
- no more than 100 sampling points per phenology transect.
- no more than 9 sampling points per phenocam plot.
- 1 annual bout to collect size and disease status measurements from each sampling point.
Available Data Tables
In the downloaded data package, data are available in three main files:
- phe_statusintensity: Plant phenophase status and intensity data
- phe_perindividual: Geolocation and taxonomic identification for phenology plants
- phe_perindividualperyear: recorded once a year, essentially the "metadata" about the plant: DBH, height, etc.
There are other files in each download including a readme with information on the data product and the download; a variables file that defines the term descriptions, data types, and units; a validation file with data entry validation and parsing rules; and an XML with machine readable metadata.
Stack NEON Data
NEON data are delivered in a site and year-month format. When you download data, you will get a single zipped file containing a directory for each month and site that you've requested data for. Dealing with these separate tables from even one or two sites over a 12 month period can be a bit overwhelming. Luckily NEON provides an R package, neonUtilities, that takes the downloaded file and joins the data files into a single table per data table type. The teaching data downloaded with this tutorial are already stacked. If you are working with other NEON data, please go through the tutorial to stack the data in R or in Python and then return to this tutorial.
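If you downloaded a zip file from the Data Portal yourself, the stacking step is a single function call; a minimal sketch is below (the file path is a placeholder for wherever your download landed):
# stack a portal download into one file per data table
# (the path below is a placeholder; point it at your own download)
#stackByTable("~/Downloads/NEON_phenology-download.zip")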
Work with NEON Data
When we do this for the phenology data, we get three files, one for each data table, with all the data from the sites and date range of interest.
First, we need to set up our R environment.
# install needed package (only uncomment & run if not already installed)
#install.packages("neonUtilities")
#install.packages("dplyr")
#install.packages("ggplot2")
# load needed packages
library(neonUtilities)
library(dplyr)
library(ggplot2)
options(stringsAsFactors=F) #keep strings as character type not factors
# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Git/data/" # Change this to match your local environment
setwd(wd)
Let's start by loading our data of interest. For this series, we'll work with data from the NEON Domain 02 sites:
- Blandy Farm (BLAN)
- Smithsonian Conservation Biology Institute (SCBI)
- Smithsonian Environmental Research Center (SERC)
And we'll use data from January 2017 to December 2019. This downloads over 9MB of data. If this is too large, use a smaller date range. If you opt to do this, your figures and some output may look different later in the tutorial.
With this information, we can download our data using the neonUtilities package.
If you are not using a NEON token to download your data, remove the token = Sys.getenv("NEON_TOKEN") line of code (learn more about NEON API tokens in the Using an API Token when Accessing NEON Data with neonUtilities tutorial).
If you are using the data downloaded at the start of the tutorial, use the commented out code in the second half of this code chunk.
## Two options for accessing data - programmatic or from the example dataset
# Read data from data portal
phe <- loadByProduct(dpID = "DP1.10055.001", site=c("BLAN","SCBI","SERC"),
startdate = "2017-01", enddate="2019-12",
token = Sys.getenv("NEON_TOKEN"),
check.size = F)
## API token was not recognized. Public rate limit applied.
## Finding available files
## Downloading files totaling approximately 7.985319 MB
## Downloading 95 files
## Unpacking zip files using 1 cores.
## Stacking operation across a single core.
## Stacking table phe_perindividual
## Stacking table phe_statusintensity
## Stacking table phe_perindividualperyear
## Copied the most recent publication of validation file to /stackedFiles
## Copied the most recent publication of categoricalCodes file to /stackedFiles
## Copied the most recent publication of variable definition file to /stackedFiles
## Finished: Stacked 3 data tables and 3 metadata tables!
## Stacking took 1.46806 secs
# if you aren't sure you can handle the data file size use check.size = T.
# save dataframes from the downloaded list
ind <- phe$phe_perindividual #individual information
status <- phe$phe_statusintensity #status & intensity info
##If choosing to use example dataset downloaded from this tutorial:
# Stack multiple files within the downloaded phenology data
#stackByTable("NEON-pheno-temp-timeseries_v2/filesToStack10055", folder = T)
# read in data - readTableNEON uses the variables file to assign the correct
# data type for each variable
#ind <- readTableNEON('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/phe_perindividual.csv', 'NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')
#status <- readTableNEON('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/phe_statusintensity.csv', 'NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')
Let's explore the data and get to know what the ind dataframe looks like.
# What are the fieldnames in this dataset?
names(ind)
## [1] "uid" "namedLocation"
## [3] "domainID" "siteID"
## [5] "plotID" "decimalLatitude"
## [7] "decimalLongitude" "geodeticDatum"
## [9] "coordinateUncertainty" "elevation"
## [11] "elevationUncertainty" "subtypeSpecification"
## [13] "transectMeter" "directionFromTransect"
## [15] "ninetyDegreeDistance" "sampleLatitude"
## [17] "sampleLongitude" "sampleGeodeticDatum"
## [19] "sampleCoordinateUncertainty" "sampleElevation"
## [21] "sampleElevationUncertainty" "date"
## [23] "editedDate" "individualID"
## [25] "taxonID" "scientificName"
## [27] "identificationQualifier" "taxonRank"
## [29] "nativeStatusCode" "growthForm"
## [31] "vstTag" "samplingProtocolVersion"
## [33] "measuredBy" "identifiedBy"
## [35] "recordedBy" "remarks"
## [37] "dataQF" "publicationDate"
## [39] "release"
# Unsure of what some of the variables are? Look at the variables table!
View(phe$variables_10055)
# if using the pre-downloaded data, you need to read in the variables file
# or open and look at it on your desktop
#var <- read.csv('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')
#View(var)
# how many rows are in the data?
nrow(ind)
## [1] 433
# look at the first six rows of data.
#head(ind) #this is a good function to use but looks messy so not rendering it
# look at the structure of the dataframe.
str(ind)
## 'data.frame': 433 obs. of 39 variables:
## $ uid : chr "76bf37d9-c834-43fc-a430-83d87e4b9289" "cf0239bb-2953-44a8-8fd2-051539be5727" "833e5f41-d5cb-4550-ba60-e6f000a2b1b6" "6c2e348d-d19e-4543-9d22-0527819ee964" ...
## $ namedLocation : chr "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" ...
## $ domainID : chr "D02" "D02" "D02" "D02" ...
## $ siteID : chr "BLAN" "BLAN" "BLAN" "BLAN" ...
## $ plotID : chr "BLAN_061" "BLAN_061" "BLAN_061" "BLAN_061" ...
## $ decimalLatitude : num 39.1 39.1 39.1 39.1 39.1 ...
## $ decimalLongitude : num -78.1 -78.1 -78.1 -78.1 -78.1 ...
## $ geodeticDatum : chr NA NA NA NA ...
## $ coordinateUncertainty : num NA NA NA NA NA NA NA NA NA NA ...
## $ elevation : num 183 183 183 183 183 183 183 183 183 183 ...
## $ elevationUncertainty : num NA NA NA NA NA NA NA NA NA NA ...
## $ subtypeSpecification : chr "primary" "primary" "primary" "primary" ...
## $ transectMeter : num 491 464 537 15 753 506 527 305 627 501 ...
## $ directionFromTransect : chr "Left" "Right" "Left" "Left" ...
## $ ninetyDegreeDistance : num 0.5 4 2 3 2 1 2 3 2 3 ...
## $ sampleLatitude : num NA NA NA NA NA NA NA NA NA NA ...
## $ sampleLongitude : num NA NA NA NA NA NA NA NA NA NA ...
## $ sampleGeodeticDatum : chr "WGS84" "WGS84" "WGS84" "WGS84" ...
## $ sampleCoordinateUncertainty: num NA NA NA NA NA NA NA NA NA NA ...
## $ sampleElevation : num NA NA NA NA NA NA NA NA NA NA ...
## $ sampleElevationUncertainty : num NA NA NA NA NA NA NA NA NA NA ...
## $ date : POSIXct, format: "2016-04-20" ...
## $ editedDate : POSIXct, format: "2016-05-09" ...
## $ individualID : chr "NEON.PLA.D02.BLAN.06290" "NEON.PLA.D02.BLAN.06501" "NEON.PLA.D02.BLAN.06204" "NEON.PLA.D02.BLAN.06223" ...
## $ taxonID : chr "RHDA" "SOAL6" "RHDA" "LOMA6" ...
## $ scientificName : chr "Rhamnus davurica Pall." "Solidago altissima L." "Rhamnus davurica Pall." "Lonicera maackii (Rupr.) Herder" ...
## $ identificationQualifier : chr NA NA NA NA ...
## $ taxonRank : chr "species" "species" "species" "species" ...
## $ nativeStatusCode : chr "I" "N" "I" "I" ...
## $ growthForm : chr "Deciduous broadleaf" "Forb" "Deciduous broadleaf" "Deciduous broadleaf" ...
## $ vstTag : chr NA NA NA NA ...
## $ samplingProtocolVersion : chr NA "NEON.DOC.014040vJ" "NEON.DOC.014040vJ" "NEON.DOC.014040vJ" ...
## $ measuredBy : chr "jcoloso@neoninc.org" "jward@battelleecology.org" "alandes@field-ops.org" "alandes@field-ops.org" ...
## $ identifiedBy : chr "shackley@neoninc.org" "llemmon@field-ops.org" "llemmon@field-ops.org" "llemmon@field-ops.org" ...
## $ recordedBy : chr "shackley@neoninc.org" NA NA NA ...
## $ remarks : chr "Nearly dead shaded out" "no entry" "no entry" "no entry" ...
## $ dataQF : chr NA NA NA NA ...
## $ publicationDate : chr "20201218T103411Z" "20201218T103411Z" "20201218T103411Z" "20201218T103411Z" ...
## $ release : chr "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" ...
Notice that the neonUtilities package reads the data types from the variables file and automatically converts each field to the correct type in R, including date fields.
(Note that if you first opened your data file in Excel, you might see 06/14/2014 as the format instead of 2014-06-14. Excel can do some ~~weird~~ interesting things to dates.)
Phenology status
Now let's look at the status data.
# What variables are included in this dataset?
names(status)
## [1] "uid" "namedLocation"
## [3] "domainID" "siteID"
## [5] "plotID" "date"
## [7] "editedDate" "dayOfYear"
## [9] "individualID" "phenophaseName"
## [11] "phenophaseStatus" "phenophaseIntensityDefinition"
## [13] "phenophaseIntensity" "samplingProtocolVersion"
## [15] "measuredBy" "recordedBy"
## [17] "remarks" "dataQF"
## [19] "publicationDate" "release"
nrow(status)
## [1] 219357
#head(status) #this is a good function to use but looks messy so not rendering it
str(status)
## 'data.frame': 219357 obs. of 20 variables:
## $ uid : chr "b69ada55-41d1-41c7-9031-149c54de51f9" "9be6f7ad-4422-40ac-ba7f-e32e0184782d" "58e7aeaf-163c-4ea2-ad75-db79a580f2f8" "efe7ca02-d09e-4964-b35d-aebdac8f3efb" ...
## $ namedLocation : chr "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" ...
## $ domainID : chr "D02" "D02" "D02" "D02" ...
## $ siteID : chr "BLAN" "BLAN" "BLAN" "BLAN" ...
## $ plotID : chr "BLAN_061" "BLAN_061" "BLAN_061" "BLAN_061" ...
## $ date : POSIXct, format: "2017-02-24" ...
## $ editedDate : POSIXct, format: "2017-03-31" ...
## $ dayOfYear : num 55 55 55 55 55 55 55 55 55 55 ...
## $ individualID : chr "NEON.PLA.D02.BLAN.06229" "NEON.PLA.D02.BLAN.06226" "NEON.PLA.D02.BLAN.06222" "NEON.PLA.D02.BLAN.06223" ...
## $ phenophaseName : chr "Leaves" "Leaves" "Leaves" "Leaves" ...
## $ phenophaseStatus : chr "no" "no" "no" "no" ...
## $ phenophaseIntensityDefinition: chr NA NA NA NA ...
## $ phenophaseIntensity : chr NA NA NA NA ...
## $ samplingProtocolVersion : chr NA NA NA NA ...
## $ measuredBy : chr "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" ...
## $ recordedBy : chr "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" ...
## $ remarks : chr NA NA NA NA ...
## $ dataQF : chr "legacyData" "legacyData" "legacyData" "legacyData" ...
## $ publicationDate : chr "20201217T203824Z" "20201217T203824Z" "20201217T203824Z" "20201217T203824Z" ...
## $ release : chr "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" ...
# date range
min(status$date)
## [1] "2017-02-24 GMT"
max(status$date)
## [1] "2019-12-12 GMT"
Clean up the Data
- remove duplicates (full rows)
- convert to date format
- retain only the most recent
editedDate
in the perIndividual and status table.
Remove Duplicates
The individual (ind) table is included in each site by year-month file. As a result, when all the tables are stacked, there are many duplicates.
Let's remove any duplicates that exist.
# drop UID as that will be unique for duplicate records
ind_noUID <- select(ind, -(uid))
status_noUID <- select(status, -(uid))
# remove duplicates
## expect many
ind_noD <- distinct(ind_noUID)
nrow(ind_noD)
## [1] 433
status_noD<-distinct(status_noUID)
nrow(status_noD)
## [1] 216837
Variable Overlap between Tables
From the initial inspection of the data we can see there is overlap in variable names between the two tables.
Let's see what they are.
# where is there an intersection of names
intersect(names(status_noD), names(ind_noD))
## [1] "namedLocation" "domainID"
## [3] "siteID" "plotID"
## [5] "date" "editedDate"
## [7] "individualID" "samplingProtocolVersion"
## [9] "measuredBy" "recordedBy"
## [11] "remarks" "dataQF"
## [13] "publicationDate" "release"
There are several fields that overlap between the datasets. Some of these are expected to be the same and will be what we join on.
However, some of these will have different values in each table. We want to keep those values distinct and not join on them. Therefore, we can rename these fields before joining:
- date
- editedDate
- measuredBy
- recordedBy
- samplingProtocolVersion
- remarks
- dataQF
- publicationDate
Now we want to rename the variables that would have duplicate names. We can rename all the variables in the status object to have "Stat" at the end of the variable name.
# in Status table rename like columns
status_noD <- rename(status_noD, dateStat=date,
editedDateStat=editedDate, measuredByStat=measuredBy,
recordedByStat=recordedBy,
samplingProtocolVersionStat=samplingProtocolVersion,
remarksStat=remarks, dataQFStat=dataQF,
publicationDateStat=publicationDate)
Filter to last editedDate
The individual (ind) table contains all instances that any of the location or
taxonomy data of an individual was updated. Therefore there are many rows for
some individuals. We only want the latest editedDate
on ind.
# retain only the max of the date for each individualID
ind_last <- ind_noD %>%
group_by(individualID) %>%
filter(editedDate==max(editedDate))
# oh wait, duplicate dates, retain only the most recent editedDate
ind_lastnoD <- ind_last %>%
group_by(editedDate, individualID) %>%
filter(row_number()==1)
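As a quick check that the filtering worked, each individualID should now appear exactly once:
# confirm there is one record per individual
nrow(ind_lastnoD) == n_distinct(ind_lastnoD$individualID)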
Join Dataframes
Now we can join the two data frames on all the variables with the same name.
We use a left_join() from the dplyr package because we want to match all the rows from the "left" (first) dataframe to any rows that also occur in the "right" (second) dataframe.
Check out RStudio's data wrangling (dplyr/tidyr) cheatsheet for other types of joins.
# Create a new dataframe "phe_ind" with all the data from status and some from ind_lastnoD
phe_ind <- left_join(status_noD, ind_lastnoD)
## Joining, by = c("namedLocation", "domainID", "siteID", "plotID", "individualID", "release")
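The message above shows which columns the join matched on. If you prefer the join keys to be explicit (which protects against surprises if column names change), you can spell them out; an equivalent sketch:
# equivalent join with the key columns stated explicitly
phe_ind <- left_join(status_noD, ind_lastnoD,
                     by=c("namedLocation", "domainID", "siteID",
                          "plotID", "individualID", "release"))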
Now that we have clean datasets we can begin looking into our particular data to address our research question: do plants show patterns of change in phenophase across seasons?
Patterns in Phenophase
From our larger dataset (several sites, species, phenophases), let's create a
dataframe with only the data from a single site, species, and phenophase and
call it phe_1sp
.
Select Site(s) of Interest
To do this, we'll first select our site of interest. Note how we set this up with an object that is our site of interest. This will allow us to more easily change which site or sites if we want to adapt our code later.
# set site of interest
siteOfInterest <- "SCBI"
# use filter to select only the site of Interest
## using %in% allows one to add a vector if you want more than one site.
## could also do it with == instead of %in% but won't work with vectors
phe_1st <- filter(phe_ind, siteID %in% siteOfInterest)
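For example, selecting two sites instead of one only requires changing the siteOfInterest object (an illustrative sketch, not run here):
# to select multiple sites, supply a vector:
#siteOfInterest <- c("SCBI", "SERC")
#phe_1st <- filter(phe_ind, siteID %in% siteOfInterest)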
Select Species of Interest
Now we may only want to view a single species or a set of species. Let's first
look at the species that are present in our data. We could do this just by looking
at the taxonID field, which gives the four-letter USDA plant code for each species. But if we don't know all the plant codes, we can get a bit fancier and view both the code and the scientific name.
# see which species are present - taxon ID only
unique(phe_1st$taxonID)
## [1] "JUNI" "MIVI" "LITU"
# or see which species are present with taxon ID + species name
unique(paste(phe_1st$taxonID, phe_1st$scientificName, sep=' - '))
## [1] "JUNI - Juglans nigra L."
## [2] "MIVI - Microstegium vimineum (Trin.) A. Camus"
## [3] "LITU - Liriodendron tulipifera L."
For now, let's choose only the flowering tree Liriodendron tulipifera (LITU).
By writing it this way, we could also add a list of species to the speciesOfInterest
object to select for multiple species.
speciesOfInterest <- "LITU"
#subset to just "LITU"
# here just use == but could also use %in%
phe_1sp <- filter(phe_1st, taxonID==speciesOfInterest)
# check that it worked
unique(phe_1sp$taxonID)
## [1] "LITU"
Select Phenophase of Interest
And, perhaps a single phenophase.
# see which phenophases are present
unique(phe_1sp$phenophaseName)
## [1] "Open flowers" "Breaking leaf buds"
## [3] "Colored leaves" "Increasing leaf size"
## [5] "Falling leaves" "Leaves"
phenophaseOfInterest <- "Leaves"
#subset to just the phenophase of interest
phe_1sp <- filter(phe_1sp, phenophaseName %in% phenophaseOfInterest)
# check that it worked
unique(phe_1sp$phenophaseName)
## [1] "Leaves"
Select only Primary Plots
NEON plant phenology observations are collected along two types of plots.
- Primary plots: an 800 meter square phenology loop transect
- Phenocam plots: a 200 m x 200 m plot located within view of a canopy level, tower-mounted, phenology camera
In the data, these plots are differentiated by the subtypeSpecification
.
Depending on your question you may want to use only one or both of these plot types.
For this activity, we're going to only look at the primary plots.
# what plots are present?
unique(phe_1sp$subtypeSpecification)
## [1] "primary" "phenocam"
# filter
phe_1spPrimary <- filter(phe_1sp, subtypeSpecification == 'primary')
# check that it worked
unique(phe_1spPrimary$subtypeSpecification)
## [1] "primary"
Total in Phenophase of Interest
The phenophaseStatus is recorded as "yes" or "no", indicating whether the individual is in that phenophase. The phenophaseIntensity fields are categories for how much of the individual is in that state. For now, we will stick with phenophaseStatus.
We can now calculate the total number of individuals in that state. We use n_distinct(individualID) to count the individuals (and not the records) in case there are duplicate records for an individual.
Later on we'll also want to calculate the percent of the observed individuals in the "Leaves" status, so we also add a step here to retain the sample size, which we'll use for that calculation.
Here we use pipes (%>%) from the dplyr package to "pass" objects on to the next function.
# Calculate sample size for later use
sampSize <- phe_1spPrimary %>%
group_by(dateStat) %>%
summarise(numInd= n_distinct(individualID))
# Total in status by day for distinct individuals
inStat <- phe_1spPrimary%>%
group_by(dateStat, phenophaseStatus)%>%
summarise(countYes=n_distinct(individualID))
## `summarise()` has grouped output by 'dateStat'. You can override using the `.groups` argument.
inStat <- full_join(sampSize, inStat, by="dateStat")
# Retain only Yes
inStat_T <- filter(inStat, phenophaseStatus %in% "yes")
# check that it worked
unique(inStat_T$phenophaseStatus)
## [1] "yes"
Now that we have the data we can plot it.
Plot with ggplot
The ggplot()
function within the ggplot2
package gives us considerable control
over plot appearance. Three basic elements are needed for ggplot()
to work:
- The data_frame: containing the variables that we wish to plot,
- aes (aesthetics): which denotes which variables will map to the x-, y- (and other) axes,
- geom_XXXX (geometry): which defines the data's graphical representation (e.g. points (geom_point), bars (geom_bar), lines (geom_line), etc).
The syntax begins with the base statement, which includes the data_frame (inStat_T) and the associated x (dateStat) and y (countYes) variables to be plotted:
ggplot(inStat_T, aes(dateStat, countYes))
Bar Plots with ggplot
To successfully plot, the last piece that is needed is the geometry type. To create a bar plot, we set the geom element to geom_bar().
The default setting for a ggplot bar plot - geom_bar()
- is a histogram
designated by stat="bin"
. However, in this case, we want to plot count values.
We can use geom_bar(stat="identity")
to force ggplot to plot actual values.
# plot number of individuals in leaf
phenoPlot <- ggplot(inStat_T, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE)
phenoPlot
# Now let's make the plot look a bit more presentable
phenoPlot <- ggplot(inStat_T, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE) +
ggtitle("Total Individuals in Leaf") +
xlab("Date") + ylab("Number of Individuals") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
phenoPlot
We could also convert this to a percentage and plot that.
# convert to percent
inStat_T$percent<- ((inStat_T$countYes)/inStat_T$numInd)*100
# plot percent of leaves
phenoPlot_P <- ggplot(inStat_T, aes(dateStat, percent)) +
geom_bar(stat="identity", na.rm = TRUE) +
ggtitle("Proportion in Leaf") +
xlab("Date") + ylab("% of Individuals") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
phenoPlot_P
The plots demonstrate the nice expected pattern of increasing leaf-out, peak, and drop-off.
Drivers of Phenology
Now that we see that there are differences in and shifts in phenophases, what are the drivers of phenophases?
The NEON phenology measurements track sensitive and easily observed indicators of biotic responses to climate variability by monitoring the timing and duration of phenological stages in plant communities. Plant phenology is affected by forces such as temperature, timing and duration of pest infestations and disease outbreaks, water fluxes, nutrient budgets, carbon dynamics, and food availability and has feedbacks to trophic interactions, carbon sequestration, community composition and ecosystem function. (quoted from Plant Phenology Observations user guide.)
Filter by Date
In the next part of this series, we will explore temperature as a driver of phenology. Temperature data are quite large (NEON provides them in 1 minute or 30 minute intervals), so let's trim our phenology data down to only one year so that we aren't working with as large a dataset.
Let's filter to just 2018 data.
# use filter to select only the date of interest
phe_1sp_2018 <- filter(inStat_T, dateStat >= "2018-01-01" & dateStat <= "2018-12-31")
# did it work?
range(phe_1sp_2018$dateStat)
## [1] "2018-04-13 GMT" "2018-11-20 GMT"
How does that look?
# Now let's make the plot look a bit more presentable
phenoPlot18 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE) +
ggtitle("Total Individuals in Leaf") +
xlab("Date") + ylab("Number of Individuals") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
phenoPlot18
Now that we've filtered down to just the 2018 data from SCBI for LITU in leaf, we may want to save that subsetted data for another use. To do that you can write the data frame to a .csv file.
You do not need to follow this step if you are continuing on to the next tutorials in this series as you already have the data frame in your environment. Of course if you close R and then come back to it, you will need to re-load this data and instructions for that are provided in the relevant tutorials.
# Write .csv - this step is optional
# This will write to your current working directory, change as desired.
write.csv( phe_1sp_2018 , file="NEONpheno_LITU_Leaves_SCBI_2018.csv", row.names=F)
#If you are using the downloaded example data, this code will write it to the
# pheno data file. Note - this file is already a part of the download.
#write.csv( phe_1sp_2018 , file="NEON-pheno-temp-timeseries_v2/NEONpheno_LITU_Leaves_SCBI_2018.csv", row.names=F)
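If you re-load this file in a later session, note that read.csv() will bring dateStat back in as character, so you may want to convert it to a date class again (a minimal sketch):
# re-load the saved subset in a later session (optional)
#phe_1sp_2018 <- read.csv("NEONpheno_LITU_Leaves_SCBI_2018.csv")
#phe_1sp_2018$dateStat <- as.Date(phe_1sp_2018$dateStat)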
Get Lesson Code
Work with NEON's Single-Aspirated Air Temperature Data
Authors: Lee Stanish, Megan A. Jones, Natalie Robinson
Last Updated: Mar 21, 2023
In this tutorial, we explore the NEON single-aspirated air temperature data. We then discuss how to interpret the variables, how to work with date-time and date formats, and finally how to plot the data.
This tutorial is part of a series on how to work with both discrete and continuous time series data with NEON plant phenology and temperature data products.
Objectives
After completing this activity, you will be able to:
- work with "stacked" NEON Single-Aspirated Air Temperature data.
- correctly format date-time data.
- use dplyr functions to filter data.
- plot time series data in scatter plots using the ggplot() function.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Install R Packages
-
neonUtilities:
install.packages("neonUtilities")
-
ggplot2:
install.packages("ggplot2")
-
dplyr:
install.packages("dplyr")
-
tidyr:
install.packages("tidyr")
More on Packages in R – Adapted from Software Carpentry.
Additional Resources
- NEON data portal
- RStudio's data wrangling (dplyr/tidyr) cheatsheet
- NEONScience GitHub Organization
- nneo API wrapper on CRAN
- Hadley Wickham's documentation on the ggplot2 package.
- Winston Chang's
Background Information About NEON Air Temperature Data
Air temperature is continuously monitored by NEON using two methods. At terrestrial sites, temperature at the top of the tower is derived from a triple-redundant aspirated air temperature sensor. This is provided as NEON data product DP1.00003.001. Single Aspirated Air Temperature sensors (SAAT) are deployed to develop temperature profiles at multiple levels on the tower at NEON terrestrial sites and on the meteorological stations at NEON aquatic sites. This is provided as NEON data product DP1.00002.001.
When designing a research project using this data, consult the Data Product Details Page for more detailed documentation.
Single-aspirated Air Temperature
Air temperature profiles are ascertained by deploying SAATs at various heights on NEON tower infrastructure. Air temperature at aquatic sites is measured using a single SAAT at a standard height of 3m above ground level. Air temperature for this data product is provided as one- and thirty-minute averages of 1 Hz observations. Temperature observations are made using platinum resistance thermometers, which are housed in a fan aspirated shield to reduce radiative heating. The temperature is measured in Ohms and subsequently converted to degrees Celsius during data processing. Details on the conversion can be found in the associated Algorithm Theoretic Basis Document (ATBD; see Product Details page linked above).
Available Data Tables
The SAAT data product contains two data tables for each site and month selected, consisting of the 1-minute and 30-minute averaging intervals. In addition, there are several metadata files that provide additional useful information.
- readme with information on the data product and the download
- variables file that defines the terms, data types, and units
- EML file with machine readable metadata in standardized Ecological Metadata Language
Access NEON Data
There are several ways to access NEON data: directly from the NEON data portal, through a data partner (select data products only), by writing code to pull data directly from the NEON API, or, as we'll do here, using the neonUtilities package, which is a wrapper for the API that makes working with the data easier.
Downloading from the Data Portal
If you prefer to download data from the data portal, please review the Getting started and Stack the downloaded data sections of the Download and Explore NEON Data tutorial. This will get you to the point where you can download data from sites or dates of interest and resume this tutorial.
Downloading Data Using neonUtilities
First, we need to set up our environment with the packages needed for this tutorial.
# Install needed package (only uncomment & run if not already installed)
#install.packages("neonUtilities")
#install.packages("ggplot2")
#install.packages("dplyr")
#install.packages("tidyr")
# Load required libraries
library(neonUtilities) # for accessing NEON data
library(ggplot2) # for plotting
library(dplyr) # for data munging
library(tidyr) # for data munging
# set working directory
# this step is optional, only needed if you plan to save the
# data files at the end of the tutorial
wd <- "~/data" # enter your working directory here
setwd(wd)
This tutorial is part of series working with discrete plant phenology data and (nearly) continuous temperature data. Our overall "research" question is to see if there is any correlation between plant phenology and temperature. Therefore, we will want to work with data that align with the plant phenology data that we worked with in the first tutorial. If you are only interested in working with the temperature data, you do not need to complete the previous tutorial.
Our data of interest will be the temperature data from 2018 from NEON's Smithsonian Conservation Biology Institute (SCBI) field site located in Virginia near the northern terminus of the Blue Ridge Mountains.
NEON single-aspirated air temperature data are available in two averaging intervals, 1 minute and 30 minutes. Which data you want to work with is going to depend on your research questions. Here, we're going to only download and work with the 30 minute interval data, as we're primarily interested in longer term (daily, weekly, annual) patterns.
This will download 7.7 MB of data. check.size is set to false (F) to improve the flow of the script, but it is always a good idea to view the size with true (T) before downloading a new dataset.
# download data of interest - Single Aspirated Air Temperature
saat <- loadByProduct(dpID="DP1.00002.001", site="SCBI",
startdate="2018-01", enddate="2018-12",
package="basic", timeIndex="30",
check.size = F)
Explore Temperature Data
Now that you have the data, let's take a look at the structure and understand what's in the data. The data (saat) come in as a large list of several items.
View(saat)
So what exactly are these files and why would you want to use them?
- data file(s): There will always be one or more dataframes that include the primary data of the data product you downloaded. Since we downloaded only the 30 minute averaged data, we only have one data table, SAAT_30min.
- readme_xxxxx: The readme file, with the corresponding 5 digits from the data product number, provides you with important information relevant to the data product and the specific instance of downloading the data.
- sensor_positions_xxxxx: This table contains the spatial coordinates of each sensor, relative to a reference location.
- variables_xxxxx: This table contains all the variables found in the associated data table(s). This includes full definitions, units, and rounding.
- issueLog_xxxxx: This table contains records of any known issues with the data product, such as sensor malfunctions.
- scienceReviewFlags_xxxxx: This table may or may not be present. It contains descriptions of adverse events that led to manual flagging of the data, and is usually more detailed than the issue log. It only contains records relevant to the sites and dates of data downloaded.
Since we want to work with the individual files, let's make the elements of the list into independent objects.
list2env(saat, .GlobalEnv)
## <environment: R_GlobalEnv>
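If you prefer not to populate your global environment all at once, an equivalent alternative is to extract just the table you need from the list:
# alternative to list2env(): extract only the 30-minute data table
#SAAT_30min <- saat$SAAT_30min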
Now let's take a look at the data table.
str(SAAT_30min)
## 'data.frame': 87600 obs. of 16 variables:
## $ domainID : chr "D02" "D02" "D02" "D02" ...
## $ siteID : chr "SCBI" "SCBI" "SCBI" "SCBI" ...
## $ horizontalPosition : chr "000" "000" "000" "000" ...
## $ verticalPosition : chr "010" "010" "010" "010" ...
## $ startDateTime : POSIXct, format: "2018-01-01 00:00:00" "2018-01-01 00:30:00" "2018-01-01 01:00:00" ...
## $ endDateTime : POSIXct, format: "2018-01-01 00:30:00" "2018-01-01 01:00:00" "2018-01-01 01:30:00" ...
## $ tempSingleMean : num -11.8 -11.8 -12 -12.2 -12.4 ...
## $ tempSingleMinimum : num -12.1 -12.2 -12.3 -12.6 -12.8 ...
## $ tempSingleMaximum : num -11.4 -11.3 -11.3 -11.7 -12.1 ...
## $ tempSingleVariance : num 0.0208 0.0315 0.0412 0.0393 0.0361 0.0289 0.0126 0.0211 0.0115 0.0022 ...
## $ tempSingleNumPts : num 1800 1800 1800 1800 1800 1800 1800 1800 1800 1800 ...
## $ tempSingleExpUncert: num 0.13 0.13 0.13 0.13 0.129 ...
## $ tempSingleStdErMean: num 0.0034 0.0042 0.0048 0.0047 0.0045 0.004 0.0026 0.0034 0.0025 0.0011 ...
## $ finalQF : num 0 0 0 0 0 0 0 0 0 0 ...
## $ publicationDate : chr "20221210T185420Z" "20221210T185420Z" "20221210T185420Z" "20221210T185420Z" ...
## $ release : chr "undetermined" "undetermined" "undetermined" "undetermined" ...
Quality Flags
The sensor data undergo a variety of automated quality assurance and quality control
checks. You can read about them in detail in the Quality Flags and Quality Metrics ATBD, in the Documentation section of the product details page.
The expanded data package
includes all of these quality flags, which can allow you to decide if not passing
one of the checks will significantly hamper your research and if you should
therefore remove the data from your analysis. Here, we're using the
basic data package, which only includes the final quality flag (finalQF
),
which is aggregated from the full set of quality flags.
A pass of the check is 0, while a fail is 1. Let's see what percentage of the data we downloaded passed the quality checks.
sum(SAAT_30min$finalQF==1)/nrow(SAAT_30min)
## [1] 0.2340297
What should we do with the 23% of the data that are flagged? This may depend on why it is flagged and what questions you are asking, and the expanded data package would be useful for determining this.
For now, for demonstration purposes, we'll keep the flagged data.
What about null (NA
) data?
sum(is.na(SAAT_30min$tempSingleMean))/nrow(SAAT_30min)
## [1] 0.2239269
mean(SAAT_30min$tempSingleMean)
## [1] NA
22% of the mean temperature values are NA
. Note that this is not
additive with the flagged data! Empty data records are flagged, so this
indicates nearly all of the flagged data in our download are empty records.
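You can check that overlap directly by calculating what fraction of the flagged records also have a missing mean temperature:
# fraction of flagged records that are also empty
sum(SAAT_30min$finalQF==1 & is.na(SAAT_30min$tempSingleMean)) /
  sum(SAAT_30min$finalQF==1)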
Why was there no output from the calculation of mean temperature?
The R programming language, by default, won't calculate a mean (and many other
summary statistics) in data that contain NA
values. We could override this
using the input parameter na.rm=TRUE
in the mean()
function, or just
remove the empty values from our analysis.
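For example, with na.rm=TRUE the mean can be calculated in spite of the missing values:
# mean temperature, ignoring NA values
mean(SAAT_30min$tempSingleMean, na.rm=TRUE)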
# create new dataframe without NAs
SAAT_30min_noNA <- SAAT_30min %>%
drop_na(tempSingleMean) # tidyr function
# alternate base R
# SAAT_30min_noNA <- SAAT_30min[!is.na(SAAT_30min$tempSingleMean),]
# did it work?
sum(is.na(SAAT_30min_noNA$tempSingleMean))
## [1] 0
Scatterplots with ggplot
We can use ggplot to create scatter plots. Which data should we plot? We have several options:
- tempSingleMean: the mean temperature for the interval
- tempSingleMinimum: the minimum temperature during the interval
- tempSingleMaximum: the maximum temperature for the interval
Depending on exactly what question you are asking you may prefer to use one over the other. For many applications, the mean temperature of the 1- or 30-minute interval will provide the best representation of the data.
Let's plot it. (This is a plot of a large amount of data. It can take 1-2 mins to process. It is not essential for completing the next steps if this takes too much of your computer memory.)
# plot temp data
tempPlot <- ggplot(SAAT_30min, aes(startDateTime, tempSingleMean)) +
geom_point(size=0.3) +
ggtitle("Single Aspirated Air Temperature") +
xlab("Date") + ylab("Temp (C)") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
tempPlot
## Warning: Removed 19616 rows containing missing values (`geom_point()`).
What patterns can you see in the data?
Something odd seems to have happened in late April/May 2018. Since it is unlikely Virginia experienced -50C during this time, these are probably erroneous sensor readings, and they illustrate why we should remove data that are flagged with those quality flags.
Right now we are also looking at all the data points in the dataset. However, we may want to view or aggregate the data differently:
- aggregated data: min, mean, or max over some duration
- the number of days since a freezing temperature
- some other segregation of the data.
Given that in the previous tutorial, Work With NEON's Plant Phenology Data, we were working with phenology data collected on a daily scale, let's aggregate to that level.
To make this plot better, let's do two things:
- Remove flagged data
- Aggregate to a daily mean.
Subset to remove quality flagged data
We already removed the empty records. Now we'll subset the data to remove the remaining flagged data.
# subset and add C to name for "clean"
SAAT_30minC <- filter(SAAT_30min_noNA, SAAT_30min_noNA$finalQF==0)
# Do any quality flags remain?
sum(SAAT_30minC$finalQF==1)
## [1] 0
Now we can plot only the unflagged data.
# plot temp data
tempPlot <- ggplot(SAAT_30minC, aes(startDateTime, tempSingleMean)) +
geom_point(size=0.3) +
ggtitle("Single Aspirated Air Temperature") +
xlab("Date") + ylab("Temp (C)") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
tempPlot
That looks better! But we're still working with the 30-minute data.
Aggregate Data by Day
We can use the dplyr package functions to aggregate the data. However, we have to choose which data we want to aggregate. Again, you might want daily minimum temps, mean temperature or maximum temps depending on your question.
In the context of phenology, minimum temperatures might be very important if you are interested in a species that is very frost susceptible. Any days with a minimum temperature below 0C could dramatically change the phenophase. For other species or climates, maximum thresholds may be very important. Or you might be most interested in the daily mean.
And note that you can combine different input values with different aggregation functions - for example, you could calculate the minimum of the half-hourly average temperature, or the average of the half-hourly maximum temperature.
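As a minimal sketch of that idea (this assumes the Date column created in the next code chunk already exists on SAAT_30minC, and daily_alt is just an example name), you could compute, say, the daily minimum of the half-hourly means and the daily mean of the half-hourly maxima:
# two alternative daily aggregations of the cleaned 30-minute data
daily_alt <- SAAT_30minC %>%
  group_by(Date) %>%
  summarise(minOfMean = min(tempSingleMean, na.rm=TRUE),
            meanOfMax = mean(tempSingleMaximum, na.rm=TRUE))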
For this tutorial, let's use maximum daily temperature, i.e. the maximum of the tempSingleMaximum values for the day.
# convert to date, easier to work with
SAAT_30minC$Date <- as.Date(SAAT_30minC$startDateTime)
# max of the half-hourly maximum temps each day: compute the daily max first,
# then keep one row per day
temp_day <- SAAT_30minC %>%
  group_by(Date) %>%
  mutate(dayMax=max(tempSingleMaximum)) %>%
  distinct(Date, .keep_all=T)
Now we can plot the cleaned up daily temperature.
# plot Air Temperature Data across 2018 using daily data
tempPlot_dayMax <- ggplot(temp_day, aes(Date, dayMax)) +
geom_point(size=0.5) +
ggtitle("Daily Max Air Temperature") +
xlab("") + ylab("Temp (C)") +
theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
theme(text = element_text(size=18))
tempPlot_dayMax
Thought questions:
- What do we gain by this visualization?
- What do we lose relative to the 30 minute intervals?
ggplot - Subset by Time
Sometimes we want to scale the x- or y-axis to a particular time subset without subsetting the entire data frame. To do this, we can define start and end times. We can then define these limits in the scale_x_date object as follows:
scale_x_date(limits=start.end) +
Let's plot just the first three months of the year.
# Define Start and end times for the subset as R objects that are the time class
startTime <- as.Date("2018-01-01")
endTime <- as.Date("2018-03-31")
# create a start and end time R object
start.end <- c(startTime,endTime)
str(start.end)
## Date[1:2], format: "2018-01-01" "2018-03-31"
# View data for first 3 months only
# And we'll add some color for a change.
tempPlot_dayMax3m <- ggplot(temp_day, aes(Date, dayMax)) +
geom_point(color="blue", size=0.5) +
ggtitle("Air Temperature\n Jan - March") +
xlab("Date") + ylab("Air Temperature (C)")+
(scale_x_date(limits=start.end,
date_breaks="1 week",
date_labels="%b %d"))
tempPlot_dayMax3m
## Warning: Removed 268 rows containing missing values (`geom_point()`).
Now that we have temperature data matching the phenology data from the previous tutorial, let's save it to our computer to use in future analyses (or the next tutorial). This step is optional if you are continuing directly to the next tutorial, as you already have the data in R.
# Write .csv - this step is optional
# This will write to the working directory we set at the start of the tutorial
write.csv(temp_day , file="NEONsaat_daily_SCBI_2018.csv", row.names=F)
Plot Continuous & Discrete Data Together
Authors: Lee Stanish, Megan A. Jones, Natalie Robinson
Last Updated: May 7, 2021
This tutorial discusses ways to plot plant phenology (discrete time series) and single-aspirated temperature (continuous time series) together. It uses data frames created in the first two parts of this series, Work with NEON OS & IS Data - Plant Phenology & Temperature. If you have not completed these tutorials, please download the dataset below.
Objectives
After completing this tutorial, you will be able to:
- plot multiple figures together with grid.arrange()
- plot only a subset of dates
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.
Install R Packages
- neonUtilities: install.packages("neonUtilities")
- ggplot2: install.packages("ggplot2")
- dplyr: install.packages("dplyr")
- gridExtra: install.packages("gridExtra")
More on Packages in R – Adapted from Software Carpentry.
Download Data
This tutorial is designed to have you download data directly from the NEON portal API using the neonUtilities package. However, you can also directly download this data, prepackaged, from FigShare. This data set includes all the files needed for the Work with NEON OS & IS Data - Plant Phenology & Temperature tutorial series. The data are in the format you would receive if downloading them using the zipsByProduct() function in the neonUtilities package.
To start, we need to set up our R environment. If you're continuing from the previous tutorial in this series, you'll only need to load the new packages.
# Install needed packages (only uncomment & run if not already installed)
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("gridExtra")
#install.packages("scales")
# Load required libraries
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(scales)
options(stringsAsFactors=F) #keep strings as character type not factors
# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Documents/data/" # Change this to match your local environment
setwd(wd)
If you don't already have the R objects temp_day and phe_1sp_2018 loaded, you'll need to load and format those data. If you do, you can skip this code.
# Read in data -> if in series this is unnecessary
temp_day <- read.csv(paste0(wd,'NEON-pheno-temp-timeseries/NEONsaat_daily_SCBI_2018.csv'))
phe_1sp_2018 <- read.csv(paste0(wd,'NEON-pheno-temp-timeseries/NEONpheno_LITU_Leaves_SCBI_2018.csv'))
# Convert dates
temp_day$Date <- as.Date(temp_day$Date)
# use dateStat - the date the phenophase status was recorded
phe_1sp_2018$dateStat <- as.Date(phe_1sp_2018$dateStat)
Separate Plots, Same Panel
In this dataset, we have phenology and temperature data from the Smithsonian Conservation Biology Institute (SCBI) NEON field site. There are a variety of ways we may want to look at this data, including aggregated at the site level, by a single plot, or viewing all plots at the same time but in separate plots. In the Work With NEON's Plant Phenology Data and the Work with NEON's Single-Aspirated Air Temperature Data tutorials, we created separate plots of the number of individuals who had leaves at different times of the year and the temperature in 2018.
However, sometimes we want to plot the data next to each other to aid comparison. The grid.arrange() function from the gridExtra package can help us do this.
# first, create one plot
phenoPlot <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE) +
ggtitle("Total Individuals in Leaf") +
xlab("") + ylab("Number of Individuals")
# create second plot of interest
tempPlot_dayMax <- ggplot(temp_day, aes(Date, dayMax)) +
geom_point() +
ggtitle("Daily Max Air Temperature") +
xlab("Date") + ylab("Temp (C)")
# Then arrange the plots - this can be done with >2 plots as well.
grid.arrange(phenoPlot, tempPlot_dayMax)
Now, we can see both plots in the same window. But, hmmm... the x-axis on both plots is kinda wonky. We want the same spacing in the scale across the year (e.g., July in one should line up with July in the other), plus we want the dates to display in the same format (e.g. 2016-07 vs. Jul vs. Jul 2018).
Format Dates in Axis Labels
The date format parameter can be adjusted with scale_x_date. Let's format the x-axis ticks so they read "month" (%b) in both graphs. We will use the syntax:
scale_x_date(labels=date_format("%b"))
Rather than re-coding the entire plot, we can add the scale_x_date element to the plot object phenoPlot we just created.
- You can type ?strptime into the R console to find a list of date format conversion specifications (e.g. %b = month). Type ?scale_x_date for a list of parameters that allow you to format dates on the x-axis.
- If you are working with a date & time class (e.g. POSIXct), you can use scale_x_datetime instead of scale_x_date.
# format x-axis: dates
phenoPlot <- phenoPlot +
(scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")))
tempPlot_dayMax <- tempPlot_dayMax +
(scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")))
# New plot.
grid.arrange(phenoPlot, tempPlot_dayMax)
But this only solves one of the problems; we still have a different range on the x-axis, which makes it harder to see trends.
Align data sets with different start dates
Now let's work to align the values on the x-axis. We can do this in two ways: 1) by setting the x-axis to have the same date range, or 2) by filtering the dataset itself to only include the overlapping data. Depending on what you are trying to demonstrate, and whether you're doing additional analyses and want only the overlapping data, you may prefer one over the other. Let's try both.
Set range of x-axis
First, we can set the x-axis range for both plots by adding the limits parameter to the scale_x_date() function.
# first, let's recreate the full plot and add in the x-axis limits
phenoPlot_setX <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE) +
ggtitle("Total Individuals in Leaf") +
xlab("") + ylab("Number of Individuals") +
scale_x_date(breaks = date_breaks("1 month"),
labels = date_format("%b"),
limits = as.Date(c('2018-01-01','2018-12-31')))
# create second plot of interest
tempPlot_dayMax_setX <- ggplot(temp_day, aes(Date, dayMax)) +
geom_point() +
ggtitle("Daily Max Air Temperature") +
xlab("Date") + ylab("Temp (C)") +
scale_x_date(date_breaks = "1 month",
labels=date_format("%b"),
limits = as.Date(c('2018-01-01','2018-12-31')))
# Plot
grid.arrange(phenoPlot_setX, tempPlot_dayMax_setX)
Now we can really see the pattern over the full year. This emphasizes the point that during much of the late fall, winter, and early spring none of the trees have leaves on them (or that data were not collected - this plot would not distinguish between the two).
Subset one data set to match other
Alternatively, we can simply filter the dataset with the larger date range so that we only plot the data from the overlapping dates.
# filter to only having overlapping data
temp_day_filt <- filter(temp_day, Date >= min(phe_1sp_2018$dateStat) &
Date <= max(phe_1sp_2018$dateStat))
# Check
range(phe_1sp_2018$date)
## [1] "2018-04-13" "2018-11-20"
range(temp_day_filt$Date)
## [1] "2018-04-13" "2018-11-20"
#plot again
tempPlot_dayMaxFiltered <- ggplot(temp_day_filt, aes(Date, dayMax)) +
geom_point() +
scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
ggtitle("Daily Max Air Temperature") +
xlab("Date") + ylab("Temp (C)")
grid.arrange(phenoPlot, tempPlot_dayMaxFiltered)
With this plot, we really focus on the period where the plotted data overlap (though this does cut out data that were collected outside that window).
Same plot with two Y-axes
What about layering these plots and having two y-axes (right and left) that have the different scale bars?
Some argue that you should not do this as it can distort what is actually going on with the data. The author of the ggplot2 package is one of these individuals. Therefore, you cannot use ggplot() to create a single plot with multiple y-axis scales. You can read his own discussion of the topic on this StackOverflow post.
However, individuals have found work arounds for these plots. The code below is provided as a demonstration of this capability. Note, by showing this code here, we don't necessarily endorse having plots with two y-axes.
This code is adapted from code by Jake Heare.
# Source: http://heareresearch.blogspot.com/2014/10/10-30-2014-dual-y-axis-graph-ggplot2_30.html
# Additional packages needed
library(gtable)
library(grid)
# Plot 1: Pheno data as bars, temp as scatter
grid.newpage()
phenoPlot_2 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
geom_bar(stat="identity", na.rm = TRUE) +
scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")) +
ggtitle("Total Individuals in Leaf vs. Temp (C)") +
xlab(" ") + ylab("Number of Individuals") +
theme_bw()+
theme(legend.justification=c(0,1),
legend.position=c(0,1),
plot.title=element_text(size=25,vjust=1),
axis.text.x=element_text(size=20),
axis.text.y=element_text(size=20),
axis.title.x=element_text(size=20),
axis.title.y=element_text(size=20))
tempPlot_dayMax_corr_2 <- ggplot() +
geom_point(data = temp_day_filt, aes(Date, dayMax),color="red") +
scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
xlab("") + ylab("Temp (C)") +
theme_bw() %+replace%
theme(panel.background = element_rect(fill = NA),
panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.y=element_blank(),
panel.grid.minor.y=element_blank(),
axis.text.y=element_text(size=20,color="red"),
axis.title.y=element_text(size=20))
g1<-ggplot_gtable(ggplot_build(phenoPlot_2))
g2<-ggplot_gtable(ggplot_build(tempPlot_dayMax_corr_2))
pp<-c(subset(g1$layout,name=="panel",se=t:r))
g<-gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="panel")]],pp$t,pp$l,pp$b,pp$l)
ia<-which(g2$layout$name=="axis-l")
ga <- g2$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g2$widths[g2$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
grid.draw(g)
# Plot 2: Both pheno data and temp data as line graphs
grid.newpage()
phenoPlot_3 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
geom_line(na.rm = TRUE) +
scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
ggtitle("Total Individuals in Leaf vs. Temp (C)") +
xlab("Date") + ylab("Number of Individuals") +
theme_bw()+
theme(legend.justification=c(0,1),
legend.position=c(0,1),
plot.title=element_text(size=25,vjust=1),
axis.text.x=element_text(size=20),
axis.text.y=element_text(size=20),
axis.title.x=element_text(size=20),
axis.title.y=element_text(size=20))
tempPlot_dayMax_corr_3 <- ggplot() +
geom_line(data = temp_day_filt, aes(Date, dayMax),color="red") +
scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
xlab("") + ylab("Temp (C)") +
theme_bw() %+replace%
theme(panel.background = element_rect(fill = NA),
panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.y=element_blank(),
panel.grid.minor.y=element_blank(),
axis.text.y=element_text(size=20,color="red"),
axis.title.y=element_text(size=20))
g1<-ggplot_gtable(ggplot_build(phenoPlot_3))
g2<-ggplot_gtable(ggplot_build(tempPlot_dayMax_corr_3))
pp<-c(subset(g1$layout,name=="panel",se=t:r))
g<-gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="panel")]],pp$t,pp$l,pp$b,pp$l)
ia<-which(g2$layout$name=="axis-l")
ga <- g2$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g2$widths[g2$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
grid.draw(g)
Download and work with NEON Aquatic Instrument Data
Authors: Bobby Hensley, Guy Litt, Megan Jones
Last Updated: Apr 8, 2021
This tutorial covers downloading NEON Aquatic Instrument System (AIS) data, using the neonUtilities R package, as well as basic instruction in beginning to explore and work with the downloaded data, including guidance in navigating data documentation, separating data using the horizontal location (HOR) variable, interpreting quality flags, and resampling time intervals.
The following material steps through the multiple considerations in interpreting NEON data, and ultimately achieves a data comparison between two different sensors at nearby locations that are published at different time intervals. This sort of data wrangling is useful for comparing different data streams, and/or preparing data into a consistent format for modeling.
Objectives
After completing this activity, you will be able to:
- Download NEON AIS data using the neonUtilities package.
- Understand downloaded data sets and load them into R for analyses.
- Separate data collected at different sensor locations using the HOR variable.
- Understand and interpret quality flags, including how to discover what non-standard quality flags mean.
- Aggregate time series to higher intervals and impute (fill in) observations where absent.
Things You'll Need To Complete This Tutorial
To complete this tutorial you will need R (version >3.4) and, preferably, RStudio loaded on your computer.
Install R Packages
- neonUtilities: Basic functions for accessing NEON data
- ggplot2: Plotting functions
- dplyr: Data manipulation functions
- padr: Time-series data preparation functions
These packages are on CRAN and can be installed with install.packages().
Additional Resources
Download Files and Load Directly to R: loadByProduct()
The most popular function in neonUtilities is loadByProduct().
This function downloads data from the NEON API, merges the site-by-month
files, and loads the resulting data tables into the R environment,
assigning each data type to the appropriate R class. This is a popular
choice because it ensures you're always working with the most up-to-date data,
and it ends with ready-to-use tables in R. However, if you use it in
a workflow you run repeatedly, keep in mind it will re-download the
data every time.
Before we get the NEON data, we need to install (if not already done) and load the neonUtilities R package, as well as other packages we will use in the analysis.
# Install required packages if you have not already.
install.packages("neonUtilities")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("padr")
# Set global option to NOT convert all character variables to factors
options(stringsAsFactors=F)
# Load required packages
library(neonUtilities)
library(ggplot2)
library(dplyr)
library(padr)
The inputs to loadByProduct() control which data to download and how to manage the processing. The following are frequently used inputs:
- dpID: the data product ID, e.g. DP1.20288.001
- site: defaults to "all", meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g. c("MART","ARIK","BARC").
- startdate and enddate: default to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g. 2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
- package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as individual quality flag test results. Not every NEON data product has an expanded package; if the expanded package is requested but there isn't one, the basic package will be downloaded.
- avg: defaults to "all", to download all data; or the number of minutes in the averaging interval. See example below; only applicable to IS data.
- savepath: the file path you want to download to; defaults to the working directory.
- check.size: T or F; should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set this to F.
- token: this allows you to input your NEON API token to obtain faster downloads. Learn more about NEON API tokens in the Using an API Token when Accessing NEON Data with neonUtilities tutorial.
There are additional inputs you can learn about in the Use the neonUtilities R Package to Access NEON Data tutorial.
The dpID is the data product identifier of the data you want to download. The DPID can be found on the Explore Data Products page. It will be in the form DP#.#####.###. For this tutorial, we'll use some data products collected in NEON's Aquatic Instrument System:
- DP1.20288.001: Water quality
- DP1.20033.001: Nitrate in surface water
- DP1.20016.001: Elevation of surface water
Now it's time to consider the NEON field site of interest. If not specified, the default will download a data product from all sites. The following are 4-letter site codes for NEON's 34 aquatic sites as of 2020:
ARIK = Arikaree River CO
BARC = Barco Lake FL
BIGC = Upper Big Creek CA
BLDE = Blacktail Deer Creek WY
BLUE = Blue River OK
BLWA = Black Warrior River AL
CARI = Caribou Creek AK
COMO = Como Creek CO
CRAM = Crampton Lake WI
CUPE = Rio Cupeyes PR
FLNT = Flint River GA
GUIL = Rio Guilarte PR
HOPB = Lower Hop Brook MA
KING = Kings Creek KS
LECO = LeConte Creek TN
LEWI = Lewis Run VA
LIRO = Little Rock Lake WI
MART = Martha Creek WA
MAYF = Mayfield Creek AL
MCDI = McDiffett Creek KS
MCRA = McRae Creek OR
OKSR = Oksrukuyik Creek AK
POSE = Posey Creek VA
PRIN = Pringle Creek TX
PRLA = Prairie Lake ND
PRPO = Prairie Pothole ND
REDB = Red Butte Creek UT
SUGG = Suggs Lake FL
SYCA = Sycamore Creek AZ
TECR = Teakettle Creek CA
TOMB = Lower Tombigbee River AL
TOOK = Toolik Lake AK
WALK = Walker Branch TN
WLOU = West St Louis Creek CO
In this exercise, we want data from only one NEON field site, Pringle Creek, TX (PRIN), from February 2020.
Now let us download our data. If you are not using a NEON token to download your data, neonUtilities will ignore the token input. We set check.size = F so that the script runs smoothly, but remember that you always want to check your download size first.
# download data of interest - Water Quality
waq <- loadByProduct(dpID="DP1.20288.001", site="PRIN",
startdate="2020-02", enddate="2020-02",
package="expanded",
token = Sys.getenv("NEON_TOKEN"),
check.size = F)
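As noted above, loadByProduct() re-downloads the data every time it runs. As a minimal, optional sketch (the file name here is just an example), you could cache the returned list to disk with base R and reload it in later sessions instead of re-downloading:
# optional: save the downloaded list so repeated runs don't re-download
saveRDS(waq, "PRIN_waq_2020-02.rds")
# in a later session: waq <- readRDS("PRIN_waq_2020-02.rds")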
Using what you've learned above, can you modify the code to download data for the following parameters?
- Data Product DP1.20033.001: nitrate in surface water
- Data Product DP1.20016.001: elevation of surface water
- The expanded data tables
- Dates matching the other data products you've downloaded
- What is the size of the downloaded data?
- Without downloading all the data, how can you tell the difference in size between the "expanded" and "basic" packages?
# download data of interest - Nitrate in Surface Water
nsw <- loadByProduct(dpID="DP1.20033.001", site="PRIN",
startdate="2020-02", enddate="2020-02",
package="expanded",
token = Sys.getenv("NEON_TOKEN"),
check.size = F)
# #1. 2.0 MiB
# #2. You can change check.size to True (T), and compare "basic" vs "expanded"
# package types. The basic package is 37.0 KiB, and the expanded is 42.4 KiB.
# download data of interest - Elevation of surface water
eos <- loadByProduct(dpID="DP1.20016.001", site="PRIN",
startdate="2020-02", enddate="2020-02",
package="expanded",
token = Sys.getenv("NEON_TOKEN"),
check.size = F)
Files Associated with Downloads
The data we've downloaded comes as an object that is a named list of objects. To work with each of them, select them from the list using the $ operator.
# view all components of the list
names(waq)
## [1] "readme_20288" "sensor_positions_20288" "variables_20288"
## [4] "waq_instantaneous"
# View the dataFrame
View(waq$waq_instantaneous)
We can see that there are four objects in the downloaded water quality data: one dataframe of data (waq_instantaneous) and three metadata files. If you'd like, you can use the $ operator to assign an object from an item in the list. If you prefer to extract each table from the list and work with it as an independent object, which we will do, you can use the list2env() function.
# unlist the variables and add to the global environment
list2env(waq, .GlobalEnv)
## <environment: R_GlobalEnv>
So what exactly are these four files and why would you want to use them?
- data file(s): There will always be one or more dataframes that include the primary data of the data product you downloaded. Multiple dataframes are available when there are related datatables for a single data product.
- readme_xxxxx: The readme file, with the corresponding 5 digits from the data product number, provides you with important information relevant to the data product and the specific instance of downloading the data. Here you can find manual flagging notes for all sites, locations, and time periods.
- sensor_positions_xxxxx: this file contains information about the coordinates of each sensor, relative to a reference location (we take a quick look at it in the sketch after this list).
- variables_xxxxx: this file contains all the variables found in the associated data table(s). This includes full definitions, units, and other important information.
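For a quick look at the sensor position metadata mentioned above, a minimal example using base R:
# view the first rows of the sensor positions file
head(sensor_positions_20288)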
Let's do the same for the surface water nitrate and elevation of surface water data products:
list2env(nsw, .GlobalEnv)
## <environment: R_GlobalEnv>
list2env(eos, .GlobalEnv)
## <environment: R_GlobalEnv>
Note that a few more objects were added to the Global Environment, including:
- NSW_15_minute
- EOS_5_min
- EOS_30_min
The 15_minute in the name indicates the time-averaging interval of the dataset. Other examples may include 5_min and 30_min in the same data product, such as elevation of surface water (DP1.20016.001). If only one time-averaging interval interests you, you may specify it when downloading the data when calling neonUtilities::loadByProduct().
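For example, a minimal sketch of using the avg input described earlier to download only the 30-minute averaging interval of elevation of surface water (the object name eos30 is just an example):
# download only the 30-minute averages
eos30 <- loadByProduct(dpID="DP1.20016.001", site="PRIN",
                       startdate="2020-02", enddate="2020-02",
                       avg=30, check.size=F)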
Data from Different Sensor Locations (HOR)
NEON often collects the same type of data from sensors in different locations. These data are delivered together, but you will frequently want to plot the data separately or only include data from one sensor in your analysis. NEON uses the horizontalPosition variable in the data tables to describe which sensor the data were collected from. The horizontalPosition is always a three-digit number for AIS data. Non-shoreline HOR examples as of 2020 at AIS sites include:
- 101: stream sensors located at the upstream station on a monopod mount,
- 111: stream sensors located at the upstream station on an overhead cable mount,
- 131: stream sensors located at the upstream station on a stand alone pressure transducer mount,
- 102: stream sensors located at the downstream station on a monopod mount,
- 112: stream sensors located at the downstream station on an overhead cable mount
- 132: stream sensors located at the downstream station on a stand alone pressure transducer mount,
- 110: pressure transducers mounted to a staff gauge.
- 103: sensors mounted on buoys in lakes or rivers
- 130 and 140: sensors mounted in the littoral zone of lakes
You'll frequently want to know which sensor locations are represented in your data. We can do this by looking for the unique() position designations in horizontalPosition.
# which sensor locations exist for water quality, DP1.20288.001?
print("Water quality horizontal positions:")
## [1] "Water quality horizontal positions:"
unique(waq_instantaneous$horizontalPosition)
## [1] "101" "102"
We can see that there are two water quality sensor positions at PRIN in February 2020. As the locations of sensors can change at sites over time (especially with aquatic sensors as AIS sites undergo redesigns) it is a good idea to check horizontal positions when you're adding in new locations or a new date range to your analyses.
Let's check the HOR locations for surface water nitrate and elevation too:
# which sensor locations exist for other data products?
print("Nitrate in Surface Water horizontal positions: ")
## [1] "Nitrate in Surface Water horizontal positions: "
unique(NSW_15_minute$horizontalPosition)
## [1] "102"
print("Elevation of Surface Water horizontal positions: ")
## [1] "Elevation of Surface Water horizontal positions: "
unique(EOS_30_min$horizontalPosition)
## [1] "110" "132"
Now we can use this information to split water quality data into the two different sensor set locations: upstream and the downstream.
# Split data into separate dataframes by upstream/downstream locations.
waq_up <-
waq_instantaneous[(waq_instantaneous$horizontalPosition=="101"),]
waq_down <-
waq_instantaneous[(waq_instantaneous$horizontalPosition=="102"),]
# Note: The surface water nitrate sensor is only stationed at one location.
eos_up <- EOS_30_min[(EOS_30_min$horizontalPosition=="110"),]
eos_down <- EOS_30_min[(EOS_30_min$horizontalPosition=="132"),]
Plot Data
Now that we have our data separated into the upstream and downstream data, let's plot both of the data sets together. We want to create a plot of the measures of Dissolved Oxygen from the two different sensors.
First, let's identify the column names important for plotting - time and dissolved oxygen data:
# One option is to view column names in the data frame
colnames(waq_instantaneous)
## [1] "domainID" "siteID"
## [3] "horizontalPosition" "verticalPosition"
## [5] "startDateTime" "endDateTime"
## [7] "sensorDepth" "sensorDepthExpUncert"
## [9] "sensorDepthRangeQF" "sensorDepthNullQF"
## [11] "sensorDepthGapQF" "sensorDepthValidCalQF"
## [13] "sensorDepthSuspectCalQF" "sensorDepthPersistQF"
## [15] "sensorDepthAlphaQF" "sensorDepthBetaQF"
## [17] "sensorDepthFinalQF" "sensorDepthFinalQFSciRvw"
## [19] "specificConductance" "specificConductanceExpUncert"
## [21] "specificConductanceRangeQF" "specificConductanceStepQF"
## [23] "specificConductanceNullQF" "specificConductanceGapQF"
## [25] "specificConductanceSpikeQF" "specificConductanceValidCalQF"
## [27] "specificCondSuspectCalQF" "specificConductancePersistQF"
## [29] "specificConductanceAlphaQF" "specificConductanceBetaQF"
## [31] "specificCondFinalQF" "specificCondFinalQFSciRvw"
## [33] "dissolvedOxygen" "dissolvedOxygenExpUncert"
## [35] "dissolvedOxygenRangeQF" "dissolvedOxygenStepQF"
## [37] "dissolvedOxygenNullQF" "dissolvedOxygenGapQF"
## [39] "dissolvedOxygenSpikeQF" "dissolvedOxygenValidCalQF"
## [41] "dissolvedOxygenSuspectCalQF" "dissolvedOxygenPersistenceQF"
## [43] "dissolvedOxygenAlphaQF" "dissolvedOxygenBetaQF"
## [45] "dissolvedOxygenFinalQF" "dissolvedOxygenFinalQFSciRvw"
## [47] "dissolvedOxygenSaturation" "dissolvedOxygenSatExpUncert"
## [49] "dissolvedOxygenSatRangeQF" "dissolvedOxygenSatStepQF"
## [51] "dissolvedOxygenSatNullQF" "dissolvedOxygenSatGapQF"
## [53] "dissolvedOxygenSatSpikeQF" "dissolvedOxygenSatValidCalQF"
## [55] "dissOxygenSatSuspectCalQF" "dissolvedOxygenSatPersistQF"
## [57] "dissolvedOxygenSatAlphaQF" "dissolvedOxygenSatBetaQF"
## [59] "dissolvedOxygenSatFinalQF" "dissolvedOxygenSatFinalQFSciRvw"
## [61] "pH" "pHExpUncert"
## [63] "pHRangeQF" "pHStepQF"
## [65] "pHNullQF" "pHGapQF"
## [67] "pHSpikeQF" "pHValidCalQF"
## [69] "pHSuspectCalQF" "pHPersistenceQF"
## [71] "pHAlphaQF" "pHBetaQF"
## [73] "pHFinalQF" "pHFinalQFSciRvw"
## [75] "chlorophyll" "chlorophyllExpUncert"
## [77] "chlorophyllRangeQF" "chlorophyllStepQF"
## [79] "chlorophyllNullQF" "chlorophyllGapQF"
## [81] "chlorophyllSpikeQF" "chlorophyllValidCalQF"
## [83] "chlorophyllSuspectCalQF" "chlorophyllPersistenceQF"
## [85] "chlorophyllAlphaQF" "chlorophyllBetaQF"
## [87] "chlorophyllFinalQF" "chlorophyllFinalQFSciRvw"
## [89] "turbidity" "turbidityExpUncert"
## [91] "turbidityRangeQF" "turbidityStepQF"
## [93] "turbidityNullQF" "turbidityGapQF"
## [95] "turbiditySpikeQF" "turbidityValidCalQF"
## [97] "turbiditySuspectCalQF" "turbidityPersistenceQF"
## [99] "turbidityAlphaQF" "turbidityBetaQF"
## [101] "turbidityFinalQF" "turbidityFinalQFSciRvw"
## [103] "fDOM" "rawCalibratedfDOM"
## [105] "fDOMExpUncert" "fDOMRangeQF"
## [107] "fDOMStepQF" "fDOMNullQF"
## [109] "fDOMGapQF" "fDOMSpikeQF"
## [111] "fDOMValidCalQF" "fDOMSuspectCalQF"
## [113] "fDOMPersistenceQF" "fDOMAlphaQF"
## [115] "fDOMBetaQF" "fDOMTempQF"
## [117] "fDOMAbsQF" "fDOMFinalQF"
## [119] "fDOMFinalQFSciRvw" "buoyNAFlag"
## [121] "spectrumCount" "publicationDate"
## [123] "release"
# Alternatively, view the variables object corresponding to the data product for more information
View(variables_20288)
Quite a few columns in the water quality data product!
The time column we'll consider for instrumented systems is endDateTime, because it approximately represents data within the interval on or before the endDateTime time stamp. Timestamp column choice matters for time-aggregated datasets, but should not matter for instantaneous data such as water quality. When interpreting data, keep in mind NEON timestamps are always in UTC.
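As a minimal sketch, one way to display a UTC timestamp in local time (PRIN is in the US Central time zone) using base R:
# show the first downstream timestamp in local (US Central) time
format(waq_down$endDateTime[1], tz="America/Chicago", usetz=TRUE)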
The data column we would like to plot is labeled dissolvedOxygen.
# plot
wqual <- ggplot() +
geom_line(data = waq_up,
aes(endDateTime, dissolvedOxygen,color="a"),
na.rm=TRUE ) +
geom_line(data = waq_down,
aes(endDateTime, dissolvedOxygen, color="b"),
na.rm=TRUE) +
geom_line(na.rm = TRUE) +
ylim(0, 20) + ylab("Dissolved Oxygen (mg/L)") +
xlab(" ") +
scale_color_manual(values = c("blue","red"),
labels = c("upstream","downstream")) +
labs(colour = "") + # Remove legend title
theme(legend.position = "top") +
ggtitle("PRIN Upstream and Downstream DO")
wqual
Now let's try plotting fDOM. fDOM is only measured at the downstream location. NEON also provides uncertainty values for each measurement. Let's also consider measurement uncertainty in the plot.
The data columns we would like to plot are labeled fDOM and fDOMExpUncert.
# plot
fdomUcert <- ggplot() +
geom_line(data = waq_down,
aes(endDateTime, fDOM),
na.rm=TRUE, color="orange") +
geom_ribbon(data=waq_down,
aes(x=endDateTime,
ymin = (fDOM - fDOMExpUncert),
ymax = (fDOM + fDOMExpUncert)),
alpha = 0.4, fill = "grey75") +
geom_line( na.rm = TRUE) +
ylim(0, 200) + ylab("fDOM (QSU)") +
xlab(" ") +
ggtitle("PRIN Downstream fDOM with Expected Uncertainty Bounds")
fdomUcert
Using what you've learned above, identify horizontal positions and column names for nitrate in surface water.
# recall dataframes created in list2env() command, including NSW_15_minute
# which sensor locations?
unique(NSW_15_minute$horizontalPosition)
# what is the column name of the data stream of interest?
names(NSW_15_minute)
Using what you've learned above, plot nitrate in surface water.
# plot
plot_NSW <- ggplot(data = NSW_15_minute,
aes(endDateTime, surfWaterNitrateMean)) +
geom_line(na.rm=TRUE, color="blue") +
ylab("NO3-N (uM)") + xlab(" ") +
ggtitle("PRIN Downstream Nitrate in Surface Water")
plot_NSW
Examine Quality Flagged Data
Data product quality flags fall under two distinct types:
- Automated quality flags, e.g. range, spike, step, null
- Manual science review quality flag
In instantaneous data such as water quality DP1.20288.001, the quality flag columns are denoted with "QF".
In time-averaged data, most quality flags have been aggregated into quality metrics, with column names denoted with "QM" representing the fraction of flagged points within the time averaging window.
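For example, a quick way to list the quality metric columns in the 30-minute elevation of surface water table we downloaded:
# quality metric (QM) columns in a time-averaged data table
names(EOS_30_min)[grep("QM", names(EOS_30_min))]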
waq_qf_names <- names(waq_down)[grep("QF", names(waq_down))]
print(paste0("Total columns in DP1.20288.001 expanded package = ",
as.character(length(waq_qf_names))))
## [1] "Total columns in DP1.20288.001 expanded package = 96"
# water quality has 96 data columns with QF in the name,
# so let's just look at those corresponding to fDOM
print("fDOM columns in DP1.20288.001 expanded package:")
## [1] "fDOM columns in DP1.20288.001 expanded package:"
print(waq_qf_names[grep("fDOM", waq_qf_names)])
## [1] "fDOMRangeQF" "fDOMStepQF" "fDOMNullQF" "fDOMGapQF"
## [5] "fDOMSpikeQF" "fDOMValidCalQF" "fDOMSuspectCalQF" "fDOMPersistenceQF"
## [9] "fDOMAlphaQF" "fDOMBetaQF" "fDOMTempQF" "fDOMAbsQF"
## [13] "fDOMFinalQF" "fDOMFinalQFSciRvw"
A quality flag (QF) of 0 indicates a pass, 1 indicates a fail, and -1 indicates a test that could not be performed. For example, a range test cannot be performed on missing measurements.
Detailed quality flag test results are all available with the package = 'expanded' setting we specified when calling neonUtilities::loadByProduct(). If we had specified package = 'basic', we wouldn't be able to investigate the detail in the type of data flag thrown. We would only see the FinalQF columns.
The AlphaQF and BetaQF fields represent aggregated results of the various QF tests, and vary by a data product's algorithm. In most cases, an observation's AlphaQF = 1 indicates that at least one QF was set to a value of 1, and an observation's BetaQF = 1 indicates that at least one QF was set to a value of -1.
Note that fDOM has a couple other data-stream specific QFs beyond the standard quality flags. These are specific to the algorithms used to correct raw fDOM readings using temperature and absorbance per Watras et al. (2011) and Downing et al. (2012).
Let's consider what types of fDOM quality flags were thrown.
waq_qf_names <- names(waq_down)[grep("QF", names(waq_down))]
print(paste0("Total QF columns: ",length(waq_qf_names)))
## [1] "Total QF columns: 96"
# water quality has 96 data columns with QF in the name,
# so let us just look at those corresponding to fDOM
fdom_qf_names <- waq_qf_names[grep("fDOM",waq_qf_names)]
for(col_nam in fdom_qf_names){
print(paste0(col_nam, " unique values: ",
paste0(unique(waq_down[,col_nam]),
collapse = ", ")))
}
## [1] "fDOMRangeQF unique values: 0, -1"
## [1] "fDOMStepQF unique values: 0, 1, -1"
## [1] "fDOMNullQF unique values: 0, 1"
## [1] "fDOMGapQF unique values: 0, 1"
## [1] "fDOMSpikeQF unique values: 0, -1, 1"
## [1] "fDOMValidCalQF unique values: 0"
## [1] "fDOMSuspectCalQF unique values: 0"
## [1] "fDOMPersistenceQF unique values: 0"
## [1] "fDOMAlphaQF unique values: 0, 1"
## [1] "fDOMBetaQF unique values: 0, 1"
## [1] "fDOMTempQF unique values: 0, 1, -1"
## [1] "fDOMAbsQF unique values: 0, -1, 1, 2"
## [1] "fDOMFinalQF unique values: 0, 1"
## [1] "fDOMFinalQFSciRvw unique values: NA"
QF values generally mean the following:
- 0: Quality test passed
- 1: Quality test failed
- -1: Quality test could not be run
- 2: A special case for fDOMAbsQF
So what does fDOMAbsQF = 2 mean? The data product's variable descriptions may provide us some clues.
Recall we previously viewed the water quality variables object that comes with every NEON data download. Now let's print the description corresponding to the fDOMAbsQF field name.
print(variables_20288$description[which(variables_20288$fieldName == "fDOMAbsQF")])
## [1] "Quality flag indicating that fDOM absorbance corrections were applied = 0; unable to be applied = 1; absorbance values were high = 2; calculated correction factor was 1 (i.e. no absorbance correction was made) = 3"
So whenever fDOMAbsQF = 2, the absorbance values coming from the SUNA (surface water nitrate sensor) were high.
Now let's consider the total number of flags generated for each quality test:
# Loop across the fDOM QF column names.
# Within each column, count the number of rows that equal '1'.
print("FLAG TEST - COUNT")
## [1] "FLAG TEST - COUNT"
for (col_nam in fdom_qf_names){
totl_qf_in_col <- length(which(waq_down[,col_nam] == 1))
print(paste0(col_nam,": ",totl_qf_in_col))
}
## [1] "fDOMRangeQF: 0"
## [1] "fDOMStepQF: 770"
## [1] "fDOMNullQF: 233"
## [1] "fDOMGapQF: 218"
## [1] "fDOMSpikeQF: 71"
## [1] "fDOMValidCalQF: 0"
## [1] "fDOMSuspectCalQF: 0"
## [1] "fDOMPersistenceQF: 0"
## [1] "fDOMAlphaQF: 9997"
## [1] "fDOMBetaQF: 238"
## [1] "fDOMTempQF: 9"
## [1] "fDOMAbsQF: 9016"
## [1] "fDOMFinalQF: 9997"
## [1] "fDOMFinalQFSciRvw: 0"
# Let's also check out how many fDOMAbsQF = 2 exist
print(paste0("fDOMAbsQF = 2: ",
length(which(waq_down[,"fDOMAbsQF"] == 2))))
## [1] "fDOMAbsQF = 2: 210"
print(paste0("Total fDOM observations: ", nrow(waq_down) ))
## [1] "Total fDOM observations: 41769"
The output above lists the total fDOM QFs from a month of data at PRIN, as well as the total number of observation data points in the data file.
We see a notably higher quantity of fDOMAbsQF flags relative to other quality flags. Why is that? How do we know where to look?
The variables_20288 object included in the download would be a good place to start. Let's check the description for fDOMAbsQF again.
print(variables_20288[which(variables_20288$fieldName == "fDOMAbsQF"),])
## table fieldName
## 1: waq_instantaneous fDOMAbsQF
## description
## 1: Quality flag indicating that fDOM absorbance corrections were applied = 0; unable to be applied = 1; absorbance values were high = 2; calculated correction factor was 1 (i.e. no absorbance correction was made) = 3
## dataType units downloadPkg pubFormat primaryKey categoricalCodeName
## 1: signed integer <NA> expanded integer <NA> NA
So fDOMAbsQF = 1 means fDOM absorbance corrections were unable to be applied.
For specific details on the algorithms used to create a data product and its corresponding quality tests, it's best to first check the data product's Algorithm Theoretical Basis Document (ATBD). For water quality, that is NEON.DOC.004931, listed under Documentation references in the README file and on the data product's web page.
Are there any manual science review quality flags? If so, the explanation for flagging may also be viewed in the data product's README file or in the data product's web page on NEON's data portal.
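As a quick check on this download, we can look at the unique values of a science review flag column, using fDOM as an example:
# any manual science review flags for fDOM at the downstream sensor?
unique(waq_down$fDOMFinalQFSciRvw)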
Filtering (Some) Quality Flagged Observations
A simple approach to removing quality flagged observations is to remove data when the finalQF is raised. Let's view a plotting example using fDOM:
# Map QF label names for the plot for the fDOMFinalQF grouping
group_labels <- c("fDOMFinalQF = 0", "fDOMFinalQF = 1")
names(group_labels) <- c("0","1")
# Plot fDOM data, grouping by the fDOMFinalQF value
ggplot2::ggplot(data = waq_down,
aes(x = endDateTime, y = fDOM, group = fDOMFinalQF)) +
ggplot2::geom_step() +
facet_grid(fDOMFinalQF ~ .,
labeller = labeller(fDOMFinalQF = group_labels)) +
ggplot2::ggtitle("PRIN Sensor Set 102 fDOM final QF comparison")
The top panel, corresponding to fDOMFinalQF = 0, represents all fDOM data that were not flagged. Conversely, fDOMFinalQF = 1 represents all flagged fDOM data. Clearly, many spikes look like they were appropriately flagged. However, some flagged data look like they could be useful, such as the 2020 February 18-February 24 time range.
Let's inspect the quality flags during that time.
# Find row indices around February 22:
idxs_Feb22 <- base::which(waq_down$endDateTime > as.POSIXct("2020-02-22"))[1:1440]
print("FLAG TEST - COUNT")
## [1] "FLAG TEST - COUNT"
for (col_nam in fdom_qf_names){
totl_qf_in_col <- length(which(waq_down[idxs_Feb22,col_nam] == 1))
print(paste0(col_nam,": ",totl_qf_in_col))
}
## [1] "fDOMRangeQF: 0"
## [1] "fDOMStepQF: 8"
## [1] "fDOMNullQF: 0"
## [1] "fDOMGapQF: 0"
## [1] "fDOMSpikeQF: 0"
## [1] "fDOMValidCalQF: 0"
## [1] "fDOMSuspectCalQF: 0"
## [1] "fDOMPersistenceQF: 0"
## [1] "fDOMAlphaQF: 1440"
## [1] "fDOMBetaQF: 0"
## [1] "fDOMTempQF: 0"
## [1] "fDOMAbsQF: 1440"
## [1] "fDOMFinalQF: 1440"
## [1] "fDOMFinalQFSciRvw: 0"
Looks like all Feb 22, 2020 data were flagged with fDOMAbsQF, with a few step test quality flags as well.
Let's take a closer look at each fDOMAbsQF flag value by grouping the data based on each fDOMAbsQF value:
ggplot2::ggplot(data = waq_down,
aes(x = endDateTime, y = fDOM, group = fDOMAbsQF)) +
ggplot2::geom_step() +
facet_grid(fDOMAbsQF ~ .) +
ggplot2::ggtitle("PRIN Sensor Set 102 fDOMAbsQF comparison")
## Warning: Removed 233 row(s) containing missing values (geom_path).
The fDOMAbsQF = 1 flag is the most common quality flag from any single test. This means the absorbance correction could not be applied to the fDOM data. This absorbance test also causes the final quality flag test to fail, but some users may wish to ignore the absorbance quality test entirely.
Note the fDOMAbsQF = 1 time frame corresponds to the missing surface water nitrate data, as shown in the surface water nitrate plot we generated earlier.
Here is a reminder of our nitrate data:
plot_NSW
Some types of automated quality flags may be worth ignoring. Rather than use the FinalQF column to omit any quality-flagged data, let's create a custom final quality flag by ignoring the fDOMAbsQF column, allowing us to omit quality-flagged fDOM data regardless of absorbance correction status.
# Remove the absorbance and aggregated quality flag tests from list of fDOM QF tests:
fdom_qf_non_abs_names <- fdom_qf_names[which(!fdom_qf_names %in% c("fDOMAlphaQF","fDOMBetaQF","fDOMAbsQF","fDOMFinalQF"))]
# Create a custom quality flag column as the maximum QF value within each row
waq_down$aggr_non_abs_QF <- apply( waq_down[,fdom_qf_non_abs_names],1,max, na.rm = TRUE)
# The 'apply' function above allows us to avoid a for-loop and more efficiently
# iterate over each row.
# Plot fDOM data, grouping by the custom quality flag column's value
ggplot2::ggplot(data = waq_down,
aes(x = endDateTime, y = fDOM,
group = aggr_non_abs_QF)) +
ggplot2::geom_step() +
facet_grid(aggr_non_abs_QF ~ .) +
ggplot2::ggtitle("PRIN Sensor Set 102 fDOM custom QF aggregation")
Using the custom quality flag aggregation that ignored fDOMAbsQF, the aggregated aggr_non_abs_QF column we created increases the quantity of data that could be used for further analyses.
Note that the automated quality flag algorithms are not perfect, and a few suspect data points may occasionally pass the quality tests.
Data Aggregation
Sensor data users commonly wish to aggregate data such that time stamps match across two different datasets. In the following example, we will show how to combine elevation of surface water (DP1.20016.001) and water quality (DP1.20288.001) data products into a single dataframe.
Water quality is published as an instantaneous record, which should be every minute at non-buoy sites such as PRIN. We know a data product does not come from the buoy if the HOR location is different from "103". Because elevation of surface water is already aggregated to 30-minute intervals, we want to aggregate the water quality data product to 30-minute intervals as well.
At PRIN in February 2020, the elevation of surface water sensor is co-located with the water quality sonde at horizontalPosition = "102", meaning the downstream sensor set. In this lesson, let's ignore the upstream data at HOR 101 and just aggregate water quality's downstream data from HOR 102.
Data can easily be aggregated in different forms, such as the mean, min, max, and sum. In the following code chunk, we'll aggregate the data values to 30 minutes as a mean, and the finalQF values as a mean between 0 and 1. More complex functions may be needed for aggregating other types of data, such as the measurement uncertainty or special, non-binary quality flags like fDOMAbsQF.
# Recall we already created the downstream object for water quality, waq_down
# We first need to name each data stream within water quality.
# One trick is to find all the variable names by searching for "BetaQF"
waq_strm_betaqf_cols <- names(waq_down)[grep("BetaQF",names(waq_down))]
print(paste0("BetaQF column names: ",
paste0(waq_strm_betaqf_cols, collapse = ", ")))
## [1] "BetaQF column names: sensorDepthBetaQF, specificConductanceBetaQF, dissolvedOxygenBetaQF, dissolvedOxygenSatBetaQF, pHBetaQF, chlorophyllBetaQF, turbidityBetaQF, fDOMBetaQF"
# Now let's remove the BetaQF from the column name:
waq_strm_cols <- base::gsub("BetaQF","",waq_strm_betaqf_cols)
# To keep column names short, some variable names had to be shortened
# when appending "BetaQF", so let's add "uration" to "dissolvedOxygenSat"
waq_strm_cols <- base::gsub("dissolvedOxygenSat",
"dissolvedOxygenSaturation",waq_strm_cols)
print(paste0("Water quality sensor data stream names: ",
paste0(waq_strm_cols, collapse = ", ")))
## [1] "Water quality sensor data stream names: sensorDepth, specificConductance, dissolvedOxygen, dissolvedOxygenSaturation, pH, chlorophyll, turbidity, fDOM"
# We will also aggregate the final quality flags:
waq_final_qf_cols <- names(waq_down)[grep("FinalQF",names(waq_down))]
# Let's check to make sure our time column is in POSIXct format, which is
# needed if you download and read-in NEON data files without using the
# neonUtilities package.
if("POSIXct" %in% class(waq_down$endDateTime)){
print("Time column in waq_down is appropriately in POSIXct format")
} else {
print("Converting waq_down endDateTime column to POSIXct")
waq_down$endDateTime <- as.POSIXct(waq_down$endDateTime, tz = "UTC")
}
## [1] "Time column in waq_down is appropriately in POSIXct format"
Now that we have the column names of the data and quality flags we wish to aggregate, we can move on to the aggregation steps! We're going to use some more advanced features of the dplyr and padr packages. Instead of looping over each column, let's employ the dplyr pipe operator, %>%, and call a function that acts on each data column of interest, which we've determined above.
# Aggregate water quality data columns to 30 minute intervals,
# taking the mean of non-NA values within each 30-minute period.
# We explain each step in the dplyr piping operation in code
# comments:
waq_30min_down <- waq_down %>%
# pass the downstream data frame to the next function
# padr's thicken function adds a new column, roundedTime,
# that shows the closest 30 min timestamp to
# to a given observation in time
padr::thicken(interval = "30 min",
by = "endDateTime",
colname = "roundedTime",
rounding = "down") %>%
# In 1-min data, there should now be sets of 30
# corresponding to each 30-minute roundedTime
# We use dplyr to group data by unique roundedTime
# values, and summarise each 30-min group
# by the mean, for all data columns provided
# in waq_strm_cols and waq_final_qf_cols
dplyr::group_by(roundedTime) %>%
dplyr::summarise_at(vars(dplyr::all_of(c(waq_strm_cols,
waq_final_qf_cols))),
mean, na.rm = TRUE)
# Rather than binary values, quality flags are more like "quality
# metrics", defining the fraction of data flagged within an
# aggregation interval.
We now have a new dataframe of water quality data and associated final quality flags aggregated to 30-minute time intervals. The downstream water quality data may now be easily combined with the nearby, albeit not exactly co-located, downstream 30-minute averaged elevation of surface water data.
The following code chunk merges the data:
# We have to specify the matching column from each dataframe
all_30min_data_down <- base::merge(x = waq_30min_down,
y = eos_down,
by.x = "roundedTime",
by.y = "endDateTime")
# Let's take a peek at the combined data frame's column names:
colnames(all_30min_data_down)
## [1] "roundedTime" "sensorDepth"
## [3] "specificConductance" "dissolvedOxygen"
## [5] "dissolvedOxygenSaturation" "pH"
## [7] "chlorophyll" "turbidity"
## [9] "fDOM" "sensorDepthFinalQF"
## [11] "sensorDepthFinalQFSciRvw" "specificCondFinalQF"
## [13] "specificCondFinalQFSciRvw" "dissolvedOxygenFinalQF"
## [15] "dissolvedOxygenFinalQFSciRvw" "dissolvedOxygenSatFinalQF"
## [17] "dissolvedOxygenSatFinalQFSciRvw" "pHFinalQF"
## [19] "pHFinalQFSciRvw" "chlorophyllFinalQF"
## [21] "chlorophyllFinalQFSciRvw" "turbidityFinalQF"
## [23] "turbidityFinalQFSciRvw" "fDOMFinalQF"
## [25] "fDOMFinalQFSciRvw" "domainID"
## [27] "siteID" "horizontalPosition"
## [29] "verticalPosition" "startDateTime"
## [31] "surfacewaterElevMean" "surfacewaterElevMinimum"
## [33] "surfacewaterElevMaximum" "surfacewaterElevVariance"
## [35] "surfacewaterElevNumPts" "surfacewaterElevExpUncert"
## [37] "surfacewaterElevStdErMean" "sWatElevRangeFailQM"
## [39] "sWatElevRangePassQM" "sWatElevRangeNAQM"
## [41] "sWatElevPersistenceFailQM" "sWatElevPersistencePassQM"
## [43] "sWatElevPersistenceNAQM" "sWatElevStepFailQM"
## [45] "sWatElevStepPassQM" "sWatElevStepNAQM"
## [47] "sWatElevNullFailQM" "sWatElevNullPassQM"
## [49] "sWatElevNullNAQM" "sWatElevGapFailQM"
## [51] "sWatElevGapPassQM" "sWatElevGapNAQM"
## [53] "sWatElevSpikeFailQM" "sWatElevSpikePassQM"
## [55] "sWatElevSpikeNAQM" "validCalFailQM"
## [57] "validCalPassQM" "validCalNAQM"
## [59] "sWatElevAlphaQM" "sWatElevBetaQM"
## [61] "sWatElevFinalQF" "sWatElevFinalQFSciRvw"
## [63] "publicationDate" "release"
We now have matching time stamps for water quality and any other 30-minute
averaged data product, such as elevation of surface water. The merged data
frame facilitates direct comparison across different sensors.
Let's take a look with a plot of specific conductance versus water surface elevation:
ggplot(data = all_30min_data_down,
aes(x = surfacewaterElevMean, y = specificConductance)) +
geom_point() +
ggtitle("PRIN specific conductance vs. surface water elevation") +
xlab("Elevation [m ASL]") +
ylab("Specific conductance [uS/cm]")
## Warning: Removed 5 rows containing missing values (geom_point).
Aggregating high frequency time series data is a useful tool for understanding relationships between variables collected at different time intervals, and may also be a required format for model inputs.
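For example, a minimal sketch quantifying the relationship plotted above as a Pearson correlation, ignoring missing values:
# correlation between surface water elevation and specific conductance
cor(all_30min_data_down$surfacewaterElevMean,
    all_30min_data_down$specificConductance,
    use="complete.obs")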
Now that you have the basic tools and knowledge on how to read and wrangle NEON AIS data, go have fun working on your scientific questions!
Citations
Watras, C. J., Hanson, P. C., Stacy, T. L., Morrison, K. M., Mather, J., Hu, Y. H., & Milewski, P. (2011). A temperature compensation method for CDOM fluorescence sensors in freshwater. Limnology and Oceanography: Methods, 9(7), 296-301.
Downing, B. D., Pellerin, B. A., Bergamaschi, B. A., Saraceno, J. F., & Kraus, T. E. (2012). Seeing the light: The effects of particles, dissolved materials, and temperature on in situ measurements of DOM fluorescence in rivers and streams. Limnology and Oceanography: Methods, 10(10), 767-775.
Using the neonOS Package to Check for Duplicates and Join Tables
Authors: Claire K. Lunch
Last Updated: Oct 3, 2022
NEON observational data are diverse in both content and in data structure. Generally, data are published in a set of data tables, each of which corresponds to a particular activity: for example, several data products include a field collection data table, a sample processing data table, and a table of laboratory analytical results.
Joining data tables
Depending on the analysis you want to carry out, there may be data in multiple tables that you want to bring together. For example, it is very common that species identifications are in a different table from chemical composition or pathogen status. For species-specific analyses, you would need to draw on multiple tables. There are a variety of ways to do this, but one of the simplest is to join the tables and create a single flat table.
The Quick Start Guides and Data Product User Guides provide information about the relationships between different data tables, and we recommend you consult these documents for the data products you are working with to gain the full picture. Quick Start Guides (QSGs) are included in the download package when you download data via the Data Portal, and can also be viewed on the Data Product Details page for each data product; for example, see the page for Plant foliar traits. Scroll down to the Documentation section to see the QSG. The Data Product User Guide is also available in the Documentation menu.
To join related tables, the neonOS package provides the joinTableNEON() function, which checks the Table joining section of the QSG, and uses the information there to join the tables if possible. If the join isn't possible, or if it requires customized code, the function will return an informative error, and usually refer you to the QSG for more details.
Duplicates
One of the most common data entry errors that occurs in NEON OS data is
duplicate entry. NEON data entry applications and ingest validation rules
are designed to prevent duplicate entry where possible, but errors can't
be avoided completely. Consequently, NEON metadata for each OS data product (the variables file) includes an indicator of which data fields, taken together, should define a unique record. This combination of fields is called the "primary key" for the data table. The neonOS function removeDups() uses these metadata to identify duplicate records. Depending on the content of the duplicate records, they may be resolved to a single record or marked as unresolvable - see below for details.
In this tutorial, we will de-duplicate and then join two data tables in the Aquatic plant bryophyte chemical properties (DP1.20063.001) data product, using it as an example to demonstrate the operation of these two functions. Then we will take a look at Mosquitoes sampled from CO2 traps (DP1.10043.001), which contains more complicated table relationships, and see how to modify the code to work with those tables.
Objectives
After completing this activity, you will be able to:
- Identify and resolve duplicate data using the neonOS package
- Join NEON data tables using the neonOS package
Things You’ll Need To Complete This Tutorial
You will need a version of R (4.0 or higher; this code may work with earlier versions but hasn't been tested) and, preferably, RStudio loaded on your computer to complete this tutorial.
Install R Packages
- neonUtilities: install.packages("neonUtilities")
- neonOS: install.packages("neonOS")
Additional Resources
- NEON Data Portal
- NEON Code Hub
Set Up R Environment and Download Data
First install and load the necessary packages.
# install packages. you can skip this step if
# the packages are already installed
install.packages("neonUtilities")
install.packages("neonOS")
install.packages("ggplot2")
# load packages
library(neonUtilities)
library(neonOS)
library(ggplot2)
We'll use aquatic plant chemistry (DP1.20063.001) as the example dataset. Use
the neonUtilities
function loadByProduct()
to download and read in the
data. If you're not familiar with the neonUtilities
package and how to use
it to access NEON data, we recommend you follow the Download and Explore NEON Data
tutorial before proceeding with this one.
Here, we'll use the same subset of aquatic plant chemistry data as in the Download and Explore tutorial. Get the data from the Prairie Lake, Suggs Lake, and Toolik Lake sites, specifying the 2022 data release.
apchem <- loadByProduct(dpID="DP1.20063.001",
site=c("PRLA","SUGG","TOOK"),
package="expanded",
release="RELEASE-2022",
check.size=F)
Copy each of the downloaded tables into the R environment.
list2env(apchem, .GlobalEnv)
Identify and Resolve Duplicates
As noted above, duplicate data entry is a common error in human
data entry. removeDups()
uses the definitional metadata in the
variables file to identify duplicates. It requires two inputs:
the data table, and the corresponding variables file.
apl_biomass <- removeDups(data=apl_biomass,
variables=variables_20063)
## No duplicated key values found!
There were no duplicates found in the apl_biomass
data table.
A duplicateRecordQF
quality flag has been added to the table,
and you can confirm that there were no duplicates by checking
the values of the flag.
unique(apl_biomass$duplicateRecordQF)
## [1] 0
All data have flag value = 0, indicating they are not duplicated.
Let's check the apl_plantExternalLabDataPerSample
table.
apl_plantExternalLabDataPerSample <- removeDups(
data=apl_plantExternalLabDataPerSample,
variables=variables_20063)
10 duplicated key values found, representing 20 non-unique records. Attempting to resolve.
|==========================================================================================| 100%
2 resolveable duplicates merged into matching records
2 resolved records flagged with duplicateRecordQF=1
16 unresolveable duplicates flagged with duplicateRecordQF=2
The function output tells you there were 10 duplicated key values, representing 20 non-unique records. Four of those records were resolvable and were merged, leaving two records flagged as resolved duplicates, with duplicateRecordQF=1. The remaining 16 records couldn't be resolved, and were flagged with duplicateRecordQF=2.
What does it mean for duplicates to be resolved? Some duplicates represent
identical data that have been entered multiple times, whereas some duplicates
have the same values in the primary key, but differ in data values.
removeDups()
has a fairly narrow set of criteria for resolving to a single
record:
- If one data record has empty fields that are populated in the other record, the non-empty values are retained.
- If records are identical except for uid, remarks, and/or personnel (identifiedBy, recordedBy, etc.) fields, unique values are concatenated within the non-matching fields, separated by | (pipes).
Records that can be merged to a single record by these criteria are flagged with duplicateRecordQF=1. Records with mismatched data that can't be merged are retained as-is and flagged with duplicateRecordQF=2.
Note that even in fully identical duplicates, the uid
field (universal
identifier) will always be unique. Thus the uid
field in merged records
will always contain the pipe-delimited set of uids of the original records.
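If you later need to trace a merged record back to its source records, you can split the concatenated uid field on the pipe character. A minimal sketch in base R (the uid value below is a placeholder, not a real identifier):
# split a merged uid back into the original uids
merged.uid <- "uid-one|uid-two"   # placeholder value for illustration
strsplit(merged.uid, split="|", fixed=TRUE)[[1]]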
What does this look like in practice? Let's look at the two resolved duplicates:
apl_plantExternalLabDataPerSample[which(
apl_plantExternalLabDataPerSample$duplicateRecordQF==1),]
## uid domainID siteID
## 55 cca9850e-165d-4cb7-9872-eff490c79ffa|1f5a5389-c18c-4fbf-81d6-9cd45e65f48a D03 SUGG
## 60 e6cab47f-427b-47a9-a281-cbfe7c101fc3|368e16bb-23d4-4596-b624-b338777bc9bc D03 SUGG
## namedLocation collectDate sampleID sampleCondition
## 55 SUGG.AOS.reach 2016-02-22 SUGG.20160222.UTFO.Q5 condition ok
## 60 SUGG.AOS.reach 2016-02-22 SUGG.20160222.UTFO.Q5 condition ok
## laboratoryName analysisDate analyzedBy sampleType replicate
## 55 Academy of Natural Sciences of Drexel University 2016-07-12 OKgcKejxXbI= CN 1
## 60 Academy of Natural Sciences of Drexel University 2016-07-12 OKgcKejxXbI= CN 1
## sampleVolumeFiltered filterSize percentFilterAnalyzed analyte analyteConcentration plantAlgaeLabUnits
## 55 NA 25 100 carbon 34.01 percent
## 60 NA 25 100 nitrogen 2.84 percent
## method externalRemarks publicationDate release duplicateRecordQF
## 55 <NA> <NA> 20211221T225348Z RELEASE-2022 1
## 60 <NA> <NA> 20211221T225348Z RELEASE-2022 1
You can see that both records have two pipe-delimited uids, and are flagged.
Let's look at the unresolvable duplicates:
apl_plantExternalLabDataPerSample[which(
apl_plantExternalLabDataPerSample$duplicateRecordQF==2),]
## uid domainID siteID namedLocation collectDate sampleID
## 1 5fa69a8b-d19e-40f1-84a8-e46ed08cdc59 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.1
## 2 cebefd95-f4c7-4e60-9b79-93efd3f27691 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.3
## 3 a4b6d931-b093-419e-bb69-0b684219405e D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.1
## 4 f462390b-2d4a-4aec-92e1-b46a740ec40d D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.7
## 5 d4fa7f65-6b2b-46fa-9bca-529745970c56 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.7
## 6 ab43c466-b235-4f3d-9146-a1f81a91012d D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.3
## 7 1140a4c2-139e-40dc-a14d-c2446e3a5664 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.9
## 8 5201dc04-da3d-4bc2-b2bf-668ad4a3ddb5 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.7
## 9 1e36cdeb-4e94-431d-8342-5ce5ff6ac6ae D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.9
## 10 b98069a4-74fb-40d8-8b98-7a5aedea7912 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.9
## 11 9633d367-bd10-4588-a6e2-cbe954327bf6 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.9
## 12 309ccbe3-d4a8-4bd3-b773-85aa98dd7627 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.1
## 13 6d6b1da2-049f-439c-81c9-add482f6fab8 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.1
## 14 841e51f4-dcc9-45fc-937b-871678ff16f2 D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.3
## 15 3e04c3a1-7e2f-4349-a1c2-822d736fc2cf D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.NULU.7
## 16 431a4418-fdd0-4566-8791-d16ecfce76eb D03 SUGG SUGG.AOS.reach 2014-07-09 SUGG.20140709.LISP2.3
## sampleCondition laboratoryName analysisDate analyzedBy sampleType
## 1 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 2 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 3 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 4 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 5 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 6 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 7 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 8 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 9 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 10 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 11 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 12 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 13 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 14 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 15 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## 16 condition ok Academy of Natural Sciences of Drexel University 2015-02-05 OKgcKejxXbI= CN
## replicate sampleVolumeFiltered filterSize percentFilterAnalyzed analyte analyteConcentration
## 1 1 NA 25 100 carbon 38.22
## 2 1 NA 25 100 nitrogen 2.26
## 3 1 NA 25 100 nitrogen 1.98
## 4 1 NA 25 100 carbon 43.79
## 5 1 NA 25 100 nitrogen 3.16
## 6 1 NA 25 100 carbon 39.77
## 7 1 NA 25 100 carbon 43.26
## 8 1 NA 25 100 nitrogen 3.14
## 9 1 NA 25 100 nitrogen 2.94
## 10 1 NA 25 100 carbon 43.20
## 11 1 NA 25 100 nitrogen 3.00
## 12 1 NA 25 100 carbon 38.23
## 13 1 NA 25 100 nitrogen 2.02
## 14 1 NA 25 100 nitrogen 2.23
## 15 1 NA 25 100 carbon 43.93
## 16 1 NA 25 100 carbon 39.66
## plantAlgaeLabUnits method externalRemarks publicationDate release duplicateRecordQF
## 1 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 2 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 3 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 4 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 5 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 6 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 7 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 8 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 9 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 10 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 11 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 12 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 13 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 14 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
## 15 percent <NA> Nuphar Luteum 20220110T211020Z RELEASE-2022 2
## 16 percent <NA> Limnobium spongia 20220110T211020Z RELEASE-2022 2
The key for this data table is the sample identifier and analyte, and here there are multiple records with the same sample identifier, both for carbon and nitrogen values. The most likely scenario is that these are unlabeled replicate analyses, i.e., the lab ran multiple analyses on the same samples for quality control purposes, but forgot to label them accordingly.
Now, how should you proceed? Of course, that is up to you, and depends on your analysis. Because these appear to be unlabeled analytical replicates, I would probably average the analyte values, but a decision like this can't be made automatically - removeDups() can identify the records of concern, but what to do with them is a judgement call.
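If you decide averaging is appropriate for your analysis, a minimal sketch in base R might look like the following; this is one possible approach, not part of the neonOS workflow, and the column names are taken from the flagged records shown above.
# average analyte concentrations within each sampleID and analyte,
# using only the records flagged as unresolvable duplicates
dup.recs <- apl_plantExternalLabDataPerSample[
  which(apl_plantExternalLabDataPerSample$duplicateRecordQF==2),]
dup.means <- aggregate(analyteConcentration ~ sampleID + analyte,
                       data=dup.recs, FUN=mean)
head(dup.means)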
Of course, NEON scientists also review NEON data and identify duplicates as part of quality assurance and quality control procedures, and resolve them if possible. In the data download step above, we accessed RELEASE-2022. The data release is stable and reproducible over time, but duplicates you find in one release may be resolved in future releases.
Join Data Tables
If you are using neonOS
to check for duplicates and also to join data
tables, the duplicate step should always come first. Because duplicate
identification uses the variables file to determine uniqueness of data
records, the duplicate check step requires that the data match the
variables file exactly, which they won't after being joined.
To join the apl_biomass
and apl_plantExternalLabDataPerSample
tables,
we input both tables to the joinTableNEON()
function. It uses the
information provided in NEON quick start guides to determine whether the
join is possible, and if it is, which fields to use to perform the join.
aqbc <- joinTableNEON(apl_biomass,
apl_plantExternalLabDataPerSample)
After joining tables, always take a look at the resulting table and make sure it makes sense. Errors in joining can easily result in completely nonsensical data. If you're not familiar with table joining operations, check out a lesson on the basics. The chapter on relational data in R for Data Science is a good one.
When checking your results, keep in mind that the default behavior of
joinTableNEON()
is a full join, i.e., all records from both original
tables are retained, even if they don't match. For a small number of
table pairs, the Quick Start Guide specifies a left join, and in those
cases joinTableNEON()
performs a left join.
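If you're not sure how these join types differ, here is a small self-contained illustration using base R's merge() on toy data frames (not NEON tables):
# toy data frames to illustrate join behavior
x <- data.frame(id=c(1,2,3), a=c("x1","x2","x3"))
y <- data.frame(id=c(2,3,4), b=c("y2","y3","y4"))
merge(x, y, by="id", all=TRUE)    # full join: all four ids retained
merge(x, y, by="id", all.x=TRUE)  # left join: only ids present in x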
Let's take a look at the aquatic plant table join:
nrow(apl_biomass)
## [1] 268
nrow(apl_plantExternalLabDataPerSample)
## [1] 572
nrow(aqbc)
## [1] 661
The number of rows in the joined table is larger than both of the original tables, but smaller than the sum of the original two. This suggests that most of the records in each of the original tables had a matching record in the other table, but some didn't.
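One quick way to see where the unmatched records are is to count missing values by column in the joined table; fields that come from only one of the original tables will be NA in the unmatched rows. This is a rough check, and the exact counts will depend on your download:
# columns with the most NA values, i.e. fields affected by unmatched rows
sort(colSums(is.na(aqbc)), decreasing=TRUE)[1:10]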
View the full table:
View(aqbc)
(Table not displayed here due to size)
Here we can see there are several rows for most chemSubsampleIDs. Each row corresponds to one of the chemical analytes, and the biomass data are repeated on each row. At the bottom of the table are a number of biomass records with no corresponding chemistry data; these explain why the merged table is larger than the original chemistry table.
This table structure is consistent with the original tables and with the intended join, so we're satisfied all is well. If you're working with a different data product and encounter something unexpected or undesirable in a joined table, contact NEON using the Contact Us page.
We can now connect chemical content to taxon, as in the Download and Explore NEON Data tutorial. Let's look at nitrogen content by species and site:
gg <- ggplot(subset(aqbc, analyte=="nitrogen"),
aes(scientificName, analyteConcentration,
group=scientificName,
color=scientificName)) +
geom_boxplot() +
facet_wrap(~siteID) +
theme(axis.text.x=element_text(angle=90,
hjust=1,
size=4)) +
theme(legend.position="none") +
ylab("Nitrogen (%)") +
xlab("Scientific name")
gg
Other Input Options
After downloading the data above, we ran list2env()
to make each table
an independent object in the R environment, and then provided the tables
to the two functions as-is. This worked because the names of the objects
were identical to the table names, so the functions were able to figure
out which tables they were. If the object names are not exactly equal to
the table names, you will need to input the table names separately. If
we hadn't used list2env()
, this is how we would proceed:
bio.dup <- removeDups(data=apchem$apl_biomass,
variables=apchem$variables_20063,
table="apl_biomass")
chem.dup <- removeDups(data=apchem$apl_plantExternalLabDataPerSample,
variables=apchem$variables_20063,
table="apl_plantExternalLabDataPerSample")
aq.join <- joinTableNEON(table1=bio.dup,
table2=chem.dup,
name1="apl_biomass",
name2="apl_plantExternalLabDataPerSample")
More Complicated Table Joins
In the aquatic plant chemistry example, we were able to join two tables
in a single step. In some cases, the relationship between tables is
more complicated, and joining is more difficult. In these cases,
joinTableNEON()
will provide an error message, and usually direct
you to the Quick Start Guide for more information. Let's walk through
how you can use this information, using Mosquitoes sampled from CO2 traps
(DP1.10043.001) from Toolik Lake as an example.
mos <- loadByProduct(dpID="DP1.10043.001",
site="TOOL",
release="RELEASE-2022",
check.size=F)
list2env(mos, .GlobalEnv)
Let's say we're interested in evaluating which mosquito species are found in
which vegetation types. The species identifications are in the
mos_expertTaxonomistIDProcessed
table, and the trapping conditions are in
the mos_trapping
table. So we'll attempt to join those two tables.
mos.sp <- joinTableNEON(mos_trapping,
mos_expertTaxonomistIDProcessed)
Error in joinTableNEON(mos_trapping, mos_expertTaxonomistIDProcessed) :
Tables mos_trapping and mos_expertTaxonomistIDProcessed can't be joined directly, but can each be joined to a common table(s). Consult quick start guide for details.
The function returns an error, telling us it can't perform a simple join on the two tables, but there is a table they can each join to, which can be used to join them indirectly. As directed, we refer to the Quick Start Guide for DP1.10043.001, on the Data Product Details page.
From the QSG, we learn that mos_sorting is the intermediate table between mos_trapping and mos_expertTaxonomistIDProcessed.
First, let's join mos_trapping and mos_sorting:
mos.trap <- joinTableNEON(mos_trapping,
mos_sorting)
Now, this next step is a bit odd. We've created a merged table of mos_trapping and mos_sorting, but we know mos_expertTaxonomistIDProcessed can only join to mos_sorting. So we pass the merged table to joinTableNEON(), telling the function to use the join instructions for mos_sorting and mos_expertTaxonomistIDProcessed.
mos.tax <- joinTableNEON(mos.trap,
mos_expertTaxonomistIDProcessed,
name1="mos_sorting")
When you join data in this way, check carefully that the resulting table is structured logically and contains the data you expect it to. Looking at the merged table, we now have multiple records for each trapping event, with one record for each species captured in that event, plus a set of records for traps that either caught no mosquitoes or couldn't be deployed, and thus have no species identifications. This is consistent with the table join we performed, so everything appears to be correct.
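As with the aquatic plant example, comparing row counts before and after joining is a simple sanity check; the exact numbers will depend on the release you downloaded:
# compare the sizes of the original and joined tables
nrow(mos_trapping)
nrow(mos_expertTaxonomistIDProcessed)
nrow(mos.trap)
nrow(mos.tax)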
Let's take a look at species occurrence by habitat, as we set out to do:
gg <- ggplot(mos.tax,
aes(scientificName, individualCount,
group=scientificName,
color=scientificName)) +
geom_boxplot() +
facet_wrap(~nlcdClass) +
theme(axis.text.x=element_blank()) +
ylab("Count") +
xlab("Scientific name")
gg
## Warning: Removed 599 rows containing non-finite values (stat_boxplot).
Get Lesson Code
Explore and work with NEON biodiversity data from aquatic ecosystems
Authors: Eric R. Sokol
Last Updated: May 5, 2022
Learning Objectives
After completing this tutorial you will be able to:
- Download NEON macroinvertebrate data.
- Organize those data into long and wide tables.
- Calculate alpha, beta, and gamma diversity following Jost (2007).
Things You’ll Need To Complete This Tutorial
R Programming Language
You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.
R Packages to Install
Prior to starting the tutorial ensure that the following packages are installed.
- tidyverse: install.packages("tidyverse")
- neonUtilities: install.packages("neonUtilities")
- vegan: install.packages("vegan")
More on Packages in R – Adapted from Software Carpentry.
Introduction
Biodiversity is a popular topic within ecology, but quantifying and describing biodiversity precisely can be elusive. In this tutorial, we will describe many of the aspects of biodiversity using NEON's Macroinvertebrate Collection data.
Load Libraries and Prepare Workspace
First, we will load all necessary libraries into our R environment. If you have not already installed these libraries, please see the 'R Packages to Install' section above.
There are also two optional sections in this code chunk: clearing your environment, and loading your NEON API token. Clearing out your environment will erase all of the variables and data that are currently loaded in your R session. This is a good practice for many reasons, but only do this if you are completely sure that you won't be losing any important information! Secondly, your NEON API token will allow you increased download speeds, and allows NEON to anonymously track data usage statistics, which helps us optimize our data delivery platforms and informs our monthly and annual reporting to our funding agency, the National Science Foundation. Please consider signing up for a NEON data user account and using your token as described in the NEON API tokens tutorial.
# clean out workspace
#rm(list = ls()) # OPTIONAL - clear out your environment
#gc() # Uncomment these lines if desired
# load libraries
library(tidyverse)
library(neonUtilities)
library(vegan)
# source .r file with my NEON_TOKEN
# source("my_neon_token.R") # OPTIONAL - load NEON token
# See: https://www.neonscience.org/neon-api-tokens-tutorial
Download NEON Macroinvertebrate Data
Now that the workspace is prepared, we will download NEON macroinvertebrate data using the neonUtilities function loadByProduct().
# Macroinvert dpid
my_dpid <- 'DP1.20120.001'
# list of sites
my_site_list <- c('ARIK', 'POSE', 'MAYF')
# get all tables for these sites from the API -- takes < 1 minute
all_tabs_inv <- neonUtilities::loadByProduct(
dpID = my_dpid,
site = my_site_list,
#token = NEON_TOKEN, #Uncomment to use your token
check.size = F)
Macroinvertebrate Data Munging
Now that we have the data downloaded, we will need to do some 'data munging' to reorganize our data into a more useful format for this analysis. First, let's take a look at some of the tables that were generated by loadByProduct():
# what tables do you get with macroinvertebrate
# data product
names(all_tabs_inv)
## [1] "categoricalCodes_20120" "inv_fieldData" "inv_persample" "inv_taxonomyProcessed" "issueLog_20120"
## [6] "readme_20120" "validation_20120" "variables_20120"
# extract items from list and put in R env.
all_tabs_inv %>% list2env(.GlobalEnv)
## <environment: R_GlobalEnv>
# the readme has the same information as what you
# will find on the landing page on the data portal
# The variables file describes each field in
# the returned data tables
View(variables_20120)
# The validation file provides the rules that
# constrain data upon ingest into the NEON database:
View(validation_20120)
# the categoricalCodes file provides controlled
# lists used in the data
View(categoricalCodes_20120)
Next, we will perform several operations in a row to re-organize our data. Each step is described by a code comment.
# It is good to check for duplicate records. This had occurred in the past in
# data published in the inv_fieldData table in 2021. Those duplicates were
# fixed in the 2022 data release.
# Here we use sampleID as primary key and if we find duplicate records, we
# keep the first uid associated with any sampleID that has multiple uids
de_duped_uids <- inv_fieldData %>%
# remove records where no sample was collected
filter(!is.na(sampleID)) %>%
group_by(sampleID) %>%
summarise(n_recs = length(uid),
n_unique_uids = length(unique(uid)),
uid_to_keep = dplyr::first(uid))
# Are there any records that have more than one unique uid?
max_dups <- max(de_duped_uids$n_unique_uids %>% unique())
# filter data using de-duped uids if they exist
if(max_dups > 1){
inv_fieldData <- inv_fieldData %>%
dplyr::filter(uid %in% de_duped_uids$uid_to_keep)
}
# extract year from date, add it as a new column
inv_fieldData <- inv_fieldData %>%
mutate(
year = collectDate %>%
lubridate::as_date() %>%
lubridate::year())
# extract location data into a separate table
table_location <- inv_fieldData %>%
# keep only the columns listed below
select(siteID,
domainID,
namedLocation,
decimalLatitude,
decimalLongitude,
elevation) %>%
# keep rows with unique combinations of values,
# i.e., no duplicate records
distinct()
# create a taxon table, which describes each
# taxonID that appears in the data set
# start with inv_taxonomyProcessed
table_taxon <- inv_taxonomyProcessed %>%
# keep only the columns listed below
select(acceptedTaxonID, taxonRank, scientificName,
order, family, genus,
identificationQualifier,
identificationReferences) %>%
# remove rows with duplicate information
distinct()
# taxon table information for all taxa in
# our database can be downloaded here:
# takes 1-2 minutes
# full_taxon_table_from_api <- neonUtilities::getTaxonTable("MACROINVERTEBRATE", token = NEON_TOKEN)
# Make the observation table.
# start with inv_taxonomyProcessed
# check for repeated taxa within a sampleID that need to be added together
inv_taxonomyProcessed_summed <- inv_taxonomyProcessed %>%
select(sampleID,
acceptedTaxonID,
individualCount,
estimatedTotalCount) %>%
group_by(sampleID, acceptedTaxonID) %>%
summarize(
across(c(individualCount, estimatedTotalCount), ~sum(.x, na.rm = TRUE)))
# join summed taxon counts back with sample and field data
table_observation <- inv_taxonomyProcessed_summed %>%
# Join relevant sample info back in by sampleID
left_join(inv_taxonomyProcessed %>%
select(sampleID,
domainID,
siteID,
namedLocation,
collectDate,
acceptedTaxonID,
order, family, genus,
scientificName,
taxonRank) %>%
distinct()) %>%
# Join the columns selected above with two
# columns from inv_fieldData (the two columns
# are sampleID and benthicArea)
left_join(inv_fieldData %>%
select(sampleID, eventID, year,
habitatType, samplerType,
benthicArea)) %>%
# create two new columns: inv_dens, the estimated taxon density
# (count per area of benthos sampled), and inv_dens_unit, which is
# assigned the same unit text string for all rows
mutate(inv_dens = estimatedTotalCount / benthicArea,
inv_dens_unit = 'count per square meter')
# check for duplicate records, should return a table with 0 rows
table_observation %>%
group_by(sampleID, acceptedTaxonID) %>%
summarize(n_obs = length(sampleID)) %>%
filter(n_obs > 1)
## # A tibble: 0 x 3
## # Groups: sampleID [0]
## # ... with 3 variables: sampleID <chr>, acceptedTaxonID <chr>, n_obs <int>
# extract sample info
table_sample_info <- table_observation %>%
select(sampleID, domainID, siteID, namedLocation,
collectDate, eventID, year,
habitatType, samplerType, benthicArea,
inv_dens_unit) %>%
distinct()
# remove singletons and doubletons
# create an occurrence summary table
taxa_occurrence_summary <- table_observation %>%
select(sampleID, acceptedTaxonID) %>%
distinct() %>%
group_by(acceptedTaxonID) %>%
summarize(occurrences = n())
# filter out taxa that are only observed 1 or 2 times
taxa_list_cleaned <- taxa_occurrence_summary %>%
filter(occurrences > 2)
# filter observation table based on taxon list above
table_observation_cleaned <- table_observation %>%
filter(acceptedTaxonID %in%
taxa_list_cleaned$acceptedTaxonID,
!sampleID %in% c("MAYF.20190729.CORE.1",
"MAYF.20200713.CORE.1",
"MAYF.20210721.CORE.1",
"POSE.20160718.HESS.1"))
# the sampleIDs excluded above were identified as outliers
# some summary data
sampling_effort_summary <- table_sample_info %>%
# group by siteID, year
group_by(siteID, year, samplerType) %>%
# count samples and habitat types within each event
summarise(
event_count = eventID %>% unique() %>% length(),
sample_count = sampleID %>% unique() %>% length(),
habitat_count = habitatType %>%
unique() %>% length())
# check out the summary table
sampling_effort_summary %>% as.data.frame() %>%
head() %>% print()
## siteID year samplerType event_count sample_count habitat_count
## 1 ARIK 2014 core 2 6 1
## 2 ARIK 2014 modifiedKicknet 2 10 1
## 3 ARIK 2015 core 3 11 2
## 4 ARIK 2015 modifiedKicknet 3 13 2
## 5 ARIK 2016 core 3 9 1
## 6 ARIK 2016 modifiedKicknet 3 15 1
Working with 'Long' data
'Reshaping' your data to use as an input to a particular function may require you to consider: do I want 'long' or 'wide' data? Here's a link to a great article from 'The Analysis Factor' that describes the differences.
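As a quick, self-contained illustration of the difference, here is a toy example (not the NEON tables) reshaped from long to wide with tidyr:
# a small 'long' table: one row per sample-by-taxon observation
long_df <- data.frame(sampleID = c("s1", "s1", "s2"),
                      taxonID  = c("A", "B", "A"),
                      density  = c(10, 5, 3))
# the same information in 'wide' format: one row per sample,
# one column per taxon, zeros where a taxon was not observed
tidyr::pivot_wider(long_df, id_cols = sampleID,
                   names_from = taxonID, values_from = density,
                   values_fill = 0)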
For this first step, we will use data in a 'long' table:
# no. taxa by rank by site
table_observation_cleaned %>%
group_by(domainID, siteID, taxonRank) %>%
summarize(
n_taxa = acceptedTaxonID %>%
unique() %>% length()) %>%
ggplot(aes(n_taxa, taxonRank)) +
facet_wrap(~ domainID + siteID) +
geom_col()
# library(scales)
# sum densities by order for each sampleID
table_observation_by_order <-
table_observation_cleaned %>%
filter(!is.na(order)) %>%
group_by(domainID, siteID, year,
eventID, sampleID, habitatType, order) %>%
summarize(order_dens = sum(inv_dens, na.rm = TRUE))
# rank occurrence by order
table_observation_by_order %>% head()
## # A tibble: 6 x 8
## # Groups: domainID, siteID, year, eventID, sampleID, habitatType [1]
## domainID siteID year eventID sampleID habitatType order order_dens
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Branchiobdellida 516.
## 2 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Coleoptera 516.
## 3 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Decapoda 86.0
## 4 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Diptera 5419.
## 5 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Ephemeroptera 5301.
## 6 D02 POSE 2014 POSE.20140722 POSE.20140722.SURBER.1 riffle Megaloptera 387.
# stacked rank occurrence plot
table_observation_by_order %>%
group_by(order, siteID) %>%
summarize(
occurrence = (order_dens > 0) %>% sum()) %>%
ggplot(aes(
x = reorder(order, -occurrence),
y = occurrence,
color = siteID,
fill = siteID)) +
geom_col() +
theme(axis.text.x =
element_text(angle = 45, hjust = 1))
# faceted densities plot
table_observation_by_order %>%
ggplot(aes(
x = reorder(order, -order_dens),
y = log10(order_dens),
color = siteID,
fill = siteID)) +
geom_boxplot(alpha = .5) +
facet_grid(siteID ~ .) +
theme(axis.text.x =
element_text(angle = 45, hjust = 1))
Making Data 'wide'
For the next step, we will need to reshape our data table into the 'wide' format.
# select only site by species density info and remove duplicate records
table_sample_by_taxon_density_long <- table_observation_cleaned %>%
select(sampleID, acceptedTaxonID, inv_dens) %>%
distinct() %>%
filter(!is.na(inv_dens))
# table_sample_by_taxon_density_long %>% nrow()
# table_sample_by_taxon_density_long %>% distinct() %>% nrow()
# pivot to wide format, sum multiple counts per sampleID
table_sample_by_taxon_density_wide <- table_sample_by_taxon_density_long %>%
tidyr::pivot_wider(id_cols = sampleID,
names_from = acceptedTaxonID,
values_from = inv_dens,
values_fill = list(inv_dens = 0),
values_fn = list(inv_dens = sum)) %>%
column_to_rownames(var = "sampleID")
# check col and row sums -- mins should all be > 0
colSums(table_sample_by_taxon_density_wide) %>% min()
## [1] 12
rowSums(table_sample_by_taxon_density_wide) %>% min()
## [1] 25.55004
Multiscale Biodiversity
Reference: Jost, L. 2007. Partitioning diversity into independent alpha and beta components. Ecology 88:2427–2439. https://doi.org/10.1890/06-1736.1.
These metrics are based on Robert Whittaker's multiplicative diversity partitioning, where
- gamma is regional biodiversity
- alpha is local biodiversity (e.g., the mean diversity at a patch)
- and beta diversity is a measure of among-patch variability in community composition.
Beta could be interpreted as the number of "distinct" communities present within the region.
The relationship among alpha, beta, and gamma diversity is: beta = gamma / alpha
The influence of relative abundances on the calculation of alpha, beta, and gamma diversity metrics is determined by the coefficient q. The coefficient "q" determines the "order" of the diversity metric, where q = 0 provides diversity measures based on richness, and higher orders of q give more weight to taxa that have higher abundances in the data. Order q = 1 is related to Shannon diversity metrics, and order q = 2 is related to Simpson diversity metrics.
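For a concrete sense of these relationships, here is a small worked example on a toy abundance vector (not NEON data): the order q = 1 Hill number equals the exponential of Shannon entropy, and the order q = 2 Hill number equals the inverse Simpson index.
# toy abundance vector for four taxa
abund <- c(50, 30, 15, 5)
p <- abund / sum(abund)
# exponential of Shannon entropy = q = 1 Hill number
exp(-sum(p * log(p)))
# inverse Simpson index = q = 2 Hill number
1 / sum(p^2)
# the same values from vegan::renyi with hill = TRUE
vegan::renyi(abund, scales = c(1, 2), hill = TRUE)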
Alpha diversity is average local richness.
Order q = 0 alpha diversity calculated for our dataset returns a mean local richness (i.e., species counts) of ~30 taxa per sample across the entire data set.
# Here we use vegan::renyi to calculate Hill numbers
# If hill = FALSE, the function returns an entropy
# If hill = TRUE, the function returns the exponentiated
# entropy. In other words:
# exp(renyi entropy) = Hill number = "species equivalent"
# Note that for this function, the "scales" argument
# determines the order of q used in the calculation
table_sample_by_taxon_density_wide %>%
vegan::renyi(scales = 0, hill = TRUE) %>%
mean()
## [1] 30.06114
Comparing alpha diversity calculated using different orders:
Order q = 1 alpha diversity returns mean number of "species equivalents" per sample in the data set. This approach incorporates evenness because when abundances are more even across taxa, taxa are weighted more equally toward counting as a "species equivalent". For example, if you have a sample with 100 individuals, spread across 10 species, and each species is represented by 10 individuals, the number of order q = 1 species equivalents will equal the richness (10).
Alternatively, if 90 of the 100 individuals in the sample are one species, and the other 10 individuals are spread across the other 9 species, there will be only 1.72 order q = 1 species equivalents, whereas there are still 10 species in the sample.
# even distribution, orders q = 0 and q = 1 for 10 taxa
vegan::renyi(
c(spp.a = 10, spp.b = 10, spp.c = 10,
spp.d = 10, spp.e = 10, spp.f = 10,
spp.g = 10, spp.h = 10, spp.i = 10,
spp.j = 10),
hill = TRUE,
scales = c(0, 1))
## 0 1
## 10 10
## attr(,"class")
## [1] "renyi" "numeric"
# uneven distribution, orders q = 0 and q = 1 for 10 taxa
vegan::renyi(
c(spp.a = 90, spp.b = 2, spp.c = 1,
spp.d = 1, spp.e = 1, spp.f = 1,
spp.g = 1, spp.h = 1, spp.i = 1,
spp.j = 1),
hill = TRUE,
scales = c(0, 1))
## 0 1
## 10.000000 1.718546
## attr(,"class")
## [1] "renyi" "numeric"
Comparing orders of q for NEON data
Let's compare the different orders q = 0, 1, and 2 measures of alpha diversity across the samples collected from ARIK, POSE, and MAYF.
# Nest data by siteID
data_nested_by_siteID <- table_sample_by_taxon_density_wide %>%
tibble::rownames_to_column("sampleID") %>%
left_join(table_sample_info %>%
select(sampleID, siteID)) %>%
tibble::column_to_rownames("sampleID") %>%
nest(data = -siteID)
data_nested_by_siteID$data[[1]] %>%
vegan::renyi(scales = 0, hill = TRUE) %>%
mean()
## [1] 24.69388
# apply the calculation by site for alpha diversity
# for each order of q
data_nested_by_siteID %>% mutate(
alpha_q0 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = .,
hill = TRUE,
scales = 0) %>% mean()),
alpha_q1 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = .,
hill = TRUE,
scales = 1) %>% mean()),
alpha_q2 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = .,
hill = TRUE,
scales = 2) %>% mean())
)
## # A tibble: 3 x 5
## siteID data alpha_q0 alpha_q1 alpha_q2
## <chr> <list> <dbl> <dbl> <dbl>
## 1 ARIK <tibble [147 x 458]> 24.7 10.2 6.52
## 2 MAYF <tibble [149 x 458]> 22.2 12.0 8.19
## 3 POSE <tibble [162 x 458]> 42.1 20.7 13.0
# Note that POSE has the highest mean alpha diversity
# To calculate gamma diversity at the site scale,
# calculate the column means and then calculate
# the renyi entropy and Hill number
# Here we are only calculating order
# q = 0 gamma diversity
data_nested_by_siteID %>% mutate(
gamma_q0 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = colMeans(.),
hill = TRUE,
scales = 0)))
## # A tibble: 3 x 3
## siteID data gamma_q0
## <chr> <list> <dbl>
## 1 ARIK <tibble [147 x 458]> 243
## 2 MAYF <tibble [149 x 458]> 239
## 3 POSE <tibble [162 x 458]> 337
# Note that POSE has the highest gamma diversity
# Now calculate alpha, beta, and gamma using orders 0 and 1
# for each siteID
diversity_partitioning_results <-
data_nested_by_siteID %>%
mutate(
n_samples = purrr::map_int(data, ~ nrow(.)),
alpha_q0 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = .,
hill = TRUE,
scales = 0) %>% mean()),
alpha_q1 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = .,
hill = TRUE,
scales = 1) %>% mean()),
gamma_q0 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = colMeans(.),
hill = TRUE,
scales = 0)),
gamma_q1 = purrr::map_dbl(
.x = data,
.f = ~ vegan::renyi(x = colMeans(.),
hill = TRUE,
scales = 1)),
beta_q0 = gamma_q0 / alpha_q0,
beta_q1 = gamma_q1 / alpha_q1)
diversity_partitioning_results %>%
select(-data) %>% as.data.frame() %>% print()
## siteID n_samples alpha_q0 alpha_q1 gamma_q0 gamma_q1 beta_q0 beta_q1
## 1 ARIK 147 24.69388 10.19950 243 35.70716 9.840496 3.500873
## 2 MAYF 149 22.24832 12.02405 239 65.77590 10.742383 5.470360
## 3 POSE 162 42.11728 20.70184 337 100.16506 8.001466 4.838462
Using NMDS to ordinate samples
Finally, we will use Nonmetric Multidimensional Scaling (NMDS) to ordinate samples as shown below:
# create ordination using NMDS
my_nmds_result <- table_sample_by_taxon_density_wide %>% vegan::metaMDS()
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2280867
## Run 1 stress 0.2297516
## Run 2 stress 0.2322618
## Run 3 stress 0.2492232
## Run 4 stress 0.2335912
## Run 5 stress 0.235082
## Run 6 stress 0.2396413
## Run 7 stress 0.2303469
## Run 8 stress 0.2363123
## Run 9 stress 0.2523796
## Run 10 stress 0.2288613
## Run 11 stress 0.2302371
## Run 12 stress 0.2302613
## Run 13 stress 0.2409554
## Run 14 stress 0.2308922
## Run 15 stress 0.2528171
## Run 16 stress 0.2534587
## Run 17 stress 0.2320313
## Run 18 stress 0.239435
## Run 19 stress 0.2293618
## Run 20 stress 0.2307903
## *** No convergence -- monoMDS stopping criteria:
## 1: no. of iterations >= maxit
## 18: stress ratio > sratmax
## 1: scale factor of the gradient < sfgrmin
# plot stress
my_nmds_result$stress
## [1] 0.2280867
p1 <- vegan::ordiplot(my_nmds_result)
vegan::ordilabel(p1, "species")
# merge NMDS scores with sampleID information for plotting
nmds_scores <- my_nmds_result %>%
vegan::scores() %>%
.[["sites"]] %>%
as.data.frame() %>%
tibble::rownames_to_column("sampleID") %>%
left_join(table_sample_info)
# How I determined the outlier(s)
nmds_scores %>% arrange(desc(NMDS1)) %>% head()
## sampleID NMDS1 NMDS2 domainID siteID namedLocation collectDate eventID year habitatType
## 1 MAYF.20190311.CORE.2 1.590745 1.0833382 D08 MAYF MAYF.AOS.reach 2019-03-11 15:00:00 MAYF.20190311 2019 run
## 2 MAYF.20201117.CORE.2 1.395784 0.4986856 D08 MAYF MAYF.AOS.reach 2020-11-17 16:33:00 MAYF.20201117 2020 run
## 3 MAYF.20180726.CORE.2 1.372494 0.2603682 D08 MAYF MAYF.AOS.reach 2018-07-26 14:17:00 MAYF.20180726 2018 run
## 4 MAYF.20190311.CORE.1 1.299395 1.0075703 D08 MAYF MAYF.AOS.reach 2019-03-11 15:00:00 MAYF.20190311 2019 run
## 5 MAYF.20170314.CORE.1 1.132679 1.6469463 D08 MAYF MAYF.AOS.reach 2017-03-14 14:11:00 MAYF.20170314 2017 run
## 6 MAYF.20180326.CORE.3 1.130687 -0.7139679 D08 MAYF MAYF.AOS.reach 2018-03-26 14:50:00 MAYF.20180326 2018 run
## samplerType benthicArea inv_dens_unit
## 1 core 0.006 count per square meter
## 2 core 0.006 count per square meter
## 3 core 0.006 count per square meter
## 4 core 0.006 count per square meter
## 5 core 0.006 count per square meter
## 6 core 0.006 count per square meter
nmds_scores %>% arrange(desc(NMDS1)) %>% tail()
## sampleID NMDS1 NMDS2 domainID siteID namedLocation collectDate eventID year habitatType
## 453 ARIK.20160919.KICKNET.5 -0.8577931 -0.245144245 D10 ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016 run
## 454 ARIK.20160919.KICKNET.1 -0.8694139 0.291753483 D10 ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016 run
## 455 ARIK.20150714.CORE.3 -0.8843672 0.013601377 D10 ARIK ARIK.AOS.reach 2015-07-14 14:55:00 ARIK.20150714 2015 pool
## 456 ARIK.20150714.CORE.2 -1.0465497 0.004066437 D10 ARIK ARIK.AOS.reach 2015-07-14 14:55:00 ARIK.20150714 2015 pool
## 457 ARIK.20160919.KICKNET.4 -1.0937181 -0.148046639 D10 ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016 run
## 458 ARIK.20160331.CORE.3 -1.1791981 -0.327145374 D10 ARIK ARIK.AOS.reach 2016-03-31 15:41:00 ARIK.20160331 2016 pool
## samplerType benthicArea inv_dens_unit
## 453 modifiedKicknet 0.250 count per square meter
## 454 modifiedKicknet 0.250 count per square meter
## 455 core 0.006 count per square meter
## 456 core 0.006 count per square meter
## 457 modifiedKicknet 0.250 count per square meter
## 458 core 0.006 count per square meter
# Plot samples in community composition space by year
nmds_scores %>%
ggplot(aes(NMDS1, NMDS2, color = siteID,
shape = samplerType)) +
geom_point() +
facet_wrap(~ as.factor(year))
# Plot samples in community composition space
# facet by siteID and habitat type
# color by year
nmds_scores %>%
ggplot(aes(NMDS1, NMDS2, color = as.factor(year),
shape = samplerType)) +
geom_point() +
facet_grid(habitatType ~ siteID, scales = "free")
Get Lesson Code
Create a Canopy Height Model from lidar-derived Rasters in R
Authors: Edmund Hart, Leah A. Wasser
Last Updated: Apr 8, 2021
A common analysis using lidar data is to derive top-of-canopy height values from the lidar data. These values are often used to track changes in forest structure over time, to calculate biomass, and even to estimate leaf area index (LAI). Let's dive into the basics of working with raster-formatted lidar data in R!
Learning Objectives
After completing this tutorial, you will be able to:
- Work with digital terrain model (DTM) & digital surface model (DSM) raster files.
- Create a canopy height model (CHM) raster from DTM & DSM rasters.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.
Install R Packages
- raster: install.packages("raster")
- rgdal: install.packages("rgdal")
More on Packages in R - Adapted from Software Carpentry.
Download Data
NEON Teaching Data Subset: Field Site Spatial Data
These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.
Download Dataset
This tutorial is designed for you to set your working directory to the directory created by unzipping this file.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Recommended Reading
What is a CHM, DSM and DTM? About Gridded, Raster lidar Data
Create a lidar-derived Canopy Height Model (CHM)
The National Ecological Observatory Network (NEON) provides lidar-derived data products as part of its many free ecological data products. These products come in the GeoTIFF format, which is a .tif raster format that is spatially located on the earth.
In this tutorial, we create a Canopy Height Model. The Canopy Height Model (CHM) represents the heights of the trees on the ground. We can derive the CHM by subtracting the ground elevation from the elevation of the top of the surface (i.e., the tops of the trees).
We will use the raster R package to work with the lidar-derived digital surface model (DSM) and the digital terrain model (DTM).
# Load needed packages
library(raster)
library(rgdal)
# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd="~/Git/data/" #This will depend on your local environment
setwd(wd)
First, we will import the Digital Surface Model (DSM). The DSM represents the elevation of the top of the objects on the ground (trees, buildings, etc).
# assign raster to object
dsm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif"))
# view info about the raster.
dsm
## class : RasterLayer
## dimensions : 5060, 4299, 21752940 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 254570, 258869, 4107302, 4112362 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs
## source : /Users/olearyd/Git/data/NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif
## names : SJER2013_DSM
# plot the DSM
plot(dsm, main="Lidar Digital Surface Model \n SJER, California")
Note the resolution, extent, and coordinate reference system (CRS) of the raster. To do later steps, our DTM will need to be the same.
Next, we will import the Digital Terrain Model (DTM) for the same area. The DTM represents the ground (terrain) elevation.
# import the digital terrain model
dtm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalTerrainModel/SJER2013_DTM.tif"))
plot(dtm, main="Lidar Digital Terrain Model \n SJER, California")
With both of these rasters now loaded, we can create the Canopy Height Model (CHM). The CHM represents the difference between the DSM and the DTM or the height of all objects on the surface of the earth.
To do this we perform some basic raster math to calculate the CHM. You can perform the same raster math in a GIS program like QGIS.
When you do the math, make sure to subtract the DTM from the DSM or you'll get trees with negative heights!
# use raster math to create CHM
chm <- dsm - dtm
# view CHM attributes
chm
## class : RasterLayer
## dimensions : 5060, 4299, 21752940 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 254570, 258869, 4107302, 4112362 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs
## source : memory
## names : layer
## values : -1.399994, 40.29001 (min, max)
plot(chm, main="Lidar Canopy Height Model \n SJER, California")
We've now created a CHM from our DSM and DTM. What do you notice about the canopy cover at this location in the San Joaquin Experimental Range?
Convert the CHM from meters to feet. Plot it.
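A minimal sketch of one way to complete this challenge, using the conversion 1 meter = 3.28084 feet:
# convert the CHM values from meters to feet and plot the result
chm_ft <- chm * 3.28084
plot(chm_ft, main="Lidar Canopy Height Model (ft) \n SJER, California")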
If, in your work, you need to create lots of CHMs from different rasters, an efficient way to do this would be to create a function to create your CHMs.
# Create a function that subtracts the terrain model (DTM) from the
# surface model (DSM) to give canopy height
canopyCalc <- function(DSM, DTM) {
    return(DSM - DTM)
}
# use the function to create the final CHM
chm2 <- canopyCalc(dsm,dtm)
chm2
## class : RasterLayer
## dimensions : 5060, 4299, 21752940 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 254570, 258869, 4107302, 4112362 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs
## source : memory
## names : layer
## values     : -1.399994, 40.29001  (min, max)
# or use the overlay function
chm3 <- overlay(dsm,dtm,fun = canopyCalc)
chm3
## class : RasterLayer
## dimensions : 5060, 4299, 21752940 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 254570, 258869, 4107302, 4112362 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs
## source : memory
## names : layer
## values     : -1.399994, 40.29001  (min, max)
As with any raster, we can write out the CHM as a GeoTiff using the
writeRaster()
function.
# write out the CHM in tiff format.
writeRaster(chm,paste0(wd,"chm_SJER.tif"),"GTiff")
We've now successfully created a canopy height model using basic raster math -- in
R! We can bring the chm_SJER.tif
file into QGIS (or any GIS program) and look
at it.
Consider going onto the next tutorial Extract Values from a Raster in R to compare this lidar-derived CHM with ground-based observations!
Get Lesson Code
Compare tree height measured from the ground to a Lidar-based Canopy Height Model
Authors: Claire K. Lunch
Last Updated: Aug 31, 2023
This data tutorial provides instruction on working with two different NEON data products to estimate tree height:
- DP3.30015.001, Ecosystem structure, aka Canopy Height Model (CHM)
- DP1.10098.001, Vegetation structure
The CHM data are derived from the Lidar point cloud data collected by the remote sensing platform. The vegetation structure data are collected by field staff on the ground. We will be using data from the Wind River Experimental Forest NEON field site located in Washington state. The predominant vegetation there is tall evergreen conifers.
If you are coming to this exercise after following tutorials on data download and formatting, and therefore already have the needed data, skip ahead to section 4.
Things You’ll Need To Complete This Tutorial
You will need the most current version of R loaded on your computer to complete this tutorial.
1. Setup
Start by installing and loading packages (if necessary) and setting
options. One of the packages we'll be using, geoNEON
, is only
available via GitHub, so it's installed using the devtools
package.
The other packages can be installed directly from CRAN.
Installation can be run once, then periodically to get package updates.
install.packages("neonUtilities")
install.packages("neonOS")
install.packages("terra")
install.packages("devtools")
devtools::install_github("NEONScience/NEON-geolocation/geoNEON")
Now load packages. This needs to be done every time you run code. We'll also set a working directory for data downloads.
library(terra)
library(neonUtilities)
library(neonOS)
library(geoNEON)
options(stringsAsFactors=F)
# set working directory
# adapt directory path for your system
wd <- "~/data"
setwd(wd)
2. Vegetation structure data
Download the vegetation structure data using the loadByProduct()
function in
the neonUtilities
package. Inputs needed to the function are:
- dpID: data product ID; woody vegetation structure = DP1.10098.001
- site: (vector of) 4-letter site codes; Wind River = WREF
- package: basic or expanded; we'll download basic here
- check.size: should this function prompt the user with an estimated download size? Set to FALSE here for ease of processing as a script, but good to leave as default TRUE when downloading a dataset for the first time.
Refer to the cheat sheet
for the neonUtilities
package for more details if desired.
veglist <- loadByProduct(dpID="DP1.10098.001",
site="WREF",
package="basic",
check.size = FALSE)
Use the getLocTOS()
function in the geoNEON
package to get
precise locations for the tagged plants. Refer to the package
documentation for more details.
vegmap <- getLocTOS(veglist$vst_mappingandtagging,
"vst_mappingandtagging")
Now we have the mapped locations of individuals in the vst_mappingandtagging
table, and the annual measurements of tree dimensions such as height and
diameter in the vst_apparentindividual
table. To bring these measurements
together, join the two tables, using the joinTableNEON()
function from the
neonOS
package. Refer to the Quick Start Guide
for Vegetation structure for more information about the data tables and the
joining instructions joinTableNEON()
is using.
veg <- joinTableNEON(veglist$vst_apparentindividual,
vegmap,
name1="vst_apparentindividual",
name2="vst_mappingandtagging")
Let's see what the data look like! Make a stem map of the plants in
plot WREF_075. Note that the circles
argument of the symbols()
function expects a radius, but
stemDiameter
is just that, a diameter, so we will need to divide by two.
And stemDiameter
is in centimeters, but the mapping scale is in meters,
so we also need to divide by 100 to get the scale right.
symbols(veg$adjEasting[which(veg$plotID=="WREF_075")],
veg$adjNorthing[which(veg$plotID=="WREF_075")],
circles=veg$stemDiameter[which(veg$plotID=="WREF_075")]/100/2,
inches=F, xlab="Easting", ylab="Northing")
And now overlay the estimated uncertainty in the location of each stem, in blue:
symbols(veg$adjEasting[which(veg$plotID=="WREF_075")],
veg$adjNorthing[which(veg$plotID=="WREF_075")],
circles=veg$stemDiameter[which(veg$plotID=="WREF_075")]/100/2,
inches=F, xlab="Easting", ylab="Northing")
symbols(veg$adjEasting[which(veg$plotID=="WREF_075")],
veg$adjNorthing[which(veg$plotID=="WREF_075")],
circles=veg$adjCoordinateUncertainty[which(veg$plotID=="WREF_075")],
inches=F, add=T, fg="lightblue")
3. Canopy height model data
Now we'll download the CHM tile covering plot WREF_075. Several other plots are also covered by this tile. We could download all tiles that contain vegetation structure plots, but in this exercise we're sticking to one tile to limit download size and processing time.
The byTileAOP()
function in the neonUtilities
package allows for
download of remote sensing tiles based on easting and northing
coordinates, so we'll give it the coordinates of all the trees in
plot WREF_075 and the data product ID, DP3.30015.001 (note that if
WREF_075 crossed tile boundaries, this code would download all
relevant tiles).
The download will include several metadata files as well as the data
tile. Load the data tile into the environment using the terra
package.
byTileAOP(dpID="DP3.30015.001", site="WREF", year="2017",
easting=veg$adjEasting[which(veg$plotID=="WREF_075")],
northing=veg$adjNorthing[which(veg$plotID=="WREF_075")],
check.size=FALSE, savepath=wd)
chm <- rast(paste0(wd, "/DP3.30015.001/neon-aop-products/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif"))
Let's view the tile.
plot(chm, col=topo.colors(5))
4. Comparing the two datasets
Now we have the heights of individual trees measured from the ground, and the height of the top surface of the canopy, measured from the air. There are many different ways to make a comparison between these two datasets! This section will walk through three different approaches.
First, subset the vegetation structure data to only the trees that fall
within this tile, using the ext()
function from the terra
package.
This step isn't strictly necessary, but it will make the processing faster.
vegsub <- veg[which(veg$adjEasting >= ext(chm)[1] &
veg$adjEasting <= ext(chm)[2] &
veg$adjNorthing >= ext(chm)[3] &
veg$adjNorthing <= ext(chm)[4]),]
Starting with a very simple first pass: use the extract()
function
from the terra
package to get the CHM value matching the coordinates
of each mapped plant. Then make a scatter plot of each tree's height
vs. the CHM value at its location.
valCHM <- extract(chm,
cbind(vegsub$adjEasting,
vegsub$adjNorthing))
plot(valCHM$NEON_D16_WREF_DP3_580000_5075000_CHM~
vegsub$height, pch=20, xlab="Height",
ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")
How strong is the correlation between the ground and lidar measurements?
cor(valCHM$NEON_D16_WREF_DP3_580000_5075000_CHM,
vegsub$height, use="complete")
## [1] 0.3775402
There are a lot of points clustered on the 1-1 line, but there is also a cloud of points above the line, where the measured height is lower than the canopy height model at the same coordinates. This makes sense, because we made no attempt to filter out the understory. There are likely many plants measured in the vegetation structure data that are not at the top of the canopy, and the CHM sees only the top surface of the canopy.
How to exclude understory plants from this analysis? Again, there are many possible approaches. We'll try out two, one map-centric and one tree-centric.
Starting with the map-centric approach: select a pixel size, and aggregate both the vegetation structure data and the CHM data to find the tallest point in each pixel. Let's try this with 10m pixels.
Start by rounding the coordinates of the vegetation structure data, to create
10m bins. Use floor()
instead of round()
so each tree ends up in the pixel
with the same numbering as the raster pixels (the rasters/pixels are
numbered by their southwest corners).
easting10 <- 10*floor(vegsub$adjEasting/10)
northing10 <- 10*floor(vegsub$adjNorthing/10)
vegsub <- cbind(vegsub, easting10, northing10)
Use the aggregate()
function to get the tallest tree in each 10m bin.
vegbin <- stats::aggregate(vegsub,
by=list(vegsub$easting10,
vegsub$northing10),
FUN=max)
To get the CHM values for the 10m bins, use the terra
package version
of the aggregate()
function. Let's take a look at the lower-resolution
image we get as a result.
CHM10 <- terra::aggregate(chm, fact=10, fun=max)
plot(CHM10, col=topo.colors(5))
Use the extract() function again to get the values from each pixel. Our bins are numbered by their southwest corners, so add 5 to each bin coordinate so the extraction point falls in the center of the correct 10 m pixel.
vegbin$easting10 <- vegbin$easting10 + 5
vegbin$northing10 <- vegbin$northing10 + 5
binCHM <- extract(CHM10, cbind(vegbin$easting10,
vegbin$northing10))
plot(binCHM$NEON_D16_WREF_DP3_580000_5075000_CHM~
vegbin$height, pch=20,
xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")
cor(binCHM$NEON_D16_WREF_DP3_580000_5075000_CHM,
vegbin$height, use="complete")
## [1] 0.2275314
The understory points are thinned out substantially, but so are the rest. We've lost a lot of data by going to a lower resolution.
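To see just how much, compare the number of 10m bins to the number of individual trees in the subset (your exact counts will depend on the data you downloaded).
# Number of binned values vs. number of individual trees
nrow(vegbin)
nrow(vegsub)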
Let's see if we can identify the tallest trees by another approach, using the trees as the starting point instead of map area. Start by sorting the veg structure data by height.
vegsub <- vegsub[order(vegsub$height,
decreasing=T),]
Now, for each tree, let's estimate which nearby trees might be beneath its canopy, and discard those points. To do this:
- Calculate the distance of each tree from the target tree.
- Pick a reasonable estimate for canopy size, and discard shorter trees within that radius. The radius I used is 0.3 times the height, based on some rudimentary googling about Douglas fir allometry. It could definitely be improved on!
- Iterate over all trees.
We'll use a simple for loop to do this:
vegfil <- vegsub
for(i in 1:nrow(vegsub)) {
if(is.na(vegfil$height[i]))
next
dist <- sqrt((vegsub$adjEasting[i]-vegsub$adjEasting)^2 +
(vegsub$adjNorthing[i]-vegsub$adjNorthing)^2)
vegfil$height[which(dist<0.3*vegsub$height[i] &
vegsub$height<vegsub$height[i])] <- NA
}
vegfil <- vegfil[which(!is.na(vegfil$height)),]
Now extract the raster values, as above.
filterCHM <- extract(chm,
cbind(vegfil$adjEasting,
vegfil$adjNorthing))
plot(filterCHM$NEON_D16_WREF_DP3_580000_5075000_CHM~
vegfil$height, pch=20,
xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")
cor(filterCHM$NEON_D16_WREF_DP3_580000_5075000_CHM,
vegfil$height)
## [1] 0.7833395
This is quite a bit better! There are still several understory points we failed to exclude, but we were able to filter out most of the understory without losing so many overstory points.
Let's try one last thing. The plantStatus field in the veg structure data indicates whether a plant is dead, broken, or otherwise damaged. In theory, a dead or broken tree can still be the tallest thing around, but it's less likely, and it's also less likely to get a good Lidar return. Exclude all trees that aren't alive:
vegfil <- vegfil[which(vegfil$plantStatus=="Live"),]
filterCHM <- extract(chm,
cbind(vegfil$adjEasting,
vegfil$adjNorthing))
plot(filterCHM$NEON_D16_WREF_DP3_580000_5075000_CHM~
vegfil$height, pch=20,
xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")
cor(filterCHM$NEON_D16_WREF_DP3_580000_5075000_CHM,
vegfil$height)
## [1] 0.9000654
Nice!
One final note: however we slice the data, there is a noticeable bias even in the strongly correlated values. The CHM heights are generally a bit shorter than the ground-based estimates of tree height. There are two biases in the CHM data that contribute to this. (1) Lidar returns from short-statured vegetation are difficult to distinguish from the ground, so the "ground" estimated by Lidar is generally a bit higher than the true ground surface, and (2) the height estimate from Lidar represents the highest return, but the highest return may slightly miss the actual tallest point on a given tree. This is especially likely to happen with conifers, which are the top-of-canopy trees at Wind River.
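If you want to put a number on that bias, one option (a quick addition, not part of the original analysis) is the mean difference between the field-measured heights and the CHM values for the final filtered set of trees.
# Mean difference between ground-measured height and CHM at each tree;
# a positive value means the CHM is shorter on average
mean(vegfil$height -
       filterCHM$NEON_D16_WREF_DP3_580000_5075000_CHM,
     na.rm=TRUE)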
Introduction to working with NEON eddy flux data
Authors: Claire K. Lunch
Last Updated: Mar 12, 2021
This data tutorial provides an introduction to working with NEON eddy flux data, using the neonUtilities R package. If you are new to NEON data, we recommend starting with a more general tutorial, such as the neonUtilities tutorial or the Download and Explore tutorial. Some of the functions and techniques described in those tutorials will be used here, as well as functions and data formats that are unique to the eddy flux system.
This tutorial assumes general familiarity with eddy flux data and associated concepts.
1. Setup
Start by installing and loading packages and setting options.
To work with the NEON flux data, we need the rhdf5 package, which is hosted on Bioconductor and requires a different installation process than CRAN packages:
install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')
options(stringsAsFactors=F)
library(neonUtilities)
Use the zipsByProduct() function from the neonUtilities package to download flux data from two sites and two months. The transformations and functions below will work on any time range and site(s), but two sites and two months allows us to see all the available functionality while minimizing download size.
Inputs to the zipsByProduct() function:
- dpID: DP4.00200.001, the bundled eddy covariance product
- package: basic (the expanded package is not covered in this tutorial)
- site: NIWO = Niwot Ridge and HARV = Harvard Forest
- startdate: 2018-06 (both dates are inclusive)
- enddate: 2018-07 (both dates are inclusive)
- savepath: modify this to something logical on your machine
- check.size: T if you want to see file size before downloading, otherwise F
The download may take a while, especially if you're on a slow network. For faster downloads, consider using an API token.
zipsByProduct(dpID="DP4.00200.001", package="basic",
site=c("NIWO", "HARV"),
startdate="2018-06", enddate="2018-07",
savepath="~/Downloads",
check.size=F)
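If you have a NEON API token, zipsByProduct() also accepts it via the token input, which speeds up downloads. As a sketch, assuming you've stored your token in an environment variable (the name NEON_TOKEN here is just an example), you could instead call:
zipsByProduct(dpID="DP4.00200.001", package="basic",
              site=c("NIWO", "HARV"),
              startdate="2018-06", enddate="2018-07",
              savepath="~/Downloads",
              check.size=F,
              token=Sys.getenv("NEON_TOKEN"))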
2. Data Levels
There are five levels of data contained in the eddy flux bundle. For full details, refer to the NEON algorithm document.
Briefly, the data levels are:
- Level 0' (dp0p): Calibrated raw observations
- Level 1 (dp01): Time-aggregated observations, e.g. 30-minute mean gas concentrations
- Level 2 (dp02): Time-interpolated data, e.g. rate of change of a gas concentration
- Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
- Level 4 (dp04): Fluxes
The dp0p data are available in the expanded data package and are beyond the scope of this tutorial.
The dp02 and dp03 data are used in storage calculations, and the dp04 data include both the storage and turbulent components. Since many users will want to focus on the net flux data, we'll start there.
3. Extract Level 4 data (Fluxes!)
To extract the Level 4 data from the HDF5 files and merge them into a single table, we'll use the stackEddy() function from the neonUtilities package.
stackEddy() requires two inputs:
- filepath: Path to a file or folder, which can be any one of:
  1. A zip file of eddy flux data downloaded from the NEON data portal
  2. A folder of eddy flux data downloaded by the zipsByProduct() function
  3. The folder of files resulting from unzipping either of 1 or 2
  4. One or more HDF5 files of NEON eddy flux data
- level: dp01-4
Input the filepath you downloaded to using zipsByProduct() earlier, including the filesToStack00200 folder created by the function, and dp04:
flux <- stackEddy(filepath="~/Downloads/filesToStack00200",
level="dp04")
We now have an object called flux. It's a named list containing four tables: one table for each site's data, plus the variables and objDesc tables.
names(flux)
## [1] "HARV" "NIWO" "variables" "objDesc"
Let's look at the contents of one of the site data tables:
head(flux$NIWO)
## timeBgn timeEnd data.fluxCo2.nsae.flux data.fluxCo2.stor.flux data.fluxCo2.turb.flux
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59 0.1713858 -0.06348163 0.2348674
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59 0.9251711 0.08748146 0.8376896
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59 0.5005812 0.02231698 0.4782642
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59 0.8032820 0.25569306 0.5475889
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59 0.4897685 0.23090472 0.2588638
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59 0.9223979 0.06228581 0.8601121
## data.fluxH2o.nsae.flux data.fluxH2o.stor.flux data.fluxH2o.turb.flux data.fluxMome.turb.veloFric
## 1 15.876622 3.3334970 12.543125 0.2047081
## 2 8.089274 -1.2063258 9.295600 0.1923735
## 3 5.290594 -4.4190781 9.709672 0.1200918
## 4 9.190214 0.2030371 8.987177 0.1177545
## 5 3.111909 0.1349363 2.976973 0.1589189
## 6 4.613676 -0.3929445 5.006621 0.1114406
## data.fluxTemp.nsae.flux data.fluxTemp.stor.flux data.fluxTemp.turb.flux data.foot.stat.angZaxsErth
## 1 4.7565505 -1.4575094 6.2140599 94.2262
## 2 -0.2717454 0.3403877 -0.6121331 355.4252
## 3 -4.2055147 0.1870677 -4.3925824 359.8013
## 4 -13.3834484 -2.4904300 -10.8930185 137.7743
## 5 -5.1854815 -0.7514531 -4.4340284 188.4799
## 6 -7.7365481 -1.9046775 -5.8318707 183.1920
## data.foot.stat.distReso data.foot.stat.veloYaxsHorSd data.foot.stat.veloZaxsHorSd data.foot.stat.veloFric
## 1 8.34 0.7955893 0.2713232 0.2025427
## 2 8.34 0.8590177 0.2300000 0.2000000
## 3 8.34 1.2601763 0.2300000 0.2000000
## 4 8.34 0.7332641 0.2300000 0.2000000
## 5 8.34 0.7096286 0.2300000 0.2000000
## 6 8.34 0.3789859 0.2300000 0.2000000
## data.foot.stat.distZaxsMeasDisp data.foot.stat.distZaxsRgh data.foot.stat.distZaxsAbl
## 1 8.34 0.04105708 1000
## 2 8.34 0.27991938 1000
## 3 8.34 0.21293225 1000
## 4 8.34 0.83400000 1000
## 5 8.34 0.83400000 1000
## 6 8.34 0.83400000 1000
## data.foot.stat.distXaxs90 data.foot.stat.distXaxsMax data.foot.stat.distYaxs90 qfqm.fluxCo2.nsae.qfFinl
## 1 325.26 133.44 25.02 1
## 2 266.88 108.42 50.04 1
## 3 275.22 116.76 66.72 1
## 4 208.50 83.40 75.06 1
## 5 208.50 83.40 66.72 1
## 6 208.50 83.40 41.70 1
## qfqm.fluxCo2.stor.qfFinl qfqm.fluxCo2.turb.qfFinl qfqm.fluxH2o.nsae.qfFinl qfqm.fluxH2o.stor.qfFinl
## 1 1 1 1 1
## 2 1 1 1 0
## 3 1 1 1 0
## 4 1 1 1 0
## 5 1 1 1 0
## 6 1 1 1 1
## qfqm.fluxH2o.turb.qfFinl qfqm.fluxMome.turb.qfFinl qfqm.fluxTemp.nsae.qfFinl qfqm.fluxTemp.stor.qfFinl
## 1 1 0 0 0
## 2 1 0 1 0
## 3 1 1 0 0
## 4 1 1 0 0
## 5 1 0 0 0
## 6 1 0 0 0
## qfqm.fluxTemp.turb.qfFinl qfqm.foot.turb.qfFinl
## 1 0 0
## 2 1 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
The variables and objDesc tables can help you interpret the column headers in the data table. The objDesc table contains definitions for many of the terms used in the eddy flux data product, but it isn't complete. To get the terms of interest, we'll break up the column headers into individual terms and look for them in the objDesc table:
term <- unlist(strsplit(names(flux$NIWO), split=".", fixed=T))
flux$objDesc[which(flux$objDesc$Object %in% term),]
## Object
## 138 angZaxsErth
## 171 data
## 343 qfFinl
## 420 qfqm
## 604 timeBgn
## 605 timeEnd
## Description
## 138 Wind direction
## 171 Represents data fields
## 343 The final quality flag indicating if the data are valid for the given aggregation period (1=fail, 0=pass)
## 420 Quality flag and quality metrics, represents quality flags and quality metrics that accompany the provided data
## 604 The beginning time of the aggregation period
## 605 The end time of the aggregation period
For the terms that aren't captured here, fluxCo2, fluxH2o, and fluxTemp are self-explanatory. The flux components are:
- turb: Turbulent flux
- stor: Storage
- nsae: Net surface-atmosphere exchange
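In other words, nsae should equal stor + turb for each averaging period. As a quick sanity check (not part of the original tutorial), you can verify this relationship in the CO2 flux data; the differences should be essentially zero.
# nsae minus (stor + turb): should be ~0 for every half hour
summary(flux$NIWO$data.fluxCo2.nsae.flux -
          (flux$NIWO$data.fluxCo2.stor.flux +
             flux$NIWO$data.fluxCo2.turb.flux))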
The variables table contains the units for each field:
flux$variables
## category system variable stat units
## 1 data fluxCo2 nsae timeBgn NA
## 2 data fluxCo2 nsae timeEnd NA
## 3 data fluxCo2 nsae flux umolCo2 m-2 s-1
## 4 data fluxCo2 stor timeBgn NA
## 5 data fluxCo2 stor timeEnd NA
## 6 data fluxCo2 stor flux umolCo2 m-2 s-1
## 7 data fluxCo2 turb timeBgn NA
## 8 data fluxCo2 turb timeEnd NA
## 9 data fluxCo2 turb flux umolCo2 m-2 s-1
## 10 data fluxH2o nsae timeBgn NA
## 11 data fluxH2o nsae timeEnd NA
## 12 data fluxH2o nsae flux W m-2
## 13 data fluxH2o stor timeBgn NA
## 14 data fluxH2o stor timeEnd NA
## 15 data fluxH2o stor flux W m-2
## 16 data fluxH2o turb timeBgn NA
## 17 data fluxH2o turb timeEnd NA
## 18 data fluxH2o turb flux W m-2
## 19 data fluxMome turb timeBgn NA
## 20 data fluxMome turb timeEnd NA
## 21 data fluxMome turb veloFric m s-1
## 22 data fluxTemp nsae timeBgn NA
## 23 data fluxTemp nsae timeEnd NA
## 24 data fluxTemp nsae flux W m-2
## 25 data fluxTemp stor timeBgn NA
## 26 data fluxTemp stor timeEnd NA
## 27 data fluxTemp stor flux W m-2
## 28 data fluxTemp turb timeBgn NA
## 29 data fluxTemp turb timeEnd NA
## 30 data fluxTemp turb flux W m-2
## 31 data foot stat timeBgn NA
## 32 data foot stat timeEnd NA
## 33 data foot stat angZaxsErth deg
## 34 data foot stat distReso m
## 35 data foot stat veloYaxsHorSd m s-1
## 36 data foot stat veloZaxsHorSd m s-1
## 37 data foot stat veloFric m s-1
## 38 data foot stat distZaxsMeasDisp m
## 39 data foot stat distZaxsRgh m
## 40 data foot stat distZaxsAbl m
## 41 data foot stat distXaxs90 m
## 42 data foot stat distXaxsMax m
## 43 data foot stat distYaxs90 m
## 44 qfqm fluxCo2 nsae timeBgn NA
## 45 qfqm fluxCo2 nsae timeEnd NA
## 46 qfqm fluxCo2 nsae qfFinl NA
## 47 qfqm fluxCo2 stor qfFinl NA
## 48 qfqm fluxCo2 stor timeBgn NA
## 49 qfqm fluxCo2 stor timeEnd NA
## 50 qfqm fluxCo2 turb timeBgn NA
## 51 qfqm fluxCo2 turb timeEnd NA
## 52 qfqm fluxCo2 turb qfFinl NA
## 53 qfqm fluxH2o nsae timeBgn NA
## 54 qfqm fluxH2o nsae timeEnd NA
## 55 qfqm fluxH2o nsae qfFinl NA
## 56 qfqm fluxH2o stor qfFinl NA
## 57 qfqm fluxH2o stor timeBgn NA
## 58 qfqm fluxH2o stor timeEnd NA
## 59 qfqm fluxH2o turb timeBgn NA
## 60 qfqm fluxH2o turb timeEnd NA
## 61 qfqm fluxH2o turb qfFinl NA
## 62 qfqm fluxMome turb timeBgn NA
## 63 qfqm fluxMome turb timeEnd NA
## 64 qfqm fluxMome turb qfFinl NA
## 65 qfqm fluxTemp nsae timeBgn NA
## 66 qfqm fluxTemp nsae timeEnd NA
## 67 qfqm fluxTemp nsae qfFinl NA
## 68 qfqm fluxTemp stor qfFinl NA
## 69 qfqm fluxTemp stor timeBgn NA
## 70 qfqm fluxTemp stor timeEnd NA
## 71 qfqm fluxTemp turb timeBgn NA
## 72 qfqm fluxTemp turb timeEnd NA
## 73 qfqm fluxTemp turb qfFinl NA
## 74 qfqm foot turb timeBgn NA
## 75 qfqm foot turb timeEnd NA
## 76 qfqm foot turb qfFinl NA
Let's plot some data! First, a brief aside about time stamps, since these are time series data.
Time stamps
NEON sensor data come with time stamps for both the start and end of the averaging period. Depending on the analysis you're doing, you may want to use one or the other; for general plotting, re-formatting, and transformations, I prefer to use the start time, because there are some small inconsistencies between data products in a few of the end time stamps.
Note that all NEON data use UTC time, also known as Greenwich Mean Time. This is true across NEON's instrumented, observational, and airborne measurements. When working with NEON data, it's best to keep everything in UTC as much as possible; otherwise it's very easy to end up with data in mismatched times, which can cause insidious and hard-to-detect problems. In the code below, time stamps and time zones have been handled by stackEddy() and loadByProduct(), so we don't need to do anything additional. But if you're writing your own code and need to convert times, remember that if the time zone isn't specified, R will default to the local time zone it detects on your operating system.
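For example, if you ever convert character time stamps yourself, set the time zone explicitly (a generic sketch; the data used in this tutorial have already been converted for us):
# Specify tz= so R doesn't silently apply your system's local time zone
as.POSIXct("2018-07-07 12:00:00", tz="UTC")
With that in mind, let's get back to plotting the CO2 flux time series: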
plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn,
pch=".", xlab="Date", ylab="CO2 flux")
There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.
Let's trim down to just two days of data to see a few other details.
plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn,
pch=20, xlab="Date", ylab="CO2 flux",
xlim=c(as.POSIXct("2018-07-07", tz="GMT"),
as.POSIXct("2018-07-09", tz="GMT")),
ylim=c(-20,20), xaxt="n")
axis.POSIXct(1, x=flux$NIWO$timeBgn,
format="%Y-%m-%d %H:%M:%S")
Note the timing of C uptake; the effect of the UTC time zone is clear here, with uptake occurring at times that appear to be during the night.
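If you want to confirm that the apparent nighttime uptake is just a time zone effect, you can display a few of the time stamps in local Mountain time; this is for inspection only, and the data themselves should stay in UTC. (This assumes timeBgn has already been converted to POSIXct by stackEddy(), as described above.)
# View the first few UTC time stamps in local time at Niwot Ridge
format(flux$NIWO$timeBgn[1:5], tz="America/Denver", usetz=TRUE)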
4. Merge flux data with other sensor data
Many of the data sets we would use to interpret and model flux data are measured as part of the NEON project, but are not present in the eddy flux data product bundle. In this section, we'll download PAR data and merge them with the flux data; the steps taken here can be applied to any of the NEON instrumented (IS) data products.
Download PAR data
To get NEON PAR data, use the loadByProduct() function from the neonUtilities package. loadByProduct() takes the same inputs as zipsByProduct(), but it loads the downloaded data directly into the current R environment.
Let's download PAR data matching the Niwot Ridge flux data. The inputs needed are:
- dpID: DP1.00024.001
- site: NIWO
- startdate: 2018-06
- enddate: 2018-07
- package: basic
- timeIndex: 30
The new input here is timeIndex=30, which downloads only the 30-minute data. Since the flux data are at a 30-minute resolution, we can save on download time by disregarding the 1-minute data files (which are of course 30 times larger). The timeIndex input can be left off if you want to download all available averaging intervals.
pr <- loadByProduct("DP1.00024.001", site="NIWO",
timeIndex=30, package="basic",
startdate="2018-06", enddate="2018-07",
check.size=F)
pr is another named list, and again, metadata and units can be found in the variables table. The PARPAR_30min table contains a verticalPosition field. This field indicates the position on the tower, with 10 being the first tower level, and 20, 30, etc. going up the tower.
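To see which tower levels are present in your download, you can list the unique values:
# Tower levels available in the 30-minute PAR data
unique(pr$PARPAR_30min$verticalPosition)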
Join PAR to flux data
We'll connect PAR data from the tower top to the flux data.
pr.top <- pr$PARPAR_30min[which(pr$PARPAR_30min$verticalPosition==
max(pr$PARPAR_30min$verticalPosition)),]
As noted above, loadByProduct() automatically converts time stamps to a recognized date-time format when it reads the data. However, the field names for the time stamps differ between the flux data and the other meteorological data: the start of the averaging interval is timeBgn in the flux data and startDateTime in the PAR data.
pr.top$timeBgn <- pr.top$startDateTime
And now use the matching time stamp fields to merge the flux and PAR data.
fx.pr <- merge(pr.top, flux$NIWO, by="timeBgn")
And now we can plot net carbon exchange as a function of light availability:
plot(fx.pr$data.fluxCo2.nsae.flux~fx.pr$PARMean,
pch=".", ylim=c(-20,20),
xlab="PAR", ylab="CO2 flux")
If you're interested in data in the eddy covariance bundle besides the net flux data, the rest of this tutorial will guide you through how to get those data out of the bundle.
5. Vertical profile data (Level 3)
The Level 3 (dp03) data are the spatially interpolated profiles of the rates of change of CO2, H2O, and temperature. Extract the Level 3 data from the HDF5 file using stackEddy() with the same syntax as for the Level 4 data.
prof <- stackEddy(filepath="~/Downloads/filesToStack00200/",
level="dp03")
As with the Level 4 data, the result is a named list with data tables for each site.
head(prof$NIWO)
## timeBgn timeEnd data.co2Stor.rateRtioMoleDryCo2.X0.1.m data.co2Stor.rateRtioMoleDryCo2.X0.2.m
## 1 2018-06-01 2018-06-01 00:29:59 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X0.3.m data.co2Stor.rateRtioMoleDryCo2.X0.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X0.5.m data.co2Stor.rateRtioMoleDryCo2.X0.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X0.7.m data.co2Stor.rateRtioMoleDryCo2.X0.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X0.9.m data.co2Stor.rateRtioMoleDryCo2.X1.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X1.1.m data.co2Stor.rateRtioMoleDryCo2.X1.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X1.3.m data.co2Stor.rateRtioMoleDryCo2.X1.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X1.5.m data.co2Stor.rateRtioMoleDryCo2.X1.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X1.7.m data.co2Stor.rateRtioMoleDryCo2.X1.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X1.9.m data.co2Stor.rateRtioMoleDryCo2.X2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X2.1.m data.co2Stor.rateRtioMoleDryCo2.X2.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X2.3.m data.co2Stor.rateRtioMoleDryCo2.X2.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X2.5.m data.co2Stor.rateRtioMoleDryCo2.X2.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X2.7.m data.co2Stor.rateRtioMoleDryCo2.X2.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X2.9.m data.co2Stor.rateRtioMoleDryCo2.X3.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X3.1.m data.co2Stor.rateRtioMoleDryCo2.X3.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X3.3.m data.co2Stor.rateRtioMoleDryCo2.X3.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X3.5.m data.co2Stor.rateRtioMoleDryCo2.X3.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X3.7.m data.co2Stor.rateRtioMoleDryCo2.X3.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X3.9.m data.co2Stor.rateRtioMoleDryCo2.X4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X4.1.m data.co2Stor.rateRtioMoleDryCo2.X4.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X4.3.m data.co2Stor.rateRtioMoleDryCo2.X4.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X4.5.m data.co2Stor.rateRtioMoleDryCo2.X4.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X4.7.m data.co2Stor.rateRtioMoleDryCo2.X4.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X4.9.m data.co2Stor.rateRtioMoleDryCo2.X5.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X5.1.m data.co2Stor.rateRtioMoleDryCo2.X5.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X5.3.m data.co2Stor.rateRtioMoleDryCo2.X5.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X5.5.m data.co2Stor.rateRtioMoleDryCo2.X5.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X5.7.m data.co2Stor.rateRtioMoleDryCo2.X5.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X5.9.m data.co2Stor.rateRtioMoleDryCo2.X6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X6.1.m data.co2Stor.rateRtioMoleDryCo2.X6.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X6.3.m data.co2Stor.rateRtioMoleDryCo2.X6.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X6.5.m data.co2Stor.rateRtioMoleDryCo2.X6.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X6.7.m data.co2Stor.rateRtioMoleDryCo2.X6.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X6.9.m data.co2Stor.rateRtioMoleDryCo2.X7.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X7.1.m data.co2Stor.rateRtioMoleDryCo2.X7.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X7.3.m data.co2Stor.rateRtioMoleDryCo2.X7.4.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X7.5.m data.co2Stor.rateRtioMoleDryCo2.X7.6.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X7.7.m data.co2Stor.rateRtioMoleDryCo2.X7.8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X7.9.m data.co2Stor.rateRtioMoleDryCo2.X8.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X8.1.m data.co2Stor.rateRtioMoleDryCo2.X8.2.m
## 1 -0.0002681938 -0.0002681938
## data.co2Stor.rateRtioMoleDryCo2.X8.3.m data.co2Stor.rateRtioMoleDryCo2.X8.4.m
## 1 -0.0002681938 -0.0002681938
## data.h2oStor.rateRtioMoleDryH2o.X0.1.m data.h2oStor.rateRtioMoleDryH2o.X0.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X0.3.m data.h2oStor.rateRtioMoleDryH2o.X0.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X0.5.m data.h2oStor.rateRtioMoleDryH2o.X0.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X0.7.m data.h2oStor.rateRtioMoleDryH2o.X0.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X0.9.m data.h2oStor.rateRtioMoleDryH2o.X1.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X1.1.m data.h2oStor.rateRtioMoleDryH2o.X1.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X1.3.m data.h2oStor.rateRtioMoleDryH2o.X1.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X1.5.m data.h2oStor.rateRtioMoleDryH2o.X1.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X1.7.m data.h2oStor.rateRtioMoleDryH2o.X1.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X1.9.m data.h2oStor.rateRtioMoleDryH2o.X2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X2.1.m data.h2oStor.rateRtioMoleDryH2o.X2.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X2.3.m data.h2oStor.rateRtioMoleDryH2o.X2.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X2.5.m data.h2oStor.rateRtioMoleDryH2o.X2.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X2.7.m data.h2oStor.rateRtioMoleDryH2o.X2.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X2.9.m data.h2oStor.rateRtioMoleDryH2o.X3.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X3.1.m data.h2oStor.rateRtioMoleDryH2o.X3.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X3.3.m data.h2oStor.rateRtioMoleDryH2o.X3.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X3.5.m data.h2oStor.rateRtioMoleDryH2o.X3.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X3.7.m data.h2oStor.rateRtioMoleDryH2o.X3.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X3.9.m data.h2oStor.rateRtioMoleDryH2o.X4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X4.1.m data.h2oStor.rateRtioMoleDryH2o.X4.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X4.3.m data.h2oStor.rateRtioMoleDryH2o.X4.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X4.5.m data.h2oStor.rateRtioMoleDryH2o.X4.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X4.7.m data.h2oStor.rateRtioMoleDryH2o.X4.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X4.9.m data.h2oStor.rateRtioMoleDryH2o.X5.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X5.1.m data.h2oStor.rateRtioMoleDryH2o.X5.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X5.3.m data.h2oStor.rateRtioMoleDryH2o.X5.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X5.5.m data.h2oStor.rateRtioMoleDryH2o.X5.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X5.7.m data.h2oStor.rateRtioMoleDryH2o.X5.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X5.9.m data.h2oStor.rateRtioMoleDryH2o.X6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X6.1.m data.h2oStor.rateRtioMoleDryH2o.X6.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X6.3.m data.h2oStor.rateRtioMoleDryH2o.X6.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X6.5.m data.h2oStor.rateRtioMoleDryH2o.X6.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X6.7.m data.h2oStor.rateRtioMoleDryH2o.X6.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X6.9.m data.h2oStor.rateRtioMoleDryH2o.X7.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X7.1.m data.h2oStor.rateRtioMoleDryH2o.X7.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X7.3.m data.h2oStor.rateRtioMoleDryH2o.X7.4.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X7.5.m data.h2oStor.rateRtioMoleDryH2o.X7.6.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X7.7.m data.h2oStor.rateRtioMoleDryH2o.X7.8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X7.9.m data.h2oStor.rateRtioMoleDryH2o.X8.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X8.1.m data.h2oStor.rateRtioMoleDryH2o.X8.2.m
## 1 0.000315911 0.000315911
## data.h2oStor.rateRtioMoleDryH2o.X8.3.m data.h2oStor.rateRtioMoleDryH2o.X8.4.m data.tempStor.rateTemp.X0.1.m
## 1 0.000315911 0.000315911 -0.0001014444
## data.tempStor.rateTemp.X0.2.m data.tempStor.rateTemp.X0.3.m data.tempStor.rateTemp.X0.4.m
## 1 -0.0001014444 -0.0001014444 -0.0001014444
## data.tempStor.rateTemp.X0.5.m data.tempStor.rateTemp.X0.6.m data.tempStor.rateTemp.X0.7.m
## 1 -0.0001014444 -0.0001050874 -0.000111159
## data.tempStor.rateTemp.X0.8.m data.tempStor.rateTemp.X0.9.m data.tempStor.rateTemp.X1.m
## 1 -0.0001172305 -0.0001233021 -0.0001293737
## data.tempStor.rateTemp.X1.1.m data.tempStor.rateTemp.X1.2.m data.tempStor.rateTemp.X1.3.m
## 1 -0.0001354453 -0.0001415168 -0.0001475884
## data.tempStor.rateTemp.X1.4.m data.tempStor.rateTemp.X1.5.m data.tempStor.rateTemp.X1.6.m
## 1 -0.00015366 -0.0001597315 -0.0001658031
## data.tempStor.rateTemp.X1.7.m data.tempStor.rateTemp.X1.8.m data.tempStor.rateTemp.X1.9.m
## 1 -0.0001718747 -0.0001779463 -0.0001840178
## data.tempStor.rateTemp.X2.m data.tempStor.rateTemp.X2.1.m data.tempStor.rateTemp.X2.2.m
## 1 -0.000185739 -0.0001869767 -0.0001882144
## data.tempStor.rateTemp.X2.3.m data.tempStor.rateTemp.X2.4.m data.tempStor.rateTemp.X2.5.m
## 1 -0.0001894521 -0.0001906899 -0.0001919276
## data.tempStor.rateTemp.X2.6.m data.tempStor.rateTemp.X2.7.m data.tempStor.rateTemp.X2.8.m
## 1 -0.0001931653 -0.0001944031 -0.0001956408
## data.tempStor.rateTemp.X2.9.m data.tempStor.rateTemp.X3.m data.tempStor.rateTemp.X3.1.m
## 1 -0.0001968785 -0.0001981162 -0.000199354
## data.tempStor.rateTemp.X3.2.m data.tempStor.rateTemp.X3.3.m data.tempStor.rateTemp.X3.4.m
## 1 -0.0002005917 -0.0002018294 -0.0002030672
## data.tempStor.rateTemp.X3.5.m data.tempStor.rateTemp.X3.6.m data.tempStor.rateTemp.X3.7.m
## 1 -0.0002043049 -0.0002055426 -0.0002067803
## data.tempStor.rateTemp.X3.8.m data.tempStor.rateTemp.X3.9.m data.tempStor.rateTemp.X4.m
## 1 -0.0002080181 -0.0002092558 -0.0002104935
## data.tempStor.rateTemp.X4.1.m data.tempStor.rateTemp.X4.2.m data.tempStor.rateTemp.X4.3.m
## 1 -0.0002117313 -0.000212969 -0.0002142067
## data.tempStor.rateTemp.X4.4.m data.tempStor.rateTemp.X4.5.m data.tempStor.rateTemp.X4.6.m
## 1 -0.0002154444 -0.0002172161 -0.0002189878
## data.tempStor.rateTemp.X4.7.m data.tempStor.rateTemp.X4.8.m data.tempStor.rateTemp.X4.9.m
## 1 -0.0002207595 -0.0002225312 -0.0002243029
## data.tempStor.rateTemp.X5.m data.tempStor.rateTemp.X5.1.m data.tempStor.rateTemp.X5.2.m
## 1 -0.0002260746 -0.0002278463 -0.000229618
## data.tempStor.rateTemp.X5.3.m data.tempStor.rateTemp.X5.4.m data.tempStor.rateTemp.X5.5.m
## 1 -0.0002313896 -0.0002331613 -0.000234933
## data.tempStor.rateTemp.X5.6.m data.tempStor.rateTemp.X5.7.m data.tempStor.rateTemp.X5.8.m
## 1 -0.0002367047 -0.0002384764 -0.0002402481
## data.tempStor.rateTemp.X5.9.m data.tempStor.rateTemp.X6.m data.tempStor.rateTemp.X6.1.m
## 1 -0.0002420198 -0.0002437915 -0.0002455631
## data.tempStor.rateTemp.X6.2.m data.tempStor.rateTemp.X6.3.m data.tempStor.rateTemp.X6.4.m
## 1 -0.0002473348 -0.0002491065 -0.0002508782
## data.tempStor.rateTemp.X6.5.m data.tempStor.rateTemp.X6.6.m data.tempStor.rateTemp.X6.7.m
## 1 -0.0002526499 -0.0002544216 -0.0002561933
## data.tempStor.rateTemp.X6.8.m data.tempStor.rateTemp.X6.9.m data.tempStor.rateTemp.X7.m
## 1 -0.000257965 -0.0002597367 -0.0002615083
## data.tempStor.rateTemp.X7.1.m data.tempStor.rateTemp.X7.2.m data.tempStor.rateTemp.X7.3.m
## 1 -0.00026328 -0.0002650517 -0.0002668234
## data.tempStor.rateTemp.X7.4.m data.tempStor.rateTemp.X7.5.m data.tempStor.rateTemp.X7.6.m
## 1 -0.0002685951 -0.0002703668 -0.0002721385
## data.tempStor.rateTemp.X7.7.m data.tempStor.rateTemp.X7.8.m data.tempStor.rateTemp.X7.9.m
## 1 -0.0002739102 -0.0002756819 -0.0002774535
## data.tempStor.rateTemp.X8.m data.tempStor.rateTemp.X8.1.m data.tempStor.rateTemp.X8.2.m
## 1 -0.0002792252 -0.0002809969 -0.0002827686
## data.tempStor.rateTemp.X8.3.m data.tempStor.rateTemp.X8.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.1.m
## 1 -0.0002845403 -0.000286312 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X0.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X0.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X0.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X0.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X1.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X1.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X1.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X1.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X1.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X2.m qfqm.co2Stor.rateRtioMoleDryCo2.X2.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X2.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X2.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X2.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X2.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X2.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X2.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X2.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X2.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X3.m qfqm.co2Stor.rateRtioMoleDryCo2.X3.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X3.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X3.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X3.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X3.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X3.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X3.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X3.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X3.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X4.m qfqm.co2Stor.rateRtioMoleDryCo2.X4.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X4.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X4.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X4.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X4.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X4.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X4.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X4.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X4.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X5.m qfqm.co2Stor.rateRtioMoleDryCo2.X5.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X5.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X5.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X5.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X5.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X5.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X5.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X5.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X5.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X6.m qfqm.co2Stor.rateRtioMoleDryCo2.X6.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X6.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X6.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X6.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X6.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X6.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X6.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X6.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X6.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X7.m qfqm.co2Stor.rateRtioMoleDryCo2.X7.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X7.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X7.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X7.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X7.5.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X7.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X7.7.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X7.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X7.9.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X8.m qfqm.co2Stor.rateRtioMoleDryCo2.X8.1.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X8.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X8.3.m
## 1 1 1
## qfqm.co2Stor.rateRtioMoleDryCo2.X8.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X0.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X0.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X0.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X0.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X1.m qfqm.h2oStor.rateRtioMoleDryH2o.X1.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X1.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X1.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X1.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X1.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X1.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X1.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X1.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X1.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X2.m qfqm.h2oStor.rateRtioMoleDryH2o.X2.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X2.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X2.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X2.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X2.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X2.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X2.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X2.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X2.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X3.m qfqm.h2oStor.rateRtioMoleDryH2o.X3.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X3.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X3.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X3.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X3.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X3.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X3.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X3.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X3.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X4.m qfqm.h2oStor.rateRtioMoleDryH2o.X4.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X4.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X4.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X4.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X4.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X4.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X4.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X4.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X4.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X5.m qfqm.h2oStor.rateRtioMoleDryH2o.X5.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X5.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X5.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X5.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X5.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X5.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X5.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X5.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X5.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X6.m qfqm.h2oStor.rateRtioMoleDryH2o.X6.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X6.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X6.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X6.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X6.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X6.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X6.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X6.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X6.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X7.m qfqm.h2oStor.rateRtioMoleDryH2o.X7.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X7.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X7.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X7.4.m qfqm.h2oStor.rateRtioMoleDryH2o.X7.5.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X7.6.m qfqm.h2oStor.rateRtioMoleDryH2o.X7.7.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X7.8.m qfqm.h2oStor.rateRtioMoleDryH2o.X7.9.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X8.m qfqm.h2oStor.rateRtioMoleDryH2o.X8.1.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X8.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X8.3.m
## 1 1 1
## qfqm.h2oStor.rateRtioMoleDryH2o.X8.4.m qfqm.tempStor.rateTemp.X0.1.m qfqm.tempStor.rateTemp.X0.2.m
## 1 1 0 0
## qfqm.tempStor.rateTemp.X0.3.m qfqm.tempStor.rateTemp.X0.4.m qfqm.tempStor.rateTemp.X0.5.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X0.6.m qfqm.tempStor.rateTemp.X0.7.m qfqm.tempStor.rateTemp.X0.8.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X0.9.m qfqm.tempStor.rateTemp.X1.m qfqm.tempStor.rateTemp.X1.1.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X1.2.m qfqm.tempStor.rateTemp.X1.3.m qfqm.tempStor.rateTemp.X1.4.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X1.5.m qfqm.tempStor.rateTemp.X1.6.m qfqm.tempStor.rateTemp.X1.7.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X1.8.m qfqm.tempStor.rateTemp.X1.9.m qfqm.tempStor.rateTemp.X2.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X2.1.m qfqm.tempStor.rateTemp.X2.2.m qfqm.tempStor.rateTemp.X2.3.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X2.4.m qfqm.tempStor.rateTemp.X2.5.m qfqm.tempStor.rateTemp.X2.6.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X2.7.m qfqm.tempStor.rateTemp.X2.8.m qfqm.tempStor.rateTemp.X2.9.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X3.m qfqm.tempStor.rateTemp.X3.1.m qfqm.tempStor.rateTemp.X3.2.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X3.3.m qfqm.tempStor.rateTemp.X3.4.m qfqm.tempStor.rateTemp.X3.5.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X3.6.m qfqm.tempStor.rateTemp.X3.7.m qfqm.tempStor.rateTemp.X3.8.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X3.9.m qfqm.tempStor.rateTemp.X4.m qfqm.tempStor.rateTemp.X4.1.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X4.2.m qfqm.tempStor.rateTemp.X4.3.m qfqm.tempStor.rateTemp.X4.4.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X4.5.m qfqm.tempStor.rateTemp.X4.6.m qfqm.tempStor.rateTemp.X4.7.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X4.8.m qfqm.tempStor.rateTemp.X4.9.m qfqm.tempStor.rateTemp.X5.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X5.1.m qfqm.tempStor.rateTemp.X5.2.m qfqm.tempStor.rateTemp.X5.3.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X5.4.m qfqm.tempStor.rateTemp.X5.5.m qfqm.tempStor.rateTemp.X5.6.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X5.7.m qfqm.tempStor.rateTemp.X5.8.m qfqm.tempStor.rateTemp.X5.9.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X6.m qfqm.tempStor.rateTemp.X6.1.m qfqm.tempStor.rateTemp.X6.2.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X6.3.m qfqm.tempStor.rateTemp.X6.4.m qfqm.tempStor.rateTemp.X6.5.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X6.6.m qfqm.tempStor.rateTemp.X6.7.m qfqm.tempStor.rateTemp.X6.8.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X6.9.m qfqm.tempStor.rateTemp.X7.m qfqm.tempStor.rateTemp.X7.1.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X7.2.m qfqm.tempStor.rateTemp.X7.3.m qfqm.tempStor.rateTemp.X7.4.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X7.5.m qfqm.tempStor.rateTemp.X7.6.m qfqm.tempStor.rateTemp.X7.7.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X7.8.m qfqm.tempStor.rateTemp.X7.9.m qfqm.tempStor.rateTemp.X8.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X8.1.m qfqm.tempStor.rateTemp.X8.2.m qfqm.tempStor.rateTemp.X8.3.m
## 1 0 0 0
## qfqm.tempStor.rateTemp.X8.4.m
## 1 0
## [ reached 'max' / getOption("max.print") -- omitted 5 rows ]
6. Un-interpolated vertical profile data (Level 2)
The Level 2 data are interpolated in time but not in space. They contain the rates of change at each of the measurement heights.
Again, they can be extracted from the HDF5 files using stackEddy() with the same syntax:
prof.l2 <- stackEddy(filepath="~/Downloads/filesToStack00200/",
level="dp02")
head(prof.l2$HARV)
## verticalPosition timeBgn timeEnd data.co2Stor.rateRtioMoleDryCo2.mean
## 1 010 2018-06-01 00:00:00 2018-06-01 00:29:59 NaN
## 2 010 2018-06-01 00:30:00 2018-06-01 00:59:59 0.002666576
## 3 010 2018-06-01 01:00:00 2018-06-01 01:29:59 -0.011224223
## 4 010 2018-06-01 01:30:00 2018-06-01 01:59:59 0.006133056
## 5 010 2018-06-01 02:00:00 2018-06-01 02:29:59 -0.019554655
## 6 010 2018-06-01 02:30:00 2018-06-01 02:59:59 -0.007855632
## data.h2oStor.rateRtioMoleDryH2o.mean data.tempStor.rateTemp.mean qfqm.co2Stor.rateRtioMoleDryCo2.qfFinl
## 1 NaN 2.583333e-05 1
## 2 NaN -2.008056e-04 1
## 3 NaN -1.901111e-04 1
## 4 NaN -7.419444e-05 1
## 5 NaN -1.537083e-04 1
## 6 NaN -1.874861e-04 1
## qfqm.h2oStor.rateRtioMoleDryH2o.qfFinl qfqm.tempStor.rateTemp.qfFinl
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
Note that here, as in the PAR data, there is a verticalPosition field, indicating the tower level of the measurement.
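For example, to pull out a single tower level, subset on verticalPosition; the level "020" used here is just an illustration, and you can substitute any level present at the site.
# Subset the Level 2 HARV data to a single tower level
prof.l2.020 <- prof.l2$HARV[which(prof.l2$HARV$verticalPosition=="020"),]
head(prof.l2.020)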
7. Calibrated raw data (Level 1)
Level 1 (dp01) data are calibrated and aggregated in time, but otherwise untransformed. Use Level 1 data for raw gas concentrations and atmospheric stable isotopes.
Using stackEddy() to extract Level 1 data requires additional inputs. The Level 1 files are too large to simply pull out all the variables by default, and they include multiple averaging intervals, which can't be merged. So two additional inputs are needed:
- avg: The averaging interval to extract
- var: One or more variables to extract
What variables are available, and at what averaging intervals? Another function in the neonUtilities package, getVarsEddy(), returns a list of HDF5 file contents. It requires only one input, a filepath to a single NEON HDF5 file:
vars <- getVarsEddy("~/Downloads/filesToStack00200/NEON.D01.HARV.DP4.00200.001.nsae.2018-07.basic.20201020T201317Z.h5")
head(vars)
## site level category system hor ver tmi name otype dclass dim oth
## 5 HARV dp01 data amrs 000 060 01m angNedXaxs H5I_DATASET COMPOUND 43200 <NA>
## 6 HARV dp01 data amrs 000 060 01m angNedYaxs H5I_DATASET COMPOUND 43200 <NA>
## 7 HARV dp01 data amrs 000 060 01m angNedZaxs H5I_DATASET COMPOUND 43200 <NA>
## 9 HARV dp01 data amrs 000 060 30m angNedXaxs H5I_DATASET COMPOUND 1440 <NA>
## 10 HARV dp01 data amrs 000 060 30m angNedYaxs H5I_DATASET COMPOUND 1440 <NA>
## 11 HARV dp01 data amrs 000 060 30m angNedZaxs H5I_DATASET COMPOUND 1440 <NA>
Inputs to var can be any values from the name field in the table returned by getVarsEddy(). Let's take a look at CO2 and H2O, 13C in CO2, and 18O in H2O, at 30-minute aggregation. We'll use Harvard Forest for these data, since deeper canopies generally have more interesting profiles:
iso <- stackEddy(filepath="~/Downloads/filesToStack00200/",
level="dp01", var=c("rtioMoleDryCo2","rtioMoleDryH2o",
"dlta13CCo2","dlta18OH2o"), avg=30)
head(iso$HARV)
## verticalPosition timeBgn timeEnd data.co2Stor.rtioMoleDryCo2.mean
## 1 010 2018-06-01 00:00:00 2018-06-01 00:29:59 509.3375
## 2 010 2018-06-01 00:30:00 2018-06-01 00:59:59 502.2736
## 3 010 2018-06-01 01:00:00 2018-06-01 01:29:59 521.6139
## 4 010 2018-06-01 01:30:00 2018-06-01 01:59:59 469.6317
## 5 010 2018-06-01 02:00:00 2018-06-01 02:29:59 484.7725
## 6 010 2018-06-01 02:30:00 2018-06-01 02:59:59 476.8554
## data.co2Stor.rtioMoleDryCo2.min data.co2Stor.rtioMoleDryCo2.max data.co2Stor.rtioMoleDryCo2.vari
## 1 451.4786 579.3518 845.0795
## 2 463.5470 533.6622 161.3652
## 3 442.8649 563.0518 547.9924
## 4 432.6588 508.7463 396.8379
## 5 436.2842 537.4641 662.9449
## 6 443.7055 515.6598 246.6969
## data.co2Stor.rtioMoleDryCo2.numSamp data.co2Turb.rtioMoleDryCo2.mean data.co2Turb.rtioMoleDryCo2.min
## 1 235 NA NA
## 2 175 NA NA
## 3 235 NA NA
## 4 175 NA NA
## 5 235 NA NA
## 6 175 NA NA
## data.co2Turb.rtioMoleDryCo2.max data.co2Turb.rtioMoleDryCo2.vari data.co2Turb.rtioMoleDryCo2.numSamp
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## data.h2oStor.rtioMoleDryH2o.mean data.h2oStor.rtioMoleDryH2o.min data.h2oStor.rtioMoleDryH2o.max
## 1 NaN NaN NaN
## 2 NaN NaN NaN
## 3 NaN NaN NaN
## 4 NaN NaN NaN
## 5 NaN NaN NaN
## 6 NaN NaN NaN
## data.h2oStor.rtioMoleDryH2o.vari data.h2oStor.rtioMoleDryH2o.numSamp data.h2oTurb.rtioMoleDryH2o.mean
## 1 NA 0 NA
## 2 NA 0 NA
## 3 NA 0 NA
## 4 NA 0 NA
## 5 NA 0 NA
## 6 NA 0 NA
## data.h2oTurb.rtioMoleDryH2o.min data.h2oTurb.rtioMoleDryH2o.max data.h2oTurb.rtioMoleDryH2o.vari
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## data.h2oTurb.rtioMoleDryH2o.numSamp data.isoCo2.dlta13CCo2.mean data.isoCo2.dlta13CCo2.min
## 1 NA NaN NaN
## 2 NA -11.40646 -14.992
## 3 NA NaN NaN
## 4 NA -10.69318 -14.065
## 5 NA NaN NaN
## 6 NA -11.02814 -13.280
## data.isoCo2.dlta13CCo2.max data.isoCo2.dlta13CCo2.vari data.isoCo2.dlta13CCo2.numSamp
## 1 NaN NA 0
## 2 -8.022 1.9624355 305
## 3 NaN NA 0
## 4 -7.385 1.5766385 304
## 5 NaN NA 0
## 6 -7.966 0.9929341 308
## data.isoCo2.rtioMoleDryCo2.mean data.isoCo2.rtioMoleDryCo2.min data.isoCo2.rtioMoleDryCo2.max
## 1 NaN NaN NaN
## 2 458.3546 415.875 531.066
## 3 NaN NaN NaN
## 4 439.9582 415.777 475.736
## 5 NaN NaN NaN
## 6 446.5563 420.845 468.312
## data.isoCo2.rtioMoleDryCo2.vari data.isoCo2.rtioMoleDryCo2.numSamp data.isoCo2.rtioMoleDryH2o.mean
## 1 NA 0 NaN
## 2 953.2212 306 22.11830
## 3 NA 0 NaN
## 4 404.0365 306 22.38925
## 5 NA 0 NaN
## 6 138.7560 309 22.15731
## data.isoCo2.rtioMoleDryH2o.min data.isoCo2.rtioMoleDryH2o.max data.isoCo2.rtioMoleDryH2o.vari
## 1 NaN NaN NA
## 2 21.85753 22.34854 0.01746926
## 3 NaN NaN NA
## 4 22.09775 22.59945 0.02626762
## 5 NaN NaN NA
## 6 22.06641 22.26493 0.00277579
## data.isoCo2.rtioMoleDryH2o.numSamp data.isoH2o.dlta18OH2o.mean data.isoH2o.dlta18OH2o.min
## 1 0 NaN NaN
## 2 85 -12.24437 -12.901
## 3 0 NaN NaN
## 4 84 -12.04580 -12.787
## 5 0 NaN NaN
## 6 80 -11.81500 -12.375
## data.isoH2o.dlta18OH2o.max data.isoH2o.dlta18OH2o.vari data.isoH2o.dlta18OH2o.numSamp
## 1 NaN NA 0
## 2 -11.569 0.03557313 540
## 3 NaN NA 0
## 4 -11.542 0.03970481 539
## 5 NaN NA 0
## 6 -11.282 0.03498614 540
## data.isoH2o.rtioMoleDryH2o.mean data.isoH2o.rtioMoleDryH2o.min data.isoH2o.rtioMoleDryH2o.max
## 1 NaN NaN NaN
## 2 20.89354 20.36980 21.13160
## 3 NaN NaN NaN
## 4 21.12872 20.74663 21.33272
## 5 NaN NaN NaN
## 6 20.93480 20.63463 21.00702
## data.isoH2o.rtioMoleDryH2o.vari data.isoH2o.rtioMoleDryH2o.numSamp qfqm.co2Stor.rtioMoleDryCo2.qfFinl
## 1 NA 0 1
## 2 0.025376207 540 1
## 3 NA 0 1
## 4 0.017612293 540 1
## 5 NA 0 1
## 6 0.003805751 540 1
## qfqm.co2Turb.rtioMoleDryCo2.qfFinl qfqm.h2oStor.rtioMoleDryH2o.qfFinl qfqm.h2oTurb.rtioMoleDryH2o.qfFinl
## 1 NA 1 NA
## 2 NA 1 NA
## 3 NA 1 NA
## 4 NA 1 NA
## 5 NA 1 NA
## 6 NA 1 NA
## qfqm.isoCo2.dlta13CCo2.qfFinl qfqm.isoCo2.rtioMoleDryCo2.qfFinl qfqm.isoCo2.rtioMoleDryH2o.qfFinl
## 1 1 1 1
## 2 0 0 0
## 3 1 1 1
## 4 0 0 0
## 5 1 1 1
## 6 0 0 0
## qfqm.isoH2o.dlta18OH2o.qfFinl qfqm.isoH2o.rtioMoleDryH2o.qfFinl ucrt.co2Stor.rtioMoleDryCo2.mean
## 1 1 1 10.0248527
## 2 0 0 1.1077243
## 3 1 1 7.5181428
## 4 0 0 8.4017805
## 5 1 1 0.9465824
## 6 0 0 1.3629090
## ucrt.co2Stor.rtioMoleDryCo2.vari ucrt.co2Stor.rtioMoleDryCo2.se ucrt.co2Turb.rtioMoleDryCo2.mean
## 1 170.28091 1.8963340 NA
## 2 34.29589 0.9602536 NA
## 3 151.35746 1.5270503 NA
## 4 93.41077 1.5058703 NA
## 5 14.02753 1.6795958 NA
## 6 8.50861 1.1873064 NA
## ucrt.co2Turb.rtioMoleDryCo2.vari ucrt.co2Turb.rtioMoleDryCo2.se ucrt.h2oStor.rtioMoleDryH2o.mean
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## ucrt.h2oStor.rtioMoleDryH2o.vari ucrt.h2oStor.rtioMoleDryH2o.se ucrt.h2oTurb.rtioMoleDryH2o.mean
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## ucrt.h2oTurb.rtioMoleDryH2o.vari ucrt.h2oTurb.rtioMoleDryH2o.se ucrt.isoCo2.dlta13CCo2.mean
## 1 NA NA NaN
## 2 NA NA 0.5812574
## 3 NA NA NaN
## 4 NA NA 0.3653442
## 5 NA NA NaN
## 6 NA NA 0.2428672
## ucrt.isoCo2.dlta13CCo2.vari ucrt.isoCo2.dlta13CCo2.se ucrt.isoCo2.rtioMoleDryCo2.mean
## 1 NaN NA NaN
## 2 0.6827844 0.08021356 16.931819
## 3 NaN NA NaN
## 4 0.3761155 0.07201605 10.078698
## 5 NaN NA NaN
## 6 0.1544487 0.05677862 7.140787
## ucrt.isoCo2.rtioMoleDryCo2.vari ucrt.isoCo2.rtioMoleDryCo2.se ucrt.isoCo2.rtioMoleDryH2o.mean
## 1 NaN NA NaN
## 2 614.01630 1.764965 0.08848440
## 3 NaN NA NaN
## 4 196.99445 1.149078 0.08917388
## 5 NaN NA NaN
## 6 55.90843 0.670111 NA
## ucrt.isoCo2.rtioMoleDryH2o.vari ucrt.isoCo2.rtioMoleDryH2o.se ucrt.isoH2o.dlta18OH2o.mean
## 1 NaN NA NaN
## 2 0.01226428 0.014335993 0.02544454
## 3 NaN NA NaN
## 4 0.01542679 0.017683602 0.01373503
## 5 NaN NA NaN
## 6 NA 0.005890447 0.01932110
## ucrt.isoH2o.dlta18OH2o.vari ucrt.isoH2o.dlta18OH2o.se ucrt.isoH2o.rtioMoleDryH2o.mean
## 1 NaN NA NaN
## 2 0.003017400 0.008116413 0.06937514
## 3 NaN NA NaN
## 4 0.002704220 0.008582764 0.08489408
## 5 NaN NA NaN
## 6 0.002095066 0.008049170 0.02813808
## ucrt.isoH2o.rtioMoleDryH2o.vari ucrt.isoH2o.rtioMoleDryH2o.se
## 1 NaN NA
## 2 0.009640249 0.006855142
## 3 NaN NA
## 4 0.008572288 0.005710986
## 5 NaN NA
## 6 0.002551672 0.002654748
Let's plot vertical profiles of CO2 and 13C in CO2 on a single day. Here we'll use the time stamps in a different way, using grep() to select all of the records for a single day. We'll also discard the verticalPosition values that aren't numeric; those are the calibration gases.
iso.d <- iso$HARV[grep("2018-06-25", iso$HARV$timeBgn, fixed=T),]
iso.d <- iso.d[-which(is.na(as.numeric(iso.d$verticalPosition))),]
ggplot is well suited to these types of data; let's use it to plot the profiles. If you don't have the package yet, use install.packages() to install it first.
library(ggplot2)
Now we can plot CO2 relative to height on the tower, with separate lines for each time interval.
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.co2Stor.rtioMoleDryCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("CO2") + ylab("Tower level")
g
And the same plot for 13C in CO2:
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.isoCo2.dlta13CCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("d13C") + ylab("Tower level")
g
The legends are omitted for space; see if you can use the concentration and isotope ratio buildup and drawdown below the canopy to work out the times of day the different colors represent.
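One way to check your answer is to list the starting hours of the profiles you plotted; remember these are in UTC, and local time at Harvard Forest in June is UTC-4.
# Starting hour (UTC) of each half-hourly profile in the plots above
unique(format(iso.d$timeBgn, "%H:%M"))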