
Series

Get Started with NEON Data: A Series of Data Tutorials

This Data Tutorial Series provides an introduction to accessing and using NEON data. It includes foundational skills for working with NEON data, as well as tutorials focused on specific data types, which you can choose from based on your interests.

Foundational Skills and Tools to Access NEON Data

  • Start with a short video guide to downloading data from the NEON Data Portal.
  • The Download and Explore NEON Data tutorial guides you through using the neonUtilities package in R to transform NEON data, and shows how to use the metadata that accompany data downloads to help you understand the data.
  • If you prefer to use Python, the Using neonUtilities in Python tutorial provides instructions for using the rpy2 package to run neonUtilities in a Python environment.
  • Using an API token can make your downloads faster, and helps NEON by linking your user account to your downloads. See more information about API tokens here, and learn how to use a token with neonUtilities in this tutorial.
  • Learn how to work with NEON location data, using examples from vegetation structure observations and soil temperature sensors.  

Introductions to Working with Different Data Types

  • Explore the intersection of sensor and observational data with the Plant Phenology & Temperature tutorial series (individual tutorials that make up the series are listed in the sidebar). This is also a good introduction for inexperienced R users.
  • Get familiar with NEON sensor data flagging and data quality metrics, using aquatic instrument data as exemplar datasets. 
  • Calculate biodiversity metrics from NEON aquatic macroinvertebrate data.
  • For a quick introduction to working with remote sensing data, calculate a canopy height model from discrete return Lidar. NEON has an extensive catalog of tutorials about remote sensing principles and data; search the tutorials and tutorial series if you are interested in other topics.
  • Connecting ground observations to remote sensing imagery is important to many NEON users; get familiar with the process, as well as some of the challenges of comparing these data sources, by comparing tree height observations to a canopy height model. 
  • Use the neonUtilities package to wrangle NEON surface-atmosphere exchange data (published in HDF5 format). 

Download and Explore NEON Data

Authors: Claire K. Lunch

Last Updated: Aug 26, 2021

This tutorial covers downloading NEON data, using the Data Portal and the neonUtilities R package, as well as basic instruction in beginning to explore and work with the downloaded data, including guidance in navigating data documentation.

NEON data

There are 3 basic categories of NEON data:

  1. Remote sensing (AOP) - Data collected by the airborne observation platform, e.g. LIDAR, surface reflectance
  2. Observational (OS) - Data collected by a human in the field, or in an analytical laboratory, e.g. beetle identification, foliar isotopes
  3. Instrumentation (IS) - Data collected by an automated, streaming sensor, e.g. net radiation, soil carbon dioxide. This category also includes the eddy covariance (EC) data, which are processed and structured in a unique way, distinct from other instrumentation data (see Tutorial for EC data for details).

This lesson covers all three types of data. The download procedures are similar for all types, but data navigation differs significantly by type.

Objectives

After completing this activity, you will be able to:

  • Download NEON data using the neonUtilities package.
  • Understand downloaded data sets and load them into R for analyses.

Things You’ll Need To Complete This Tutorial

To complete this tutorial you will need the most current version of R and, preferably, RStudio loaded on your computer.

Install R Packages

  • neonUtilities: Basic functions for accessing NEON data
  • raster: Raster package; needed for remote sensing data

Both of these packages can be installed from CRAN:

install.packages("neonUtilities")
install.packages("raster")

Additional Resources

  • Tutorial for neonUtilities. Some overlap with this tutorial but goes into more detail about the neonUtilities package.
  • Tutorial for using neonUtilities from a Python environment.
  • GitHub repository for neonUtilities
  • neonUtilities cheat sheet. A quick reference guide for users.

Getting started: Download data from the Portal and load packages

Go to the NEON Data Portal and download some data! Almost any IS or OS data product can be used for this section of the tutorial, but we will proceed assuming you've downloaded Photosynthetically Active Radiation (PAR) (DP1.00024.001) data. For optimal results, download three months of data from one site. The downloaded file should be a zip file named NEON_par.zip. For this tutorial, we will be using PAR data from the Wind River Experimental Forest (WREF) in Washington state from September-November 2019.

Now switch over to R and load all the packages installed above.

# load packages
library(neonUtilities)
library(raster)

# Set global option to NOT convert all character variables to factors
# (this is already the default in R >= 4.0, but the line is harmless there)
options(stringsAsFactors=F)

Stack the downloaded data files: stackByTable()

The stackByTable() function will unzip and join the files in the downloaded zip file.

# Modify the file path to match the path to your zip file
stackByTable("~/Downloads/NEON_par.zip")

In the same directory as the zipped file, you should now have an unzipped folder of the same name. When you open this you will see a new folder called stackedFiles, which should contain five files: PARPAR_1min.csv, PARPAR_30min.csv, sensor_positions_00024.csv, variables_00024.csv, and readme_00024.txt.

We'll look at these files in more detail below.

Download files and load directly to R: loadByProduct()

In the section above, we downloaded a .zip file from the data portal to our downloads folder, then used the stackByTable() function to transform those data into a usable format. However, there is a faster way to load data directly into the R Global Environment using loadByProduct().

The most popular function in neonUtilities is loadByProduct(). This function downloads data from the NEON API, merges the site-by-month files, and loads the resulting data tables into the R environment, assigning each data type to the appropriate R class. This is a popular choice because it ensures you're always working with the latest data, and it ends with ready-to-use tables in R. However, if you use it in a workflow you run repeatedly, keep in mind it will re-download the data every time.

loadByProduct() works on most observational (OS) and sensor (IS) data, but not on surface-atmosphere exchange (SAE) data, remote sensing (AOP) data, and some of the data tables in the microbial data products. For functions that download AOP data, see the byFileAOP() and byTileAOP() sections in this tutorial. For functions that work with SAE data, see the NEON eddy flux data tutorial.

The inputs to loadByProduct() control which data to download and how to manage the processing:

  • dpID: the data product ID, e.g. DP1.00002.001
  • site: defaults to "all", meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g. c("HARV","CPER","ABBY").
  • startdate and enddate: defaults to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g. 2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
  • package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as chemical standards and quality flags. Not every data product has an expanded package; if the expanded package is requested but there isn't one, the basic package will be downloaded.
  • avg: defaults to "all", to download all data; or the number of minutes in the averaging interval. See example below; only applicable to IS data.
  • savepath: the file path you want to download to; defaults to the working directory.
  • check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
  • nCores: Number of cores to use for parallel processing. Defaults to 1, i.e. no parallelization.
  • forceParallel: if TRUE, forces parallel processing even when the data volume does not meet the minimum requirements for parallelization.
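To illustrate the month granularity of startdate and enddate: a request expands to every monthly package in the inclusive range. A hypothetical helper, not part of neonUtilities:

```python
def months_covered(startdate, enddate):
    """Expand an inclusive YYYY-MM range into the monthly
    packages a NEON query would cover (illustration only)."""
    y0, m0 = map(int, startdate.split("-"))
    y1, m1 = map(int, enddate.split("-"))
    months = []
    y, m = y0, m0
    while (y, m) <= (y1, m1):
        months.append(f"{y:04d}-{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months

print(months_covered("2019-09", "2019-11"))
# ['2019-09', '2019-10', '2019-11']
```

Because the range is inclusive at both ends, a September-to-November request returns three monthly packages.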

The dpID is the data product identifier of the data you want to download. The DPID can be found on the Explore Data Products page. It will be in the form DP#.#####.###
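If you build DPIDs programmatically, the DP#.#####.### form can be sanity-checked with a short regular expression. A hypothetical convenience function, not part of neonUtilities:

```python
import re

# Pattern for NEON data product IDs in the form DP#.#####.###
# (e.g. DP1.00024.001); a local sanity check, not a NEON API call.
DPID_PATTERN = re.compile(r"^DP\d\.\d{5}\.\d{3}$")

def is_valid_dpid(dpid):
    """Return True if dpid matches the DP#.#####.### form."""
    return bool(DPID_PATTERN.match(dpid))

print(is_valid_dpid("DP1.00024.001"))  # True
print(is_valid_dpid("DP1.24.1"))       # False
```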

Here, we'll download aquatic plant chemistry data from three lake sites: Prairie Lake (PRLA), Suggs Lake (SUGG), and Toolik Lake (TOOK).

apchem <- loadByProduct(dpID="DP1.20063.001", 
                  site=c("PRLA","SUGG","TOOK"), 
                  package="expanded", check.size=T)

The object returned by loadByProduct() is a named list of data frames. To work with each of them, select them from the list using the $ operator.

names(apchem)
View(apchem$apl_plantExternalLabDataPerSample)

If you prefer to extract each table from the list and work with it as an independent object, you can use the list2env() function:

list2env(apchem, .GlobalEnv)

## <environment: R_GlobalEnv>

If you want to be able to close R and come back to these data without re-downloading, you'll want to save the tables locally. We recommend also saving the variables file, both so you'll have it to refer to, and so you can use it with readTableNEON() (see below).

write.csv(apl_clipHarvest, 
          "~/Downloads/apl_clipHarvest.csv", 
          row.names=F)
write.csv(apl_biomass, 
          "~/Downloads/apl_biomass.csv", 
          row.names=F)
write.csv(apl_plantExternalLabDataPerSample, 
          "~/Downloads/apl_plantExternalLabDataPerSample.csv", 
          row.names=F)
write.csv(variables_20063, 
          "~/Downloads/variables_20063.csv", 
          row.names=F)

But, if you want to save files locally and load them into R (or another platform) each time you run a script, instead of downloading from the API every time, you may prefer to use zipsByProduct() and stackByTable() instead of loadByProduct(), as we did in the first section above. Details can be found in our neonUtilities tutorial. You can also try out the community-developed neonstore package, which is designed for maintaining a local store of the NEON data you use.

Download remote sensing data: byFileAOP() and byTileAOP()

Remote sensing data files are very large, so downloading them can take a long time. byFileAOP() and byTileAOP() make programmatic downloads easier, but downloading large volumes of data can still take a very long time.

Input options for the AOP functions are:

  • dpID: the data product ID, e.g. DP1.00002.001
  • site: the 4-letter code of a single site, e.g. HARV
  • year: the 4-digit year to download
  • savepath: the file path you want to download to; defaults to the working directory
  • check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
  • easting: byTileAOP() only. Vector of easting UTM coordinates whose corresponding tiles you want to download
  • northing: byTileAOP() only. Vector of northing UTM coordinates whose corresponding tiles you want to download
  • buffer: byTileAOP() only. Size in meters of buffer to include around coordinates when deciding which tiles to download

Here, we'll download one tile of Ecosystem structure (Canopy Height Model) (DP3.30015.001) from WREF in 2017.

byTileAOP("DP3.30015.001", site="WREF", year="2017", check.size = T,
          easting=580000, northing=5075000, savepath="~/Downloads")

In the directory indicated in savepath, you should now have a folder named DP3.30015.001 with several nested subfolders, leading to a tif file of a canopy height model tile. We'll look at this in more detail below.

Navigate data downloads: IS

Let's take a look at the PAR data we downloaded earlier. We'll read in the 30-minute file using the function readTableNEON(), which uses the variables.csv file to assign data types to each column of data:

par30 <- readTableNEON(
  dataFile="~/Downloads/NEON_par/stackedFiles/PARPAR_30min.csv", 
  varFile="~/Downloads/NEON_par/stackedFiles/variables_00024.csv")
View(par30)

The first four columns are added by stackByTable() when it merges files across sites, months, and tower heights. The final column, publicationDate, is the date-time stamp indicating when the data were published. This can be used as an indicator for whether data have been updated since the last time you downloaded them.
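publicationDate values use a compact UTC timestamp (e.g. 20211222T013942Z), which can be parsed with the standard library and compared against the date of your last download. A minimal sketch; the cutoff date below is invented for illustration:

```python
from datetime import datetime, timezone

def parse_pub_date(stamp):
    """Parse a NEON publicationDate stamp like '20211222T013942Z' (UTC)."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)

# Hypothetical date of your previous download
last_download = datetime(2021, 6, 1, tzinfo=timezone.utc)

pub = parse_pub_date("20211222T013942Z")
print(pub > last_download)  # True: republished since the last download
```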

The remaining columns are described by the variables file:

parvar <- read.csv("~/Downloads/NEON_par/stackedFiles/variables_00024.csv")
View(parvar)

The variables file shows you the definition and units for each column of data.

Now that we know what we're looking at, let's plot PAR from the top tower level:

plot(PARMean~startDateTime, 
     data=par30[which(par30$verticalPosition=="080"),],
     type="l")

Looks good! The sun comes up and goes down every day, and some days are cloudy. If you want to dig in a little deeper, try plotting PAR from lower tower levels on the same axes to see light attenuation through the canopy.

Navigate data downloads: OS

Let's take a look at the aquatic plant data. OS data products are simple in that the data are generally tabular, and data volumes are lower than for the other NEON data types, but they are complex in that almost all consist of multiple tables containing information collected at different times in different ways. For example, samples collected in the field may be shipped to a laboratory for analysis. Data associated with the field collection will appear in one data table, and the analytical results will appear in another. Complexity in working with OS data usually involves bringing data together from multiple measurements or scales of analysis.

As with the IS data, the variables file can tell you more about the data. OS data also come with a validation file, which contains information about the validation and controlled data entry that were applied to the data:

View(variables_20063)

View(validation_20063)

OS data products each come with a Data Product User Guide, which can be downloaded with the data, or accessed from the document library on the Data Portal, or the Product Details page for the data product. The User Guide is designed to give a basic introduction to the data product, including a brief summary of the protocol and descriptions of data format and structure.

To get started with the aquatic plant chemistry data, let's take a look at carbon isotope ratios in plants across the three sites we downloaded. The chemical analytes are reported in the apl_plantExternalLabDataPerSample table, and the table is in long format, with one record per sample per analyte, so we'll subset to only the carbon isotope analyte:

boxplot(analyteConcentration~siteID, 
        data=apl_plantExternalLabDataPerSample, 
        subset=analyte=="d13C",
        xlab="Site", ylab="d13C")

We see plants at Suggs and Toolik are quite low in 13C, with more spread at Toolik than Suggs, while plants at Prairie Lake are relatively enriched. Clearly the next question is what species these data represent. But taxonomic data aren't present in the apl_plantExternalLabDataPerSample table; they're in the apl_biomass table. We'll need to join the two tables to get chemistry by taxon.

The Data Relationships section of the User Guide can help you determine which fields to use as the key to join the tables. Here, sampleID is the joining variable. We'll also include the basic spatial variables, to avoid creating unnecessary duplicates of those columns.

apct <- merge(apl_biomass, 
              apl_plantExternalLabDataPerSample, 
              by=c("sampleID","namedLocation",
                   "domainID","siteID"))

Using the merged data, now we can plot carbon isotope ratio for each taxon.

boxplot(analyteConcentration~scientificName, 
        data=apct, subset=analyte=="d13C", 
        xlab=NA, ylab="d13C", 
        las=2, cex.axis=0.7)

And now we can see most of the sampled plants have carbon isotope ratios around -30, with just two species accounting for most of the more enriched samples.

Navigate data downloads: AOP

To work with AOP data, the best bet is the raster package. It has functionality for most analyses you might want to do.

We'll use it to read in the tile we downloaded:

chm <- raster("~/Downloads/DP3.30015.001/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif")

The raster package includes plotting functions:

plot(chm, col=topo.colors(6))

Now we can see canopy height across the downloaded tile; the tallest trees are over 60 meters, not surprising in the Pacific Northwest. There is a clearing or clear cut in the lower right corner.

Get Lesson Code

NEON-download-explore.R

Using neonUtilities in Python

Authors: Claire K. Lunch

Last Updated: Apr 5, 2022

The instructions below will guide you through using the neonUtilities R package in Python, via the rpy2 package. rpy2 creates an R environment you can interact with from Python.

The assumption in this tutorial is that you want to work with NEON data in Python, but you want to use the handy download and merge functions provided by the neonUtilities R package to access and format the data for analysis. If you want to do your analyses in R, use one of the R-based tutorials linked below.

For more information about the neonUtilities package, and instructions for running it in R directly, see the Download and Explore tutorial and/or the neonUtilities tutorial.

Install and set up

Before starting, you will need:

  1. Python 3 installed. It is probably possible to use this workflow in Python 2, but these instructions were developed and tested using 3.7.4.
  2. R installed. You don't need to have ever used it directly. We wrote this tutorial using R 4.1.1, but most other recent versions should also work.
  3. rpy2 installed. Run the line below from the command line; it won't run within Jupyter. See Python documentation for more information on how to install packages. rpy2 often has install problems on Windows; see the "Windows Users" section below if you are running Windows.
  4. You may need to install pip before installing rpy2, if you don't have it installed already.

From the command line, run:

pip install rpy2

Windows users

The rpy2 package was built for Mac, and doesn't always work smoothly on Windows. If you have trouble with the install, try these steps.

  1. Add C:\Program Files\R\R-3.3.1\bin\x64 (substituting your installed R version) to the Windows Environment Variable “Path”
  2. Install rpy2 manually from https://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2
    1. Pick the correct version. On the download page, the cp## portion of the filename indicates the Python version; e.g., rpy2-2.9.2-cp36-cp36m-win_amd64.whl is the correct download when 2.9.2 is the latest version of rpy2 and you are running Python 3.6 on 64-bit Windows (amd64).
    2. Save the .whl file, navigate to it in Windows, then run pip directly on the file: pip install rpy2-2.9.2-cp36-cp36m-win_amd64.whl
  3. Add an R_HOME Windows environment variable with the path C:\Program Files\R\R-3.4.3 (or whichever version you are running)
  4. Add an R_USER Windows environment variable with the path C:\Users\yourUserName\AppData\Local\Continuum\Anaconda3\Lib\site-packages\rpy2

Additional troubleshooting

If you're still having trouble getting R to communicate with Python, you can try pointing Python directly to your R installation path.

  1. Run R.home() in R.
  2. Run import os in Python.
  3. Run os.environ['R_HOME'] = '/Library/Frameworks/R.framework/Resources' in Python, substituting the file path you found in step 1.
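The steps above can be combined into a short snippet run before importing rpy2. The path shown is the macOS example from step 3; substitute your own R.home() output:

```python
import os

# Point rpy2 at the R installation reported by R.home() in R.
# This example path is for macOS; substitute your own R.home() output.
os.environ["R_HOME"] = "/Library/Frameworks/R.framework/Resources"

# Import rpy2 only after R_HOME is set:
# import rpy2.robjects as robjects
```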

Load packages

Now import rpy2 into your session.

import rpy2
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

Load the base R functionality, using the rpy2 function importr().

base = importr('base')
utils = importr('utils')
stats = importr('stats')

The basic syntax for running R code via rpy2 is package.function(inputs), where package is the R package in use, function is the name of the function within the R package, and inputs are the inputs to the function. In other words, it's very similar to running code in R as package::function(inputs). For example:

stats.rnorm(6, 0, 1)

FloatVector with 6 elements.

-0.938409  0.189041  -0.169062  0.976939  -0.862790  0.648383

Suppress R warnings. This step can be skipped, but if you skip it, messages from R will be passed through, and Python will interpret them as warnings.

from rpy2.rinterface_lib.callbacks import logger as rpy2_logger
import logging
rpy2_logger.setLevel(logging.ERROR)

Install the neonUtilities R package. Here I've specified the RStudio CRAN mirror as the source, but you can use a different one if you prefer.

You only need to do this step once to use the package, but we update the neonUtilities package every few months, so reinstalling periodically is recommended.

This installation step carries out the same steps in the same places on your hard drive that it would if run in R directly, so if you use R regularly and have already installed neonUtilities on your machine, you can skip this step. And be aware, this also means if you install other packages, or new versions of packages, via rpy2, they'll be updated the next time you use R, too.

The semicolon at the end of the line (here, and in some other function calls below) can be omitted. It suppresses a note indicating the output of the function is null. The output is null because these functions download or modify files on your local drive, but none of the data are read into the Python or R environments.

utils.install_packages('neonUtilities', repos='https://cran.rstudio.com/');
The downloaded binary packages are in
	/var/folders/_k/gbjn452j1h3fk7880d5ppkx1_9xf6m/T//Rtmpdy9fY1/downloaded_packages

Now load the neonUtilities package. This does need to be run every time you use the code; if you're familiar with R, importr() is roughly equivalent to the library() function in R.

neonUtilities = importr('neonUtilities')

Join data files: stackByTable()

The function stackByTable() in neonUtilities merges the monthly, site-level files the NEON Data Portal provides. Start by downloading the dataset you're interested in from the Portal. Here, we'll assume you've downloaded IR Biological Temperature. It will download as a single zip file named NEON_temp-bio.zip. Note the file path it's saved to and proceed.

Run the stackByTable() function to stack the data. It requires only one input, the path to the zip file you downloaded from the NEON Data Portal. Modify the file path in the code below to match the path on your machine.

For additional, optional inputs to stackByTable(), see the R tutorial for neonUtilities.

neonUtilities.stackByTable(filepath='/Users/Shared/NEON_temp-bio.zip');
Stacking operation across a single core.
Stacking table IRBT_1_minute
Stacking table IRBT_30_minute
Merged the most recent publication of sensor position files for each site and saved to /stackedFiles
Copied the most recent publication of variable definition file to /stackedFiles
Finished: Stacked 2 data tables and 3 metadata tables!
Stacking took 1.585054 secs
All unzipped monthly data folders have been removed.

Check the folder containing the original zip file from the Data Portal; you should now have a subfolder containing the unzipped and stacked files called stackedFiles. To import these data to Python, skip ahead to the "Read downloaded and stacked files into Python" section; to learn how to use neonUtilities to download data, proceed to the next section.

Download files to be stacked: zipsByProduct()

The function zipsByProduct() uses the NEON API to programmatically download data files for a given product. The files downloaded by zipsByProduct() can then be fed into stackByTable().

Run the downloader with these inputs: a data product ID (DPID), a set of 4-letter site IDs (or "all" for all sites), a download package (either basic or expanded), the filepath to download the data to, and an indicator to check the size of your download before proceeding or not (TRUE/FALSE).

The DPID is the data product identifier, and can be found in the data product box on the NEON Explore Data page. Here we'll download Breeding landbird point counts, DP1.10003.001.

There are two differences relative to running zipsByProduct() in R directly:

  1. check.size becomes check_size, because dots have programmatic meaning in Python
  2. TRUE (or T) becomes 'TRUE' because the values TRUE and FALSE don't have special meaning in Python the way they do in R, so it interprets them as variables if they're unquoted.

check_size='TRUE' does not work correctly in the Python environment. It estimates the size of the download and asks you to confirm before proceeding, and this interactive display doesn't work correctly outside R. Set check_size='FALSE' to avoid this problem, but be thoughtful about the size of your query since it will proceed to download without checking.
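The check.size → check_size renaming generalizes: rpy2 translates dots in R argument names to underscores in Python keyword arguments. A toy helper showing the mapping back to R names; an illustration only, not rpy2's internals:

```python
def to_r_names(**kwargs):
    """Mimic rpy2's renaming: Python kwargs like check_size
    correspond to R arguments like check.size (illustration only;
    R arguments that genuinely contain underscores would be mangled)."""
    return {key.replace("_", "."): value for key, value in kwargs.items()}

print(to_r_names(dpID="DP1.10003.001", check_size="FALSE"))
# {'dpID': 'DP1.10003.001', 'check.size': 'FALSE'}
```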

neonUtilities.zipsByProduct(dpID='DP1.10003.001', 
                            site=base.c('HARV','BART'), 
                            savepath='/Users/Shared',
                            package='basic', 
                            check_size='FALSE');
Finding available files
  |======================================================================| 100%

Downloading files totaling approximately 3.718684 MB
Downloading 16 files
  |======================================================================| 100%
16 files successfully downloaded to /Users/Shared/filesToStack10003

The message output by zipsByProduct() indicates the file path where the files have been downloaded.

Now take that file path and pass it to stackByTable().

neonUtilities.stackByTable(filepath='/Users/Shared/filesToStack10003');
Unpacking zip files using 1 cores.
Stacking operation across a single core.
Stacking table brd_countdata
Stacking table brd_perpoint
Copied the most recent publication of validation file to /stackedFiles
Copied the most recent publication of categoricalCodes file to /stackedFiles
Copied the most recent publication of variable definition file to /stackedFiles
Finished: Stacked 2 data tables and 4 metadata tables!
Stacking took 0.3076231 secs
All unzipped monthly data folders have been removed.

Read downloaded and stacked files into Python

We've downloaded biological temperature and bird data, and merged the site by month files. Now let's read those data into Python so you can proceed with analyses.

First let's take a look at what's in the output folders.

import os
os.listdir('/Users/Shared/filesToStack10003/stackedFiles/')
['categoricalCodes_10003.csv',
 'issueLog_10003.csv',
 'brd_countdata.csv',
 'brd_perpoint.csv',
 'readme_10003.txt',
 'variables_10003.csv',
 'validation_10003.csv']
os.listdir('/Users/Shared/NEON_temp-bio/stackedFiles/')
['IRBT_1_minute.csv',
 'sensor_positions_00005.csv',
 'issueLog_00005.csv',
 'IRBT_30_minute.csv',
 'variables_00005.csv',
 'readme_00005.txt']

Each data product folder contains a set of data files and metadata files. Here, we'll read in the data files and take a look at the contents; for more details about the contents of NEON data files and how to interpret them, see the Download and Explore tutorial.

There are a variety of modules and methods for reading tabular data into Python; here we'll use the pandas module, but feel free to use your own preferred method.

First, let's read in the two data tables in the bird data: brd_countdata and brd_perpoint.

import pandas
brd_perpoint = pandas.read_csv('/Users/Shared/filesToStack10003/stackedFiles/brd_perpoint.csv')
brd_countdata = pandas.read_csv('/Users/Shared/filesToStack10003/stackedFiles/brd_countdata.csv')

And take a look at the contents of each file. For descriptions and units of each column, see the variables_10003 file.

brd_perpoint
uid namedLocation domainID siteID plotID plotType pointID nlcdClass decimalLatitude decimalLongitude ... endRH observedHabitat observedAirTemp kmPerHourObservedWindSpeed laboratoryName samplingProtocolVersion remarks measuredBy publicationDate release
0 32ab1419-b087-47e1-829d-b1a67a223a01 BART_025.birdGrid.brd D01 BART BART_025 distributed C1 evergreenForest 44.060146 -71.315479 ... 56.0 evergreen forest 18.0 1.0 Bird Conservancy of the Rockies NEON.DOC.014041vG NaN JRUEB 20211222T013942Z RELEASE-2022
1 f02e2458-caab-44d8-a21a-b3b210b71006 BART_025.birdGrid.brd D01 BART BART_025 distributed B1 evergreenForest 44.060146 -71.315479 ... 56.0 deciduous forest 19.0 3.0 Bird Conservancy of the Rockies NEON.DOC.014041vG NaN JRUEB 20211222T013942Z RELEASE-2022
2 58ccefb8-7904-4aa6-8447-d6f6590ccdae BART_025.birdGrid.brd D01 BART BART_025 distributed A1 evergreenForest 44.060146 -71.315479 ... 56.0 mixed deciduous/evergreen forest 17.0 0.0 Bird Conservancy of the Rockies NEON.DOC.014041vG NaN JRUEB 20211222T013942Z RELEASE-2022
3 1b14ead4-03fc-4d47-bd00-2f6e31cfe971 BART_025.birdGrid.brd D01 BART BART_025 distributed A2 evergreenForest 44.060146 -71.315479 ... 56.0 deciduous forest 19.0 0.0 Bird Conservancy of the Rockies NEON.DOC.014041vG NaN JRUEB 20211222T013942Z RELEASE-2022
4 3055a0a5-57ae-4e56-9415-eeb7704fab02 BART_025.birdGrid.brd D01 BART BART_025 distributed B2 evergreenForest 44.060146 -71.315479 ... 56.0 deciduous forest 16.0 0.0 Bird Conservancy of the Rockies NEON.DOC.014041vG NaN JRUEB 20211222T013942Z RELEASE-2022
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1234 3400dfdf-54f1-4921-a3b0-61f03c6db3e9 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A1 deciduousForest 42.401149 -72.253238 ... 43.0 other 16.0 10.0 Bird Conservancy of the Rockies NEON.DOC.014041vK The RH would not stay still today, kept swingi... JGLAG 20211222T011332Z PROVISIONAL
1235 b43b199c-51b6-4222-b575-7564315e47bb HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A2 deciduousForest 42.401149 -72.253238 ... 43.0 deciduous forest 15.0 4.0 Bird Conservancy of the Rockies NEON.DOC.014041vK The RH would not stay still today, kept swingi... JGLAG 20211222T011332Z PROVISIONAL
1236 a7040ad5-d253-47b7-964d-2711dafa42c4 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed B2 deciduousForest 42.401149 -72.253238 ... 43.0 deciduous forest 16.0 1.0 Bird Conservancy of the Rockies NEON.DOC.014041vK The RH would not stay still today, kept swingi... JGLAG 20211222T011332Z PROVISIONAL
1237 97a3c2dc-d8b0-436f-af62-00c88167b60e HARV_006.birdGrid.brd D01 HARV HARV_006 distributed B3 deciduousForest 42.401149 -72.253238 ... 43.0 deciduous forest 17.0 1.0 Bird Conservancy of the Rockies NEON.DOC.014041vK The RH would not stay still today, kept swingi... JGLAG 20211222T011332Z PROVISIONAL
1238 b8a27ff5-3aa3-432a-858e-c8d31324ab2e HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 deciduousForest 42.401149 -72.253238 ... 43.0 deciduous forest 18.0 1.0 Bird Conservancy of the Rockies NEON.DOC.014041vK The RH would not stay still today, kept swingi... JGLAG 20211222T011332Z PROVISIONAL

1239 rows × 31 columns

brd_countdata
uid namedLocation domainID siteID plotID plotType pointID startDate eventID pointCountMinute ... vernacularName observerDistance detectionMethod visualConfirmation sexOrAge clusterSize clusterCode identifiedBy publicationDate release
0 4e22256f-5e86-4a2c-99be-dd1c7da7af28 BART_025.birdGrid.brd D01 BART BART_025 distributed C1 2015-06-14T09:23Z BART_025.C1.2015-06-14 1 ... Black-capped Chickadee 42.0 singing No Male 1.0 NaN JRUEB 20211222T013942Z RELEASE-2022
1 93106c0d-06d8-4816-9892-15c99de03c91 BART_025.birdGrid.brd D01 BART BART_025 distributed C1 2015-06-14T09:23Z BART_025.C1.2015-06-14 1 ... Red-eyed Vireo 9.0 singing No Male 1.0 NaN JRUEB 20211222T013942Z RELEASE-2022
2 5eb23904-9ae9-45bf-af27-a4fa1efd4e8a BART_025.birdGrid.brd D01 BART BART_025 distributed C1 2015-06-14T09:23Z BART_025.C1.2015-06-14 2 ... Black-and-white Warbler 17.0 singing No Male 1.0 NaN JRUEB 20211222T013942Z RELEASE-2022
3 99592c6c-4cf7-4de8-9502-b321e925684d BART_025.birdGrid.brd D01 BART BART_025 distributed C1 2015-06-14T09:23Z BART_025.C1.2015-06-14 2 ... Black-throated Green Warbler 50.0 singing No Male 1.0 NaN JRUEB 20211222T013942Z RELEASE-2022
4 6c07d9fb-8813-452b-8182-3bc5e139d920 BART_025.birdGrid.brd D01 BART BART_025 distributed C1 2015-06-14T09:23Z BART_025.C1.2015-06-14 1 ... Black-throated Green Warbler 12.0 singing No Male 1.0 NaN JRUEB 20211222T013942Z RELEASE-2022
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13579 87c9dae4-ee30-4673-b669-5ca8acdc7bd7 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 2021-06-16T13:08Z HARV_006.A3.2021-06-16 1 ... Eastern Towhee 13.0 calling No Unknown 1.0 NaN JGLAG 20211222T011332Z PROVISIONAL
13580 1a65553a-6189-4c74-a1e3-2ada0f1d9f63 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 2021-06-16T13:08Z HARV_006.A3.2021-06-16 4 ... NaN 20.0 visual No Unknown 1.0 NaN JGLAG 20211222T011332Z PROVISIONAL
13581 e33deb1c-e79d-41dc-8fc1-8e984b9d0450 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 2021-06-16T13:08Z HARV_006.A3.2021-06-16 1 ... Eastern Towhee 48.0 calling No Unknown 1.0 NaN JGLAG 20211222T011332Z PROVISIONAL
13582 070ec577-9aec-4d05-91df-86124d383697 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 2021-06-16T13:08Z HARV_006.A3.2021-06-16 1 ... Eastern Towhee 61.0 singing No Unknown 1.0 NaN JGLAG 20211222T011332Z PROVISIONAL
13583 7a3be1a1-03c3-49e7-a486-343708c3b271 HARV_006.birdGrid.brd D01 HARV HARV_006 distributed A3 2021-06-16T13:08Z HARV_006.A3.2021-06-16 2 ... Veery 64.0 calling No Unknown 1.0 NaN JGLAG 20211222T011332Z PROVISIONAL

13584 rows × 24 columns
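Note that pandas reads date-time fields such as startDate and publicationDate as plain strings. If you need real timestamps for analysis, they can be parsed with pandas.to_datetime(); a minimal sketch using the date format shown above:

```python
import pandas

# startDate values in brd_countdata look like '2015-06-14T09:23Z'
dates = pandas.to_datetime(pandas.Series(['2015-06-14T09:23Z',
                                          '2021-06-16T13:08Z']))

# The parsed values are timezone-aware (UTC) timestamps
print(dates.dt.year.tolist())
```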

And now let's do the same with the 30-minute data table for biological temperature.

IRBT30 = pandas.read_csv('/Users/Shared/NEON_temp-bio/stackedFiles/IRBT_30_minute.csv')
IRBT30
domainID siteID horizontalPosition verticalPosition startDateTime endDateTime bioTempMean bioTempMinimum bioTempMaximum bioTempVariance bioTempNumPts bioTempExpUncert bioTempStdErMean finalQF publicationDate release
0 D18 BARR 0 10 2021-09-01T00:00:00Z 2021-09-01T00:30:00Z 7.82 7.43 8.39 0.03 1800.0 0.60 0.00 0 20211219T025212Z PROVISIONAL
1 D18 BARR 0 10 2021-09-01T00:30:00Z 2021-09-01T01:00:00Z 7.47 7.16 7.75 0.01 1800.0 0.60 0.00 0 20211219T025212Z PROVISIONAL
2 D18 BARR 0 10 2021-09-01T01:00:00Z 2021-09-01T01:30:00Z 7.43 6.89 8.11 0.07 1800.0 0.60 0.01 0 20211219T025212Z PROVISIONAL
3 D18 BARR 0 10 2021-09-01T01:30:00Z 2021-09-01T02:00:00Z 7.36 6.78 8.15 0.06 1800.0 0.60 0.01 0 20211219T025212Z PROVISIONAL
4 D18 BARR 0 10 2021-09-01T02:00:00Z 2021-09-01T02:30:00Z 6.91 6.50 7.27 0.03 1800.0 0.60 0.00 0 20211219T025212Z PROVISIONAL
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13099 D18 BARR 3 0 2021-11-30T21:30:00Z 2021-11-30T22:00:00Z -14.62 -14.78 -14.46 0.00 1800.0 0.57 0.00 0 20211206T221914Z PROVISIONAL
13100 D18 BARR 3 0 2021-11-30T22:00:00Z 2021-11-30T22:30:00Z -14.59 -14.72 -14.50 0.00 1800.0 0.57 0.00 0 20211206T221914Z PROVISIONAL
13101 D18 BARR 3 0 2021-11-30T22:30:00Z 2021-11-30T23:00:00Z -14.56 -14.65 -14.45 0.00 1800.0 0.57 0.00 0 20211206T221914Z PROVISIONAL
13102 D18 BARR 3 0 2021-11-30T23:00:00Z 2021-11-30T23:30:00Z -14.50 -14.60 -14.39 0.00 1800.0 0.57 0.00 0 20211206T221914Z PROVISIONAL
13103 D18 BARR 3 0 2021-11-30T23:30:00Z 2021-12-01T00:00:00Z -14.45 -14.57 -14.32 0.00 1800.0 0.57 0.00 0 20211206T221914Z PROVISIONAL

13104 rows × 16 columns
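Before summarizing sensor data, it's good practice to filter on the finalQF quality flag: records with finalQF = 0 passed NEON's automated quality tests. A minimal sketch, using a tiny synthetic stand-in for the IRBT_30_minute table:

```python
import pandas

# Synthetic stand-in for a few rows of IRBT_30_minute (illustrative values)
irbt = pandas.DataFrame({'bioTempMean': [7.82, 7.47, -14.62],
                         'finalQF':     [0,    1,     0]})

# Keep only records that passed the final quality flag before summarizing
passed = irbt[irbt['finalQF'] == 0]
mean_temp = passed['bioTempMean'].mean()
```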

Download remote sensing files: byFileAOP()

The function byFileAOP() uses the NEON API to programmatically download data files for remote sensing (AOP) data products. These files cannot be stacked by stackByTable() because they are not tabular data. The function simply creates a folder at the download location (the savepath you specify, or your working directory by default) and writes the files there, preserving the folder structure of the subproducts.

The inputs to byFileAOP() are a data product ID, a site, a year, a filepath to save to, and an indicator of whether to check the download size before proceeding. As above, set check_size='FALSE' when working in Python. Be especially cautious about download size when downloading AOP data, since the files are very large.

Here, we'll download Ecosystem structure (Canopy Height Model) data from Hopbrook (HOPB) in 2017.

neonUtilities.byFileAOP(dpID='DP3.30015.001', site='HOPB',
                        year='2017', check_size='FALSE',
                       savepath='/Users/Shared');
Downloading files totaling approximately 147.930656 MB 
Downloading 217 files
  |======================================================================| 100%
Successfully downloaded  217  files.

Let's read one tile of data into Python and view it. We'll use the rasterio and matplotlib modules here, but as with tabular data, there are other options available.

import rasterio
CHMtile = rasterio.open('/Users/Shared/DP3.30015.001/neon-aop-products/2017/FullSite/D01/2017_HOPB_2/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D01_HOPB_DP3_718000_4709000_CHM.tif')
import matplotlib.pyplot as plt
from rasterio.plot import show
fig, ax = plt.subplots(figsize = (8,3))
show(CHMtile)
<Figure size 800x300 with 1 Axes>

Canopy Height Model at Hopbrook in 2017
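If you want to work with the CHM values numerically, rasterio's read() method returns the tile as a numpy array; NEON rasters use -9999 as the no-data value, which should be excluded before computing statistics. A minimal sketch of that masking step, using a small synthetic array in place of the real tile:

```python
import numpy

# Synthetic stand-in for a CHM tile; -9999 marks no-data pixels
chm = numpy.array([[-9999.0, 12.5],
                   [3.2,      0.0]])

heights = chm[chm != -9999]   # drop no-data pixels
max_height = heights.max()
```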


Get Lesson Code

neonUtilitiesPython.py

Using an API Token when Accessing NEON Data with neonUtilities

Authors: Claire K. Lunch

Last Updated: Nov 23, 2020

NEON data can be downloaded from either the NEON Data Portal or the NEON API. When downloading from the Data Portal, you can create a user account. Read about the benefits of an account on the User Account page. You can also use your account to create a token for using the API. Your token is unique to your account, so don't share it.

Using a token is optional! You can download data without a token, and without a user account. Using a token when downloading data via the API, including when using the neonUtilities package, links your downloads to your user account, as well as enabling faster download speeds. For more information about token usage and benefits, see the NEON API documentation page.

For now, in addition to faster downloads, using a token helps NEON to track data downloads. Using anonymized user information, we can then calculate data access statistics, such as which data products are downloaded most frequently, which data products are downloaded in groups by the same users, and how many users in total are downloading data. This information helps NEON to evaluate the growth and reach of the observatory, and to advocate for training activities, workshops, and software development.

Tokens can be used whenever you use the NEON API. In this tutorial, we'll focus on using tokens with the neonUtilities R package.

Objectives

After completing this activity, you will be able to:

  • Create a NEON API token
  • Use your token when downloading data with neonUtilities

Things You’ll Need To Complete This Tutorial

You will need a version of R (3.4.1 or higher) and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • neonUtilities: install.packages("neonUtilities")

Additional Resources

  • NEON Data Portal
  • NEONScience GitHub Organization
  • neonUtilities tutorial

If you've never downloaded NEON data using the neonUtilities package before, we recommend starting with the Download and Explore tutorial before proceeding with this tutorial.

In the next sections, we'll get an API token from the NEON Data Portal, and then use it in neonUtilities when downloading data.

Get a NEON API Token

The first step is to create a NEON user account, if you don't have one. Follow the instructions on the Data Portal User Accounts page. If you already have an account, go to the NEON Data Portal, sign in, and go to your My Account profile page.

Once you have an account, you can create an API token for yourself. At the bottom of the My Account page, you should see this bar:

Account page on NEON Data Portal showing Get API Token button.

Click the 'GET API TOKEN' button. After a moment, you should see this:

Account page on NEON Data Portal showing API token has been created.

Click on the Copy button to copy your API token to the clipboard:

Account page on NEON Data Portal showing API token with Copy button highlighted

Use API token in neonUtilities

In the next section, we'll walk through saving your token somewhere secure but accessible to your code. But first let's try out using the token the easy way.

First, we need to load the neonUtilities package and set the working directory:

# install neonUtilities - can skip if already installed, but
# API tokens are only enabled in neonUtilities v1.3.4 and higher
# if your version number is lower, re-install
install.packages("neonUtilities")

# load neonUtilities
library(neonUtilities)

# set working directory
wd <- "~/data" # this will depend on your local machine
setwd(wd)

NEON API tokens are very long, so it would be annoying to keep pasting the entire text string into functions. Instead, assign your token to an object:

NEON_TOKEN <- "PASTE YOUR TOKEN HERE"

Now we'll use the loadByProduct() function to download data. Your API token is entered as the optional token input parameter. For this example, we'll download Plant foliar traits (DP1.10026.001).

foliar <- loadByProduct(dpID="DP1.10026.001", site="all", 
                        package="expanded", check.size=F,
                        token=NEON_TOKEN)

You should now have data saved in the foliar object; the API silently used your token. If you've downloaded data without a token before, you may notice this is faster!

This format applies to all neonUtilities functions that involve downloading data or otherwise accessing the API; you can use the token input with all of them. For example, when downloading remote sensing data:

chm <- byTileAOP(dpID="DP3.30015.001", site="WREF", 
                 year=2017, check.size=F,
                 easting=c(571000,578000), 
                 northing=c(5079000,5080000), 
                 savepath=wd,
                 token=NEON_TOKEN)

Token management for open code

Your API token is unique to your account, so don't share it!

If you're writing code that will be shared with colleagues or available publicly, such as in a GitHub repository or the supplemental materials of a published paper, you can't include the line of code above where we assigned your token to NEON_TOKEN, since your token would be fully visible in the code. Instead, you'll need to save your token locally on your computer and pull it into your code without displaying it. There are a few ways to do this; we'll show two options here.

  • Option 1: Save the token in a local file, and source() that file at the start of every script. This is fairly simple but requires a line of code in every script.

  • Option 2: Add the token to a .Renviron file to create an environment variable that gets loaded when you open R. This is a little harder to set up initially, but once it's done, it's done globally, and it will work in every script you run.

Option 1: Save token in a local file

Open a new, empty R script (.R). Put a single line of code in the script:

NEON_TOKEN <- "PASTE YOUR TOKEN HERE"

Save this file in a logical place on your machine, somewhere that won't be visible publicly. Here, let's call the file neon_token_source.R, and save it to the working directory. Then, at the start of every script where you're going to use the NEON API, you would run this line of code:

source(paste0(wd, "/neon_token_source.R"))

Then you'll be able to use token=NEON_TOKEN when you run neonUtilities functions, and you can share your code without accidentally sharing your token.

Option 2: Save token to the R environment

To create a persistent environment variable, we use a .Renviron file. Before creating a file, check which directory R is using as your home directory:

# For Windows:
Sys.getenv("R_USER")

# For Mac/Linux:
Sys.getenv("HOME")

Check the home directory to see if you already have a .Renviron file, using the file browse pane in RStudio, or using another file browse method with hidden files shown. Files that begin with . are hidden by default, but RStudio recognizes files that begin with .R and displays them.

File browse pane in RStudio showing the .Renviron file.

If you already have a .Renviron file, open it and follow the instructions below to add to it. If you don't have one, create one using File -> New File -> Text File in the RStudio menus.

Add one line to the text file. In this option, there are no quotes around the token value.

NEON_TOKEN=PASTE YOUR TOKEN HERE

Save the file as .Renviron, in the RStudio home directory identified above. Double-check the spelling; this will not work if you have a typo. Restart R to load the environment.

Once your token is assigned to an environment variable, use the function Sys.getenv() to access it. For example, in loadByProduct():

foliar <- loadByProduct(dpID="DP1.10026.001", site="all", 
                        package="expanded", check.size=F,
                        token=Sys.getenv("NEON_TOKEN"))

Get Lesson Code

neon-api-tokens-tutorial.R

Access and Work with NEON Geolocation Data

Authors: Claire K. Lunch

Last Updated: May 12, 2022

This tutorial explores NEON geolocation data. The focus is on the locations of NEON observational sampling and sensor data; NEON remote sensing data are inherently spatial and have dedicated tutorials. If you are interested in connecting remote sensing with ground-based measurements, the methods in the vegetation structure and canopy height model tutorial can be generalized to other data products.

In planning your analyses, consider what level of spatial resolution is required. There is no reason to carefully map each measurement if precise spatial locations aren't required to address your hypothesis! For example, if you want to use the Woody vegetation structure data product to calculate a site-scale estimate of biomass and production, the spatial coordinates of each tree are probably not needed. If you want to explore relationships between vegetation and beetle communities, you will need to identify the sampling plots where NEON measures both beetles and vegetation, but finer-scale coordinates may not be needed. Finally, if you want to relate vegetation measurements to airborne remote sensing data, you will need very accurate coordinates for each measurement on the ground.

Learning Objectives

After completing this tutorial you will be able to:

  • access NEON spatial data through data downloaded with the neonUtilities package.
  • access and plot specific sampling locations for TOS data products.
  • access and use sensor location data.

Things You’ll Need To Complete This Tutorial

R Programming Language

You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.

Setup R Environment

We'll need several R packages in this tutorial. Install the packages, if not already installed, and load the libraries for each.

# run once to get the package, and re-run if you need to get updates
install.packages("ggplot2")  # plotting
install.packages("neonUtilities")  # work with NEON data
install.packages("devtools")  # to use the install_github() function
devtools::install_github("NEONScience/NEON-geolocation/geoNEON")  # work with NEON spatial data



# run every time you start a script
library(ggplot2)
library(neonUtilities)
library(geoNEON)

options(stringsAsFactors=F)

Locations for observational data

Plot level locations

Both aquatic and terrestrial observational data downloads include spatial data in the downloaded files. The spatial data in the aquatic data files are the most precise locations available for the sampling events. The spatial data in the terrestrial data downloads represent the locations of the sampling plots. In some cases, the plot is the most precise location available, but for many terrestrial data products, more precise locations can be calculated for specific sampling events.

Here, we'll download the Woody vegetation structure (DP1.10098.001) data product, examine the plot location data in the download, then calculate the locations of individual trees. These steps can be extrapolated to other terrestrial observational data products; the specific sampling layout varies from data product to data product, but the methods for working with the data are similar.

First, let's download the vegetation structure data from one site, Wind River Experimental Forest (WREF).

If downloading data using the neonUtilities package is new to you, check out the Download and Explore tutorial.

# load veg structure data
vst <- loadByProduct(dpID="DP1.10098.001", site="WREF",
                     check.size=F)

Data downloaded this way are stored in R as a large list. For this tutorial, we'll work with the individual dataframes within this large list. Alternatively, each dataframe can be assigned as its own object.

To find the spatial data for any given data product, view the variables files to figure out which data table the spatial data are contained in.

View(vst$variables_10098)

Looking through the variables, we can see that the spatial data (decimalLatitude and decimalLongitude, etc) are in the vst_perplotperyear table. Let's take a look at the table.

View(vst$vst_perplotperyear)

As noted above, the spatial data here are at the plot level; the latitude and longitude represent the centroid of the sampling plot. We can map these plots on the landscape using the easting and northing variables; these are the UTM coordinates. At this site, tower plots are 40 m x 40 m, and distributed plots are 20 m x 20 m; we can use the symbols() function to draw boxes of the correct size.

We'll also use the treesPresent variable to subset to only those plots where trees were found and measured.

# start by subsetting data to plots with trees
vst.trees <- vst$vst_perplotperyear[which(
        vst$vst_perplotperyear$treesPresent=="Y"),]

# make variable for plot sizes
plot.size <- numeric(nrow(vst.trees))

# populate plot sizes in new variable
plot.size[which(vst.trees$plotType=="tower")] <- 40
plot.size[which(vst.trees$plotType=="distributed")] <- 20

# create map of plots
symbols(vst.trees$easting,
        vst.trees$northing,
        squares=plot.size, inches=F,
        xlab="Easting", ylab="Northing")

All vegetation structure plots at WREF

We can see where the plots are located across the landscape, and we can see the denser cluster of plots in the area near the micrometeorology tower.

For many analyses, this level of spatial data may be sufficient. Calculating the precise location of each tree is only required for certain hypotheses; consider whether you need these data when working with a data product with plot-level spatial data.

Looking back at the variables_10098 table, notice that there is a table in this data product called vst_mappingandtagging, suggesting we can find mapping data there. Let's take a look.

View(vst$vst_mappingandtagging)

Here we see data fields for stemDistance and stemAzimuth. Looking back at the variables_10098 file, we see these fields contain the distance and azimuth from a pointID to a specific stem. To calculate the precise coordinates of each tree, we would need to get the locations of the pointIDs, and then adjust the coordinates based on distance and azimuth. The Data Product User Guide describes how to carry out these steps, and can be downloaded from the Data Product Details page.
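The core of that adjustment is simple trigonometry: offset the reference point's easting and northing by the stem distance along the azimuth. Here's a minimal Python sketch of the geometry (the full procedure in the Data Product User Guide also covers looking up the reference point coordinates and propagating uncertainty, which this sketch omits):

```python
import math

def stem_coordinates(point_easting, point_northing,
                     stem_distance, stem_azimuth):
    """Offset a point by stemDistance (m) along stemAzimuth (degrees from north)."""
    az = math.radians(stem_azimuth)
    return (point_easting + stem_distance * math.sin(az),
            point_northing + stem_distance * math.cos(az))

# A stem 10 m due east (azimuth 90) of a point at easting 571000, northing 5079000
east, north = stem_coordinates(571000, 5079000, 10, 90)
```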

However, carrying out these calculations yourself is not the only option! The geoNEON package contains a function that can do this for you, for the TOS data products with location data more precise than the plot level.

Sampling locations

The getLocTOS() function in the geoNEON package uses the NEON API to access NEON location data and then makes protocol-specific calculations to return precise locations for each sampling effort. This function works for a subset of NEON TOS data products. The list of tables and data products that can be entered is in the package documentation on GitHub.

For more information about the NEON API, see the API tutorial and the API web page. For more information about the location calculations used in each data product, see the Data Product User Guide for each product.

The getLocTOS() function requires two inputs:

  • A data table that contains spatial data from a NEON TOS data product
  • The NEON table name of that data table

For vegetation structure locations, the function call looks like this. This function may take a while to download all the location data. For faster downloads, use an API token.

# calculate individual tree locations
vst.loc <- getLocTOS(data=vst$vst_mappingandtagging,
                           dataProd="vst_mappingandtagging")

What additional data are now available in the data obtained by getLocTOS()?

# print variable names that are new
names(vst.loc)[which(!names(vst.loc) %in% 
                             names(vst$vst_mappingandtagging))]

## [1] "utmZone"                  "adjNorthing"              "adjEasting"              
## [4] "adjCoordinateUncertainty" "adjDecimalLatitude"       "adjDecimalLongitude"     
## [7] "adjElevation"             "adjElevationUncertainty"

Now we have adjusted latitude, longitude, and elevation, and the corresponding easting and northing UTM data. We also have coordinate uncertainty data for these coordinates.

As we did with the plots above, we can use the easting and northing data to plot the locations of the individual trees.

plot(vst.loc$adjEasting, vst.loc$adjNorthing, pch=".",
     xlab="Easting", ylab="Northing")

All mapped tree locations at WREF

We can see the mapped trees in the same plots we mapped above. We've plotted each individual tree as a ., so all we can see at this scale is the cluster of dots that make up each plot.

Let's zoom in on a single plot:

plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")], 
     pch=20, xlab="Easting", ylab="Northing")

Tree locations in plot WREF_085

Now we can see the location of each tree within the sampling plot WREF_085. This is interesting, but it would be more interesting if we could see more information about each tree. How are species distributed across the plot, for instance?

We can plot the tree species at each location using the text() function and the vst.loc$taxonID field.

plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")], 
     type="n", xlab="Easting", ylab="Northing")
text(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")],
     labels=vst.loc$taxonID[which(vst.loc$plotID=="WREF_085")],
     cex=0.5)

Tree species and their locations in plot WREF_085

Almost all of the mapped trees in this plot are either Pseudotsuga menziesii or Tsuga heterophylla (Douglas fir and Western hemlock), not too surprising at Wind River.

What if we want to map the diameter of each tree? This is a very common way to present a stem map; it gives a visual as if we were looking down on the plot from overhead and had cut off each tree at its measurement height.

Other than taxon, the attributes of the trees, such as diameter, height, growth form, and canopy position, are found in the vst_apparentindividual table, not in the vst_mappingandtagging table. We'll need to join the two tables to get the tree attributes together with their mapped locations.

The joining variable is individualID, the identifier for each tree, which is found in both tables. We'll also include the plot, site, and domain identifiers, to avoid creating duplicates of those columns.

veg <- merge(vst.loc, vst$vst_apparentindividual,
             by=c("individualID","namedLocation",
                  "domainID","siteID","plotID"))

Now we can use the symbols() function to plot the diameter of each tree, at its spatial coordinates, to create a correctly scaled map of boles in the plot. Note that stemDiameter is in centimeters, while easting and northing UTMs are in meters, so we divide by 100 to scale correctly.

symbols(veg$adjEasting[which(veg$plotID=="WREF_085")], 
        veg$adjNorthing[which(veg$plotID=="WREF_085")], 
        circles=veg$stemDiameter[which(veg$plotID=="WREF_085")]/100/2, 
        inches=F, xlab="Easting", ylab="Northing")

Tree bole diameters in plot WREF_085

If you are interested in taking the vegetation structure data a step further, and connecting measurements of trees on the ground to remotely sensed Lidar data, check out the Vegetation Structure and Canopy Height Model tutorial.

If you are interested in working with other terrestrial observational (TOS) data products, the basic techniques used here to find precise sampling locations and join data tables can be adapted to other TOS data products. Consult the Data Product User Guide for each data product to find details specific to that data product.

Locations for sensor data

Downloads of instrument system (IS) data include a file called sensor_positions.csv. The sensor positions file contains information about the coordinates of each sensor, relative to a reference location.

While the specifics vary, the techniques for working with sensor data and the sensor_positions.csv file are generalizable. For this tutorial, let's look at the sensor locations for soil temperature (DP1.00041.001) at the NEON Treehaven site (TREE) in July 2018. To reduce our file size, we'll use the 30 minute averaging interval. Our goal in this section is to create a depth profile of soil temperature in one soil plot.

If downloading data using the neonUtilities package is new to you, check out the neonUtilities tutorial.

As written, this function call will download about 7 MB of data, so we have set check.size=F for ease of running the code.

# load soil temperature data of interest 
soilT <- loadByProduct(dpID="DP1.00041.001", site="TREE",
                    startdate="2018-07", enddate="2018-07",
                    avg=30, check.size=F)

## Attempting to stack soil sensor data. Note that due to the number of soil sensors at each site, data volume is very high for these data. Consider dividing data processing into chunks, using the nCores= parameter to parallelize stacking, and/or using a high-performance system.

Sensor positions file

Now we can specifically look at the sensor positions file.

# create object for sensor positions file
pos <- soilT$sensor_positions_00041

# view column names
names(pos)

##  [1] "siteID"               "HOR.VER"              "name"                 "description"         
##  [5] "start"                "end"                  "referenceName"        "referenceDescription"
##  [9] "referenceStart"       "referenceEnd"         "xOffset"              "yOffset"             
## [13] "zOffset"              "pitch"                "roll"                 "azimuth"             
## [17] "referenceLatitude"    "referenceLongitude"   "referenceElevation"   "publicationDate"

# view table
View(pos)

The sensor locations are indexed by the HOR.VER variable - see the file naming conventions page for more details.

Using unique() we can view all the location indexes in this file.

unique(pos$HOR.VER)

##  [1] "001.501" "001.502" "001.503" "001.504" "001.505" "001.506" "001.507" "001.508" "001.509" "002.501"
## [11] "002.502" "002.503" "002.504" "002.505" "002.506" "002.507" "002.508" "002.509" "003.501" "003.502"
## [21] "003.503" "003.504" "003.505" "003.506" "003.507" "003.508" "003.509" "004.501" "004.502" "004.503"
## [31] "004.504" "004.505" "004.506" "004.507" "004.508" "004.509" "005.501" "005.502" "005.503" "005.504"
## [41] "005.505" "005.506" "005.507" "005.508" "005.509"

Soil temperature data are collected in 5 instrumented soil plots inside the tower footprint. We see this reflected in the data where HOR = 001 to 005. Within each plot, temperature is measured at 9 depths, seen in VER = 501 to 509. At some sites, the number of depths may differ slightly.

The x, y, and z offsets in the sensor positions file are the relative distance, in meters, to the reference latitude, longitude, and elevation in the file.
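As a quick check on this, we can use the offsets directly. A minimal sketch: the absolute elevation of each sensor is the reference elevation plus its vertical offset, and the HOR index is the first three characters of HOR.VER. (The sensorElevation column is added here purely for illustration, and note that at this point pos still contains both the pre- and post-relocation rows for soil plot 001, so its counts will be inflated.)

```r
# approximate absolute elevation (m) of each sensor:
# reference elevation plus the vertical (z) offset
pos$sensorElevation <- pos$referenceElevation + pos$zOffset

# count measurement depths recorded per soil plot
# (HOR index = first three characters of HOR.VER)
table(substr(pos$HOR.VER, 1, 3))
```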

The HOR and VER indices in the sensor positions file correspond to the horizontalPosition and verticalPosition fields in soilT$ST_30_minute.

Note that there are two sets of position data for soil plot 001, and that one set has an end date in the file. This indicates the sensors were moved at some point; in this case there was a frost heave incident. You can read about it in the issue log, both in the readme file and on the Data Product Details page.

Since we're working with data from July 2018, and the change in sensor locations is dated Nov 2018, we'll use the original locations. There are a number of ways to drop the later locations from the table; here, we find the rows in which the end field is empty, indicating no end date, and the rows corresponding to soil plot 001, and drop all the rows that meet both criteria.

# drop the post-relocation rows (no end date) for soil plot 001;
# anchor the pattern so it matches only the 001 horizontal index
pos <- pos[-intersect(grep("^001\\.", pos$HOR.VER),
                      which(pos$end=="")),]

Our goal is to plot a time series of temperature, stratified by depth, so let's start by joining the data file and sensor positions file, to bring the depth measurements into the same data frame with the data.

# paste horizontalPosition and verticalPosition together
# to match HOR.VER in the sensor positions file
soilT$ST_30_minute$HOR.VER <- paste(soilT$ST_30_minute$horizontalPosition,
                                    soilT$ST_30_minute$verticalPosition,
                                    sep=".")

# left join to keep all temperature records
soilTHV <- merge(soilT$ST_30_minute, pos, 
                 by="HOR.VER", all.x=T)

And now we can plot soil temperature over time for each depth. We'll use ggplot since it's well suited to this kind of stratification. Each soil plot is its own panel, and each depth is its own line:

gg <- ggplot(soilTHV, 
             aes(endDateTime, soilTempMean, 
                 group=zOffset, color=zOffset)) +
             geom_line() + 
        facet_wrap(~horizontalPosition)
gg

## Warning: Removed 1488 row(s) containing missing values (geom_path).

Tiled figure of temperature by depth in each plot

We can see that as soil depth increases, temperatures become much more stable, while the shallowest measurement has a clear diurnal cycle. We can also see that something has gone wrong with one of the sensors in plot 002. To remove those data, use only values where the final quality flag passed, i.e., finalQF = 0:

gg <- ggplot(subset(soilTHV, finalQF==0), 
             aes(endDateTime, soilTempMean, 
                 group=zOffset, color=zOffset)) +
             geom_line() + 
        facet_wrap(~horizontalPosition)
gg

Tiled figure of temperature by depth in each plot with only passing quality flags

Get Lesson Code

spatialData.R

Work With NEON's Plant Phenology Data

Authors: Megan A. Jones, Natalie Robinson, Lee Stanish

Last Updated: May 13, 2021

Many organisms, including plants, show patterns of change across seasons - the different stages of this observable change are called phenophases. In this tutorial we explore how to work with NEON plant phenophase data.

Objectives

After completing this activity, you will be able to:

  • work with NEON Plant Phenology Observation data.
  • use dplyr functions to filter data.
  • plot time series data in a bar plot using the ggplot() function.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • neonUtilities: install.packages("neonUtilities")
  • ggplot2: install.packages("ggplot2")
  • dplyr: install.packages("dplyr")

More on Packages in R – Adapted from Software Carpentry.

Download Data

This tutorial is designed to have you download data directly from the NEON portal API using the neonUtilities package. However, you can also directly download this data, prepackaged, from FigShare. This data set includes all the files needed for the Work with NEON OS & IS Data - Plant Phenology & Temperature tutorial series. The data are in the format you would receive if downloading them using the zipsByProduct() function in the neonUtilities package.

Direct Download: NEON Phenology & Temp Time Series Teaching Data Subset (v2 - 2017-2019 data) (12 MB)


Additional Resources

  • NEON data portal
  • NEON Plant Phenology Observations data product user guide
  • RStudio's data wrangling (dplyr/tidyr) cheatsheet
  • NEONScience GitHub Organization
  • nneo API wrapper on CRAN

Plants change throughout the year - these are phenophases. Why do they change?

Explore Phenology Data

The following sections provide a brief overview of the NEON plant phenology observation data. When designing a research project using these data, you need to consult the documents associated with this data product and not rely solely on this summary.

The following description of the NEON Plant Phenology Observation data is modified from the data product user guide.

NEON Plant Phenology Observation Data

NEON collects plant phenology data and provides it as NEON data product DP1.10055.001.

The plant phenology observations data product provides in-situ observations of the phenological status and intensity of tagged plants (or patches) during discrete observation events.

Sampling occurs at all terrestrial field sites at site- and season-specific intervals. During Phase I (dominant species) sampling (pre-2021), three species with 30 individuals each are sampled. Beginning in 2021, Phase II (community) sampling will include up to 20 species, with five or more individuals sampled per species.

Status-based Monitoring

NEON employs status-based monitoring, in which the phenological condition of an individual is reported any time that individual is observed. At every observation bout, records are generated for every phenophase that is occurring and for every phenophase not occurring. With this approach, events that may occur multiple times during a single year (such as leaf emergence in Mediterranean climates, or flowering in many desert species) can be captured. Continuous reporting of phenophase status enables quantification of the duration of phenophases, rather than just their date of onset, while also enabling explicit quantification of the uncertainty in phenophase transition dates introduced by monitoring in discrete temporal bouts.

Specific products derived from this sampling include the observed phenophase status (whether or not a phenophase is occurring) and the intensity of phenophases for individuals in which phenophase status = ‘yes’. Phenophases reported are derived from the USA National Phenology Network (USA-NPN) categories. The number of phenophases observed varies by growth form and ranges from 1 phenophase (cactus) to 7 phenophases (semi-evergreen broadleaf). In this tutorial we will focus only on the state of the phenophase, not the phenophase intensity data.
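Because status is reported at every bout, phenophase duration can be estimated directly from the records. A minimal dplyr sketch, assuming a stacked phe_statusintensity table with the date, individualID, phenophaseName, and phenophaseStatus fields used later in this tutorial (the pheno_duration name is just for illustration):

```r
library(dplyr)

# approximate duration of each phenophase for each individual:
# first and last dates on which the phenophase status was "yes"
pheno_duration <- phe_statusintensity %>%
  filter(phenophaseStatus == "yes") %>%
  group_by(individualID, phenophaseName) %>%
  summarize(onset = min(date),
            end = max(date),
            durationDays = as.numeric(difftime(max(date), min(date),
                                               units = "days")))
```

For multi-year data you would also group by year, so that one year's onset is not paired with a later year's end date.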

Phenology Transects

Plant phenology observations occur at all terrestrial NEON sites along an 800 meter square loop transect (primary) and within a 200 m x 200 m plot located within view of a canopy-level, tower-mounted phenology camera.

Diagram of a phenology transect layout, with meter layout marked. Point-level geolocations are recorded at eight reference points along the perimeter; plot-level geolocation at the plot centroid (star). Source: National Ecological Observatory Network (NEON)

Timing of Observations

At each site, there are:

  • ~50 observation bouts per year.
  • no more than 100 sampling points per phenology transect.
  • no more than 9 sampling points per phenocam plot.
  • 1 bout per year to collect annual size and disease status measurements from each sampling point.

Available Data Tables

In the downloaded data packet, data are available in three main files:

  • phe_statusintensity: Plant phenophase status and intensity data
  • phe_perindividual: Geolocation and taxonomic identification for phenology plants
  • phe_perindividualperyear: recorded once a year, essentially the "metadata" about the plant: DBH, height, etc.

There are other files in each download including a readme with information on the data product and the download; a variables file that defines the term descriptions, data types, and units; a validation file with data entry validation and parsing rules; and an XML with machine readable metadata.

Stack NEON Data

NEON data are delivered in a site and year-month format. When you download data, you will get a single zipped file containing a directory for each month and site that you've requested data for. Dealing with these separate tables from even one or two sites over a 12-month period can be a bit overwhelming. Luckily NEON provides an R package, neonUtilities, that takes the downloaded file and joins the data files. The teaching data downloaded with this tutorial is already stacked. If you are working with other NEON data, please go through the tutorial to stack the data in R or in Python and then return to this tutorial.

Work with NEON Data

When we do this for phenology data we get three files, one for each data table, with all the data from your sites and date range of interest.

First, we need to set up our R environment.

# install needed package (only uncomment & run if not already installed)
#install.packages("neonUtilities")
#install.packages("dplyr")
#install.packages("ggplot2")

# load needed packages
library(neonUtilities)
library(dplyr)
library(ggplot2)


options(stringsAsFactors=F) #keep strings as character type not factors

# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Git/data/" # Change this to match your local environment
setwd(wd)

Let's start by loading our data of interest. For this series, we'll work with data from the NEON Domain 02 sites:

  • Blandy Farm (BLAN)
  • Smithsonian Conservation Biology Institute (SCBI)
  • Smithsonian Environmental Research Center (SERC)

And we'll use data from January 2017 to December 2019. This downloads over 9MB of data. If this is too large, use a smaller date range. If you opt to do this, your figures and some output may look different later in the tutorial.

With this information, we can download our data using the neonUtilities package. If you are not using a NEON token to download your data, remove the token = Sys.getenv("NEON_TOKEN") line of code (learn more about NEON API tokens in the Using an API Token when Accessing NEON Data with neonUtilities tutorial).

If you are using the data downloaded at the start of the tutorial, use the commented out code in the second half of this code chunk.

## Two options for accessing data - programmatic or from the example dataset
# Read data from data portal 

phe <- loadByProduct(dpID = "DP1.10055.001", site=c("BLAN","SCBI","SERC"), 
										 startdate = "2017-01", enddate="2019-12", 
										 token = Sys.getenv("NEON_TOKEN"),
										 check.size = F) 

## API token was not recognized. Public rate limit applied.
## Finding available files
## Downloading files totaling approximately 7.985319 MB
## Downloading 95 files
## Unpacking zip files using 1 cores.
## Stacking operation across a single core.
## Stacking table phe_perindividual
## Stacking table phe_statusintensity
## Stacking table phe_perindividualperyear
## Copied the most recent publication of validation file to /stackedFiles
## Copied the most recent publication of categoricalCodes file to /stackedFiles
## Copied the most recent publication of variable definition file to /stackedFiles
## Finished: Stacked 3 data tables and 3 metadata tables!
## Stacking took 1.46806 secs

# if you aren't sure you can handle the data file size use check.size = T. 

# save dataframes from the downloaded list
ind <- phe$phe_perindividual  #individual information
status <- phe$phe_statusintensity  #status & intensity info


##If choosing to use example dataset downloaded from this tutorial: 

# Stack multiple files within the downloaded phenology data
#stackByTable("NEON-pheno-temp-timeseries_v2/filesToStack10055", folder = T)

# read in data - readTableNEON uses the variables file to assign the correct
# data type for each variable
#ind <- readTableNEON('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/phe_perindividual.csv', 'NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')

#status <- readTableNEON('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/phe_statusintensity.csv', 'NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')

Let's explore the data and get to know what the ind dataframe looks like.

# What are the fieldnames in this dataset?
names(ind)

##  [1] "uid"                         "namedLocation"              
##  [3] "domainID"                    "siteID"                     
##  [5] "plotID"                      "decimalLatitude"            
##  [7] "decimalLongitude"            "geodeticDatum"              
##  [9] "coordinateUncertainty"       "elevation"                  
## [11] "elevationUncertainty"        "subtypeSpecification"       
## [13] "transectMeter"               "directionFromTransect"      
## [15] "ninetyDegreeDistance"        "sampleLatitude"             
## [17] "sampleLongitude"             "sampleGeodeticDatum"        
## [19] "sampleCoordinateUncertainty" "sampleElevation"            
## [21] "sampleElevationUncertainty"  "date"                       
## [23] "editedDate"                  "individualID"               
## [25] "taxonID"                     "scientificName"             
## [27] "identificationQualifier"     "taxonRank"                  
## [29] "nativeStatusCode"            "growthForm"                 
## [31] "vstTag"                      "samplingProtocolVersion"    
## [33] "measuredBy"                  "identifiedBy"               
## [35] "recordedBy"                  "remarks"                    
## [37] "dataQF"                      "publicationDate"            
## [39] "release"

# Unsure of what some of the variables are? Look at the variables table!
View(phe$variables_10055)
# if using the pre-downloaded data, you need to read in the variables file 
# or open and look at it on your desktop
#var <- read.csv('NEON-pheno-temp-timeseries_v2/filesToStack10055/stackedFiles/variables_10055.csv')
#View(var)

# how many rows are in the data?
nrow(ind)

## [1] 433

# look at the first six rows of data.
#head(ind) #this is a good function to use but looks messy so not rendering it 

# look at the structure of the dataframe.
str(ind)

## 'data.frame':	433 obs. of  39 variables:
##  $ uid                        : chr  "76bf37d9-c834-43fc-a430-83d87e4b9289" "cf0239bb-2953-44a8-8fd2-051539be5727" "833e5f41-d5cb-4550-ba60-e6f000a2b1b6" "6c2e348d-d19e-4543-9d22-0527819ee964" ...
##  $ namedLocation              : chr  "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" ...
##  $ domainID                   : chr  "D02" "D02" "D02" "D02" ...
##  $ siteID                     : chr  "BLAN" "BLAN" "BLAN" "BLAN" ...
##  $ plotID                     : chr  "BLAN_061" "BLAN_061" "BLAN_061" "BLAN_061" ...
##  $ decimalLatitude            : num  39.1 39.1 39.1 39.1 39.1 ...
##  $ decimalLongitude           : num  -78.1 -78.1 -78.1 -78.1 -78.1 ...
##  $ geodeticDatum              : chr  NA NA NA NA ...
##  $ coordinateUncertainty      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ elevation                  : num  183 183 183 183 183 183 183 183 183 183 ...
##  $ elevationUncertainty       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ subtypeSpecification       : chr  "primary" "primary" "primary" "primary" ...
##  $ transectMeter              : num  491 464 537 15 753 506 527 305 627 501 ...
##  $ directionFromTransect      : chr  "Left" "Right" "Left" "Left" ...
##  $ ninetyDegreeDistance       : num  0.5 4 2 3 2 1 2 3 2 3 ...
##  $ sampleLatitude             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sampleLongitude            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sampleGeodeticDatum        : chr  "WGS84" "WGS84" "WGS84" "WGS84" ...
##  $ sampleCoordinateUncertainty: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sampleElevation            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sampleElevationUncertainty : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ date                       : POSIXct, format: "2016-04-20" ...
##  $ editedDate                 : POSIXct, format: "2016-05-09" ...
##  $ individualID               : chr  "NEON.PLA.D02.BLAN.06290" "NEON.PLA.D02.BLAN.06501" "NEON.PLA.D02.BLAN.06204" "NEON.PLA.D02.BLAN.06223" ...
##  $ taxonID                    : chr  "RHDA" "SOAL6" "RHDA" "LOMA6" ...
##  $ scientificName             : chr  "Rhamnus davurica Pall." "Solidago altissima L." "Rhamnus davurica Pall." "Lonicera maackii (Rupr.) Herder" ...
##  $ identificationQualifier    : chr  NA NA NA NA ...
##  $ taxonRank                  : chr  "species" "species" "species" "species" ...
##  $ nativeStatusCode           : chr  "I" "N" "I" "I" ...
##  $ growthForm                 : chr  "Deciduous broadleaf" "Forb" "Deciduous broadleaf" "Deciduous broadleaf" ...
##  $ vstTag                     : chr  NA NA NA NA ...
##  $ samplingProtocolVersion    : chr  NA "NEON.DOC.014040vJ" "NEON.DOC.014040vJ" "NEON.DOC.014040vJ" ...
##  $ measuredBy                 : chr  "jcoloso@neoninc.org" "jward@battelleecology.org" "alandes@field-ops.org" "alandes@field-ops.org" ...
##  $ identifiedBy               : chr  "shackley@neoninc.org" "llemmon@field-ops.org" "llemmon@field-ops.org" "llemmon@field-ops.org" ...
##  $ recordedBy                 : chr  "shackley@neoninc.org" NA NA NA ...
##  $ remarks                    : chr  "Nearly dead shaded out" "no entry" "no entry" "no entry" ...
##  $ dataQF                     : chr  NA NA NA NA ...
##  $ publicationDate            : chr  "20201218T103411Z" "20201218T103411Z" "20201218T103411Z" "20201218T103411Z" ...
##  $ release                    : chr  "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" ...

Notice that the neonUtilities package read the data type from the variables file and then automatically converted the data to the correct data types in R.

(Note that if you first opened your data file in Excel, you might see 06/14/2014 as the format instead of 2014-06-14. Excel can do some weird things to dates.)
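If you do end up with a character date column (for example, after a round trip through Excel, or after reading a CSV with plain read.csv() instead of readTableNEON()), you can convert it back yourself. A minimal sketch, assuming the mangled format is mm/dd/yyyy:

```r
# convert a character date like "06/14/2014" back to POSIXct
as.POSIXct("06/14/2014", format = "%m/%d/%Y", tz = "GMT")
## [1] "2014-06-14 GMT"
```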

Phenology status

Now let's look at the status data.

# What variables are included in this dataset?
names(status)

##  [1] "uid"                           "namedLocation"                
##  [3] "domainID"                      "siteID"                       
##  [5] "plotID"                        "date"                         
##  [7] "editedDate"                    "dayOfYear"                    
##  [9] "individualID"                  "phenophaseName"               
## [11] "phenophaseStatus"              "phenophaseIntensityDefinition"
## [13] "phenophaseIntensity"           "samplingProtocolVersion"      
## [15] "measuredBy"                    "recordedBy"                   
## [17] "remarks"                       "dataQF"                       
## [19] "publicationDate"               "release"

nrow(status)

## [1] 219357

#head(status)   #this is a good function to use but looks messy so not rendering it 
str(status)

## 'data.frame':	219357 obs. of  20 variables:
##  $ uid                          : chr  "b69ada55-41d1-41c7-9031-149c54de51f9" "9be6f7ad-4422-40ac-ba7f-e32e0184782d" "58e7aeaf-163c-4ea2-ad75-db79a580f2f8" "efe7ca02-d09e-4964-b35d-aebdac8f3efb" ...
##  $ namedLocation                : chr  "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" "BLAN_061.phenology.phe" ...
##  $ domainID                     : chr  "D02" "D02" "D02" "D02" ...
##  $ siteID                       : chr  "BLAN" "BLAN" "BLAN" "BLAN" ...
##  $ plotID                       : chr  "BLAN_061" "BLAN_061" "BLAN_061" "BLAN_061" ...
##  $ date                         : POSIXct, format: "2017-02-24" ...
##  $ editedDate                   : POSIXct, format: "2017-03-31" ...
##  $ dayOfYear                    : num  55 55 55 55 55 55 55 55 55 55 ...
##  $ individualID                 : chr  "NEON.PLA.D02.BLAN.06229" "NEON.PLA.D02.BLAN.06226" "NEON.PLA.D02.BLAN.06222" "NEON.PLA.D02.BLAN.06223" ...
##  $ phenophaseName               : chr  "Leaves" "Leaves" "Leaves" "Leaves" ...
##  $ phenophaseStatus             : chr  "no" "no" "no" "no" ...
##  $ phenophaseIntensityDefinition: chr  NA NA NA NA ...
##  $ phenophaseIntensity          : chr  NA NA NA NA ...
##  $ samplingProtocolVersion      : chr  NA NA NA NA ...
##  $ measuredBy                   : chr  "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" ...
##  $ recordedBy                   : chr  "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" "llemmon@neoninc.org" ...
##  $ remarks                      : chr  NA NA NA NA ...
##  $ dataQF                       : chr  "legacyData" "legacyData" "legacyData" "legacyData" ...
##  $ publicationDate              : chr  "20201217T203824Z" "20201217T203824Z" "20201217T203824Z" "20201217T203824Z" ...
##  $ release                      : chr  "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" "RELEASE-2021" ...

# date range
min(status$date)

## [1] "2017-02-24 GMT"

max(status$date)

## [1] "2019-12-12 GMT"

Clean up the Data

  • remove duplicates (full rows)
  • convert to date format
  • retain only the most recent editedDate in the perIndividual and status table.

Remove Duplicates

The individual table (ind) is included in each site by year-month file. As a result, when all the tables are stacked, there are many duplicates.

Let's remove any duplicates that exist.

# drop UID as that will be unique for duplicate records
ind_noUID <- select(ind, -(uid))

status_noUID <- select(status, -(uid))

# remove duplicates
## expect many

ind_noD <- distinct(ind_noUID)
nrow(ind_noD)

## [1] 433

status_noD<-distinct(status_noUID)
nrow(status_noD)

## [1] 216837

Variable Overlap between Tables

From the initial inspection of the data we can see there is overlap in variable names between the two tables.

Let's see what they are.

# where is there an intersection of names
intersect(names(status_noD), names(ind_noD))

##  [1] "namedLocation"           "domainID"               
##  [3] "siteID"                  "plotID"                 
##  [5] "date"                    "editedDate"             
##  [7] "individualID"            "samplingProtocolVersion"
##  [9] "measuredBy"              "recordedBy"             
## [11] "remarks"                 "dataQF"                 
## [13] "publicationDate"         "release"

There are several fields that overlap between the datasets. Some of these are expected to be the same and will be what we join on.

However, some of these will have different values in each table. We want to keep those values distinct and not join on them. Therefore, we can rename these fields before joining:

  • date
  • editedDate
  • measuredBy
  • recordedBy
  • samplingProtocolVersion
  • remarks
  • dataQF
  • publicationDate

Now we want to rename the variables that would have duplicate names. We can rename all the variables in the status object to have "Stat" at the end of the variable name.

# in Status table rename like columns 
status_noD <- rename(status_noD, dateStat=date, 
                     editedDateStat=editedDate, measuredByStat=measuredBy, 
                     recordedByStat=recordedBy, 
                     samplingProtocolVersionStat=samplingProtocolVersion, 
                     remarksStat=remarks, dataQFStat=dataQF, 
                     publicationDateStat=publicationDate)

Filter to last editedDate

The individual (ind) table contains a row for every instance in which any of the location or taxonomy data of an individual was updated. Therefore, there are many rows for some individuals. We only want the most recent editedDate for each individual in ind.

# retain only the max of the date for each individualID
ind_last <- ind_noD %>%
	group_by(individualID) %>%
	filter(editedDate==max(editedDate))

# oh wait, duplicate dates, retain only the most recent editedDate
ind_lastnoD <- ind_last %>%
	group_by(editedDate, individualID) %>%
	filter(row_number()==1)

Join Dataframes

Now we can join the two data frames on all the variables with the same name. We use a left_join() from the dplyr package because we want to match all the rows from the "left" (first) dataframe to any rows that also occur in the "right" (second) dataframe.

Check out RStudio's data wrangling (dplyr/tidyr) cheatsheet for other types of joins.

# Create a new dataframe "phe_ind" with all the data from status and some from ind_lastnoD
phe_ind <- left_join(status_noD, ind_lastnoD)

## Joining, by = c("namedLocation", "domainID", "siteID", "plotID", "individualID", "release")

Now that we have clean datasets, we can begin looking into our particular data to address our research question: do plants show patterns of change in phenophase across the seasons?

Patterns in Phenophase

From our larger dataset (several sites, species, phenophases), let's create a dataframe with only the data from a single site, species, and phenophase and call it phe_1sp.

Select Site(s) of Interest

To do this, we'll first select our site of interest. Note how we set this up with an object that stores our site of interest. This will allow us to more easily change which site or sites we use if we want to adapt our code later.

# set site of interest
siteOfInterest <- "SCBI"

# use filter to select only the site of Interest 
## using %in% allows one to add a vector if you want more than one site. 
## could also do it with == instead of %in%, but == won't work with vectors

phe_1st <- filter(phe_ind, siteID %in% siteOfInterest)
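To see the difference, here is a minimal, self-contained illustration (toy and sitesOfInterest here are made-up names) of why %in% extends to a vector of sites while == does not:

```r
library(dplyr)

# a tiny made-up data frame standing in for phe_ind
toy <- data.frame(siteID = c("SCBI", "BLAN", "SERC", "SCBI"),
                  value  = 1:4)

# == works for a single site...
filter(toy, siteID == "SCBI")  # 2 rows

# ...but %in% also handles a vector of sites
sitesOfInterest <- c("SCBI", "BLAN")
filter(toy, siteID %in% sitesOfInterest)  # 3 rows
```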

Select Species of Interest

Now we may only want to view a single species or a set of species. Let's first look at the species that are present in our data. We could do this just by looking at the taxonID field, which gives the four-letter USDA plant code for each species. But if we don't know all the plant codes, we can get a bit fancier and view both the code and the species name.

# see which species are present - taxon ID only
unique(phe_1st$taxonID)

## [1] "JUNI" "MIVI" "LITU"

# or see which species are present with taxon ID + species name
unique(paste(phe_1st$taxonID, phe_1st$scientificName, sep=' - ')) 

## [1] "JUNI - Juglans nigra L."                      
## [2] "MIVI - Microstegium vimineum (Trin.) A. Camus"
## [3] "LITU - Liriodendron tulipifera L."

For now, let's choose only the flowering tree Liriodendron tulipifera (LITU). By writing it this way, we could also add a list of species to the speciesOfInterest object to select for multiple species.

speciesOfInterest <- "LITU"

#subset to just "LITU"
# here just use == but could also use %in%
phe_1sp <- filter(phe_1st, taxonID==speciesOfInterest)

# check that it worked
unique(phe_1sp$taxonID)

## [1] "LITU"

Select Phenophase of Interest

And, perhaps a single phenophase.

# see which phenophases are present
unique(phe_1sp$phenophaseName)

## [1] "Open flowers"         "Breaking leaf buds"  
## [3] "Colored leaves"       "Increasing leaf size"
## [5] "Falling leaves"       "Leaves"

phenophaseOfInterest <- "Leaves"

#subset to just the phenophase of interest 
phe_1sp <- filter(phe_1sp, phenophaseName %in% phenophaseOfInterest)

# check that it worked
unique(phe_1sp$phenophaseName)

## [1] "Leaves"

Select only Primary Plots

NEON plant phenology observations are collected along two types of plots.

  • Primary plots: an 800 meter square phenology loop transect
  • Phenocam plots: a 200 m x 200 m plot located within view of a canopy level, tower-mounted, phenology camera

In the data, these plots are differentiated by the subtypeSpecification. Depending on your question you may want to use only one or both of these plot types. For this activity, we're going to only look at the primary plots.

**Data Tip:** How do I learn this on my own? Read the Data Product User Guide and use the variables files included with the data download to find the corresponding variable names.

# what plots are present?
unique(phe_1sp$subtypeSpecification)

## [1] "primary"  "phenocam"

# filter
phe_1spPrimary <- filter(phe_1sp, subtypeSpecification == 'primary')

# check that it worked
unique(phe_1spPrimary$subtypeSpecification)

## [1] "primary"

Total in Phenophase of Interest

The phenophaseStatus is recorded as "yes" or "no" to indicate whether the individual is in that phenophase. The phenophaseIntensity values are categories for how much of the individual is in that state. For now, we will stick with phenophaseStatus.

We can now calculate the total number of individuals in that state. We use n_distinct(individualID) to count the individuals (and not the records) in case there are duplicate records for an individual.

But later on we'll also want to calculate the percent of the observed individuals in the "Leaves" status. Therefore, we're also adding in a step here to retain the sample size so that we can calculate percentages later.

Here we use pipes (%>%) from the dplyr package to "pass" objects on to the next function.

# Calculate sample size for later use
sampSize <- phe_1spPrimary %>%
  group_by(dateStat) %>%
  summarise(numInd= n_distinct(individualID))

# Total in status by day for distinct individuals
inStat <- phe_1spPrimary %>%
  group_by(dateStat, phenophaseStatus) %>%
  summarise(countYes=n_distinct(individualID))

## `summarise()` has grouped output by 'dateStat'. You can override using the `.groups` argument.

inStat <- full_join(sampSize, inStat, by="dateStat")

# Retain only Yes
inStat_T <- filter(inStat, phenophaseStatus %in% "yes")

# check that it worked
unique(inStat_T$phenophaseStatus)

## [1] "yes"

Now that we have the data we can plot it.

Plot with ggplot

The ggplot() function within the ggplot2 package gives us considerable control over plot appearance. Three basic elements are needed for ggplot() to work:

  1. The data_frame: containing the variables that we wish to plot,
  2. aes (aesthetics): which denotes which variables will map to the x-, y- (and other) axes,
  3. geom_XXXX (geometry): which defines the data's graphical representation (e.g. points (geom_point), bars (geom_bar), lines (geom_line), etc).

The syntax begins with the base statement that includes the data_frame (inStat_T) and associated x (date) and y (n) variables to be plotted:

ggplot(inStat_T, aes(date, n))

**Data Tip:** For a more detailed introduction to using `ggplot()`, visit *Time Series 05: Plot Time Series with ggplot2 in R* tutorial.

Bar Plots with ggplot

To successfully plot, the last piece that is needed is the geometry type. To create a bar plot, we set the geom element to geom_bar().

The default setting for a ggplot bar plot - geom_bar() - is a histogram designated by stat="bin". However, in this case, we want to plot count values. We can use geom_bar(stat="identity") to force ggplot to plot actual values.

# plot number of individuals in leaf
phenoPlot <- ggplot(inStat_T, aes(dateStat, countYes)) +
    geom_bar(stat="identity", na.rm = TRUE) 

phenoPlot

Bar plot showing the count of Liriodendron tulipifera (LITU) individuals from January 2017 through December 2019 at the Smithsonian Conservation Biology Institute (SCBI). Counts represent individuals that were recorded as a 'yes' for the phenophase of interest, 'Leaves', and were from the primary plots.

# Now let's make the plot look a bit more presentable
phenoPlot <- ggplot(inStat_T, aes(dateStat, countYes)) +
    geom_bar(stat="identity", na.rm = TRUE) +
    ggtitle("Total Individuals in Leaf") +
    xlab("Date") + ylab("Number of Individuals") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

phenoPlot

Bar plot showing the count of Liriodendron tulipifera (LITU) individuals from January 2017 through December 2019 at the Smithsonian Conservation Biology Institute (SCBI). Counts represent individuals that were recorded as a 'yes' for the phenophase of interest, 'Leaves', and were from the primary plots. Axis labels and a title have been added to make the graph more presentable.

We could also convert this to a percentage and plot that.

# convert to percent
inStat_T$percent<- ((inStat_T$countYes)/inStat_T$numInd)*100

# plot percent of leaves
phenoPlot_P <- ggplot(inStat_T, aes(dateStat, percent)) +
    geom_bar(stat="identity", na.rm = TRUE) +
    ggtitle("Proportion in Leaf") +
    xlab("Date") + ylab("% of Individuals") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

phenoPlot_P

It might also be useful to visualize the data in different ways while exploring it. Before plotting, we converted our count data into a percentage by writing an expression that divides the number of individuals with a 'yes' for the phenophase of interest, 'Leaves', by the total number of individuals, and then multiplies the result by 100. Using this newly generated dataset of percentages, we can plot the data much as we did in the previous plot, only this time the y-axis ranges from 0 to 100 to reflect the percentage data we just generated. The resulting bar plot shows the proportion of Liriodendron tulipifera (LITU) individuals from January 2017 through December 2019 at the Smithsonian Conservation Biology Institute (SCBI), where the y-axis represents the percent of individuals from the primary plots that were recorded as a 'yes' for the phenophase of interest, 'Leaves'.

The plots demonstrate the expected seasonal pattern: leaf-out increases, peaks, and then drops off.

Drivers of Phenology

Now that we see there are differences and shifts in phenophases, what are the drivers of phenology?

The NEON phenology measurements track sensitive and easily observed indicators of biotic responses to climate variability by monitoring the timing and duration of phenological stages in plant communities. Plant phenology is affected by forces such as temperature, timing and duration of pest infestations and disease outbreaks, water fluxes, nutrient budgets, carbon dynamics, and food availability and has feedbacks to trophic interactions, carbon sequestration, community composition and ecosystem function. (quoted from Plant Phenology Observations user guide.)

Filter by Date

In the next part of this series, we will explore temperature as a driver of phenology. Temperature data are quite large (NEON provides them at 1-minute or 30-minute intervals), so let's trim our phenology data down to a single year so that we aren't working with such a large dataset.

Let's filter to just 2018 data.

# use filter to select only the date of interest 
phe_1sp_2018 <- filter(inStat_T, dateStat >= "2018-01-01" & dateStat <= "2018-12-31")

# did it work?
range(phe_1sp_2018$dateStat)

## [1] "2018-04-13 GMT" "2018-11-20 GMT"

How does that look?

# Now let's make the plot look a bit more presentable
phenoPlot18 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
    geom_bar(stat="identity", na.rm = TRUE) +
    ggtitle("Total Individuals in Leaf") +
    xlab("Date") + ylab("Number of Individuals") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

phenoPlot18

In the previous step, we filtered our data by date to only include data from 2018. Plotting the newly generated dataset gives a bar plot showing the count of Liriodendron tulipifera (LITU) individuals at the Smithsonian Conservation Biology Institute (SCBI) for the year 2018. Counts represent individuals that were recorded as a 'yes' for the phenophase of interest, 'Leaves', and were from the primary plots.

Now that we've filtered down to just the 2018 data from SCBI for LITU in leaf, we may want to save that subsetted data for another use. To do that you can write the data frame to a .csv file.

You do not need to follow this step if you are continuing on to the next tutorials in this series as you already have the data frame in your environment. Of course if you close R and then come back to it, you will need to re-load this data and instructions for that are provided in the relevant tutorials.

# Write .csv - this step is optional 
# This will write to your current working directory, change as desired.
write.csv( phe_1sp_2018 , file="NEONpheno_LITU_Leaves_SCBI_2018.csv", row.names=F)

#If you are using the downloaded example data, this code will write it to the 
# pheno data folder. Note - this file is already a part of the download.

#write.csv( phe_1sp_2018 , file="NEON-pheno-temp-timeseries_v2/NEONpheno_LITU_Leaves_SCBI_2018.csv", row.names=F)

Get Lesson Code

01-explore-phenology-data.R

Work with NEON's Single-Aspirated Air Temperature Data

Authors: Lee Stanish, Megan A. Jones, Natalie Robinson

Last Updated: Apr 8, 2021

In this tutorial, we explore the NEON single-aspirated air temperature data. We then discuss how to interpret the variables, how to work with date-time and date formats, and finally how to plot the data.

This tutorial is part of a series on how to work with both discrete and continuous time series data with NEON plant phenology and temperature data products.

Objectives

After completing this activity, you will be able to:

  • work with "stacked" NEON Single-Aspirated Air Temperature data.
  • correctly format date-time data.
  • use dplyr functions to filter data.
  • plot time series data in scatter plots using the ggplot() function.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • neonUtilities: install.packages("neonUtilities")
  • ggplot2: install.packages("ggplot2")
  • dplyr: install.packages("dplyr")
  • tidyr: install.packages("tidyr")

More on Packages in R – Adapted from Software Carpentry.

Download Data

This tutorial is designed to have you download data directly from the NEON portal API using the neonUtilities package. However, you can also directly download this data, prepackaged, from FigShare. This data set includes all the files needed for the Work with NEON OS & IS Data - Plant Phenology & Temperature tutorial series. The data are in the format you would receive if downloading them using the zipsByProduct() function in the neonUtilities package.

Direct Download: NEON Phenology & Temp Time Series Teaching Data Subset (v2 - 2017-2019 data) (12 MB)


Additional Resources

  • NEON data portal
  • RStudio's data wrangling (dplyr/tidyr) cheatsheet
  • NEONScience GitHub Organization
  • nneo API wrapper on CRAN
  • Hadley Wickham's documentation on the ggplot2 package.
  • Winston Chang's *Cookbook for R* site, based on his *R Graphics Cookbook* text.

Explore NEON Air Temperature Data

Air temperature is continuously monitored by NEON by two methods. At terrestrial sites temperature for the top of the tower will be derived from a triple redundant aspirated air temperature sensor. This is provided as NEON data product DP1.00003.001. Single Aspirated Air Temperature Sensors (SAATS) are deployed to develop temperature profiles at the tower at NEON terrestrial sites and on the meteorological stations at NEON aquatic sites. This is provided as NEON data product DP1.00002.001. These data are also available as part of the NEON Mobile Deployment Platforms.

When designing a research project using this data, you should consult the documents associated with this or any data product and not rely solely on this summary.

Single-aspirated Air Temperature

Air temperature profiles are ascertained by deploying SAATS at various heights on NEON tower infrastructure. Air temperature at aquatic sites is measured using a single SAAT at a standard height of 3m above ground level. Air temperature for this data product is provided as one- and thirty-minute averages of 1 Hz observations. Temperature observations are made using platinum resistance thermometers, which are housed in a fan aspirated shield to reduce radiative bias. The temperature is measured in Ohms and subsequently converted to degrees Celsius during data processing. Details on the conversion can be found in the associated Algorithm Theoretic Basis Document (ATBD) for any instrumented data product.
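NEON's exact conversion is specified in the ATBD, but as a generic illustration (not NEON's calibration), the resistance of a platinum thermometer above 0 °C follows the Callendar-Van Dusen relationship R(T) = R0 (1 + A·T + B·T²), which can be inverted for temperature. ohms_to_degC below is a hypothetical helper using the standard IEC 60751 PT100 coefficients:

```r
# Illustrative only: invert R(T) = R0 * (1 + A*T + B*T^2) for T >= 0 degC,
# using standard IEC 60751 PT100 coefficients (NOT NEON's calibration)
ohms_to_degC <- function(R, R0 = 100) {
  A <- 3.9083e-3
  B <- -5.775e-7
  # quadratic formula for T given the measured resistance R
  (-A + sqrt(A^2 - 4 * B * (1 - R / R0))) / (2 * B)
}

ohms_to_degC(100)    # ~0 degC at the nominal resistance
ohms_to_degC(107.79) # ~20 degC
```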

Available Data Tables

The SAAT data product has two available data tables that are delivered for each site and month-year selected. In addition, there are several metadata files that provide you with additional useful information.

  • a readme with information on the data product and the download;
  • a variables file that defines the term descriptions, data types, and units;
  • a validation file with data entry validation and parsing rules; and
  • an XML file with machine readable metadata.

For the data tables, both a 1-minute average and a 30-minute average are available. If you download data directly from the portal, you will get one of these files for each level on the tower for each site and month-year selected.

File Naming Conventions

It is important to understand the file names to know which file is which. The readme associated with the data provides the following information:

The file naming convention for sensor data files is NEON.DOM.SITE.DPL.PRNUM.REV.TERMS.HOR.VER.TMI.DESC

where:

  • DOM refers to the domain of data acquisition (D01 through D20)
  • SITE refers to the standardized four-character alphabetic code of the site of data acquisition.
  • DPL refers to the data product processing level
  • PRNUM refers to the data product number (see Explore Data Products.)
  • REV refers to the revision number of the data product. (001 = initial REV, Engineering-Grade or Provisional; 101 = initial REV, Science-Grade)
  • TERMS is used in data product numbering to identify a sub-product or discrete vector of metadata. Since each download file typically contains several sub-products, this field is set to 00000 in the file name to maintain consistency with the data product numbering scheme.
  • HOR refers to measurement locations within one horizontal plane.
  • VER refers to measurement locations within one vertical plane. For example, if eight temperature measurements are collected, one at each tower level, the number in the VER field would range from 010-080.
  • TMI is the Temporal Index; refers to the temporal representation, averaging period, or coverage of the data product (e.g., minute, hour, month, year, sub-hourly, day, lunar month, single instance, seasonal, annual, multi-annual)
  • DESC is an abbreviated description of the data product

Therefore, we can interpret the following .csv file name

NEON.D02.SERC.DP1.00002.001.00000.000.010.030.SAAT_30min.csv

as NEON data from the Smithsonian Environmental Research Center (SERC), located in Domain 02 (D02). The specific data product is a level 1 data product (DP1) of single-aspirated temperature data (00002). There have not been revisions, there are no associated terms, and there is no differentiation in the horizontal plane. This data comes from the first (010) vertical level of the tower. The temporal interval is 30-minute averaged data (030; the other option in our data is 1-minute averaging). Finally, there is the abbreviated description, which is more human readable and tells us again that this is single-aspirated air temperature at 30-minute averages.
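Because the fields are dot-delimited, the file name can also be unpacked programmatically. A minimal sketch (parts and the field labels are our own, applied in the order the readme lists them):

```r
# Split a sensor data file name into its named fields (a convenience
# sketch; the labels below are ours, taken from the readme's ordering)
fileName <- "NEON.D02.SERC.DP1.00002.001.00000.000.010.030.SAAT_30min.csv"

parts <- strsplit(fileName, ".", fixed = TRUE)[[1]]
names(parts) <- c("NEON", "DOM", "SITE", "DPL", "PRNUM", "REV",
                  "TERMS", "HOR", "VER", "TMI", "DESC", "ext")

parts[c("DOM", "SITE", "PRNUM", "VER", "TMI")]
# DOM: "D02", SITE: "SERC", PRNUM: "00002", VER: "010", TMI: "030"
```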

Access NEON Data

There are several ways to access NEON data: directly from the NEON data portal, through a data partner (select data products only), by writing code to pull data directly from the NEON API, or, as we'll do here, by using the neonUtilities package, which is a wrapper for the API with useful functions to make working with the data easier.

Data Downloaded Direct from Portal

If you prefer to work with data that are downloaded from the data portal, please review the Getting started and Stack the downloaded data sections of the Download and Explore NEON Data tutorial. This will get you to the point where you can upload your data from sites or dates of interest and resume this tutorial.

Import Data

First, we need to set up our environment with the packages needed for this tutorial.

# Install needed package (only uncomment & run if not already installed)
#install.packages("neonUtilities")
#install.packages("ggplot2")
#install.packages("dplyr")
#install.packages("tidyr")


# Load required libraries
library(neonUtilities)  # for accessing NEON data
library(ggplot2)  # for plotting
library(dplyr)  # for data munging
library(tidyr)  # for data munging


# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Documents/data/" # Change this to match your local environment
setwd(wd)

Data of Interest

This tutorial is part of a series working with discrete plant phenology data and (near) continuous temperature data. Our overall "research" question is to see if there is any correlation between the plant phenology and the temperature. Therefore, we will want to work with data that align with the plant phenology data that we worked with in the first tutorial. If you are only interested in working with the temperature data, you do not need to complete the previous tutorial.

Our data of interest will be the temperature data from 2018 from NEON's Smithsonian Conservation Biology Institute (SCBI) field site located in Virginia near the northern terminus of the Blue Ridge Mountains.

NEON single-aspirated air temperature data are available at two averaging intervals, 1 minute and 30 minutes. Which data you want to work with depends on your research questions. Here, we're going to download and work with only the 30-minute interval data, as we're primarily interested in longer-term (daily, weekly, annual) patterns.

This will download 7.7 MiB of data. check.size is set to false (F) to improve the flow of the script, but it is always a good idea to view the size with true (T) before downloading a new dataset.

If you are using the data downloaded at the start of the tutorial, use the commented out code in the second half of this code chunk.

# download data of interest - Single Aspirated Air Temperature
saat<-loadByProduct(dpID="DP1.00002.001", site="SCBI", 
										startdate="2018-01", enddate="2018-12", 
										package="basic", 
										avg = "30",
										token = Sys.getenv("NEON_TOKEN"),
										check.size = F)

## Input parameter avg is deprecated; use timeIndex to download by time interval.
## Finding available files
## 

## Downloading files totaling approximately 8.056716 MB
## Downloading 63 files
## Stacking operation across a single core.
## Stacking table SAAT_30min
## Merged the most recent publication of sensor position files for each site and saved to /stackedFiles
## Copied the most recent publication of variable definition file to /stackedFiles
## Finished: Stacked 1 data tables and 2 metadata tables!
## Stacking took 0.4314961 secs

##If choosing to use example dataset downloaded from this tutorial: 

# Stack multiple files within the downloaded phenology data
#stackByTable("NEON-pheno-temp-timeseries_v2/filesToStack00002", folder = T)

# read in data - readTableNEON uses the variables file to assign the correct
# data type for each variable
#SAAT_30min <- readTableNEON('NEON-pheno-temp-timeseries_v2/filesToStack00002/stackedFiles/SAAT_30min.csv', 'NEON-pheno-temp-timeseries_v2/filesToStack00002/stackedFiles/variables_00002.csv')

Explore Temperature Data

Now that you have the data, let's take a look at the structure and understand what's in the data. The data (saat) come in as a large list of four items.

# View the list
View(saat)

# if using the pre-downloaded data, you need to read in the variables file 
# or open and look at it on your desktop
#var <- read.csv('NEON-pheno-temp-timeseries_v2/filesToStack00002/stackedFiles/variables_00002.csv')
#View(var)

So what exactly are these four files and why would you want to use them?

  • data file(s): There will always be one or more dataframes that include the primary data of the data product you downloaded. Since we downloaded only the 30 minute averaged data we only have one data table SAAT_30min.
  • readme_xxxxx: The readme file, with the corresponding 5 digits from the data product number, provides you with important information relevant to the data product and the specific instance of downloading the data.
  • sensor_positions_xxxxx: this file contains information about the coordinates of each sensor, relative to a reference location.
  • variables_xxxxx: this file contains all the variables found in the associated data table(s). This includes full definitions, units, and other important information.

Since we want to work with the individual files, let's create individual objects from the large list. There are several ways to do this, including the following two.

# if using the pre-downloaded data - you can skip this part.
# assign individual dataFrames in the list as an object
#SAAT_30min <- saat$SAAT_30min

# unlist all objects
list2env(saat, .GlobalEnv)

## <environment: R_GlobalEnv>

Now we have the four files as separate R objects. But what is in our data file?

# what is in the data?
str(SAAT_30min)

## 'data.frame':	87600 obs. of  16 variables:
##  $ domainID           : chr  "D02" "D02" "D02" "D02" ...
##  $ siteID             : chr  "SCBI" "SCBI" "SCBI" "SCBI" ...
##  $ horizontalPosition : chr  "000" "000" "000" "000" ...
##  $ verticalPosition   : chr  "010" "010" "010" "010" ...
##  $ startDateTime      : POSIXct, format: "2018-01-01 00:00:00" "2018-01-01 00:30:00" ...
##  $ endDateTime        : POSIXct, format: "2018-01-01 00:30:00" "2018-01-01 01:00:00" ...
##  $ tempSingleMean     : num  -11.8 -11.8 -12 -12.2 -12.4 ...
##  $ tempSingleMinimum  : num  -12.1 -12.2 -12.3 -12.6 -12.8 ...
##  $ tempSingleMaximum  : num  -11.4 -11.3 -11.3 -11.7 -12.1 ...
##  $ tempSingleVariance : num  0.0208 0.0315 0.0412 0.0393 0.0361 0.0289 0.0126 0.0211 0.0115 0.0022 ...
##  $ tempSingleNumPts   : num  1800 1800 1800 1800 1800 1800 1800 1800 1800 1800 ...
##  $ tempSingleExpUncert: num  0.13 0.13 0.13 0.13 0.129 ...
##  $ tempSingleStdErMean: num  0.0034 0.0042 0.0048 0.0047 0.0045 0.004 0.0026 0.0034 0.0025 0.0011 ...
##  $ finalQF            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ publicationDate    : chr  "20200621T115323Z" "20200621T115323Z" "20200621T115323Z" "20200621T115323Z" ...
##  $ release            : chr  "undetermined" "undetermined" "undetermined" "undetermined" ...

Quality Flags

The sensor data undergo a variety of quality assurance and quality control checks, and data can pass or fail any of them. The expanded data package includes all of the individual quality flags, which let you decide whether a particular failed check matters for your research and whether those data should therefore be removed from your analysis. The data we are using here are the basic data package, which includes only the finalQF flag.

A pass of the check is 0, while a fail is 1. Let's see if we have data with a quality flag.

# Are there quality flags in your data? Count 'em up

sum(SAAT_30min$finalQF==1)

## [1] 20501

How do we want to deal with the quality-flagged data? That may depend on why the data are flagged and what questions you are asking. The expanded data package is useful for determining this.

For our demonstration purposes here we will keep the flagged data for now.
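
Beyond the raw count, the proportion of flagged records is often more informative. A minimal base-R sketch, using a small toy finalQF vector rather than the real SAAT data:

```r
# toy finalQF vector standing in for SAAT_30min$finalQF: 0 = pass, 1 = fail
finalQF <- c(0, 0, 1, 0, 1, 0, 0, 0)

sum(finalQF == 1)   # number of flagged records
mean(finalQF == 1)  # proportion of records flagged
```

On the real data frame, mean(SAAT_30min$finalQF == 1) gives the flagged fraction (here roughly 0.23, since 20,501 of the 87,600 records are flagged).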

What about null (NA) data?

# Are there NA's in your data? Count 'em up
sum(is.na(SAAT_30min$tempSingleMean))

## [1] 19616

mean(SAAT_30min$tempSingleMean)

## [1] NA

Why did we get NA instead of a number?

We saw above that there are NA values in the temperature data. By default, R returns NA for the mean (and many other summary statistics) when the input contains NA values, so that missing data are never silently dropped. Adding the argument

na.rm=TRUE

tells R to remove the NA values before calculating.
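
For example, with a small toy vector standing in for tempSingleMean:

```r
# toy vector with one missing value
temps <- c(-11.8, -11.8, NA, -12.2)

mean(temps)                # NA: the missing value propagates
mean(temps, na.rm = TRUE)  # the NA is dropped before averaging
```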

# create new dataframe without NAs
SAAT_30min_noNA <- SAAT_30min %>%
	drop_na(tempSingleMean)  # tidyr function

# alternate base R
# SAAT_30min_noNA <- SAAT_30min[!is.na(SAAT_30min$tempSingleMean),]

# did it work?
sum(is.na(SAAT_30min_noNA$tempSingleMean))

## [1] 0

Scatterplots with ggplot

We can use ggplot to create scatter plots. Which data should we plot, as we have several options?

  • tempSingleMean: the mean temperature for the interval
  • tempSingleMinimum: the minimum temperature during the interval
  • tempSingleMaximum: the maximum temperature for the interval

Depending on exactly what question you are asking, you may prefer one over the others. For many applications, the mean temperature of the 1- or 30-minute interval will provide the best representation of the data.

Let's plot it. (This is a plot of a large amount of data and can take 1-2 minutes to process. If it takes too long or strains your computer's memory, you can skip it; it is not essential for completing the next steps.)

# plot temp data
tempPlot <- ggplot(SAAT_30min, aes(startDateTime, tempSingleMean)) +
    geom_point() +
    ggtitle("Single Aspirated Air Temperature") +
    xlab("Date") + ylab("Temp (C)") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

tempPlot

## Warning: Removed 19616 rows containing missing values (geom_point).

Scatter plot of mean temperatures for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI). The plotted data show that erroneous sensor readings occurred during late April/May 2018.

Given all the data -- 68,000+ observations -- it took a little while for that to plot.

What patterns can you see in the data?

Something odd seems to have happened in late April/May 2018. Since it is unlikely Virginia experienced -50C during this time, these are probably erroneous sensor readings, which is a good argument for removing the quality-flagged data after all.

Right now we are also looking at all the data points in the dataset. However, we may want to view or aggregate the data differently:

  • aggregated data: min, mean, or max over some duration
  • the number of days since a freezing temperature
  • or some other segmentation of the data.

Given that in the previous tutorial, Work With NEON's Plant Phenology Data, we were working with phenology data collected on a daily scale, let's aggregate to that level.

To make this plot better, let's do two things:

  1. Remove flagged data.
  2. Aggregate to a daily value.

Subset to remove quality flagged data

We previously saw a fair number of data points that were flagged. Now we'll subset the data to remove those data points.

# subset and add C to name for "clean"
SAAT_30minC <- filter(SAAT_30min_noNA, finalQF==0)

# Do any quality flags remain? Count 'em up
sum(SAAT_30minC$finalQF==1)

## [1] 0

Now we can plot it with the clean data.

# plot temp data
tempPlot <- ggplot(SAAT_30minC, aes(startDateTime, tempSingleMean)) +
    geom_point() +
    ggtitle("Single Aspirated Air Temperature") +
    xlab("Date") + ylab("Temp (C)") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

tempPlot

Scatter plot of mean temperatures for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI). Plotted data now has been cleaned of the erroneous sensor readings by filtering out flagged data.

That looks better! But we still have the 30 min data.

Aggregate Data by Day

We can use the dplyr package functions to aggregate the data. However, we have to choose what product we want from the aggregation. Again, you might want daily minimum temps, mean temperature or maximum temps depending on your question.

In the context of phenology, minimum temperatures might be very important if you are interested in a species that is frost susceptible. Any day with a minimum temperature below 0C could dramatically change the phenophase. For other species or climates, maximum thresholds may be very important. Or you might be most interested in the daily mean.

For this tutorial, let's stick with maximum daily temperature (of the interval means).

# convert to date, easier to work with
SAAT_30minC$Date <- as.Date(SAAT_30minC$startDateTime)

# did it work
str(SAAT_30minC$Date)

##  Date[1:67099], format: "2018-01-01" "2018-01-01" "2018-01-01" "2018-01-01" "2018-01-01" "2018-01-01" ...

# max of mean temp each day
# note: mutate() must come before distinct(), so the max is taken
# over all of a day's rows rather than a single row
temp_day <- SAAT_30minC %>%
	group_by(Date) %>%
	mutate(dayMax=max(tempSingleMean)) %>%
	distinct(Date, .keep_all=T)
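
Base R's aggregate() offers an equivalent way to get one maximum per day. A minimal sketch with toy 30-minute values (hypothetical numbers, not the SCBI record):

```r
# toy 30-minute interval means for two days
df <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02")),
  tempSingleMean = c(-11.8, -10.2, -5.0, -4.1)
)

# one row per Date, holding the max of that day's interval means
temp_day_base <- aggregate(tempSingleMean ~ Date, data = df, FUN = max)
temp_day_base
##         Date tempSingleMean
## 1 2018-01-01          -10.2
## 2 2018-01-02           -4.1
```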

Now we can plot the cleaned up daily temperature.

# plot Air Temperature Data across 2018 using daily data
tempPlot_dayMax <- ggplot(temp_day, aes(Date, dayMax)) +
    geom_point() +
    ggtitle("Daily Max Air Temperature") +
    xlab("") + ylab("Temp (C)") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size = 20)) +
    theme(text = element_text(size=18))

tempPlot_dayMax

Scatter plot of daily maximum temperatures(of 30 minute interval means) for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI).

Thought questions:

  • What do we gain by this visualization?
  • What do we lose by moving away from the 30 minute intervals?

ggplot - Subset by Time

Sometimes we want to scale the x- or y-axis to a particular time subset without subsetting the entire data frame. To do this, we can define start and end times, then supply them as the limits in scale_x_date() as follows:

scale_x_date(limits=start.end) +

Let's plot just the first three months of the year.

# Define start and end dates for the subset as Date-class R objects
startTime <- as.Date("2018-01-01")
endTime <- as.Date("2018-03-31")

# create a start and end time R object
start.end <- c(startTime,endTime)
str(start.end)

##  Date[1:2], format: "2018-01-01" "2018-03-31"

# View data for first 3 months only
# And we'll add some color for a change. 
tempPlot_dayMax3m <- ggplot(temp_day, aes(Date, dayMax)) +
           geom_point(color="blue", size=1) +  # defines what points look like
           ggtitle("Air Temperature\n Jan - March") +
           xlab("Date") + ylab("Air Temperature (C)")+ 
           (scale_x_date(limits=start.end, 
                date_breaks="1 week",
                date_labels="%b %d"))
 
tempPlot_dayMax3m

## Warning: Removed 268 rows containing missing values (geom_point).

Scatter plot showing daily maximum temperatures(of 30 minute interval means) from the beginning of January 2018 through the end of March 2018 at the Smithsonian Conservation Biology Institute (SCBI).

Now that we have temperature data matching the phenology data from the previous tutorial, let's save them to our computer to use in future analyses (or the next tutorial). This step is optional if you are continuing the series, as you already have these data in R.

# Write .csv - this step is optional 
# This will write to your current working directory, change as desired.
write.csv(temp_day, file="NEONsaat_daily_SCBI_2018.csv", row.names=F)

# If you are using the downloaded example data, this code will write it to the 
# pheno data folder. Note - this file is already a part of the download.

#write.csv(temp_day , file="NEON-pheno-temp-timeseries_v2/NEONsaat_daily_SCBI_2018.csv", row.names=F)

Get Lesson Code

02-drivers-pheno-change-temp.R

Plot Continuous & Discrete Data Together

Authors: Lee Stanish, Megan A. Jones, Natalie Robinson

Last Updated: May 7, 2021

This tutorial discusses ways to plot plant phenology (discrete time series) and single-aspirated temperature (continuous time series) together. It uses data frames created in the first two parts of this series, Work with NEON OS & IS Data - Plant Phenology & Temperature. If you have not completed these tutorials, please download the dataset below.

Objectives

After completing this tutorial, you will be able to:

  • plot multiple figures together with grid.arrange()
  • plot only a subset of dates

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • neonUtilities: install.packages("neonUtilities")
  • ggplot2: install.packages("ggplot2")
  • dplyr: install.packages("dplyr")
  • gridExtra: install.packages("gridExtra")

More on Packages in R – Adapted from Software Carpentry.

Download Data

This tutorial is designed to have you download data directly from the NEON portal API using the neonUtilities package. However, you can also directly download this data, prepackaged, from FigShare. This data set includes all the files needed for the Work with NEON OS & IS Data - Plant Phenology & Temperature tutorial series. The data are in the format you would receive if downloading them using the zipsByProduct() function in the neonUtilities package.

Direct Download: NEON Phenology & Temp Time Series Teaching Data Subset (v2 - 2017-2019 data) (12 MB)

To start, we need to set up our R environment. If you're continuing from the previous tutorial in this series, you'll only need to load the new packages.

# Install needed packages (only uncomment & run if not already installed)
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("gridExtra")
#install.packages("scales")

# Load required libraries
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(scales)

options(stringsAsFactors=F) #keep strings as character type not factors

# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Documents/data/" # Change this to match your local environment
setwd(wd)

If you don't already have the R objects, temp_day and phe_1sp_2018, loaded you'll need to load and format those data. If you do, you can skip this code.

# Read in data -> if in series this is unnecessary
temp_day <- read.csv(paste0(wd,'NEON-pheno-temp-timeseries/NEONsaat_daily_SCBI_2018.csv'))

phe_1sp_2018 <- read.csv(paste0(wd,'NEON-pheno-temp-timeseries/NEONpheno_LITU_Leaves_SCBI_2018.csv'))

# Convert dates
temp_day$Date <- as.Date(temp_day$Date)
# use dateStat - the date the phenophase status was recorded
phe_1sp_2018$dateStat <- as.Date(phe_1sp_2018$dateStat)

Separate Plots, Same Panel

In this dataset, we have phenology and temperature data from the Smithsonian Conservation Biology Institute (SCBI) NEON field site. There are a variety of ways we may want to look at these data, including aggregated at the site level, by a single plot, or viewing all plots at the same time but in separate panels. In the Work With NEON's Plant Phenology Data and the Work with NEON's Single-Aspirated Air Temperature Data tutorials, we created separate plots of the number of individuals that had leaves at different times of the year and of the temperature in 2018.

However, plotting the data next to each other makes comparison much easier. The grid.arrange() function from the gridExtra package can help us do this.

# first, create one plot 
phenoPlot <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
    geom_bar(stat="identity", na.rm = TRUE) +
    ggtitle("Total Individuals in Leaf") +
    xlab("") + ylab("Number of Individuals")

# create second plot of interest
tempPlot_dayMax <- ggplot(temp_day, aes(Date, dayMax)) +
    geom_point() +
    ggtitle("Daily Max Air Temperature") +
    xlab("Date") + ylab("Temp (C)")

# Then arrange the plots - this can be done with >2 plots as well.
grid.arrange(phenoPlot, tempPlot_dayMax) 

One graphic showing two plots arranged vertically by using the grid.arrange function form the gridExtra package. The top plot shows a bar plot of the counts of Liriodendrum tulipifera (LITU) individuals at the Smithsonian Conservation Biology Institute (SCBI) for the year 2018. The bottom plot shows a scatter plot of daily maximum temperatures(of 30 minute interval means) for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI).

Now, we can see both plots in the same window. But, hmmm... the x-axis on both plots is kinda wonky. We want the same spacing in the scale across the year (e.g., July in one should line up with July in the other), plus we want the dates displayed in the same format (e.g., "2018-07" vs. "Jul" vs. "Jul 2018").

Format Dates in Axis Labels

The date format parameter can be adjusted with scale_x_date. Let's format the x-axis ticks so they read "month" (%b) in both graphs. We will use the syntax:

scale_x_date(labels=date_format("%b"))

Rather than re-coding the entire plot, we can add the scale_x_date element to the plot object phenoPlot we just created.

**Data Tip:**
  1. You can type ?strptime into the R console to find a list of date format conversion specifications (e.g. %b = abbreviated month). Type ?scale_x_date for a list of parameters that allow you to format dates on the x-axis.

  2. If you are working with a date & time class (e.g. POSIXct), you can use scale_x_datetime instead of scale_x_date.
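
A few of these conversion specifications in action (note that %b is locale-dependent, so the month abbreviation may differ on your system):

```r
d <- as.Date("2018-07-04")

format(d, "%b")     # abbreviated month name, e.g. "Jul"
format(d, "%b %d")  # e.g. "Jul 04"
format(d, "%Y-%m")  # "2018-07"
```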

# format x-axis: dates
phenoPlot <- phenoPlot + 
  (scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")))

tempPlot_dayMax <- tempPlot_dayMax +
  (scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")))

# New plot. 
grid.arrange(phenoPlot, tempPlot_dayMax) 

Graphic showing the arranged plots created in the previous step, with the x-axis formatted to only read 'month' in both plots. However, it is important to note that this step only partially fixes the problem. The plots still have different ranges on the x-axis, which makes it harder to see trends. The top plot shows a bar plot of the counts of Liriodendrum tulipifera (LITU) individuals at the Smithsonian Conservation Biology Institute (SCBI) for the year 2018. The bottom plot shows a scatter plot of daily maximum temperatures(of 30 minute interval means) for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI).

But this only solves one of the problems: the plots still cover different date ranges on the x-axis, which makes it harder to compare trends.

Align data sets with different start dates

Now let's work to align the values on the x-axis. We can do this in two ways:

  1. setting the x-axis to have the same date range, or
  2. filtering the dataset itself to only include the overlapping data.

Depending on what you are trying to demonstrate, and whether you're doing additional analyses that need only the overlapping data, you may prefer one over the other. Let's try both.

Set range of x-axis

First, we can set the x-axis range for both plots by adding the limits parameter to the scale_x_date() function.

# first, let's recreate the full plot and add in the axis limits
phenoPlot_setX <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
    geom_bar(stat="identity", na.rm = TRUE) +
    ggtitle("Total Individuals in Leaf") +
    xlab("") + ylab("Number of Individuals") +
    scale_x_date(breaks = date_breaks("1 month"), 
                  labels = date_format("%b"),
                  limits = as.Date(c('2018-01-01','2018-12-31')))

# create second plot of interest
tempPlot_dayMax_setX <- ggplot(temp_day, aes(Date, dayMax)) +
    geom_point() +
    ggtitle("Daily Max Air Temperature") +
    xlab("Date") + ylab("Temp (C)") +
    scale_x_date(date_breaks = "1 month", 
                 labels=date_format("%b"),
                  limits = as.Date(c('2018-01-01','2018-12-31')))

# Plot
grid.arrange(phenoPlot_setX, tempPlot_dayMax_setX) 

Graphic showing the arranged plots created in the previous step, with the x-axis formatted to only read 'month', and scaled so they align with each other. This is achieved by adding the limits parameter to the scale_x_date function in the ggplot call. The top plot shows a bar plot of the counts of Liriodendrum tulipifera (LITU) individuals at the Smithsonian Conservation Biology Institute (SCBI) for the year 2018. The bottom plot shows a scatter plot of daily maximum temperatures(of 30 minute interval means) for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI).

Now we can really see the pattern over the full year. This emphasizes the point that during much of the late fall, winter, and early spring none of the trees have leaves on them (or that data were not collected - this plot would not distinguish between the two).

Subset one data set to match other

Alternatively, we can simply filter the dataset with the larger date range so that we only plot the data from the overlapping dates.

# filter to only having overlapping data
temp_day_filt <- filter(temp_day, Date >= min(phe_1sp_2018$dateStat) & 
                         Date <= max(phe_1sp_2018$dateStat))

# Check 
range(phe_1sp_2018$dateStat)

## [1] "2018-04-13" "2018-11-20"

range(temp_day_filt$Date)

## [1] "2018-04-13" "2018-11-20"

#plot again
tempPlot_dayMaxFiltered <- ggplot(temp_day_filt, aes(Date, dayMax)) +
    geom_point() +
    scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
    ggtitle("Daily Max Air Temperature") +
    xlab("Date") + ylab("Temp (C)")


grid.arrange(phenoPlot, tempPlot_dayMaxFiltered)

Graphic of the arranged plots created in the previous steps with only the data that overlap. This was achieved by filtering the daily max temperature data by the observation date in the total individuals in Leaf dataset. The top plot shows a bar plot of the counts of Liriodendrum tulipifera (LITU) individuals at the Smithsonian Conservation Biology Institute (SCBI) for the year 2018. The bottom plot shows a scatter plot of daily maximum temperatures(of 30 minute interval means) for the year 2018 at the Smithsonian Conservation Biology Institute (SCBI).

With this plot, we focus on the period where the two datasets overlap (though this does hide the temperature data that were collected outside that window).

Same plot with two Y-axes

What about layering these plots and having two y-axes (right and left) that have the different scale bars?

Some argue that you should not do this, as it can distort what is actually going on with the data, and the author of the ggplot2 package is one of these individuals. Therefore, you cannot use ggplot() to create a single plot with two independently scaled y-axes. You can read his own discussion of the topic on this StackOverflow post.
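
One sanctioned middle ground does exist in ggplot2 (version 2.2.0 and later): a secondary axis that is a fixed one-to-one transformation of the primary axis, via sec_axis(). The sketch below uses made-up counts and temperatures, not the NEON data:

```r
library(ggplot2)

# hypothetical daily values (not the SCBI data)
df <- data.frame(day   = 1:6,
                 tempC = c(2, 7, 14, 20, 18, 11),
                 nLeaf = c(0, 1, 6, 15, 17, 9))

# linear map from the count scale onto the temperature scale
k <- max(df$tempC) / max(df$nLeaf)

p <- ggplot(df, aes(day)) +
  geom_col(aes(y = nLeaf * k), alpha = 0.3) +          # counts, rescaled
  geom_point(aes(y = tempC), color = "red") +          # temperature
  scale_y_continuous(name = "Temp (C)",
                     sec.axis = sec_axis(~ . / k,      # undo the rescaling
                                         name = "Number of Individuals"))
```

Because the secondary axis is just a relabeling of the primary one, this avoids truly independent scales, which is exactly the distortion the ggplot2 author objects to.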


However, individuals have found workarounds for these plots. The code below is provided as a demonstration of this capability. Note that by showing this code here, we don't necessarily endorse having plots with two y-axes.

This code is adapted from code by Jake Heare.

# Source: http://heareresearch.blogspot.com/2014/10/10-30-2014-dual-y-axis-graph-ggplot2_30.html

# Additional packages needed
library(gtable)
library(grid)


# Plot 1: Pheno data as bars, temp as scatter
grid.newpage()
phenoPlot_2 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
  geom_bar(stat="identity", na.rm = TRUE) +
  scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b")) +
  ggtitle("Total Individuals in Leaf vs. Temp (C)") +
  xlab(" ") + ylab("Number of Individuals") +
  theme_bw()+
  theme(legend.justification=c(0,1),
        legend.position=c(0,1),
        plot.title=element_text(size=25,vjust=1),
        axis.text.x=element_text(size=20),
        axis.text.y=element_text(size=20),
        axis.title.x=element_text(size=20),
        axis.title.y=element_text(size=20))


tempPlot_dayMax_corr_2 <- ggplot() +
  geom_point(data = temp_day_filt, aes(Date, dayMax),color="red") +
  scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
  xlab("") + ylab("Temp (C)") +
  theme_bw() %+replace% 
  theme(panel.background = element_rect(fill = NA),
        panel.grid.major.x=element_blank(),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank(),
        axis.text.y=element_text(size=20,color="red"),
        axis.title.y=element_text(size=20))

g1<-ggplot_gtable(ggplot_build(phenoPlot_2))
g2<-ggplot_gtable(ggplot_build(tempPlot_dayMax_corr_2))

pp<-c(subset(g1$layout,name=="panel",se=t:r))
g<-gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="panel")]],pp$t,pp$l,pp$b,pp$l)

ia<-which(g2$layout$name=="axis-l")
ga <- g2$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g2$widths[g2$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)

grid.draw(g)

# Plot 2: Both pheno data and temp data as line graphs
grid.newpage()
phenoPlot_3 <- ggplot(phe_1sp_2018, aes(dateStat, countYes)) +
  geom_line(na.rm = TRUE) +
  scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
  ggtitle("Total Individuals in Leaf vs. Temp (C)") +
  xlab("Date") + ylab("Number of Individuals") +
  theme_bw()+
  theme(legend.justification=c(0,1),
        legend.position=c(0,1),
        plot.title=element_text(size=25,vjust=1),
        axis.text.x=element_text(size=20),
        axis.text.y=element_text(size=20),
        axis.title.x=element_text(size=20),
        axis.title.y=element_text(size=20))

tempPlot_dayMax_corr_3 <- ggplot() +
  geom_line(data = temp_day_filt, aes(Date, dayMax),color="red") +
  scale_x_date(breaks = date_breaks("months"), labels = date_format("%b")) +
  xlab("") + ylab("Temp (C)") +
  theme_bw() %+replace% 
  theme(panel.background = element_rect(fill = NA),
        panel.grid.major.x=element_blank(),
        panel.grid.minor.x=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank(),
        axis.text.y=element_text(size=20,color="red"),
        axis.title.y=element_text(size=20))

g1<-ggplot_gtable(ggplot_build(phenoPlot_3))
g2<-ggplot_gtable(ggplot_build(tempPlot_dayMax_corr_3))

pp<-c(subset(g1$layout,name=="panel",se=t:r))
g<-gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="panel")]],pp$t,pp$l,pp$b,pp$l)

ia<-which(g2$layout$name=="axis-l")
ga <- g2$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g2$widths[g2$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)

grid.draw(g)

Get Lesson Code

03-plot-discrete-continuous-data-pheno-temp.R

Download and work with NEON Aquatic Instrument Data

Authors: Bobby Hensley, Guy Litt, Megan Jones

Last Updated: Apr 8, 2021

This tutorial covers downloading NEON Aquatic Instrument System (AIS) data, using the neonUtilities R package, as well as basic instruction in beginning to explore and work with the downloaded data, including guidance in navigating data documentation, separating data using the horizontal location (HOR) variable, interpreting quality flags, and resampling time intervals.

The following material steps through the multiple considerations in interpreting NEON data, and ultimately achieves a data comparison between two different sensors at nearby locations that are published at different time intervals. This sort of data wrangling is useful for comparing different data streams, and/or preparing data into a consistent format for modeling.

Objectives

After completing this activity, you will be able to:

  • Download NEON AIS data using the neonUtilities package.
  • Understand downloaded data sets and load them into R for analyses.
  • Separate data collected at different sensor locations using the HOR variable.
  • Understand and interpret quality flags, including how to discover what non-standard quality flags mean.
  • Aggregate time series to higher intervals and impute (fill in) observations where absent.

Things You'll Need To Complete This Tutorial

To complete this tutorial you will need R (version >3.4) and, preferably, RStudio loaded on your computer.

Install R Packages

  • neonUtilities: Basic functions for accessing NEON data
  • ggplot2: Plotting functions
  • dplyr: Data manipulation functions
  • padr: Time-series data preparation functions

These packages are on CRAN and can be installed by install.packages().
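
The padr package mentioned above fills in missing timestamps in a time series; the base-R sketch below illustrates the same idea (toy timestamps, not NEON data), without requiring padr:

```r
# toy 30-minute series with a missing 01:00 observation
obs <- data.frame(
  startDateTime = as.POSIXct(c("2020-02-01 00:00", "2020-02-01 00:30",
                               "2020-02-01 01:30"), tz = "UTC"),
  value = c(1.0, 1.2, 1.5)
)

# build the full 30-minute grid and left-join the observations onto it
grid <- data.frame(startDateTime = seq(min(obs$startDateTime),
                                       max(obs$startDateTime), by = "30 min"))
filled <- merge(grid, obs, all.x = TRUE)  # NA marks the slot to impute

nrow(filled)              # 4 rows: the gap is now explicit
sum(is.na(filled$value))  # 1 missing value left to impute
```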

Additional Resources

  • GitHub repository for neonUtilities

Download Files and Load Directly to R: loadByProduct()

The most popular function in neonUtilities is loadByProduct(). This function downloads data from the NEON API, merges the site-by-month files, and loads the resulting data tables into the R environment, assigning each data type to the appropriate R class. This is a popular choice because it ensures you're always working with the most up-to-date data, and it ends with ready-to-use tables in R. However, if you use it in a workflow you run repeatedly, keep in mind it will re-download the data every time.

Before we get the NEON data, we need to install (if not already done) and load the neonUtilities R package, as well as other packages we will use in the analysis.

# Install neonUtilities package if you have not yet.
install.packages("neonUtilities")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("padr")


# Set global option to NOT convert all character variables to factors
options(stringsAsFactors=F)

# Load required packages
library(neonUtilities)
library(ggplot2)
library(dplyr)
library(padr)

The inputs to loadByProduct() control which data to download and how to manage the processing. The following are frequently used inputs:

  • dpID: the data product ID, e.g. DP1.20288.001
  • site: defaults to "all", meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g. c("MART","ARIK","BARC").
  • startdate and enddate: defaults to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g. 2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
  • package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as individual quality flag test results. Not every NEON data product has an expanded package; if the expanded package is requested but there isn't one, the basic package will be downloaded.
  • avg: defaults to "all", to download all data; or the number of minutes in the averaging interval. See example below; only applicable to IS data.
  • savepath: the file path you want to download to; defaults to the working directory.
  • check.size: T or F; should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set this to F.
  • token: this allows you to input your NEON API token to obtain faster downloads. Learn more about NEON API tokens in the Using an API Token when Accessing NEON Data with neonUtilities tutorial.

There are additional inputs you can learn about in the Use the neonUtilities R Package to Access NEON Data tutorial.

The dpID is the data product identifier of the data you want to download. It can be found on the Explore Data Products page.

It will be in the form DP#.#####.###. For this tutorial, we'll use some data products collected in NEON's Aquatic Instrument System:

  • DP1.20288.001: Water quality
  • DP1.20033.001: Nitrate in surface water
  • DP1.20016.001: Elevation of surface water
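
The 5-digit number in the middle of the dpID is what appears in the downloaded table names (e.g. readme_20288); a quick sketch of extracting it:

```r
dpID <- "DP1.20288.001"

# the middle chunk of the dot-separated ID is the 5-digit product number
prodNum <- strsplit(dpID, "\\.")[[1]][2]
prodNum  # "20288"
```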

Now it's time to consider the NEON field site of interest. If not specified, the default will download a data product from all sites. The following are the 4-letter site codes for NEON's 34 aquatic sites as of 2020:

ARIK = Arikaree River CO
BARC = Barco Lake FL
BIGC = Upper Big Creek CA
BLDE = Black Deer Creek WY
BLUE = Blue River OK
BLWA = Black Warrior River AL
CARI = Caribou Creek AK
COMO = Como Creek CO
CRAM = Crampton Lake WI
CUPE = Rio Cupeyes PR
FLNT = Flint River GA
GUIL = Rio Guilarte PR
HOPB = Lower Hop Brook MA
KING = Kings Creek KS
LECO = LeConte Creek TN
LEWI = Lewis Run VA
LIRO = Little Rock Lake WI
MART = Martha Creek WA
MAYF = Mayfield Creek AL
MCDI = McDiffett Creek KS
MCRA = McRae Creek OR
OKSR = Oksrukuyik Creek AK
POSE = Posey Creek VA
PRIN = Pringle Creek TX
PRLA = Prairie Lake ND
PRPO = Prairie Pothole ND
REDB = Red Butte Creek UT
SUGG = Suggs Lake FL
SYCA = Sycamore Creek AZ
TECR = Teakettle Creek CA
TOMB = Lower Tombigbee River AL
TOOK = Toolik Lake AK
WALK = Walker Branch TN
WLOU = West St Louis Creek CO

In this exercise, we want data from only one NEON field site, Pringle Creek, TX (PRIN), from February 2020.

Now let's download our data. If you are not using a NEON token to download your data, neonUtilities will ignore the token input. We set check.size = F so that the script runs smoothly, but remember that you should always check your download size first.

# download data of interest - Water Quality
waq <- loadByProduct(dpID="DP1.20288.001", site="PRIN", 
                     startdate="2020-02", enddate="2020-02", 
                     package="expanded", 
                     token = Sys.getenv("NEON_TOKEN"),
                     check.size = F)

Challenge: Download Other Related Data Products


Using what you've learned above, can you modify the code to download data for the following parameters?

  • Data Product DP1.20033.001: nitrate in surface water
  • Data Product DP1.20016.001: elevation of surface water
  • The expanded data tables
  • Dates matching the other data products you've downloaded

  1. What is the size of the downloaded data?
  2. Without downloading all the data, how can you tell the difference in size between the "expanded" and "basic" packages?

# download data of interest - Nitrate in Surface Water
nsw <-  loadByProduct(dpID="DP1.20033.001", site="PRIN", 
                      startdate="2020-02", enddate="2020-02", 
                      package="expanded", 
                      token = Sys.getenv("NEON_TOKEN"),
                      check.size = F)

# #1. 2.0 MiB
# #2. You can change check.size to TRUE (T) and compare the "basic" vs. "expanded"
# package types. The basic package is 37.0 KiB, and the expanded is 42.4 KiB. 


# download data of interest - Elevation of surface water
eos <- loadByProduct(dpID="DP1.20016.001", site="PRIN",
                     startdate="2020-02", enddate="2020-02",
                     package="expanded",
                     token = Sys.getenv("NEON_TOKEN"),
                     check.size = F)

Files Associated with Downloads

The data we've downloaded come as a named list of objects. To work with each of them, select them from the list using the $ operator.

# view all components of the list
names(waq)

## [1] "readme_20288"           "sensor_positions_20288" "variables_20288"       
## [4] "waq_instantaneous"

# View the dataFrame
View(waq$waq_instantaneous)

We can see that there are four objects in the downloaded water quality data: one dataframe of data (waq_instantaneous) and three metadata files.

If you'd like, you can use the $ operator to assign an item from the list to its own object. If you prefer to extract each table from the list and work with it as an independent object, which we will do, you can use the list2env() function.

# unlist the variables and add to the global environment
list2env(waq, .GlobalEnv)

## <environment: R_GlobalEnv>

So what exactly are these four files and why would you want to use them?

  • data file(s): There will always be one or more dataframes that include the primary data of the data product you downloaded. Multiple dataframes are available when there are related datatables for a single data product.
  • readme_xxxxx: The readme file, with the corresponding 5 digits from the data product number, provides you with important information relevant to the data product and the specific instance of downloading the data. Here you can find manual flagging notes for all sites, locations, and time periods.
  • sensor_positions_xxxxx: this file contains information about the coordinates of each sensor, relative to a reference location.
  • variables_xxxxx: this file contains all the variables found in the associated data table(s). This includes full definitions, units, and other important information.
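For example, the variables file lets you look up the units and definition of any column in the data table. A quick sketch, assuming the waq list downloaded above:

```r
# Look up the definition and units of the dissolvedOxygen column
vars <- waq$variables_20288
vars[vars$fieldName == "dissolvedOxygen", 
     c("table", "fieldName", "units", "description")]
```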

Let's do the same for the surface water nitrate and elevation of surface water data products:

list2env(nsw, .GlobalEnv)

## <environment: R_GlobalEnv>

list2env(eos, .GlobalEnv)

## <environment: R_GlobalEnv>

Note that a few more objects were added to the Global Environment, including:

  • NSW_15_minute
  • EOS_5_min
  • EOS_30_min

The 15_minute name indicates the time-averaging interval of a dataset. Other examples include 5_min and 30_min within the same data product, such as elevation of surface water (DP1.20016.001). If only one averaging interval interests you, you can specify it when downloading the data with neonUtilities::loadByProduct().
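For instance, loadByProduct() accepts a timeIndex argument for this purpose; a minimal sketch using the same site and dates as above (check ?loadByProduct in your installed neonUtilities version for details):

```r
# Download only the 30-minute elevation of surface water tables
eos_30only <- loadByProduct(dpID = "DP1.20016.001", site = "PRIN",
                            startdate = "2020-02", enddate = "2020-02",
                            package = "expanded",
                            timeIndex = "30",
                            check.size = F)
```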

Data from Different Sensor Locations (HOR)

NEON often collects the same type of data from sensors in different locations. These data are delivered together, but you will frequently want to plot the data separately or include data from only one sensor in your analysis. NEON uses the horizontalPosition variable in the data tables to describe which sensor a given record was collected from. The horizontalPosition (HOR) is always a three-digit number for AIS data. Non-shoreline HOR examples as of 2020 at AIS sites include:

  • 101: stream sensors located at the upstream station on a monopod mount
  • 111: stream sensors located at the upstream station on an overhead cable mount
  • 131: stream sensors located at the upstream station on a stand-alone pressure transducer mount
  • 102: stream sensors located at the downstream station on a monopod mount
  • 112: stream sensors located at the downstream station on an overhead cable mount
  • 132: stream sensors located at the downstream station on a stand-alone pressure transducer mount
  • 110: pressure transducers mounted to a staff gauge
  • 103: sensors mounted on buoys in lakes or rivers
  • 130 and 140: sensors mounted in the littoral zone of lakes

You'll frequently want to know which sensor locations are represented in your data. We can check this by looking at the unique() values in the horizontalPosition column.

# which sensor locations exist for water quality, DP1.20288.001?
print("Water quality horizontal positions:")

## [1] "Water quality horizontal positions:"

unique(waq_instantaneous$horizontalPosition)

## [1] "101" "102"

We can see that there are two water quality sensor positions at PRIN in February 2020. As the locations of sensors can change at sites over time (especially with aquatic sensors as AIS sites undergo redesigns) it is a good idea to check horizontal positions when you're adding in new locations or a new date range to your analyses.
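Beyond listing the unique positions, counting the records at each position can show whether one location has sparse coverage. A quick sketch using the same dataframe:

```r
# Count records at each sensor location
table(waq_instantaneous$horizontalPosition)
```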

Let's check the HOR locations for surface water nitrate and elevation too:

# which sensor locations exist for other data products?
print("Nitrate in Surface Water horizontal positions: ")

## [1] "Nitrate in Surface Water horizontal positions: "

unique(NSW_15_minute$horizontalPosition)

## [1] "102"

print("Elevation of Surface Water horizontal positions: ")

## [1] "Elevation of Surface Water horizontal positions: "

unique(EOS_30_min$horizontalPosition)

## [1] "110" "132"

Now we can use this information to split the water quality data by sensor set location: upstream and downstream.

# Split data into separate dataframes by upstream/downstream locations.

waq_up <- 
  waq_instantaneous[(waq_instantaneous$horizontalPosition=="101"),]
waq_down <- 
  waq_instantaneous[(waq_instantaneous$horizontalPosition=="102"),]

# Note: The surface water nitrate sensor is only stationed at one location.

eos_up <- EOS_30_min[(EOS_30_min$horizontalPosition=="110"),]
eos_down <- EOS_30_min[(EOS_30_min$horizontalPosition=="132"),]

Plot Data

Now that we have the data separated into upstream and downstream sets, let's plot both together. We want to create a plot of the dissolved oxygen measurements from the two different sensors.

First, let's identify the column names important for plotting - time and dissolved oxygen data:

# One option is to view column names in the data frame
colnames(waq_instantaneous)

##   [1] "domainID"                        "siteID"                         
##   [3] "horizontalPosition"              "verticalPosition"               
##   [5] "startDateTime"                   "endDateTime"                    
##   [7] "sensorDepth"                     "sensorDepthExpUncert"           
##   [9] "sensorDepthRangeQF"              "sensorDepthNullQF"              
##  [11] "sensorDepthGapQF"                "sensorDepthValidCalQF"          
##  [13] "sensorDepthSuspectCalQF"         "sensorDepthPersistQF"           
##  [15] "sensorDepthAlphaQF"              "sensorDepthBetaQF"              
##  [17] "sensorDepthFinalQF"              "sensorDepthFinalQFSciRvw"       
##  [19] "specificConductance"             "specificConductanceExpUncert"   
##  [21] "specificConductanceRangeQF"      "specificConductanceStepQF"      
##  [23] "specificConductanceNullQF"       "specificConductanceGapQF"       
##  [25] "specificConductanceSpikeQF"      "specificConductanceValidCalQF"  
##  [27] "specificCondSuspectCalQF"        "specificConductancePersistQF"   
##  [29] "specificConductanceAlphaQF"      "specificConductanceBetaQF"      
##  [31] "specificCondFinalQF"             "specificCondFinalQFSciRvw"      
##  [33] "dissolvedOxygen"                 "dissolvedOxygenExpUncert"       
##  [35] "dissolvedOxygenRangeQF"          "dissolvedOxygenStepQF"          
##  [37] "dissolvedOxygenNullQF"           "dissolvedOxygenGapQF"           
##  [39] "dissolvedOxygenSpikeQF"          "dissolvedOxygenValidCalQF"      
##  [41] "dissolvedOxygenSuspectCalQF"     "dissolvedOxygenPersistenceQF"   
##  [43] "dissolvedOxygenAlphaQF"          "dissolvedOxygenBetaQF"          
##  [45] "dissolvedOxygenFinalQF"          "dissolvedOxygenFinalQFSciRvw"   
##  [47] "dissolvedOxygenSaturation"       "dissolvedOxygenSatExpUncert"    
##  [49] "dissolvedOxygenSatRangeQF"       "dissolvedOxygenSatStepQF"       
##  [51] "dissolvedOxygenSatNullQF"        "dissolvedOxygenSatGapQF"        
##  [53] "dissolvedOxygenSatSpikeQF"       "dissolvedOxygenSatValidCalQF"   
##  [55] "dissOxygenSatSuspectCalQF"       "dissolvedOxygenSatPersistQF"    
##  [57] "dissolvedOxygenSatAlphaQF"       "dissolvedOxygenSatBetaQF"       
##  [59] "dissolvedOxygenSatFinalQF"       "dissolvedOxygenSatFinalQFSciRvw"
##  [61] "pH"                              "pHExpUncert"                    
##  [63] "pHRangeQF"                       "pHStepQF"                       
##  [65] "pHNullQF"                        "pHGapQF"                        
##  [67] "pHSpikeQF"                       "pHValidCalQF"                   
##  [69] "pHSuspectCalQF"                  "pHPersistenceQF"                
##  [71] "pHAlphaQF"                       "pHBetaQF"                       
##  [73] "pHFinalQF"                       "pHFinalQFSciRvw"                
##  [75] "chlorophyll"                     "chlorophyllExpUncert"           
##  [77] "chlorophyllRangeQF"              "chlorophyllStepQF"              
##  [79] "chlorophyllNullQF"               "chlorophyllGapQF"               
##  [81] "chlorophyllSpikeQF"              "chlorophyllValidCalQF"          
##  [83] "chlorophyllSuspectCalQF"         "chlorophyllPersistenceQF"       
##  [85] "chlorophyllAlphaQF"              "chlorophyllBetaQF"              
##  [87] "chlorophyllFinalQF"              "chlorophyllFinalQFSciRvw"       
##  [89] "turbidity"                       "turbidityExpUncert"             
##  [91] "turbidityRangeQF"                "turbidityStepQF"                
##  [93] "turbidityNullQF"                 "turbidityGapQF"                 
##  [95] "turbiditySpikeQF"                "turbidityValidCalQF"            
##  [97] "turbiditySuspectCalQF"           "turbidityPersistenceQF"         
##  [99] "turbidityAlphaQF"                "turbidityBetaQF"                
## [101] "turbidityFinalQF"                "turbidityFinalQFSciRvw"         
## [103] "fDOM"                            "rawCalibratedfDOM"              
## [105] "fDOMExpUncert"                   "fDOMRangeQF"                    
## [107] "fDOMStepQF"                      "fDOMNullQF"                     
## [109] "fDOMGapQF"                       "fDOMSpikeQF"                    
## [111] "fDOMValidCalQF"                  "fDOMSuspectCalQF"               
## [113] "fDOMPersistenceQF"               "fDOMAlphaQF"                    
## [115] "fDOMBetaQF"                      "fDOMTempQF"                     
## [117] "fDOMAbsQF"                       "fDOMFinalQF"                    
## [119] "fDOMFinalQFSciRvw"               "buoyNAFlag"                     
## [121] "spectrumCount"                   "publicationDate"                
## [123] "release"

# Alternatively, view the variables object corresponding to the data product for more information
View(variables_20288)

Quite a few columns in the water quality data product!

The time column we'll consider for instrumented systems is endDateTime because it approximately represents data within the interval on or before the endDateTime time stamp. Timestamp column choice matters for time-aggregated datasets, but should not matter for instantaneous data such as water quality.

When interpreting data, keep in mind NEON timestamps are always in UTC.
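If you want axis labels or summaries in local time, you can format a copy of the UTC timestamps with a time zone name; a minimal sketch (PRIN is in Texas, so America/Chicago is assumed here):

```r
# NEON timestamps are UTC; display the first timestamp in US Central time
format(waq_instantaneous$endDateTime[1], tz = "America/Chicago", usetz = TRUE)
```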

The data column we would like to plot is labeled dissolvedOxygen.

# plot
wqual <- ggplot() +
	geom_line(data = waq_up, 
	          aes(endDateTime, dissolvedOxygen,color="a"), 
	          na.rm=TRUE ) +
	geom_line(data = waq_down, 
	          aes(endDateTime, dissolvedOxygen, color="b"), 
	          na.rm=TRUE) +
	ylim(0, 20) + ylab("Dissolved Oxygen (mg/L)") +
	xlab(" ") +
  scale_color_manual(values = c("blue","red"),
                     labels = c("upstream","downstream")) +
  labs(colour = "") + # Remove legend title
  theme(legend.position = "top") +
  ggtitle("PRIN Upstream and Downstream DO")
  
  

wqual

Line plot of dissolved oxygen in mg/L measured at the upstream (blue) and downstream (red) stations of the Pringle Creek site.

Now let's try plotting fDOM. fDOM is only measured at the downstream location. NEON also provides uncertainty values for each measurement. Let's also consider measurement uncertainty in the plot.

The data columns we would like to plot are labeled fDOM and fDOMExpUncert.

# plot
fdomUcert <- ggplot() +
	geom_line(data = waq_down, 
	          aes(endDateTime, fDOM), 
	          na.rm=TRUE, color="orange") +
  geom_ribbon(data=waq_down, 
              aes(x=endDateTime, 
                  ymin = (fDOM - fDOMExpUncert), 
                  ymax = (fDOM + fDOMExpUncert)), 
              alpha = 0.4, fill = "grey75") +
	ylim(0, 200) + ylab("fDOM (QSU)") +
	xlab(" ") +
  ggtitle("PRIN Downstream fDOM with Expected Uncertainty Bounds") 

fdomUcert

Line plot of fDOM (QSU) with expected uncertainty from the downstream station of Pringle Creek.

### Challenge: Plot Nitrate in Surface Water Data

Using what you've learned above, identify horizontal positions and column names for nitrate in surface water.

# recall dataframes created in list2env() command, including NSW_15_minute

# which sensor locations?
unique(NSW_15_minute$horizontalPosition)

# what is the column name of the data stream of interest?
names(NSW_15_minute)

Using what you've learned above, plot nitrate in surface water.

# plot
plot_NSW <- ggplot(data = NSW_15_minute,
                   aes(endDateTime, surfWaterNitrateMean)) +
                   geom_line(na.rm=TRUE, color="blue") + 
                   ylab("NO3-N (uM)") + xlab(" ") +
                   ggtitle("PRIN Downstream Nitrate in Surface Water")

plot_NSW

Nitrate in surface water (uM) from the downstream station of Pringle Creek. Note the missing data from February 18 through February 24.

Examine Quality Flagged Data

Data product quality flags fall under two distinct types:

  • Automated quality flags, e.g. range, spike, step, null
  • Manual science review quality flag

In instantaneous data such as water quality DP1.20288.001, the quality flag columns are denoted with "QF".

In time-averaged data, most quality flags have been aggregated into quality metrics, with column names denoted with "QM" representing the fraction of flagged points within the time averaging window.
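For example, the 30-minute elevation of surface water table downloaded earlier carries QM columns rather than QF columns. A quick sketch, assuming EOS_30_min is in your environment:

```r
# List the quality metric (QM) columns in the time-averaged table
names(EOS_30_min)[grep("QM", names(EOS_30_min))]
```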

waq_qf_names <- names(waq_down)[grep("QF", names(waq_down))]

print(paste0("Total columns in DP1.20288.001 expanded package = ", 
             as.character(length(waq_qf_names))))

## [1] "Total columns in DP1.20288.001 expanded package = 96"

# water quality has 96 data columns with QF in the name, 
# so let's just look at those corresponding to fDOM
print("fDOM columns in DP1.20288.001 expanded package:")

## [1] "fDOM columns in DP1.20288.001 expanded package:"

print(waq_qf_names[grep("fDOM", waq_qf_names)])

##  [1] "fDOMRangeQF"       "fDOMStepQF"        "fDOMNullQF"        "fDOMGapQF"        
##  [5] "fDOMSpikeQF"       "fDOMValidCalQF"    "fDOMSuspectCalQF"  "fDOMPersistenceQF"
##  [9] "fDOMAlphaQF"       "fDOMBetaQF"        "fDOMTempQF"        "fDOMAbsQF"        
## [13] "fDOMFinalQF"       "fDOMFinalQFSciRvw"

A quality flag (QF) of 0 indicates a pass, 1 indicates a fail, and -1 indicates a test that could not be performed. For example, a range test cannot be performed on missing measurements.
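Given these codes, a quick way to gauge overall quality for one measurement stream is the fraction of records with the final flag raised. A sketch using the downstream water quality data from above:

```r
# Fraction of dissolved oxygen records failing the final quality flag
mean(waq_down$dissolvedOxygenFinalQF == 1, na.rm = TRUE)
```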

Detailed quality flags test results are all available in the package = 'expanded' setting we specified when calling neonUtilities::loadByProduct(). If we had specified package = 'basic', we wouldn't be able to investigate the detail in the type of data flag thrown. We would only see the FinalQF columns.

The AlphaQF and BetaQF represent aggregated results of the various QF tests, and vary by a data product's algorithm. In most cases, an observation's AlphaQF = 1 indicates that at least one QF was set to a value of 1, and an observation's BetaQF = 1 indicates that at least one QF was set to a value of -1.

Note that fDOM has a couple other data-stream specific QFs beyond the standard quality flags. These are specific to the algorithms used to correct raw fDOM readings using temperature and absorbance per Watras et al. (2011) and Downing et al. (2012).

Let's consider what types of fDOM quality flags were thrown.

waq_qf_names <- names(waq_down)[grep("QF", names(waq_down))]

print(paste0("Total QF columns: ",length(waq_qf_names)))

## [1] "Total QF columns: 96"

# water quality has 96 data columns with QF in the name, 
# so let us just look at those corresponding to fDOM
fdom_qf_names <- waq_qf_names[grep("fDOM",waq_qf_names)]

for(col_nam in fdom_qf_names){
  print(paste0(col_nam, " unique values: ", 
               paste0(unique(waq_down[,col_nam]), 
                      collapse = ", ")))
}

## [1] "fDOMRangeQF unique values: 0, -1"
## [1] "fDOMStepQF unique values: 0, 1, -1"
## [1] "fDOMNullQF unique values: 0, 1"
## [1] "fDOMGapQF unique values: 0, 1"
## [1] "fDOMSpikeQF unique values: 0, -1, 1"
## [1] "fDOMValidCalQF unique values: 0"
## [1] "fDOMSuspectCalQF unique values: 0"
## [1] "fDOMPersistenceQF unique values: 0"
## [1] "fDOMAlphaQF unique values: 0, 1"
## [1] "fDOMBetaQF unique values: 0, 1"
## [1] "fDOMTempQF unique values: 0, 1, -1"
## [1] "fDOMAbsQF unique values: 0, -1, 1, 2"
## [1] "fDOMFinalQF unique values: 0, 1"
## [1] "fDOMFinalQFSciRvw unique values: NA"

QF values generally mean the following:

  • 0: Quality test passed
  • 1: Quality test failed
  • -1: Quality test could not be run
  • 2: A special case for fDOMAbsQF

So what does fDOMAbsQF = 2 mean? The data product's variable descriptions may provide us some clues.

Recall we previously viewed the water quality variables object that comes with every NEON data download. Now let's print the description corresponding to the fDOMAbsQF field name.

print(variables_20288$description[which(variables_20288$fieldName == "fDOMAbsQF")])

## [1] "Quality flag indicating that fDOM absorbance corrections were applied = 0; unable to be applied = 1; absorbance values were high = 2; calculated correction factor was 1 (i.e. no absorbance correction was made) = 3"

So whenever fDOMAbsQF = 2, the absorbance values coming from the SUNA (surface water nitrate sensor) were high.

Now let's consider the total number of flags generated for each quality test:

# Loop across the fDOM QF column names. 
#  Within each column, count the number of rows that equal '1'.
print("FLAG TEST - COUNT")

## [1] "FLAG TEST - COUNT"

for (col_nam in fdom_qf_names){
  totl_qf_in_col <- length(which(waq_down[,col_nam] == 1))
  print(paste0(col_nam,": ",totl_qf_in_col))
}

## [1] "fDOMRangeQF: 0"
## [1] "fDOMStepQF: 770"
## [1] "fDOMNullQF: 233"
## [1] "fDOMGapQF: 218"
## [1] "fDOMSpikeQF: 71"
## [1] "fDOMValidCalQF: 0"
## [1] "fDOMSuspectCalQF: 0"
## [1] "fDOMPersistenceQF: 0"
## [1] "fDOMAlphaQF: 9997"
## [1] "fDOMBetaQF: 238"
## [1] "fDOMTempQF: 9"
## [1] "fDOMAbsQF: 9016"
## [1] "fDOMFinalQF: 9997"
## [1] "fDOMFinalQFSciRvw: 0"

# Let's also check out how many fDOMAbsQF = 2 exist
print(paste0("fDOMAbsQF = 2: ",
             length(which(waq_down[,"fDOMAbsQF"] == 2))))

## [1] "fDOMAbsQF = 2: 210"

print(paste0("Total fDOM observations: ", nrow(waq_down) ))

## [1] "Total fDOM observations: 41769"

The output above lists the total fDOM QF counts from a month of data at PRIN, as well as the total number of observations in the data file.

We see a notably higher quantity of fDOMAbsQF relative to other quality flags. Why is that? How do we know where to look?

The variables_20288 table included in the download is a good place to start. Let's check the description for fDOMAbsQF again.

print(variables_20288[which(variables_20288$fieldName == "fDOMAbsQF"),])

##                table fieldName
## 1: waq_instantaneous fDOMAbsQF
##                                                                                                                                                                                                              description
## 1: Quality flag indicating that fDOM absorbance corrections were applied = 0; unable to be applied = 1; absorbance values were high = 2; calculated correction factor was 1 (i.e. no absorbance correction was made) = 3
##          dataType units downloadPkg pubFormat primaryKey categoricalCodeName
## 1: signed integer  <NA>    expanded   integer       <NA>                  NA

So fDOMAbsQF = 1 means fDOM absorbance corrections were unable to be applied.

For specific details on the algorithms used to create a data product and its corresponding quality tests, it's best to first check the data product's Algorithm Theoretical Basis Document (ATBD). For water quality, that is NEON.DOC.004931, listed under Documentation references in the README file and on the data product's web page.

Are there any manual science review quality flags? If so, the explanation for flagging may also be viewed in the data product's README file or in the data product's web page on NEON's data portal.

Filtering (Some) Quality Flagged Observations

A simple approach to removing quality flagged observations is to remove data when the finalQF is raised. Let's view a plotting example using fDOM:

# Map QF label names for the fDOMFinalQF grouping in the plot
group_labels <- c("fDOMFinalQF = 0", "fDOMFinalQF = 1")
names(group_labels) <- c("0","1")

# Plot fDOM data, grouping by the fDOMFinalQF value
ggplot2::ggplot(data = waq_down, 
                aes(x = endDateTime, y = fDOM, group = fDOMFinalQF)) +
  ggplot2::geom_step() +
  facet_grid(fDOMFinalQF ~ ., 
             labeller = labeller(fDOMFinalQF = group_labels)) +
  ggplot2::ggtitle("PRIN Sensor Set 102 fDOM final QF comparison")

Line plots of fDOM data that received a quality flag of zero (top) and data that received a quality flag of one (bottom). Note how the bottom plot has many spikes in the data, which were appropriately given a flag value of one.

The top panel, corresponding to fDOMFinalQF = 0, represents all fDOM data that were not flagged. Conversely, fDOMFinalQF = 1 represents all flagged fDOM data. Clearly, many spikes look like they were appropriately flagged. However, some flagged data look like they could be useful, such as the February 18-24, 2020 time range.

Let's inspect the quality flags during that time.

# Find row indices around February 22:
idxs_Feb22 <- base::which(waq_down$endDateTime > as.POSIXct("2020-02-22"))[1:1440]

print("FLAG TEST - COUNT")

## [1] "FLAG TEST - COUNT"

for (col_nam in fdom_qf_names){
  totl_qf_in_col <- length(which(waq_down[idxs_Feb22,col_nam] == 1))
  print(paste0(col_nam,": ",totl_qf_in_col))
}

## [1] "fDOMRangeQF: 0"
## [1] "fDOMStepQF: 8"
## [1] "fDOMNullQF: 0"
## [1] "fDOMGapQF: 0"
## [1] "fDOMSpikeQF: 0"
## [1] "fDOMValidCalQF: 0"
## [1] "fDOMSuspectCalQF: 0"
## [1] "fDOMPersistenceQF: 0"
## [1] "fDOMAlphaQF: 1440"
## [1] "fDOMBetaQF: 0"
## [1] "fDOMTempQF: 0"
## [1] "fDOMAbsQF: 1440"
## [1] "fDOMFinalQF: 1440"
## [1] "fDOMFinalQFSciRvw: 0"

Looks like all Feb 22, 2020 data were flagged with fDOMAbsQF, with a few step test quality flags as well.

Let's take a closer look at each fDOMAbsQF flag value by grouping data based on each fDOMAbsQF value:

ggplot2::ggplot(data = waq_down, 
                aes(x = endDateTime, y = fDOM, group = fDOMAbsQF)) +
  ggplot2::geom_step() +
  facet_grid(fDOMAbsQF ~ .) +
  ggplot2::ggtitle("PRIN Sensor Set 102 fDOMAbsQF comparison")

## Warning: Removed 233 row(s) containing missing values (geom_path).

Line plots of fDOM absorbance quality flag (fDOMAbsQF) values that received a quality flag of -1, 0, 1, and 2. Note how the panel for values flagged as 1 corresponds to the same time frame as the missing data in the surface water nitrate plot generated earlier.

fDOMAbsQF = 1 is the most common quality flag from any single test. This means the absorbance correction could not be applied to the fDOM data. This absorbance test also causes the final quality flag test to fail, but some users may wish to ignore the absorbance quality test entirely.

Note the fDOMAbsQF = 1 time frame corresponds to the missing surface water nitrate data, as shown in the surface water nitrate plot we generated earlier. Here is a reminder of our nitrate data:

plot_NSW

Nitrate in surface water (uM) from the downstream station of Pringle Creek. Note the missing data from February 18 through February 24.

Some types of automated quality flags may be worth ignoring. Rather than use the FinalQF column to omit any flagged data, let's create a custom final quality flag that ignores the fDOMAbsQF column, allowing us to omit quality-flagged fDOM data regardless of absorbance correction status.

# Remove the absorbance and aggregated quality flag tests from list of fDOM QF tests:
fdom_qf_non_abs_names <- fdom_qf_names[which(!fdom_qf_names %in% c("fDOMAlphaQF","fDOMBetaQF","fDOMAbsQF","fDOMFinalQF"))]

# Create a custom quality flag column as the maximum QF value within each row
waq_down$aggr_non_abs_QF <- apply(waq_down[,fdom_qf_non_abs_names], 1, max, na.rm = TRUE)
# The 'apply' function above lets us avoid a for-loop and iterate 
#  over the rows more efficiently.

# Plot fDOM data, grouping by the custom quality flag column's value
ggplot2::ggplot(data = waq_down, 
                aes(x = endDateTime, y = fDOM, 
                    group = aggr_non_abs_QF)) +
  ggplot2::geom_step() +
  facet_grid(aggr_non_abs_QF ~ .) +
  ggplot2::ggtitle("PRIN Sensor Set 102 fDOM custom QF aggregation")

Line plots of fDOM data where a custom quality flag has been generated by omitting the fDOMAbsQF. Note the increase in available data using the custom quality flag aggregation that ignored fDOMAbsQF.

Using the custom quality flag aggregation that ignored fDOMAbsQF, the aggregated aggr_non_abs_QF column we created increases the quantity of data that could be used for further analyses.

Note that the automated quality flag algorithms are not perfect, and a few suspect data points may occasionally pass the quality tests.

Data Aggregation

Sensor data users commonly wish to aggregate data such that time stamps match across two different datasets. In the following example, we will show how to combine elevation of surface water (DP1.20016.001) and water quality (DP1.20288.001) data products into a single dataframe.

Water quality is published as an instantaneous record, which should be every minute at non-buoy sites such as PRIN. We know a data product does not come from the buoy if the HOR location is different from "103". Because elevation of surface water is already aggregated to 30-minute intervals, we want to aggregate the water quality data product to 30-minute intervals as well.
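You can verify the nominal measurement interval directly from the timestamps before aggregating; a quick sketch, assuming the waq_down dataframe created earlier:

```r
# Typical spacing between consecutive water quality timestamps
# (expected to be about 1 minute at a non-buoy site like PRIN)
median(diff(waq_down$endDateTime))
```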

At PRIN in February 2020, the elevation of surface water sensor is co-located with the water quality sonde at horizontalPosition = "102", meaning the downstream sensor set. In this lesson, let's ignore the upstream data at HOR 101 and just aggregate water quality's downstream data from HOR 102.

Data can easily be aggregated in different forms, such as the mean, min, max, and sum. In the following code chunk, we'll aggregate the data values to 30-minute intervals as a mean, and aggregate the finalQF values the same way, yielding a value between 0 and 1 that represents the fraction of flagged points. More complex functions may be needed for aggregating other types of data, such as measurement uncertainty or special, non-binary quality flags like fDOMAbsQF.

# Recall we already created the downstream object for water quality, waq_down

# We first need to name each data stream within water quality. 
# One trick is to find all the variable names by searching for "BetaQF"
waq_strm_betaqf_cols <- names(waq_down)[grep("BetaQF",names(waq_down))]
print(paste0("BetaQF column names: ",
             paste0(waq_strm_betaqf_cols, collapse = ", ")))

## [1] "BetaQF column names: sensorDepthBetaQF, specificConductanceBetaQF, dissolvedOxygenBetaQF, dissolvedOxygenSatBetaQF, pHBetaQF, chlorophyllBetaQF, turbidityBetaQF, fDOMBetaQF"

# Now let's remove the BetaQF from the column name:
waq_strm_cols <- base::gsub("BetaQF","",waq_strm_betaqf_cols)
# To keep column names short, some variable names had to be shortened
# when appending "BetaQF", so let's add "uration" to "dissolvedOxygenSat"
waq_strm_cols <- base::gsub("dissolvedOxygenSat",
                            "dissolvedOxygenSaturation",waq_strm_cols)
print(paste0("Water quality sensor data stream names: ", 
             paste0(waq_strm_cols, collapse = ", ")))

## [1] "Water quality sensor data stream names: sensorDepth, specificConductance, dissolvedOxygen, dissolvedOxygenSaturation, pH, chlorophyll, turbidity, fDOM"

# We will also aggregate the final quality flags:
waq_final_qf_cols <- names(waq_down)[grep("FinalQF",names(waq_down))]

# Let's check to make sure our time column is in POSIXct format, which is 
# needed if you download and read-in NEON data files without using the 
# neonUtilities package.
if("POSIXct" %in% class(waq_down$endDateTime)){
  print("Time column in waq_down is appropriately in POSIXct format")
} else {
  print("Converting waq_down endDateTime column to POSIXct")
  waq_down$endDateTime <- as.POSIXct(waq_down$endDateTime, tz = "UTC")
}

## [1] "Time column in waq_down is appropriately in POSIXct format"

Now that we have the column names of the data and quality flags we wish to aggregate, we can move on to the aggregation itself! We're going to use some more advanced features from the dplyr and padr packages. Instead of looping over each column, let's employ the dplyr pipe operator, %>%, and call a function that acts on each data column of interest, which we've determined above.

# Aggregate water quality data columns to 30 minute intervals, 
# taking the mean of non-NA values within each 30-minute period. 
# We explain each step in the dplyr piping operation in code 
# comments:

waq_30min_down <- waq_down %>% 
              # pass the downstream data frame to the next function
              # padr's thicken function adds a new column, roundedTime, 
              # that gives each observation's timestamp rounded down 
              # to the nearest 30-minute interval
  
              padr::thicken(interval = "30 min",
                            by = "endDateTime",
                            colname = "roundedTime",
                            rounding = "down") %>%
              # In 1-min data, there should now be sets of 30 
              # corresponding to each 30-minute roundedTime
              # We use dplyr to group data by unique roundedTime 
              # values, and summarise each 30-min group
              # by the mean, for all data columns provided 
              # in waq_strm_cols and waq_final_qf_cols
  
              dplyr::group_by(roundedTime) %>% 
                dplyr::summarise_at(vars(dplyr::all_of(c(waq_strm_cols, 
                                                  waq_final_qf_cols))), 
                                    mean, na.rm = TRUE)

# Rather than binary values, quality flags are more like "quality 
# metrics", defining the fraction of data flagged within an 
# aggregation interval.

We now have a new dataframe of water quality data and associated final quality flags aggregated to 30-minute intervals. The downstream water quality data can now be easily combined with the nearby, albeit not co-located, 30-minute averaged downstream elevation of surface water data.

The following code chunk merges the data:

# We have to specify the matching column from each dataframe
all_30min_data_down <- base::merge(x = waq_30min_down, 
                                   y = eos_down, 
                                   by.x = "roundedTime", 
                                   by.y = "endDateTime")

# Let's take a peek at the combined data frame's column names:
colnames(all_30min_data_down)

##  [1] "roundedTime"                     "sensorDepth"                    
##  [3] "specificConductance"             "dissolvedOxygen"                
##  [5] "dissolvedOxygenSaturation"       "pH"                             
##  [7] "chlorophyll"                     "turbidity"                      
##  [9] "fDOM"                            "sensorDepthFinalQF"             
## [11] "sensorDepthFinalQFSciRvw"        "specificCondFinalQF"            
## [13] "specificCondFinalQFSciRvw"       "dissolvedOxygenFinalQF"         
## [15] "dissolvedOxygenFinalQFSciRvw"    "dissolvedOxygenSatFinalQF"      
## [17] "dissolvedOxygenSatFinalQFSciRvw" "pHFinalQF"                      
## [19] "pHFinalQFSciRvw"                 "chlorophyllFinalQF"             
## [21] "chlorophyllFinalQFSciRvw"        "turbidityFinalQF"               
## [23] "turbidityFinalQFSciRvw"          "fDOMFinalQF"                    
## [25] "fDOMFinalQFSciRvw"               "domainID"                       
## [27] "siteID"                          "horizontalPosition"             
## [29] "verticalPosition"                "startDateTime"                  
## [31] "surfacewaterElevMean"            "surfacewaterElevMinimum"        
## [33] "surfacewaterElevMaximum"         "surfacewaterElevVariance"       
## [35] "surfacewaterElevNumPts"          "surfacewaterElevExpUncert"      
## [37] "surfacewaterElevStdErMean"       "sWatElevRangeFailQM"            
## [39] "sWatElevRangePassQM"             "sWatElevRangeNAQM"              
## [41] "sWatElevPersistenceFailQM"       "sWatElevPersistencePassQM"      
## [43] "sWatElevPersistenceNAQM"         "sWatElevStepFailQM"             
## [45] "sWatElevStepPassQM"              "sWatElevStepNAQM"               
## [47] "sWatElevNullFailQM"              "sWatElevNullPassQM"             
## [49] "sWatElevNullNAQM"                "sWatElevGapFailQM"              
## [51] "sWatElevGapPassQM"               "sWatElevGapNAQM"                
## [53] "sWatElevSpikeFailQM"             "sWatElevSpikePassQM"            
## [55] "sWatElevSpikeNAQM"               "validCalFailQM"                 
## [57] "validCalPassQM"                  "validCalNAQM"                   
## [59] "sWatElevAlphaQM"                 "sWatElevBetaQM"                 
## [61] "sWatElevFinalQF"                 "sWatElevFinalQFSciRvw"          
## [63] "publicationDate"                 "release"

We now have matching time stamps for water quality and any other 30-minute
averaged data product, such as elevation of surface water. The merged data frame facilitates direct comparison across different sensors.
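Note that base::merge() with `by` columns performs an inner join by default (all = FALSE), so timestamps present in only one of the two tables are dropped. A dplyr sketch of the same operation:

```r
library(dplyr)

# inner_join() keeps only rows whose rounded timestamp appears
# in both the water quality and surface water elevation tables
all_30min_data_down <- waq_30min_down %>%
  inner_join(eos_down, by = c("roundedTime" = "endDateTime"))
```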

Let's take a look with a plot of specific conductance versus water surface elevation:

ggplot(data = all_30min_data_down, 
       aes(x = surfacewaterElevMean, y = specificConductance)) +
  geom_point() + 
  ggtitle("PRIN specific conductance vs. surface water elevation") + 
  xlab("Elevation [m ASL]") + 
  ylab("Specific conductance [uS/cm]")

## Warning: Removed 5 rows containing missing values (geom_point).

Scatter plot of specific conductance (uS/cm) and elevation (m) from Pringle Creek. Specific conductance (uS/cm) is on the Y-axis and elevation (m) on the X-axis. A new data set of 30 minute aggregated water quality data was generated to match the measurement interval of surface water elevation.

Aggregating high frequency time series data is a useful tool for understanding relationships between variables collected at different time intervals, and may also be a required format for model inputs.

Now that you have the basic tools and knowledge on how to read and wrangle NEON AIS data, go have fun working on your scientific questions!

Citations

Watras, C. J., Hanson, P. C., Stacy, T. L., Morrison, K. M., Mather, J., Hu, Y. H., & Milewski, P. (2011). A temperature compensation method for CDOM fluorescence sensors in freshwater. Limnology and Oceanography: Methods, 9(7), 296-301.

Downing, B. D., Pellerin, B. A., Bergamaschi, B. A., Saraceno, J. F., & Kraus, T. E. (2012). Seeing the light: The effects of particles, dissolved materials, and temperature on in situ measurements of DOM fluorescence in rivers and streams. Limnology and Oceanography: Methods, 10(10), 767-775.

Get Lesson Code

download-NEON-AIS-data.R

Explore and work with NEON biodiversity data from aquatic ecosystems

Authors: Eric R. Sokol

Last Updated: May 5, 2022

Learning Objectives

After completing this tutorial you will be able to:

  • Download NEON macroinvertebrate data.
  • Organize those data into long and wide tables.
  • Calculate alpha, beta, and gamma diversity following Jost (2007).

Things You’ll Need To Complete This Tutorial

R Programming Language

You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.

R Packages to Install

Prior to starting the tutorial ensure that the following packages are installed.

  • tidyverse: install.packages("tidyverse")
  • neonUtilities: install.packages("neonUtilities")
  • vegan: install.packages("vegan")

More on Packages in R – Adapted from Software Carpentry.

Introduction

Biodiversity is a popular topic within ecology, but quantifying and describing biodiversity precisely can be elusive. In this tutorial, we will describe many of the aspects of biodiversity using NEON's Macroinvertebrate Collection data.

Load Libraries and Prepare Workspace

First, we will load all necessary libraries into our R environment. If you have not already installed these libraries, please see the 'R Packages to Install' section above.

There are also two optional sections in this code chunk: clearing your environment, and loading your NEON API token. Clearing out your environment will erase all of the variables and data that are currently loaded in your R session. This is a good practice for many reasons, but only do this if you are completely sure that you won't be losing any important information! Secondly, your NEON API token will allow you increased download speeds, and it helps NEON anonymously track data usage statistics, which in turn helps us optimize our data delivery platforms and informs our monthly and annual reporting to our funding agency, the National Science Foundation. Please consider signing up for a NEON data user account and using your token as described in this tutorial here.

# clean out workspace

#rm(list = ls()) # OPTIONAL - clear out your environment
#gc()            # Uncomment these lines if desired

# load libraries 
library(tidyverse)
library(neonUtilities)
library(vegan)


# source .r file with my NEON_TOKEN
# source("my_neon_token.R") # OPTIONAL - load NEON token
# See: https://www.neonscience.org/neon-api-tokens-tutorial

Download NEON Macroinvertebrate Data

Now that the workspace is prepared, we will download NEON macroinvertebrate data using the neonUtilities function loadByProduct().

# Macroinvert dpid
my_dpid <- 'DP1.20120.001'

# list of sites
my_site_list <- c('ARIK', 'POSE', 'MAYF')

# get all tables for these sites from the API -- takes < 1 minute
all_tabs_inv <- neonUtilities::loadByProduct(
  dpID = my_dpid,
  site = my_site_list,
  #token = NEON_TOKEN, #Uncomment to use your token
  check.size = F)

Macroinvertebrate Data Munging

Now that we have the data downloaded, we will need to do some 'data munging' to reorganize our data into a more useful format for this analysis. First, let's take a look at some of the tables that were generated by loadByProduct():

# what tables do you get with macroinvertebrate 
# data product
names(all_tabs_inv)

## [1] "categoricalCodes_20120" "inv_fieldData"          "inv_persample"          "inv_taxonomyProcessed"  "issueLog_20120"        
## [6] "readme_20120"           "validation_20120"       "variables_20120"

# extract items from list and put in R env. 
all_tabs_inv %>% list2env(.GlobalEnv)

## <environment: R_GlobalEnv>

# The readme has the same information as what you 
# will find on the landing page on the data portal

# The variables file describes each field in 
# the returned data tables
View(variables_20120)

# The validation file provides the rules that 
# constrain data upon ingest into the NEON database:
View(validation_20120)

# the categoricalCodes file provides controlled 
# lists used in the data
View(categoricalCodes_20120)
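As an aside: if you would rather not write every list element into the global environment with list2env(), you can pull individual tables out of the returned list by name. A sketch of the extraction for the two tables this tutorial uses most:

```r
# extract just the tables needed below, leaving the
# rest of the downloaded list untouched
inv_fieldData <- all_tabs_inv$inv_fieldData
inv_taxonomyProcessed <- all_tabs_inv$inv_taxonomyProcessed
```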

Next, we will perform several operations in a row to re-organize our data. Each step is described by a code comment.

# It is good to check for duplicate records. This had occurred in the past in 
# data published in the inv_fieldData table in 2021. Those duplicates were 
# fixed in the 2022 data release. 
# Here we use sampleID as the primary key. If we find duplicate records, 
# we keep the first uid associated with any sampleID that has multiple uids

de_duped_uids <- inv_fieldData %>% 
  
  # remove records where no sample was collected
  filter(!is.na(sampleID)) %>%  
  group_by(sampleID) %>%
  summarise(n_recs = length(uid),
                   n_unique_uids = length(unique(uid)),
                   uid_to_keep = dplyr::first(uid)) 





# Are there any records that have more than one unique uid?
max_dups <- max(de_duped_uids$n_unique_uids %>% unique())





# filter data using de-duped uids if they exist
if(max_dups > 1){
  inv_fieldData <- inv_fieldData %>%
  dplyr::filter(uid %in% de_duped_uids$uid_to_keep)
}





# extract year from date, add it as a new column
inv_fieldData <- inv_fieldData %>%
  mutate(
    year = collectDate %>% 
      lubridate::as_date() %>% 
      lubridate::year())




# extract location data into a separate table
table_location <- inv_fieldData %>%

  # keep only the columns listed below
  select(siteID, 
         domainID,
         namedLocation, 
         decimalLatitude, 
         decimalLongitude, 
         elevation) %>%
  
  # keep rows with unique combinations of values, 
  # i.e., no duplicate records
  distinct()




# create a taxon table, which describes each 
# taxonID that appears in the data set
# start with inv_taxonomyProcessed
table_taxon <- inv_taxonomyProcessed %>%

  # keep only the columns listed below
  select(acceptedTaxonID, taxonRank, scientificName,
         order, family, genus, 
         identificationQualifier,
         identificationReferences) %>%

  # remove rows with duplicate information
  distinct()



# taxon table information for all taxa in 
# our database can be downloaded here:
# takes 1-2 minutes
# full_taxon_table_from_api <- neonUtilities::getTaxonTable("MACROINVERTEBRATE", token = NEON_TOKEN)




# Make the observation table.
# start with inv_taxonomyProcessed

# check for repeated taxa within a sampleID that need to be added together
inv_taxonomyProcessed_summed <- inv_taxonomyProcessed %>% 
  select(sampleID,
         acceptedTaxonID,
         individualCount,
         estimatedTotalCount) %>%
  group_by(sampleID, acceptedTaxonID) %>%
  summarize(
    across(c(individualCount, estimatedTotalCount), ~sum(.x, na.rm = TRUE)))
  



# join summed taxon counts back with sample and field data
table_observation <- inv_taxonomyProcessed_summed %>%
  
  # Join relevant sample info back in by sampleID
  left_join(inv_taxonomyProcessed %>% 
              select(sampleID,
                     domainID,
                     siteID,
                     namedLocation,
                     collectDate,
                     acceptedTaxonID,
                     order, family, genus, 
                     scientificName,
                     taxonRank) %>%
              distinct()) %>%
  
  # Join the columns selected above with two 
  # columns from inv_fieldData (the two columns 
  # are sampleID and benthicArea)
  left_join(inv_fieldData %>% 
              select(sampleID, eventID, year, 
                     habitatType, samplerType,
                     benthicArea)) %>%
  
  # Create two new columns: inv_dens, the areal density 
  # of each taxon (estimated count divided by the benthic 
  # area sampled), and inv_dens_unit, which is assigned 
  # the same text string for all rows.
  mutate(inv_dens = estimatedTotalCount / benthicArea,
         inv_dens_unit = 'count per square meter')





# check for duplicate records, should return a table with 0 rows
table_observation %>% 
  group_by(sampleID, acceptedTaxonID) %>% 
  summarize(n_obs = length(sampleID)) %>%
  filter(n_obs > 1)

## # A tibble: 0 x 3
## # Groups:   sampleID [0]
## # ... with 3 variables: sampleID <chr>, acceptedTaxonID <chr>, n_obs <int>

# extract sample info
table_sample_info <- table_observation %>%
  select(sampleID, domainID, siteID, namedLocation, 
         collectDate, eventID, year, 
         habitatType, samplerType, benthicArea, 
         inv_dens_unit) %>%
  distinct()




# remove singletons and doubletons
# create an occurrence summary table
taxa_occurrence_summary <- table_observation %>%
  select(sampleID, acceptedTaxonID) %>%
  distinct() %>%
  group_by(acceptedTaxonID) %>%
  summarize(occurrences = n())





# filter out taxa that are only observed 1 or 2 times
taxa_list_cleaned <- taxa_occurrence_summary %>%
  filter(occurrences > 2)





# filter observation table based on taxon list above
table_observation_cleaned <- table_observation %>%
  filter(acceptedTaxonID %in%
             taxa_list_cleaned$acceptedTaxonID,
         !sampleID %in% c("MAYF.20190729.CORE.1",
                          "MAYF.20200713.CORE.1",
                          "MAYF.20210721.CORE.1",
                          "POSE.20160718.HESS.1")) 
                      # these are outlier sampleIDs






# some summary data
sampling_effort_summary <- table_sample_info %>%
  
  # group by siteID, year
  group_by(siteID, year, samplerType) %>%
  
  # count samples and habitat types within each event
  summarise(
    event_count = eventID %>% unique() %>% length(),
    sample_count = sampleID %>% unique() %>% length(),
    habitat_count = habitatType %>% 
        unique() %>% length())




# check out the summary table
sampling_effort_summary %>% as.data.frame() %>% 
  head() %>% print()

##   siteID year     samplerType event_count sample_count habitat_count
## 1   ARIK 2014            core           2            6             1
## 2   ARIK 2014 modifiedKicknet           2           10             1
## 3   ARIK 2015            core           3           11             2
## 4   ARIK 2015 modifiedKicknet           3           13             2
## 5   ARIK 2016            core           3            9             1
## 6   ARIK 2016 modifiedKicknet           3           15             1

Working with 'Long' data

'Reshaping' your data to use as an input to a particular function may require you to consider: do I want 'long' or 'wide' data? Here's a link to a great article from 'the analysis factor' that describes the differences.
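As a minimal illustration of the difference (toy data, not NEON data): each row of a 'long' table is one sample-by-taxon record, while a 'wide' table has one row per sample and one column per taxon.

```r
library(tidyr)

# a tiny 'long' table: one row per sample x taxon combination
toy_long <- data.frame(
  sampleID = c("s1", "s1", "s2"),
  taxonID  = c("taxA", "taxB", "taxA"),
  dens     = c(10, 5, 2))

# pivot to 'wide': one row per sample, one column per taxon,
# filling missing sample x taxon combinations with 0
toy_wide <- tidyr::pivot_wider(toy_long,
                               names_from = taxonID,
                               values_from = dens,
                               values_fill = 0)
```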

For this first step, we will use data in a 'long' table:

# no. taxa by rank by site
table_observation_cleaned %>% 
  group_by(domainID, siteID, taxonRank) %>%
  summarize(
    n_taxa = acceptedTaxonID %>% 
        unique() %>% length()) %>%
  ggplot(aes(n_taxa, taxonRank)) +
  facet_wrap(~ domainID + siteID) +
  geom_col()

Horizontal bar graph showing the number of taxa for each taxonomic rank at the D02:POSE, D08:MAYF, and D10:ARIK sites. Including facet_wrap in the ggplot call creates a separate plot for each of the faceting arguments, which in this case are domainID and siteID.

# library(scales)
# sum densities by order for each sampleID
table_observation_by_order <- 
    table_observation_cleaned %>% 
    filter(!is.na(order)) %>%
    group_by(domainID, siteID, year, 
             eventID, sampleID, habitatType, order) %>%
    summarize(order_dens = sum(inv_dens, na.rm = TRUE))
  
  
# rank occurrence by order
table_observation_by_order %>% head()

## # A tibble: 6 x 8
## # Groups:   domainID, siteID, year, eventID, sampleID, habitatType [1]
##   domainID siteID  year eventID       sampleID               habitatType order            order_dens
##   <chr>    <chr>  <dbl> <chr>         <chr>                  <chr>       <chr>                 <dbl>
## 1 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Branchiobdellida      516. 
## 2 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Coleoptera            516. 
## 3 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Decapoda               86.0
## 4 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Diptera              5419. 
## 5 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Ephemeroptera        5301. 
## 6 D02      POSE    2014 POSE.20140722 POSE.20140722.SURBER.1 riffle      Megaloptera           387.

# stacked rank occurrence plot
table_observation_by_order %>%
  group_by(order, siteID) %>%
  summarize(
    occurrence = (order_dens > 0) %>% sum()) %>%
    ggplot(aes(
        x = reorder(order, -occurrence), 
        y = occurrence,
        color = siteID,
        fill = siteID)) +
    geom_col() +
    theme(axis.text.x = 
              element_text(angle = 45, hjust = 1))

Bar graph of the occurrence of each taxonomic order at the D02:POSE, D08:MAYF, and D10:ARIK sites. Occurrence data at each site are depicted as stacked bars for each order, where a red bar represents D10:ARIK, a green bar represents D08:MAYF, and a blue bar represents the D02:POSE site. The data have also been reordered to show the most to least frequently occurring taxonomic orders from left to right.

# faceted densities plot
table_observation_by_order %>%
  ggplot(aes(
    x = reorder(order, -order_dens), 
    y = log10(order_dens),
    color = siteID,
    fill = siteID)) +
  geom_boxplot(alpha = .5) +
  facet_grid(siteID ~ .) +
  theme(axis.text.x = 
            element_text(angle = 45, hjust = 1))

Box plots of the log density of each taxonomic order per site. This graph consists of three box plots, organized vertically in one column, that correspond to log density data for each site. This is achieved through the use of the facet_grid function in the ggplot call.

Making Data 'wide'

For the next process, we will need to make our data table in the 'wide' format.

# select only site by species density info and remove duplicate records
table_sample_by_taxon_density_long <- table_observation_cleaned %>%
  select(sampleID, acceptedTaxonID, inv_dens) %>%
  distinct() %>%
  filter(!is.na(inv_dens))

# table_sample_by_taxon_density_long %>% nrow()
# table_sample_by_taxon_density_long %>% distinct() %>% nrow()



# pivot to wide format, sum multiple counts per sampleID
table_sample_by_taxon_density_wide <- table_sample_by_taxon_density_long %>%
  tidyr::pivot_wider(id_cols = sampleID, 
                     names_from = acceptedTaxonID,
                     values_from = inv_dens,
                     values_fill = list(inv_dens = 0),
                     values_fn = list(inv_dens = sum)) %>%
  column_to_rownames(var = "sampleID") 

# check col and row sums -- mins should all be > 0
colSums(table_sample_by_taxon_density_wide) %>% min()

## [1] 12

rowSums(table_sample_by_taxon_density_wide) %>% min()

## [1] 25.55004

Multiscale Biodiversity

Reference: Jost, L. 2007. Partitioning diversity into independent alpha and beta components. Ecology 88:2427–2439. https://doi.org/10.1890/06-1736.1.

These metrics are based on Robert Whittaker's multiplicative diversity where

  • gamma is regional biodiversity
  • alpha is local biodiversity (e.g., the mean diversity at a patch)
  • and beta diversity is a measure of among-patch variability in community composition.

Beta could be interpreted as the number of "distinct" communities present within the region.

The relationship among alpha, beta, and gamma diversity is: beta = gamma / alpha
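A toy sketch of this partitioning using hypothetical data at order q = 0: two patches with completely distinct 5-taxon communities give alpha = 5, gamma = 10, and therefore beta = 2 "distinct" communities.

```r
library(vegan)

# two patches, no shared taxa: 5 taxa each, 10 taxa regionally
patches <- rbind(
  p1 = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  p2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1))

# order q = 0: alpha is mean local richness, gamma is the
# richness of the pooled (column-mean) community
alpha_q0 <- mean(vegan::renyi(patches, scales = 0, hill = TRUE))
gamma_q0 <- vegan::renyi(colMeans(patches), scales = 0, hill = TRUE)
beta_q0  <- gamma_q0 / alpha_q0   # 10 / 5 = 2
```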

The influence of relative abundances on the calculation of alpha, beta, and gamma diversity metrics is determined by the coefficient q, which sets the "order" of the diversity metric: q = 0 provides diversity measures based on richness alone, and higher orders of q give more weight to taxa with higher abundances in the data. Order q = 1 is related to Shannon diversity metrics, and order q = 2 is related to Simpson diversity metrics.

Alpha diversity is average local richness.

Order q = 0 alpha diversity calculated for our dataset returns a mean local richness (i.e., species counts) of ~30 taxa per sample across the entire data set.

# Here we use vegan::renyi to calculate Hill numbers
# If hill = FALSE, the function returns an entropy
# If hill = TRUE, the function returns the exponentiated
# entropy. In other words:
# exp(renyi entropy) = Hill number = "species equivalent"

# Note that for this function, the "scales" argument 
# determines the order of q used in the calculation

table_sample_by_taxon_density_wide %>%
  vegan::renyi(scales = 0, hill = TRUE) %>%
  mean()

## [1] 30.06114

Comparing alpha diversity calculated using different orders:

Order q = 1 alpha diversity returns mean number of "species equivalents" per sample in the data set. This approach incorporates evenness because when abundances are more even across taxa, taxa are weighted more equally toward counting as a "species equivalent". For example, if you have a sample with 100 individuals, spread across 10 species, and each species is represented by 10 individuals, the number of order q = 1 species equivalents will equal the richness (10).

Alternatively, if 90 of the 100 individuals in the sample belong to one species, and the other 10 individuals are spread across the other 9 species, there will only be 1.72 order q = 1 species equivalents, even though there are still 10 species in the sample.

# even distribution, orders q = 0 and q = 1 for 10 taxa
vegan::renyi(
  c(spp.a = 10, spp.b = 10, spp.c = 10, 
    spp.d = 10, spp.e = 10, spp.f = 10, 
    spp.g = 10, spp.h = 10, spp.i = 10, 
    spp.j = 10),
  hill = TRUE,
  scales = c(0, 1))

##  0  1 
## 10 10 
## attr(,"class")
## [1] "renyi"   "numeric"

# uneven distribution, orders q = 0 and q = 1 for 10 taxa
vegan::renyi(
  c(spp.a = 90, spp.b = 2, spp.c = 1, 
    spp.d = 1, spp.e = 1, spp.f = 1, 
    spp.g = 1, spp.h = 1, spp.i = 1, 
    spp.j = 1),
  hill = TRUE,
  scales = c(0, 1)) 

##         0         1 
## 10.000000  1.718546 
## attr(,"class")
## [1] "renyi"   "numeric"

Comparing orders of q for NEON data

Let's compare the different orders q = 0, 1, and 2 measures of alpha diversity across the samples collected from ARIK, POSE, and MAYF.

# Nest data by siteID
data_nested_by_siteID <- table_sample_by_taxon_density_wide %>%
  tibble::rownames_to_column("sampleID") %>%
  left_join(table_sample_info %>% 
                select(sampleID, siteID)) %>%
  tibble::column_to_rownames("sampleID") %>%
  nest(data = -siteID)

data_nested_by_siteID$data[[1]] %>%
  vegan::renyi(scales = 0, hill = TRUE) %>%
  mean()

## [1] 24.69388

# apply the calculation by site for alpha diversity
# for each order of q
data_nested_by_siteID %>% mutate(
  alpha_q0 = purrr::map_dbl(
    .x = data,
    .f = ~ vegan::renyi(x = .,
                        hill = TRUE, 
                        scales = 0) %>% mean()),
  alpha_q1 = purrr::map_dbl(
    .x = data,
    .f = ~ vegan::renyi(x = .,
                        hill = TRUE, 
                        scales = 1) %>% mean()),
  alpha_q2 = purrr::map_dbl(
    .x = data,
    .f = ~ vegan::renyi(x = .,
                        hill = TRUE, 
                        scales = 2) %>% mean())
  )

## # A tibble: 3 x 5
##   siteID data                 alpha_q0 alpha_q1 alpha_q2
##   <chr>  <list>                  <dbl>    <dbl>    <dbl>
## 1 ARIK   <tibble [147 x 458]>     24.7     10.2     6.52
## 2 MAYF   <tibble [149 x 458]>     22.2     12.0     8.19
## 3 POSE   <tibble [162 x 458]>     42.1     20.7    13.0

# Note that POSE has the highest mean alpha diversity



# To calculate gamma diversity at the site scale,
# calculate the column means and then calculate 
# the renyi entropy and Hill number
# Here we are only calculating order 
# q = 0 gamma diversity
data_nested_by_siteID %>% mutate(
  gamma_q0 = purrr::map_dbl(
    .x = data,
    .f = ~ vegan::renyi(x = colMeans(.),
                        hill = TRUE, 
                        scales = 0)))

## # A tibble: 3 x 3
##   siteID data                 gamma_q0
##   <chr>  <list>                  <dbl>
## 1 ARIK   <tibble [147 x 458]>      243
## 2 MAYF   <tibble [149 x 458]>      239
## 3 POSE   <tibble [162 x 458]>      337

# Note that POSE has the highest gamma diversity



# Now calculate alpha, beta, and gamma using orders 0 and 1 
# for each siteID
diversity_partitioning_results <- 
  data_nested_by_siteID %>% 
  mutate(
    n_samples = purrr::map_int(data, ~ nrow(.)),
    alpha_q0 = purrr::map_dbl(
      .x = data,
      .f = ~ vegan::renyi(x = .,
                          hill = TRUE, 
                          scales = 0) %>% mean()),
    alpha_q1 = purrr::map_dbl(
      .x = data,
      .f = ~ vegan::renyi(x = .,
                          hill = TRUE, 
                          scales = 1) %>% mean()),
    gamma_q0 = purrr::map_dbl(
      .x = data,
      .f = ~ vegan::renyi(x = colMeans(.),
                          hill = TRUE, 
                          scales = 0)),
    gamma_q1 = purrr::map_dbl(
      .x = data,
      .f = ~ vegan::renyi(x = colMeans(.),
                          hill = TRUE, 
                          scales = 1)),
    beta_q0 = gamma_q0 / alpha_q0,
    beta_q1 = gamma_q1 / alpha_q1)


diversity_partitioning_results %>% 
  select(-data) %>% as.data.frame() %>% print()

##   siteID n_samples alpha_q0 alpha_q1 gamma_q0  gamma_q1   beta_q0  beta_q1
## 1   ARIK       147 24.69388 10.19950      243  35.70716  9.840496 3.500873
## 2   MAYF       149 22.24832 12.02405      239  65.77590 10.742383 5.470360
## 3   POSE       162 42.11728 20.70184      337 100.16506  8.001466 4.838462

Using NMDS to ordinate samples

Finally, we will use Nonmetric Multidimensional Scaling (NMDS) to ordinate samples as shown below:

# create ordination using NMDS
my_nmds_result <- table_sample_by_taxon_density_wide %>% vegan::metaMDS()

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2280867 
## Run 1 stress 0.2297516 
## Run 2 stress 0.2322618 
## Run 3 stress 0.2492232 
## Run 4 stress 0.2335912 
## Run 5 stress 0.235082 
## Run 6 stress 0.2396413 
## Run 7 stress 0.2303469 
## Run 8 stress 0.2363123 
## Run 9 stress 0.2523796 
## Run 10 stress 0.2288613 
## Run 11 stress 0.2302371 
## Run 12 stress 0.2302613 
## Run 13 stress 0.2409554 
## Run 14 stress 0.2308922 
## Run 15 stress 0.2528171 
## Run 16 stress 0.2534587 
## Run 17 stress 0.2320313 
## Run 18 stress 0.239435 
## Run 19 stress 0.2293618 
## Run 20 stress 0.2307903 
## *** No convergence -- monoMDS stopping criteria:
##      1: no. of iterations >= maxit
##     18: stress ratio > sratmax
##      1: scale factor of the gradient < sfgrmin

# plot stress
my_nmds_result$stress

## [1] 0.2280867

p1 <- vegan::ordiplot(my_nmds_result)
vegan::ordilabel(p1, "species")

Two-dimensional ordination plot of NMDS results. The NMDS procedure resulted in a stress value of approximately 0.23. The plot contains sampleIDs depicted as circles, and species, which have been labeled using the ordilabel function.

# merge NMDS scores with sampleID information for plotting
nmds_scores <- my_nmds_result %>% 
  vegan::scores() %>%
  .[["sites"]] %>%
  as.data.frame() %>%
  tibble::rownames_to_column("sampleID") %>%
  left_join(table_sample_info)


# How I determined the outlier(s)
nmds_scores %>% arrange(desc(NMDS1)) %>% head()

##               sampleID    NMDS1      NMDS2 domainID siteID  namedLocation         collectDate       eventID year habitatType
## 1 MAYF.20190311.CORE.2 1.590745  1.0833382      D08   MAYF MAYF.AOS.reach 2019-03-11 15:00:00 MAYF.20190311 2019         run
## 2 MAYF.20201117.CORE.2 1.395784  0.4986856      D08   MAYF MAYF.AOS.reach 2020-11-17 16:33:00 MAYF.20201117 2020         run
## 3 MAYF.20180726.CORE.2 1.372494  0.2603682      D08   MAYF MAYF.AOS.reach 2018-07-26 14:17:00 MAYF.20180726 2018         run
## 4 MAYF.20190311.CORE.1 1.299395  1.0075703      D08   MAYF MAYF.AOS.reach 2019-03-11 15:00:00 MAYF.20190311 2019         run
## 5 MAYF.20170314.CORE.1 1.132679  1.6469463      D08   MAYF MAYF.AOS.reach 2017-03-14 14:11:00 MAYF.20170314 2017         run
## 6 MAYF.20180326.CORE.3 1.130687 -0.7139679      D08   MAYF MAYF.AOS.reach 2018-03-26 14:50:00 MAYF.20180326 2018         run
##   samplerType benthicArea          inv_dens_unit
## 1        core       0.006 count per square meter
## 2        core       0.006 count per square meter
## 3        core       0.006 count per square meter
## 4        core       0.006 count per square meter
## 5        core       0.006 count per square meter
## 6        core       0.006 count per square meter

nmds_scores %>% arrange(desc(NMDS1)) %>% tail()

##                    sampleID      NMDS1        NMDS2 domainID siteID  namedLocation         collectDate       eventID year habitatType
## 453 ARIK.20160919.KICKNET.5 -0.8577931 -0.245144245      D10   ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016         run
## 454 ARIK.20160919.KICKNET.1 -0.8694139  0.291753483      D10   ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016         run
## 455    ARIK.20150714.CORE.3 -0.8843672  0.013601377      D10   ARIK ARIK.AOS.reach 2015-07-14 14:55:00 ARIK.20150714 2015        pool
## 456    ARIK.20150714.CORE.2 -1.0465497  0.004066437      D10   ARIK ARIK.AOS.reach 2015-07-14 14:55:00 ARIK.20150714 2015        pool
## 457 ARIK.20160919.KICKNET.4 -1.0937181 -0.148046639      D10   ARIK ARIK.AOS.reach 2016-09-19 22:06:00 ARIK.20160919 2016         run
## 458    ARIK.20160331.CORE.3 -1.1791981 -0.327145374      D10   ARIK ARIK.AOS.reach 2016-03-31 15:41:00 ARIK.20160331 2016        pool
##         samplerType benthicArea          inv_dens_unit
## 453 modifiedKicknet       0.250 count per square meter
## 454 modifiedKicknet       0.250 count per square meter
## 455            core       0.006 count per square meter
## 456            core       0.006 count per square meter
## 457 modifiedKicknet       0.250 count per square meter
## 458            core       0.006 count per square meter

# Plot samples in community composition space by year
nmds_scores %>%
  ggplot(aes(NMDS1, NMDS2, color = siteID, 
             shape = samplerType)) +
  geom_point() +
  facet_wrap(~ as.factor(year))

Ordination plots of community composition faceted by year. These plots were achieved by merging NMDS scores with sampleID information in order to plot samples by sampler type (shape) and siteID (color).

# Plot samples in community composition space
# facet by siteID and habitat type
# color by year
nmds_scores %>%
  ggplot(aes(NMDS1, NMDS2, color = as.factor(year), 
             shape = samplerType)) +
  geom_point() +
  facet_grid(habitatType ~ siteID, scales = "free")

Ordination plots in community composition space faceted by siteID and habitat type. Points are colored to represent different years, as well as different shapes for sampler type.

Get Lesson Code

01_working_with_NEON_macroinverts.R

Create a Canopy Height Model from lidar-derived Rasters in R

Authors: Edmund Hart, Leah A. Wasser

Last Updated: Apr 8, 2021

A common analysis using lidar data is to derive top-of-canopy height values from the lidar data. These values are often used to track changes in forest structure over time, to calculate biomass, and even to estimate leaf area index (LAI). Let's dive into the basics of working with raster-formatted lidar data in R!

Learning Objectives

After completing this tutorial, you will be able to:

  • Work with digital terrain model (DTM) & digital surface model (DSM) raster files.
  • Create a canopy height model (CHM) raster from DTM & DSM rasters.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.

Install R Packages

  • raster: install.packages("raster")
  • rgdal: install.packages("rgdal")

More on Packages in R - Adapted from Software Carpentry.

Download Data

NEON Teaching Data Subset: Field Site Spatial Data

These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

This tutorial is designed for you to set your working directory to the directory created by unzipping this file.


Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Recommended Reading

What is a CHM, DSM and DTM? About Gridded, Raster lidar Data

Create a lidar-derived Canopy Height Model (CHM)

The National Ecological Observatory Network (NEON) provides lidar-derived data products among its many free ecological data products. These products come in GeoTIFF format, a .tif raster format that is georeferenced to the earth's surface.

In this tutorial, we create a Canopy Height Model. The Canopy Height Model (CHM) represents the height of the trees above the ground. We can derive the CHM by subtracting the ground elevation from the elevation of the top surface (the tops of the trees).

We will use the raster R package to work with the lidar-derived digital surface model (DSM) and the digital terrain model (DTM).

# Load needed packages
library(raster)
library(rgdal)

# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd="~/Git/data/" #This will depend on your local environment
setwd(wd)

First, we will import the Digital Surface Model (DSM). The DSM represents the elevation of the top of the objects on the ground (trees, buildings, etc).

# assign raster to object
dsm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif"))

# view info about the raster.
dsm

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : /Users/olearyd/Git/data/NEON-DS-Field-Site-Spatial-Data/SJER/DigitalSurfaceModel/SJER2013_DSM.tif 
## names      : SJER2013_DSM

# plot the DSM
plot(dsm, main="Lidar Digital Surface Model \n SJER, California")

Note the resolution, extent, and coordinate reference system (CRS) of the raster. For the raster math in the later steps, our DTM will need to match all three.

Next, we will import the Digital Terrain Model (DTM) for the same area. The DTM represents the ground (terrain) elevation.

# import the digital terrain model
dtm <- raster(paste0(wd,"NEON-DS-Field-Site-Spatial-Data/SJER/DigitalTerrainModel/SJER2013_DTM.tif"))

plot(dtm, main="Lidar Digital Terrain Model \n SJER, California")

With both of these rasters now loaded, we can create the Canopy Height Model (CHM). The CHM represents the difference between the DSM and the DTM, i.e., the height of all objects above the ground surface.

To do this we perform some basic raster math to calculate the CHM. You can perform the same raster math in a GIS program like QGIS.

When you do the math, make sure to subtract the DTM from the DSM or you'll get trees with negative heights!

# use raster math to create CHM
chm <- dsm - dtm

# view CHM attributes
chm

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

plot(chm, main="Lidar Canopy Height Model \n SJER, California")

We've now created a CHM from our DSM and DTM. What do you notice about the canopy cover at this location in the San Joaquin Experimental Range?

### Challenge: Basic Raster Math

Convert the CHM from meters to feet. Plot it.
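One way to approach this challenge, sketched here as a hint (the official solution is in the downloadable lesson script; `chm` is the raster created above):

```r
# convert the CHM from meters to feet (1 m = 3.28084 ft)
chm_ft <- chm * 3.28084

# plot the converted CHM
plot(chm_ft, main="Lidar Canopy Height Model (feet) \n SJER, California")
```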

If your work requires creating many CHMs from different rasters, an efficient approach is to wrap the calculation in a function.

# Create a function that subtracts one raster from another
canopyCalc <- function(DSM, DTM) {
  return(DSM - DTM)
}

# use the function to create the final CHM
# note the argument order: DSM first, then DTM
chm2 <- canopyCalc(dsm, dtm)
chm2

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

# or use the overlay function
chm3 <- overlay(dsm, dtm, fun = canopyCalc)
chm3

## class      : RasterLayer 
## dimensions : 5060, 4299, 21752940  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 254570, 258869, 4107302, 4112362  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : -1.399994, 40.29001  (min, max)

As with any raster, we can write out the CHM as a GeoTIFF using the writeRaster() function.

# write out the CHM in GeoTIFF format
writeRaster(chm, paste0(wd, "chm_SJER.tif"), format="GTiff")

We've now successfully created a canopy height model using basic raster math -- in R! We can bring the chm_SJER.tif file into QGIS (or any GIS program) and look at it.
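If you'd rather stay in R, a quick sanity check is to read the written file back in; a minimal sketch, assuming the writeRaster() call above succeeded:

```r
# read the written GeoTIFF back in and confirm it looks like the CHM in memory
chm_check <- raster(paste0(wd, "chm_SJER.tif"))
plot(chm_check, main="CHM read back from chm_SJER.tif")
```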


Consider going onto the next tutorial Extract Values from a Raster in R to compare this lidar-derived CHM with ground-based observations!

Get Lesson Code

create-canopy-height-model-in-R.R

Compare tree height measured from the ground to a Lidar-based Canopy Height Model

Authors: Claire K. Lunch

Last Updated: Apr 13, 2021

This data tutorial provides instruction on working with two different NEON data products to estimate tree height:

  • DP3.30015.001, Ecosystem structure, aka Canopy Height Model (CHM)
  • DP1.10098.001, Woody plant vegetation structure

The CHM data are derived from the Lidar point cloud data collected by the remote sensing platform. The vegetation structure data are collected by field staff on the ground. We will be using data from the Wind River Experimental Forest NEON field site located in Washington state. The predominant vegetation there consists of tall evergreen conifers.

If you are coming to this exercise after following tutorials on data download and formatting, and therefore already have the needed data, skip ahead to section 4.

Things You’ll Need To Complete This Tutorial

You will need the most current version of R loaded on your computer to complete this tutorial.

1. Setup

Start by installing and loading packages (if necessary) and setting options. One of the packages we'll be using, geoNEON, is only available via GitHub, so it's installed using the devtools package. The other packages can be installed directly from CRAN.

Installation can be run once, then periodically to get package updates.

install.packages("neonUtilities")
install.packages("sp")
install.packages("raster")
install.packages("devtools")
devtools::install_github("NEONScience/NEON-geolocation/geoNEON")

Now load packages. This needs to be done every time you run code. We'll also set a working directory for data downloads.

library(sp)
library(raster)
library(neonUtilities)
library(geoNEON)

options(stringsAsFactors=F)

# set working directory
# adapt directory path for your system
wd <- "~/data"
setwd(wd)

2. Vegetation structure data

Download the vegetation structure data using the loadByProduct() function in the neonUtilities package. Inputs needed to the function are:

  • dpID: data product ID; woody vegetation structure = DP1.10098.001
  • site: 4-letter site code; Wind River = WREF
  • package: basic or expanded; we'll download basic here
  • check.size: should this function prompt the user with an estimated download size? Set to FALSE here for ease of processing as a script, but it's good to leave the default TRUE when downloading a dataset for the first time.

Refer to the tutorial for the neonUtilities package for more details if desired.

veglist <- loadByProduct(dpID="DP1.10098.001", 
                         site="WREF", 
                         package="basic", 
                         check.size = FALSE)

Use the getLocTOS() function in the geoNEON package to get precise locations for the tagged plants. Refer to the package documentation for more details.

vegmap <- getLocTOS(veglist$vst_mappingandtagging, 
                          "vst_mappingandtagging")

Merge the mapped locations of individuals (the vst_mappingandtagging table) with the annual measurements of height, diameter, etc (the vst_apparentindividual table). The two tables join on individualID, the identifier for each tagged plant, but we'll include namedLocation, domainID, siteID, and plotID in the list of variables to merge on, to avoid ending up with duplicates of each of those columns. Refer to the variables table and to the Data Product User Guide for Woody plant vegetation structure for more information about the contents of each data table.

veg <- merge(veglist$vst_apparentindividual, vegmap, 
             by=c("individualID","namedLocation",
                  "domainID","siteID","plotID"))

Let's see what the data look like! Make a stem map of the plants in plot WREF_075. Note that the circles argument of the symbols() function expects a radius, but stemDiameter is just that, a diameter, so we will need to divide by two. And stemDiameter is in centimeters, but the mapping scale is in meters, so we also need to divide by 100 to get the scale right.

symbols(veg$adjEasting[which(veg$plotID=="WREF_075")], 
        veg$adjNorthing[which(veg$plotID=="WREF_075")], 
        circles=veg$stemDiameter[which(veg$plotID=="WREF_075")]/100/2, 
        inches=F, xlab="Easting", ylab="Northing")

And now overlay the estimated uncertainty in the location of each stem, in blue:

symbols(veg$adjEasting[which(veg$plotID=="WREF_075")], 
        veg$adjNorthing[which(veg$plotID=="WREF_075")], 
        circles=veg$stemDiameter[which(veg$plotID=="WREF_075")]/100/2, 
        inches=F, xlab="Easting", ylab="Northing")
symbols(veg$adjEasting[which(veg$plotID=="WREF_075")], 
        veg$adjNorthing[which(veg$plotID=="WREF_075")], 
        circles=veg$adjCoordinateUncertainty[which(veg$plotID=="WREF_075")], 
        inches=F, add=T, fg="lightblue")

3. Canopy height model data

Now we'll download the CHM tile corresponding to plot WREF_075. Several other plots are also covered by this tile. We could download all tiles that contain vegetation structure plots, but in this exercise we're sticking to one tile to limit download size and processing time.

The byTileAOP() function in the neonUtilities package allows for download of remote sensing tiles based on easting and northing coordinates, so we'll give it the coordinates of plot WREF_075 and the data product ID, DP3.30015.001.

The download will include several metadata files as well as the data tile. Load the data tile into the environment using the raster package.

byTileAOP(dpID="DP3.30015.001", site="WREF", year="2017", 
          easting=veg$adjEasting[which(veg$plotID=="WREF_075")], 
          northing=veg$adjNorthing[which(veg$plotID=="WREF_075")],
          check.size = FALSE, savepath=wd)

chm <- raster(paste0(wd, "/DP3.30015.001/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif"))

Let's view the tile.

plot(chm, col=topo.colors(5))

4. Comparing the two datasets

Now we have the heights of individual trees measured from the ground, and the height of the top surface of the canopy, measured from the air. There are many different ways to make a comparison between these two datasets! This section will walk through three different approaches.

First, subset the vegetation structure data to only the individuals that fall within this tile, using the extent() function from the raster package.

This step isn't strictly necessary, but it will make the processing faster.

vegsub <- veg[which(veg$adjEasting >= extent(chm)[1] &
                      veg$adjEasting <= extent(chm)[2] &
                      veg$adjNorthing >= extent(chm)[3] & 
                      veg$adjNorthing <= extent(chm)[4]),]

Starting with a very simple first pass: use the extract() function from the raster package to get the CHM value matching the coordinates of each mapped plant. Include a buffer equal to the uncertainty in the plant's location, and extract the highest CHM value within the buffer. Then make a scatter plot of each tree's height vs. the CHM value at its location.

bufferCHM <- extract(chm, 
                     cbind(vegsub$adjEasting,
                           vegsub$adjNorthing),
                     buffer=vegsub$adjCoordinateUncertainty, 
                     fun=max)

plot(bufferCHM~vegsub$height, pch=20, xlab="Height", 
     ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")

How strong is the correlation between the ground and lidar measurements?

cor(bufferCHM, vegsub$height, use="complete")

## [1] 0.3650552

There are a lot of points clustered on the 1-1 line, but there is also a cloud of points above the line, where the measured height is lower than the canopy height model at the same coordinates. This makes sense, because we made no attempt to filter out the understory. There are likely many plants measured in the vegetation structure data that are not at the top of the canopy, and the CHM sees only the top surface of the canopy.

How to exclude understory plants from this analysis? Again, there are many possible approaches. We'll try out two, one map-centric and one tree-centric.

Starting with the map-centric approach: select a pixel size, and aggregate both the vegetation structure data and the CHM data to find the tallest point in each pixel. Let's try this with 10m pixels.

Start by rounding the coordinates of the vegetation structure data, to create 10m bins. Use floor() instead of round() so each tree ends up in the pixel with the same numbering as the raster pixels (the rasters/pixels are numbered by their southwest corners).

easting10 <- 10*floor(vegsub$adjEasting/10)
northing10 <- 10*floor(vegsub$adjNorthing/10)
vegsub <- cbind(vegsub, easting10, northing10)

Use the aggregate() function to get the tallest tree in each 10m bin.

vegbin <- stats::aggregate(vegsub, by=list(vegsub$easting10, vegsub$northing10), FUN=max)

To get the CHM values for the 10m bins, use the raster package version of the aggregate() function. Let's take a look at the lower-resolution image we get as a result.

CHM10 <- raster::aggregate(chm, fact=10, fun=max)
plot(CHM10, col=topo.colors(5))

Use the extract() function again to get the values from each pixel. We don't need a buffer this time, since we've put both datasets onto the same grid. But our grids are numbered by the corners, so add 5 to each tree coordinate to make sure it's in the correct pixel.

vegbin$easting10 <- vegbin$easting10+5
vegbin$northing10 <- vegbin$northing10+5
binCHM <- extract(CHM10, cbind(vegbin$easting10, 
                               vegbin$northing10))
plot(binCHM~vegbin$height, pch=20, 
     xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")

cor(binCHM, vegbin$height, use="complete")

## [1] 0.3565511

The understory points are thinned out substantially, but so are the rest. We've lost a lot of data by going to a lower resolution.

Let's see if we can identify the tallest trees by another approach, using the trees as the starting point instead of the map area. Start by sorting the veg structure data by height.

vegsub <- vegsub[order(vegsub$height, decreasing=T),]

Now, for each tree, let's estimate which nearby trees might be beneath its canopy, and discard those points. To do this:

  1. Calculate the distance of each tree from the target tree.
  2. Pick a reasonable estimate for canopy size, and discard shorter trees within that radius. The radius I used is 0.3 times the height, based on some rudimentary googling about Douglas fir allometry. It could definitely be improved on!
  3. Iterate over all trees.

We'll use a simple for loop to do this:

vegfil <- vegsub
for(i in 1:nrow(vegsub)) {
    if(is.na(vegfil$height[i]))
        next
    dist <- sqrt((vegsub$adjEasting[i]-vegsub$adjEasting)^2 + 
                (vegsub$adjNorthing[i]-vegsub$adjNorthing)^2)
    vegfil$height[which(dist<0.3*vegsub$height[i] & 
                        vegsub$height<vegsub$height[i])] <- NA
}

vegfil <- vegfil[which(!is.na(vegfil$height)),]

Now extract the raster values, as above. Let's also increase the buffer size a bit, to better account for the uncertainty in the Lidar data as well as the uncertainty in the ground locations.

filterCHM <- extract(chm, cbind(vegfil$adjEasting, vegfil$adjNorthing),
                         buffer=vegfil$adjCoordinateUncertainty+1, fun=max)
plot(filterCHM~vegfil$height, pch=20, 
     xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")

cor(filterCHM,vegfil$height)

## [1] 0.7273229

This is quite a bit better! There are still several understory points we failed to exclude, but we were able to filter out most of the understory without losing so many overstory points.

Let's try one last thing. The plantStatus field in the veg structure data indicates whether a plant is dead, broken, or otherwise damaged. In theory, a dead or broken tree can still be the tallest thing around, but it's less likely, and it's also less likely to get a good Lidar return. Exclude all trees that aren't alive:

vegfil <- vegfil[which(vegfil$plantStatus=="Live"),]
filterCHM <- extract(chm, cbind(vegfil$adjEasting, vegfil$adjNorthing),
                         buffer=vegfil$adjCoordinateUncertainty+1, fun=max)
plot(filterCHM~vegfil$height, pch=20, 
     xlab="Height", ylab="Canopy height model")
lines(c(0,50), c(0,50), col="grey")

cor(filterCHM,vegfil$height)

## [1] 0.8135262

Nice!

One final note: however we slice the data, there is a noticeable bias even in the strongly correlated values. The CHM heights are generally a bit shorter than the ground-based estimates of tree height. There are two biases in the CHM data that contribute to this. (1) Lidar returns from short-statured vegetation are difficult to distinguish from the ground, so the "ground" estimated by Lidar is generally a bit higher than the true ground surface, and (2) the height estimate from Lidar represents the highest return, but the highest return may slightly miss the actual tallest point on a given tree. This is especially likely to happen with conifers, which are the top-of-canopy trees at Wind River.
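A quick way to quantify this offset, sketched here under the assumption that the final filtered objects filterCHM and vegfil from above are still in the environment:

```r
# average offset between the lidar CHM and the ground-based heights;
# a negative value indicates the CHM reads shorter than the field measurement
mean(filterCHM - vegfil$height, na.rm=TRUE)
```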

Get Lesson Code

veg_structure_and_chm.R

Introduction to working with NEON eddy flux data

Authors: Claire K. Lunch

Last Updated: Mar 12, 2021

This data tutorial provides an introduction to working with NEON eddy flux data, using the neonUtilities R package. If you are new to NEON data, we recommend starting with a more general tutorial, such as the neonUtilities tutorial or the Download and Explore tutorial. Some of the functions and techniques described in those tutorials will be used here, as well as functions and data formats that are unique to the eddy flux system.

This tutorial assumes general familiarity with eddy flux data and associated concepts.

1. Setup

Start by installing and loading packages and setting options. To work with the NEON flux data, we need the rhdf5 package, which is hosted on Bioconductor, and requires a different installation process than CRAN packages:

install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')




options(stringsAsFactors=F)

library(neonUtilities)

Use the zipsByProduct() function from the neonUtilities package to download flux data from two sites and two months. The transformations and functions below will work on any time range and site(s), but two sites and two months allows us to see all the available functionality while minimizing download size.

Inputs to the zipsByProduct() function:

  • dpID: DP4.00200.001, the bundled eddy covariance product
  • package: basic (the expanded package is not covered in this tutorial)
  • site: NIWO = Niwot Ridge and HARV = Harvard Forest
  • startdate: 2018-06 (both dates are inclusive)
  • enddate: 2018-07 (both dates are inclusive)
  • savepath: modify this to something logical on your machine
  • check.size: T if you want to see file size before downloading, otherwise F

The download may take a while, especially if you're on a slow network. For faster downloads, consider using an API token.

zipsByProduct(dpID="DP4.00200.001", package="basic", 
              site=c("NIWO", "HARV"), 
              startdate="2018-06", enddate="2018-07",
              savepath="~/Downloads", 
              check.size=F)

2. Data Levels

There are five levels of data contained in the eddy flux bundle. For full details, refer to the NEON algorithm document.

Briefly, the data levels are:

  • Level 0' (dp0p): Calibrated raw observations
  • Level 1 (dp01): Time-aggregated observations, e.g. 30-minute mean gas concentrations
  • Level 2 (dp02): Time-interpolated data, e.g. rate of change of a gas concentration
  • Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
  • Level 4 (dp04): Fluxes

The dp0p data are available in the expanded data package and are beyond the scope of this tutorial.

The dp02 and dp03 data are used in storage calculations, and the dp04 data include both the storage and turbulent components. Since many users will want to focus on the net flux data, we'll start there.

3. Extract Level 4 data (Fluxes!)

To extract the Level 4 data from the HDF5 files and merge them into a single table, we'll use the stackEddy() function from the neonUtilities package.

stackEddy() requires two inputs:

  • filepath: Path to a file or folder, which can be any one of:
    1. A zip file of eddy flux data downloaded from the NEON data portal
    2. A folder of eddy flux data downloaded by the zipsByProduct() function
    3. The folder of files resulting from unzipping either of 1 or 2
    4. One or more HDF5 files of NEON eddy flux data
  • level: the data level to extract; one of dp01, dp02, dp03, or dp04

Input the filepath you downloaded to using zipsByProduct() earlier, including the filesToStack00200 folder created by the function, and dp04 as the level:

flux <- stackEddy(filepath="~/Downloads/filesToStack00200",
                 level="dp04")

We now have an object called flux. It's a named list containing four tables: one data table for each site, plus the variables and objDesc metadata tables.

names(flux)

## [1] "HARV"      "NIWO"      "variables" "objDesc"

Let's look at the contents of one of the site data files:

head(flux$NIWO)

##               timeBgn             timeEnd data.fluxCo2.nsae.flux data.fluxCo2.stor.flux data.fluxCo2.turb.flux
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59              0.1713858            -0.06348163              0.2348674
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59              0.9251711             0.08748146              0.8376896
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59              0.5005812             0.02231698              0.4782642
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59              0.8032820             0.25569306              0.5475889
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59              0.4897685             0.23090472              0.2588638
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59              0.9223979             0.06228581              0.8601121
##   data.fluxH2o.nsae.flux data.fluxH2o.stor.flux data.fluxH2o.turb.flux data.fluxMome.turb.veloFric
## 1              15.876622              3.3334970              12.543125                   0.2047081
## 2               8.089274             -1.2063258               9.295600                   0.1923735
## 3               5.290594             -4.4190781               9.709672                   0.1200918
## 4               9.190214              0.2030371               8.987177                   0.1177545
## 5               3.111909              0.1349363               2.976973                   0.1589189
## 6               4.613676             -0.3929445               5.006621                   0.1114406
##   data.fluxTemp.nsae.flux data.fluxTemp.stor.flux data.fluxTemp.turb.flux data.foot.stat.angZaxsErth
## 1               4.7565505              -1.4575094               6.2140599                    94.2262
## 2              -0.2717454               0.3403877              -0.6121331                   355.4252
## 3              -4.2055147               0.1870677              -4.3925824                   359.8013
## 4             -13.3834484              -2.4904300             -10.8930185                   137.7743
## 5              -5.1854815              -0.7514531              -4.4340284                   188.4799
## 6              -7.7365481              -1.9046775              -5.8318707                   183.1920
##   data.foot.stat.distReso data.foot.stat.veloYaxsHorSd data.foot.stat.veloZaxsHorSd data.foot.stat.veloFric
## 1                    8.34                    0.7955893                    0.2713232               0.2025427
## 2                    8.34                    0.8590177                    0.2300000               0.2000000
## 3                    8.34                    1.2601763                    0.2300000               0.2000000
## 4                    8.34                    0.7332641                    0.2300000               0.2000000
## 5                    8.34                    0.7096286                    0.2300000               0.2000000
## 6                    8.34                    0.3789859                    0.2300000               0.2000000
##   data.foot.stat.distZaxsMeasDisp data.foot.stat.distZaxsRgh data.foot.stat.distZaxsAbl
## 1                            8.34                 0.04105708                       1000
## 2                            8.34                 0.27991938                       1000
## 3                            8.34                 0.21293225                       1000
## 4                            8.34                 0.83400000                       1000
## 5                            8.34                 0.83400000                       1000
## 6                            8.34                 0.83400000                       1000
##   data.foot.stat.distXaxs90 data.foot.stat.distXaxsMax data.foot.stat.distYaxs90 qfqm.fluxCo2.nsae.qfFinl
## 1                    325.26                     133.44                     25.02                        1
## 2                    266.88                     108.42                     50.04                        1
## 3                    275.22                     116.76                     66.72                        1
## 4                    208.50                      83.40                     75.06                        1
## 5                    208.50                      83.40                     66.72                        1
## 6                    208.50                      83.40                     41.70                        1
##   qfqm.fluxCo2.stor.qfFinl qfqm.fluxCo2.turb.qfFinl qfqm.fluxH2o.nsae.qfFinl qfqm.fluxH2o.stor.qfFinl
## 1                        1                        1                        1                        1
## 2                        1                        1                        1                        0
## 3                        1                        1                        1                        0
## 4                        1                        1                        1                        0
## 5                        1                        1                        1                        0
## 6                        1                        1                        1                        1
##   qfqm.fluxH2o.turb.qfFinl qfqm.fluxMome.turb.qfFinl qfqm.fluxTemp.nsae.qfFinl qfqm.fluxTemp.stor.qfFinl
## 1                        1                         0                         0                         0
## 2                        1                         0                         1                         0
## 3                        1                         1                         0                         0
## 4                        1                         1                         0                         0
## 5                        1                         0                         0                         0
## 6                        1                         0                         0                         0
##   qfqm.fluxTemp.turb.qfFinl qfqm.foot.turb.qfFinl
## 1                         0                     0
## 2                         1                     0
## 3                         0                     0
## 4                         0                     0
## 5                         0                     0
## 6                         0                     0

The variables and objDesc tables can help you interpret the column headers in the data table. The objDesc table contains definitions for many of the terms used in the eddy flux data product, but it isn't complete. To get the terms of interest, we'll break up the column headers into individual terms and look for them in the objDesc table:

term <- unlist(strsplit(names(flux$NIWO), split=".", fixed=T))
flux$objDesc[which(flux$objDesc$Object %in% term),]

##          Object
## 138 angZaxsErth
## 171        data
## 343      qfFinl
## 420        qfqm
## 604     timeBgn
## 605     timeEnd
##                                                                                                         Description
## 138                                                                                                 Wind direction 
## 171                                                                                          Represents data fields
## 343       The final quality flag indicating if the data are valid for the given aggregation period (1=fail, 0=pass)
## 420 Quality flag and quality metrics, represents quality flags and quality metrics that accompany the provided data
## 604                                                                    The beginning time of the aggregation period
## 605                                                                          The end time of the aggregation period

For the terms that aren't captured here, fluxCo2, fluxH2o, and fluxTemp are self-explanatory. The flux components are

  • turb: Turbulent flux
  • stor: Storage
  • nsae: Net surface-atmosphere exchange
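By definition, nsae is the sum of the storage and turbulent components, and the qfqm columns described above flag aggregation periods that failed quality checks (1=fail, 0=pass). A minimal sketch of both ideas, assuming the flux object created above:

```r
# nsae should equal stor + turb (up to rounding)
all.equal(flux$NIWO$data.fluxCo2.nsae.flux,
          flux$NIWO$data.fluxCo2.stor.flux +
          flux$NIWO$data.fluxCo2.turb.flux)

# keep only CO2 flux values that passed the final quality check
okay <- which(flux$NIWO$qfqm.fluxCo2.nsae.qfFinl == 0)
co2.clean <- flux$NIWO$data.fluxCo2.nsae.flux[okay]
```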

The variables table contains the units for each field:

flux$variables

##    category   system variable             stat           units
## 1      data  fluxCo2     nsae          timeBgn              NA
## 2      data  fluxCo2     nsae          timeEnd              NA
## 3      data  fluxCo2     nsae             flux umolCo2 m-2 s-1
## 4      data  fluxCo2     stor          timeBgn              NA
## 5      data  fluxCo2     stor          timeEnd              NA
## 6      data  fluxCo2     stor             flux umolCo2 m-2 s-1
## 7      data  fluxCo2     turb          timeBgn              NA
## 8      data  fluxCo2     turb          timeEnd              NA
## 9      data  fluxCo2     turb             flux umolCo2 m-2 s-1
## 10     data  fluxH2o     nsae          timeBgn              NA
## 11     data  fluxH2o     nsae          timeEnd              NA
## 12     data  fluxH2o     nsae             flux           W m-2
## 13     data  fluxH2o     stor          timeBgn              NA
## 14     data  fluxH2o     stor          timeEnd              NA
## 15     data  fluxH2o     stor             flux           W m-2
## 16     data  fluxH2o     turb          timeBgn              NA
## 17     data  fluxH2o     turb          timeEnd              NA
## 18     data  fluxH2o     turb             flux           W m-2
## 19     data fluxMome     turb          timeBgn              NA
## 20     data fluxMome     turb          timeEnd              NA
## 21     data fluxMome     turb         veloFric           m s-1
## 22     data fluxTemp     nsae          timeBgn              NA
## 23     data fluxTemp     nsae          timeEnd              NA
## 24     data fluxTemp     nsae             flux           W m-2
## 25     data fluxTemp     stor          timeBgn              NA
## 26     data fluxTemp     stor          timeEnd              NA
## 27     data fluxTemp     stor             flux           W m-2
## 28     data fluxTemp     turb          timeBgn              NA
## 29     data fluxTemp     turb          timeEnd              NA
## 30     data fluxTemp     turb             flux           W m-2
## 31     data     foot     stat          timeBgn              NA
## 32     data     foot     stat          timeEnd              NA
## 33     data     foot     stat      angZaxsErth             deg
## 34     data     foot     stat         distReso               m
## 35     data     foot     stat    veloYaxsHorSd           m s-1
## 36     data     foot     stat    veloZaxsHorSd           m s-1
## 37     data     foot     stat         veloFric           m s-1
## 38     data     foot     stat distZaxsMeasDisp               m
## 39     data     foot     stat      distZaxsRgh               m
## 40     data     foot     stat      distZaxsAbl               m
## 41     data     foot     stat       distXaxs90               m
## 42     data     foot     stat      distXaxsMax               m
## 43     data     foot     stat       distYaxs90               m
## 44     qfqm  fluxCo2     nsae          timeBgn              NA
## 45     qfqm  fluxCo2     nsae          timeEnd              NA
## 46     qfqm  fluxCo2     nsae           qfFinl              NA
## 47     qfqm  fluxCo2     stor           qfFinl              NA
## 48     qfqm  fluxCo2     stor          timeBgn              NA
## 49     qfqm  fluxCo2     stor          timeEnd              NA
## 50     qfqm  fluxCo2     turb          timeBgn              NA
## 51     qfqm  fluxCo2     turb          timeEnd              NA
## 52     qfqm  fluxCo2     turb           qfFinl              NA
## 53     qfqm  fluxH2o     nsae          timeBgn              NA
## 54     qfqm  fluxH2o     nsae          timeEnd              NA
## 55     qfqm  fluxH2o     nsae           qfFinl              NA
## 56     qfqm  fluxH2o     stor           qfFinl              NA
## 57     qfqm  fluxH2o     stor          timeBgn              NA
## 58     qfqm  fluxH2o     stor          timeEnd              NA
## 59     qfqm  fluxH2o     turb          timeBgn              NA
## 60     qfqm  fluxH2o     turb          timeEnd              NA
## 61     qfqm  fluxH2o     turb           qfFinl              NA
## 62     qfqm fluxMome     turb          timeBgn              NA
## 63     qfqm fluxMome     turb          timeEnd              NA
## 64     qfqm fluxMome     turb           qfFinl              NA
## 65     qfqm fluxTemp     nsae          timeBgn              NA
## 66     qfqm fluxTemp     nsae          timeEnd              NA
## 67     qfqm fluxTemp     nsae           qfFinl              NA
## 68     qfqm fluxTemp     stor           qfFinl              NA
## 69     qfqm fluxTemp     stor          timeBgn              NA
## 70     qfqm fluxTemp     stor          timeEnd              NA
## 71     qfqm fluxTemp     turb          timeBgn              NA
## 72     qfqm fluxTemp     turb          timeEnd              NA
## 73     qfqm fluxTemp     turb           qfFinl              NA
## 74     qfqm     foot     turb          timeBgn              NA
## 75     qfqm     foot     turb          timeEnd              NA
## 76     qfqm     foot     turb           qfFinl              NA

Let's plot some data! First, a brief aside about time stamps, since these are time series data.

Time stamps

NEON sensor data come with time stamps for both the start and end of the averaging period. Depending on the analysis you're doing, you may want to use one or the other; for general plotting, re-formatting, and transformations, I prefer to use the start time, because there are some small inconsistencies between data products in a few of the end time stamps.

Note that all NEON data use UTC time, which is equivalent to Greenwich Mean Time. This is true across NEON's instrumented, observational, and airborne measurements. When working with NEON data, it's best to keep everything in UTC as much as possible; otherwise it's very easy to end up with mismatched time stamps, which can cause insidious and hard-to-detect problems. In the code below, time stamps and time zones have been handled by stackEddy() and loadByProduct(), so we don't need to do anything additional. But if you're writing your own code and need to convert times, remember that if the time zone isn't specified, R will default to the local time zone it detects on your operating system.
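
As a standalone illustration of the pitfall (not NEON data), always set tz explicitly when creating or displaying times:

```r
# With tz specified, the time is unambiguous; displaying it in another
# zone shifts the clock time but not the underlying instant
t.utc <- as.POSIXct("2018-07-07 12:00:00", tz="UTC")
format(t.utc, tz="America/Denver", usetz=TRUE)
## "2018-07-07 06:00:00 MDT"
```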

plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn, 
     pch=".", xlab="Date", ylab="CO2 flux")

There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.

Let's trim down to just two days of data to see a few other details.

plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn, 
     pch=20, xlab="Date", ylab="CO2 flux",
     xlim=c(as.POSIXct("2018-07-07", tz="GMT"),
            as.POSIXct("2018-07-09", tz="GMT")),
     ylim=c(-20,20), xaxt="n")
axis.POSIXct(1, x=flux$NIWO$timeBgn, 
             format="%Y-%m-%d %H:%M:%S")

Note the timing of C uptake: because the plot is in UTC, uptake appears to occur at night. In local Mountain time, those hours fall during the day.

4. Merge flux data with other sensor data

Many of the data sets we would use to interpret and model flux data are measured as part of the NEON project, but are not present in the eddy flux data product bundle. In this section, we'll download PAR data and merge them with the flux data; the steps taken here can be applied to any of the NEON instrumented (IS) data products.

Download PAR data

To get NEON PAR data, use the loadByProduct() function from the neonUtilities package. loadByProduct() takes the same inputs as zipsByProduct(), but it loads the downloaded data directly into the current R environment.

Let's download PAR data matching the Niwot Ridge flux data. The inputs needed are:

  • dpID: DP1.00024.001
  • site: NIWO
  • startdate: 2018-06
  • enddate: 2018-07
  • package: basic
  • timeIndex: 30

The new input here is timeIndex=30, which downloads only the 30-minute data. Since the flux data are at a 30-minute resolution, we can save on download time by skipping the 1-minute data files (which are roughly 30 times larger). The timeIndex input can be left off if you want to download all available averaging intervals.

pr <- loadByProduct("DP1.00024.001", site="NIWO", 
                    timeIndex=30, package="basic", 
                    startdate="2018-06", enddate="2018-07",
                    check.size=F)

pr is another named list, and again, metadata and units can be found in the variables table. The PARPAR_30min table contains a verticalPosition field, which indicates the position on the tower: 10 is the first tower level, and 20, 30, etc. continue up the tower.

Join PAR to flux data

We'll connect PAR data from the tower top to the flux data.

pr.top <- pr$PARPAR_30min[which(pr$PARPAR_30min$verticalPosition==
                                max(pr$PARPAR_30min$verticalPosition)),]

As noted above, loadByProduct() automatically converts time stamps to a recognized date-time format when it reads the data. However, the field names for the time stamps differ between the flux data and the other meteorological data: the start of the averaging interval is timeBgn in the flux data and startDateTime in the PAR data.

Let's create a new variable in the PAR data:

pr.top$timeBgn <- pr.top$startDateTime

And now use the matching time stamp fields to merge the flux and PAR data.

fx.pr <- merge(pr.top, flux$NIWO, by="timeBgn")
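
By default, merge() performs an inner join: only rows whose key value appears in both tables are kept, so half-hours missing from either dataset are dropped. A toy sketch with hypothetical values:

```r
# Hypothetical values (not NEON data): only keys present in both
# tables survive a default merge()
a <- data.frame(timeBgn=c("t1","t2","t3"), par=c(100, 250, 400))
b <- data.frame(timeBgn=c("t2","t3","t4"), flux=c(-2.1, -5.3, 0.8))
merge(a, b, by="timeBgn")
##   timeBgn par flux
## 1      t2 250 -2.1
## 2      t3 400 -5.3
```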

And now we can plot net carbon exchange as a function of light availability:

plot(fx.pr$data.fluxCo2.nsae.flux~fx.pr$PARMean,
     pch=".", ylim=c(-20,20),
     xlab="PAR", ylab="CO2 flux")

If you're interested in data in the eddy covariance bundle besides the net flux data, the rest of this tutorial will guide you through how to get those data out of the bundle.

5. Vertical profile data (Level 3)

The Level 3 (dp03) data are the spatially interpolated profiles of the rates of change of CO2, H2O, and temperature. Extract the Level 3 data from the HDF5 file using stackEddy() with the same syntax as for the Level 4 data.

prof <- stackEddy(filepath="~/Downloads/filesToStack00200/",
                  level="dp03")

As with the Level 4 data, the result is a named list with data tables for each site.

head(prof$NIWO)

##      timeBgn             timeEnd data.co2Stor.rateRtioMoleDryCo2.X0.1.m data.co2Stor.rateRtioMoleDryCo2.X0.2.m
## 1 2018-06-01 2018-06-01 00:29:59                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X0.3.m data.co2Stor.rateRtioMoleDryCo2.X0.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X0.5.m data.co2Stor.rateRtioMoleDryCo2.X0.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X0.7.m data.co2Stor.rateRtioMoleDryCo2.X0.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X0.9.m data.co2Stor.rateRtioMoleDryCo2.X1.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X1.1.m data.co2Stor.rateRtioMoleDryCo2.X1.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X1.3.m data.co2Stor.rateRtioMoleDryCo2.X1.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X1.5.m data.co2Stor.rateRtioMoleDryCo2.X1.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X1.7.m data.co2Stor.rateRtioMoleDryCo2.X1.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X1.9.m data.co2Stor.rateRtioMoleDryCo2.X2.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X2.1.m data.co2Stor.rateRtioMoleDryCo2.X2.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X2.3.m data.co2Stor.rateRtioMoleDryCo2.X2.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X2.5.m data.co2Stor.rateRtioMoleDryCo2.X2.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X2.7.m data.co2Stor.rateRtioMoleDryCo2.X2.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X2.9.m data.co2Stor.rateRtioMoleDryCo2.X3.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X3.1.m data.co2Stor.rateRtioMoleDryCo2.X3.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X3.3.m data.co2Stor.rateRtioMoleDryCo2.X3.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X3.5.m data.co2Stor.rateRtioMoleDryCo2.X3.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X3.7.m data.co2Stor.rateRtioMoleDryCo2.X3.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X3.9.m data.co2Stor.rateRtioMoleDryCo2.X4.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X4.1.m data.co2Stor.rateRtioMoleDryCo2.X4.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X4.3.m data.co2Stor.rateRtioMoleDryCo2.X4.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X4.5.m data.co2Stor.rateRtioMoleDryCo2.X4.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X4.7.m data.co2Stor.rateRtioMoleDryCo2.X4.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X4.9.m data.co2Stor.rateRtioMoleDryCo2.X5.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X5.1.m data.co2Stor.rateRtioMoleDryCo2.X5.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X5.3.m data.co2Stor.rateRtioMoleDryCo2.X5.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X5.5.m data.co2Stor.rateRtioMoleDryCo2.X5.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X5.7.m data.co2Stor.rateRtioMoleDryCo2.X5.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X5.9.m data.co2Stor.rateRtioMoleDryCo2.X6.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X6.1.m data.co2Stor.rateRtioMoleDryCo2.X6.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X6.3.m data.co2Stor.rateRtioMoleDryCo2.X6.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X6.5.m data.co2Stor.rateRtioMoleDryCo2.X6.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X6.7.m data.co2Stor.rateRtioMoleDryCo2.X6.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X6.9.m data.co2Stor.rateRtioMoleDryCo2.X7.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X7.1.m data.co2Stor.rateRtioMoleDryCo2.X7.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X7.3.m data.co2Stor.rateRtioMoleDryCo2.X7.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X7.5.m data.co2Stor.rateRtioMoleDryCo2.X7.6.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X7.7.m data.co2Stor.rateRtioMoleDryCo2.X7.8.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X7.9.m data.co2Stor.rateRtioMoleDryCo2.X8.m
## 1                          -0.0002681938                        -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X8.1.m data.co2Stor.rateRtioMoleDryCo2.X8.2.m
## 1                          -0.0002681938                          -0.0002681938
##   data.co2Stor.rateRtioMoleDryCo2.X8.3.m data.co2Stor.rateRtioMoleDryCo2.X8.4.m
## 1                          -0.0002681938                          -0.0002681938
##   data.h2oStor.rateRtioMoleDryH2o.X0.1.m data.h2oStor.rateRtioMoleDryH2o.X0.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X0.3.m data.h2oStor.rateRtioMoleDryH2o.X0.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X0.5.m data.h2oStor.rateRtioMoleDryH2o.X0.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X0.7.m data.h2oStor.rateRtioMoleDryH2o.X0.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X0.9.m data.h2oStor.rateRtioMoleDryH2o.X1.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X1.1.m data.h2oStor.rateRtioMoleDryH2o.X1.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X1.3.m data.h2oStor.rateRtioMoleDryH2o.X1.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X1.5.m data.h2oStor.rateRtioMoleDryH2o.X1.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X1.7.m data.h2oStor.rateRtioMoleDryH2o.X1.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X1.9.m data.h2oStor.rateRtioMoleDryH2o.X2.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X2.1.m data.h2oStor.rateRtioMoleDryH2o.X2.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X2.3.m data.h2oStor.rateRtioMoleDryH2o.X2.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X2.5.m data.h2oStor.rateRtioMoleDryH2o.X2.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X2.7.m data.h2oStor.rateRtioMoleDryH2o.X2.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X2.9.m data.h2oStor.rateRtioMoleDryH2o.X3.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X3.1.m data.h2oStor.rateRtioMoleDryH2o.X3.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X3.3.m data.h2oStor.rateRtioMoleDryH2o.X3.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X3.5.m data.h2oStor.rateRtioMoleDryH2o.X3.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X3.7.m data.h2oStor.rateRtioMoleDryH2o.X3.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X3.9.m data.h2oStor.rateRtioMoleDryH2o.X4.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X4.1.m data.h2oStor.rateRtioMoleDryH2o.X4.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X4.3.m data.h2oStor.rateRtioMoleDryH2o.X4.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X4.5.m data.h2oStor.rateRtioMoleDryH2o.X4.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X4.7.m data.h2oStor.rateRtioMoleDryH2o.X4.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X4.9.m data.h2oStor.rateRtioMoleDryH2o.X5.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X5.1.m data.h2oStor.rateRtioMoleDryH2o.X5.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X5.3.m data.h2oStor.rateRtioMoleDryH2o.X5.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X5.5.m data.h2oStor.rateRtioMoleDryH2o.X5.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X5.7.m data.h2oStor.rateRtioMoleDryH2o.X5.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X5.9.m data.h2oStor.rateRtioMoleDryH2o.X6.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X6.1.m data.h2oStor.rateRtioMoleDryH2o.X6.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X6.3.m data.h2oStor.rateRtioMoleDryH2o.X6.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X6.5.m data.h2oStor.rateRtioMoleDryH2o.X6.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X6.7.m data.h2oStor.rateRtioMoleDryH2o.X6.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X6.9.m data.h2oStor.rateRtioMoleDryH2o.X7.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X7.1.m data.h2oStor.rateRtioMoleDryH2o.X7.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X7.3.m data.h2oStor.rateRtioMoleDryH2o.X7.4.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X7.5.m data.h2oStor.rateRtioMoleDryH2o.X7.6.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X7.7.m data.h2oStor.rateRtioMoleDryH2o.X7.8.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X7.9.m data.h2oStor.rateRtioMoleDryH2o.X8.m
## 1                            0.000315911                          0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X8.1.m data.h2oStor.rateRtioMoleDryH2o.X8.2.m
## 1                            0.000315911                            0.000315911
##   data.h2oStor.rateRtioMoleDryH2o.X8.3.m data.h2oStor.rateRtioMoleDryH2o.X8.4.m data.tempStor.rateTemp.X0.1.m
## 1                            0.000315911                            0.000315911                 -0.0001014444
##   data.tempStor.rateTemp.X0.2.m data.tempStor.rateTemp.X0.3.m data.tempStor.rateTemp.X0.4.m
## 1                 -0.0001014444                 -0.0001014444                 -0.0001014444
##   data.tempStor.rateTemp.X0.5.m data.tempStor.rateTemp.X0.6.m data.tempStor.rateTemp.X0.7.m
## 1                 -0.0001014444                 -0.0001050874                  -0.000111159
##   data.tempStor.rateTemp.X0.8.m data.tempStor.rateTemp.X0.9.m data.tempStor.rateTemp.X1.m
## 1                 -0.0001172305                 -0.0001233021               -0.0001293737
##   data.tempStor.rateTemp.X1.1.m data.tempStor.rateTemp.X1.2.m data.tempStor.rateTemp.X1.3.m
## 1                 -0.0001354453                 -0.0001415168                 -0.0001475884
##   data.tempStor.rateTemp.X1.4.m data.tempStor.rateTemp.X1.5.m data.tempStor.rateTemp.X1.6.m
## 1                   -0.00015366                 -0.0001597315                 -0.0001658031
##   data.tempStor.rateTemp.X1.7.m data.tempStor.rateTemp.X1.8.m data.tempStor.rateTemp.X1.9.m
## 1                 -0.0001718747                 -0.0001779463                 -0.0001840178
##   data.tempStor.rateTemp.X2.m data.tempStor.rateTemp.X2.1.m data.tempStor.rateTemp.X2.2.m
## 1                -0.000185739                 -0.0001869767                 -0.0001882144
##   data.tempStor.rateTemp.X2.3.m data.tempStor.rateTemp.X2.4.m data.tempStor.rateTemp.X2.5.m
## 1                 -0.0001894521                 -0.0001906899                 -0.0001919276
##   data.tempStor.rateTemp.X2.6.m data.tempStor.rateTemp.X2.7.m data.tempStor.rateTemp.X2.8.m
## 1                 -0.0001931653                 -0.0001944031                 -0.0001956408
##   data.tempStor.rateTemp.X2.9.m data.tempStor.rateTemp.X3.m data.tempStor.rateTemp.X3.1.m
## 1                 -0.0001968785               -0.0001981162                  -0.000199354
##   data.tempStor.rateTemp.X3.2.m data.tempStor.rateTemp.X3.3.m data.tempStor.rateTemp.X3.4.m
## 1                 -0.0002005917                 -0.0002018294                 -0.0002030672
##   data.tempStor.rateTemp.X3.5.m data.tempStor.rateTemp.X3.6.m data.tempStor.rateTemp.X3.7.m
## 1                 -0.0002043049                 -0.0002055426                 -0.0002067803
##   data.tempStor.rateTemp.X3.8.m data.tempStor.rateTemp.X3.9.m data.tempStor.rateTemp.X4.m
## 1                 -0.0002080181                 -0.0002092558               -0.0002104935
##   data.tempStor.rateTemp.X4.1.m data.tempStor.rateTemp.X4.2.m data.tempStor.rateTemp.X4.3.m
## 1                 -0.0002117313                  -0.000212969                 -0.0002142067
##   data.tempStor.rateTemp.X4.4.m data.tempStor.rateTemp.X4.5.m data.tempStor.rateTemp.X4.6.m
## 1                 -0.0002154444                 -0.0002172161                 -0.0002189878
##   data.tempStor.rateTemp.X4.7.m data.tempStor.rateTemp.X4.8.m data.tempStor.rateTemp.X4.9.m
## 1                 -0.0002207595                 -0.0002225312                 -0.0002243029
##   data.tempStor.rateTemp.X5.m data.tempStor.rateTemp.X5.1.m data.tempStor.rateTemp.X5.2.m
## 1               -0.0002260746                 -0.0002278463                  -0.000229618
##   data.tempStor.rateTemp.X5.3.m data.tempStor.rateTemp.X5.4.m data.tempStor.rateTemp.X5.5.m
## 1                 -0.0002313896                 -0.0002331613                  -0.000234933
##   data.tempStor.rateTemp.X5.6.m data.tempStor.rateTemp.X5.7.m data.tempStor.rateTemp.X5.8.m
## 1                 -0.0002367047                 -0.0002384764                 -0.0002402481
##   data.tempStor.rateTemp.X5.9.m data.tempStor.rateTemp.X6.m data.tempStor.rateTemp.X6.1.m
## 1                 -0.0002420198               -0.0002437915                 -0.0002455631
##   data.tempStor.rateTemp.X6.2.m data.tempStor.rateTemp.X6.3.m data.tempStor.rateTemp.X6.4.m
## 1                 -0.0002473348                 -0.0002491065                 -0.0002508782
##   data.tempStor.rateTemp.X6.5.m data.tempStor.rateTemp.X6.6.m data.tempStor.rateTemp.X6.7.m
## 1                 -0.0002526499                 -0.0002544216                 -0.0002561933
##   data.tempStor.rateTemp.X6.8.m data.tempStor.rateTemp.X6.9.m data.tempStor.rateTemp.X7.m
## 1                  -0.000257965                 -0.0002597367               -0.0002615083
##   data.tempStor.rateTemp.X7.1.m data.tempStor.rateTemp.X7.2.m data.tempStor.rateTemp.X7.3.m
## 1                   -0.00026328                 -0.0002650517                 -0.0002668234
##   data.tempStor.rateTemp.X7.4.m data.tempStor.rateTemp.X7.5.m data.tempStor.rateTemp.X7.6.m
## 1                 -0.0002685951                 -0.0002703668                 -0.0002721385
##   data.tempStor.rateTemp.X7.7.m data.tempStor.rateTemp.X7.8.m data.tempStor.rateTemp.X7.9.m
## 1                 -0.0002739102                 -0.0002756819                 -0.0002774535
##   data.tempStor.rateTemp.X8.m data.tempStor.rateTemp.X8.1.m data.tempStor.rateTemp.X8.2.m
## 1               -0.0002792252                 -0.0002809969                 -0.0002827686
##   data.tempStor.rateTemp.X8.3.m data.tempStor.rateTemp.X8.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.1.m
## 1                 -0.0002845403                  -0.000286312                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X0.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.3.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X0.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.5.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X0.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.7.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X0.8.m qfqm.co2Stor.rateRtioMoleDryCo2.X0.9.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X1.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.1.m
## 1                                    1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X1.2.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.3.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X1.4.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.5.m
## 1                                      1                                      1
##   qfqm.co2Stor.rateRtioMoleDryCo2.X1.6.m qfqm.co2Stor.rateRtioMoleDryCo2.X1.7.m
## 1                                      1                                      1
##   ...
##   qfqm.h2oStor.rateRtioMoleDryH2o.X0.2.m qfqm.h2oStor.rateRtioMoleDryH2o.X0.3.m
## 1                                      1                                      1
##   ...
##   qfqm.tempStor.rateTemp.X0.3.m qfqm.tempStor.rateTemp.X0.4.m qfqm.tempStor.rateTemp.X0.5.m
## 1                             0                             0                             0
##   ...
##   qfqm.tempStor.rateTemp.X8.4.m
## 1                             0
##  [ reached 'max' / getOption("max.print") -- omitted 5 rows ]

6. Un-interpolated vertical profile data (Level 2)

The Level 2 (dp02) data are interpolated in time but not in space: they contain the rates of change at each of the measurement heights.

Again, they can be extracted from the HDF5 files using stackEddy() with the same syntax:

prof.l2 <- stackEddy(filepath="~/Downloads/filesToStack00200/",
                     level="dp02")



head(prof.l2$HARV)

##   verticalPosition             timeBgn             timeEnd data.co2Stor.rateRtioMoleDryCo2.mean
## 1              010 2018-06-01 00:00:00 2018-06-01 00:29:59                                  NaN
## 2              010 2018-06-01 00:30:00 2018-06-01 00:59:59                          0.002666576
## 3              010 2018-06-01 01:00:00 2018-06-01 01:29:59                         -0.011224223
## 4              010 2018-06-01 01:30:00 2018-06-01 01:59:59                          0.006133056
## 5              010 2018-06-01 02:00:00 2018-06-01 02:29:59                         -0.019554655
## 6              010 2018-06-01 02:30:00 2018-06-01 02:59:59                         -0.007855632
##   data.h2oStor.rateRtioMoleDryH2o.mean data.tempStor.rateTemp.mean qfqm.co2Stor.rateRtioMoleDryCo2.qfFinl
## 1                                  NaN                2.583333e-05                                      1
## 2                                  NaN               -2.008056e-04                                      1
## 3                                  NaN               -1.901111e-04                                      1
## 4                                  NaN               -7.419444e-05                                      1
## 5                                  NaN               -1.537083e-04                                      1
## 6                                  NaN               -1.874861e-04                                      1
##   qfqm.h2oStor.rateRtioMoleDryH2o.qfFinl qfqm.tempStor.rateTemp.qfFinl
## 1                                      1                             0
## 2                                      1                             0
## 3                                      1                             0
## 4                                      1                             0
## 5                                      1                             0
## 6                                      1                             0

Note that here, as in the PAR data, there is a verticalPosition field indicating the tower level of the measurement.
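To see how the verticalPosition field can be used, the profile data can be subset to a single tower level and plotted over time. This is a minimal sketch, assuming prof.l2 was created by the stackEddy() call above, that level "010" is present (as in the output shown), and that timeBgn is returned as a POSIXct time stamp; the column name is taken from the head() output.

```r
# Sketch: time series of the CO2 storage rate of change at one tower level.
# Assumes prof.l2 exists from the stackEddy() call above; "010" is the
# lowest level in the output shown.
harv <- prof.l2$HARV
lvl010 <- harv[harv$verticalPosition == "010", ]
plot(lvl010$timeBgn,
     lvl010$data.co2Stor.rateRtioMoleDryCo2.mean,
     type = "l", xlab = "Time",
     ylab = "CO2 rate of change")
```

The same subset-and-plot pattern works for any other level or variable in the table.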

7. Calibrated raw data (Level 1)

Level 1 (dp01) data are calibrated and aggregated in time, but otherwise untransformed. Use Level 1 data for raw gas concentrations and atmospheric stable isotopes.

Using stackEddy() to extract Level 1 data requires additional inputs. The Level 1 files are too large to pull out every variable by default, and they include multiple averaging intervals, which can't be merged into a single table. So two additional inputs are needed:

  • avg: The averaging interval to extract
  • var: One or more variables to extract

What variables are available, and at what averaging intervals? Another function in the neonUtilities package, getVarsEddy(), returns a list of HDF5 file contents. It requires only one input, the filepath to a single NEON HDF5 file:

vars <- getVarsEddy("~/Downloads/filesToStack00200/NEON.D01.HARV.DP4.00200.001.nsae.2018-07.basic.20201020T201317Z.h5")
head(vars)

##    site level category system hor ver tmi       name       otype   dclass   dim  oth
## 5  HARV  dp01     data   amrs 000 060 01m angNedXaxs H5I_DATASET COMPOUND 43200 <NA>
## 6  HARV  dp01     data   amrs 000 060 01m angNedYaxs H5I_DATASET COMPOUND 43200 <NA>
## 7  HARV  dp01     data   amrs 000 060 01m angNedZaxs H5I_DATASET COMPOUND 43200 <NA>
## 9  HARV  dp01     data   amrs 000 060 30m angNedXaxs H5I_DATASET COMPOUND  1440 <NA>
## 10 HARV  dp01     data   amrs 000 060 30m angNedYaxs H5I_DATASET COMPOUND  1440 <NA>
## 11 HARV  dp01     data   amrs 000 060 30m angNedZaxs H5I_DATASET COMPOUND  1440 <NA>

Inputs to var can be any values from the name field in the table returned by getVarsEddy(). Let's extract CO2 and H2O concentrations, along with 13C in CO2 and 18O in H2O, at the 30-minute averaging interval, using the Harvard Forest data, since deeper canopies generally have more interesting profiles:

iso <- stackEddy(filepath="~/Downloads/filesToStack00200/",
                 level="dp01", var=c("rtioMoleDryCo2","rtioMoleDryH2o",
                                     "dlta13CCo2","dlta18OH2o"), avg=30)



head(iso$HARV)

##   verticalPosition             timeBgn             timeEnd data.co2Stor.rtioMoleDryCo2.mean
## 1              010 2018-06-01 00:00:00 2018-06-01 00:29:59                         509.3375
## 2              010 2018-06-01 00:30:00 2018-06-01 00:59:59                         502.2736
## 3              010 2018-06-01 01:00:00 2018-06-01 01:29:59                         521.6139
## 4              010 2018-06-01 01:30:00 2018-06-01 01:59:59                         469.6317
## 5              010 2018-06-01 02:00:00 2018-06-01 02:29:59                         484.7725
## 6              010 2018-06-01 02:30:00 2018-06-01 02:59:59                         476.8554
##   data.co2Stor.rtioMoleDryCo2.min data.co2Stor.rtioMoleDryCo2.max data.co2Stor.rtioMoleDryCo2.vari
## 1                        451.4786                        579.3518                         845.0795
## 2                        463.5470                        533.6622                         161.3652
## 3                        442.8649                        563.0518                         547.9924
## 4                        432.6588                        508.7463                         396.8379
## 5                        436.2842                        537.4641                         662.9449
## 6                        443.7055                        515.6598                         246.6969
##   data.co2Stor.rtioMoleDryCo2.numSamp data.co2Turb.rtioMoleDryCo2.mean data.co2Turb.rtioMoleDryCo2.min
## 1                                 235                               NA                              NA
## 2                                 175                               NA                              NA
## 3                                 235                               NA                              NA
## 4                                 175                               NA                              NA
## 5                                 235                               NA                              NA
## 6                                 175                               NA                              NA
##   data.co2Turb.rtioMoleDryCo2.max data.co2Turb.rtioMoleDryCo2.vari data.co2Turb.rtioMoleDryCo2.numSamp
## 1                              NA                               NA                                  NA
## 2                              NA                               NA                                  NA
## 3                              NA                               NA                                  NA
## 4                              NA                               NA                                  NA
## 5                              NA                               NA                                  NA
## 6                              NA                               NA                                  NA
##   data.h2oStor.rtioMoleDryH2o.mean data.h2oStor.rtioMoleDryH2o.min data.h2oStor.rtioMoleDryH2o.max
## 1                              NaN                             NaN                             NaN
## 2                              NaN                             NaN                             NaN
## 3                              NaN                             NaN                             NaN
## 4                              NaN                             NaN                             NaN
## 5                              NaN                             NaN                             NaN
## 6                              NaN                             NaN                             NaN
##   data.h2oStor.rtioMoleDryH2o.vari data.h2oStor.rtioMoleDryH2o.numSamp data.h2oTurb.rtioMoleDryH2o.mean
## 1                               NA                                   0                               NA
## 2                               NA                                   0                               NA
## 3                               NA                                   0                               NA
## 4                               NA                                   0                               NA
## 5                               NA                                   0                               NA
## 6                               NA                                   0                               NA
##   data.h2oTurb.rtioMoleDryH2o.min data.h2oTurb.rtioMoleDryH2o.max data.h2oTurb.rtioMoleDryH2o.vari
## 1                              NA                              NA                               NA
## 2                              NA                              NA                               NA
## 3                              NA                              NA                               NA
## 4                              NA                              NA                               NA
## 5                              NA                              NA                               NA
## 6                              NA                              NA                               NA
##   data.h2oTurb.rtioMoleDryH2o.numSamp data.isoCo2.dlta13CCo2.mean data.isoCo2.dlta13CCo2.min
## 1                                  NA                         NaN                        NaN
## 2                                  NA                   -11.40646                    -14.992
## 3                                  NA                         NaN                        NaN
## 4                                  NA                   -10.69318                    -14.065
## 5                                  NA                         NaN                        NaN
## 6                                  NA                   -11.02814                    -13.280
##   data.isoCo2.dlta13CCo2.max data.isoCo2.dlta13CCo2.vari data.isoCo2.dlta13CCo2.numSamp
## 1                        NaN                          NA                              0
## 2                     -8.022                   1.9624355                            305
## 3                        NaN                          NA                              0
## 4                     -7.385                   1.5766385                            304
## 5                        NaN                          NA                              0
## 6                     -7.966                   0.9929341                            308
##   data.isoCo2.rtioMoleDryCo2.mean data.isoCo2.rtioMoleDryCo2.min data.isoCo2.rtioMoleDryCo2.max
## 1                             NaN                            NaN                            NaN
## 2                        458.3546                        415.875                        531.066
## 3                             NaN                            NaN                            NaN
## 4                        439.9582                        415.777                        475.736
## 5                             NaN                            NaN                            NaN
## 6                        446.5563                        420.845                        468.312
##   data.isoCo2.rtioMoleDryCo2.vari data.isoCo2.rtioMoleDryCo2.numSamp data.isoCo2.rtioMoleDryH2o.mean
## 1                              NA                                  0                             NaN
## 2                        953.2212                                306                        22.11830
## 3                              NA                                  0                             NaN
## 4                        404.0365                                306                        22.38925
## 5                              NA                                  0                             NaN
## 6                        138.7560                                309                        22.15731
##   data.isoCo2.rtioMoleDryH2o.min data.isoCo2.rtioMoleDryH2o.max data.isoCo2.rtioMoleDryH2o.vari
## 1                            NaN                            NaN                              NA
## 2                       21.85753                       22.34854                      0.01746926
## 3                            NaN                            NaN                              NA
## 4                       22.09775                       22.59945                      0.02626762
## 5                            NaN                            NaN                              NA
## 6                       22.06641                       22.26493                      0.00277579
##   data.isoCo2.rtioMoleDryH2o.numSamp data.isoH2o.dlta18OH2o.mean data.isoH2o.dlta18OH2o.min
## 1                                  0                         NaN                        NaN
## 2                                 85                   -12.24437                    -12.901
## 3                                  0                         NaN                        NaN
## 4                                 84                   -12.04580                    -12.787
## 5                                  0                         NaN                        NaN
## 6                                 80                   -11.81500                    -12.375
##   data.isoH2o.dlta18OH2o.max data.isoH2o.dlta18OH2o.vari data.isoH2o.dlta18OH2o.numSamp
## 1                        NaN                          NA                              0
## 2                    -11.569                  0.03557313                            540
## 3                        NaN                          NA                              0
## 4                    -11.542                  0.03970481                            539
## 5                        NaN                          NA                              0
## 6                    -11.282                  0.03498614                            540
##   data.isoH2o.rtioMoleDryH2o.mean data.isoH2o.rtioMoleDryH2o.min data.isoH2o.rtioMoleDryH2o.max
## 1                             NaN                            NaN                            NaN
## 2                        20.89354                       20.36980                       21.13160
## 3                             NaN                            NaN                            NaN
## 4                        21.12872                       20.74663                       21.33272
## 5                             NaN                            NaN                            NaN
## 6                        20.93480                       20.63463                       21.00702
##   data.isoH2o.rtioMoleDryH2o.vari data.isoH2o.rtioMoleDryH2o.numSamp qfqm.co2Stor.rtioMoleDryCo2.qfFinl
## 1                              NA                                  0                                  1
## 2                     0.025376207                                540                                  1
## 3                              NA                                  0                                  1
## 4                     0.017612293                                540                                  1
## 5                              NA                                  0                                  1
## 6                     0.003805751                                540                                  1
##   qfqm.co2Turb.rtioMoleDryCo2.qfFinl qfqm.h2oStor.rtioMoleDryH2o.qfFinl qfqm.h2oTurb.rtioMoleDryH2o.qfFinl
## 1                                 NA                                  1                                 NA
## 2                                 NA                                  1                                 NA
## 3                                 NA                                  1                                 NA
## 4                                 NA                                  1                                 NA
## 5                                 NA                                  1                                 NA
## 6                                 NA                                  1                                 NA
##   qfqm.isoCo2.dlta13CCo2.qfFinl qfqm.isoCo2.rtioMoleDryCo2.qfFinl qfqm.isoCo2.rtioMoleDryH2o.qfFinl
## 1                             1                                 1                                 1
## 2                             0                                 0                                 0
## 3                             1                                 1                                 1
## 4                             0                                 0                                 0
## 5                             1                                 1                                 1
## 6                             0                                 0                                 0
##   qfqm.isoH2o.dlta18OH2o.qfFinl qfqm.isoH2o.rtioMoleDryH2o.qfFinl ucrt.co2Stor.rtioMoleDryCo2.mean
## 1                             1                                 1                       10.0248527
## 2                             0                                 0                        1.1077243
## 3                             1                                 1                        7.5181428
## 4                             0                                 0                        8.4017805
## 5                             1                                 1                        0.9465824
## 6                             0                                 0                        1.3629090
##   ucrt.co2Stor.rtioMoleDryCo2.vari ucrt.co2Stor.rtioMoleDryCo2.se ucrt.co2Turb.rtioMoleDryCo2.mean
## 1                        170.28091                      1.8963340                               NA
## 2                         34.29589                      0.9602536                               NA
## 3                        151.35746                      1.5270503                               NA
## 4                         93.41077                      1.5058703                               NA
## 5                         14.02753                      1.6795958                               NA
## 6                          8.50861                      1.1873064                               NA
##   ucrt.co2Turb.rtioMoleDryCo2.vari ucrt.co2Turb.rtioMoleDryCo2.se ucrt.h2oStor.rtioMoleDryH2o.mean
## 1                               NA                             NA                               NA
## 2                               NA                             NA                               NA
## 3                               NA                             NA                               NA
## 4                               NA                             NA                               NA
## 5                               NA                             NA                               NA
## 6                               NA                             NA                               NA
##   ucrt.h2oStor.rtioMoleDryH2o.vari ucrt.h2oStor.rtioMoleDryH2o.se ucrt.h2oTurb.rtioMoleDryH2o.mean
## 1                               NA                             NA                               NA
## 2                               NA                             NA                               NA
## 3                               NA                             NA                               NA
## 4                               NA                             NA                               NA
## 5                               NA                             NA                               NA
## 6                               NA                             NA                               NA
##   ucrt.h2oTurb.rtioMoleDryH2o.vari ucrt.h2oTurb.rtioMoleDryH2o.se ucrt.isoCo2.dlta13CCo2.mean
## 1                               NA                             NA                         NaN
## 2                               NA                             NA                   0.5812574
## 3                               NA                             NA                         NaN
## 4                               NA                             NA                   0.3653442
## 5                               NA                             NA                         NaN
## 6                               NA                             NA                   0.2428672
##   ucrt.isoCo2.dlta13CCo2.vari ucrt.isoCo2.dlta13CCo2.se ucrt.isoCo2.rtioMoleDryCo2.mean
## 1                         NaN                        NA                             NaN
## 2                   0.6827844                0.08021356                       16.931819
## 3                         NaN                        NA                             NaN
## 4                   0.3761155                0.07201605                       10.078698
## 5                         NaN                        NA                             NaN
## 6                   0.1544487                0.05677862                        7.140787
##   ucrt.isoCo2.rtioMoleDryCo2.vari ucrt.isoCo2.rtioMoleDryCo2.se ucrt.isoCo2.rtioMoleDryH2o.mean
## 1                             NaN                            NA                             NaN
## 2                       614.01630                      1.764965                      0.08848440
## 3                             NaN                            NA                             NaN
## 4                       196.99445                      1.149078                      0.08917388
## 5                             NaN                            NA                             NaN
## 6                        55.90843                      0.670111                              NA
##   ucrt.isoCo2.rtioMoleDryH2o.vari ucrt.isoCo2.rtioMoleDryH2o.se ucrt.isoH2o.dlta18OH2o.mean
## 1                             NaN                            NA                         NaN
## 2                      0.01226428                   0.014335993                  0.02544454
## 3                             NaN                            NA                         NaN
## 4                      0.01542679                   0.017683602                  0.01373503
## 5                             NaN                            NA                         NaN
## 6                              NA                   0.005890447                  0.01932110
##   ucrt.isoH2o.dlta18OH2o.vari ucrt.isoH2o.dlta18OH2o.se ucrt.isoH2o.rtioMoleDryH2o.mean
## 1                         NaN                        NA                             NaN
## 2                 0.003017400               0.008116413                      0.06937514
## 3                         NaN                        NA                             NaN
## 4                 0.002704220               0.008582764                      0.08489408
## 5                         NaN                        NA                             NaN
## 6                 0.002095066               0.008049170                      0.02813808
##   ucrt.isoH2o.rtioMoleDryH2o.vari ucrt.isoH2o.rtioMoleDryH2o.se
## 1                             NaN                            NA
## 2                     0.009640249                   0.006855142
## 3                             NaN                            NA
## 4                     0.008572288                   0.005710986
## 5                             NaN                            NA
## 6                     0.002551672                   0.002654748

Let's plot vertical profiles of CO2 and 13C in CO2 on a single day.

Here we'll use the time stamps in a different way: grep() selects all of the records for a single day. We also discard the records whose verticalPosition values are non-numeric strings - those are the calibration gases, not tower measurement levels.

# subset to records whose start time falls on the target day
iso.d <- iso$HARV[grep("2018-06-25", iso$HARV$timeBgn, fixed=T),]
# drop calibration gas records: their verticalPosition is a non-numeric string
iso.d <- iso.d[-which(is.na(as.numeric(iso.d$verticalPosition))),]
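As an aside, the filtering pattern above can be checked on a toy data frame (fabricated values, for illustration only - not NEON data): grep() keeps only the rows whose character time stamp contains the target date, and as.numeric() turns the calibration-gas labels into NA so they can be dropped.

```r
# Toy stand-in for iso$HARV (made-up values, for illustration only)
df <- data.frame(
  timeBgn = c("2018-06-25T00:00:00.000Z",
              "2018-06-25T00:30:00.000Z",
              "2018-06-26T00:00:00.000Z"),
  verticalPosition = c("010", "co2Low", "010"),
  stringsAsFactors = FALSE
)

# keep the target day, then drop non-numeric (calibration gas) levels
d <- df[grep("2018-06-25", df$timeBgn, fixed = TRUE), ]
d <- d[!is.na(suppressWarnings(as.numeric(d$verticalPosition))), ]

nrow(d)  # 1 row survives: the "010" measurement on 2018-06-25
```

This sketch uses !is.na() rather than -which(); the two are equivalent here, but !is.na() also behaves correctly in the edge case where which() matches nothing.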

ggplot is well suited to these types of data; let's use it to plot the profiles. If you don't have the ggplot2 package yet, use install.packages("ggplot2") to install it first.

library(ggplot2)

Now we can plot CO2 relative to height on the tower, with separate lines for each time interval.

g <- ggplot(iso.d, aes(y=verticalPosition)) + 
  geom_path(aes(x=data.co2Stor.rtioMoleDryCo2.mean, 
                group=timeBgn, col=timeBgn)) + 
  theme(legend.position="none") + 
  xlab("CO2") + ylab("Tower level")
g

And the same plot for 13C in CO2:

g <- ggplot(iso.d, aes(y=verticalPosition)) + 
  geom_path(aes(x=data.isoCo2.dlta13CCo2.mean, 
                group=timeBgn, col=timeBgn)) + 
  theme(legend.position="none") + 
  xlab("d13C") + ylab("Tower level")
g
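To see how geom_path() builds one vertical trace per half-hour interval, here is a self-contained sketch with fabricated heights and concentrations (not NEON values); each distinct timeBgn value becomes its own line on the plot.

```r
library(ggplot2)

# fabricated profile data: two time intervals, four tower levels each
demo <- data.frame(
  timeBgn = rep(c("2018-06-25T05:00", "2018-06-25T13:00"), each = 4),
  verticalPosition = rep(c(10, 20, 30, 40), times = 2),
  co2 = c(430, 422, 415, 410, 412, 409, 407, 406)
)

g <- ggplot(demo, aes(y = verticalPosition)) +
  geom_path(aes(x = co2, group = timeBgn, col = timeBgn)) +
  xlab("CO2") + ylab("Tower level")
g  # two profiles, one per time interval
```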

The legends are omitted to save space. See if you can use the concentration and isotope ratio buildup and drawdown below the canopy to work out which times of day the different colors represent.

Get Lesson Code

eddy_intro.R
Copyright © Battelle, 2019-2020

The National Ecological Observatory Network is a major facility fully funded by the National Science Foundation.

Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the National Science Foundation.