NSF NEON, Operated by Battelle

Workshop

NEON Flux Data - AmeriFlux Early Career Workshop

AmeriFlux Annual Meeting

September 17, 2019


NEON provides open ecological data from 81 sites across the United States, covering a wide range of subject areas, including organismal observations, biogeochemistry, and remote sensing. Surface-atmosphere exchange data are collected at all 47 terrestrial NEON sites, enabling analysis and synthesis both within and across sites.

This workshop will provide an introduction to data access via the Data Portal and the API, and to working with NEON flux data using the neonUtilities R package.

Workshop Schedule

For location and updates, please refer to the Early Career Workshop Agenda.

Time Topic
Noon Workshop Opening (box lunch provided)
12:10 Introduction and Overview of Tutorials
12:40 Break
12:55 Data Tutorials for Breakout Groups (concurrent sessions)
Surface-Atmosphere exchange (NEON) – Chris Florian, Claire Lunch, David Durden
PhenoCam data analysis – Bijan Seyednasrollah, Andrew Richardson, Adam Young
EC Flux processing including methane (GitHub) – Gavin McNicol
Google Earth Engine create-your-own site description – Gabriela Shirkey
1:30p Data Group Project in sub-groups
3:00 Presentation of Group Work and Closing Remarks
3:30 Leave for NEON HQ
4:00 Visit at NEON
NEON Calibration/Validation Lab
NEON Airborne Observation Platform Lab (data demo and calibration lab)
Tower & Mobile Deployment Platform (test towers behind building)
5:30p Social Event @ local brewery near NEON HQ – no host

Before the Workshop

To participate in the NEON flux data portion of this workshop, you will need a laptop with the most current version of R, and preferably RStudio, installed. Prior experience coding in R is helpful but not required.

Install R Packages

Once R is installed, use the following code to install the packages we will be using in the workshop:

install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')

If you have trouble with any of the installations, please contact Claire Lunch.


Workshop Instructors & Helpers

  • Chris Florian; Research Scientist, Surface-Atmosphere Exchange – Coordination; NEON program, Battelle
  • Claire Lunch; @dr_lunch, Research Scientist, Data Science; NEON program, Battelle
  • David Durden; Research Scientist, Surface-Atmosphere Exchange – Data Science; NEON program, Battelle

Twitter?

Please tweet using the hashtag #NEONData & @NEON_Sci during this workshop!

Tutorials

The workshop materials are based on the following tutorials, which can be used for reference after the workshop. We will deviate from the tutorials based on the pace of the workshop and the specific questions that arise; for that reason, please do not work ahead, as doing so will likely cause your code to diverge from the live coding in the workshop.

Download and Explore NEON Data

This tutorial covers downloading NEON data, using the Data Portal and either the neonUtilities R package or the neonutilities Python package, as well as basic instruction in exploring and working with the downloaded data, including guidance on navigating data documentation. We will explore data of 3 different types, and make a simple figure from each.

NEON data

There are 3 basic categories of NEON data:

  1. Remote sensing (AOP) - Data collected by the airborne observation platform, e.g. LIDAR, surface reflectance
  2. Observational (OS) - Data collected by a human in the field, or in an analytical laboratory, e.g. beetle identification, foliar isotopes
  3. Instrumentation (IS) - Data collected by an automated, streaming sensor, e.g.  net radiation, soil carbon dioxide. This category also includes the surface-atmosphere exchange (SAE) data, which are processed and structured in a unique way, distinct from other instrumentation data (see the introductory eddy flux data tutorial for details).

This lesson covers all three types of data. The download procedures are similar for all types, but data navigation differs significantly by type.

Objectives

After completing this activity, you will be able to:

  • Download NEON data using the neonUtilities package.
  • Understand downloaded data sets and load them into R or Python for analyses.

Things You’ll Need To Complete This Tutorial

You can follow either the R or Python code throughout this tutorial.

  • For R users, we recommend using R version 4+ and RStudio.
  • For Python users, we recommend using Python 3.9+.

Set up: Install Packages

Packages only need to be installed once; you can skip this step after the first time:

R

  • neonUtilities: Basic functions for accessing NEON data
  • neonOS: Functions for common data wrangling needs for NEON observational data.
  • terra: Spatial data package; needed for working with remote sensing data.

install.packages("neonUtilities")
install.packages("neonOS")
install.packages("terra")

Python

  • neonutilities: Basic functions for accessing NEON data
  • rasterio: Spatial data package; needed for working with remote sensing data.

pip install neonutilities
pip install rasterio

Additional Resources

  • GitHub repository for neonUtilities R package
  • GitHub repository for neonutilities Python package
  • neonUtilities cheat sheet. A quick reference guide for users. Focuses on the R package, but applicable to Python as well.

Set up: Load packages

R

library(neonUtilities)
library(neonOS)
library(terra)

Python


import neonutilities as nu
import os
import rasterio
import pandas as pd
import matplotlib.pyplot as plt

Getting started: Download data from the Portal

Go to the NEON Data Portal and download some data! To follow the tutorial exactly, download Photosynthetically active radiation (PAR) (DP1.00024.001) data from September-November 2019 at Wind River Experimental Forest (WREF). The downloaded file should be a zip file named NEON_par.zip.

If you prefer to explore a different data product, you can still follow this tutorial. But it will be easier to understand the steps in the tutorial, particularly the data navigation, if you choose a sensor data product for this section.

Once you’ve downloaded a zip file of data from the portal, switch over to R or Python to proceed with coding.

Stack the downloaded data files: stackByTable()

The stackByTable() (or stack_by_table()) function will unzip and join the files in the downloaded zip file.

R

# Modify the file path to match the path to your zip file
stackByTable("~/Downloads/NEON_par.zip")

Python

# Modify the file path to match the path to your zip file
nu.stack_by_table(os.path.expanduser("~/Downloads/NEON_par.zip"))

In the directory where the zipped file was saved, you should now have an unzipped folder of the same name. When you open this you will see a new folder called stackedFiles, which should contain at least seven files: PARPAR_30min.csv, PARPAR_1min.csv, sensor_positions.csv, variables_00024.csv, readme_00024.txt, issueLog_00024.csv, and citation_00024_RELEASE-202X.txt.

Navigate data downloads: IS

Let’s start with a brief description of each file. This set of files is typical of a NEON IS data product.

  • PARPAR_30min.csv: PAR data at 30-minute averaging intervals
  • PARPAR_1min.csv: PAR data at 1-minute averaging intervals
  • sensor_positions.csv: The physical location of each sensor collecting PAR measurements. There is a PAR sensor at each level of the WREF tower, and this table lets you connect the tower level index to the height of the sensor in meters.
  • variables_00024.csv: Definitions and units for each data field in the PARPAR_#min tables.
  • readme_00024.txt: Basic information about the PAR data product.
  • issueLog_00024.csv: A record of known issues associated with PAR data.
  • citation_00024_RELEASE-202X.txt: The citation to use when you publish a paper using these data, in BibTeX format.
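The sensor_positions mapping described above can be applied with a simple table join. Here is a minimal sketch with invented stand-in data, assuming the height column in the sensor positions file is named zOffset; check your own variables file for the exact column names:

```python
import pandas as pd

# Stand-in data: two tower levels with PAR readings (values invented)
par = pd.DataFrame({"verticalPosition": ["010", "080"],
                    "PARMean": [120.0, 840.0]})

# Stand-in sensor positions: tower level index -> height in meters
positions = pd.DataFrame({"verticalPosition": ["010", "080"],
                          "zOffset": [2.0, 65.0]})

# Join on the tower level index to attach a height to each reading
par_with_height = par.merge(positions, on="verticalPosition")
print(par_with_height)
```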

We’ll explore the 30-minute data. To read the file, use the function readTableNEON() or read_table_neon(), which uses the variables file to assign data types to each column of data:

R

par30 <- readTableNEON(
  dataFile="~/Downloads/NEON_par/stackedFiles/PARPAR_30min.csv", 
  varFile="~/Downloads/NEON_par/stackedFiles/variables_00024.csv")
head(par30)

Python


par30 = nu.read_table_neon(
  data_file=os.path.expanduser(
    "~/Downloads/NEON_par/stackedFiles/PARPAR_30min.csv"), 
  var_file=os.path.expanduser(
    "~/Downloads/NEON_par/stackedFiles/variables_00024.csv"))
# Open the par30 table in the table viewer of your choice

The first four columns are added by stackByTable() when it merges files across sites, months, and tower heights. The column publicationDate is the date-time stamp indicating when the data were published, and the release column indicates which NEON data release the data belong to. For more information about NEON data releases, see the Data Product Revisions and Releases page.

Information about each data column can be found in the variables file, where you can see definitions and units for each column of data.

Plot PAR data

Now that we know what we’re looking at, let’s plot PAR from the top tower level. We’ll use the mean PAR from each averaging interval, and we can see from the sensor positions file that the vertical index 080 corresponds to the highest tower level. To explore the sensor positions data in more depth, see the spatial data tutorial.

R

plot(PARMean~endDateTime, 
     data=par30[which(par30$verticalPosition=="080"),],
     type="l")

Python

par30top = par30[par30.verticalPosition=="080"]
fig, ax = plt.subplots()
ax.plot(par30top.endDateTime, par30top.PARMean)
plt.show()

Looks good! The sun comes up and goes down every day, and some days are cloudy.

Plot more PAR data

To see another layer of data, add PAR from a lower tower level to the plot.

R

plot(PARMean~endDateTime, 
     data=par30[which(par30$verticalPosition=="080"),],
     type="l")

lines(PARMean~endDateTime, data=par30[which(par30$verticalPosition=="020"),], col="orange")

Python

par30low = par30[par30.verticalPosition=="020"]
fig, ax = plt.subplots()
ax.plot(par30top.endDateTime, par30top.PARMean)
ax.plot(par30low.endDateTime, par30low.PARMean)
plt.show()

We can see there is a lot of light attenuation through the canopy.

Download files and load directly to R: loadByProduct()

At the start of this tutorial, we downloaded data from the NEON data portal. NEON also provides an API, and the neonUtilities packages provide methods for downloading programmatically.

The steps we carried out above - downloading from the portal, stacking the downloaded files, and reading in to R or Python - can all be carried out in one step by the neonUtilities function loadByProduct().

To get the same PAR data we worked with above, we would run this line of code using loadByProduct():

R

parlist <- loadByProduct(dpID="DP1.00024.001", 
                         site="WREF", 
                         startdate="2019-09",
                         enddate="2019-11")

Python

parlist = nu.load_by_product(dpid="DP1.00024.001", 
                site="WREF", 
                startdate="2019-09",
                enddate="2019-11")

Explore loaded data

The object returned by loadByProduct() in R is a named list, and the object returned by load_by_product() in Python is a dictionary. The objects contained in the list or dictionary are the same set of tables we ended with after stacking the data from the portal above. You can see this by checking the names of the tables in parlist:

R

names(parlist)
## [1] "citation_00024_RELEASE-2024" "issueLog_00024"             
## [3] "PARPAR_1min"                 "PARPAR_30min"               
## [5] "readme_00024"                "sensor_positions_00024"     
## [7] "variables_00024"

Python

parlist.keys()
## dict_keys(['PARPAR_1min', 'PARPAR_30min', 'citation_00024_RELEASE-2024', 'issueLog_00024', 'readme_00024', 'sensor_positions_00024', 'variables_00024'])

Now let’s walk through the details of the inputs and options in loadByProduct().

This function downloads data from the NEON API, merges the site-by-month files, and loads the resulting data tables into the programming environment, assigning each data type to the appropriate class. This is a popular choice for NEON data users because it ensures you’re always working with the latest data, and it ends with ready-to-use tables. However, if you use it in a workflow you run repeatedly, keep in mind it will re-download the data every time. See below for suggestions on saving the data locally to save time and compute resources.

loadByProduct() works on most observational (OS) and sensor (IS) data, but not on surface-atmosphere exchange (SAE) data and remote sensing (AOP) data. For functions that download AOP data, see the final section in this tutorial. For functions that work with SAE data, see the NEON eddy flux data tutorial.

The inputs to loadByProduct() control which data to download and how to manage the processing. The list below shows the R syntax; in Python, the inputs are the same but all lowercase (e.g. `dpid` instead of `dpID`) and `.` is replaced by `_`.

  • dpID: the data product ID, e.g. DP1.00002.001
  • site: defaults to “all”, meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g.  c("HARV","CPER","ABBY") (or ["HARV","CPER","ABBY"] in Python)
  • startdate and enddate: defaults to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g.  2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
  • package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as chemical standards and quality flags. Not every data product has an expanded package; if the expanded package is requested but there isn’t one, the basic package will be downloaded.
  • timeIndex: defaults to “all”, to download all data; or the number of minutes in the averaging interval. Only applicable to IS data.
  • release: Specify a NEON data release to download. Defaults to the most recent release plus provisional data. See the release tutorial for more information.
  • include.provisional: T or F: should Provisional data be included in the download? Defaults to F to return only Released data, which are citable by a DOI and do not change over time. Provisional data are subject to change.
  • check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
  • token: Optional NEON API token for faster downloads. See this tutorial for instructions on using a token.
  • progress: Set to F to turn off progress bars.
  • cloud.mode: Can be set to T if you are working in a cloud environment; enables more efficient data transfer from NEON’s cloud storage.
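The R-to-Python naming convention described above can be captured in one line. r_to_python_arg() is a hypothetical helper for illustration, not part of either package:

```python
# Hypothetical helper: convert an R-style argument name to its Python
# equivalent (all lowercase, "." replaced by "_")
def r_to_python_arg(name):
    return name.replace(".", "_").lower()

print(r_to_python_arg("check.size"))           # check_size
print(r_to_python_arg("include.provisional"))  # include_provisional
print(r_to_python_arg("dpID"))                 # dpid
```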

The dpID is the data product identifier of the data you want to download. It can be found on the Explore Data Products page and takes the form DP#.#####.###.
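As a quick sanity check before downloading, the DP#.#####.### pattern can be validated with a regular expression. is_valid_dpid() is a hypothetical helper, not part of neonUtilities:

```python
import re

# Hypothetical validator for the DP#.#####.### pattern described above
DPID_PATTERN = re.compile(r"^DP\d\.\d{5}\.\d{3}$")

def is_valid_dpid(dpid):
    """Return True if dpid has the form DP#.#####.###."""
    return bool(DPID_PATTERN.match(dpid))

print(is_valid_dpid("DP1.00024.001"))  # True
print(is_valid_dpid("DP1.24.1"))       # False
```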

Download observational data

To explore observational data, we’ll download aquatic plant chemistry data (DP1.20063.001) from three lake sites: Prairie Lake (PRLA), Suggs Lake (SUGG), and Toolik Lake (TOOK).

R

apchem <- loadByProduct(dpID="DP1.20063.001", 
                  site=c("PRLA","SUGG","TOOK"), 
                  package="expanded",
                  release="RELEASE-2024",
                  check.size=F)

Python

apchem = nu.load_by_product(dpid="DP1.20063.001", 
                  site=["PRLA", "SUGG", "TOOK"], 
                  package="expanded",
                  release="RELEASE-2024",
                  check_size=False)

Navigate data downloads: OS

As we saw above, the object returned by loadByProduct() is a named list of data frames. Let’s check out what’s the same and what’s different from the IS data tables.

R

names(apchem)
##  [1] "apl_biomass"                       "apl_clipHarvest"                  
##  [3] "apl_plantExternalLabDataPerSample" "apl_plantExternalLabQA"           
##  [5] "asi_externalLabPOMSummaryData"     "categoricalCodes_20063"           
##  [7] "citation_20063_RELEASE-2024"       "issueLog_20063"                   
##  [9] "readme_20063"                      "validation_20063"                 
## [11] "variables_20063"

Python

apchem.keys()
## dict_keys(['apl_biomass', 'apl_clipHarvest', 'apl_plantExternalLabDataPerSample', 'apl_plantExternalLabQA', 'asi_externalLabPOMSummaryData', 'categoricalCodes_20063', 'citation_20063_RELEASE-2024', 'issueLog_20063', 'readme_20063', 'validation_20063', 'variables_20063'])

Explore tables

As with the sensor data, we have some data tables and some metadata tables. Most of the metadata files are the same as the sensor data: readme, variables, issueLog, and citation. These files contain the same type of metadata here that they did in the IS data product. Let’s look at the other files:

  • apl_clipHarvest: Data from the clip harvest collection of aquatic plants
  • apl_biomass: Biomass data from the collected plants
  • apl_plantExternalLabDataPerSample: Chemistry data from the collected plants
  • apl_plantExternalLabQA: Quality assurance data from the chemistry analyses
  • asi_externalLabPOMSummaryData: Quality metrics from the chemistry lab
  • validation_20063: For observational data, a major method for ensuring data quality is to control data entry. This file contains information about the data ingest rules applied to each input data field.
  • categoricalCodes_20063: Definitions of each value for categorical data, such as growth form and sample condition

You can work with these tables from the named list object, but many people find it easier to extract each table from the list and work with it as an independent object. To do this, use the list2env() function in R or globals().update() in Python:

R

list2env(apchem, .GlobalEnv)
## <environment: R_GlobalEnv>

Python


globals().update(apchem)

Save data locally

Keep in mind that using loadByProduct() will re-download the data every time you run your code. In some cases this may be desirable, but it can be a waste of time and compute resources. To come back to these data without re-downloading, you’ll want to save the tables locally. The most efficient option is to save the named list in total - this will also preserve the data types.

R

saveRDS(apchem, 
        "~/Downloads/aqu_plant_chem.rds")

Python


# There are a variety of ways to do this in Python; NEON
# doesn't currently have a specific recommendation. If 
# you don't have a data-saving workflow you already use, 
# we suggest you check out the pickle module.
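A pickle-based version of the save step might look like the sketch below. The dictionary here is a stand-in built from synthetic data; with real data, apchem would come from nu.load_by_product() as shown above:

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in for the dictionary returned by nu.load_by_product()
apchem = {"apl_biomass": pd.DataFrame({"siteID": ["PRLA", "SUGG"],
                                       "dryMass": [1.2, 0.8]})}

# Save the whole dictionary in one file, preserving data types
path = os.path.join(tempfile.gettempdir(), "aqu_plant_chem.pkl")
with open(path, "wb") as f:
    pickle.dump(apchem, f)

# Later: re-load without re-downloading
with open(path, "rb") as f:
    apchem_reloaded = pickle.load(f)
```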

Then you can re-load the object to a programming environment any time.

Other options for saving data locally:

  1. Similar to the workflow we started this tutorial with, but using neonUtilities to download instead of the Portal: Use zipsByProduct() and stackByTable() instead of loadByProduct(). With this option, use the function readTableNEON() to read the files, to get the same column type assignment that loadByProduct() carries out. Details can be found in our neonUtilities tutorial.
  2. Try out the community-developed neonstore package, which is designed for maintaining a local store of the NEON data you use. The neonUtilities function stackFromStore() works with files downloaded by neonstore. See the neonstore tutorial for more information.

Now let’s explore the aquatic plant data. OS data products are simple in that the data are generally tabular, and data volumes are lower than the other NEON data types, but they are complex in that almost all consist of multiple tables containing information collected at different times in different ways. For example, samples collected in the field may be shipped to a laboratory for analysis. Data associated with the field collection will appear in one data table, and the analytical results will appear in another. Complexity in working with OS data usually involves bringing data together from multiple measurements or scales of analysis.

As with the IS data, the variables file can tell you more about the data tables.

OS data products each come with a Data Product User Guide, which can be downloaded with the data, accessed from the Document Library on the Data Portal, or found on the Product Details page for the data product. The User Guide is designed to give a basic introduction to the data product, including a brief summary of the protocol and descriptions of data format and structure.

Explore isotope data

To get started with the aquatic plant chemistry data, let’s take a look at carbon isotope ratios in plants across the three sites we downloaded. The chemical analytes are reported in the apl_plantExternalLabDataPerSample table, and the table is in long format, with one record per sample per analyte, so we’ll subset to only the carbon isotope analyte:

R

boxplot(analyteConcentration~siteID, 
        data=apl_plantExternalLabDataPerSample, 
        subset=analyte=="d13C",
        xlab="Site", ylab="d13C")

Python

apl13C = apl_plantExternalLabDataPerSample[
         apl_plantExternalLabDataPerSample.analyte=="d13C"]
grouped = apl13C.groupby("siteID")["analyteConcentration"]
fig, ax = plt.subplots()
ax.boxplot(x=[group.values for name, group in grouped],
           tick_labels=grouped.groups.keys())
plt.show()

We see plants at Suggs and Toolik are quite low in 13C, with more spread at Toolik than at Suggs, while plants at Prairie Lake are relatively enriched. Clearly the next question is what species these data represent. But taxonomic data aren't present in the apl_plantExternalLabDataPerSample table; they're in the apl_biomass table. We'll need to join the two tables to get chemistry by taxon.

Every NEON data product has a Quick Start Guide (QSG), and for OS products it includes a section describing how to join the tables in the data product. Since it’s a pdf file, loadByProduct() doesn’t bring it in, but you can view the Aquatic plant chemistry QSG on the Product Details page. In R, the neonOS package uses the information from the QSGs to provide an automated table-joining function, joinTableNEON().

Explore isotope data by species

R

apct <- joinTableNEON(apl_biomass, 
            apl_plantExternalLabDataPerSample)

Using the merged data, we can now plot the carbon isotope ratio for each taxon.

boxplot(analyteConcentration~scientificName, 
        data=apct, subset=analyte=="d13C", 
        xlab=NA, ylab="d13C", 
        las=2, cex.axis=0.7)

Python

There is not yet an equivalent to the neonOS package in Python, so we will code the table join manually, based on the info in the Quick Start Guide:


apct = pd.merge(apl_biomass, 
            apl_plantExternalLabDataPerSample,
            left_on=["siteID", "chemSubsampleID"],
            right_on=["siteID", "sampleID"],
            how="outer")

Using the merged data, we can now plot the carbon isotope ratio for each taxon.

apl13Cspp = apct[apct.analyte=="d13C"]
grouped = apl13Cspp.groupby("scientificName")["analyteConcentration"]
fig, ax = plt.subplots()
ax.boxplot(x=[group.values for name, group in grouped],
           tick_labels=grouped.groups.keys())
ax.tick_params(axis='x', labelrotation=90)
plt.show()

And now we can see most of the sampled plants have carbon isotope ratios around -30, with just a few species accounting for most of the more enriched samples.

Download remote sensing data: byFileAOP() and byTileAOP()

Remote sensing data files are very large. byFileAOP() and byTileAOP() make programmatic downloads easier, but be aware that downloading large amounts of AOP data can take a very long time.

Input options for the AOP functions are:

  • dpID: the data product ID, e.g. DP1.00002.001
  • site: the 4-letter code of a single site, e.g. HARV
  • year: the 4-digit year to download
  • savepath: the file path you want to download to; defaults to the working directory
  • check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
  • easting: byTileAOP() only. Vector of easting UTM coordinates whose corresponding tiles you want to download
  • northing: byTileAOP() only. Vector of northing UTM coordinates whose corresponding tiles you want to download
  • buffer: byTileAOP() only. Size in meters of buffer to include around coordinates when deciding which tiles to download
  • token: Optional NEON API token for faster downloads.
  • chunk_size: Only in Python. Sets the chunk size of chunked downloads, which can improve efficiency in some cases. Defaults to 1 MB.

Here, we’ll download one tile of Ecosystem structure (Canopy Height Model) (DP3.30015.001) from WREF in 2017.

R

byTileAOP(dpID="DP3.30015.001", site="WREF", 
          year=2017,easting=580000, 
          northing=5075000, 
          savepath="~/Downloads")

Python

nu.by_tile_aop(dpid="DP3.30015.001", site="WREF", 
               year=2017,easting=580000, 
               northing=5075000, 
               savepath=os.path.expanduser(
                 "~/Downloads"))

In the directory indicated in savepath, you should now have a folder named DP3.30015.001 with several nested subfolders, leading to a tif file of a canopy height model tile.
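One way to locate the tif without typing the nested path by hand is to walk the directory tree. This sketch builds a stand-in tree for illustration; with real data, you would walk the DP3.30015.001 folder in your savepath instead:

```python
import os
import tempfile

# Stand-in directory tree mimicking a (simplified) AOP download
root = tempfile.mkdtemp()
deep = os.path.join(root, "DP3.30015.001", "L3", "DiscreteLidar")
os.makedirs(deep)
tif_name = "NEON_D16_WREF_DP3_580000_5075000_CHM.tif"
open(os.path.join(deep, tif_name), "w").close()

# Walk the tree and collect any tif files, wherever they are nested
tifs = [os.path.join(dirpath, f)
        for dirpath, _, files in os.walk(root)
        for f in files if f.endswith(".tif")]
print(tifs)
```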

Navigate data downloads: AOP

To work with AOP data, the best bet in R is the terra package. It has functionality for most analyses you might want to do. In Python, we’ll use the rasterio package here; explore NEON remote sensing tutorials for more guidance.

First let’s read in the tile we downloaded:

R

chm <- rast("~/Downloads/DP3.30015.001/neon-aop-products/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif")

Python


chm = rasterio.open(os.path.expanduser("~/Downloads/DP3.30015.001/neon-aop-products/2017/FullSite/D16/2017_WREF_1/L3/DiscreteLidar/CanopyHeightModelGtif/NEON_D16_WREF_DP3_580000_5075000_CHM.tif"))

Plot canopy height model

R

plot(chm, col=topo.colors(6))

Python

plt.imshow(chm.read(1))
plt.show()

Now we can see canopy height across the downloaded tile; the tallest trees are over 60 meters, not surprising in the Pacific Northwest. There is a clearing or clear cut in the lower right quadrant.
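Since chm.read(1) returns a 2-D numpy array of canopy heights, tile-level summaries like the maximum height are straightforward. A small synthetic array stands in for the real tile here:

```python
import numpy as np

# Synthetic canopy heights in meters (stand-in for chm.read(1))
heights = np.array([[0.0, 12.5, 48.2],
                    [61.3, 30.1, 0.0]])

print(heights.max())          # tallest canopy in the tile
print((heights == 0).mean())  # fraction of zero-height (cleared) pixels
```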

Next steps

Now that you’ve learned the basics of downloading and understanding NEON data, where should you go to learn more? There are many more NEON tutorials to explore, including how to align remote sensing and ground-based measurements, a deep dive into the data quality flagging in the sensor data products, and much more. For a recommended suite of tutorials for new users, check out the Getting Started with NEON Data tutorial series.

Use the neonUtilities Package to Access NEON Data

This tutorial provides an overview of functions in the neonUtilities package in R and the neonutilities package in Python. These packages provide a toolbox of basic functionality for working with NEON data.

This tutorial is primarily an index of functions and their inputs; for more in-depth guidance in using these functions to work with NEON data, see the Download and Explore tutorial. If you are already familiar with the neonUtilities package, and need a quick reference guide to function inputs and notation, see the neonUtilities cheat sheet.

Function index

The neonUtilities/neonutilities package contains several functions (use the R and Python tabs to see the syntax in each language):

R

  • stackByTable(): Takes zip files downloaded from the Data Portal or downloaded by zipsByProduct(), unzips them, and joins the monthly files by data table to create a single file per table.
  • zipsByProduct(): A wrapper for the NEON API; downloads data based on data product and site criteria. Stores downloaded data in a format that can then be joined by stackByTable().
  • loadByProduct(): Combines the functionality of zipsByProduct(),
    stackByTable(), and readTableNEON(); downloads the specified data, stacks the files, and loads them to the R environment.
  • byFileAOP(): A wrapper for the NEON API; downloads remote sensing data based on data product, site, and year criteria. Preserves the file structure of the original data.
  • byTileAOP(): Downloads remote sensing data for the specified data product, subset to tiles that intersect a list of coordinates.
  • readTableNEON(): Reads NEON data tables into R, using the variables file to assign R classes to each column.
  • getCitation(): Get a BibTeX citation for a particular data product and release.

Python

  • stack_by_table(): Takes zip files downloaded from the Data Portal or downloaded by zips_by_product(), unzips them, and joins the monthly files by data table to create a single file per table.
  • zips_by_product(): A wrapper for the NEON API; downloads data based on data product and site criteria. Stores downloaded data in a format that can then be joined by stack_by_table().
  • load_by_product(): Combines the functionality of zips_by_product(),
    stack_by_table(), and read_table_neon(); downloads the specified data, stacks the files, and loads them to the Python environment.
  • by_file_aop(): A wrapper for the NEON API; downloads remote sensing data based on data product, site, and year criteria. Preserves the file structure of the original data.
  • by_tile_aop(): Downloads remote sensing data for the specified data product, subset to tiles that intersect a list of coordinates.
  • read_table_neon(): Reads NEON data tables into Python, using the variables file to assign a data type to each column.
  • get_citation(): Get a BibTeX citation for a particular data product and release.

If you are only interested in joining data files downloaded from the NEON Data Portal, you will only need to use stackByTable(). Follow the instructions in the first section of the Download and Explore tutorial.

Install and load packages

First, install and load the package. The installation step only needs to be run once, and then periodically to update when new package versions are released. The load step needs to be run every time you run your code.

R

# install neonUtilities - can skip if already installed
install.packages("neonUtilities")

# load neonUtilities
library(neonUtilities)

Python

# install neonutilities - can skip if already installed
# do this in the command line
pip install neonutilities

# load neonutilities in working environment
import neonutilities as nu

Download files and load to working environment

The most popular function in neonUtilities is loadByProduct() (or load_by_product() in neonutilities). This function downloads data from the NEON API, merges the site-by-month files, and loads the resulting data tables into the programming environment, classifying each variable’s data type appropriately. It combines the actions of the zipsByProduct(), stackByTable(), and readTableNEON() functions, described below.

This is a popular choice because it ensures you’re always working with the latest data, and it ends with ready-to-use tables. However, if you use it in a workflow you run repeatedly, keep in mind it will re-download the data every time.

loadByProduct() works on most observational (OS) and sensor (IS) data, but not on surface-atmosphere exchange (SAE) data, remote sensing (AOP) data, and some of the data tables in the microbial data products. For functions that download AOP data, see the byFileAOP() and byTileAOP() sections in this tutorial. For functions that work with SAE data, see the NEON eddy flux data tutorial. SAE functions are not yet available in Python.

The inputs to loadByProduct() control which data to download and how to manage the processing:

R

  • dpID: The data product ID, e.g. DP1.00002.001
  • site: Defaults to “all”, meaning all sites with available data; can be a vector of 4-letter NEON site codes, e.g.  c("HARV","CPER","ABBY").
  • startdate and enddate: Defaults to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g.  2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
  • package: Either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as chemical standards and quality flags. Not every data product has an expanded package; if the expanded package is requested but there isn’t one, the basic package will be downloaded.
  • timeIndex: Defaults to “all”, to download all data; or the number of minutes in the averaging interval. See example below; only applicable to IS data.
  • release: Specify a particular data Release, e.g.  "RELEASE-2024". Defaults to the most recent Release. For more details and guidance, see the Release and Provisional tutorial.
  • include.provisional: T or F: Should provisional data be downloaded? If release is not specified, set to T to include provisional data in the download. Defaults to F.
  • savepath: the file path you want to download to; defaults to the working directory.
  • check.size: T or F: should the function pause before downloading data and warn you about the size of your download? Defaults to T; if you are using this function within a script or batch process you will want to set it to F.
  • token: Optional API token for faster downloads. See the API token tutorial.
  • nCores: Number of cores to use for parallel processing. Defaults to 1, i.e. no parallelization.

Python

  • dpid: the data product ID, e.g. DP1.00002.001
  • site: defaults to “all”, meaning all sites with available data; can be a list of 4-letter NEON site codes, e.g.  ["HARV","CPER","ABBY"].
  • startdate and enddate: defaults to NA, meaning all dates with available data; or a date in the form YYYY-MM, e.g.  2017-06. Since NEON data are provided in month packages, finer scale querying is not available. Both start and end date are inclusive.
  • package: either basic or expanded data package. Expanded data packages generally include additional information about data quality, such as chemical standards and quality flags. Not every data product has an expanded package; if the expanded package is requested but there isn’t one, the basic package will be downloaded.
  • timeindex: defaults to “all”, to download all data; or the number of minutes in the averaging interval. See example below; only applicable to IS data.
  • release: Specify a particular data Release, e.g.  "RELEASE-2024". Defaults to the most recent Release. For more details and guidance, see the Release and Provisional tutorial.
  • include_provisional: True or False: Should provisional data be downloaded? If release is not specified, set to True to include provisional data in the download. Defaults to False.
  • savepath: the file path you want to download to; defaults to the working directory.
  • check_size: True or False: should the function pause before downloading data and warn you about the size of your download? Defaults to True; if you are using this function within a script or batch process you will want to set it to False.
  • token: Optional API token for faster downloads. See the API token tutorial.
  • cloud_mode: Can be set to True if you are working in a cloud environment; provides more efficient data transfer from NEON cloud storage to other cloud environments.
  • progress: Set to False to omit the progress bar during download and stacking.
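
Because NEON data are packaged by month and both the start and end dates are inclusive, a startdate/enddate pair resolves to a list of monthly packages. As a minimal illustration of that logic (this helper is our own sketch, not part of neonUtilities or neonutilities):

```python
def expand_months(startdate, enddate):
    """Expand an inclusive YYYY-MM range into the monthly packages it covers."""
    y0, m0 = map(int, startdate.split("-"))
    y1, m1 = map(int, enddate.split("-"))
    months = []
    while (y0, m0) <= (y1, m1):
        months.append(f"{y0:04d}-{m0:02d}")
        m0 += 1
        if m0 > 12:  # roll over into the next year
            y0, m0 = y0 + 1, 1
    return months

print(expand_months("2018-05", "2018-08"))
# ['2018-05', '2018-06', '2018-07', '2018-08']
```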

The dpID (dpid) is the data product identifier of the data you want to download. It can be found on the Explore Data Products page and takes the form DP#.#####.###.
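
If you build product IDs programmatically, a quick format check can catch typos before a download call. The regular expression below is our own sanity check, not part of the neonutilities package:

```python
import re

# DP#.#####.### : one-digit data level, 5-digit product number,
# 3-digit revision (pattern is our own assumption for validation)
DPID_PATTERN = re.compile(r"^DP[0-4]\.\d{5}\.\d{3}$")

def is_valid_dpid(dpid):
    """Return True if dpid matches the DP#.#####.### form."""
    return bool(DPID_PATTERN.match(dpid))

print(is_valid_dpid("DP1.00003.001"))  # True
print(is_valid_dpid("DP1.3.1"))        # False
```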

Demo data download and read

Let’s get triple-aspirated air temperature data (DP1.00003.001) from Moab and Onaqui (MOAB and ONAQ), from May–August 2018, and name the data object triptemp:

R

triptemp <- loadByProduct(dpID="DP1.00003.001", 
                          site=c("MOAB","ONAQ"),
                          startdate="2018-05", 
                          enddate="2018-08")

Python

triptemp = nu.load_by_product(dpid="DP1.00003.001", 
                              site=["MOAB","ONAQ"],
                              startdate="2018-05", 
                              enddate="2018-08")

View downloaded data

The object returned by loadByProduct() is a named list of data tables, or a dictionary of data tables in Python. To work with each of them, select them from the list.

R

names(triptemp)
## [1] "citation_00003_RELEASE-2024" "issueLog_00003"             
## [3] "readme_00003"                "sensor_positions_00003"     
## [5] "TAAT_1min"                   "TAAT_30min"                 
## [7] "variables_00003"
temp30 <- triptemp$TAAT_30min

If you prefer to extract each table from the list and work with it as an independent object, you can use the list2env() function:

list2env(triptemp, .GlobalEnv)

Python

triptemp.keys()
## dict_keys(['TAAT_1min', 'TAAT_30min', 'citation_00003_RELEASE-2024', 'issueLog_00003', 'readme_00003', 'sensor_positions_00003', 'variables_00003'])
temp30 = triptemp["TAAT_30min"]

If you prefer to extract each table from the list and work with it as an independent object, you can use
globals().update():

globals().update(triptemp)

For more details about the contents of the data tables and metadata tables, check out the Download and Explore tutorial.

Join data files: stackByTable()

The function stackByTable() joins the month-by-site files from a data download into a single file per data table. For example, the single-aspirated air temperature data product contains 1-minute and 30-minute interval data, so the output from this function is one .csv with the 1-minute data and one .csv with the 30-minute data.

Depending on your file size this function may run for a while. For example, in testing for this tutorial, 124 MB of temperature data took about 4 minutes to stack. A progress bar will display while the stacking is in progress.
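
Conceptually, the stacking step concatenates the monthly files that share a table name. A minimal sketch of that idea in Python with pandas (the file names and contents below are hypothetical, and this is not the package's internal implementation):

```python
import io
import pandas as pd

# hypothetical monthly downloads for one 30-minute table
monthly_files = {
    "SAAT_30min.2019-04.csv": "tempSingleMean\n10.1\n10.5\n",
    "SAAT_30min.2019-05.csv": "tempSingleMean\n12.3\n",
}

# read each monthly file and concatenate into one table
stacked = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in monthly_files.values()],
    ignore_index=True,
)
print(len(stacked))  # 3 rows stacked from two monthly files
```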

Download the Data

To stack data from the Portal, first download the data of interest from the NEON Data Portal. To stack data downloaded from the API, see the zipsByProduct() section below.

Your data will download from the Portal in a single zipped file.

The stacking function will only work on zipped Comma Separated Value (.csv) files and not the NEON data stored in other formats (HDF5, etc).

Run stackByTable()

The example data below are single-aspirated air temperature.

To run the stackByTable() function, input the file path to the downloaded and zipped file.

R

# Modify the file path to the file location on your computer
stackByTable(filepath="~/neon/data/NEON_temp-air-single.zip")

Python

# Modify the file path to the file location on your computer
nu.stack_by_table(filepath="/neon/data/NEON_temp-air-single.zip")

In the same directory as the zipped file, you should now have an unzipped directory of the same name. When you open this you will see a new directory called stackedFiles. This directory contains one or more .csv files (depends on the data product you are working with) with all the data from the months & sites you downloaded. There will also be a single copy of the associated variables, validation, and sensor_positions files, if applicable (validation files are only available for observational data products, and sensor position files are only available for instrument data products).

These .csv files are now ready for use with the program of your choice.

To read the data tables, we recommend using readTableNEON(), which will assign each column to the appropriate data type, based on the metadata in the variables file. This ensures time stamps and missing data are interpreted correctly.
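
To see why the variables file matters, here is a simplified sketch of the idea in Python: map each field's declared dataType to a column type before reading. The dataType-to-dtype mapping and the sample data are our own assumptions for illustration, not the package's actual implementation:

```python
import io
import pandas as pd

# hypothetical excerpt of a variables file and a data file
variables_csv = io.StringIO(
    "fieldName,dataType\n"
    "startDateTime,dateTime\n"
    "tempSingleMean,real\n"
    "finalQF,integer\n"
)
data_csv = io.StringIO(
    "startDateTime,tempSingleMean,finalQF\n"
    "2019-04-01T00:00:00Z,10.2,0\n"
    "2019-04-01T00:30:00Z,,1\n"
)

var_table = pd.read_csv(variables_csv)
# our own simplified mapping from dataType labels to pandas dtypes
dtype_map = {"real": "float64", "integer": "Int64"}
dtypes = {row.fieldName: dtype_map[row.dataType]
          for row in var_table.itertuples() if row.dataType in dtype_map}
date_cols = var_table.loc[var_table.dataType == "dateTime", "fieldName"].tolist()

df = pd.read_csv(data_csv, dtype=dtypes, parse_dates=date_cols)
print(df.dtypes)
```

Reading the data with the declared types means the empty tempSingleMean value becomes a proper missing value rather than an empty string, and the time stamps parse as datetimes.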

Load data to environment

R

SAAT30 <- readTableNEON(
  dataFile='~/stackedFiles/SAAT_30min.csv',
  varFile='~/stackedFiles/variables_00002.csv'
)

Python

SAAT30 = nu.read_table_neon(
  dataFile='/stackedFiles/SAAT_30min.csv',
  varFile='/stackedFiles/variables_00002.csv'
)

Other function inputs

Other input options in stackByTable() are:

  • savepath : allows you to specify the file path where you want the stacked files to go, overriding the default. Set to "envt" to load the files to the working environment.
  • saveUnzippedFiles : allows you to keep the unzipped, unstacked files from an intermediate stage of the process; by default they are discarded.

Example usage:

R

stackByTable(filepath="~/neon/data/NEON_temp-air-single.zip", 
             savepath="~/data/allTemperature", saveUnzippedFiles=T)

tempsing <- stackByTable(filepath="~/neon/data/NEON_temp-air-single.zip", savepath="envt", saveUnzippedFiles=F)

Python

nu.stack_by_table(filepath="/neon/data/NEON_temp-air-single.zip", 
                  savepath="/data/allTemperature", 
                  save_unzipped_files=True)

tempsing = nu.stack_by_table(filepath="/neon/data/NEON_temp-air-single.zip", savepath="envt", save_unzipped_files=False)

Download files to be stacked: zipsByProduct()

The function zipsByProduct() is a wrapper for the NEON API; it downloads zip files for the data product specified and stores them in a format that can then be passed on to stackByTable().

Input options for zipsByProduct() are the same as those for loadByProduct() described above.

Here, we’ll download single-aspirated air temperature (DP1.00002.001) data from Wind River Experimental Forest (WREF) for April and May of 2019.

R

zipsByProduct(dpID="DP1.00002.001", site="WREF", 
              startdate="2019-04", enddate="2019-05",
              package="basic", check.size=T)

Downloaded files can now be passed to stackByTable() to be stacked.

stackByTable(filepath=paste(getwd(), 
                            "/filesToStack00002", 
                            sep=""))

Python

nu.zips_by_product(dpid="DP1.00002.001", site="WREF", 
                   startdate="2019-04", enddate="2019-05",
                   package="basic", check_size=True)

Downloaded files can now be passed to stackByTable() to be stacked.

nu.stack_by_table(filepath=os.getcwd()+
                  "/filesToStack00002")

For many sensor data products, download sizes can get very large, and stackByTable() takes a long time. The 1-minute or 2-minute files are much larger than the longer averaging intervals, so if you don't need high-frequency data, the timeIndex input option lets you choose which averaging interval to download.

This option is only applicable to sensor (IS) data, since OS data are not averaged.

Download by averaging interval

Download only the 30-minute data for single-aspirated air temperature at WREF:

R

zipsByProduct(dpID="DP1.00002.001", site="WREF", 
              startdate="2019-04", enddate="2019-05",
              package="basic", timeIndex=30, 
              check.size=T)

Python

nu.zips_by_product(dpid="DP1.00002.001", site="WREF", 
                   startdate="2019-04", 
                   enddate="2019-05", package="basic", 
                   timeindex=30, check_size=True)

The 30-minute files can be stacked and loaded as usual.

Download remote sensing files

Remote sensing data files can be very large, and NEON remote sensing (AOP) data are stored in a directory structure that makes them easier to navigate. byFileAOP() downloads AOP files from the API while preserving their directory structure. This provides a convenient way to access AOP data programmatically.

Be aware that downloads from byFileAOP() can take a VERY long time, depending on the data you request and your connection speed. You may need to run the function and then leave your machine on and downloading for an extended period of time.

Here the example download is the Ecosystem Structure data product at Hop Brook (HOPB) in 2017; we use this as the example because it’s a relatively small year-site-product combination.

R

byFileAOP("DP3.30015.001", site="HOPB", 
          year=2017, check.size=T)

Python

nu.by_file_aop(dpid="DP3.30015.001", 
               site="HOPB", year=2017, 
               check_size=True)

The files should now be downloaded to a new folder in your working directory.

Download remote sensing files for specific coordinates

Often when using remote sensing data, we only want data covering a certain area - usually the area where we have coordinated ground sampling. byTileAOP() queries for data tiles containing a specified list of coordinates. It only works for the tiled, AKA mosaicked, versions of the remote sensing data, i.e. the ones with data product IDs beginning with “DP3”.

Here, we’ll download tiles of vegetation indices data (DP3.30026.001) corresponding to select observational sampling plots. For more information about accessing NEON spatial data, see the API tutorial and the in-development geoNEON package.

For now, assume we’ve used the API to look up the plot centroids of plots SOAP_009 and SOAP_011 at the Soaproot Saddle site. You can also look these up in the Spatial Data folder of the document library. The coordinates of the two plots in UTMs are 298755,4101405 and 299296,4101461. These are 40x40m plots, so in looking for tiles that contain the plots, we want to include a 20m buffer. The “buffer” is actually a square: the buffer distance is applied equally to both the easting and northing coordinates.

R

byTileAOP(dpID="DP3.30026.001", site="SOAP", 
          year=2018, easting=c(298755,299296),
          northing=c(4101405,4101461),
          buffer=20)

Python

nu.by_tile_aop(dpid="DP3.30026.001", 
               site="SOAP", year=2018, 
               easting=[298755,299296],
               northing=[4101405,4101461],
               buffer=20)

The 2 tiles covering the SOAP_009 and SOAP_011 plots have
been downloaded.

Introduction to working with NEON eddy flux data

This data tutorial provides an introduction to working with NEON eddy flux data, using the neonUtilities R package or the neonutilities Python package. If you are new to NEON data, we recommend starting with a more general tutorial, such as the neonUtilities tutorial or the Download and Explore tutorial. Some of the functions and techniques described in those tutorials will be used here, as well as functions and data formats that are unique to the eddy flux system.

This tutorial assumes general familiarity with eddy flux data and associated concepts.

1. Setup and data download

Start by installing and loading packages and setting options.

R

To work with the NEON flux data, we need the rhdf5 package, which is hosted on Bioconductor and requires a different installation process than CRAN packages:

install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')
install.packages('ggplot2')

Now load packages.

library(neonUtilities)
library(ggplot2)

Python

The necessary helper packages should be installed automatically when we install neonutilities. We’ll also use several modules that are installed automatically with standard Python installations.

pip install neonutilities

Now load packages.


import neonutilities as nu
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from datetime import datetime
import os

Download data

R

Use the zipsByProduct() function from the neonUtilities package to download flux data from two sites and two months. The transformations and functions below will work on any time range and site(s), but two sites and two months allows us to see all the available functionality while minimizing download size.

Inputs to the zipsByProduct() function for this tutorial:

  • dpID: DP4.00200.001, the bundled eddy covariance product
  • package: basic (the expanded package is not covered in this tutorial)
  • site: NIWO = Niwot Ridge and HARV = Harvard Forest
  • startdate: 2018-06 (both dates are inclusive)
  • enddate: 2018-07 (both dates are inclusive)
  • savepath: path to download to; here we use the working directory
  • check.size: T if you want to see file size before downloading, otherwise F

The download may take a while, especially if you’re on a slow network. For faster downloads, consider using an API token.

zipsByProduct(dpID="DP4.00200.001", package="basic", 
              site=c("NIWO", "HARV"), 
              startdate="2018-06", enddate="2018-07",
              savepath=getwd(), 
              check.size=F)

Python

Use the zips_by_product() function from the neonutilities package to download flux data from two sites and two months. The transformations and functions below will work on any time range and site(s), but two sites and two months allows us to see all the available functionality while minimizing download size.

Inputs to the zips_by_product() function for this tutorial:

  • dpid: DP4.00200.001, the bundled eddy covariance product
  • package: basic (the expanded package is not covered in this tutorial)
  • site: NIWO = Niwot Ridge and HARV = Harvard Forest
  • startdate: 2018-06 (both dates are inclusive)
  • enddate: 2018-07 (both dates are inclusive)
  • savepath: path to download to; here we use the working directory
  • check_size: True if you want to see file size before downloading, otherwise False

The download may take a while, especially if you’re on a slow network. For faster downloads, consider using an API token.

nu.zips_by_product(dpid="DP4.00200.001", package="basic", 
                   site=["NIWO","HARV"], 
                   startdate="2018-06", enddate="2018-07",
                   savepath=os.getcwd(), 
                   check_size=False)

2. Data Levels

There are five levels of data contained in the eddy flux bundle. For full details, refer to the NEON algorithm document.

Briefly, the data levels are:

  • Level 0’ (dp0p): Calibrated raw observations
  • Level 1 (dp01): Time-aggregated observations, e.g. 30-minute mean gas concentrations
  • Level 2 (dp02): Time-interpolated data, e.g. rate of change of a gas concentration
  • Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
  • Level 4 (dp04): Fluxes

The dp0p data are available in the expanded data package and are beyond the scope of this tutorial.

The dp02 and dp03 data are used in storage calculations, and the dp04 data include both the storage and turbulent components. Since many users will want to focus on the net flux data, we’ll start there.
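
Because the net surface-atmosphere exchange (nsae) is the sum of the storage and turbulent components, you can sanity-check any row of the dp04 flux table. Here we check the relationship with the CO2 flux values from the first row of the NIWO example output shown later in this tutorial:

```python
# CO2 flux components from the first row of the NIWO example output:
# nsae should equal storage + turbulent (to rounding precision)
stor = -0.06349451
turb = 0.2530549
nsae = 0.1895604

assert abs((stor + turb) - nsae) < 1e-6
print(round(stor + turb, 7))  # 0.1895604
```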

3. Extract Level 4 data (Fluxes!)

R

To extract the Level 4 data from the HDF5 files and merge them into a single table, we’ll use the stackEddy() function from the neonUtilities package.

stackEddy() requires two inputs:

  • filepath: Path to a file or folder, which can be any one of:
    1. A zip file of eddy flux data downloaded from the NEON data portal
    2. A folder of eddy flux data downloaded by the zipsByProduct() function
    3. The folder of files resulting from unzipping either of 1 or 2
    4. One or more HDF5 files of NEON eddy flux data
  • level: The data level to extract; one of dp01, dp02, dp03, or dp04

Input the filepath you downloaded to using zipsByProduct() earlier, including the filesToStack00200 folder created by the function, and dp04:

flux <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
                 level="dp04")
## Extracting data
## Joining data variables by file
## Stacking files by month
## Getting metadata tables

We now have an object called flux. It’s a named list containing a data table for each site, along with variables, objDesc, issueLog, and citation metadata tables.

names(flux)
## [1] "HARV"                        "NIWO"                       
## [3] "variables"                   "objDesc"                    
## [5] "issueLog"                    "citation_00200_RELEASE-2026"

Let’s look at the contents of one of the site data files:

head(flux$NIWO)
##               timeBgn             timeEnd data.fluxCo2.nsae.flux
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59              0.1895604
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59              0.9429929
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59              0.5101887
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59              0.7980803
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59              0.4345524
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59              0.8440311
##   data.fluxCo2.stor.flux data.fluxCo2.turb.flux data.fluxH2o.nsae.flux
## 1            -0.06349451              0.2530549              16.428558
## 2             0.09749592              0.8454970               7.965026
## 3             0.02525844              0.4849303               5.387009
## 4             0.23341588              0.5646644               9.933795
## 5             0.17415575              0.2603966               2.963931
## 6            -0.01496958              0.8590007               4.488221
##   data.fluxH2o.stor.flux data.fluxH2o.turb.flux data.fluxMome.turb.veloFric
## 1             3.34458774              13.083971                   0.2025301
## 2            -1.51509573               9.480122                   0.1922362
## 3            -4.26233930               9.649348                   0.1197608
## 4             0.15083070               9.782964                   0.1190342
## 5            -0.01668207               2.980613                   0.1589262
## 6            -0.52273792               5.010959                   0.1115558
##   data.fluxTemp.nsae.flux data.fluxTemp.stor.flux data.fluxTemp.turb.flux
## 1               4.7823169              -1.4575674               6.2398843
## 2              -0.5981044               0.3404014              -0.9385058
## 3              -4.3428949               0.1870749              -4.5299698
## 4             -12.6516137              -2.4904875             -10.1611262
## 5              -5.0544306              -0.7514722              -4.3029584
## 6              -7.6999704              -1.9047428              -5.7952276
##   data.foot.stat.angZaxsErth data.foot.stat.distReso
## 1                    94.2262                    8.34
## 2                   355.4252                    8.34
## 3                   359.8013                    8.34
## 4                   137.7743                    8.34
## 5                   188.4800                    8.34
## 6                   183.1920                    8.34
##   data.foot.stat.veloYaxsHorSd data.foot.stat.veloZaxsHorSd
## 1                    0.7955893                    0.2713232
## 2                    0.8590177                    0.2300000
## 3                    1.2601763                    0.2300000
## 4                    0.7332641                    0.2300000
## 5                    0.7096286                    0.2300000
## 6                    0.3789859                    0.2300000
##   data.foot.stat.veloFric data.foot.stat.distZaxsMeasDisp
## 1               0.2025428                            8.34
## 2               0.2000000                            8.34
## 3               0.2000000                            8.34
## 4               0.2000000                            8.34
## 5               0.2000000                            8.34
## 6               0.2000000                            8.34
##   data.foot.stat.distZaxsRgh data.foot.stat.distObkv data.foot.stat.paraStbl
## 1                 0.04083968               -72.43385            -0.115139545
## 2                 0.29601180               965.24689             0.008640276
## 3                 0.23002973                27.47032             0.303600430
## 4                 0.83400000                10.29068             0.810442357
## 5                 0.83400000                58.07739             0.143601482
## 6                 0.83400000                15.03590             0.554672317
##   data.foot.stat.distZaxsAbl data.foot.stat.distXaxs90
## 1                       1000                    325.26
## 2                       1000                    258.54
## 3                       1000                    275.22
## 4                       1000                    208.50
## 5                       1000                    208.50
## 6                       1000                    208.50
##   data.foot.stat.distXaxsMax data.foot.stat.distYaxs90 qfqm.fluxCo2.nsae.qfFinl
## 1                     133.44                     25.02                        1
## 2                     108.42                     50.04                        1
## 3                     108.42                     75.06                        1
## 4                      83.40                     75.06                        1
## 5                      83.40                     66.72                        1
## 6                      83.40                     41.70                        1
##   qfqm.fluxCo2.stor.qfFinl qfqm.fluxCo2.turb.qfFinl qfqm.fluxH2o.nsae.qfFinl
## 1                        1                        1                        1
## 2                        1                        1                        1
## 3                        1                        1                        1
## 4                        1                        1                        1
## 5                        1                        1                        1
## 6                        1                        1                        1
##   qfqm.fluxH2o.stor.qfFinl qfqm.fluxH2o.turb.qfFinl qfqm.fluxMome.turb.qfFinl
## 1                        1                        1                         0
## 2                        0                        1                         0
## 3                        0                        1                         1
## 4                        0                        1                         1
## 5                        0                        1                         0
## 6                        0                        1                         0
##   qfqm.fluxTemp.nsae.qfFinl qfqm.fluxTemp.stor.qfFinl qfqm.fluxTemp.turb.qfFinl
## 1                         1                         0                         1
## 2                         1                         0                         1
## 3                         1                         0                         1
## 4                         1                         0                         1
## 5                         1                         0                         1
## 6                         1                         0                         1
##   qfqm.foot.turb.qfFinl
## 1                     0
## 2                     0
## 3                     0
## 4                     0
## 5                     0
## 6                     0

The variables and objDesc tables can help you interpret the column headers in the data table. The objDesc table contains definitions for many of the terms used in the eddy flux data product, but it isn’t complete. To get the terms of interest, we’ll break up the column headers into individual terms and look for them in the objDesc table:

term <- unlist(strsplit(names(flux$NIWO), split=".", fixed=T))
flux$objDesc[which(flux$objDesc$X...Object %in% term),]
##           X...Object
## 22       angZaxsErth
## 84              data
## 91          distReso
## 92        distXaxs90
## 93       distXaxsMax
## 94        distYaxs90
## 95       distZaxsAbl
## 100 distZaxsMeasDisp
## 101      distZaxsRgh
## 125             flux
## 126          fluxCo2
## 128          fluxH2o
## 130         fluxMome
## 132         fluxTemp
## 133             foot
## 200             nsae
## 244           qfFinl
## 274             qfqm
## 451             stat
## 453             stor
## 474          timeBgn
## 475          timeEnd
## 477             turb
## 505         veloFric
## 512    veloYaxsHorSd
## 515    veloZaxsHorSd
##                                                                                                           Description
## 22                                                                                                     Wind direction
## 84                                                                                             Represents data fields
## 91                                                                      Resolution of vector of distances (units = m)
## 92                                     Distance of 90% contriubtion of footprint aligned with dominant wind direction
## 93                                        Max distance of footprint contriubtion aligned with dominant wind direction
## 94  Distance of 90% contriubtion of footprint aligned with direction that is perpendicular to dominant wind direction
## 95                                              Atmospheric boundary layer height used in flux footprint calculations
## 100                                                                     Measurement height minus displacement height 
## 101                                                               Roughness length used in flux footprint calculation
## 125                                                                                             General term for flux
## 126                                                                     CO2 flux (can be turbulent, storage, or nsae)
## 128                                      H2O flux (can be turbulent, storage, or nsae)  - relates to latent heat flux
## 130                                                                Momentum flux (can be turbulent, storage, or nsae)
## 132                             Temperature flux (can be turbulent, storage, or nsae) - relates to sensible heat flux
## 133                                                                                                         Footprint
## 200                                                       Net surface atmospher exchange (turbulent + storage fluxes)
## 244         The final quality flag indicating if the data are valid for the given aggregation period (1=fail, 0=pass)
## 274   Quality flag and quality metrics, represents quality flags and quality metrics that accompany the provided data
## 451                                                                                                        Statistics
## 453                                                                                                           Storage
## 474                                                                      The beginning time of the aggregation period
## 475                                                                            The end time of the aggregation period
## 477                                                                                                         Turbulent
## 505                                                                           Friction velocity/ustar (units = m s-1)
## 512                                        Standard deviation of horizontal wind speed in Y direction (units = m s-1)
## 515                                        Standard deviation of horizontal wind speed in X direction (units = m s-1)

For the terms that aren’t captured here, fluxCo2, fluxH2o, and fluxTemp are self-explanatory. The flux components are

  • turb: Turbulent flux
  • stor: Storage
  • nsae: Net surface-atmosphere exchange

The variables table contains the units for each field:

flux$variables
##    category   system variable             stat           units
## 1      data  fluxCo2     nsae          timeBgn              NA
## 2      data  fluxCo2     nsae          timeEnd              NA
## 3      data  fluxCo2     nsae             flux umolCo2 m-2 s-1
## 4      data  fluxCo2     stor          timeBgn              NA
## 5      data  fluxCo2     stor          timeEnd              NA
## 6      data  fluxCo2     stor             flux umolCo2 m-2 s-1
## 7      data  fluxCo2     turb          timeBgn              NA
## 8      data  fluxCo2     turb          timeEnd              NA
## 9      data  fluxCo2     turb             flux umolCo2 m-2 s-1
## 10     data  fluxH2o     nsae          timeBgn              NA
## 11     data  fluxH2o     nsae          timeEnd              NA
## 12     data  fluxH2o     nsae             flux           W m-2
## 13     data  fluxH2o     stor          timeBgn              NA
## 14     data  fluxH2o     stor          timeEnd              NA
## 15     data  fluxH2o     stor             flux           W m-2
## 16     data  fluxH2o     turb          timeBgn              NA
## 17     data  fluxH2o     turb          timeEnd              NA
## 18     data  fluxH2o     turb             flux           W m-2
## 19     data fluxMome     turb          timeBgn              NA
## 20     data fluxMome     turb          timeEnd              NA
## 21     data fluxMome     turb         veloFric           m s-1
## 22     data fluxTemp     nsae          timeBgn              NA
## 23     data fluxTemp     nsae          timeEnd              NA
## 24     data fluxTemp     nsae             flux           W m-2
## 25     data fluxTemp     stor          timeBgn              NA
## 26     data fluxTemp     stor          timeEnd              NA
## 27     data fluxTemp     stor             flux           W m-2
## 28     data fluxTemp     turb          timeBgn              NA
## 29     data fluxTemp     turb          timeEnd              NA
## 30     data fluxTemp     turb             flux           W m-2
## 31     data     foot     stat          timeBgn              NA
## 32     data     foot     stat          timeEnd              NA
## 33     data     foot     stat      angZaxsErth             deg
## 34     data     foot     stat         distReso               m
## 35     data     foot     stat    veloYaxsHorSd           m s-1
## 36     data     foot     stat    veloZaxsHorSd           m s-1
## 37     data     foot     stat         veloFric           m s-1
## 38     data     foot     stat distZaxsMeasDisp               m
## 39     data     foot     stat      distZaxsRgh               m
## 40     data     foot     stat         distObkv               m
## 41     data     foot     stat         paraStbl               -
## 42     data     foot     stat      distZaxsAbl               m
## 43     data     foot     stat       distXaxs90               m
## 44     data     foot     stat      distXaxsMax               m
## 45     data     foot     stat       distYaxs90               m
## 46     qfqm  fluxCo2     nsae          timeBgn              NA
## 47     qfqm  fluxCo2     nsae          timeEnd              NA
## 48     qfqm  fluxCo2     nsae           qfFinl              NA
## 49     qfqm  fluxCo2     stor           qfFinl              NA
## 50     qfqm  fluxCo2     stor          timeBgn              NA
## 51     qfqm  fluxCo2     stor          timeEnd              NA
## 52     qfqm  fluxCo2     turb          timeBgn              NA
## 53     qfqm  fluxCo2     turb          timeEnd              NA
## 54     qfqm  fluxCo2     turb           qfFinl              NA
## 55     qfqm  fluxH2o     nsae          timeBgn              NA
## 56     qfqm  fluxH2o     nsae          timeEnd              NA
## 57     qfqm  fluxH2o     nsae           qfFinl              NA
## 58     qfqm  fluxH2o     stor           qfFinl              NA
## 59     qfqm  fluxH2o     stor          timeBgn              NA
## 60     qfqm  fluxH2o     stor          timeEnd              NA
## 61     qfqm  fluxH2o     turb          timeBgn              NA
## 62     qfqm  fluxH2o     turb          timeEnd              NA
## 63     qfqm  fluxH2o     turb           qfFinl              NA
## 64     qfqm fluxMome     turb          timeBgn              NA
## 65     qfqm fluxMome     turb          timeEnd              NA
## 66     qfqm fluxMome     turb           qfFinl              NA
## 67     qfqm fluxTemp     nsae          timeBgn              NA
## 68     qfqm fluxTemp     nsae          timeEnd              NA
## 69     qfqm fluxTemp     nsae           qfFinl              NA
## 70     qfqm fluxTemp     stor           qfFinl              NA
## 71     qfqm fluxTemp     stor          timeBgn              NA
## 72     qfqm fluxTemp     stor          timeEnd              NA
## 73     qfqm fluxTemp     turb          timeBgn              NA
## 74     qfqm fluxTemp     turb          timeEnd              NA
## 75     qfqm fluxTemp     turb           qfFinl              NA
## 76     qfqm     foot     turb          timeBgn              NA
## 77     qfqm     foot     turb          timeEnd              NA
## 78     qfqm     foot     turb           qfFinl              NA

Python

To extract the Level 4 data from the HDF5 files and merge them into a single table, we’ll use the stack_eddy() function from the neonutilities package.

stack_eddy() requires two inputs:

  • filepath: Path to a file or folder, which can be any one of:
    1. A zip file of eddy flux data downloaded from the NEON data portal
    2. A folder of eddy flux data downloaded by the zips_by_product() function
    3. The folder of files resulting from unzipping either of 1 or 2
    4. One or more HDF5 files of NEON eddy flux data
  • level: The level of data to extract, dp01 through dp04

Input the filepath you downloaded to using zips_by_product() earlier, including the filesToStack00200 folder created by the function, and set level to dp04:

flux = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
                 level="dp04")

We now have an object called flux. It’s a dictionary containing a data table for each site (HARV and NIWO here), along with the variables, objDesc, and issueLog tables and the data product citation.

flux.keys()
## dict_keys(['HARV', 'NIWO', 'variables', 'objDesc', 'issueLog', 'citation_00200_RELEASE-2026'])

Let’s look at the contents of one of the site data files:

flux['NIWO'].head()
##                     timeBgn  ... qfqm.foot.turb.qfFinl
## 0 2018-06-01 00:00:00+00:00  ...                     0
## 1 2018-06-01 00:30:00+00:00  ...                     0
## 2 2018-06-01 01:00:00+00:00  ...                     0
## 3 2018-06-01 01:30:00+00:00  ...                     0
## 4 2018-06-01 02:00:00+00:00  ...                     0
## 
## [5 rows x 36 columns]

The variables and objDesc tables can help you interpret the column headers in the data table. The objDesc table contains definitions for many of the terms used in the eddy flux data product, but it isn’t complete. To get the terms of interest, we’ll break up the column headers into individual terms and look for them in the objDesc table:

termlist = [f.split('.') for f in flux['NIWO'].keys()]
term = [t for sublist in termlist for t in sublist]

objDesc = flux['objDesc']
objDesc[objDesc['Object'].isin(term)]

##                Object                                        Description
## 21        angZaxsErth                                     Wind direction
## 83               data                             Represents data fields
## 90           distReso      Resolution of vector of distances (units = m)
## 91         distXaxs90  Distance of 90% contriubtion of footprint alig...
## 92        distXaxsMax  Max distance of footprint contriubtion aligned...
## 93         distYaxs90  Distance of 90% contriubtion of footprint alig...
## 94        distZaxsAbl  Atmospheric boundary layer height used in flux...
## 99   distZaxsMeasDisp      Measurement height minus displacement height 
## 100       distZaxsRgh  Roughness length used in flux footprint calcul...
## 124              flux                              General term for flux
## 125           fluxCo2      CO2 flux (can be turbulent, storage, or nsae)
## 127           fluxH2o  H2O flux (can be turbulent, storage, or nsae) ...
## 129          fluxMome  Momentum flux (can be turbulent, storage, or n...
## 131          fluxTemp  Temperature flux (can be turbulent, storage, o...
## 132              foot                                          Footprint
## 199              nsae  Net surface atmospher exchange (turbulent + st...
## 243            qfFinl  The final quality flag indicating if the data ...
## 273              qfqm  Quality flag and quality metrics, represents q...
## 450              stat                                         Statistics
## 452              stor                                            Storage
## 473           timeBgn       The beginning time of the aggregation period
## 474           timeEnd             The end time of the aggregation period
## 476              turb                                          Turbulent
## 504          veloFric            Friction velocity/ustar (units = m s-1)
## 511     veloYaxsHorSd  Standard deviation of horizontal wind speed in...
## 514     veloZaxsHorSd  Standard deviation of horizontal wind speed in...

For the terms that aren’t captured here, fluxCo2, fluxH2o, and fluxTemp are self-explanatory. The flux components are

  • turb: Turbulent flux
  • stor: Storage
  • nsae: Net surface-atmosphere exchange
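The relationship among these components is simply additive: nsae is the turbulent flux plus the storage flux. A minimal sketch, using invented flux values rather than NEON data, and column names matching the flux table convention:

```python
import pandas as pd

# Hypothetical half-hourly CO2 flux components (umolCo2 m-2 s-1);
# these values are invented for illustration, not NEON data.
df = pd.DataFrame({
    "data.fluxCo2.turb.flux": [-4.0, -6.5, 1.5],
    "data.fluxCo2.stor.flux": [0.5, -0.25, 0.25],
})

# Net surface-atmosphere exchange = turbulent flux + storage flux
df["data.fluxCo2.nsae.flux"] = (df["data.fluxCo2.turb.flux"]
                                + df["data.fluxCo2.stor.flux"])
print(df["data.fluxCo2.nsae.flux"].tolist())  # [-3.5, -6.75, 1.75]
```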

The variables table contains the units for each field:

flux['variables']
##    category    system variable     stat            units
## 0      data   fluxCo2     nsae  timeBgn               NA
## 1      data   fluxCo2     nsae  timeEnd               NA
## 2      data   fluxCo2     nsae     flux  umolCo2 m-2 s-1
## 3      data   fluxCo2     stor  timeBgn               NA
## 4      data   fluxCo2     stor  timeEnd               NA
## ..      ...       ...      ...      ...              ...
## 73     qfqm  fluxTemp     turb  timeEnd               NA
## 74     qfqm  fluxTemp     turb   qfFinl               NA
## 75     qfqm      foot     turb  timeBgn               NA
## 76     qfqm      foot     turb  timeEnd               NA
## 77     qfqm      foot     turb   qfFinl               NA
## 
## [78 rows x 5 columns]

Plot fluxes

Let’s plot some data! First, a brief aside about time stamps, since these are time series data.

Time stamps

NEON sensor data come with time stamps for both the start and end of the averaging period. Depending on the analysis you’re doing, you may want to use one or the other; for general plotting, re-formatting, and transformations, I prefer to use the start time, because there are some small inconsistencies between data products in a few of the end time stamps.

Note that all NEON data use UTC time, also called Greenwich Mean Time. This is true across NEON’s instrumented, observational, and airborne measurements. When working with NEON data, it’s best to keep everything in UTC as much as possible; otherwise it’s very easy to end up with data in mismatched times, which can cause insidious and hard-to-detect problems. In the code below, time stamps and time zones have been handled by stackEddy() and loadByProduct() (stack_eddy() and load_by_product() in Python), so we don’t need to do anything additional. But if you’re writing your own code and need to convert times, remember that if the time zone isn’t specified, R will default to the local time zone it detects on your operating system.
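If you do need a conversion, make the time zone explicit at every step. A minimal pandas sketch, using invented time stamps rather than real timeBgn values; the choice of America/Denver (Niwot Ridge’s local zone) is illustrative:

```python
import pandas as pd

# Hypothetical UTC time stamps, formatted like timeBgn values
t = pd.to_datetime(["2018-07-07 18:00:00", "2018-07-08 06:00:00"])

# Attach the UTC zone explicitly, then convert to the site's local zone;
# never rely on the operating system's default time zone for this step.
t_utc = t.tz_localize("UTC")
t_local = t_utc.tz_convert("America/Denver")
print(t_local.hour.tolist())  # [12, 0] -- 6 hours behind UTC in summer
```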

R

plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn, 
     pch=".", xlab="Date", ylab="CO2 flux",
     ylim=c(-20,20))

There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.

Let’s trim down to just two days of data to see a few other details.

plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn, 
     pch=20, xlab="Date", ylab="CO2 flux",
     xlim=c(as.POSIXct("2018-07-07", tz="GMT"),
            as.POSIXct("2018-07-09", tz="GMT")),
    ylim=c(-20,20), xaxt="n")
axis.POSIXct(1, x=flux$NIWO$timeBgn, 
             format="%Y-%m-%d %H:%M:%S")

Python

plt.scatter(flux['NIWO'].timeBgn, 
            flux['NIWO']['data.fluxCo2.nsae.flux'],
           marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.show()

There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.

Let’s trim down to just two days of data to see a few other details.

plt.scatter(flux['NIWO'].timeBgn, 
            flux['NIWO']['data.fluxCo2.nsae.flux'],
           marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.xlim((datetime.strptime('2018-07-07 00:00:00', '%Y-%m-%d %H:%M:%S'),
          datetime.strptime('2018-07-09 00:00:00', '%Y-%m-%d %H:%M:%S')))
## (np.float64(17719.0), np.float64(17721.0))
plt.show()

Note the timing of C uptake; the UTC time zone is apparent here, since uptake appears to occur during the night.

4. Merge flux data with other sensor data

Many of the data sets we would use to interpret and model flux data are measured as part of the NEON project, but are not present in the eddy flux data product bundle. In this section, we’ll download PAR data and merge them with the flux data; the steps taken here can be applied to any of the NEON instrumented (IS) data products.

Download PAR data

R

To get NEON PAR data, use the loadByProduct() function from the neonUtilities package. loadByProduct() takes the same inputs as zipsByProduct(), but it loads the downloaded data directly into the current R environment.

Let’s download PAR data matching the Niwot Ridge flux data. The inputs needed are:

  • dpID: DP1.00024.001
  • site: NIWO
  • startdate: 2018-06
  • enddate: 2018-07
  • package: basic
  • timeIndex: 30

The new input here is timeIndex=30, which downloads only the 30-minute data. Since the flux data are at a 30-minute resolution, we can save on download time by disregarding the 1-minute data files (which are of course 30 times larger). The timeIndex input can be left off if you want to download all available averaging intervals.

pr <- loadByProduct("DP1.00024.001", site="NIWO", 
                    timeIndex=30, package="basic", 
                    startdate="2018-06", enddate="2018-07",
                    check.size=F)

Python

To get NEON PAR data, use the load_by_product() function from the neonutilities package. load_by_product() takes the same inputs as zips_by_product(), but it loads the downloaded data directly into the current Python environment.

Let’s download PAR data matching the Niwot Ridge flux data. The inputs needed are:

  • dpid: DP1.00024.001
  • site: NIWO
  • startdate: 2018-06
  • enddate: 2018-07
  • package: basic
  • timeindex: 30

The new input here is timeindex=30, which downloads only the 30-minute data. Since the flux data are at a 30-minute resolution, we can save on download time by disregarding the 1-minute data files (which are of course 30 times larger). The timeindex input can be left off if you want to download all available averaging intervals.

pr = nu.load_by_product("DP1.00024.001", site="NIWO", 
                    timeindex=30, package="basic", 
                    startdate="2018-06", enddate="2018-07",
                    check_size=False)

pr is another named list (in Python, a dictionary), and again, metadata and units can be found in the variables table. The PARPAR_30min table contains a verticalPosition field. This field indicates the position on the tower, with 010 being the first tower level, and 020, 030, etc. going up the tower.
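Because the verticalPosition codes are zero-padded strings, the string maximum is also the highest tower level, which makes subsetting to the tower top straightforward. A sketch on a toy stand-in for the PARPAR_30min table, with invented PAR values:

```python
import pandas as pd

# Toy stand-in for pr['PARPAR_30min']; PARMean values are invented.
par = pd.DataFrame({
    "verticalPosition": ["010", "020", "030", "010", "020", "030"],
    "PARMean": [120.0, 310.0, 650.0, 95.0, 280.0, 600.0],
})

# Zero-padded codes sort lexically like numbers, so max() picks the top level
top = par[par["verticalPosition"] == par["verticalPosition"].max()]
print(top["PARMean"].tolist())  # [650.0, 600.0]
```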

Join PAR to flux data

We’ll connect PAR data from the tower top to the flux data.

R

pr.top <- pr$PARPAR_30min[which(pr$PARPAR_30min$verticalPosition==
                                max(pr$PARPAR_30min$verticalPosition)),]

As noted above, loadByProduct() automatically converts time stamps to a recognized date-time format when it reads the data. However, the field names for the time stamps differ between the flux data and the other meteorological data: the start of the averaging interval is timeBgn in the flux data and startDateTime in the PAR data.

Let’s create a new variable in the PAR data:

pr.top$timeBgn <- pr.top$startDateTime

And now use the matching time stamp fields to merge the flux and PAR data.

fx.pr <- merge(pr.top, flux$NIWO, by="timeBgn")

And now we can plot net carbon exchange as a function of light availability:

plot(fx.pr$data.fluxCo2.nsae.flux~fx.pr$PARMean,
     pch=".", ylim=c(-20,20),
     xlab="PAR", ylab="CO2 flux")

Python

As noted above, load_by_product() automatically converts time stamps to a recognized date-time format when it reads the data. However, the field names for the time stamps differ between the flux data and the other meteorological data: the start of the averaging interval is timeBgn in the flux data and startDateTime in the PAR data.

Let’s create a new variable in the PAR data and subset to the top of the tower.


par30 = pr['PARPAR_30min']
par30['timeBgn'] = par30['startDateTime']
prtop = par30[par30['verticalPosition']==par30['verticalPosition'].max()]

And now use the matching time stamp fields to merge the flux and PAR data.


fxpr = pd.merge(flux['NIWO'], prtop, how='outer', on='timeBgn')

And now we can plot net carbon exchange as a function of light availability:

plt.scatter(fxpr.PARMean, 
            fxpr['data.fluxCo2.nsae.flux'],
           marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.show()

If you’re interested in data in the eddy covariance bundle besides the net flux data, the rest of this tutorial will guide you through how to get those data out of the bundle.

5. Vertical profile data (Level 3)

The Level 3 (dp03) data are the spatially interpolated profiles of the rates of change of CO2, H2O, and temperature. Extract the Level 3 data from the HDF5 file using stackEddy() with the same syntax as for the Level 4 data.

R

prof <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
                 level="dp03")

As with the Level 4 data, the result is a named list with data tables for each site.

head(prof$NIWO[,1:7])
##               timeBgn             timeEnd data.co2Stor.rateRtioMoleDryCo2.0.1 m
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59                         -2.681938e-04
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59                          1.912094e-05
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59                          3.709964e-04
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59                          2.374054e-03
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59                          3.397046e-03
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59                          3.156952e-03
##   data.co2Stor.rateRtioMoleDryCo2.0.2 m data.co2Stor.rateRtioMoleDryCo2.0.3 m
## 1                         -2.681938e-04                         -2.681938e-04
## 2                          1.912094e-05                          1.912094e-05
## 3                          3.709964e-04                          3.709964e-04
## 4                          2.374054e-03                          2.374054e-03
## 5                          3.397046e-03                          3.397046e-03
## 6                          3.156952e-03                          3.156952e-03
##   data.co2Stor.rateRtioMoleDryCo2.0.4 m data.co2Stor.rateRtioMoleDryCo2.0.5 m
## 1                         -2.681938e-04                         -2.681938e-04
## 2                          1.912094e-05                          1.912094e-05
## 3                          3.709964e-04                          3.709964e-04
## 4                          2.374054e-03                          2.374054e-03
## 5                          3.397046e-03                          3.397046e-03
## 6                          3.156952e-03                          3.156952e-03

Python

Note (Feb 17, 2026): There is a bug in the Python implementation of the Level 3 code. This section may not run correctly; we will update it as soon as possible.

prof = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
                 level="dp03")

As with the Level 4 data, the result is a named list with data tables for each site.

prof['NIWO'].head()

6. Un-interpolated vertical profile data (Level 2)

The Level 2 data are interpolated in time but not in space. They contain the rates of change at each of the measurement heights.

R

Again, they can be extracted from the HDF5 files using stackEddy() with the same syntax:

prof.l2 <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
                 level="dp02")
head(prof.l2$HARV)
##   horizontalPosition verticalPosition             timeBgn             timeEnd
## 1                000              010 2018-06-01 00:00:00 2018-06-01 00:29:59
## 2                000              010 2018-06-01 00:30:00 2018-06-01 00:59:59
## 3                000              010 2018-06-01 01:00:00 2018-06-01 01:29:59
## 4                000              010 2018-06-01 01:30:00 2018-06-01 01:59:59
## 5                000              010 2018-06-01 02:00:00 2018-06-01 02:29:59
## 6                000              010 2018-06-01 02:30:00 2018-06-01 02:59:59
##   data.co2Stor.rateRtioMoleDryCo2.mean data.h2oStor.rateRtioMoleDryH2o.mean
## 1                                  NaN                                  NaN
## 2                          0.002666576                                  NaN
## 3                         -0.011224223                                  NaN
## 4                          0.006133056                                  NaN
## 5                         -0.019554655                                  NaN
## 6                         -0.007855632                                  NaN
##   data.tempStor.rateTemp.mean qfqm.co2Stor.rateRtioMoleDryCo2.qfFinl
## 1                2.583333e-05                                      1
## 2               -2.008056e-04                                      1
## 3               -1.901111e-04                                      1
## 4               -7.419444e-05                                      1
## 5               -1.537083e-04                                      1
## 6               -1.874861e-04                                      1
##   qfqm.h2oStor.rateRtioMoleDryH2o.qfFinl qfqm.tempStor.rateTemp.qfFinl
## 1                                      1                             0
## 2                                      1                             0
## 3                                      1                             0
## 4                                      1                             0
## 5                                      1                             0
## 6                                      1                             0

Python

Again, they can be extracted from the HDF5 files using stack_eddy() with the same syntax:

profl2 = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
                 level="dp02")
profl2['HARV'].head()
##   horizontalPosition  ... qfqm.tempStor.rateTemp.qfFinl
## 0                000  ...                             0
## 1                000  ...                             0
## 2                000  ...                             0
## 3                000  ...                             0
## 4                000  ...                             0
## 
## [5 rows x 10 columns]

Note that here, as in the PAR data, there is a verticalPosition field. It has the same meaning as in the PAR data, indicating the tower level of the measurement.

7. Calibrated raw data (Level 1)

Level 1 (dp01) data are calibrated, and aggregated in time, but otherwise untransformed. Use Level 1 data for raw gas concentrations and atmospheric stable isotopes.

Extract Level 1 data

R

Using stackEddy() to extract Level 1 data requires additional inputs. The Level 1 files are too large to simply pull out all the variables by default, and they include multiple averaging intervals, which can’t be merged. So two additional inputs are needed:

  • avg: The averaging interval to extract
  • var: One or more variables to extract

What variables are available, at what averaging intervals? Another function in the neonUtilities package, getVarsEddy(), returns a list of HDF5 file contents. It requires only one input, a filepath to a single NEON HDF5 file:

vars <- getVarsEddy(paste0(getwd(),
                           "/filesToStack00200/NEON.D01.HARV.DP4.00200.001.nsae.2018-07.basic.20260115T064348Z.h5"))
head(vars)
##    site level category system hor ver tmi subsys       name       otype
## 5  HARV  dp01     data   amrs 000 060 01m   <NA> angNedXaxs H5I_DATASET
## 6  HARV  dp01     data   amrs 000 060 01m   <NA> angNedYaxs H5I_DATASET
## 7  HARV  dp01     data   amrs 000 060 01m   <NA> angNedZaxs H5I_DATASET
## 9  HARV  dp01     data   amrs 000 060 30m   <NA> angNedXaxs H5I_DATASET
## 10 HARV  dp01     data   amrs 000 060 30m   <NA> angNedYaxs H5I_DATASET
## 11 HARV  dp01     data   amrs 000 060 30m   <NA> angNedZaxs H5I_DATASET
##      dclass   dim  oth
## 5  COMPOUND 44640 <NA>
## 6  COMPOUND 44640 <NA>
## 7  COMPOUND 44640 <NA>
## 9  COMPOUND  1488 <NA>
## 10 COMPOUND  1488 <NA>
## 11 COMPOUND  1488 <NA>

Inputs to var can be any values from the name field in the table returned by getVarsEddy(). Let’s take a look at CO2 and H2O, 13C in CO2 and 18O in H2O, at 30-minute aggregation. Let’s look at Harvard Forest for these data, since deeper canopies generally have more interesting profiles:

iso <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
               level="dp01", var=c("rtioMoleDryCo2","rtioMoleDryH2o",
                                   "dlta13CCo2","dlta18OH2o"), avg=30)
head(iso$HARV[,1:8])
##   horizontalPosition verticalPosition             timeBgn             timeEnd
## 1                000              010 2018-06-01 00:00:00 2018-06-01 00:29:59
## 2                000              010 2018-06-01 00:30:00 2018-06-01 00:59:59
## 3                000              010 2018-06-01 01:00:00 2018-06-01 01:29:59
## 4                000              010 2018-06-01 01:30:00 2018-06-01 01:59:59
## 5                000              010 2018-06-01 02:00:00 2018-06-01 02:29:59
## 6                000              010 2018-06-01 02:30:00 2018-06-01 02:59:59
##   data.co2Stor.rtioMoleDryCo2.mean data.co2Stor.rtioMoleDryCo2.min
## 1                         509.3375                        451.4786
## 2                         502.2736                        463.5470
## 3                         521.6139                        442.8649
## 4                         469.6317                        432.6588
## 5                         484.7725                        436.2842
## 6                         476.8554                        443.7055
##   data.co2Stor.rtioMoleDryCo2.max data.co2Stor.rtioMoleDryCo2.vari
## 1                        579.3518                         845.0795
## 2                        533.6622                         161.3652
## 3                        563.0518                         547.9924
## 4                        508.7463                         396.8379
## 5                        537.4641                         662.9449
## 6                        515.6598                         246.6969

Python

Using stack_eddy() to extract Level 1 data requires additional inputs. The Level 1 files are too large to simply pull out all the variables by default, and they include multiple averaging intervals, which can’t be merged. So two additional inputs are needed:

  • avg: The averaging interval to extract
  • var: One or more variables to extract

Inputs to var can be any values from the name field in the table returned by get_vars_eddy(). Let’s take a look at CO2 and H2O, 13C in CO2 and 18O in H2O, at 30-minute aggregation. Let’s look at Harvard Forest for these data, since deeper canopies generally have more interesting profiles:

iso = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
                 level="dp01", avg=30, 
                    var=["rtioMoleDryCo2","rtioMoleDryH2o",
                          "dlta13CCo2","dlta18OH2o"])
iso['HARV'].head()
##   horizontalPosition  ... ucrt.h2oTurb.rtioMoleDryH2o.se
## 0                000  ...                            NaN
## 1                000  ...                            NaN
## 2                000  ...                            NaN
## 3                000  ...                            NaN
## 4                000  ...                            NaN
## 
## [5 rows x 85 columns]

Plot vertical profiles

Let’s plot vertical profiles of CO2 and 13C in CO2 on a single day.

Here we'll use the time stamps to select all of the records for a single day, and discard the verticalPosition values that are strings rather than numbers; those are the calibration gases.

R

iso.d <- iso$HARV[grep("2018-06-25", iso$HARV$timeBgn, fixed=T),]
iso.d <- iso.d[-which(is.na(as.numeric(iso.d$verticalPosition))),]

ggplot is well suited to these types of data. We can plot CO2 relative to height on the tower, with separate lines for each time interval.

g <- ggplot(iso.d, aes(y=verticalPosition)) + 
  geom_path(aes(x=data.co2Stor.rtioMoleDryCo2.mean, 
                group=timeBgn, col=timeBgn)) + 
  theme(legend.position="none") + 
  xlab("CO2") + ylab("Tower level")
g

And the same plot for 13C in CO2:

g <- ggplot(iso.d, aes(y=verticalPosition)) + 
  geom_path(aes(x=data.isoCo2.dlta13CCo2.mean, 
                group=timeBgn, col=timeBgn)) + 
  theme(legend.position="none") + 
  xlab("d13C") + ylab("Tower level")
g

Python


isoH = iso['HARV']
isoH6 = isoH[(isoH['timeBgn'] >= '2018-06-25') & (isoH['timeBgn'] < '2018-06-26')]
isod = isoH6[isoH6['horizontalPosition'] == '000']
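If you also want to drop calibration-gas records by verticalPosition, mirroring the R step above, one approach is to keep only rows whose verticalPosition converts cleanly to a number. A minimal sketch, with made-up level codes and values rather than real NEON records:

```python
import pandas as pd

# Toy frame: numeric tower levels plus calibration-gas labels (hypothetical values)
df = pd.DataFrame({"verticalPosition": ["010", "020", "co2Arch", "030", "co2Low"],
                   "co2": [415.0, 412.1, 390.5, 409.8, 360.2]})

# Coerce to numeric; string labels become NaN and are dropped
numeric_vp = pd.to_numeric(df["verticalPosition"], errors="coerce")
towers = df[numeric_vp.notna()]
print(towers["verticalPosition"].tolist())  # ['010', '020', '030']
```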

Let’s plot CO2 relative to height on the tower, with separate lines for each time interval.

fig, ax = plt.subplots()

groups = isod["timeBgn"].unique()

cmap = plt.cm.viridis
colors = cmap(np.linspace(0, 1, len(groups)))

for color, group_name in zip(colors, groups):
    group = isod[isod["timeBgn"] == group_name]
    ax.plot(group['data.co2Stor.rtioMoleDryCo2.mean'], group['verticalPosition'], color=color)

ax.set_xlabel("CO2")
ax.set_ylabel("Tower level")

plt.show()

And the same plot for 13C in CO2:

fig, ax = plt.subplots()

for color, group_name in zip(colors, groups):
    group = isod[isod["timeBgn"] == group_name]
    ax.plot(group["data.isoCo2.dlta13CCo2.mean"], group["verticalPosition"], color=color)

ax.set_xlabel("13C")
ax.set_ylabel("Tower level")

plt.show()

The legends are omitted for space. See if you can use the concentration and isotope ratio buildup and drawdown below the canopy to work out which times of day the different colors represent.
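If you do want to recover the times of day, one option is to replace the legend with a compact colorbar indexed over the sequence of intervals. A sketch with hypothetical, randomly generated profiles (48 half-hourly intervals, 5 tower levels), not the tutorial data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical half-hourly profiles: 48 intervals, 5 tower levels
groups = np.arange(48)
levels = np.arange(1, 6)
cmap = plt.cm.viridis
norm = Normalize(vmin=0, vmax=len(groups) - 1)

fig, ax = plt.subplots()
rng = np.random.default_rng(0)
for i in groups:
    profile = 400 + 20 * rng.random(len(levels))  # fake CO2 profile
    ax.plot(profile, levels, color=cmap(norm(i)))

# Colorbar stands in for the legend: ticks index the half-hour intervals
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label("Half-hour interval of day")
ax.set_xlabel("CO2")
ax.set_ylabel("Tower level")
```

The colorbar is compact regardless of how many intervals are plotted, which is why it scales better here than a 48-entry legend.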
