Tutorial
Introduction to working with NEON eddy flux data
Authors: Claire K. Lunch
Last Updated: Feb 17, 2026
This data tutorial provides an introduction to working with NEON eddy
flux data, using the neonUtilities R package or the
neonutilities Python package. If you are new to NEON data,
we recommend starting with a more general tutorial, such as the
neonUtilities tutorial or the Download and Explore tutorial.
Some of the functions and techniques described
in those tutorials will be used here, as well as functions and data
formats that are unique to the eddy flux system.
This tutorial assumes general familiarity with eddy flux data and associated concepts.
1. Setup and data download
Start by installing and loading packages and setting options.
R
To work with the NEON flux data, we need the rhdf5
package, which is hosted on Bioconductor, and requires a different
installation process than CRAN packages:
install.packages('BiocManager')
BiocManager::install('rhdf5')
install.packages('neonUtilities')
install.packages('ggplot2')
Now load packages.
library(neonUtilities)
library(ggplot2)
Python
The necessary helper packages should be installed automatically when
we install neonutilities (you may need to install
matplotlib separately for the plots below). We’ll also use
several modules that come with the Python standard library.
pip install neonutilities
Now load packages.
import neonutilities as nu
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from datetime import datetime
import os
Download data
R
Use the zipsByProduct() function from the
neonUtilities package to download flux data from two sites
and two months. The transformations and functions below will work on any
time range and site(s), but two sites and two months allows us to see
all the available functionality while minimizing download size.
Inputs to the zipsByProduct() function for this
tutorial:
- dpID: DP4.00200.001, the bundled eddy covariance product
- package: basic (the expanded package is not covered in this tutorial)
- site: NIWO = Niwot Ridge and HARV = Harvard Forest
- startdate: 2018-06 (both dates are inclusive)
- enddate: 2018-07 (both dates are inclusive)
- savepath: path to download to; here we use the working directory
- check.size: T if you want to see file size before downloading, otherwise F
The download may take a while, especially if you’re on a slow network. For faster downloads, consider using an API token.
zipsByProduct(dpID="DP4.00200.001", package="basic",
site=c("NIWO", "HARV"),
startdate="2018-06", enddate="2018-07",
savepath=getwd(),
check.size=F)
Python
Use the zips_by_product() function from the
neonutilities package to download flux data from two sites
and two months. The transformations and functions below will work on any
time range and site(s), but two sites and two months allows us to see
all the available functionality while minimizing download size.
Inputs to the zips_by_product() function for this
tutorial:
- dpid: DP4.00200.001, the bundled eddy covariance product
- package: basic (the expanded package is not covered in this tutorial)
- site: NIWO = Niwot Ridge and HARV = Harvard Forest
- startdate: 2018-06 (both dates are inclusive)
- enddate: 2018-07 (both dates are inclusive)
- savepath: path to download to; here we use the working directory
- check_size: True if you want to see file size before downloading, otherwise False
The download may take a while, especially if you’re on a slow network. For faster downloads, consider using an API token.
nu.zips_by_product(dpid="DP4.00200.001", package="basic",
site=["NIWO","HARV"],
startdate="2018-06", enddate="2018-07",
savepath=os.getcwd(),
check_size=False)
2. Data Levels
There are five levels of data contained in the eddy flux bundle. For full details, refer to the NEON algorithm document.
Briefly, the data levels are:
- Level 0’ (dp0p): Calibrated raw observations
- Level 1 (dp01): Time-aggregated observations, e.g. 30-minute mean gas concentrations
- Level 2 (dp02): Time-interpolated data, e.g. rate of change of a gas concentration
- Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
- Level 4 (dp04): Fluxes
The dp0p data are available in the expanded data package and are beyond the scope of this tutorial.
The dp02 and dp03 data are used in storage calculations, and the dp04 data include both the storage and turbulent components. Since many users will want to focus on the net flux data, we’ll start there.
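In the dp04 tables (extracted in the next section), the net surface-atmosphere exchange (nsae) columns are the sum of the storage (stor) and turbulent (turb) columns. A minimal pandas sketch with made-up values of the same shape as the NIWO table:

```python
import pandas as pd

# Hypothetical values, similar in magnitude to the NIWO dp04 CO2 fluxes.
# nsae (net surface-atmosphere exchange) = stor (storage) + turb (turbulent)
dp04 = pd.DataFrame({
    "data.fluxCo2.stor.flux": [-0.0635, 0.0975],
    "data.fluxCo2.turb.flux": [0.2531, 0.8455],
})
dp04["data.fluxCo2.nsae.flux"] = (dp04["data.fluxCo2.stor.flux"]
                                  + dp04["data.fluxCo2.turb.flux"])
print(dp04["data.fluxCo2.nsae.flux"].round(4).tolist())
```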
3. Extract Level 4 data (Fluxes!)
R
To extract the Level 4 data from the HDF5 files and merge them into a
single table, we’ll use the stackEddy() function from the
neonUtilities package.
stackEddy() requires two inputs:
filepath: Path to a file or folder, which can be any one of:
- A zip file of eddy flux data downloaded from the NEON data portal
- A folder of eddy flux data downloaded by the zipsByProduct() function
- The folder of files resulting from unzipping either of the above
- One or more HDF5 files of NEON eddy flux data

level: dp01, dp02, dp03, or dp04
Input the filepath you downloaded to using
zipsByProduct() earlier, including the
filesToStack00200 folder created by the function, and
dp04:
flux <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
level="dp04")
## Extracting data
## Joining data variables by file
## Stacking files by month
## Getting metadata tables
We now have an object called flux. It’s a named list
containing a data table for each site, along with
variables, objDesc, issueLog, and citation tables.
names(flux)
## [1] "HARV" "NIWO"
## [3] "variables" "objDesc"
## [5] "issueLog" "citation_00200_RELEASE-2026"
Let’s look at the contents of one of the site data files:
head(flux$NIWO)
## timeBgn timeEnd data.fluxCo2.nsae.flux
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59 0.1895604
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59 0.9429929
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59 0.5101887
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59 0.7980803
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59 0.4345524
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59 0.8440311
## data.fluxCo2.stor.flux data.fluxCo2.turb.flux data.fluxH2o.nsae.flux
## 1 -0.06349451 0.2530549 16.428558
## 2 0.09749592 0.8454970 7.965026
## 3 0.02525844 0.4849303 5.387009
## 4 0.23341588 0.5646644 9.933795
## 5 0.17415575 0.2603966 2.963931
## 6 -0.01496958 0.8590007 4.488221
## data.fluxH2o.stor.flux data.fluxH2o.turb.flux data.fluxMome.turb.veloFric
## 1 3.34458774 13.083971 0.2025301
## 2 -1.51509573 9.480122 0.1922362
## 3 -4.26233930 9.649348 0.1197608
## 4 0.15083070 9.782964 0.1190342
## 5 -0.01668207 2.980613 0.1589262
## 6 -0.52273792 5.010959 0.1115558
## data.fluxTemp.nsae.flux data.fluxTemp.stor.flux data.fluxTemp.turb.flux
## 1 4.7823169 -1.4575674 6.2398843
## 2 -0.5981044 0.3404014 -0.9385058
## 3 -4.3428949 0.1870749 -4.5299698
## 4 -12.6516137 -2.4904875 -10.1611262
## 5 -5.0544306 -0.7514722 -4.3029584
## 6 -7.6999704 -1.9047428 -5.7952276
## data.foot.stat.angZaxsErth data.foot.stat.distReso
## 1 94.2262 8.34
## 2 355.4252 8.34
## 3 359.8013 8.34
## 4 137.7743 8.34
## 5 188.4800 8.34
## 6 183.1920 8.34
## data.foot.stat.veloYaxsHorSd data.foot.stat.veloZaxsHorSd
## 1 0.7955893 0.2713232
## 2 0.8590177 0.2300000
## 3 1.2601763 0.2300000
## 4 0.7332641 0.2300000
## 5 0.7096286 0.2300000
## 6 0.3789859 0.2300000
## data.foot.stat.veloFric data.foot.stat.distZaxsMeasDisp
## 1 0.2025428 8.34
## 2 0.2000000 8.34
## 3 0.2000000 8.34
## 4 0.2000000 8.34
## 5 0.2000000 8.34
## 6 0.2000000 8.34
## data.foot.stat.distZaxsRgh data.foot.stat.distObkv data.foot.stat.paraStbl
## 1 0.04083968 -72.43385 -0.115139545
## 2 0.29601180 965.24689 0.008640276
## 3 0.23002973 27.47032 0.303600430
## 4 0.83400000 10.29068 0.810442357
## 5 0.83400000 58.07739 0.143601482
## 6 0.83400000 15.03590 0.554672317
## data.foot.stat.distZaxsAbl data.foot.stat.distXaxs90
## 1 1000 325.26
## 2 1000 258.54
## 3 1000 275.22
## 4 1000 208.50
## 5 1000 208.50
## 6 1000 208.50
## data.foot.stat.distXaxsMax data.foot.stat.distYaxs90 qfqm.fluxCo2.nsae.qfFinl
## 1 133.44 25.02 1
## 2 108.42 50.04 1
## 3 108.42 75.06 1
## 4 83.40 75.06 1
## 5 83.40 66.72 1
## 6 83.40 41.70 1
## qfqm.fluxCo2.stor.qfFinl qfqm.fluxCo2.turb.qfFinl qfqm.fluxH2o.nsae.qfFinl
## 1 1 1 1
## 2 1 1 1
## 3 1 1 1
## 4 1 1 1
## 5 1 1 1
## 6 1 1 1
## qfqm.fluxH2o.stor.qfFinl qfqm.fluxH2o.turb.qfFinl qfqm.fluxMome.turb.qfFinl
## 1 1 1 0
## 2 0 1 0
## 3 0 1 1
## 4 0 1 1
## 5 0 1 0
## 6 0 1 0
## qfqm.fluxTemp.nsae.qfFinl qfqm.fluxTemp.stor.qfFinl qfqm.fluxTemp.turb.qfFinl
## 1 1 0 1
## 2 1 0 1
## 3 1 0 1
## 4 1 0 1
## 5 1 0 1
## 6 1 0 1
## qfqm.foot.turb.qfFinl
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
The variables and objDesc tables can help
you interpret the column headers in the data table. The
objDesc table contains definitions for many of the terms
used in the eddy flux data product, but it isn’t complete. To get the
terms of interest, we’ll break up the column headers into individual
terms and look for them in the objDesc table:
term <- unlist(strsplit(names(flux$NIWO), split=".", fixed=T))
flux$objDesc[which(flux$objDesc$X...Object %in% term),]
## X...Object
## 22 angZaxsErth
## 84 data
## 91 distReso
## 92 distXaxs90
## 93 distXaxsMax
## 94 distYaxs90
## 95 distZaxsAbl
## 100 distZaxsMeasDisp
## 101 distZaxsRgh
## 125 flux
## 126 fluxCo2
## 128 fluxH2o
## 130 fluxMome
## 132 fluxTemp
## 133 foot
## 200 nsae
## 244 qfFinl
## 274 qfqm
## 451 stat
## 453 stor
## 474 timeBgn
## 475 timeEnd
## 477 turb
## 505 veloFric
## 512 veloYaxsHorSd
## 515 veloZaxsHorSd
## Description
## 22 Wind direction
## 84 Represents data fields
## 91 Resolution of vector of distances (units = m)
## 92 Distance of 90% contriubtion of footprint aligned with dominant wind direction
## 93 Max distance of footprint contriubtion aligned with dominant wind direction
## 94 Distance of 90% contriubtion of footprint aligned with direction that is perpendicular to dominant wind direction
## 95 Atmospheric boundary layer height used in flux footprint calculations
## 100 Measurement height minus displacement height
## 101 Roughness length used in flux footprint calculation
## 125 General term for flux
## 126 CO2 flux (can be turbulent, storage, or nsae)
## 128 H2O flux (can be turbulent, storage, or nsae) - relates to latent heat flux
## 130 Momentum flux (can be turbulent, storage, or nsae)
## 132 Temperature flux (can be turbulent, storage, or nsae) - relates to sensible heat flux
## 133 Footprint
## 200 Net surface atmospher exchange (turbulent + storage fluxes)
## 244 The final quality flag indicating if the data are valid for the given aggregation period (1=fail, 0=pass)
## 274 Quality flag and quality metrics, represents quality flags and quality metrics that accompany the provided data
## 451 Statistics
## 453 Storage
## 474 The beginning time of the aggregation period
## 475 The end time of the aggregation period
## 477 Turbulent
## 505 Friction velocity/ustar (units = m s-1)
## 512 Standard deviation of horizontal wind speed in Y direction (units = m s-1)
## 515 Standard deviation of horizontal wind speed in X direction (units = m s-1)
For the terms that aren’t captured here, fluxCo2,
fluxH2o, and fluxTemp are self-explanatory.
The flux components are
- turb: Turbulent flux
- stor: Storage
- nsae: Net surface-atmosphere exchange
The variables table contains the units for each
field:
flux$variables
## category system variable stat units
## 1 data fluxCo2 nsae timeBgn NA
## 2 data fluxCo2 nsae timeEnd NA
## 3 data fluxCo2 nsae flux umolCo2 m-2 s-1
## 4 data fluxCo2 stor timeBgn NA
## 5 data fluxCo2 stor timeEnd NA
## 6 data fluxCo2 stor flux umolCo2 m-2 s-1
## 7 data fluxCo2 turb timeBgn NA
## 8 data fluxCo2 turb timeEnd NA
## 9 data fluxCo2 turb flux umolCo2 m-2 s-1
## 10 data fluxH2o nsae timeBgn NA
## 11 data fluxH2o nsae timeEnd NA
## 12 data fluxH2o nsae flux W m-2
## 13 data fluxH2o stor timeBgn NA
## 14 data fluxH2o stor timeEnd NA
## 15 data fluxH2o stor flux W m-2
## 16 data fluxH2o turb timeBgn NA
## 17 data fluxH2o turb timeEnd NA
## 18 data fluxH2o turb flux W m-2
## 19 data fluxMome turb timeBgn NA
## 20 data fluxMome turb timeEnd NA
## 21 data fluxMome turb veloFric m s-1
## 22 data fluxTemp nsae timeBgn NA
## 23 data fluxTemp nsae timeEnd NA
## 24 data fluxTemp nsae flux W m-2
## 25 data fluxTemp stor timeBgn NA
## 26 data fluxTemp stor timeEnd NA
## 27 data fluxTemp stor flux W m-2
## 28 data fluxTemp turb timeBgn NA
## 29 data fluxTemp turb timeEnd NA
## 30 data fluxTemp turb flux W m-2
## 31 data foot stat timeBgn NA
## 32 data foot stat timeEnd NA
## 33 data foot stat angZaxsErth deg
## 34 data foot stat distReso m
## 35 data foot stat veloYaxsHorSd m s-1
## 36 data foot stat veloZaxsHorSd m s-1
## 37 data foot stat veloFric m s-1
## 38 data foot stat distZaxsMeasDisp m
## 39 data foot stat distZaxsRgh m
## 40 data foot stat distObkv m
## 41 data foot stat paraStbl -
## 42 data foot stat distZaxsAbl m
## 43 data foot stat distXaxs90 m
## 44 data foot stat distXaxsMax m
## 45 data foot stat distYaxs90 m
## 46 qfqm fluxCo2 nsae timeBgn NA
## 47 qfqm fluxCo2 nsae timeEnd NA
## 48 qfqm fluxCo2 nsae qfFinl NA
## 49 qfqm fluxCo2 stor qfFinl NA
## 50 qfqm fluxCo2 stor timeBgn NA
## 51 qfqm fluxCo2 stor timeEnd NA
## 52 qfqm fluxCo2 turb timeBgn NA
## 53 qfqm fluxCo2 turb timeEnd NA
## 54 qfqm fluxCo2 turb qfFinl NA
## 55 qfqm fluxH2o nsae timeBgn NA
## 56 qfqm fluxH2o nsae timeEnd NA
## 57 qfqm fluxH2o nsae qfFinl NA
## 58 qfqm fluxH2o stor qfFinl NA
## 59 qfqm fluxH2o stor timeBgn NA
## 60 qfqm fluxH2o stor timeEnd NA
## 61 qfqm fluxH2o turb timeBgn NA
## 62 qfqm fluxH2o turb timeEnd NA
## 63 qfqm fluxH2o turb qfFinl NA
## 64 qfqm fluxMome turb timeBgn NA
## 65 qfqm fluxMome turb timeEnd NA
## 66 qfqm fluxMome turb qfFinl NA
## 67 qfqm fluxTemp nsae timeBgn NA
## 68 qfqm fluxTemp nsae timeEnd NA
## 69 qfqm fluxTemp nsae qfFinl NA
## 70 qfqm fluxTemp stor qfFinl NA
## 71 qfqm fluxTemp stor timeBgn NA
## 72 qfqm fluxTemp stor timeEnd NA
## 73 qfqm fluxTemp turb timeBgn NA
## 74 qfqm fluxTemp turb timeEnd NA
## 75 qfqm fluxTemp turb qfFinl NA
## 76 qfqm foot turb timeBgn NA
## 77 qfqm foot turb timeEnd NA
## 78 qfqm foot turb qfFinl NA
Python
To extract the Level 4 data from the HDF5 files and merge them into a
single table, we’ll use the stack_eddy() function from the
neonutilities package.
stack_eddy() requires two inputs:
filepath: Path to a file or folder, which can be any one of:
- A zip file of eddy flux data downloaded from the NEON data portal
- A folder of eddy flux data downloaded by the zips_by_product() function
- The folder of files resulting from unzipping either of the above
- One or more HDF5 files of NEON eddy flux data

level: dp01, dp02, dp03, or dp04
Input the filepath you downloaded to using
zips_by_product() earlier, including the
filesToStack00200 folder created by the function, and
dp04:
flux = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
level="dp04")
We now have an object called flux. It’s a dictionary
containing a data table for each site, along with
variables, objDesc, issueLog, and citation tables.
flux.keys()
## dict_keys(['HARV', 'NIWO', 'variables', 'objDesc', 'issueLog', 'citation_00200_RELEASE-2026'])
Let’s look at the contents of one of the site data files:
flux['NIWO'].head()
##                      timeBgn  ...  qfqm.foot.turb.qfFinl
## 0  2018-06-01 00:00:00+00:00  ...                      0
## 1  2018-06-01 00:30:00+00:00  ...                      0
## 2  2018-06-01 01:00:00+00:00  ...                      0
## 3  2018-06-01 01:30:00+00:00  ...                      0
## 4  2018-06-01 02:00:00+00:00  ...                      0
##
## [5 rows x 36 columns]
The variables and objDesc tables can help
you interpret the column headers in the data table. The
objDesc table contains definitions for many of the terms
used in the eddy flux data product, but it isn’t complete. To get the
terms of interest, we’ll break up the column headers into individual
terms and look for them in the objDesc table:
termlist = [f.split('.') for f in flux['NIWO'].keys()]
term = [t for sublist in termlist for t in sublist]
objDesc = flux['objDesc']
objDesc[objDesc['Object'].isin(term)]
## Object Description
## 21 angZaxsErth Wind direction
## 83 data Represents data fields
## 90 distReso Resolution of vector of distances (units = m)
## 91 distXaxs90 Distance of 90% contriubtion of footprint alig...
## 92 distXaxsMax Max distance of footprint contriubtion aligned...
## 93 distYaxs90 Distance of 90% contriubtion of footprint alig...
## 94 distZaxsAbl Atmospheric boundary layer height used in flux...
## 99 distZaxsMeasDisp Measurement height minus displacement height
## 100 distZaxsRgh Roughness length used in flux footprint calcul...
## 124 flux General term for flux
## 125 fluxCo2 CO2 flux (can be turbulent, storage, or nsae)
## 127 fluxH2o H2O flux (can be turbulent, storage, or nsae) ...
## 129 fluxMome Momentum flux (can be turbulent, storage, or n...
## 131 fluxTemp Temperature flux (can be turbulent, storage, o...
## 132 foot Footprint
## 199 nsae Net surface atmospher exchange (turbulent + st...
## 243 qfFinl The final quality flag indicating if the data ...
## 273 qfqm Quality flag and quality metrics, represents q...
## 450 stat Statistics
## 452 stor Storage
## 473 timeBgn The beginning time of the aggregation period
## 474 timeEnd The end time of the aggregation period
## 476 turb Turbulent
## 504 veloFric Friction velocity/ustar (units = m s-1)
## 511 veloYaxsHorSd Standard deviation of horizontal wind speed in...
## 514 veloZaxsHorSd Standard deviation of horizontal wind speed in...
For the terms that aren’t captured here, fluxCo2,
fluxH2o, and fluxTemp are self-explanatory.
The flux components are
- turb: Turbulent flux
- stor: Storage
- nsae: Net surface-atmosphere exchange
The variables table contains the units for each
field:
flux['variables']
## category system variable stat units
## 0 data fluxCo2 nsae timeBgn NA
## 1 data fluxCo2 nsae timeEnd NA
## 2 data fluxCo2 nsae flux umolCo2 m-2 s-1
## 3 data fluxCo2 stor timeBgn NA
## 4 data fluxCo2 stor timeEnd NA
## .. ... ... ... ... ...
## 73 qfqm fluxTemp turb timeEnd NA
## 74 qfqm fluxTemp turb qfFinl NA
## 75 qfqm foot turb timeBgn NA
## 76 qfqm foot turb timeEnd NA
## 77 qfqm foot turb qfFinl NA
##
## [78 rows x 5 columns]
Plot fluxes
Let’s plot some data! First, a brief aside about time stamps, since these are time series data.
Time stamps
NEON sensor data come with time stamps for both the start and end of the averaging period. Depending on the analysis you’re doing, you may want to use one or the other; for general plotting, re-formatting, and transformations, I prefer to use the start time, because there are some small inconsistencies between data products in a few of the end time stamps.
Note that all NEON data use UTC time, aka Greenwich
Mean Time. This is true across NEON’s instrumented, observational, and
airborne measurements. When working with NEON data, it’s best to keep
everything in UTC as much as possible, otherwise it’s very easy to end
up with data in mismatched times, which can cause insidious and
hard-to-detect problems. In the code below, time stamps and time zones
have been handled by stackEddy() and
loadByProduct(), so we don’t need to do anything
additional. But if you’re writing your own code and need to convert
times, remember that if the time zone isn’t specified, most software
(including R) will default to the local time zone it detects on your
operating system.
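As a quick sketch of why this matters, here is how a UTC time stamp shifts when converted to local time with pandas (Niwot Ridge is in US Mountain time, America/Denver in the IANA time zone database):

```python
import pandas as pd

# The same instant reads very differently once converted out of UTC.
t_utc = pd.Timestamp("2018-07-07 06:00:00", tz="UTC")
t_local = t_utc.tz_convert("America/Denver")
print(t_utc)    # 2018-07-07 06:00:00+00:00
print(t_local)  # 2018-07-07 00:00:00-06:00
```

The two objects represent the same moment, but a plot labeled with one and data indexed by the other will be off by six hours.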
R
plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn,
pch=".", xlab="Date", ylab="CO2 flux",
ylim=c(-20,20))
There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.
Let’s trim down to just two days of data to see a few other details.
plot(flux$NIWO$data.fluxCo2.nsae.flux~flux$NIWO$timeBgn,
pch=20, xlab="Date", ylab="CO2 flux",
xlim=c(as.POSIXct("2018-07-07", tz="GMT"),
as.POSIXct("2018-07-09", tz="GMT")),
ylim=c(-20,20), xaxt="n")
axis.POSIXct(1, x=flux$NIWO$timeBgn,
format="%Y-%m-%d %H:%M:%S")
Python
plt.scatter(flux['NIWO'].timeBgn,
flux['NIWO']['data.fluxCo2.nsae.flux'],
marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.show()
There is a clear diurnal pattern, and an increase in daily carbon uptake as the growing season progresses.
Let’s trim down to just two days of data to see a few other details.
plt.scatter(flux['NIWO'].timeBgn,
flux['NIWO']['data.fluxCo2.nsae.flux'],
marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.xlim((datetime.strptime('2018-07-07 00:00:00', '%Y-%m-%d %H:%M:%S'),
datetime.strptime('2018-07-09 00:00:00', '%Y-%m-%d %H:%M:%S')))
## (np.float64(17719.0), np.float64(17721.0))
plt.show()
Note the timing of C uptake: because the time stamps are in UTC, uptake occurs at times that, on the clock, appear to be during the night.
4. Merge flux data with other sensor data
Many of the data sets we would use to interpret and model flux data are measured as part of the NEON project, but are not present in the eddy flux data product bundle. In this section, we’ll download PAR data and merge them with the flux data; the steps taken here can be applied to any of the NEON instrumented (IS) data products.
Download PAR data
R
To get NEON PAR data, use the loadByProduct() function
from the neonUtilities package.
loadByProduct() takes the same inputs as
zipsByProduct(), but it loads the downloaded data directly
into the current R environment.
Let’s download PAR data matching the Niwot Ridge flux data. The inputs needed are:
- dpID: DP1.00024.001
- site: NIWO
- startdate: 2018-06
- enddate: 2018-07
- package: basic
- timeIndex: 30
The new input here is timeIndex=30, which downloads only
the 30-minute data. Since the flux data are at a 30-minute resolution,
we can save on download time by disregarding the 1-minute data files
(which are of course 30 times larger). The timeIndex input
can be left off if you want to download all available averaging
intervals.
pr <- loadByProduct("DP1.00024.001", site="NIWO",
timeIndex=30, package="basic",
startdate="2018-06", enddate="2018-07",
check.size=F)
Python
To get NEON PAR data, use the load_by_product() function
from the neonutilities package.
load_by_product() takes the same inputs as
zips_by_product(), but it loads the downloaded data
directly into the current Python environment.
Let’s download PAR data matching the Niwot Ridge flux data. The inputs needed are:
- dpid: DP1.00024.001
- site: NIWO
- startdate: 2018-06
- enddate: 2018-07
- package: basic
- timeindex: 30
The new input here is timeindex=30, which downloads only
the 30-minute data. Since the flux data are at a 30-minute resolution,
we can save on download time by disregarding the 1-minute data files
(which are of course 30 times larger). The timeindex input
can be left off if you want to download all available averaging
intervals.
pr = nu.load_by_product("DP1.00024.001", site="NIWO",
timeindex=30, package="basic",
startdate="2018-06", enddate="2018-07",
check_size=False)
pr is another named list, and again, metadata and units
can be found in the variables table. The
PARPAR_30min table contains a verticalPosition
field. This field indicates the position on the tower, with 010 being
the first tower level, and 020, 030, etc going up the tower.
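A generic way to pick the tower top without hardcoding a level is to take the maximum verticalPosition. A small pandas sketch with made-up values:

```python
import pandas as pd

# Hypothetical PAR values by tower level. verticalPosition is stored as
# zero-padded strings ('010', '020', ...), so the string maximum is also
# the topmost level.
par = pd.DataFrame({"verticalPosition": ["010", "020", "030", "040"],
                    "PARMean": [5.2, 8.7, 12.1, 20.4]})
top = par[par["verticalPosition"] == par["verticalPosition"].max()]
print(top["PARMean"].tolist())  # [20.4]
```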
Join PAR to flux data
We’ll connect PAR data from the tower top to the flux data.
R
pr.top <- pr$PARPAR_30min[which(pr$PARPAR_30min$verticalPosition==
max(pr$PARPAR_30min$verticalPosition)),]
As noted above, loadByProduct() automatically converts
time stamps to a recognized date-time format when it reads the data.
However, the field names for the time stamps differ between the flux
data and the other meteorological data: the start of the averaging
interval is timeBgn in the flux data and
startDateTime in the PAR data.
Let’s create a new variable in the PAR data:
pr.top$timeBgn <- pr.top$startDateTime
And now use the matching time stamp fields to merge the flux and PAR data.
fx.pr <- merge(pr.top, flux$NIWO, by="timeBgn")
And now we can plot net carbon exchange as a function of light availability:
plot(fx.pr$data.fluxCo2.nsae.flux~fx.pr$PARMean,
pch=".", ylim=c(-20,20),
xlab="PAR", ylab="CO2 flux")
Python
As noted above, load_by_product() automatically converts
time stamps to a recognized date-time format when it reads the data.
However, the field names for the time stamps differ between the flux
data and the other meteorological data: the start of the averaging
interval is timeBgn in the flux data and
startDateTime in the PAR data.
Let’s create a new variable in the PAR data and subset to the top of the tower (vertical position 040 at Niwot Ridge).
par30 = pr['PARPAR_30min']
par30['timeBgn'] = par30['startDateTime']
prtop = par30[par30['verticalPosition']=='040']
And now use the matching time stamp fields to merge the flux and PAR data.
fxpr = pd.merge(flux['NIWO'], prtop, how='outer', on='timeBgn')
And now we can plot net carbon exchange as a function of light availability:
plt.scatter(fxpr.PARMean,
fxpr['data.fluxCo2.nsae.flux'],
marker='.')
plt.ylim((-20,20))
## (-20.0, 20.0)
plt.show()
If you’re interested in data in the eddy covariance bundle besides the net flux data, the rest of this tutorial will guide you through how to get those data out of the bundle.
5. Vertical profile data (Level 3)
The Level 3 (dp03) data are the spatially interpolated
profiles of the rates of change of CO2, H2O, and
temperature. Extract the Level 3 data from the HDF5 file using
stackEddy() with the same syntax as for the Level 4
data.
R
prof <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
level="dp03")
As with the Level 4 data, the result is a named list with data tables for each site.
head(prof$NIWO[,1:7])
## timeBgn timeEnd data.co2Stor.rateRtioMoleDryCo2.0.1 m
## 1 2018-06-01 00:00:00 2018-06-01 00:29:59 -2.681938e-04
## 2 2018-06-01 00:30:00 2018-06-01 00:59:59 1.912094e-05
## 3 2018-06-01 01:00:00 2018-06-01 01:29:59 3.709964e-04
## 4 2018-06-01 01:30:00 2018-06-01 01:59:59 2.374054e-03
## 5 2018-06-01 02:00:00 2018-06-01 02:29:59 3.397046e-03
## 6 2018-06-01 02:30:00 2018-06-01 02:59:59 3.156952e-03
## data.co2Stor.rateRtioMoleDryCo2.0.2 m data.co2Stor.rateRtioMoleDryCo2.0.3 m
## 1 -2.681938e-04 -2.681938e-04
## 2 1.912094e-05 1.912094e-05
## 3 3.709964e-04 3.709964e-04
## 4 2.374054e-03 2.374054e-03
## 5 3.397046e-03 3.397046e-03
## 6 3.156952e-03 3.156952e-03
## data.co2Stor.rateRtioMoleDryCo2.0.4 m data.co2Stor.rateRtioMoleDryCo2.0.5 m
## 1 -2.681938e-04 -2.681938e-04
## 2 1.912094e-05 1.912094e-05
## 3 3.709964e-04 3.709964e-04
## 4 2.374054e-03 2.374054e-03
## 5 3.397046e-03 3.397046e-03
## 6 3.156952e-03 3.156952e-03
Python
Note (Feb 17, 2026): There is a bug in the Python implementation of the Level 3 code. This section may not run correctly; we will update as soon as possible.
prof = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
level="dp03")
As with the Level 4 data, the result is a named list with data tables for each site.
prof['NIWO'].head()
6. Un-interpolated vertical profile data (Level 2)
The Level 2 data are interpolated in time but not in space. They contain the rates of change at each of the measurement heights.
R
Again, they can be extracted from the HDF5 files using
stackEddy() with the same syntax:
prof.l2 <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
level="dp02")
head(prof.l2$HARV)
## horizontalPosition verticalPosition timeBgn timeEnd
## 1 000 010 2018-06-01 00:00:00 2018-06-01 00:29:59
## 2 000 010 2018-06-01 00:30:00 2018-06-01 00:59:59
## 3 000 010 2018-06-01 01:00:00 2018-06-01 01:29:59
## 4 000 010 2018-06-01 01:30:00 2018-06-01 01:59:59
## 5 000 010 2018-06-01 02:00:00 2018-06-01 02:29:59
## 6 000 010 2018-06-01 02:30:00 2018-06-01 02:59:59
## data.co2Stor.rateRtioMoleDryCo2.mean data.h2oStor.rateRtioMoleDryH2o.mean
## 1 NaN NaN
## 2 0.002666576 NaN
## 3 -0.011224223 NaN
## 4 0.006133056 NaN
## 5 -0.019554655 NaN
## 6 -0.007855632 NaN
## data.tempStor.rateTemp.mean qfqm.co2Stor.rateRtioMoleDryCo2.qfFinl
## 1 2.583333e-05 1
## 2 -2.008056e-04 1
## 3 -1.901111e-04 1
## 4 -7.419444e-05 1
## 5 -1.537083e-04 1
## 6 -1.874861e-04 1
## qfqm.h2oStor.rateRtioMoleDryH2o.qfFinl qfqm.tempStor.rateTemp.qfFinl
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
Python
Again, they can be extracted from the HDF5 files using
stack_eddy() with the same syntax:
profl2 = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
level="dp02")
profl2['HARV'].head()
## horizontalPosition ... qfqm.tempStor.rateTemp.qfFinl
## 0 000 ... 0
## 1 000 ... 0
## 2 000 ... 0
## 3 000 ... 0
## 4 000 ... 0
##
## [5 rows x 10 columns]
Note that here, as in the PAR data, there is a
verticalPosition field. It has the same meaning as in the
PAR data, indicating the tower level of the measurement.
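If you want to work with these data as vertical profiles, one option is to pivot the table so each tower level becomes a column, giving one profile per time step. A pandas sketch with made-up values:

```python
import pandas as pd

# Hypothetical dp02-shaped data: two time steps, two tower levels.
df = pd.DataFrame({
    "timeBgn": ["2018-06-01 00:00", "2018-06-01 00:00",
                "2018-06-01 00:30", "2018-06-01 00:30"],
    "verticalPosition": ["010", "020", "010", "020"],
    "data.tempStor.rateTemp.mean": [1.0e-4, 2.0e-4, 3.0e-4, 4.0e-4],
})
# One row per time step, one column per tower level
prof = df.pivot(index="timeBgn", columns="verticalPosition",
                values="data.tempStor.rateTemp.mean")
print(prof.shape)  # (2, 2)
```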
7. Calibrated raw data (Level 1)
Level 1 (dp01) data are calibrated, and aggregated in
time, but otherwise untransformed. Use Level 1 data for raw gas
concentrations and atmospheric stable isotopes.
Extract Level 1 data
R
Using stackEddy() to extract Level 1 data requires
additional inputs. The Level 1 files are too large to simply pull out
all the variables by default, and they include multiple averaging
intervals, which can’t be merged. So two additional inputs are
needed:
- avg: The averaging interval to extract
- var: One or more variables to extract
What variables are available, at what averaging intervals? Another
function in the neonUtilities package,
getVarsEddy(), returns a list of HDF5 file contents. It
requires only one input, a filepath to a single NEON HDF5 file:
vars <- getVarsEddy(paste0(getwd(),
"/filesToStack00200/NEON.D01.HARV.DP4.00200.001.nsae.2018-07.basic.20260115T064348Z.h5"))
head(vars)
## site level category system hor ver tmi subsys name otype
## 5 HARV dp01 data amrs 000 060 01m <NA> angNedXaxs H5I_DATASET
## 6 HARV dp01 data amrs 000 060 01m <NA> angNedYaxs H5I_DATASET
## 7 HARV dp01 data amrs 000 060 01m <NA> angNedZaxs H5I_DATASET
## 9 HARV dp01 data amrs 000 060 30m <NA> angNedXaxs H5I_DATASET
## 10 HARV dp01 data amrs 000 060 30m <NA> angNedYaxs H5I_DATASET
## 11 HARV dp01 data amrs 000 060 30m <NA> angNedZaxs H5I_DATASET
## dclass dim oth
## 5 COMPOUND 44640 <NA>
## 6 COMPOUND 44640 <NA>
## 7 COMPOUND 44640 <NA>
## 9 COMPOUND 1488 <NA>
## 10 COMPOUND 1488 <NA>
## 11 COMPOUND 1488 <NA>
Inputs to var can be any values from the
name field in the table returned by
getVarsEddy(). Let’s take a look at CO2 and H2O concentrations,
along with 13C in CO2 and 18O in H2O, at 30-minute
aggregation. We’ll use Harvard Forest
for these data, since deeper canopies generally have more interesting
profiles:
iso <- stackEddy(filepath=paste0(getwd(), "/filesToStack00200"),
level="dp01", var=c("rtioMoleDryCo2","rtioMoleDryH2o",
"dlta13CCo2","dlta18OH2o"), avg=30)
head(iso$HARV[,1:8])
## horizontalPosition verticalPosition timeBgn timeEnd
## 1 000 010 2018-06-01 00:00:00 2018-06-01 00:29:59
## 2 000 010 2018-06-01 00:30:00 2018-06-01 00:59:59
## 3 000 010 2018-06-01 01:00:00 2018-06-01 01:29:59
## 4 000 010 2018-06-01 01:30:00 2018-06-01 01:59:59
## 5 000 010 2018-06-01 02:00:00 2018-06-01 02:29:59
## 6 000 010 2018-06-01 02:30:00 2018-06-01 02:59:59
## data.co2Stor.rtioMoleDryCo2.mean data.co2Stor.rtioMoleDryCo2.min
## 1 509.3375 451.4786
## 2 502.2736 463.5470
## 3 521.6139 442.8649
## 4 469.6317 432.6588
## 5 484.7725 436.2842
## 6 476.8554 443.7055
## data.co2Stor.rtioMoleDryCo2.max data.co2Stor.rtioMoleDryCo2.vari
## 1 579.3518 845.0795
## 2 533.6622 161.3652
## 3 563.0518 547.9924
## 4 508.7463 396.8379
## 5 537.4641 662.9449
## 6 515.6598 246.6969
Python
Using stack_eddy() to extract Level 1 data requires
additional inputs. The Level 1 files are too large to simply pull out
all the variables by default, and they include multiple averaging
intervals, which can’t be merged. So two additional inputs are
needed:
- avg: The averaging interval to extract
- var: One or more variables to extract
Inputs to var can be any values from the
name field in the table returned by
get_vars_eddy(). Let’s take a look at CO2 and H2O concentrations,
along with 13C in CO2 and 18O in H2O, at 30-minute
aggregation. We’ll use Harvard Forest
for these data, since deeper canopies generally have more interesting
profiles:
iso = nu.stack_eddy(filepath=os.getcwd() + "/filesToStack00200",
level="dp01", avg=30,
var=["rtioMoleDryCo2","rtioMoleDryH2o",
"dlta13CCo2","dlta18OH2o"])
iso['HARV'].head()
## horizontalPosition ... ucrt.h2oTurb.rtioMoleDryH2o.se
## 0 000 ... NaN
## 1 000 ... NaN
## 2 000 ... NaN
## 3 000 ... NaN
## 4 000 ... NaN
##
## [5 rows x 85 columns]
Plot vertical profiles
Let’s plot vertical profiles of CO2 and 13C in CO2 on a single day.
Here we’ll use the time stamps to select all of the records for a
single day, and discard the verticalPosition values that aren’t
numeric, since those correspond to the calibration gases.
R
iso.d <- iso$HARV[grep("2018-06-25", iso$HARV$timeBgn, fixed=T),]
iso.d <- iso.d[-which(is.na(as.numeric(iso.d$verticalPosition))),]
ggplot is well suited to these types of data. We can
plot CO2 relative to height on the tower, with separate lines
for each time interval.
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.co2Stor.rtioMoleDryCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("CO2") + ylab("Tower level")
g
And the same plot for 13C in CO2:
g <- ggplot(iso.d, aes(y=verticalPosition)) +
geom_path(aes(x=data.isoCo2.dlta13CCo2.mean,
group=timeBgn, col=timeBgn)) +
theme(legend.position="none") +
xlab("d13C") + ylab("Tower level")
g
Python
isoH = iso['HARV']
isoH6 = isoH[(isoH['timeBgn'] >= '2018-06-25') & (isoH['timeBgn'] < '2018-06-26')]
isod = isoH6[isoH6['horizontalPosition'] == '000']
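Note that the R code above also drops the non-numeric verticalPosition values (the calibration gases). A similar check in pandas, sketched here with hypothetical gas-label strings, might look like:

```python
import pandas as pd

# Keep only verticalPosition values that convert to numbers; the
# non-numeric labels (made-up names here) mark the calibration gases.
vp = pd.Series(["010", "020", "030", "co2Low", "co2High"])
numeric = pd.to_numeric(vp, errors="coerce")  # non-numeric become NaN
print(vp[numeric.notna()].tolist())  # ['010', '020', '030']
```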
Let’s plot CO2 relative to height on the tower, with separate lines for each time interval.
fig, ax = plt.subplots()
groups = isod["timeBgn"].unique()
cmap = plt.cm.viridis
colors = cmap(np.linspace(0, 1, len(groups)))
for color, group_name in zip(colors, groups):
group = isod[isod["timeBgn"] == group_name]
ax.plot(group['data.co2Stor.rtioMoleDryCo2.mean'],
group['verticalPosition'],
color=color)
ax.set_xlabel("CO2")
ax.set_ylabel("Tower level")
plt.show()
And the same plot for 13C in CO2:
fig, ax = plt.subplots()
for color, group_name in zip(colors, groups):
group = isod[isod["timeBgn"] == group_name]
ax.plot(group["data.isoCo2.dlta13CCo2.mean"],
group["verticalPosition"],
color=color)
ax.set_xlabel("13C")
ax.set_ylabel("Tower level")
plt.show()
The legends are omitted for space. See if you can use the concentration and isotope ratio buildup and drawdown below the canopy to work out the times of day the different colors represent.