Tutorial
Introduction to the NEON Continuous Discharge Data Product
Authors: Zachary Nickerson
Last Updated: Jan 27, 2026
Objectives
After completing this activity, you will be able to:
- Download and explore the contents of the NEON Continuous discharge data product (DP4.00130.001)
- Understand how the publication history of this data product influences the tables available for download
- Create a single continuous time series of discharge data for a site via data aggregation and table joining
- Plot continuous time series of stage and discharge with associated uncertainties for the lifetime of a site
Things You’ll Need To Complete This Tutorial
You can follow either the R or Python code throughout this tutorial.
- For R users, we recommend using R version 4+ and RStudio.
- For Python users, we recommend using Python 3.9+.
Set up: Install Packages
Packages only need to be installed once; you can skip this step after the first time.
R
- neonUtilities: Basic functions for accessing NEON data
- tidyverse: Collection of R packages designed for data science
install.packages("neonUtilities")
install.packages("tidyverse")
Python
- os: Module allowing interaction with user’s operating system (part of the Python standard library, so no installation is needed)
- pandas: Module for working with data frames
- neonutilities: Basic functions for accessing NEON data
- matplotlib: Functions for plotting
# os ships with Python and does not need to be installed with pip
pip install pandas
pip install neonutilities
pip install matplotlib
Additional Resources
- Tutorial for using neonUtilities from both R and Python environments.
- GitHub repository for neonUtilities
- neonUtilities cheat sheet. A quick reference guide for users.
Set up: Load Packages
R
library(neonUtilities)
library(tidyverse)
Python
import os
import pandas as pd
import neonutilities as nu
import matplotlib.pyplot as plt
Set NEON Data Portal API Token
It is recommended that NEON data users have a NEON Data Portal API token set as an environment variable. See this tutorial for instructions on obtaining a NEON API token.
R
Sys.setenv(NEON_PAT="YOUR_API_TOKEN_HERE")
Python
os.environ.setdefault('NEON_PAT',"YOUR_API_TOKEN_HERE")
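If you would rather not hard-code a token in a script, a small helper can read it from the environment and return None when it is missing or still set to the placeholder. This is only a sketch: `get_neon_token` is a hypothetical helper name (not part of neonutilities), and it assumes the download function accepts `token=None` for requests made without a token.

```python
import os

def get_neon_token(var_name="NEON_PAT"):
    """Return the token stored in an environment variable, or None if unset or a placeholder."""
    token = os.environ.get(var_name)
    if not token or token == "YOUR_API_TOKEN_HERE":
        return None
    return token

# Demonstrate with a clearly fake value stored under a demo variable name
os.environ["NEON_PAT_DEMO"] = "abc123"
demo_token = get_neon_token("NEON_PAT_DEMO")
missing_token = get_neon_token("NEON_PAT_NOT_SET_DEMO")
print(demo_token, missing_token)
```

In practice you would call `get_neon_token()` with no arguments and pass the result as the `token` argument of the download call.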
Download Data Product
In this tutorial, we will focus on a single site, Lower Hop Brook
(HOPB) in NEON Domain 01, but the workflow can be replicated for any
site at which continuous discharge is published by changing the
site input variable. Visit the
NEON
Data Portal landing page for this data product for an availability
chart showing all the sites for which Continuous discharge is published
and the period of record for each site.
For this tutorial we will download all data across the lifetime of a site by commenting out the start and end dates in the function call. To query a shorter period, uncomment those two arguments. The NEON Data Portal stores data in site-month download packages, so date range inputs are formatted as YYYY-MM.
For this tutorial we will also download both RELEASED and PROVISIONAL data by setting the include.provisional argument (include_provisional in Python) to ‘true’ in the function. To download only RELEASED data, set this argument to ‘false’. To learn more about the differences between released and provisional data, see the Understanding Releases and Provisional Data tutorial on the NEON website.
R
# Download continuous discharge data across the lifetime of a site
csd_r <- neonUtilities::loadByProduct(dpID="DP4.00130.001",
site="HOPB",
# startdate="2020-10",
# enddate="2021-09",
release="current",# Downloads data from the most recent release
include.provisional = T,# Includes provisional data
package='expanded',
check.size = F,
token=Sys.getenv("NEON_PAT"))
Python
# Download continuous discharge data across the lifetime of a site
csd_py = nu.load_by_product(dpid="DP4.00130.001",
site="HOPB",
# start_date="2020-10",
# end_date="2021-09",
release="current",
include_provisional=True,
package="expanded",
check_size=False,
token=os.environ.get("NEON_PAT"))
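The commented-out date arguments above span October 2020 through September 2021, which is one water year. If you want to query by water year, a sketch like the following builds the YYYY-MM strings. `water_year_range` is a hypothetical helper, and it assumes a water year runs from October of the previous calendar year through September.

```python
def water_year_range(water_year):
    """Return (start, end) as YYYY-MM strings for a water year running October through September."""
    start = f"{water_year - 1}-10"
    end = f"{water_year}-09"
    return start, end

start_date, end_date = water_year_range(2021)
print(start_date, end_date)
```

The two strings can then be supplied as the start and end date arguments of the download call.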
Navigate Data Download
Downloads from the NEON Utilities R package and Python module contain multiple files, including data tables, metadata, and data product documentation. Let’s explore each set of files in turn.
Files Associated with Downloads
The data we’ve downloaded comes as an object that is a named list/dictionary of objects. Let’s view the contents of the download package.
R
# Get all file names in the download package
names(csd_r)
Python
# Get all file names in the download package
csd_py.keys()
In this tutorial, we downloaded the expanded download
package. What are the files contained in this download package and why
are they useful?
- csd_data_tables: Includes the primary data tables of the continuous discharge data product. We will dive deeper into data tables in the next section.
- sensor_positions_00130: Reports the geolocation of each sensor included in the download.
- science_review_flags_00130: Lists each science review flag (SRF) date range, flag value, and justification applied to the data included in this download.
- issueLog_00130: Reports issues that may impact data quality, or changes to a data product that affect all sites.
- variables_00130: This file contains all the variables found in the data table(s) included in this download. This includes full definitions, units, and other important information.
- categoricalCodes_00130: Some variables in the data tables are published as strings and constrained to a standardized list of values (LOV). This file shows all the LOV options for variables published in this data product.
- readme_00130: The readme file provides important information relevant to the data product and the specific instance of downloading the data.
- data citations: At least one data citation file is included in each data download, depending on the release input to the download function and the date range queried.
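To get a quick overview of what each entry in the download holds, you can loop over the dictionary and describe each object. The sketch below uses a toy stand-in dictionary so it runs without a download; the entry names and contents in `demo_download` are illustrative only, and `describe_download` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for the download object; a real download has more entries
demo_download = {
    "csd_15_min": pd.DataFrame({"siteID": ["HOPB"] * 3,
                                "dischargeContinuous": [10.0, 11.0, 12.0]}),
    "variables_00130": pd.DataFrame({"table": ["csd_15_min"],
                                     "fieldName": ["siteID"]}),
    "citation_demo": "Toy citation text",
}

def describe_download(download):
    """Map each entry name to a short description of the object it holds."""
    descriptions = {}
    for name, obj in download.items():
        if isinstance(obj, pd.DataFrame):
            descriptions[name] = f"table ({obj.shape[0]} rows x {obj.shape[1]} columns)"
        else:
            # Anything that is not a data frame is reported by its type name
            descriptions[name] = type(obj).__name__
    return descriptions

descriptions = describe_download(demo_download)
for name, text in descriptions.items():
    print(f"{name}: {text}")
```

With a real download, `describe_download(csd_py)` should work the same way.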
Explore Data Tables
The basic download package for DP4.00130.001 contains two data tables that report continuous discharge time series data modeled from in situ pressure transducers and stage-discharge rating curves:
- csd_15_min: Continuous discharge data averaged to 15-minute intervals.
- csd_continuousDischarge: Instantaneous (1-minute) continuous discharge data.
The expanded download package for DP4.00130.001 contains multiple additional tables that aid in understanding data relationships, model coefficients, uncertainty propagation, and data corrections.
- sdrc_gaugePressureRelationship: Reports the relationship between measured gauge heights and calculated stage values derived from gauge height-water column height linear regressions.
- csd_gaugeWaterColumnHeightRegression: Reports the linear regression coefficients for each unique gauge height-water column height regression model developed to estimate stage from surface water pressure data.
- csd_dataGapToFillMethodMapping: Reports periods of gaps or erroneous data in uncorrected continuous discharge data and the method used to correct the period.
- csd_constantBiasShift: Reports periods of continuous discharge data corrected via constant bias shift and the magnitude of the shift.
- csd_gapFillingRegression: Reports the linear regression coefficients for each unique relationship developed to correct NEON continuous discharge data using the method specified in the mapping table.
At the Domain 08 - Tombigbee River (TOMB) site, a stage-discharge relationship cannot be developed due to the influence of a downstream lock and dam system. Therefore, the above tables are not published for TOMB. Rather, the following tables are published for TOMB.
- csd_continuousDischargeUSGS: USGS discharge and associated uncertainty based on the fit of USGS data with corresponding discharge measurements collected at TOMB.
- csd_dischargeRegressionUSGS: Reports the linear regression coefficients for each unique relationship developed to estimate NEON discharge from USGS discharge.
Below, we will explore the first few rows of each high-frequency time series table included in the basic download package. Add to the code below to also view those tables included in the expanded download package.
R
# Print the first 5 records in each of the time series tables to view structure
print("First 5 rows of csd_continuousDischarge")
head(csd_r$csd_continuousDischarge, n = 5) # head() returns 6 rows by default
print("First 5 rows of csd_15_min")
head(csd_r$csd_15_min, n = 5)
Python
# Print the first 5 records in each of the time series tables to view structure
print("First 5 rows of csd_continuousDischarge")
print(csd_py['csd_continuousDischarge'].head())
print("First 5 rows of csd_15_min")
print(csd_py['csd_15_min'].head())
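One way to extend this to the expanded-package tables is to loop over their names and skip any that are absent from the download. The sketch below is a hedged example: `preview_tables` is a hypothetical helper, and `demo_download` is a toy stand-in whose `shift` column is invented for illustration.

```python
import pandas as pd

# Expanded-package table names listed earlier in this tutorial
expanded_tables = [
    "sdrc_gaugePressureRelationship",
    "csd_gaugeWaterColumnHeightRegression",
    "csd_dataGapToFillMethodMapping",
    "csd_constantBiasShift",
    "csd_gapFillingRegression",
]

def preview_tables(download, table_names, n=5):
    """Return {name: first n rows} for each requested table present in the download."""
    previews = {}
    for name in table_names:
        table = download.get(name)
        if table is None:
            print(f"{name}: not in this download")
            continue
        previews[name] = table.head(n)
    return previews

# Toy stand-in containing only one of the expanded tables
demo_download = {
    "csd_constantBiasShift": pd.DataFrame({"siteID": ["HOPB"] * 8,
                                           "shift": range(8)}),
}
previews = preview_tables(demo_download, expanded_tables)
for name, rows in previews.items():
    print(name)
    print(rows)
```

With a real download you would call `preview_tables(csd_py, expanded_tables)`.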
Explore Variables
The variables_00130 file provides insight into the
structure of each data table and associated variables included in a
download package. For the Continuous discharge data product, it is
crucial to understand the relationships between the
csd_continuousDischarge and csd_15_min
tables to aggregate the two tables and produce a continuous time series.
That is an exercise we will conduct later in this tutorial. For now,
view the variables file and familiarize yourself with the different
fields, data types, and units used in this data product.
R
# View variables file to understand data table structure
View(csd_r$variables_00130)
Python
# View variables file to understand data table structure
print(csd_py['variables_00130'])
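Because the variables file covers every table in the download, it can help to filter it to one table at a time. The sketch below assumes the variables file has `table`, `fieldName`, and `units` columns; the rows in `demo_variables` are toy values for illustration, and `fields_for_table` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for the variables file (column names are an assumption)
demo_variables = pd.DataFrame({
    "table": ["csd_15_min", "csd_15_min", "csd_continuousDischarge"],
    "fieldName": ["stageContinuous", "dischargeContinuous", "maxpostDischarge"],
    "units": ["meter", "litersPerSecond", "litersPerSecond"],
})

def fields_for_table(variables, table_name):
    """Return the field names and units defined for a single data table."""
    subset = variables[variables["table"] == table_name]
    return subset[["fieldName", "units"]].reset_index(drop=True)

csd_15_min_fields = fields_for_table(demo_variables, "csd_15_min")
print(csd_15_min_fields)
```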
Wrangle Stage & Discharge Data
This data product was partially updated in 2026 from uncorrected 1-minute values to corrected 15-minute averages. Until these updates are fully applied backwards in time, some data wrangling and table joining are required to produce a single continuous discharge time series. The publication history of this data product is explained in detail in the NEON User Guide to Continuous Discharge (NEON.DP4.00130.001), available in the “Documentation” section of the NEON Data Portal landing page for this data product.
In brief, the csd_15_min table was first published in 2026 (data included in RELEASE-2026) and, for most sites, covers data beginning in 2021-10, the start of water year (WY) 2022. For dates before the first record published for a site in csd_15_min, the csd_continuousDischarge table is published instead. As more WYs before WY 2022 are corrected, records in csd_continuousDischarge will be removed from publication and replaced with corrected records in csd_15_min.
A good rule of thumb for managing NEON continuous discharge data:
| Table | Frequency | Record Description |
|---|---|---|
| csd_continuousDischarge | 1 minute | uncorrected data unchanged since RELEASE-2025 |
| csd_15_min | 15 minute | if release = RELEASE-2026, data processed within new pipeline; data corrections applied |
| csd_15_min | 15 minute | if release = PROVISIONAL, data processed within new pipeline; determine status of data correction in the dischargeCorrectionApplied field* |
* - PROVISIONAL data can be corrected prior to a subsequent data
release. Do not assume all PROVISIONAL data are uncorrected. If the
csd_15_min:dischargeCorrectionApplied field equals 1 or 0,
the record has been reviewed for corrections and corrections were
applied where appropriate. If the field is NA, the record has
not been reviewed for corrections. More details on the definition and
methods of correction can be found in the ATBD for this data product
(NEON.DOC.005403).
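The footnote above can be turned into a quick check of how many records have been reviewed for corrections. This is a minimal sketch on toy data: only the dischargeCorrectionApplied field name comes from the data product, and the `reviewStatus` column is a hypothetical addition.

```python
import pandas as pd

# Toy records: 1 or 0 means reviewed, missing means not yet reviewed
demo = pd.DataFrame({"dischargeCorrectionApplied": [1, 0, None, 1]})

# Label each record according to whether the field is populated
demo["reviewStatus"] = demo["dischargeCorrectionApplied"].map(
    lambda value: "not reviewed" if pd.isna(value) else "reviewed"
)
status_counts = demo["reviewStatus"].value_counts().to_dict()
print(status_counts)
```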
Aggregate 1-minute Table to 15-minutes
The first step to generating a continuous period of record for a site
is to aggregate the uncorrected 1-minute
csd_continuousDischarge data to a 15-minute interval. We
will use this opportunity to rename the columns of the
aggregated table to match those of csd_15_min, and
to collapse the per-minute quality flags into one flag per interval: 1 if any minute in the interval was flagged, with unflagged science review flags set to NA.
R
# Round, group, and summarize data, and update column names to match csd_15_min
csd_15_min_agg <- csd_r$csd_continuousDischarge%>%
mutate(endDateTime=floor_date(endDate,"15 min"))%>%
group_by(siteID,endDateTime)%>%
summarise(stageContinuous=mean(equivalentStage,
na.rm=T),
stageTotalUncert=mean(stageUnc,
na.rm=T),
dischargeContinuous=mean(maxpostDischarge,
na.rm=T),
dischargeUpperParamUncert=mean(withParaUncQUpper2Std,
na.rm=T),
dischargeLowerParamUncert=mean(withParaUncQLower2Std,
na.rm=T),
dischargeUpperRemnUncert=mean(withRemnUncQUpper2Std,
na.rm=T),
dischargeLowerRemnUncert=mean(withRemnUncQLower2Std,
na.rm=T),
dischargeFinalQF=sum(dischargeFinalQF==1),
dischargeFinalQFSciRvw=sum(dischargeFinalQFSciRvw==1,
na.rm = T))
csd_15_min_agg$dischargeFinalQF[
csd_15_min_agg$dischargeFinalQF>0
] <- 1
csd_15_min_agg$dischargeFinalQFSciRvw[
csd_15_min_agg$dischargeFinalQFSciRvw>0
] <- 1
csd_15_min_agg$dischargeFinalQFSciRvw[
csd_15_min_agg$dischargeFinalQFSciRvw==0
] <- NA
Python
# Round, group, and summarize data, and update column names to match csd_15_min
csd_py['csd_continuousDischarge']['endDate'] = pd.to_datetime(csd_py['csd_continuousDischarge']['endDate'])
# Floor (rather than round) timestamps so the bins match floor_date() in the R code
csd_py['csd_continuousDischarge']['endDateTime'] = csd_py['csd_continuousDischarge']['endDate'].dt.floor('15min')
csd_15_min_agg = csd_py['csd_continuousDischarge'].groupby(['siteID', 'endDateTime']).agg({
'equivalentStage': 'mean',
'stageUnc': 'mean',
'maxpostDischarge': 'mean',
'withParaUncQUpper2Std': 'mean',
'withParaUncQLower2Std': 'mean',
'withRemnUncQUpper2Std': 'mean',
'withRemnUncQLower2Std': 'mean',
'dischargeFinalQF': lambda x: (x == 1).sum(),
'dischargeFinalQFSciRvw': lambda x: (x == 1).sum()
}).reset_index()
csd_15_min_agg = csd_15_min_agg.rename(columns={
'equivalentStage': 'stageContinuous',
'stageUnc': 'stageTotalUncert',
'maxpostDischarge': 'dischargeContinuous',
'withParaUncQUpper2Std': 'dischargeUpperParamUncert',
'withParaUncQLower2Std': 'dischargeLowerParamUncert',
'withRemnUncQUpper2Std': 'dischargeUpperRemnUncert',
'withRemnUncQLower2Std': 'dischargeLowerRemnUncert'
})
csd_15_min_agg.loc[csd_15_min_agg['dischargeFinalQF'] > 0, 'dischargeFinalQF'] = 1
csd_15_min_agg.loc[csd_15_min_agg['dischargeFinalQFSciRvw'] > 0, 'dischargeFinalQFSciRvw'] = 1
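# Set unflagged intervals to missing; where() yields a float column with NaN
# instead of mixing pd.NA into an integer column
csd_15_min_agg['dischargeFinalQFSciRvw'] = csd_15_min_agg['dischargeFinalQFSciRvw'].where(csd_15_min_agg['dischargeFinalQFSciRvw'] > 0)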
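To see what the aggregation does on a small scale, the sketch below applies the same idea to 30 minutes of toy data. It assumes timestamps are floored to the 15-minute mark, as in the R code, and it uses `max` on the 0/1 flag as a shortcut for "count the flagged minutes, then set any count above zero to 1". All values are invented for illustration.

```python
import pandas as pd

# Thirty 1-minute toy records; only minute 14 is flagged
demo = pd.DataFrame({
    "siteID": "HOPB",
    "endDate": pd.date_range("2021-01-01 00:00", periods=30, freq="1min", tz="UTC"),
    "maxpostDischarge": [float(i) for i in range(30)],
    "dischargeFinalQF": [0] * 14 + [1] + [0] * 15,
})

# Assign each minute to the 15-minute bin it falls in
demo["endDateTime"] = demo["endDate"].dt.floor("15min")

# Average discharge per bin; a bin is flagged if any minute in it is flagged
agg = (
    demo.groupby(["siteID", "endDateTime"])
    .agg(dischargeContinuous=("maxpostDischarge", "mean"),
         dischargeFinalQF=("dischargeFinalQF", "max"))
    .reset_index()
)
print(agg)
```

The first bin averages minutes 0 through 14 and carries the flag from minute 14; the second bin averages minutes 15 through 29 and is unflagged.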
Join Tables
Because we conformed column headers in the uncorrected aggregated data, it should now be relatively easy to join those data to the csd_15_min table. Let’s now create a single data frame representing the entire period of record aggregated to a 15-minute interval.
R
# Merge aggregated table created earlier with csd_15_min from the download
csd_15_min_all <- merge(csd_15_min_agg,
csd_r$csd_15_min,
all = T)
csd_15_min_all <- csd_15_min_all[order(csd_15_min_all$endDateTime),]
Python
# Align datetime fields in each data frame
csd_15_min_agg['endDateTime'] = pd.to_datetime(
csd_15_min_agg['endDateTime'], utc=True
)
csd_py['csd_15_min']['endDateTime'] = pd.to_datetime(
csd_py['csd_15_min']['endDateTime'], utc=True
)
csd_15_min_all = pd.merge(csd_15_min_agg,
csd_py['csd_15_min'],
how='outer')
csd_15_min_all = csd_15_min_all.sort_values(by='endDateTime')
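After joining, it can be worth checking the combined series for duplicated timestamps and for gaps longer than 15 minutes, for example around the date where the two tables meet. The following is a sketch on toy timestamps; `find_gaps` is a hypothetical helper, not part of pandas or neonutilities.

```python
import pandas as pd

def find_gaps(df, time_col="endDateTime", step="15min"):
    """Return the start and end of every gap longer than one time step."""
    times = df[time_col].sort_values().reset_index(drop=True)
    deltas = times.diff()
    is_gap = deltas > pd.Timedelta(step)
    return pd.DataFrame({
        "gapStart": times.shift(1)[is_gap],  # last timestamp before the gap
        "gapEnd": times[is_gap],             # first timestamp after the gap
    }).reset_index(drop=True)

# Toy series with two 15-minute records missing across midnight
demo = pd.DataFrame({
    "endDateTime": pd.to_datetime(
        ["2021-09-30 23:30", "2021-09-30 23:45",
         "2021-10-01 00:30", "2021-10-01 00:45"],
        utc=True,
    )
})
gaps = find_gaps(demo)
n_duplicates = int(demo["endDateTime"].duplicated().sum())
print(gaps)
print(n_duplicates)
```

With the joined table, the equivalent calls would be `find_gaps(csd_15_min_all)` and `csd_15_min_all['endDateTime'].duplicated().sum()`.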
Plot Stage Data
Continuous stage is modeled by developing a linear regression between empirical staff gauge heights (DP1.20267.001) and continuous water column height calculated using surface water pressure data (DP1.20016.001). Uncertainty is estimated for continuous stage by summing uncertainty associated with the calibration of the pressure transducer (termed: nonsystematic) and uncertainty associated with the fit of the linear regression (termed: systematic).
More details on the modeling algorithm for continuous stage can be found in the ATBD for this data product (NEON.DOC.005403).
Here, we will plot the continuous stage time series for the entire period of record, then plot a subset of data to zoom in on the time series to get a better visual of the associated uncertainty.
R
# Use the ggplot2 package in tidyverse to plot continuous stage and uncertainty
csd_15_min_all%>%
ggplot(aes(x=endDateTime))+
geom_ribbon(aes(ymin=stageContinuous-stageTotalUncert,
ymax=stageContinuous+stageTotalUncert),
fill="grey70")+
geom_line(aes(y=stageContinuous))+
theme_classic()+
labs(title = "Stage Series: Entire Period of Record",
x="Date",
y="Stage (m)")
# Plot 2: Zoomed in to March 2024
csd_15_min_all%>%
filter(endDateTime>="2024-03-01"
&endDateTime<"2024-04-01")%>%
ggplot(aes(x=endDateTime))+
geom_ribbon(aes(ymin=stageContinuous-stageTotalUncert,
ymax=stageContinuous+stageTotalUncert),
fill="grey70")+
geom_line(aes(y=stageContinuous))+
theme_classic()+
labs(title = "Stage Series: Zoomed In to View Uncertainty - 2024-03",
x="Date",
y="Stage (m)")
Python
# Use matplotlib to plot continuous stage and uncertainty
fig, ax = plt.subplots(figsize=(12, 6))
ax.fill_between(csd_15_min_all['endDateTime'],
csd_15_min_all['stageContinuous'] - csd_15_min_all['stageTotalUncert'],
csd_15_min_all['stageContinuous'] + csd_15_min_all['stageTotalUncert'],
color='grey', alpha=0.7, label='Uncertainty')
ax.plot(csd_15_min_all['endDateTime'], csd_15_min_all['stageContinuous'],
color='black', linewidth=1)
ax.set_title('Stage Series: Entire Period of Record')
ax.set_xlabel('Date')
ax.set_ylabel('Stage (m)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Plot 2: Zoomed in to March 2024
filtered_data = csd_15_min_all[
(csd_15_min_all['endDateTime'] >= '2024-03-01') &
(csd_15_min_all['endDateTime'] < '2024-04-01')
]
fig, ax = plt.subplots(figsize=(12, 6))
ax.fill_between(filtered_data['endDateTime'],
filtered_data['stageContinuous'] - filtered_data['stageTotalUncert'],
filtered_data['stageContinuous'] + filtered_data['stageTotalUncert'],
color='grey', alpha=0.7, label='Uncertainty')
ax.plot(filtered_data['endDateTime'], filtered_data['stageContinuous'],
color='black', linewidth=1)
ax.set_title('Stage Series: Zoomed In to View Uncertainty - 2024-03')
ax.set_xlabel('Date')
ax.set_ylabel('Stage (m)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Plot Discharge Data
Continuous discharge is modeled within a Bayesian modeling framework using an executable (BaM) first developed and published by Le Coz et al. in 2014. The continuous stage series and inputs from the Stage-discharge rating curves data product (DP4.00133.001) are used to model continuous discharge. Two types of uncertainty are associated with continuous discharge:
- Parametric uncertainty: derived from the uncertainty of the model priors
- Remnant uncertainty: derived from how the model fits the observations
More details on the modeling algorithm for continuous discharge can be found in the ATBD for this data product (NEON.DOC.005403).
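To compare the two uncertainty types numerically, one option is to express the width of each band relative to the modeled discharge. The sketch below uses invented values and the csd_15_min column names from this tutorial; the `relWidthParam` and `relWidthRemn` columns are hypothetical additions.

```python
import pandas as pd

# Toy discharge records (L/s) with lower and upper bounds for each uncertainty type
demo = pd.DataFrame({
    "dischargeContinuous": [100.0, 200.0],
    "dischargeLowerParamUncert": [90.0, 180.0],
    "dischargeUpperParamUncert": [110.0, 220.0],
    "dischargeLowerRemnUncert": [80.0, 150.0],
    "dischargeUpperRemnUncert": [120.0, 250.0],
})

# Band width divided by modeled discharge, for each uncertainty type
for kind in ("Param", "Remn"):
    width = demo[f"dischargeUpper{kind}Uncert"] - demo[f"dischargeLower{kind}Uncert"]
    demo[f"relWidth{kind}"] = width / demo["dischargeContinuous"]

print(demo[["relWidthParam", "relWidthRemn"]])
```

In this toy example the remnant band is wider than the parametric band for both records.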
Here, we will plot the continuous discharge time series for the entire period of record, then plot a subset of data to zoom in on the time series to get a better visual of the associated uncertainties.
R
# Use the ggplot2 package in tidyverse to plot continuous discharge and uncertainty
csd_15_min_all%>%
ggplot(aes(x=endDateTime))+
geom_ribbon(aes(ymin=dischargeLowerRemnUncert,
ymax=dischargeUpperRemnUncert),
fill="#D55E00")+
geom_ribbon(aes(ymin=dischargeLowerParamUncert,
ymax=dischargeUpperParamUncert),
fill="#E69F00")+
geom_line(aes(y=dischargeContinuous))+
theme_classic()+
labs(title = "Discharge Series: Entire Period of Record",
x="Date",
y="Discharge (L/s)")
# Plot 2: Zoomed in to March 2024
csd_15_min_all%>%
filter(endDateTime>="2024-03-01"
&endDateTime<"2024-04-01")%>%
ggplot(aes(x=endDateTime))+
geom_ribbon(aes(ymin=dischargeLowerRemnUncert,
ymax=dischargeUpperRemnUncert),
fill="#D55E00")+
geom_ribbon(aes(ymin=dischargeLowerParamUncert,
ymax=dischargeUpperParamUncert),
fill="#E69F00")+
geom_line(aes(y=dischargeContinuous))+
theme_classic()+
labs(title = "Discharge Series: Zoomed In to View Uncertainty - 2024-03",
x="Date",
y="Discharge (L/s)")
Python
# Use matplotlib to plot continuous discharge and uncertainty
fig, ax = plt.subplots(figsize=(12, 6))
ax.fill_between(csd_15_min_all['endDateTime'],
csd_15_min_all['dischargeLowerRemnUncert'],
csd_15_min_all['dischargeUpperRemnUncert'],
color='#D55E00', alpha=0.7, label='Remnant Uncertainty')
ax.fill_between(csd_15_min_all['endDateTime'],
csd_15_min_all['dischargeLowerParamUncert'],
csd_15_min_all['dischargeUpperParamUncert'],
color='#E69F00', alpha=0.7, label='Parametric Uncertainty')
ax.plot(csd_15_min_all['endDateTime'], csd_15_min_all['dischargeContinuous'],
color='black', linewidth=1)
ax.set_title('Discharge Series: Entire Period of Record')
ax.set_xlabel('Date')
ax.set_ylabel('Discharge (L/s)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Plot 2: Zoomed in to March 2024
filtered_data = csd_15_min_all[
(csd_15_min_all['endDateTime'] >= '2024-03-01') &
(csd_15_min_all['endDateTime'] < '2024-04-01')
]
fig, ax = plt.subplots(figsize=(12, 6))
ax.fill_between(filtered_data['endDateTime'],
filtered_data['dischargeLowerRemnUncert'],
filtered_data['dischargeUpperRemnUncert'],
color='#D55E00', alpha=0.7, label='Remnant Uncertainty')
ax.fill_between(filtered_data['endDateTime'],
filtered_data['dischargeLowerParamUncert'],
filtered_data['dischargeUpperParamUncert'],
color='#E69F00', alpha=0.7, label='Parametric Uncertainty')
ax.plot(filtered_data['endDateTime'], filtered_data['dischargeContinuous'],
color='black', linewidth=1)
ax.set_title('Discharge Series: Zoomed In to View Uncertainty - 2024-03')
ax.set_xlabel('Date')
ax.set_ylabel('Discharge (L/s)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()