How do we support transparency and reproducibility in the science generated using NEON data? In part, by thoroughly documenting our data and by publishing static, unchangeable versions of our data each year that remain available for the lifetime of the Observatory.
NEON data are published as “data products”, each of which is a collection of measurements, organized into one or more tables or files, that were generated from the same sensor assembly or collected using the same protocol or set of protocols. When working with NEON data, it is important to understand the comparability of data within a data product. In general, all of the data within a product are comparable from the first date of collection to the last. That said, it is important to understand whether the data you work with are Provisional or Released, and whether a data product has been Revised over time.
Provisional and Released Data
Data are initially published with a Provisional status, which means that data may be updated on an as-needed basis, without guarantee of reproducibility. Until the first data Release was published in January of 2021, all NEON data were Provisional. Publishing data as Provisional allows NEON to make data available more rapidly, while retaining the ability to make corrections or additions as the need for them is identified.
After initial publication, a lag time occurs before the data are formally Released. During this lag time, extra quality control (QC) procedures, which are described in data-product-specific documentation, may be performed. This lag time also ensures all data from laboratory analyses are available before a data Release. Additionally, the user community has the opportunity to work with the data and provide quality-related feedback.
Each Release consists of a complete set of data files that will not be changed further and will remain accessible (with the exception of large AOP datasets) throughout the lifetime of the Observatory. Each year’s Release will include all data collected for each included data product up to a subsystem-specific lag period prior to the release date. For IS, AOP, and OS data products, the respective lag (provisional) periods are 6, 6, and 12 months. Exceptions may be made for certain OS data products that need more time to obtain and publish external lab data, and for other unexpected disruptions to data availability. Users should refer to the release manifest for the specific data included in each release.
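The lag periods above can be turned into a rough estimate of the latest collection date a Release is likely to cover. The following is a minimal sketch, not NEON's own procedure: the function names and the example release date are hypothetical, and the simple "release date minus N calendar months" rule is an assumption. The release manifest remains the authoritative record of what each Release contains.

```python
from datetime import date

# Lag periods in months by subsystem, as stated in the text above.
LAG_MONTHS = {"IS": 6, "AOP": 6, "OS": 12}


def subtract_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Find the last valid day of the target month so that, for example,
    # the 31st maps onto the 28th or 30th where needed.
    if month == 12:
        next_month_start = date(year + 1, 1, 1)
    else:
        next_month_start = date(year, month + 1, 1)
    last_day = (next_month_start - date(year, month, 1)).days
    return date(year, month, min(d.day, last_day))


def estimated_cutoff(release_date: date, subsystem: str) -> date:
    """Estimate the latest collection date covered (assumed rule, see lead-in)."""
    return subtract_months(release_date, LAG_MONTHS[subsystem])


if __name__ == "__main__":
    hypothetical_release = date(2023, 1, 27)  # illustrative date only
    for subsystem in LAG_MONTHS:
        print(subsystem, estimated_cutoff(hypothetical_release, subsystem))
```

Because exceptions are made for some OS products and for unexpected disruptions, an estimate like this is only a starting point for deciding whether the data you need are likely to be Released or still Provisional.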
Each data product within a Release is associated with a Digital Object Identifier (DOI) for reference and citation. DOI URLs will always resolve back to the dataset, and are thus ideal for citing NEON data in publications and applications. Data products that are sub-products of another product, and are not downloadable individually, use their parent products’ DOIs. Data products that are hosted fully by another repository are not included in any release and are not assigned DOIs by NEON.
Data Quality in Provisional and Released Data
Data quality review is an active and continual process for NEON data products, including both automated and manual procedures that may detect quality issues in published data. For an overview of NEON quality assurance and quality control procedures, see the Data Quality page. These processes yield corrections and quality flagging in Provisional data that are applied on a rolling basis, without notice to end users. If corrections are identified for data already included in a Release, the existing Release is not altered; instead, the corrected data are published once a year as part of the next Release. Thanks to these processes, the data in each year’s Release are the highest-quality data available at the time. The most recently published Provisional data have been subject only to the automated quality control procedures, and these data are the most likely to change in response to additional quality assessment.
Some data products are processed on a schedule that aligns with the Releases, resulting in a more distinct change between Provisional and Release, or between Releases. Stage-discharge rating curves and Continuous discharge are initially calculated and published Provisionally using the previous water year’s model, and are recalculated at the end of the water year before inclusion in the Release. Similarly, Eddy covariance data are reprocessed prior to the Release each year using the latest code version, so that data within a Release are all based on the same code.
Downloading and Using Provisional and Released Data
The default download for any given data product from the NEON Data Portal will include the most recent Release plus all Provisional data generated since the Release. Alternatively, you may select a specific Release.
A manifest file is included with all downloaded data packages. This file provides names and information about all files included in the package, including file size, checksums for verification purposes, and permanent links to each file. The manifest also specifies whether each file is provisional or associated with a release.
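Checksums from the manifest can be used to confirm that a downloaded file is intact. The sketch below is illustrative only: the function names are hypothetical, and the hash algorithm is an assumption exposed as a parameter, since the text above does not say which algorithm the manifest uses. Check the manifest itself for the algorithm and the exact column names before relying on it.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Compute a hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)  # algorithm is an assumption, see lead-in
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected: str, algorithm: str = "md5") -> bool:
    """Return True if the file's digest matches the value listed in the manifest."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A typical use would be to loop over the rows of the manifest, call `verify_file` with each file's path and listed checksum, and report any file for which it returns `False`.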
Your download will arrive as a single zip file, with the data files organized into folders inside it. Our R package, neonUtilities (v2.0 and above), provides a range of functions for working with these downloads, including the ability to join files across sites and months for IS and OS data.
If you have downloaded the same provisional data file on two different dates, it is possible to discover whether changes have occurred by inspecting the time stamp at the end of each file name (for instrumented and observational data products, not data products from the airborne observation platform). The time stamp corresponds to the date and time at which the file was created, and is only changed when data are republished. More information is available at Data Formats and Conventions.
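The file-name comparison described above can be automated. This is a minimal sketch under a stated assumption: that the time stamp appears immediately before the file extension in a form such as 20230127T120753Z. That pattern, the function names, and the example file names are illustrative and are not taken from the text above; see Data Formats and Conventions for the actual naming rules.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed pattern: a UTC stamp like 20230127T120753Z just before the extension.
STAMP = re.compile(r"\.(\d{8}T\d{6}Z)\.[A-Za-z0-9]+$")


def publication_stamp(file_name: str) -> Optional[datetime]:
    """Extract the trailing time stamp from a file name, or None if absent."""
    match = STAMP.search(file_name)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def was_republished(old_name: str, new_name: str) -> bool:
    """Return True if the newer download carries a later time stamp."""
    old, new = publication_stamp(old_name), publication_stamp(new_name)
    if old is None or new is None:
        raise ValueError("no time stamp found in one of the file names")
    return new > old


if __name__ == "__main__":
    # Hypothetical file names for the same table, site, and month.
    first = "NEON.D01.HARV.DP1.10003.001.brd_countdata.2019-06.basic.20221201T010203Z.csv"
    second = "NEON.D01.HARV.DP1.10003.001.brd_countdata.2019-06.basic.20230127T120753Z.csv"
    print(was_republished(first, second))
```

As noted above, this approach applies to instrumented and observational data products, not to data products from the airborne observation platform.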
If you use Provisional data in a publication, please plan to publish and archive those data in an appropriate repository, as NEON will not assign a DOI until the data are included in an official Release. Please read our Publishing Research Outputs page to learn more.
Data Product Revisions
If an instrument or protocol changes significantly enough that users should be aware of potential incompatibility, we will generate a new Revision of the data product, denoted by a change in the data product identifier. The data product identifier takes the form DPL.PRNUM.REV, where DPL is the data product level, PRNUM is the product number, and REV is the product revision. Upon a data product revision, the REV field of the identifier is incremented. Data from different revisions of the same data product are not directly comparable, so take care when combining them for use or analysis. Each data product revision can be found on the Explore Data Products page, along with a short summary of the changes made between revisions.
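When combining datasets programmatically, it can help to check whether two identifiers refer to different revisions of the same product. The following is a minimal sketch that relies only on the DPL.PRNUM.REV structure described above; the class and function names are hypothetical, and the example identifiers are illustrative rather than references to specific products.

```python
from typing import NamedTuple


class ProductId(NamedTuple):
    level: str     # DPL: data product level
    number: str    # PRNUM: product number
    revision: str  # REV: product revision, kept as text to preserve leading zeros


def parse_product_id(identifier: str) -> ProductId:
    """Split an identifier of the form DPL.PRNUM.REV into its three fields."""
    parts = identifier.strip().split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DPL.PRNUM.REV, got {identifier!r}")
    return ProductId(*parts)


def differ_only_by_revision(a: str, b: str) -> bool:
    """True if both identifiers name the same product but different revisions."""
    first, second = parse_product_id(a), parse_product_id(b)
    same_product = (first.level, first.number) == (second.level, second.number)
    return same_product and first.revision != second.revision


if __name__ == "__main__":
    # Illustrative identifiers only.
    if differ_only_by_revision("DP1.10003.001", "DP1.10003.002"):
        print("Different revisions: data are not directly comparable.")
```

A check like this could run before any merge step, so that data from different revisions are never joined without a deliberate decision.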
Last updated April 28, 2023