How do we support transparency and reproducibility in the science generated using NEON data? In part, by thoroughly documenting our data and by publishing static, unchangeable versions of our data each year that remain available for the lifetime of the Observatory.
NEON data are published as “data products”, each of which is a collection of measurements, organized into one or more tables or files, that were generated from the same sensor assembly or collected using the same protocol or set of protocols. When working with NEON data, it is important to understand the comparability of data within a data product. In general, all of the data within a product are comparable from the first date of collection to the last. That said, it is important to understand whether the data you work with are Provisional or Released, and whether a data product has been Revised over time.
Provisional and Released Data
Data are initially published with a Provisional status, which means that data may be updated on an as-needed basis, without guarantee of reproducibility. Until the first Data Release was published in January of 2021, all NEON data were Provisional. Provisional data allow us to more rapidly provide data on the Data Portal, while retaining the ability to make corrections or additions as they are identified.
After initial publication, a lag time occurs before the data are more formally Released. During this lag time, extra quality control (QC) procedures (as described in data product documentation) may be performed. This lag time also ensures all data from laboratory analyses are available before a data Release. Additionally, the user community will have had the opportunity to use the data in scientific applications and provide quality-related feedback.
Each Release consists of a complete set of data files that will not be changed further and will remain accessible (with the possible exception of large AOP datasets) throughout the lifetime of the Observatory. Each year’s Release will include all data collected for each included data product up to a subsystem-specific lag period prior to the release date. For instrumented (IS), airborne observation platform (AOP), and observational (OS) data products, the respective lag periods are 6, 6, and 12 months. Exceptions may be made for certain OS data products that need more time to obtain and publish external lab data, and for other unexpected disruptions to data availability.
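As a rough illustration of the lag periods described above, the following Python sketch computes the latest collection date that a Release could include for each subsystem. The function names and the release date used in the example are hypothetical, and counting back whole calendar months from the release date is an assumption; the actual cutoff for a given product may differ, particularly where exceptions apply.

```python
import calendar
from datetime import date

# Lag periods in months for each subsystem, as stated in the text above.
LAG_MONTHS = {"IS": 6, "AOP": 6, "OS": 12}


def subtract_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last day of the target month, so that
    31 August minus 6 months gives 28 (or 29) February.
    """
    index = d.year * 12 + (d.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def release_cutoff(release_date: date, subsystem: str) -> date:
    """Approximate latest collection date included in a Release (assumed rule)."""
    return subtract_months(release_date, LAG_MONTHS[subsystem])


if __name__ == "__main__":
    illustrative_release = date(2021, 1, 15)  # hypothetical date, for illustration only
    for subsystem in LAG_MONTHS:
        print(subsystem, release_cutoff(illustrative_release, subsystem))
```

Data collected after the computed date would be expected to remain Provisional until a later Release.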
Each data product within a Release is associated with a Digital Object Identifier (DOI) for reference and citation. DOI URLs will always resolve back to the dataset, and are thus ideal for citing NEON data in publications and applications. Data products that are sub-products of another product, and are not downloadable individually, use their parent products’ DOIs. Data products that are hosted fully by another repository are not included in any Release and are not assigned DOIs by NEON.
Both Provisional and Released data have been checked to the greatest extent possible for any errors before publication and are considered fit for research. It is always possible that additional data or quality information may become available at a later date, for both Provisional and Released data. The important difference is that Provisional data files are subject to change at any time, without traceability, and therefore carry no guarantee of reproducibility, while Released data files are unchangeable and past Releases will continue to be publicly accessible to users. Any updates or corrections to data in a Release will be reflected in a subsequent Release.
Downloading and Using Provisional and Released Data
The default download for any given data product from the NEON Data Portal will include the most recent Release plus all Provisional data generated since the Release. Alternatively, you may select a specific Release.
A manifest file is included with all downloaded data packages. This file lists the name of every file in the package along with its size, a checksum for verification purposes, and a permanent link to the file. The manifest also specifies whether each file is Provisional or associated with a Release.
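The checksums in the manifest can be used to confirm that a downloaded file is intact. The sketch below is a minimal Python example; the function names are hypothetical, and because the manifest's column layout and checksum algorithm are not described here, the expected checksum is passed in directly and MD5 is only an assumed default.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Compute a hex digest of the file at `path`, reading it in chunks.

    `algorithm` is an assumption; use whichever algorithm the manifest states.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path, expected_checksum, algorithm="md5"):
    """Return True if the file's checksum equals the value copied from the manifest."""
    actual = file_checksum(path, algorithm)
    return actual.lower() == expected_checksum.strip().lower()
```

A mismatch suggests that the file was corrupted or truncated in transit and should be downloaded again.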
Your download is a single zip file containing the data files organized in folders. Our R package, neonUtilities (v2.0 and above), provides a number of functions, including the ability to join files across sites and months for IS and OS data.
If you have downloaded the same Provisional data file on two different dates, you can check whether it has changed by comparing the time stamp at the end of each file name. This applies to instrumented and observational data products, but not to data products from the airborne observation platform. The time stamp corresponds to the date and time at which the file was created, and is only changed when data are republished. More information is available at Data Formats and Conventions.
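The comparison of file-name time stamps can be automated. The Python sketch below assumes that the time stamp is a UTC value of the form YYYYMMDDTHHMMSSZ placed immediately before the file extension; check Data Formats and Conventions for the authoritative layout. The function names and the file names in the usage example are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: ".YYYYMMDDTHHMMSSZ" immediately before the file extension.
STAMP_PATTERN = re.compile(r"\.(\d{8}T\d{6}Z)\.[A-Za-z0-9]+$")


def publication_stamp(file_name):
    """Return the time stamp embedded in a file name, or None if none is found."""
    match = STAMP_PATTERN.search(file_name)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def has_changed(old_name, new_name):
    """Return True if two downloads of the same file carry different time stamps."""
    old_stamp = publication_stamp(old_name)
    new_stamp = publication_stamp(new_name)
    if old_stamp is None or new_stamp is None:
        raise ValueError("no time stamp found in one of the file names")
    return old_stamp != new_stamp


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    first = "example_table.2019-06.basic.20200101T000000Z.csv"
    second = "example_table.2019-06.basic.20210301T120000Z.csv"
    print(has_changed(first, second))
```

A differing time stamp indicates that the file was republished between the two downloads; it does not by itself show which values changed.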
Users should plan to archive any Provisional data used in a publication in an appropriate repository, as NEON will not assign a DOI until the data are included in an official Release. More guidance on how to do this is forthcoming.
Data Product Revisions
If an instrument or protocol changes so significantly that users should be aware of potential incompatibility, we will generate a new Revision of the data product, denoted by a change in the data product identifier. Data from different Revisions of the same data product are not directly comparable and should be combined with caution in any analysis. The data product identifier takes the form DPL.PRNUM.REV, where DPL is the data product level, PRNUM is the product number, and REV is the product revision. Upon a data product Revision, the REV field is incremented. Each data product Revision will be findable on the Explore Data Products page, along with a short summary of the changes made between Revisions.
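Because the identifier has a fixed DPL.PRNUM.REV structure, a script can detect when two datasets come from different Revisions of the same product before combining them. The following Python sketch shows one way to do this; the class and function names are hypothetical, and the identifiers in the usage example are made up for illustration.

```python
from typing import NamedTuple


class ProductId(NamedTuple):
    level: str      # DPL, the data product level
    number: str     # PRNUM, the product number
    revision: int   # REV, the product revision


def parse_product_id(identifier: str) -> ProductId:
    """Split an identifier of the form DPL.PRNUM.REV into its three fields."""
    parts = identifier.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected DPL.PRNUM.REV, got {identifier!r}")
    level, number, revision = parts
    return ProductId(level, number, int(revision))


def is_different_revision(first: str, second: str) -> bool:
    """Return True if both identifiers name the same product but differ in REV."""
    a = parse_product_id(first)
    b = parse_product_id(second)
    same_product = (a.level, a.number) == (b.level, b.number)
    return same_product and a.revision != b.revision


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(is_different_revision("DP1.00001.001", "DP1.00001.002"))
```

A True result is a signal to read the summary of changes between Revisions before pooling the data.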
Last updated January 26, 2021