January 2021 will mark an exciting milestone for the NEON program: our first data release. What is a data release? We're glad you asked!
What is a Data Release?
A data release is a complete set of data files that are considered final and unchanging (static). A release includes data from the first date of collection to the cutoff date for that release (generally six to 12 months before the release is published). These released files will remain stable and accessible throughout the life of the Observatory. Each data product within a release will be associated with a Digital Object Identifier (DOI) for reference and citation. Releases are planned to occur annually for each data product, starting in January 2021.
What is the Difference Between Released and Provisional Data?
Both provisional and released data have been checked for errors to the greatest extent possible before publication and are considered fit for research. The important difference is that provisional data files are subject to change and therefore carry no guarantee of reproducibility, while released data files do carry that guarantee because they will remain unchanged and publicly accessible.
Data collected by the Observatory are made available through the NEON Data Portal after going through initial quality checks; this process allows us to get research-grade data into the hands of the science community as quickly as possible. After publication as a provisional product, additions and corrections may be made in subsequent months as new information comes to light (e.g., a calibration problem with a sensor or an error in manual data recording that was not discovered immediately). Provisional data are updated as needed; once a file is updated, the original version will no longer be available from the Portal.
At the end of each calendar year, NEON will conduct a thorough review of the data, make any final corrections, and tag data files with a release year. These tagged files will be designated to never be changed and to always be available to the public while NEON operates. Finally, a unique DOI will be assigned to each data product within the release, and the release will be published on the Data Portal. After the release, if additional corrections are needed, they will be made available in a future release.
For airborne remote sensing data collected by the Airborne Observation Platform (AOP), the large volume of data makes it prohibitively expensive to maintain accessible copies of all releases over time. Therefore, users will only have immediate access to the most current release and provisional data, not prior releases.
Why Do We Need a Data Release?
NEON supports the FAIR Data Principles: that data be Findable, Accessible, Interoperable, and Reusable. Releases particularly support the Accessible and Reusable principles. Associating a DOI with each data product release will support best practices for citation in papers and other publications. The static files are also well suited to analyses spanning multiple years and to ingestion into various applications, which will help our users support FAIR principles as well. Including the relevant DOIs in your citations will allow for reproducibility and will support more accurate and efficient tracking of NEON data use, helping to better characterize the value of NEON data.
What is Included in the Data Release?
Each annual release will generally include all NEON data and metadata from the first date of collection to a cutoff date prior to the release publication date. Different data products have different cutoff dates, depending on the type of work that needs to be completed to verify the data. In general, the provisional period for Instrument Systems (IS) and Airborne Observation Platform (AOP) data is six months, while the provisional period for Observation Systems (OS) data is 12 months. This 12-month period reflects the additional time needed for downstream lab analysis and quality control processes for observations and samples collected manually in the field. There may be data-product-specific exceptions to these guidelines. The January 2021 release is therefore expected to include IS and AOP data products through June 2020 and OS data products through December 2019.
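The cutoff arithmetic above can be illustrated with a short Python sketch. It is only an illustration of the general rule, not NEON's actual procedure: the function name `last_included_month` is hypothetical, and the sketch assumes the release month and a whole-month provisional period are the only inputs, ignoring the data-product-specific exceptions mentioned above.

```python
def last_included_month(release_year: int, release_month: int,
                        provisional_months: int) -> tuple[int, int]:
    """Return (year, month) of the last month of data expected in a release.

    Assumption: data stay provisional for `provisional_months` full months
    before the release month, so the last released month is the one just
    before that window begins.
    """
    # Count months from year 0 so the subtraction can cross year boundaries.
    release_index = release_year * 12 + (release_month - 1)
    last_index = release_index - provisional_months - 1
    return last_index // 12, last_index % 12 + 1


# January 2021 release, six-month provisional period (IS and AOP data).
print(last_included_month(2021, 1, 6))   # (2020, 6): through June 2020

# January 2021 release, 12-month provisional period (OS data).
print(last_included_month(2021, 1, 12))  # (2019, 12): through December 2019
```

Both results match the dates stated for the January 2021 release.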
Where Can the Static Data Release Be Found?
The January 2021 release will be available through the NEON Data Portal. Each data product details page will have a link to a specific page about that data product's release. Over time, each data product release will have its own page, all linked to the parent data product detail page.
Releases, as well as the provisional data, will also be downloadable through the API.
How Will this Change My Workflow?
- When you download a zip package of data, you will no longer see zip files within the parent zip file. Instead, each site-by-month folder will be clearly visible and will not need to be unzipped. The folder names will be updated to include the publication date and either "provisional" or the current release tag.
- The neonUtilities package will be updated to provide the same functionality as before: it will continue to download and join files correctly despite changes to the folder structure and naming convention. You will need to update to the latest version of neonUtilities, which will be released in parallel with the data release.
- The API will be updated to support data releases and, to the extent possible, will remain backward compatible so that existing code does not break.
- Citations of NEON data in your products and publications will now include DOIs.
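If you handle downloaded folders in your own scripts rather than through neonUtilities, a check like the following Python sketch could separate provisional folders from released ones. It relies only on the statement above that folder names will contain either "provisional" or a release tag. The exact naming format is not specified here, so the example folder names and the helper names `is_provisional` and `split_by_status` are hypothetical.

```python
def is_provisional(folder_name: str) -> bool:
    """Assume a folder is provisional if its name contains 'provisional'."""
    return "provisional" in folder_name.lower()


def split_by_status(folder_names: list[str]) -> tuple[list[str], list[str]]:
    """Split folder names into (released, provisional), preserving order."""
    released: list[str] = []
    provisional: list[str] = []
    for name in folder_names:
        if is_provisional(name):
            provisional.append(name)
        else:
            released.append(name)
    return released, provisional


# Hypothetical folder names, invented for illustration only.
folders = [
    "SITE_A.2019-06.20210115.RELEASE-2021",
    "SITE_A.2020-09.20210115.provisional",
]
released, provisional = split_by_status(folders)
print(released)     # ['SITE_A.2019-06.20210115.RELEASE-2021']
print(provisional)  # ['SITE_A.2020-09.20210115.provisional']
```

Restricting an analysis to the released list is one way to keep results reproducible, since only released files are guaranteed not to change.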
You can learn more about static and provisional data from the NEON program Data Product Revisions and Releases page. As the date for the Data Release draws nearer, we will provide more information on this page and others on NEONscience.org. If you would like to stay up to date with new data and technical updates, please visit our Data Notifications page.