NEON end users commonly ask how quickly new data will be published after it is collected, and whether and when data will be available in the future. NEON began full operations in May 2019. Data collection during construction began at different times for every data product and every site, and data pipelines were built in parallel with early data collection. As a consequence, current data availability varies by site and by data product, and a few data products still have a backlog of data waiting to be processed.
Data availability can depend on the expected sampling frequency and timing for any given data product throughout a year, as well as how long it might take for data to go from the point of collection through the data processing pipelines and finally to the portal for use. In general, for automated instrument data, this process will take less time than for observational sampling. Below, we explain some of the processes and limiting factors that affect data availability for each data collection system.
- For the expected start year of collection by data product, download the Availability Spreadsheet.
- For detailed information about current availability and general processing latencies of a given data product, visit each data product's detail page through the links found in Explore Data Products.
- For information about the quality of available data, see Data Quality.
- To learn more about data that are sent to external repositories for hosting, see Externally Hosted Data.
Observation System (OS) Data
Frequency of data collection varies widely across observational data products. Information about collection frequency can be found in the data product detail pages in the Explore Data Products page and the data product readme files that are packaged with downloaded data; more detailed collection information can be found in protocol documents.
Overall, OS data collection frequency varies from three times per week to once every five years. For products collected every five years, one-fifth of all sites are collected each year. Due to this schedule, data may not yet be available for some sites. For the tentative schedule of initial sampling by site by year, out to 2021, see the Availability Spreadsheet.
Ongoing Data Latency
Observational data are collected by human observers, and are either directly observed or derived from analyses on physical samples collected in the field. The collection and analysis workflows result in four general categories of data latency for OS data:
1. Data collected directly in the field
2. Data from samples analyzed in NEON lab facilities
3. Data from samples shipped to contracted analytical facilities on a rolling basis, i.e., samples are shipped as soon as they are collected or in small batches throughout the year
4. Data from samples shipped to contracted analytical facilities on an annual basis, i.e., samples are shipped in bulk at the end of the field season
With a few exceptions, the latency duration increases from category 1 to 4. Field-collected data may be available on the NEON Data Portal within a month of collection, while samples analyzed on a rolling basis are typically available between 90 days and nine months after collection. Samples analyzed in bulk annually are a small proportion of OS data and have a latency of slightly more than one year.
Many OS data products include multiple data tables, each of which may fall into a different one of the categories above. For example, in the data product Plant foliar physical and chemical properties (DP1.10026.001), the cfc_fieldData table contains records of leaf collection in the field, the cfc_LMA table contains leaf mass per area measurements collected in the NEON lab, and the cfc_chlorophyll, cfc_elements, and cfc_lignin tables contain analyses performed by contracted facilities. Each data table is published to the data portal as it becomes available, so two months after leaf collection, the cfc_fieldData and cfc_LMA data may be available, while the chemical analyses are still pending. If downloaded data do not include all expected tables, check back later!
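As a minimal sketch of the "check back later" advice, the snippet below compares the tables found in a download against the tables named in the example above. The helper `missing_tables` is hypothetical and not part of any NEON tool, and the expected set lists only the five tables mentioned here; the product may include others.

```python
# Table names taken from the example above for DP1.10026.001.
# This set is an assumption: it covers only the tables named in the text.
EXPECTED_TABLES = {
    "cfc_fieldData",
    "cfc_LMA",
    "cfc_chlorophyll",
    "cfc_elements",
    "cfc_lignin",
}


def missing_tables(downloaded, expected=EXPECTED_TABLES):
    """Return the expected table names absent from a download, sorted."""
    return sorted(set(expected) - set(downloaded))


# Two months after leaf collection, only the field and NEON lab tables
# might be present in a download.
pending = missing_tables(["cfc_fieldData", "cfc_LMA"])
print(pending)  # ['cfc_chlorophyll', 'cfc_elements', 'cfc_lignin']
```

A non-empty result suggests that some analyses have not yet been published, not that the download failed.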
Legacy Data Latency
For some products and some sites, data collection began as early as 2012. Because fully operational data pipelines were not in place until 2018, a backlog of data accrued that had to be cleaned and quality controlled before ingest into the finished pipeline. In addition, some samples were collected before contracts had been finalized with the labs that would carry out the analyses. These backlogs are mostly resolved, and most legacy data are available on the portal. The only OS data products with a significant backlog are the microbial products and stream discharge products. Data will be added to the portal as they become available.
Instrument System (IS) Data
Raw sensor measurement frequency varies by data product and ranges from 40 Hz to one measurement every five minutes. Averaging is commonly applied during the process of transforming raw measurements into the data products served on the NEON Data Portal. Most IS data products are provided at two aggregation intervals, commonly 1 and 30 minutes. The Explore Data Products and associated data product detail pages provide information on raw sensor measurement frequency and aggregation intervals, in addition to other details about each data product.
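To illustrate what averaging into aggregation intervals means, here is a minimal sketch that groups raw samples into fixed windows and takes the mean of each. It is not NEON's processing code, which also handles calibration and quality flagging. The function name `aggregate_mean` and the synthetic 0.1 Hz input are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def aggregate_mean(samples, interval_minutes):
    """Average (timestamp, value) samples into fixed windows.

    Returns a dict mapping each window's start time to the mean of the
    values that fall in [start, start + interval).
    """
    window = timedelta(minutes=interval_minutes)
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor the timestamp to the start of its window, counted from midnight.
        midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = (timestamp - midnight) // window
        buckets[midnight + offset * window].append(value)
    return {start: fmean(values) for start, values in sorted(buckets.items())}


# Synthetic input: one sample every 10 seconds for one hour, values 0..359.
start = datetime(2020, 1, 1)
raw = [(start + timedelta(seconds=10 * i), float(i)) for i in range(360)]

one_minute = aggregate_mean(raw, 1)
thirty_minute = aggregate_mean(raw, 30)

print(len(one_minute), one_minute[start])        # 60 2.5
print(len(thirty_minute), thirty_minute[start])  # 2 89.5
```

The same raw series yields 60 one-minute values and 2 thirty-minute values, which is why a product offered at both intervals differs so much in file size and temporal detail.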
Most often, sensors operate continuously after installation at a site, barring local power problems or other unplanned issues. However, sensors at some locations at some sites are removed seasonally due to adverse or inappropriate measurement conditions. For example, some aquatic sensors are removed during periods in which the lake or stream is dry or frozen, and the lowest 2D wind sensor is removed from towers that experience snow. Currently, OKSR is the only site where the instrumented system is shut down entirely during winter due to lack of power supply. No data are processed during this period, so no files are available. We are working towards pausing processing during any period in which a sensor is seasonally removed. However, during NEON’s construction phase, and in some current cases, processing was not (or is not) paused in these scenarios, resulting in downloadable files with only quality flags populated (no sensor values).
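One way to recognize such flag-only files after download is to check whether any record carries a sensor value. The sketch below is illustrative only: the field names `mean` and `finalQF`, the blank markers, and the helper `is_flag_only` are assumptions, and real column names differ by data product.

```python
def is_flag_only(rows, value_fields):
    """Return True if rows exist but none has a sensor value in value_fields."""

    def is_blank(value):
        # Treat missing keys, empty strings, and NA/NaN markers as "no value".
        return value is None or str(value).strip().lower() in ("", "na", "nan")

    return bool(rows) and all(
        is_blank(row.get(field)) for row in rows for field in value_fields
    )


# Hypothetical records, e.g. as produced by csv.DictReader.
removed_sensor = [
    {"mean": "", "finalQF": "1"},
    {"mean": "NA", "finalQF": "1"},
]
working_sensor = [
    {"mean": "3.2", "finalQF": "0"},
    {"mean": "", "finalQF": "1"},
]

print(is_flag_only(removed_sensor, ["mean"]))  # True
print(is_flag_only(working_sensor, ["mean"]))  # False
```

A file that passes this check holds no measurements for the period, which is consistent with a seasonally removed sensor but does not by itself identify the cause.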
Although there is a high degree of consistency in data product collection across sites, not all terrestrial/aquatic instrument data products are collected at all terrestrial/aquatic sites. To learn more about measurement collection across site types, visit the pages for Meteorology, Phenocam, Soil, Surface Water, and Groundwater sensor measurements.
Ongoing Data Latency
Instrumented data collected during any given month are typically published to the NEON Data Portal in the second week of the following month. All sites are online and streaming instrumented data, and most instrumented data products are made available continuously according to this monthly schedule.
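The monthly schedule can be turned into a rough estimate of when a given measurement should appear. This sketch assumes that "second week" means days 8 through 14 of the following month, which is an interpretation rather than a published rule; `expected_publication_window` is a hypothetical helper.

```python
from datetime import date


def expected_publication_window(collection_date):
    """Return (first_day, last_day) of the second week of the following month.

    Assumption: the second week is taken to be days 8 through 14.
    """
    year, month = collection_date.year, collection_date.month
    if month == 12:
        # December data roll over to January of the next year.
        year, month = year + 1, 1
    else:
        month += 1
    return date(year, month, 8), date(year, month, 14)


first, last = expected_publication_window(date(2019, 12, 20))
print(first, last)  # 2020-01-08 2020-01-14
```

Because the window depends only on the collection month, data from the first and last day of a month share the same estimate, so latency ranges from roughly one to six weeks.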
All terrestrial instrumented sites are producing surface-atmosphere exchange data (DP4.00200.001: Bundled data products – eddy covariance) and the processed data availability for all sites is near-current (following the publishing schedule above). However, the code base for this data suite is still being updated frequently to improve quality flagging routines and output data coverage. Thus, reprocessing is anticipated to occur on an ongoing basis until the first static NEON data release occurs at the end of 2020.
Legacy Data Latency
Data collected prior to January 2018 may contain large gaps in coverage or have low availability even though the site was operational. Legacy data availability will improve as data are reprocessed ahead of the first static NEON data release, planned for the end of 2020.
Airborne Observation Platform (AOP) Data
Collection Frequency and Data Latency
The AOP collects airborne remote sensing data at a subset of the NEON sites annually. The schedule for each flight season is published on the NEON website several months before collections begin (see this page for the latest updates on AOP’s flight season schedule), and gives the time period during which the AOP will be at each domain. Precise collection times for a site within the domain’s scheduled time period are not predetermined, as they are primarily driven by the weather conditions encountered upon arrival. The total collection time required for each site varies with the size of the area covered as well as with weather conditions. Under good weather conditions, the smallest NEON sites can be collected in a single day, while the largest could take up to a week. If weather conditions prove unsuitable for collections during the entire scheduled time at a domain, portions of sites or entire sites may not be collected. AOP data will typically be published on the NEON Data Portal less than 60 days after the final collection day at a site. All non-legacy data collected to date are currently available.
Legacy Data Latency
The NEON AOP conducted collections at several sites before NEON reached its operational phase. Collections conducted between 2013 and 2016 were processed with algorithms that had not yet reached their current level of maturity. These ‘legacy’ products are currently provided through the NEON Data Portal and are concurrently being re-processed so that they conform in structure and quality with current products. Publication of legacy data has commenced with a re-processed site from the 2013 flight season. Publication of the remaining legacy products is expected to continue until completion, which is anticipated by the end of August 2020.