Knowing whether data will be collected in the future, and how quickly new data will be published after they are collected, is important for planning research. This article describes NEON’s process for providing the community with information about data availability.
Data availability includes two aspects that combine to yield scientifically usable data (Sebastian-Coleman 2013):
- Completeness (technical availability): The quantity of data (e.g., number of records, pixels) published over a period of time, compared to the amount of data expected.
- Validity (scientific availability): The proportion of data published over a period of time that has passed all quality checks.
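The two definitions above are simple ratios, which the following minimal sketch illustrates. The function names and the record counts are hypothetical and chosen only for illustration; they are not NEON's actual metrics code or performance thresholds.

```python
def completeness(published_records: int, expected_records: int) -> float:
    """Fraction of the expected records that were actually published."""
    if expected_records <= 0:
        raise ValueError("expected_records must be positive")
    return published_records / expected_records


def validity(passed_records: int, published_records: int) -> float:
    """Fraction of the published records that passed all quality checks."""
    if published_records <= 0:
        raise ValueError("published_records must be positive")
    return passed_records / published_records


# Hypothetical example: one 30-day month of 30-minute records (48 per day).
expected = 48 * 30   # records expected for the month
published = 1368     # records that reached the portal (made-up number)
passed = 1300        # published records with no failed quality checks (made-up)

print(f"completeness: {completeness(published, expected):.1%}")
print(f"validity:     {validity(passed, published):.1%}")
```

Note that the two ratios use different denominators: completeness compares against what was expected, while validity compares against what was published.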
NEON began full operations in May 2019. Data collection during construction (i.e., prior to May 2019) began at different times for every data product and every site, and data pipelines were built in parallel with early data collection. As a consequence, current data availability varies by site and by data product.
The factors that impact the completeness of NEON data products are detailed below for each data collection system. Please visit the Data Quality page to better understand NEON’s processes for assessing, addressing, and communicating data validity. If you are interested in the nitty-gritty details of how NEON measures data availability and sets performance thresholds, you can read our Data Availability Plan.
Data product completeness depends on two factors:
- Collection Frequency: The expected sampling frequency and timing for any given data product throughout a year; and
- Data Latency: How long it takes for data to go from the point of collection through the data processing pipelines and finally to the portal for use. In general, for automated instrument data, this process will take less time than for observational sampling.
Below, we explain some of the processes and limiting factors that affect data completeness for each data collection system.
Observation System (OS) Data
Frequency of data collection varies widely across observational data products. Information about collection frequency can be found in the data product detail pages in the Explore Data Products page and the data product readme files that are packaged with downloaded data; more detailed collection information can be found in protocol documents.
Overall, OS data collection frequency varies from three times per week to once every five years. For products collected every five years, one-fifth of all sites are collected each year. Due to this schedule, data may not yet be available for some sites.
Observational data are collected by human observers, and are either directly observed or derived from analyses on physical samples collected in the field. The collection and analysis workflows result in four general categories of data latency for OS data:
1. Data collected directly in the field
2. Data from samples analyzed in NEON lab facilities
3. Data from samples shipped to contracted analytical facilities on a rolling basis (i.e., samples are shipped as soon as they are collected or in small batches throughout the year)
4. Data from samples shipped to contracted analytical facilities on an annual basis (i.e., samples are shipped in bulk at the end of the field season)
With a few exceptions, the latency duration increases from category 1 to 4. Field-collected data may be available on the NEON Data Portal within a month of collection, while samples analyzed on a rolling basis are typically available between 90 days and nine months after collection. Samples analyzed in bulk annually are a small proportion of OS data and have a latency of slightly more than one year.
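For planning purposes, the latency ranges described above can be turned into a rough availability window. The sketch below is illustrative only: the category keys are hypothetical names, "a month" is approximated as 30 days, "nine months" as 270 days, and "slightly more than one year" as a lower bound of 365 days with no upper bound. No latency is stated above for samples analyzed in NEON lab facilities, so that category is left undefined rather than guessed.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Approximate latency windows in days as (minimum, maximum).
# None means the value is not stated in the text above.
LATENCY_DAYS = {
    "field": (0, 30),               # "within a month" (30 days is an assumption)
    "neon_lab": None,               # no latency stated for this category
    "contract_rolling": (90, 270),  # "90 days to nine months" (270 is an assumption)
    "contract_annual": (365, None), # "slightly more than one year" (lower bound only)
}


def availability_window(
    collected: date, category: str
) -> Optional[Tuple[date, Optional[date]]]:
    """Return (earliest, latest) expected publication dates, or None if unknown."""
    window = LATENCY_DAYS[category]
    if window is None:
        return None
    min_days, max_days = window
    earliest = collected + timedelta(days=min_days)
    latest = collected + timedelta(days=max_days) if max_days is not None else None
    return earliest, latest


# Hypothetical sample collected on 1 June 2023 and shipped on a rolling basis.
print(availability_window(date(2023, 6, 1), "contract_rolling"))
```

These windows are typical values, not guarantees; the text above notes that there are exceptions.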
Many OS data products include multiple data tables, each of which may fall into a different one of the categories above. For example, in the data product Plant foliar traits (DP1.10026.001), the cfc_fieldData table contains records of leaf collection in the field, the cfc_LMA table contains leaf mass per area measurements collected in the NEON lab, and the cfc_chlorophyll, cfc_elements, and cfc_lignin tables contain analyses performed by contracted facilities. Each data table is published to the data portal as it becomes available, so three months after leaf collection, the cfc_fieldData and cfc_LMA data may be available, while the chemical analyses are still pending. If downloaded data do not include all expected tables, check back later!
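A quick way to see which tables are still pending is to compare the tables in a download against the ones you expect. The sketch below uses only the five table names mentioned above for Plant foliar traits; the product may contain other tables, and the function name is hypothetical.

```python
# Table names taken from the foliar traits example above; not necessarily exhaustive.
EXPECTED_TABLES = {
    "cfc_fieldData",
    "cfc_LMA",
    "cfc_chlorophyll",
    "cfc_elements",
    "cfc_lignin",
}


def missing_tables(downloaded_tables):
    """Return the expected table names absent from a download, sorted by name."""
    return sorted(EXPECTED_TABLES - set(downloaded_tables))


# Hypothetical download made three months after leaf collection.
downloaded = ["cfc_fieldData", "cfc_LMA"]
print(missing_tables(downloaded))
```

An empty list means every expected table is present; otherwise the listed tables are likely still in the analysis pipeline.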
Instrument System (IS) Data
Raw sensor measurement frequency varies by data product and ranges from 40 Hz to one measurement every five minutes. Averaging is commonly applied during the process of transforming raw measurements into the data products served on the NEON Data Portal. Most IS data products are provided at two aggregation intervals, commonly 1 and 30 minutes. The Explore Data Products and associated data product detail pages provide information on raw sensor measurement frequency and aggregation intervals, in addition to other details about each data product.
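To illustrate what averaging into aggregation intervals means, here is a minimal sketch that bins timestamped raw values into fixed windows and takes the mean of each. It is a simplification under stated assumptions: the 1 Hz synthetic signal and the function name are hypothetical, and NEON's actual processing involves more than a plain mean (for example, quality checks).

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, interval_s):
    """Average (seconds_since_start, value) samples into fixed-length windows.

    Returns a dict mapping each window's start time (in seconds) to the mean
    of the values that fall inside that window.
    """
    buckets = defaultdict(list)
    for t, value in samples:
        window_start = int(t // interval_s) * interval_s
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical 1 Hz sensor running for one hour; the value equals elapsed seconds.
samples = [(t, float(t)) for t in range(3600)]

one_minute = aggregate(samples, 60)        # 60 windows
thirty_minute = aggregate(samples, 1800)   # 2 windows

print(len(one_minute), len(thirty_minute))
print(thirty_minute)
```

The same raw samples feed both intervals, which is why the two aggregation levels of a product are consistent with each other but differ in temporal resolution.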
Sensors typically operate continuously after installation at a site, barring local power problems or other unplanned issues. However, sensors at some locations at some sites are removed seasonally due to adverse or inappropriate measurement conditions. For example, some aquatic sensors are removed during periods in which the lake or stream is dry or frozen, and the lowest 2D wind sensor is removed from towers that experience snow. Currently, OKSR is the only site where the instrumented system is shut down entirely during winter due to lack of power supply; no data are processed during this period, so no files are available. We are working towards pausing processing during any period in which a sensor is seasonally removed. However, during NEON’s construction phase, and in some current cases, processing was not (or is not) paused in these scenarios, resulting in downloadable files with only quality flags populated (no sensor values).
Although there is a high degree of consistency in data product collection across sites, not all terrestrial/aquatic instrument data products are collected at all terrestrial/aquatic sites. To learn more about measurement collection across site types, visit the pages for Meteorology, Phenocam, Soil, Surface Water, and Groundwater sensor measurements.
Instrumented data collected during any given month are published to the NEON Data Portal typically in the second week of the following month. All sites are online and streaming instrumented data, and most instrumented data products are continuously made available according to this monthly schedule.
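The monthly schedule above can be expressed as a small date calculation. This sketch assumes "the second week" means days 8 through 14 of the following month, which is an interpretation and not a published NEON rule; the function name is hypothetical.

```python
from datetime import date


def expected_publication_window(collected: date):
    """Return (start, end) of the second week of the month after collection.

    "Second week" is taken to mean days 8-14 of the month (an assumption).
    """
    if collected.month == 12:
        year, month = collected.year + 1, 1
    else:
        year, month = collected.year, collected.month + 1
    return date(year, month, 8), date(year, month, 14)


# Hypothetical measurement taken in mid-December 2022.
start, end = expected_publication_window(date(2022, 12, 15))
print(f"Expected on the portal between {start} and {end}")
```

Because publication is batched by month, data collected on the first day of a month wait longer than data collected on the last day.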
All terrestrial instrumented sites are producing surface-atmosphere exchange data (DP4.00200.001: Bundled data products – eddy covariance) and the processed data availability for all sites is near-current (following the publishing schedule above). However, the code base for this data suite is still being updated frequently to improve quality flagging routines and output data coverage.
Airborne Observation Platform (AOP) Data
The AOP collects airborne remote sensing data at a subset of the NEON sites annually. The schedule for each flight season is published on the NEON website several months prior to commencement of collections (see this page for the latest updates on AOP’s flight season schedule), and provides the time period during which the AOP will be at each domain. Precise collection times for a site within the domain’s scheduled time period are not predetermined, as they are primarily driven by the weather conditions encountered upon arrival. The total collection time required for each site varies with the size of the area covered as well as weather conditions. Under good weather conditions, the smallest NEON sites can be collected in a single day, while the largest could take up to a week. If weather conditions prove unsuitable for collections during the entire scheduled time at a domain, portions of sites or entire sites may not be collected.
AOP data require extensive processing, including quality checks, after collection. Consequently, these data are typically published on the NEON Data Portal approximately 60 days after the final collection day at a site.
Sebastian-Coleman, Laura. 2013. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. MK Series on Business Intelligence. ISBN 9780123970336. https://doi.org/10.1016/B978-0-12-397033-6.00034-1.
Last updated June 13, 2023