A Primer on the NEON Sampling Design
By Sandra Chung
For a nationwide, long-term study of climate and ecology, location is key. The locations of NEON’s 20 core data collection sites play a critical role in the observatory’s ability to address the broad ecology and climate questions it was uniquely designed to answer. Careful study of un-invaded and invaded sites and of wildland and disturbed sites can help illuminate complex relationships between ecosystems, invasive species and land use. But to construct a detailed ecological forecast for the entire country, NEON also needs to collect data from locations that represent (as much as possible) the enormous variety of ecological and climate conditions across the United States.
Selecting locations to meet all these design requirements isn’t easy. Ample computational power and scientific expertise have gone into generating the current map of NEON sites, which is the product of dozens of meetings and workshops over the past five years and incorporates input from hundreds of scientists. The sites will have to anchor national, regional and local studies over NEON’s planned 30-year lifetime. “We know site selection is a very important and sensitive process,” says the NEON Project's first Principal Investigator David Schimel. “And doing it was an exciting exercise in collaborative continental science.”
"Most existing ecological network sites are created to serve the most exciting science, wherever it may be found," Schimel says. Scientists at each site collect data at different times using different methods to serve different scientific priorities. These networks generate a lot of interesting science. But anyone attempting to conduct a long-term, nationwide analysis of ecological and climate change has no choice but to paste together odd patches of data with wobbly statistical glue. NEON is designed to take a more seamless approach to gathering continental-scale data.
To tease out complex cause-and-effect relationships and construct a consistent forecast for the whole continent, NEON needs a strategy to collect data from a representative sample of the U.S.’s broad range of ecological and climate conditions. Back when NEON was founded, such a strategy didn’t exist. The National Science Foundation assigned NEON the task of developing an objective way to design a nationally representative network of ecological monitoring sites.
The first step in choosing representative NEON core sites was to statistically partition the U.S. into an optimum number of domains that were as different from each other as possible. Choosing representative network sites from each of these domains helped ensure that the network would capture as much of the full range of U.S. ecological and climate diversity as it could with a limited number of sampling sites. Oak Ridge National Lab scientists working with the NEON National Network Design Committee used a process called multivariate geographic clustering to divide the U.S. into regions of similar climate and ecology. The clustering algorithm works like so: slice the country into many equal-sized patches, then cluster the patches based on how similar they are to each cluster’s average of important climate and ecological conditions, like vegetation, topography and precipitation. Plug the results back into the algorithm and repeat until the algorithm converges on a set of clusters that are as different from each other as possible. “We wanted the domain boundaries to be based on consistent, national-scale data sets that had high spatial resolution,” Schimel says.
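The loop described above (assign each patch to the nearest cluster average, recompute the averages, repeat until nothing changes) has the same shape as the standard k-means algorithm. The following is a minimal sketch of that loop in Python, not the Oak Ridge team's actual code. The function names, the two-variable patches and the sample values are all hypothetical, and a real analysis would first rescale each variable so that no single one dominates the distance.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two patches' condition vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise average of a non-empty list of condition vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))

def cluster_patches(patches, k, max_iter=100, seed=0):
    """Group patches into k clusters with a plain k-means loop.

    patches: list of equal-length tuples, one per patch (hypothetical
             variables such as mean temperature and annual precipitation).
    Returns (labels, centers), where labels[i] is the cluster of patches[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(patches, k)  # start from k randomly chosen patches
    labels = None
    for _ in range(max_iter):
        # Assign every patch to the cluster whose average it most resembles.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in patches
        ]
        if new_labels == labels:  # no patch changed cluster: converged
            break
        labels = new_labels
        # Feed the result back in: recompute each cluster's average.
        for c in range(k):
            members = [p for p, lab in zip(patches, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return labels, centers

# Hypothetical (temperature in deg C, precipitation in mm) values for six patches.
example_patches = [
    (5.0, 300.0), (6.0, 320.0), (5.5, 310.0),
    (25.0, 1500.0), (24.0, 1450.0), (26.0, 1550.0),
]
example_labels, example_centers = cluster_patches(example_patches, k=2)
```

In this toy example the three cool, dry patches end up in one cluster and the three warm, wet patches in the other. The number of clusters `k` is an input here, just as the design committee had to choose how many domains to carve out.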
In 2006, William Hargrove and Forrest Hoffman at Oak Ridge National Laboratory had a large collection of continental-scale data sets and access to enough supercomputing power to conduct a multivariate analysis on millions of square kilometers. The Oak Ridge scientists had data on nine climatic variables for each of nearly eight million one-square-kilometer patches. A supercomputer crunched the data and clustered the patches into 25 rough eco-regions. But that first cluster map had a long way to go before it could yield workable research domains under NEON’s funding and logistical constraints.
Two neighboring patches of the U.S. tend to have similar climate and biology, so most patches with similar ecology are also neighbors in space. "However, the clustering process also carved out some ecological islands, like a few places in the Smoky Mountains, where the climate and biology might resemble faraway Colorado Rockies more than a nearby north Georgia valley," Schimel says. "What’s more, climate and ecological variability don’t distribute themselves evenly across the country. In a day’s drive across California, a traveler might experience several wide swings between cold and hot, wet and dry, lush and barren. On the other hand, a huge swath of the U.S. stretching from the Midwest almost to the Atlantic coast has pretty much the same climate," Schimel says. The scientists and managers working within such a huge domain would have to spend many costly days traveling between the domain’s research sites.
Hargrove and Hoffman worked with the design committee to smooth the domain borders, eliminate “islands,” and merge or subdivide eco-regions to make domains of manageable size. They came up with 20 domains with borders that mostly hew closely to the original statistical cluster map. “If we were doing it again today we might use another method, but the results probably wouldn’t be very different,” Schimel says. The number 20 almost certainly wouldn’t budge much, he adds. The design committee and the Oak Ridge team considered maps with between 10 and 200 regions. “More is always better,” Schimel says. But more costs more, too, and the NEON Project has to stay within budget and human resource limits. “Going beyond 20, the gain in information with each additional site slows down,” Schimel says. A network with 20 domains hits a sweet spot in the tradeoff between the cost of creating more domains and infrastructure and the amount of useful information the network can collect. “It’s a number that has scientific support,” Schimel says.
Choosing Sites: a collaboration across the scientific community
Once the domains were defined, NEON asked the scientific community for suggestions for core research site locations within each domain. Hundreds of people responded from a wide variety of geographic locations and scientific disciplines within the ecological research community. In February 2007, a National Science Foundation review panel met with NEON staff and board members at the U.S. Geological Survey’s EROS Data Center in Sioux Falls, South Dakota, to develop scientific and practical criteria to screen the suggestions. The Sioux Falls meeting participants decided that the core sites needed to be wildland sites representative of conditions in their domains. The core sites would also act as reference points for studying the effects of human-driven changes, and would have to remain accessible for 30 years and be reachable by NEON’s own Airborne Observation Platform.
At a subsequent technical review meeting, NEON Project senior staff and members of the ecological research community used these criteria and data from the U.S. Geological Survey to select candidate core sites. Once the core sites had been selected and vetted via field visits from NEON Project scientists, NEON Project visiting ecosystem ecologist David Moore and his computational modeling colleagues set to work evaluating just how well the 20 sites represent the entire U.S.’s ecology and climate.
In order to meet NEON’s goal of ecological and climate forecasting, computer modelers need to be able to correlate data from NEON sites with every other same-size patch of the U.S. The modeling team put the core sites through their paces with a well-established ecosystem model called the Community Land Model. The model simulated 30 years of water and carbon exchange in several million patches of the U.S., including the core sites. After comparing the results in each patch, the team found that "most of the U.S. correlated well with one or more core sites," Moore says. The areas that correlated the least with core sites tended to be in the desert Southwest and southern Florida. "The relatively poor correlation in the southern tier might have to do with model bias: the model doesn’t sample many vegetation types found in desert regions," Moore says. The sheer diversity of desert types could also pose a particular challenge to a nationwide climate and ecology model. "In addition to verifying that the network design is generally robust, the model shows us where we have some places we could still work on," Schimel says.
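One way to picture the comparison described above is to take each patch's simulated time series, correlate it with the series from every core site, and keep the best match. The sketch below does this with a Pearson correlation. That choice of statistic is an assumption, as are the function names, the site names, the 0.5 threshold and the sample numbers. The article does not say how the modeling team actually computed its correlations.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def best_core_site(patch_series, core_series):
    """Return (site_name, correlation) for the core site that best tracks a patch.

    core_series: dict mapping a hypothetical site name to its simulated series.
    """
    return max(
        ((name, pearson(patch_series, series)) for name, series in core_series.items()),
        key=lambda pair: pair[1],
    )

def poorly_represented(patches, core_series, threshold):
    """Names of patches whose best core-site correlation falls below threshold."""
    return [
        name
        for name, series in patches.items()
        if best_core_site(series, core_series)[1] < threshold
    ]

# Hypothetical simulated carbon-exchange series (arbitrary units, five time steps).
core_sites = {
    "site_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "site_b": [5.0, 3.0, 4.0, 1.0, 2.0],
}
patches = {
    "patch_1": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises in step with site_a
    "patch_2": [3.0, 1.0, 2.0, 1.0, 3.0],   # tracks neither site closely
}
gaps = poorly_represented(patches, core_sites, threshold=0.5)
```

Here `patch_1` matches `site_a` almost perfectly, while `patch_2` correlates only weakly with either site and is flagged as a gap, the toy counterpart of the desert Southwest and southern Florida in the team's results.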
Relocatable sites and mobile deployment platforms designed to extend coverage
NEON network designers are already planning to expand on core site coverage using relocatable and mobile sites. Because the core sites stay anchored to representative wildland spots, they can’t directly document disruptive events like changes in land use or natural and man-made disasters, and they may not be ideally situated to study important long-range phenomena like air pollution and nutrient transport. Relocatable sites will stay put for around five years in one location to gather data for research projects under a few major themes including land use change, invasive species, and nitrogen deposition. Mobile sites, on the other hand, can be dispatched on an even shorter time scale to cover events like wildfires and oil spills. NEON Project scientists continue to evaluate and tune their computer models and observatory design with available data. Moore adds that the model correlation results might change as the model itself evolves. Schimel says there’s room for improvement in the design process, too. “The next time we do this we need more sophisticated data. We’ll have 100 or more variables, not just nine,” Schimel says. All indications are that the current NEON design is a solid start – with plenty of room to grow and evolve along with the tools and knowledge of the ecological and climate research community.