NEON and Google
NEON is funded by the National Science Foundation to enable fundamental understanding of the dynamics underlying the causes of, and responses to, environmental change. This understanding enables forecasting – prediction – of our future.
To expand the ways in which people can engage with NEON data, we have moved all of our data products to Google’s Cloud Platform. Our data are just as available to researchers as before - they can be downloaded free of charge through our Data Portal and API. In addition, they can now be worked with directly in the cloud, using any of Google’s tools such as Google Earth Engine (for visualizing and querying our remote sensing data), BigQuery (for querying our tabular data), Vertex AI (to leverage artificial intelligence capabilities), Vertex AI Workbench or RAD Lab (to create Jupyter notebooks), and Looker (to visualize data). This is especially useful for our large datasets, which could otherwise take a long time to download and which sometimes require computational resources beyond those of a typical laptop.
Google helps raise awareness of NEON data by tapping into the creativity and capacity of an interdisciplinary and global community. Google has broad global reach, is known for making user-friendly tools (e.g., Google Search, Google Workspace), and has a long history of leadership in open source development. Through this partnership, we look forward to actively collaborating with Google to explore and evolve tools that support innovation.
We currently have several remote sensing datasets available on Google Earth Engine (GEE), as well as a tutorial series, Introduction to NEON Remote Sensing in Google Earth Engine. The ability to bring together NEON’s large remote sensing data with the many satellite-derived datasets provided by other organizations on the Google Earth Engine platform will advance efforts to understand how ecosystem processes operate across spatial scales.
We also recently loaded two pilot demonstration datasets into the BigQuery database system. They are available via Google’s Analytics Hub (a Google account is required for access) for the NEON community to explore. The pilot datasets are Chemical Properties of Surface Water (DP1.20093.001) and Continuous Discharge (DP4.00130.001). These will be accompanied by a Jupyter notebook in RAD Lab to show how the datasets may be analyzed and visualized in the cloud.
There are a few important things to consider when using NEON data in the cloud. First, you must have a Google account. Second, it is free to use the GEE datasets, but using the tabular datasets in Google BigQuery will incur minor costs for downloading or querying within the cloud platform. Third, to offset those costs, researchers can get $300 in free credits to start with, and can also apply for larger pools of funding from Google.
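Because querying the tabular data in BigQuery can incur costs, it may help to check how much data a query would scan before running it. The following is a minimal sketch, not NEON's own tooling. It assumes the `google-cloud-bigquery` Python package is installed and that default Google Cloud credentials are configured. The table name `my-project.neon_pilot.continuous_discharge` and the column name `siteID` are hypothetical placeholders; substitute the names you see after subscribing to the pilot datasets in Analytics Hub.

```python
def build_site_query(table: str, site_id: str, limit: int = 10) -> str:
    """Build a standard-SQL query that selects rows for one NEON site.

    `table` is a fully qualified `project.dataset.table` name (hypothetical
    here); the `siteID` column is an assumption about the table's schema.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Reject anything but letters and digits so the value is safe to inline.
    if not site_id.isalnum():
        raise ValueError("site_id must be alphanumeric")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE siteID = '{site_id}' "
        f"LIMIT {limit}"
    )


def estimate_bytes_scanned(sql: str) -> int:
    """Dry-run a query and return the number of bytes it would process.

    A dry run validates the query and reports its size without executing it.
    """
    # Imported here so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


if __name__ == "__main__":
    # "ARIK" is used only as an example site code.
    sql = build_site_query("my-project.neon_pilot.continuous_discharge", "ARIK")
    print(sql)
    print(estimate_bytes_scanned(sql), "bytes would be scanned")
```

Note that a `LIMIT` clause caps the rows returned but does not necessarily reduce the bytes scanned, so selecting only the columns you need is usually the more effective way to keep a query small.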
If you have any questions regarding this new partnership, or have any issues with the tutorials or demonstration projects, please contact us. We will be actively adding to the Frequently Asked Questions section below, so please check it before sending in your own questions.
Why is Google a good choice for NEON?
Generally, cloud infrastructure, including both data storage and compute, will allow NEON to be agile and efficient; for example, compute resource demands can be patchy over time, and data volume increases over time. A pay-as-we-grow model with cloud infrastructure, versus the pre-pay model that on-premises infrastructure requires, is economically more efficient for NEON. We followed a robust procurement process using government guidelines and regulations for services needed, and Google proposed the best overall deal. Google is a company that serves global needs and can provide economies of scale for NEON in terms of data storage, backups, and access. Google also has a long history of leadership in open source, from providing open code on GitHub to developing projects like Kubernetes, TensorFlow, and more. Open source provides the flexibility to deploy (and, if necessary, migrate) critical workloads across or off public cloud platforms. Also, Google Cloud’s Research Credits program supports scientific research and is a great tool to help advance NEON-enabled science. It is focused on enabling researchers to jumpstart their research in Google Cloud with seed-funding credits and some training support. Finally, Google supplies high-quality backup services, and we are also backing data up to Amazon Web Services (AWS).
What challenges does having data in the cloud solve?
We know that many data users are struggling to analyze NEON data on their local machines - NEON data are rapidly becoming too big to handle in this way. This is especially a problem for users without local resources to store and compute on these big data. The cloud democratizes these big data analyses. Currently, Google Earth Engine (GEE) is the most exciting toolbox for NEON because of the size of the remote sensing data and the difficulty of working with them on a local machine. GEE has a cloud compute environment that allows users to work with our largest datasets. We have 10 TB in GEE currently and expect much more over the 30-year lifetime of NEON. The AOP image collection includes cloud-free hyperspectral reflectance data, discrete lidar-derived rasters (i.e., digital terrain models and canopy height models), and RGB camera imagery at 5 NEON sites spanning the United States, over multiple (2-4) years at each site. To learn more, see our GEE tutorial. Finally, compute on Google is fast: for example, an SQL query on the 23 million records of our Continuous Discharge data product runs in only 8 seconds!
What about interoperability across Clouds?
The Google buckets that the NEON data are in are publicly accessible and can be reached via other platforms. Google uses many cloud-agnostic protocols and is very open. As this partnership evolves, we will be working to bridge between platforms and provide resources to serve NEON data users on whatever cloud platform they choose.
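As an illustration of reaching a public bucket from outside Google's own tools, the sketch below builds the plain HTTPS address of an object and downloads it using only the Python standard library, so it can run on any platform. The bucket name `example-neon-bucket` and the object path are hypothetical placeholders, not actual NEON locations; the `https://storage.googleapis.com/<bucket>/<object>` address form is the general one Google Cloud Storage uses for publicly readable objects.

```python
from urllib.parse import quote
from urllib.request import urlopen


def public_object_url(bucket: str, object_path: str) -> str:
    """Return the HTTPS URL of an object in a publicly readable bucket."""
    # Keep "/" as a path separator; percent-encode other reserved characters.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_path, safe='/')}"


def download(bucket: str, object_path: str, dest: str) -> None:
    """Stream a public object to a local file in 1 MiB chunks."""
    url = public_object_url(bucket, object_path)
    with urlopen(url) as response, open(dest, "wb") as out:
        while chunk := response.read(1 << 20):
            out.write(chunk)


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    print(public_object_url("example-neon-bucket", "DP1.20093.001/site file.csv"))
```

Because no Google-specific library or credential is involved, the same request could be issued from a notebook or virtual machine hosted by another cloud provider.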