NEON Data Institute 2016: Remote Sensing with Reproducible Workflows in R

June 20, 2016 - June 25, 2016
NEON

NEON's Data Institutes provide critical skills and foundational knowledge for graduate students and early career scientists working with heterogeneous spatio-temporal data to address ecological questions.

Data Institute Overview

Our 2016 Institute focused on remote sensing of vegetation using open source tools and reproducible science workflows. The programming language of instruction in 2016 was R. This Institute was held at NEON headquarters in June 2016.

In addition to the six day institute there were three weeks of pre-institute materials is to ensure that everyone comes to the Institute ready to work in a collaborative research environment. Pre-institute materials are online & individually paced, expect to spend 1-5 hrs/week depending on familiarity with the topic.

Schedule

Time Day Description
-- Computer Setup Materials
-- 25 May - 1 June Intro to NEON & Reproducible Science
-- 2-8 June Version Control & Collaborative Science with Git & GitHub
-- 9-15 June Documentation of Your Workflow with R Markdown
-- 19-24 June Data Institute
7:50am - 10:00 pm Monday Intro to NEON, Intro to HDF5 & Hyperspectral Remote Sensing
8:00am - 6:30pm Tuesday Intro to LiDAR data, Automated Workflows
8:00am - 6:30pm Wednesday Remote Sensing Uncertainty
8:00am - 6:30pm Thursday LiDAR & Hyperspectral Data Fusion
9:00am - 6:30pm Friday Individual/Group Applications
9:00am - 2:00pm Saturday Group Application Presentations

Key Dates

  • Application Deadline: March 28, 2016
  • Notification of Acceptance: April 4, 2016
  • Tuition payment due by: April 18, 2016
  • Pre-institute online activities: June 1-17, 2016
  • Institute Dates: June 20-25, 2016

Instructors

Dr. Leah Wasser, Supervising Scientist, NEON: As part of her work at NEON, Leah is passionate about helping the scientific community harness the power of remote sensing and other large spatio-temporal data using efficient, quantitative, reproducible approaches and open science workflows to better understand ecological change over time. Leah has a Ph.D. in ecology with a focus on using remote sensing techniques to measure landscape level ecological change.

Dr. Naupaka Zimmerman, Assistant Professor of Biology, University of San Francisco: Naupaka’s research focuses on the microbial ecology of plant-fungal interactions. Naupaka brings to the course experience and enthusiasm for reproducible workflows developed after discovering how challenging it is to keep track of complex analyses in his own dissertation and postdoctoral work. As a co-founder of the International Network of Next-Generation Ecologists and an instructor and lesson maintainer for Software Carpentry and Data Carpentry, Naupaka is very interested in providing and improving training experiences in open science and reproducible research methods.

Dr. Kyla Dahlin, Assistant Professor, Michigan State University: Kyla's research aims to better understand and quantify ecosystem processes and disturbance responses through the application of emerging technologies, including air- and space-borne remote sensing, spatial statistics, and process-based modeling. She is currently interested in semi-arid forest/grassland transition zones, where vegetation patterns are readily observable but poorly understood. Kyla approaches questions by integrating observational data, modeling, and focused field experiments to both refine our understanding of ecosystem function and to improve our ability to predict how ecosystems and the climate will change in the future.

Registration & Logistics

For registration information, please see the event registration page

Read here for more information on the logistics of the Data Institute. 


Online Resources

The teaching materials from the 2016 Data Institute are provided free on this site for use outside the Data Institute. They can be found in the Workshop Materials section of this page. These materials were designed to be used in the context of the workshop with an instructor, however, they may also be suitable for self-paced online instruction.

You too can watch several of the presentations that were given at the 2016 Data Institute!


2016 Data Institute Recap

In addition to the three core faculty listed above the Data Institute participants were instructed by and interacted with guest instructors and NEON project scientists:

  • Lindsay Powers, H5 Group –  HDF5 data structure
  • Chris Crosby, UNAVCO/Open Topography – LiDAR remote sensing
  • David Schimel, NASA Jet Propulsion Lab – remote sensing, open science, ecology
  • David Hulslander, NEON – Remote sensing data processing 
  • Tristan Goulden, NEON – Remote sensing theory & Hyperspectral remote sensing
  • Nathan Leisso, NEON – Introduction to NEON AOP data collection and processing
  • Courtney Meier, NEON – NEON in situ field measurements.
  • Keith Krause, NEON – NEON full waveform LiDAR

Participants

Participants came from institutions in the USA, Canada and the Netherlands. While 70% of the participants were graduate students, the Data Institute also attracted an undergraduate student, post-docs, and university research staff and faculty.

Participant were interested in using remote sensing data to answer a wide range of questions from wanting to be able to characterize forest structure and composition to using time series to detect vegetation disturbance patterns to from remote sensing data.

According to NEON science educator, Megan Jones, “Participants really appreciated the opportunities to work with data in small-group settings and the emphasis of using reproducible science methods. The science theme for 2016 was use of remote sensing data, but this was taught along with reproducible science methods including the importance of well documented code, version control and collaborative tools like GitHub, and quick sharing of results using RMarkdown and knitr.”  

Institute outcomes

At the end of the Institute, participants presented group projects illustrating the use of reproducible workflows with remote sensing data. The skills learned are applicable to remote sensing data from any source, however, all participants were allowed to use NEON remote sensing data as well as their own data sets. According to Robert Paul from the University of Illinois at Urbana-Champaign, “The course offered a comprehensive overview of best practices for managing and analyzing remote sensing data, and how to make data analysis workflows well-documented, collaborative, and reproducible.”

Sarah Graves, from the University of Florida, said, “The NEON Data Institute gave us the tools to work with novel ecological data. With our own knowledge of the domain combined with NEON data and tools, we are in a position to ask novel ecological questions that will advance the field of ecology beyond what has been traditionally possible.” Jeff Atkins of Virginia Commonwealth University added, “Ecology increasingly depends on "big data" and remote sensing and scientists need the skills necessary to work with this data and to inform their hypotheses. NEON does an amazing job at helping scientists learn how to work with and use a suite of data and data products.”

Group projects

Exploring the relationship between functional traits and spectral reflectance for Ordway Swisher Biological Station, FL
Sarah Graves, Jeff Atkins, Kunxuan Wang, and Catherine Hulshof de la Pena

We calculated plot-level foliar nitrogen content and functional diversity from in situ data.  These metrics were related to mean plot reflectance and a spectral diversity metric from a PCA transformation.

Describing landscape-level phenology with MODIS vegetation index time series
Robert Paul, Jeff Stephens

This workflow detects the length of time for NDVI and EVI to go from baseline to peak over the course of the year. Each pixel is classified with a value reflecting the length of time in the year for NDVI and EVI to reach peak greenness.

Characterizing the forest using trees: how do forest characteristics vary with respect to disturbance history at Soaproot Saddle
We attempted species-level classification using Random Forest on LiDAR and imaging spectroscopy.
Megan Cattau, Stella Cousins, Kristin Braziunas, Allie Weill

Towards individual tree crown segmentation with spectral indices
Enrique Montano & Dave McCaffrey

We attempted to implement an individual tree crown extraction algorithm, optimized with vegetation structure data from in situ plots. The ability to identify individual tree canopy with confidence will allow for comparison of spectral indices among individuals and across species.

Plant structure and function in complex terrain: Landscape controls and microclimatic consequences
Holly Andrews, Nate Looker, Amy Hudson

We examined climate, topography, and vegetation interactions. Specifically, we assessed spectral and LiDAR-based properties of vegetation across topographic gradients of water availability and compared land surface temperature to NDVI.

Upscaling Structure for Soaproot Field Site, California
Cassondra Walker, Jon Weiner, Richard Remigio

We attempted to link vegetation indices to plot-level tree characteristics, and then upscale those indices to the landscape scale to predict structure that was derived from LiDAR.

Using HyperSpectral Imaging techniques to predict foliar nutrient concentrations
Michiel Veldhuis

This page includes all of the materials needed for the Data Institute including the pre-institute materials.  Please use the sidebar menu to find the appropriate week or day.  If you have problems with any of the materials please email us or use the comments section at the bottom of the appropriate page.  

Please note that the format of the HDF5 files use for many of the tutorials for the 2016 Data Institute are in a format no longer used for NEON data.  We are in the process of updating the content to the new HDF5 format. As the tutorials are updated they will be linked below.  In the meantime, you can view the tutorials using the old format on the original NEON Data Insitute 2016 website.


Pre-Institute: Computer Set Up Materials 

It is important that you have your computer setup, prior to diving into the pre-institute materials in week 2!
Please review the links below to setup the laptop you will be bringing to the Data Institute.

Let's Get Your Computer Setup!

Go to each of the following tutorials and complete the directions to set your computer up for the Data Institute. 

Tutorial: Install Git, Bash Shell, R & RStudio

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Set up GitHub Working Directory - Quick Intro to Bash

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Data Institute: Install Required R Packages

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Install QGIS & HDF5View

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Data Institute 2016: Download the Data

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre-Institute Week 1: Introduction to NEON & Reproducible Science

In the first week of the pre-institute activities, we will review the NEON project. We will also provide you with a general overview of reproducible science. Over the next few weeks will we ask you to review materials and submit something that demonstrates you have mastered the materials.

Learning Objectives

After completing these activities, you will be able to:

  • Explain sources of uncertainty in remote sensing data.
  • Measure the differences between a metric derived from remote sensing data and the same metric derived from data collected on the ground.

Week 1 Assignment

After reviewing the materials below, please write up a summary of a project that you are interested working on at the Data Institute. Be sure to consider what data you will need (NEON or other). You will have time to refine your idea over the next few weeks. Save this document as you will submit it next week as a part of week 2 materials!

Deadline: Please complete this by Thursday June 2nd 2016 @ 11:59 MDT.

Week 1 Materials 

Please carefully read and review the materials below:

Tutorial: Introduction to the National Ecological Observatory Network (NEON)

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: The Importance of Reproducible Science

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre-Institute Week 2: Version Control & Collaborative Science with Git & Git Hub

The goal of the pre-institute materials is to ensure that everyone comes to the Institute ready to work in a collaborative research environment. If you recall, from last week, the four facets of reproducibility are documentation, organization, automation, and dissemination.

This week we will focus on learning to use tools to help us with these facets: Git and GitHub. The Git Hub environment supports both a collaborative approach to science through code sharing and dissemination, and a powerful version control system that supports both efficient project organization, and an effective way to save your work.

Learning Objectives

After completing these activities, you will be able to:

  • Summarize the key components of a version control system
  • Know how to setup a GitHub account
  • Know how to setup Git locally 
  • Work in a collaborative workflow on GitHub

Week 2 Assignment

The assignment for this week is to revise the Data Institute capstone project summary that you developed last week. You will submit your project summary, with a brief biography to introduce yourself, to a shared GitHub repository.

Please complete this assignment by Thursday June 9th @ 11:59 PM MDT.

If you are familiar with forked repos and pull requests GitHub, and the use of Git in the command line, you may be able to complete the assignment without viewing the
tutorials.

 

Tutorial: Assignment: Version Control with GitHub

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Week 2 Materials 

Please complete each of the short tutorials in this series. 

Series: Version Control with GitHub

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre-Institute Week 3: Documentation of Your Workflow with R Markdown

In week 3, you will use the R Markdown file format to document code and efficiently publish code results & outputs. You will practice your Git skills by publishing your work in the NEON-WorkWithData/DI-NEON-participants GitHub repository.

Learning Objectives

After completing these activities, you will be able to:

  • Use R Markdown and knitr to create code with formatted context text
  • Describe the value of documented workflows

Week 3 Assignment

Please complete the activity and submit your work to the GitHub repo by 11:59 Thursday June 16th.

If you are familiar with using R Markdown files to document your workflow and knitting to HTML then you may be able to complete the assignment without viewing the tutorials.

Tutorial: Assignment: Reproducible Workflows with R Markdown

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Week 3 Materials

Please complete each of the short tutorials in this series: 

Series: Document Your Code with R Markdown

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Monday: NEON, HDF5, & Hyperspectral Data

Welcome to Day One of the Institute!

Learning Objectives

After completing these activities, you will be able to:

  • Describe the NEON project & NEON AOP data
  • Understand key remote sensing data types (active vs passive sensors)
  • Open and work with raster data stored in HDF5 format in R
  • Explain the key components of the HDF5 data structure (groups, datasets and attributes)
  • Open and use attribute data (metadata) from an HDF5 file in R
Time Topic Instructor
8:00 Welcome & Introductions
9:30 Introduction to NEON AOP Nathan Leisso
10:15 Break
10:30 Big Data, Open Data and Biodiversity (video) Dave Schimel
12:00 Lunch
1:00 Introduction to the HDF5 File Format (download presentation PDF) Lindsay Powers
1:45 An Introduction to Hyperspectral Remote Sensing (video) Tristan Goulden
2:00 Work with Hyperspectral Remote Sensing data in R - HDF5 Leah & Naupaka
2:45 NEON Hyperspectral Remote Sensing Data in R - Efficient Processing Using Functions Leah & Naupaka
3:15 Plot a Spectral Signature from Hyperspectral Remote Sensing data in R - HDF5 Leah & Naupaka
3:45 Break
4:00 Calculate NDVI from NEON Hyperspectral Remote Sensing Data in R Leah & Naupaka
6:00 Dinner Break
8:00 Reproducible Science Methods Naupaka
10:00 End

Additional Resources

These tutorials were not part of the day's curriculum but may be useful to those looking for more background information on raster data and HDF5 formats or to go beyond the day's materials


Tuesday: LiDAR & Automation

In the morning, we will review the basics of discrete return and full waveform lidar data. In the afternoon, we will focus on automation as a means to write more efficient, usable code.

Learning Objectives

After completing these activities, you will be able to:

  • Explain the difference between active and passive sensors.
  • Explain the difference between discrete return and full waveform LiDAR.
  • Describe applications of LiDAR remote sensing data in the natural sciences.
  • Describe several NEON LiDAR remote sensing data products.
  • Explain why modularization is important and supports efficient coding practices.
  • How to modularize code using functions.
  • Integrate basic automation into your existing data workflow.
Time Topic Instructor
8:00 Introduction to LiDAR (video) Tristan Goulden
8:30 Introduction to full waveform LiDAR (video) Keith Krause
9:00 OpenTopography: Increasing the Impact of High Resolution Topography through Open, Online Access to Data and Processing (download presentation PDF) Christopher Crosby
10:00 Break
10:15 Classify a Raster using Threshold Values in R Leah, Naupaka
11:00 Mask a Raster using Threshold Values in R Leah, Naupaka
12:00 Lunch
1:00 Code Automation - Adapted from Reproducible Science Curriculum materials Naupaka
5:30 End

Additional Resources

These tutorials were not part of the day's curriculum but may be useful to those looking for more background information on raster data or to go beyond the day's materials


Wednesday: Comparing Ground to Airborne – Uncertainty

Today, we will focus on the importance of uncertainty when using remote sensing data. We will work on a hands-on activity where we compare remote sensing derived vegetation metrics to metrics collected on the ground (in situ).

Learning Objectives

After completing these activities, you will be able to:

  • Explain sources of uncertainty in remote sensing data.
  • Measure the differences between a metric derived from remote sensing data and the same metric derived from data collected on the ground.
Time Topic Instructor
8:00 Vegetation Data Indices and NEON Data Products (video) Dave Hulslander
8:45 NEON Terrestrial Observation Vegetation Sampling & Integration with Remote Sensing (video) Courtney Meier
9:30 The Importance of Validation & Uncertainty Issues when Using Remote Sensing Data Kyla Dahlin
10:00 Break
10:15 Extract Values from Rasters in R & Compare Ground to Airborne Kyla
12:00 Lunch
1:00 Collaborative mini-projects in Uncertainty Leah & Kyla
3:30 Break
3:45 Collaborative mini-projects in Uncertainty, continued
5:00 Present Results & Methods Discussion
6:00 End

Additional Resources

These three presentations (videos linked) were not part of the 2016 Data Institute but the topics are aligned with the day's theme and may be of interest.


Thursday: LiDAR & Hyperspectral Data Fusion

Today, we will focus on the importance of uncertainty when using remote sensing data. We will work on a hands-on activity where we compare remote sensing derived vegetation metrics to metrics collected on the ground (in situ).

Learning Objectives

After completing these activities, you will be able to:

  • Use the thresholding approach to data fusion that masks and identify areas of a site with similar physical and other characteristics (e.g., the same slope, aspect, elevation, vegetation height).
  • Describe a statistical approach to data fusion.
Time Topic Instructor
8:00 Combining LiDAR & Hyperspectral Data to Advance Ecology Kyla Dahlin
9:00 Thresholding Kyla
10:15 Break
10:30 Data Fusion Group Coding Activity Kyla
12:00 Lunch
1:00 Data Fusion Group Coding Activity Kyla
3:30 Break
3:45 NEON facilities tour
5:15 Group Project Selection & Prep
6:00 End

Friday: Data Institute Capstone Projects

Today, you will use all of the skills you’ve learned at the Institute to work on a team project that uses NEON and/or related data!

Learning Objectives

During this activity, you will:

  • Apply the skills that you have learned to process data using efficient coding practices.
  • Expand upon your skills in working with remote sensing data through collaborative peer-learning.
  • Apply your understanding of remote sensing data and use it to address a science question of your choice
  • Implement version control and collaborate with your colleagues through the GitHub platform.

Throughout the day teams will be working on their capstone projects. Data Institute instructors and other NEON project scientists will be available to answer questions and assist as needed with the projects.

Time Topic Instructor
9:00 Breakout rooms open for teams to work on capstone project
12:00 Lunch
1:00 Breakout rooms open for teams to work
4:45 End of Day Wrap Up & Presentation Sign Ups
6:30 Participants must leave building for night

Additional Resources

If you are interested in creating a reveal.js presentation from R Markdown to present your capstone project, you can find out more information here: RStudio's reveal.js documentation.

Tutorial: NEON Data Institute Capstone Project

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Saturday: Data Institute Capstone Project Presentations

Time Topic Instructor
9:00 Presentations Start
12:00 Lunch
1:00 Final Questions & Institute Debrief
2:00 End

Communal Notetaking

During the Data Institute we will use Etherpad as a source for communal notetaking. 2016 Data Institute Etherpad

Add new comment

CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Find more workshops
Institute Overview
Workshop Materials
Dialog content.