NEON Data Institute 2017: Remote Sensing with Reproducible Workflows using Python

June 19, 2017 - June 24, 2017
NEON

NEON Data Institutes provide critical skills and foundational knowledge for graduate students and early career scientists working with heterogeneous spatio-temporal data to address ecological questions.

Data Institute Overview

The 2017 Institute focuses on remote sensing of vegetation using open source tools and reproducible science workflows -- the primary programming language will be Python.

This Institute will be held at NEON headquarters in June 2017. In addition to the six days of in-person training, there are three weeks of pre-institute materials to ensure that everyone comes to the Institute ready to work in a collaborative research environment. Pre-institute materials are online and individually paced; expect to spend 1-5 hrs/week depending on your familiarity with the topic.

Schedule

Time Day Description
-- Computer Setup Materials
-- 25 May - 1 June Intro to NEON & Reproducible Science
-- 2-8 June Version Control & Collaborative Science with Git & GitHub
-- 9-15 June Documentation of Your Workflow with iPython/Jupyter Notebooks
-- 19-24 June Data Institute
7:50am - 6:30 pm Monday Intro to NEON, Intro to HDF5 & Hyperspectral Remote Sensing
8:00am - 6:30pm Tuesday Reproducible & Automated Workflows, Intro to LiDAR data
8:00am - 6:30pm Wednesday Remote Sensing Uncertainty
8:00am - 6:30pm Thursday Hyperspectral Data & Vegetation
8:00am - 6:30pm Friday Individual/Group Applications
9:00am - 1:00pm Saturday Group Application Presentations

Key 2017 Dates

  • Applications Open: 17 January 2017
  • Application Deadline: 10 March 2017
  • Notification of Acceptance: late March 2017
  • Tuition payment due by: mid April 2017
  • Pre-institute online activities: June 1-17, 2017
  • Institute Dates: June 19-24, 2017

Instructors

Dr. Tristan Goulden, Associate Scientist-Airborne Platform, Battelle-NEON: Tristan is a remote sensing scientist with NEON specializing in LiDAR. He also co-leads NEON’s Remote Sensing IPT (integrated product team), which focuses on developing algorithms and associated documentation for all of NEON’s remote sensing data products. His past research focus has been on characterizing uncertainty in LiDAR observations/processing and propagating the uncertainty into downstream data products. During his PhD, he focused on developing uncertainty models for topographic attributes (elevation, slope, aspect), hydrological products such as watershed boundaries and stream networks, as well as stream flow and erosion at the watershed scale. His past experience in LiDAR has included all aspects of the LiDAR workflow, including mission planning, airborne operations, processing of raw data, and development of higher level data products. During his graduate research he applied these skills on LiDAR flights over several case study watersheds as well as some amazing LiDAR flights over the Canadian Rockies for monitoring change of alpine glaciers. His software experience for LiDAR processing includes Applanix’s POSPac MMS, Optech’s LMS software, Riegl’s LMS software, LAStools, Pulsetools, TerraScan, QT Modeler, ArcGIS, QGIS, Surfer, and self-written scripts in Matlab for point-cloud, raster, and waveform processing.

Bridget Hass, Remote Sensing Data Processing Technician, Battelle-NEON: Bridget’s daily work includes processing LiDAR and hyperspectral data collected by NEON's Aerial Observation Platform (AOP). Prior to joining NEON, Bridget worked in marine geophysics as a shipboard technician and research assistant. She is excited to be a part of producing NEON's AOP data and to share techniques for working with this data during the 2017 Data Institute.

Dr. Naupaka Zimmerman, Assistant Professor of Biology, University of San Francisco: Naupaka’s research focuses on the microbial ecology of plant-fungal interactions. Naupaka brings to the course experience and enthusiasm for reproducible workflows developed after discovering how challenging it is to keep track of complex analyses in his own dissertation and postdoctoral work. As a co-founder of the International Network of Next-Generation Ecologists and an instructor and lesson maintainer for Software Carpentry and Data Carpentry, Naupaka is very interested in providing and improving training experiences in open science and reproducible research methods.

Dr. Paul Gader, Professor, University of Florida: Paul is a Professor of Computer & Information Science & Engineering (CISE) at the Engineering School of Sustainable Infrastructure and the Environment (ESSIE) at the University of Florida (UF). Paul received his Ph.D. in Mathematics for parallel image processing and applied mathematics research in 1986 from UF, spent 5 years in industry, and has been teaching at various universities since 1991. His first research in image processing, in 1984, focused on algorithms for detection of bridges in Forward Looking Infra-Red (FLIR) imagery. He has investigated algorithms for land mine research since 1996, leading a team that produced new algorithms and real-time software for a sensor system currently operational in Afghanistan. His landmine detection projects involve algorithm development for data generated from hand-held, vehicle-based, and airborne sensors, including ground penetrating radar, acoustic/seismic, broadband IR (emissive and reflective bands), emissive and reflective hyperspectral imagery, and wide-band electro-magnetic sensors. In the past few years, he has focused on algorithms for imaging spectroscopy. He is currently researching nonlinear unmixing for object and material detection, classification and segmentation, and estimating plant traits. He has given tutorials on nonlinear unmixing at international conferences. He is a Fellow of the Institute of Electrical and Electronics Engineers, an Endowed Professor at the University of Florida, was selected for a 3-year term as a UF Research Foundation Professor, and has over 100 refereed journal articles and over 300 conference articles.

Registration & Logistics

For information on how to register, please see the event registration page.

Read here for more information on the logistics of the Data Institute. 

This page includes all of the materials needed for the Data Institute including the pre-institute materials. Please use the sidebar menu to find the appropriate week or day.  If you have problems with any of the materials please email us or use the comments section at the bottom of the appropriate page.  


Pre-Institute: Computer Set Up Materials 

It is important that you have your computer set up before diving into the pre-institute materials in week 2! Please review the links below to set up the laptop you will be bringing to the Data Institute.

Let's Get Your Computer Set Up!

Go to each of the following tutorials and complete the directions to set your computer up for the Data Institute. 

Tutorial: Install Git, Bash Shell, Python

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Set up GitHub Working Directory - Quick Intro to Bash


Tutorial: Install QGIS & HDF5View


Tutorial: Data Institute 2017: Download the Data



Pre-Institute Week 1: Introduction to NEON & Reproducible Science

In the first week of the pre-institute activities, we will review the NEON project. We will also provide you with a general overview of reproducible science. Over the next few weeks, we will ask you to review materials and submit something that demonstrates you have mastered the materials.

Learning Objectives

After completing these activities, you will be able to:

  • Explain sources of uncertainty in remote sensing data.
  • Measure the differences between a metric derived from remote sensing data and the same metric derived from data collected on the ground.

Week 1 Assignment

After reviewing the materials below, please write up a summary of a project that you are interested in working on at the Data Institute. Be sure to consider what data you will need (NEON or other). You will have time to refine your idea over the next few weeks. Save this document, as you will submit it next week as a part of the week 2 materials!

Deadline: Please complete this by Thursday June 1st @ 11:59 PM MDT.

Week 1 Materials 

Please carefully read and review the materials below:

Tutorial: Introduction to the National Ecological Observatory Network (NEON)


Tutorial: The Importance of Reproducible Science



Pre-Institute Week 2: Version Control & Collaborative Science with Git & GitHub

The goal of the pre-institute materials is to ensure that everyone comes to the Institute ready to work in a collaborative research environment. If you recall from last week, the four facets of reproducibility are documentation, organization, automation, and dissemination.

This week we will focus on learning to use tools to help us with these facets: Git and GitHub. The GitHub environment supports both a collaborative approach to science, through code sharing and dissemination, and a powerful version control system that supports efficient project organization and an effective way to save your work.

Learning Objectives

After completing these activities, you will be able to:

  • Summarize the key components of a version control system
  • Know how to set up a GitHub account
  • Know how to set up Git locally
  • Work in a collaborative workflow on GitHub

Week 2 Assignment

The assignment for this week is to revise the Data Institute capstone project summary that you developed last week. You will submit your project summary, with a brief biography to introduce yourself, to a shared GitHub repository.

Please complete this assignment by Thursday June 8th @ 11:59 PM MDT.

If you are familiar with forked repos and pull requests on GitHub, and the use of Git in the command line, you may be able to complete the assignment without viewing the tutorials.

 

Tutorial: Assignment: Version Control with GitHub


Week 2 Materials 

Please complete each of the short tutorials in this series. 

Series: Version Control with GitHub



Pre-Institute Week 3: Documentation of Your Workflow with iPython/Jupyter Notebooks

In week 3, you will use Jupyter Notebooks (formerly iPython Notebooks) to document code and efficiently publish code results & outputs. You will practice your Git skills by publishing your work in the NEON-WorkWithData/DI-NEON-participants GitHub repository.

In addition, in preparation for Monday's materials, you will watch a video that provides an overview of the NEON Vegetation Indices that are available as data products.

Learning Objectives

After completing these activities, you will be able to:

  • Use Jupyter Notebooks to create code with formatted context text
  • Describe the value of documented workflows

Week 3 Assignment, Part 1

Please complete the activity and submit your work to the GitHub repo by Thursday June 17th at 11:59 PM MDT.

If you are familiar with using Jupyter Notebooks to document your workflow and knitting to HTML then you may be able to complete the assignment without viewing the tutorials.

Tutorial: Assignment: Reproducible Workflows with Jupyter Notebooks


Week 3 Materials 

Please complete each of the short tutorials in this series. 

Series: Document Your Code with Jupyter Notebooks


Week 3 Assignment, Part 2

On the first day of the course, we will be working with hyperspectral data. Various indices, including the Normalized Difference Vegetation Index (NDVI), are common data products from hyperspectral data. In preparation for this content, please watch this video of David Hulslander discussing NEON remote sensing vegetation indices & data products.
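As a rough illustration of what such an index involves, NDVI is commonly computed per pixel as (NIR - Red) / (NIR + Red). The sketch below assumes two small reflectance arrays invented for illustration; the function name `ndvi` and the sample values are hypothetical and are not taken from any NEON data product.

```python
import numpy as np


def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red); pixels where NIR + Red == 0 become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Pre-fill with NaN so that pixels skipped by `where` stay undefined.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Hypothetical 2 x 2 reflectance values (unitless, 0 to 1), for illustration only.
nir_band = np.array([[0.50, 0.40], [0.30, 0.0]])
red_band = np.array([[0.10, 0.20], [0.30, 0.0]])

index = ndvi(nir_band, red_band)
print(index)
```

Values near 1 suggest dense green vegetation, while values near 0 or below suggest bare ground, water, or other non-vegetated surfaces.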


Monday: Intro to NEON, HDF5, & Hyperspectral Data

Learning Objectives

After completing these activities, you will be able to:

  • Open and work with raster data stored in HDF5 format in Python
  • Explain the key components of the HDF5 data structure (groups, datasets and attributes)
  • Open and use attribute data (metadata) from an HDF5 file in Python

Morning: Intro to NEON & HDF5

All activities are held in the Classroom unless otherwise noted.

Time Topic Instructor/Location
8:00 Welcome & Introductions
8:30 Introduction to the National Ecological Observatory Network Megan Jones
9:00 Introduction to NEON AOP (download presentation) Nathan Leisso
9:30 NEON RGB Imagery (download presentation) Bill Gallery
10:00 Introduction to the HDF5 File Format (download PDF) Ted Habermann
10:30 BREAK
10:45 NEON Tour
12:00 LUNCH Classroom/Patio

Afternoon: Hyperspectral Remote Sensing

Time Topic Instructor/Location
13:00 An Introduction to Hyperspectral Remote Sensing (related video) Tristan Goulden
13:20 Work with Hyperspectral Remote Sensing data & HDF5
Explore NEON HDF5 format with Viewer Tristan
NEON AOP Hyperspectral Data in HDF5 format with Python Bridget Hass
Band Stacking, RGB & False Color Images, and Interactive Widgets in Python Bridget
15:00 BREAK
Plot Spectral Signatures Bridget
Calculate NDVI Bridget
Calculate Other Indices; Small Group Coding Megan
17:30 GitHub Workflow Naupaka Zimmerman
18:00 End of Day Wrap Up Megan

Additional Information

This morning, we will be touring the NEON facilities including several labs. Please wear long pants and close-toed shoes to conform to lab safety standards. Many individuals find the temperature of the classroom where the Data Institute is held to be cooler than they prefer. We recommend you bring a sweater or light jacket with you. You will have the opportunity to eat your lunch on an outdoor patio - hats, sunscreen, and sunglasses may be appreciated.

Additional Resources

Participants looking for more background on the HDF5 format may find these tutorials useful.

During the 2016 Data Institute, Dr. Dave Schimel gave a presentation on the importance of "Big Data, Open Data, and Biodiversity", which is closely related to the themes of this Data Institute. If interested, you can watch the video here.


Tuesday: Lidar Data & Reproducible Workflows

In the morning, we will focus on data workflows, organization and automation as a means to write more efficient, usable code. Later, we will review the basics of discrete return and full waveform lidar data. We will then work with some NEON lidar derived raster data products.

Learning Objectives

After completing these activities, you will be able to:

  • Explain the difference between active and passive sensors.
  • Explain the difference between discrete return and full waveform LiDAR.
  • Describe applications of LiDAR remote sensing data in the natural sciences.
  • Describe several NEON LiDAR remote sensing data products.
  • Explain why modularization is important and supports efficient coding practices.
  • Modularize code using functions.
  • Integrate basic automation into your existing data workflow.
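To make the modularization and automation objectives concrete, here is a minimal sketch that splits a small workflow into single-purpose functions and then applies them to several inputs in one call. The function names, the scale factor of 10000, the no-data value of -9999, and the site names are all assumptions made for illustration; they do not describe the course materials or any particular NEON product.

```python
def scale_reflectance(raw_values, scale_factor=10000, no_data=-9999):
    """Convert raw integer values to reflectance; no-data values become None."""
    return [None if v == no_data else v / scale_factor for v in raw_values]


def mean_valid(values):
    """Return the mean of the non-missing values, or None if there are none."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def summarize(raw_by_site):
    """Automate the same two steps for every site in a dictionary."""
    return {
        site: mean_valid(scale_reflectance(raw))
        for site, raw in raw_by_site.items()
    }


# Hypothetical raw values keyed by made-up site names.
raw_by_site = {"site_a": [1000, 3000, -9999], "site_b": [-9999]}
summary = summarize(raw_by_site)
print(summary)
```

Because each step lives in its own function, it can be tested and reused on its own, and adding another site only means adding another dictionary entry.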

Morning: Reproducible Workflows

Time Topic Instructor/Location
8:00 Automate & Modularize Workflows Naupaka Zimmerman
10:30 BREAK
10:45 Automate & Modularize Workflows, cont.
12:00 LUNCH Classroom/Patio

Afternoon: Lidar

Time Topic Instructor/Location
13:00 An Introduction to Discrete Lidar (video) Tristan Goulden
An Introduction to Waveform Lidar (related video) Keith Krause
OpenTopography as a Data Source (download PDF) Benjamin Gross
14:00 Rasters & TIFF tags Tristan
14:15 Classify a Raster using Threshold Values Bridget
Mask a Raster using Threshold Values Bridget
Create a Hillshade from a Terrain Raster in Python Bridget
15:00 BREAK
15:15 Lidar Small Group Coding Activity Tristan & Bridget
18:00 End of Day Wrap Up Megan Jones
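
The raster classification and masking activities listed above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the small canopy height array, the 2 m and 10 m thresholds, and the class numbering are invented here and are not taken from the tutorials.

```python
import numpy as np

# Hypothetical canopy height values in meters, standing in for a lidar-derived raster.
chm = np.array([[0.0, 1.5, 4.0],
                [8.0, 12.0, 25.0]])

# Assumed class breaks: below 2 m, 2 m up to (not including) 10 m, 10 m and above.
bin_edges = [2.0, 10.0]

# np.digitize returns the index of the bin each value falls in (0, 1, or 2 here);
# adding 1 gives class labels 1 to 3.
classes = np.digitize(chm, bin_edges) + 1

# Mask: keep heights of at least 2 m and set everything else to NaN.
masked = np.where(chm >= 2.0, chm, np.nan)

print(classes)
print(masked)
```

The same pattern extends to real rasters once the array has been read from a GeoTIFF or HDF5 file with a library of your choice.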

Wednesday: Comparing Ground to Airborne – Uncertainty

Today, we will focus on the importance of uncertainty when using remote sensing data.

Learning Objectives

After completing these activities, you will be able to:

  • Measure the differences between a metric derived from remote sensing data and the same metric derived from data collected on the ground.

Time Topic Instructor/Location
8:00 Uncertainty & Lidar Data Presentation (video) Tristan Goulden
8:40 Exploring Uncertainty in LiDAR Data Tristan
10:30 BREAK
10:45 Lidar Uncertainty cont. Tristan
12:00 LUNCH Classroom/Patio
13:00 Spectral Calibration & Uncertainty Presentation (video) Nathan Leisso
13:30 Hyperspectral Variation Uncertainty Analysis in Python Tristan
Assessing Spectrometer Accuracy using Validation Tarps with Python Tristan
15:00 BREAK
15:50 Uncertainty in BRDF Flight Data Products at Three Locations presentation Amanda Roberts
Hyperspectral Uncertainty cont. Tristan
18:00 End of Day Wrap Up Megan Jones
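
One simple way to measure the difference between a ground-based metric and its remote sensing counterpart is to compute the mean difference (bias) and the root mean square error (RMSE) over paired observations. The sketch below assumes paired tree heights that are invented for illustration; the function name `difference_stats` is hypothetical, and this is not necessarily the method used in the day's activities.

```python
import math


def difference_stats(ground, airborne):
    """Return (bias, rmse) of airborne minus ground for paired measurements."""
    if not ground or len(ground) != len(airborne):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - g for g, a in zip(ground, airborne)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Hypothetical paired tree heights in meters (ground survey vs. lidar-derived).
ground_heights = [10.0, 12.0, 15.0, 20.0]
lidar_heights = [9.0, 12.0, 14.0, 22.0]

bias, rmse = difference_stats(ground_heights, lidar_heights)
print(f"bias = {bias:.2f} m, RMSE = {rmse:.2f} m")
```

Note that a bias near zero does not imply small errors: positive and negative differences can cancel, which is why RMSE is reported alongside it.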

Thursday: Combining External Sensors & Applications

On Thursday, we will begin to think about the different types of analysis that we can do by fusing LiDAR and hyperspectral data products.

Learning Objectives

After completing these activities, you will be able to:

  • Classify different spectra from a hyperspectral data product
  • Map the crown of trees from hyperspectral & lidar data
  • Calculate biomass of vegetation

Time Topic Instructor/Location
8:00 Applications of Remote Sensing Paul Gader
9:00 NEON Vegetation Data (related video, download PDF ) Katie Jones
NEON Foliar Chemistry & Soil Chemistry Data and Microbial Data (video) Samantha Weintraub
9:40 Classification of Spectra Paul
Classification of Hyperspectral Data with Ordinary Least Squares in Python (download PDF ) Paul
Classification of Hyperspectral Data with Principal Components Analysis in Python (download PDF ) Paul
Classification of Hyperspectral Data with SciKit & SVM in Python (download PDF ) Paul
10:30 BREAK
10:45 Classification of Spectra, cont. Paul
12:00 LUNCH Classroom/Patio
13:00 Tree Crown Mapping Paul
15:00 BREAK
15:15 Biomass Calculations Tristan Goulden
17:30 Capstone Brainstorm & Group Selection Megan Jones

Friday: Applications in Remote Sensing

Today, you will use all of the skills you've learned at the Institute to work on a group project that uses NEON or related data!

Learning Objectives

During this activity you will:

  • Apply the skills that you have learned to process data using efficient coding practices.
  • Apply your understanding of remote sensing data and use it to address a science question of your choice.
  • Implement version control and collaborate with your colleagues through the GitHub platform.

Time Topic Location
9:00 Groups begin work on capstone projects Breakout rooms
Instructors available on an as needed basis for consultation & help
12:00 Lunch Classroom/Patio
Groups continue to work on capstone projects Breakout rooms
16:30 End of day wrap up Classroom
18:00 Time to leave the building (if group opts to work after wrap up)

Additional Resources


Saturday: Data Institute Capstone Project Presentations

Time Topic Instructor
9:00 Presentations Start
11:30 Final Questions & Institute Debrief
12:00 Lunch
13:00 End
