NEON Data Institute 2018: Remote Sensing with Reproducible Workflows using Python

July 09, 2018 - July 14, 2018
National Ecological Observatory Network

Data Institute Overview

The 2018 Institute focuses on remote sensing of vegetation using open source tools and reproducible science workflows -- the primary programming language will be Python.

Through data intensive live-coding, short presentations, and small group work, we will cover topics including:

  • Background theoretical concepts related to LiDAR and hyperspectral remote sensing
  • Fundamental concepts required to ingest, visualize, process, and analyze NEON hyperspectral and LiDAR data.
  • Best practices on reproducible research workflows: the importance of documentation, organization, version control, and automation.
  • Scientific spatio-temporal applications of remote sensing data using open-source tools, namely Python and Jupyter Notebooks.
  • Machine learning for prediction of biophysical variables such as above-ground biomass using NEON LiDAR and ground measurements.
  • Classification of hyperspectral data using deep-learning approaches.
  • Using remote sensing data products with in situ data to quantify uncertainty associated with remote sensing observations.

This Institute will be held at the NEON project headquarters 9-14 July 2018. In addition to the six days of in-person training, there are three weeks of pre-institute materials is to ensure that everyone comes to the Institute ready to work in a collaborative research environment. Pre-institute materials are online & individually paced, expect to spend 1-5 hrs/week depending on your familiarity with the topic.

Schedule

Please note that slight changes may be made to the schedule of the Data Institute.

Time Day Description
-- 1 - 7 June Computer Setup Materials
-- 8 -14 June Intro to NEON & Reproducible Science
-- 15 - 21 June Version Control & Collaborative Science with Git & GitHub
-- 22 June - 5 July Documentation of Your Workflow with Jupyter Notebooks – due to 4 July holiday this will be a 2-week interval
-- 9 - 14 July Data Institute
7:50am - 6:30 pm Monday Working with HDF5 & Hyperspectral Remote Sensing
8:00am - 6:30pm Tuesday Reproducible & Automated Workflows, Working with LiDAR data
8:00am - 6:30pm Wednesday Remote Sensing Uncertainty
8:00am - 6:30pm Thursday Measuring Vegetation & Working at Scale with CyVerse
8:00am - 6:30pm Friday Hyperspectral Classification/Waveform Lidar & Applications in Remote Sensing
9:00am - 5:00pm Saturday Applications cont. & Presentations

Instructors

Dr. Tristan Goulden, Associate Scientist-Airborne Platform, Battelle NEON Project: Tristan is a Remote Sensing Scientist at NEON specializing in LiDAR. He also co-leads NEON’s Remote Sensing IPT (integrated product team) which focusses on developing algorithms and associated documentation for all of NEON’s remote sensing data products.  His past research focus has been on characterizing uncertainty in LiDAR observations/processing and propagating the uncertainty into downstream data products. During his PhD, he focused on developing uncertainty models for topographic attributes (elevation, slope, aspect), hydrological products such as watershed boundaries, stream networks, as well as stream flow and erosion at the watershed scale. His past experience in LiDAR has included all aspects of the LIDAR workflow including; mission planning, airborne operations, processing of raw data, and development of higher level data products. During his graduate research he applied these skills on LiDAR flights over several case study watersheds of study as well as some amazing LiDAR flights over the Canadian Rockies for monitoring change of alpine glaciers. His software experience for LiDAR processing includes Applanix’s POSPac MMS, Optech’s LMS software, Riegl’s LMS software, LAStools, Pulsetools,  TerraScan, QT Modeler, ArcGIS, QGIS, Surfer, and self-written scripts in Matlab for point-cloud, raster, and waveform processing.
 

Bridget Hass, Remote Sensing Data Processing Technician, Battelle NEON Project: Bridget’s daily work includes processing LiDAR and hyperspectral data collected by NEON's Aerial Observation Platform (AOP).  Prior to joining NEON, Bridget worked in marine geophysics as a shipboard technician and research assistant. She is excited to be a part of producing NEON's AOP data and to share techniques for working with this data during the 2018 Data Institute.

Dr. Naupaka Zimmerman, Assistant Professor of Biology, University of San Francisco: Naupaka’s research focuses on the microbial ecology of plant-fungal interactions. Naupaka brings to the course experience and enthusiasm for reproducible workflows developed after discovering how challenging it is to keep track of complex analyses in his own dissertation and postdoctoral work. As a co-founder of the International Network of Next-Generation Ecologists and an instructor and lesson maintainer for Software Carpentry and Data Carpentry, Naupaka is very interested in providing and improving training experiences in open science and reproducible research methods.

Dr. Tyson Swetnam, Science Informatician with CyVerse and Research Associate with Bio5 Institute at the University of Arizona: Tyson’s recent research focuses on geomorphology and biogeochemical cycling and involves collaborations with many groups including the University of Utah, the Agricultural Research Service Southwest Watershed Research Center, the Arizona Remote Sensing Center, and Santa Rita Experimental Range. With CyVerse, Tyson is working to deploy the Spatial Data Infrastructure (SDI) for life science and agricultural research. He also works closely with the NSF Critical Zone Observatory Network, OpenTopography, and XSEDE deploying scalable GIS applications running on CyVerse resources. In the past he has collaborated with numerous state and federal agencies on geospatial research projects and is always on the lookout for new collaborations in both the public and private sector.  You can follow Tyson on YouTube and Twitter!

Registration & Logistics

Registration for the Data Institute is now closed. Applications for future institutes open in January or February and are annouced in the Upcoming Events section. 

Read here for more information on the logistics of the Data Institute. 

This page includes all of the materials needed for the Data Institute including the pre-institute materials. Please use the sidebar menu to find the appropriate week or day.  If you have problems with any of the materials please email us or use the comments section at the bottom of the appropriate page.  

Please note that slight changes may be made to the schedule of the Data Institute.


Pre-Institute: Computer Set Up Materials 

It is important that you have your computer setup, prior to diving into the pre-institute materials in week 2 on 15 June, 2018! Please review the links below to setup the laptop you will be bringing to the Data Institute.

Let's Get Your Computer Setup!

Go to each of the following tutorials and complete the directions to set your computer up for the Data Institute.  On Thursday of the Data Institue, we will focus on high performance/cloud computing with Tyson Swetnam from CyVerse.  In order to have access to these materials, please also complete the necessary downloads/account creations in the section, Prerequisites -> Downloads, access, and services (only), found in Tyson's materials

Tutorial: Install Git, Bash Shell, Python

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Set up GitHub Working Directory - Quick Intro to Bash

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Install QGIS & HDF5View

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: Data Institute 2018: Download the Data

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre Week 1: Introduction to NEON & Reproducible Science

In the first week of the pre-institute activities, we will review the NEON project. We will also provide you with a general overview of reproducible science. Over the next few weeks will we ask you to review materials and submit something that demonstrates you have mastered the materials.

Learning Objectives

After completing these activities, you will be able to:

  • Describe the NEON project, the data collected, and where to access more information about the project. 
  • Know how to access other code resources for working with NEON data. 
  • Explain why reproducible workflows are useful and important in your research. 

Week 1 Assignment

After reviewing the materials below, please write up a summary of a project that you are interested working on at the Data Institute. Be sure to consider what data you will need (NEON or other). You will have time to refine your idea over the next few weeks. Save this document as you will submit it next week as a part of week 2 materials!

Deadline: Please complete this by Thursday June 14th @ 11:59 MDT.

Week 1 Materials 

Please carefully read and review the materials below:

NEON Introduction will be posted by 7 June

Tutorial: Introduction to the National Ecological Observatory Network (NEON)

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Tutorial: The Importance of Reproducible Science

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre Week 2: Version Control & Collaborative Science  

The goal of the pre-institute materials is to ensure that everyone comes to the Institute ready to work in a collaborative research environment. If you recall, from last week, the four facets of reproducibility are documentationorganizationautomation, and dissemination.

This week we will focus on learning to use tools to help us with these facets: Git and GitHub. The Git Hub environment supports both a collaborative approach to science through code sharing and dissemination, and a powerful version control system that supports both efficient project organization, and an effective way to save your work.

Learning Objectives

After completing these activities, you will be able to:

  • Summarize the key components of a version control system
  • Know how to setup a GitHub account
  • Know how to setup Git locally 
  • Work in a collaborative workflow on GitHub

Week 2 Assignment

The assignment for this week is to revise the Data Institute capstone project summary that you developed last week. You will submit your project summary, with a brief biography to introduce yourself, to a shared GitHub repository.

Please complete this assignment by Thursday June 21st @ 11:59 PM MDT.

If you are familiar with forked repos and pull requests GitHub, and the use of Git in the command line, you may be able to complete the assignment without completing all tutorials in the series.

 

Tutorial: Assignment: Version Control with GitHub

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.

Series: Version Control with GitHub

You can choose to do this optional data activity (opens in a new window), or continue to the next lesson plan.


Pre-Institute Week 3: Documentation of Your Workflow 

In week 3, you will use Jupyter Notebooks (formerly iPython Notebooks) to document code and efficiently publish code results & outputs. You will practice your Git skills by publishing your work in the NEONScience/DI-NEON-participants GitHub repository.  While working with the notebooks, you will also learn about NEON RGB imagery. 

In addition, you will watch a video that provides an overview of the NEON Vegitation Indices that are available as data products in preparation for Monday's materials. 

Learning Objectives

After completing these activities, you will be able to:

  • Use Jupyter Notebooks to create code with formatted context text
  • Describe the value of documented workflows
  • Plot a NEON RGB Camera Tile
  • Plot a Histogram of a Single Band of an RGB Camera Tile

NEON RGB Camera Imagery

To prepare for the introduction to working with NEON's RGB camera imagery data in this week's lessons. Please watch this 18 minute long video by NEON project scientist Bill Gallery about these data. 

Jupyter Notebooks 

Please go through the following series to learn to work with Jupyter Notebooks. 

Document Your Code with Jupyter Notebooks Tutorial Series

Assignment

The assignment this week is to create a Jupyter Notebook that show's how to plot NEON RGB camera data.  Use the following tutorial (along with the previous series on Notebooks) to create this Notebook. Once done you will convert your Notebook to a PDF for easy dissemination of your results. If you have challenges converting to PDF, try to convert to html.  

When done, submit your Notebook and PDF/html files to the GitHub participants/2018-RemoteSensing/pre-institute3-Jupyter directory. Review the week 2 materials if you have would like a refresher on GitHub and the commands associated with adding, committing, pushing, and making a pull request.

The week 3 assignment is due at 11:59pm on 5 July 2018. 

Plotting a NEON RGB Camera Image (GeoTIFF) in Python

Trouble starting Python? In your Command Prompt/Terminal window, activate the your desired Python version. For the 2018 Data Institute, we will use Python 3.5 and in the set up created this environment as "p35".  

Windows:
activate p35
Mac:
source activate p35
 
Then start the Jupyter Notebook session. 
jupyter notebook
 
You should now be able to navigate to where you want to save your new Jupyter Notebook for the tutorial. This tutorial is set up to use your Python 3.5 kernel.  Directions for setting up this kernel were in the Introduction to Using Jupyter Notebooks tutorial.
 

Remote Sensing Indicies

On Monday of the Data Institute, we will work with hyperspectral remote sensing data.  In the afternoon, we will work in groups to create scripts to calculate various indices from the NEON hyperspectral data.  To prepare for this section, you may want to watch the following presentation (29 minutes) by NEON project scientist David Hulslander on the remote sensing indices and those calculated as part of NEON Data Products. 


Monday: HDF5 & Hyperspectral Data

Learning Objectives

After completing these activities, you will be able to:

  • Open and work with raster data stored in HDF5 format in Python
  • Explain the key components of the HDF5 data structure (groups, datasets and attributes)
  • Open and use attribute data (metadata) from an HDF5 file in Python

All activities are held in the the Classroom unless otherwise noted.

Time Topic Instructor/Location
7:45 Arrive at NEON to be ready for start at 8:00
8:00 Welcome & Introductions
8:45 NEON AOP Logistics Tristan Goulden
9:15 BREAK
9:30 NEON Tour
11:00 SHORT BREAK
11:05 Fundamentals of Hyperspectral Remote Sensing & HDF5 format (related video) David Hulslander
11:30 Explore NEON HDF5 format with Viewer Tristan Goulden
12:00 LUNCH Classroom/Patio
13:00 Work with Hyperspectral Remote Sensing data & HDF5 Bridget Hass
NEON AOP Hyperspectral Data in HDF5 format with Python - Tiled Data
Band Stacking, RGB & False Color Images, and Interactive Widgets in Python - Tiled Data
Plot a Spectral Signature in Python - Tiled Data
Calculate NDVI & Extract Spectra Using Masks in Python - Tiled Data
15:15 BREAK
15:30 Calculate Other Indices; Small Group Coding Megan Jones
16:30 Reproducible Workflows, part I (associated GitHub repo) Naupaka Zimmerman
17:30 End of Day Wrap Up Megan Jones

Additional Information

  • In the morning, we will be touring the NEON facilities including several labs. Please wear long pants and close-toed shoes to conform to lab safety standards.
  • Many individuals find the temperature of the classroom where the Data Institute is held to be cooler than they prefer. We recommend you bring a sweater or light jacket with you.
  • You will have the opportunity to eat your lunch on an outdoor patio - hats, sunscreen, and sunglasses may be appreciated.

Additional Resources

  • Participants looking for more background on the HDF5 format may find this tutorial useful: Hierarchical Data Formats - What is HDF5? tutorial
  • During the 2016 Data Institute, Dr. Dave Schimel gave a presentation on the importance of "Big Data, Open Data, and Biodiversity" and is very much related to the themes of this Data Institute. If interested, you can watch the video here.

Tuesday: Reproducible Workflows & Lidar

In the morning, we will focus on data workflows, organization and automation as a means to write more efficient, usable code. Later, we will review the basics of discrete return and full waveform lidar data. We will then work with some NEON lidar derived raster data products.

Learning Objectives

After completing these activities, you will be able to:

  • Explain the difference between active and passive sensors.
  • Explain the difference between discrete return and full waveform LiDAR.
  • Describe applications of LiDAR remote sensing data in the natural sciences.
  • Describe several NEON LiDAR remote sensing data products.
  • Explain why modularization is important and supports efficient coding practices.
  • How to modularize code using functions.
  • Integrate basic automation into your existing data workflow.
Time Topic Instructor/Location
8:00 Reproducible Workflows, part II (associated GitHub repo) Naupaka Zimmerman
10:00 BREAK
10:15 Reproducible Workflows, cont. Naupaka Zimmerman
12:00 LUNCH Classroom/Patio
13:00 An Introduction to Discrete Lidar (related video) Tristan Goulden
13:20 An Introduction to Waveform Lidar (related video) Keith Krause
13:40 Rasters & TIFF tags Tristan Goulden
14:00 Working with Lidar Data Bridget Hass
Classify a Raster using Threshold Values
Merge GeoTIFF Files
Extra: Mask a Raster Using Threshold Values in Python
Extra: Create a Hillshade from a Terrain Raster in Python
15:00 BREAK
15:00 Lidar, cont.
16:30 Lidar Small Group Coding Activity Tristan & Bridget
17:30 End of Day Wrap Up Megan Jones

Wednesday: Uncertainty

Today, we will focus on the importance of uncertainty when using remote sensing data.

Learning Objectives

After completing these activities, you will be able to:

  • Measure the differences between a metric derived from remote sensing data and the same metric derived from data collected on the ground.
Time Topic Instructor/Location
8:00 Uncertainty & Lidar Data Presentation (related video) Tristan Gouldan
8:30 Exploring Uncertainty in LiDAR Data Tristan Goulden
10:30 BREAK
10:45 Lidar Uncertainty cont. Tristan Goulden
12:00 LUNCH Classroom/Patio
13:00 Spectral Calibration & Uncertainty Presentation (video)
13:30 Hyperspectral Variation Uncertainty Analysis in Python Tristan Goulden
Assessing Spectrometer Accuracy using Validation Tarps with Python Tristan Goulden
15:00 BREAK
15:15 Hyperspectral Validation Tristan Goulden
16:15 Small group coding activity
17:30 End of Day Wrap Up Megan Jones

Thursday: Vegetation & HPC

On Thursday, we will begin to think about the different types of analysis that we can do by fusing LiDAR and hyperspectral data products.

Learning Objectives

After completing these activities, you will be able to:

  • Classify different spectra from a hyperspectral data product
  • Map the crown of trees from hyperspectral & lidar data
  • Calculate biomass of vegetation
Time Topic Instructor/Location
8:00 Vertical scaling of remote sensing at NEON core sites: from handheld cameras to EOS Tyson Swetnam
9:15 Phenocam Network & Data Bijan Seyednasrollah
9:40 NEON Vegetation Structure Data Natalie Robinson
10:15 BREAK
10:30 Biomass Calc. & Tree crown mapping Tristan Goulden
12:00 LUNCH Classroom/Patio
13:00 Scaling Up: Using HPC/CyVerse platforms Tyson Swetnam
15:00 BREAK
15:15 Scaling Up, cont. Tyson Swetnam
17:45 Capstone Brainstorm & Group Selection Megan Jones

Additional Resources

This year we will primarily focus on NEON Vegetation Structure data, however, NEON offers a wide variety of data on vegetation. For an overview of the observational sampling of vegetation data watch these videos:


Friday: Applications in Remote Sensing

Today, we will break into two group in the morning to work with waveform lidar or hyperspectral classification. Then in the afternoon, you will use all of the skills you've learned at the Institute to work on a group project that uses NEON or related data!

Learning Objectives

During this activity you will:

  • Apply the skills that you have learned to process data using efficient coding practices.
  • Apply your understanding of remote sensing data and use it to address a science question of your choice.
  • Implement version control and collaborate with your colleagues through the GitHub platform.
Time Topic Location
8:00 Concurrent sessions
Working with waveform lidar Tristan Goulden - Longs Peak
Hyperspectral Classification Bridget Hass - Classroom
Unsupervised Spectral Classification in Python: KMeans & PCA
Unsupervised Spectral Classification in Python: Endmember Extraction
10:30 BREAK
10:45 Accessing NEON data Megan Jones
12:00 LUNCH Classroom/Patio
13:00 Groups begin work on capstone projects Breakout rooms
Instructors available on an as needed basis for consultation & help
16:30 End of day wrap up Classroom
18:00 Time to leave the building (if group opts to work after wrap up)

Additional Resources

For additional information on Classification of Hyperspectral Data with a variety of methods, please refer to the content taught by Dr. Paul Gader during the 2017 Data Institute.


Saturday: Applications in Remote Sensing

Time Topic Instructor
9:00 Groups continue on capstone projects Breakout rooms
Instructors available on an as needed basis for consultation & help
12:00 LUNCH Classroom/Patio
13:00 Presentations Start Classroom
16:30 Final Questions & Institute Debrief Classroom
17:00 End

Comments

Will a similar course using R be offered in the future?

Yes, we do intend to offer a similar course in R in future years. We do offer materials from a previous R based Remote Sensing with Reproducible Workflows Data Institute ( http://www.neonscience.org/neon-data-institute-2016 ). Please note that the materials using data in an HDF5 format are using a previous NEON format. We are in the process of updating these materials to the current HDF5 format. Wade, thanks for your interest.

Thanks for the reply. I found the NEON data skills (raster data with R ) very useful. A workshop would be great.

Leave a comment

CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Find more workshops
Institute Overview
Workshop Materials
Dialog content.