NSF Biocenters unite to close the scientific data skills gap, with a focus on phenology

November 13, 2015

In October, a group of talented education, bioinformatics and science experts from across the U.S. gathered to support the first data lesson hackathon hosted by NEON, in collaboration with Data Carpentry, the National Socio-Environmental Synthesis Center (SESYNC) and the iPlant Collaborative/CyVerse. These NSF-funded groups came together to build a set of workshop materials that teach reproducible scientific workflows and methods using NEON data, with the overarching goal of supporting open science.

2015 Geospatial hackathon: focus on phenology

Hackathon participants were tasked with creating data lessons that support the development of essential coding and data management skills. The science focus of the geospatial hackathon was phenology. Supervising Scientist Leah Wasser, the event’s main organizer and facilitator, kicked it off by pulling together a unique dataset that could answer questions about plant phenology, the study of the year-to-year timing of plant life cycle events such as leaf flush (leaves emerging from leaf buds) in the spring and leaf senescence (leaf color change) in the fall. The hackathon dataset combined a Landsat satellite-derived NDVI time series (a greenness index); hyperspectral, LiDAR and imagery data collected by the NEON AOP; and micrometeorology flux tower data (temperature, precipitation and photosynthetically active radiation, or PAR). It covered D01’s Harvard Forest, with data from D17’s San Joaquin Experimental Range for comparison. Leah states, “The idea is to provide a foundation to help people understand how to work with NEON data through reproducible open science workflows in order to support asking broader-scale ecological questions using spatio-temporal data.”

It’s all about Jessica, key user of pheno data

If you heard people around the hackathon talking about every science-related desire of “Jessica,” it was because the hackathon participants took their assigned persona to heart. Personas are fictional characters based on observed or known behavior patterns. When creating education modules, or digital applications in general, developers use a persona to guide the creative process: this tactic helps developers adopt the perspective of a user, thinking through what that user might look for, the questions they might ask and the outcomes or products they are seeking. For this hackathon, the persona, Jessica, was a first-year graduate student.

During the hackathon, the group determined that “Jessica” was interested in exploring field site data to understand how the phenology of vegetation (greening in the spring and summer, and senescence in the fall) varies across multiple sites and through time. Learn more about the hackathon.

Evidence of a growing interest in collaboration across the science community

Hackathon participant Marisa Guarinello, from the Northwest Knowledge Network at the University of Idaho, loves teaching; while she is new to R, she became interested in building data skills after attending a workshop hosted by Data Carpentry. Hackathon participant Courtney Soderberg, from the Center for Open Science, is interested in the methodological components of building data skills, as well as efforts to incentivize open and reproducible science. Both participants acknowledged that the drive to make the scientific research process, methods and data more transparent and reproducible is increasing: journals are requiring open data, open process, open code and open methodology prior to publication, rather than only providing metadata standards. The interest in, and need for, collaboration among scientists are growing as well.

NSF Biocenters unite to address data skills gap

The NSF Biocenters worked together to create, drive and host this hackathon. Mike Smorul at SESYNC says, “Biocenters all have the same challenge of training users on using data. How do we get people out of Excel?” In addition, many scientists are accustomed to working with spatial data in a Graphical User Interface (GUI) environment, which scales poorly as datasets get larger. As a result, scientists need to learn to code in order to handle large spatial datasets. Some overall benefits of hackathons include:

  • Lessons are crowdsourced
  • Post-hackathon data are quality controlled, sized for accessibility, and prepared for use with the learning objectives
  • Free, public data are hosted on open repositories like Figshare as a valuable and unique resource for the public and educators
  • New users get help getting started, and experienced users can ask bigger questions

The lessons are multi-purpose: they will be taught through Data Carpentry workshops around the world and through NEON workshops, and they will be hosted on the NEON portal in a self-paced format.

Learn more

The data lessons are currently being refined. When complete, all data lessons will be available on both the DataCarpentry.org and NEONdataskills.org websites. In the meantime, draft lessons are publicly available in a shared GitHub repository. Learn more about the lessons under development on the NEON Data Skills portal.