NSF NEON, Operated by Battelle

Workshop

NEON Brownbag: Intro to HDF5 at NEON

NEON

June 4, 2015

This NEON internal brownbag introduces the concept of Hierarchical Data Formats in the context of developing the NEON HDF5 operational file format. Here you will find resources on HDF5, code snippets in R, Python and Matlab for working with H5 files, and example H5 files containing remote sensing hyperspectral data and time series temperature data.

Things to do before the workshop

Please review, download and set up the following prior to attending the brownbag.

Data to Download

The example datasets are listed under Data to Download in the HDFView section later on this page.

Download the Free H5 Viewer

The free HDFView application allows you to explore HDF5 data using a graphical interface. HDFView can be downloaded from this page. More details on how to set up HDFView are under Additional Set Up Instructions below.

Background Information

  • What is HDF5? A general overview.

Schedule

Time   Topic
12:00  Hands-on exploration of the HDF5 Data Format
12:20  Working with HDF5 in Python - live demo.
12:30  NEON HDF5 Format - what's next

Instructors

  • David Hulslander
  • Josh Elliot
  • Leah A. Wasser
  • Tristan Goulden

Additional Set Up Instructions

Install HDFView

The free HDFView application allows you to explore the contents of an HDF5 file.

To install HDFView:

  1. Go to the HDFView download page.

  2. From the section titled HDF-Java 2.1x Pre-Built Binary Distributions select the HDFView download option that matches the operating system and computer setup (32 bit vs 64 bit) that you have. The download will start automatically.

  3. Open the downloaded file.

     • Mac - You may want to add the HDFView application to your Applications directory.
     • Windows - Unzip the file, open the folder, run the .exe file, and follow directions to complete installation.

  4. Open HDFView to ensure that the program installed correctly.

**Data Tip:** The HDFView application requires Java to be up to date. If you are having issues opening HDFView, try to update Java first!

Hierarchical Data Formats - What is HDF5?

Learning Objectives

After completing this tutorial, you will be able to:

  • Explain what the Hierarchical Data Format (HDF5) is.
  • Describe the key benefits of the HDF5 format, particularly related to big data.
  • Describe both the types of data that can be stored in HDF5 and how it can be stored/structured.

About Hierarchical Data Formats - HDF5

The Hierarchical Data Format version 5 (HDF5) is an open source file format that supports large, complex, heterogeneous data. HDF5 uses a "file directory" like structure that allows you to organize data within the file in many different structured ways, as you might do with files on your computer. The HDF5 format also allows for embedding of metadata, making it self-describing.

**Data Tip:** HDF5 is one hierarchical data format that builds upon both HDF4 and NetCDF (two other hierarchical data formats). Read more about HDF5 here.

Why Use HDF5: organizations use HDF5 for various data, access, computing, and networking needs. Source: The HDF Group

Hierarchical Structure - A file directory within a file

The HDF5 format can be thought of as a file system contained and described within one single file. Think about the files and folders stored on your computer. You might have a data directory with some temperature data for multiple field sites. These temperature data are collected every minute and summarized on an hourly, daily and weekly basis. Within one HDF5 file, you can store a similar set of data organized in the same way that you might organize files and folders on your computer. However, in an HDF5 file, what we call "directories" or "folders" on our computers are called groups, and what we call files on our computers are called datasets.

Two Important HDF5 Terms

  • Group: A folder-like element within an HDF5 file that might contain other groups or datasets within it.
  • Dataset: The actual data contained within the HDF5 file. Datasets are often (but don't have to be) stored within groups in the file.

An example HDF5 file structure which contains groups, datasets and associated metadata.

An HDF5 file containing datasets might be structured like this:

The HDF5 illustration from above but the groups are NEON sites and sensor types and datasets are included under sensor types
An example HDF5 file structure containing data for multiple field sites and also containing various datasets (averaged at different time intervals).
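
The layout described above can be sketched in Python with the h5py library. This is a minimal illustration, not the NEON file format: the file name, site names, group names and random values are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, site names and values, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "temperature_example.h5")

with h5py.File(path, "w") as f:
    for site in ("site_A", "site_B"):
        # Groups behave like folders; intermediate groups are created as needed.
        sensor = f.create_group(f"{site}/temperature")
        # Datasets behave like files and hold the actual values.
        sensor.create_dataset("hourly", data=np.random.rand(24))
        sensor.create_dataset("daily", data=np.random.rand(7))

with h5py.File(path, "r") as f:
    sites = sorted(f.keys())                              # top-level groups
    intervals = sorted(f["site_A/temperature"].keys())    # datasets in one group
    hourly_shape = f["site_A/temperature/hourly"].shape

print(sites)
print(intervals)
print(hourly_shape)
```

Paths inside the file use forward slashes, much like a directory path, so a dataset can be addressed directly as "site_A/temperature/hourly".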

HDF5 is a Self Describing Format

The HDF5 format is self describing. This means that each file, group and dataset can have associated metadata that describes exactly what the data are. Following the example above, we can embed information about each site in the file, such as:

  • The full name and X,Y location of the site.
  • A description of the site.
  • Any documentation of interest.

Similarly, we might add information about how the data in the dataset were collected, such as descriptions of the sensor used to collect the temperature data. We can also attach information, to each dataset within the site group, about how the averaging was performed and over what time period data are available.

One key benefit of having metadata that are attached to each file, group and dataset, is that this facilitates automation without the need for a separate (and additional) metadata document. Using a programming language, like R or Python, we can grab information from the metadata that are already associated with the dataset, and which we might need to process the dataset.

An illustration of an HDF5 file structure with a group that contains two datasets and all associated metadata
HDF5 files are self describing - this means that all elements (the file itself, groups and datasets) can have associated metadata that describes the information contained within the element.
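
As a rough sketch of how this looks in code, h5py exposes metadata through an `attrs` mapping on the file, on each group and on each dataset. The attribute names and values below are made up for illustration and are not taken from a NEON file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, attribute names and values, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "metadata_example.h5")

with h5py.File(path, "w") as f:
    # Metadata on the file itself.
    f.attrs["description"] = "Example temperature file"

    # Metadata on a group (a site).
    site = f.create_group("site_A")
    site.attrs["full_name"] = "Hypothetical Site A"
    site.attrs["location_xy"] = np.array([100.0, 200.0])

    # Metadata on a dataset (how the values were produced).
    dset = site.create_dataset("hourly", data=np.arange(24, dtype="f8"))
    dset.attrs["units"] = "degrees C"
    dset.attrs["averaging_period_minutes"] = 60

with h5py.File(path, "r") as f:
    dset = f["site_A/hourly"]
    units = dset.attrs["units"]
    period = int(dset.attrs["averaging_period_minutes"])
    site_meta = dict(f["site_A"].attrs)

print(units, period)
print(sorted(site_meta))
```

Because the metadata travel with the data, a processing script can read values such as the averaging period directly from the file instead of parsing a separate document.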

Compressed & Efficient subsetting

The HDF5 format supports compression. Data stored in an HDF5 file can be compressed, which makes the overall file size smaller. Even when compressed, however, HDF5 files often contain big data and can thus still be quite large. A powerful attribute of HDF5 is data slicing, by which a particular subset of a dataset can be extracted for processing. This means that the entire dataset doesn't have to be read into memory (RAM), which is very helpful for working efficiently with very large (gigabytes or more) datasets!
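
A small sketch of both ideas using h5py, assuming a synthetic array in place of real data: the dataset is written with gzip compression, and only a small window is read back from disk. The dataset name, chunk shape and array size are arbitrary choices for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in for a large data cube; sizes are arbitrary.
path = os.path.join(tempfile.mkdtemp(), "slicing_example.h5")
cube = np.arange(50 * 100 * 100, dtype="i4").reshape(50, 100, 100)

with h5py.File(path, "w") as f:
    # Compression is applied per chunk; here each chunk is one 100 x 100 layer.
    f.create_dataset("cube", data=cube, chunks=(1, 100, 100), compression="gzip")

with h5py.File(path, "r") as f:
    dset = f["cube"]            # opening the dataset does not load its values
    full_shape = dset.shape
    compression = dset.compression
    # Only the requested window is read from disk into memory.
    subset = dset[10, 20:30, 40:50]

print(full_shape, compression, subset.shape)
```

Indexing an h5py dataset uses the same slice syntax as a NumPy array, but the read happens at the moment of indexing rather than when the file is opened.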

Heterogeneous Data Storage

HDF5 files can store many different types of data within the same file. For example, one group may contain a set of datasets that hold integer (numeric) and text (string) data. Or, one dataset can contain heterogeneous data types (e.g., both text and numeric data in one dataset). This means that HDF5 can store any of the following (and more) in one file:

  • Temperature, precipitation and PAR (photosynthetically active radiation) data for a site or for many sites.
  • A set of images that cover one or more areas (each image can have specific spatial information associated with it - all in the same file).
  • A multi or hyperspectral spatial dataset that contains hundreds of bands.
  • Field data for several sites characterizing insects, mammals, vegetation and meteorology.
  • And much more!
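
To make the idea of mixed data types concrete, here is a hedged sketch with h5py and NumPy. It stores a table-like dataset that mixes text and numbers (a NumPy structured array), a small image-like array and a text note in one file. All names and values are invented for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "heterogeneous_example.h5")

# One dataset holding both text and numeric fields.
record_type = np.dtype([("site", "S4"), ("temperature_c", "f4"), ("count", "i4")])
records = np.array([(b"SJER", 21.5, 3), (b"SITE", 18.0, 7)], dtype=record_type)

with h5py.File(path, "w") as f:
    f.create_dataset("field_records", data=records)                # mixed types
    f.create_dataset("image", data=np.zeros((4, 4), dtype="u1"))   # image-like array
    f.create_dataset("note", data="free text stored as a string")  # text

with h5py.File(path, "r") as f:
    table = f["field_records"][:]
    first_site = table["site"][0].decode()
    total_count = int(table["count"].sum())
    image_dtype = f["image"].dtype
    note = f["note"][()]
    # Depending on the h5py version, strings may come back as bytes.
    if isinstance(note, bytes):
        note = note.decode()

print(first_site, total_count, image_dtype, note)
```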

Open Format

The HDF5 format is open and free to use. The supporting libraries (and a free viewer) can be downloaded from the HDF Group website. As such, HDF5 is widely supported in a host of programs, including open source programming languages like R and Python, and commercial programming tools like Matlab and IDL. Spatial data that are stored in HDF5 format can be used in GIS and imaging programs including QGIS, ArcGIS, and ENVI.

Summary Points - Benefits of HDF5

  • Self-Describing: The datasets within an HDF5 file are self describing. This allows us to efficiently extract metadata without needing an additional metadata document.
  • Supports Heterogeneous Data: Different types of datasets can be contained within one HDF5 file.
  • Supports Large, Complex Data: HDF5 supports compression and is designed to support large, heterogeneous, and complex datasets.
  • Supports Data Slicing: "Data slicing", or extracting portions of the dataset as needed for analysis, means large files don't need to be completely read into the computer's memory or RAM.
  • Open Format, with wide support in many tools: Because the HDF5 format is open, it is supported by a host of programming languages and tools, including open source languages like R and Python and open GIS tools like QGIS.

HDFView: Exploring HDF5 Files in the Free HDFView Tool

In this tutorial you will use the free HDFView tool to explore HDF5 files and the groups and datasets contained within. You will also see how HDF5 files can be structured and explore metadata using both spatial and temporal data stored in HDF5!

Learning Objectives

After completing this activity, you will be able to:

  • Explain how data can be structured and stored in HDF5 format.
  • Navigate to the metadata that make an HDF5 file "self describing".
  • Explore HDF5 files using the free HDFView application.

Tools You Will Need

Install the free HDFView application. This application allows you to explore the contents of an HDF5 file easily. Click here to go to the download page.

Data to Download

Download NEON Imaging Spectrometer Data at SJER (2024) - NEON_D17_SJER_DP3_254000_4108000_bidirectional_reflectance.h5

These hyperspectral remote sensing data provide information on the National Ecological Observatory Network's San Joaquin Experimental Range field site. The data were collected over the San Joaquin field site located in California (Domain 17) and processed at NEON headquarters. The entire dataset can be accessed from the Spectrometer orthorectified surface bidirectional reflectance - mosaic page on the NEON data portal.

Download Reflectance Dataset

Download NEON Eddy Covariance Data at SJER (2024-04-01)

The SAE data were collected by the National Ecological Observatory Network's flux towers at field sites across the US. The entire dataset can be accessed from the Bundled data products - eddy covariance page on the NEON data portal.

Download Eddy Covariance Dataset

Installing HDFView

Select the HDFView download option that matches the operating system (Mac OS X, Windows, or Linux) and computer setup (32 bit vs 64 bit) that you have.

Hierarchical Data Format 5 - HDF5

Hierarchical Data Format version 5 (HDF5) is an open file format that supports large, complex, heterogeneous data. Some key points about HDF5:

  • HDF5 uses a "file directory" like structure.
  • The HDF5 data model organizes information using groups. Each group may contain one or more datasets.
  • HDF5 is a self describing file format. This means that the metadata for the data contained within the HDF5 file are built into the file itself.
  • One HDF5 file may contain several heterogeneous data types (e.g. images, numeric data, data stored as strings).

For more introduction to the HDF5 format, see our About Hierarchical Data Formats - What is HDF5? tutorial.

In this tutorial, we will explore two different types of data saved in HDF5. This will allow us to better understand how one file can store multiple different types of data, in different ways.

Part 1: Exploring Hyperspectral Imagery stored in HDF5

Illustration of a NEON site with field scientists on the ground and an airborne observation plane flying above
NEON airborne observation platform.

First, we will explore a hyperspectral dataset, collected by the NEON Airborne Observation Platform (AOP) and saved in HDF5 format. In the hyperspectral data cubes, each pixel in the dataset contains reflectance values for hundreds of bands (426) collected by the sensor.

A few notes about hyperspectral imagery:

  • An imaging spectrometer, which collects hyperspectral imagery, records light energy reflected off objects on the earth's surface.
  • The data are inherently spatial. Each pixel in the image is located spatially and represents an area of ground on the earth.
  • Similar to an RGB (Red, Green, Blue) camera, an imaging spectrometer records reflected light energy. Each pixel contains several hundred bands of reflectance data.

A hyperspectral resolution graph and a Landsat TM resolution graph, each showing different reflectance values across wavelengths for five different plants
A hyperspectral instrument records reflected light energy across very narrow bands. The NEON Imaging Spectrometer collects 426 bands of information for each pixel on the ground.

Read more about hyperspectral remote sensing data:

  • About Hyperspectral Remote Sensing Data tutorial on this site.

Let's open some hyperspectral imagery stored in HDF5 format to see what the file structure can look like for a different type of data.

SJER bidirectional reflectance tile (DP3.30006.002) opened in HDFView.

Open the Reflectance H5 file in HDFView

To begin, open the HDFView application.

Within the HDFView application, select File --> Open and navigate to the folder where you saved the NEON_D17_SJER_DP3_254000_4108000_bidirectional_reflectance.h5 file on your computer. Open this file in HDFView.

Open the file and expand the sub-folders. This file is composed of a Reflectance dataset (called Reflectance_Data) along with additional Metadata containing the following sub-folders:

  • Ancillary_Imagery: Datasets including ATCOR inputs and other Quality indicators such as the Weather_Quality_Indicator, containing information about the cloud conditions during the flight (for each pixel).
  • Coordinate_System: geographic information for the dataset.
  • Logs: Log files for each flight line containing ATCOR processing information and inputs, BRDF correction parameters, and the solar azimuth and zenith angles.
  • Spectral_Data: Full Width Half Max (FWHM) and Wavelength for each of the 426 spectral bands.

Let's first look at the metadata stored in the Coordinate_System folder. This group contains all of the spatial information that a GIS program would need to project the data spatially.

Next, double click on the Wavelength dataset. Note that this dataset contains the central wavelength value for each band in the dataset.
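
The same band-center values can also be used in code, for example to find which band sits closest to a wavelength of interest. The sketch below is a generic nearest-value lookup with NumPy. The evenly spaced wavelengths, their range and their units are stand-in assumptions; real values would be read from the Wavelength dataset in the file.

```python
import numpy as np


def nearest_band(wavelengths, target):
    """Return the index of the band whose center is closest to target."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return int(np.argmin(np.abs(wavelengths - target)))


# Hypothetical, evenly spaced band centers standing in for the Wavelength
# dataset (426 values, as in the file described above).
example_wavelengths = np.linspace(380.0, 2510.0, 426)

red_band = nearest_band(example_wavelengths, 650.0)
print(red_band, example_wavelengths[red_band])
```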

Finally, click on the Reflectance_Data dataset. Note that the metadata for the dataset show its structure to be 426 x 1000 x 1000 (wavelength, x, y). Right click on the reflectance dataset and select Open As. Click Image in the "display as" settings on the left hand side of the popup.

SJER bidirectional reflectance tile Dataset Selection
HDFView Reflectance Dataset Selection

Notice an image preview appears on the left of the pop-up window. Click OK to open the image. You may have to play with the brightness and contrast settings in the viewer to see the data properly.

SJER bidirectional reflectance greyscale preview
HDFView Reflectance Preview

Explore the spectral dataset in HDFView, taking note of the metadata and data stored within the file.
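
The tree that HDFView displays can also be printed from Python. The following is a minimal sketch using h5py's `visititems`. It is run here against a tiny synthetic file that borrows a few names from the walkthrough above (Reflectance_Data, Metadata, Spectral_Data, Wavelength); the exact nesting and the top-level group name are assumptions, so point `describe` at the downloaded file to see its real layout.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(path):
    """Return one text line per group or dataset in an HDF5 file."""
    lines = []

    def visit(name, obj):
        indent = "  " * name.count("/")
        label = name.split("/")[-1]
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{indent}{label}  [dataset, shape={obj.shape}, dtype={obj.dtype}]")
        else:
            lines.append(f"{indent}{label}/  [group, {len(obj.attrs)} attributes]")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


# Tiny synthetic file; the nesting below is assumed, not the real NEON layout.
demo_path = os.path.join(tempfile.mkdtemp(), "structure_example.h5")
with h5py.File(demo_path, "w") as f:
    refl = f.create_group("SITE/Reflectance")
    refl.create_dataset("Reflectance_Data", data=np.zeros((3, 4, 4), dtype="i2"))
    refl.create_dataset("Metadata/Spectral_Data/Wavelength", data=np.array([400.0, 500.0, 600.0]))

tree = describe(demo_path)
print("\n".join(tree))
```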

Part 2: Exploring Surface Atmosphere Exchange (SAE) Data in HDFView

Next, we will look at the SAE bundled eddy covariance h5 data. As in the first part, we will start by opening the h5 file (downloaded from the link at the top of this tutorial) in the viewer to get a better idea of how these data are structured.

Open the Bundled Eddy Covariance H5 file in HDFView

Open the HDFView application. Within the application, select File --> Open and navigate to the folder where you saved the SAE hdf5 file on your computer. Open this file in HDFView.

If you click on the name of the HDF5 file in the left hand window of HDFView, you can view metadata for the file. This will be located in the bottom window of the application.

SJER eddy covariance HDF5 file opened in HDFView.

Explore File Structure in HDFView

Next, explore the structure of this bundled eddy covariance file.

Notice at the bottom there is a readMe attribute. If you double click on this, you'll see the text "Net Surface Atmosphere Exchange (NSAE) HDF5 File Structure Description. The NSAE file you downloaded from NEON data portal is in the HDF5 format. This document describes the HDF5 file structure. This file will provide the HDF5 hierarchical layout of the file and a description of each HDF5 group level. The full descriptions of objects can be found in the objDesc data table provided within the HDF5 file. The 'Exploring NEON Eddy-Covariance Data Products in HDF5 file format' document provides a greater level of detail ..."

Documentation for each NEON data product is available on the respective data product page. It is strongly recommended to read the relevant documentation, starting with the Quick Start Guides. The document referenced above in the readMe is linked here: Exploring NEON Eddy-Covariance Data Products in HDF5 file format.

Now that you've read the readMe, and referencing the document above, take a look at the structure of the data in HDFView.

Notice that there are multiple groups (folders) under the SJER root folder starting with dp. Expand these folders by double clicking on the folder icons. These represent the different data product levels, from 01 to 04, as well as level 0 prime.

  • dp01: Level 1
  • dp02: Level 2
  • dp03: Level 3
  • dp04: Level 4
  • dp0p: Level 0 prime

Under each of the levels there is a data folder with subfolders labeled by the data product identification codes as well as quality information (qfqm) and uncertainty (ucrt). Notice that there is also metadata associated with each group.

Within the dp04/data group there are five more groups: fluxCo2, fluxH2o, fluxMome, fluxTemp, and foot. What data are contained within these groups?
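
One way to start answering that question outside of HDFView is to list the contents of a group with h5py. The sketch below builds a small synthetic file that reuses the group names mentioned above and then lists what sits under dp04/data. The synthetic file is empty apart from those groups, and the path "SJER/dp04/data" is inferred from the description above, so adjust it to match the downloaded file.

```python
import os
import tempfile

import h5py


def list_children(path, group_path):
    """Map each child name in a group to 'group' or 'dataset'."""
    with h5py.File(path, "r") as f:
        group = f[group_path]
        return {
            name: "group" if isinstance(item, h5py.Group) else "dataset"
            for name, item in group.items()
        }


# Synthetic stand-in that only reproduces the group names, not the data.
demo_path = os.path.join(tempfile.mkdtemp(), "eddy_example.h5")
with h5py.File(demo_path, "w") as f:
    for name in ("fluxCo2", "fluxH2o", "fluxMome", "fluxTemp", "foot"):
        f.create_group(f"SJER/dp04/data/{name}")

children = list_children(demo_path, "SJER/dp04/data")
print(children)
```

Calling `list_children` again on one of the returned groups walks one level deeper, which mirrors expanding a folder in HDFView.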

**Note:** The data used in this activity were collected by sensors mounted on a National Ecological Observatory Network (NEON) flux tower. Read more about NEON towers here.
Illustration of a NEON tower with arms containing sensors extending horizontally off of the tower structure
A NEON flux tower contains booms or arms that house sensors at varying heights along the tower.

This is another example of how a NEON HDF5 file is structured. Take some time to explore this HDF5 dataset within HDFView, using the reference document as needed.

Copyright © Battelle, 2026

The National Ecological Observatory Network is a major facility fully funded by the U.S. National Science Foundation.

Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the U.S. National Science Foundation.