Skip to main content
NSF NEON, Operated by Battelle

Main navigation

  • About
    • NEON Overview
      • Vision and Management
      • Spatial and Temporal Design
      • History
    • About the NEON Biorepository
      • ASU Biorepository Staff
      • Contact the NEON Biorepository
    • Observatory Blog
    • Newsletters
    • Staff
    • FAQ
    • User Accounts
    • Contact Us

    About

  • Data
    • Data Portal
      • Data Availability Charts
      • API & GraphQL
      • Prototype Data
      • Externally Hosted Data
    • Data Collection Methods
      • Airborne Observation Platform (AOP)
      • Instrument System (IS)
        • Instrumented Collection Types
        • Aquatic Instrument System (AIS)
        • Terrestrial Instrument System (TIS)
      • Observational System (OS)
        • Observation Types
        • Observational Sampling Design
        • Sampling Schedules
        • Taxonomic Lists Used by Field Staff
        • Optimizing the Observational Sampling Designs
      • Protocols & Standardized Methods
    • Getting Started with NEON Data
      • neonUtilities for R and Python
      • Learning Hub
      • Code Hub
    • Using Data
      • Data Formats and Conventions
      • Released, Provisional, and Revised Data
      • Data Product Bundles
      • Usage Policies
      • Acknowledging and Citing NEON
      • Publishing Research Outputs
    • Data Notifications
    • NEON Data Management
      • Data Availability
      • Data Processing
      • Data Quality

    Data

  • Samples & Specimens
    • Biorepository Sample Portal at ASU
    • About Samples
      • Sample Types
      • Sample Repositories
      • Megapit and Distributed Initial Characterization Soil Archives
    • Finding and Accessing Sample Data
      • Species Checklists
      • Sample Explorer - Relationships and Data
      • Biorepository API
    • Requesting and Using Samples
      • Loans & Archival Requests
      • Usage Policies

    Samples & Specimens

  • Field Sites
    • Field Site Map and Info
    • Spatial Data Layers & Maps

    Field Sites

  • Resources
    • Getting Started with NEON Data
    • Research Support Services
      • Field Site Coordination
      • Letters of Support
      • Mobile Deployment Platforms
      • Permits and Permissions
      • AOP Flight Campaigns
      • Research Support FAQs
      • Research Support Projects
    • Code Hub
      • neonUtilities for R and Python
      • Code Resources Guidelines
      • Code Resources Submission
      • NEON's GitHub Organization Homepage
    • Learning Hub
      • Tutorials
      • Workshops & Courses
      • Science Videos
      • Teaching Modules
    • Science Seminars and Data Skills Webinars
    • Document Library
    • Funding Opportunities

    Resources

  • Impact
    • Research Highlights
    • Papers & Publications
    • NEON in the News

    Impact

  • Get Involved
    • Upcoming Events
    • Research and Collaborations
      • Environmental Data Science Innovation and Inclusion Lab
      • Collaboration with DOE BER User Facilities and Programs
      • EFI-NEON Ecological Forecasting Challenge
      • NEON Great Lakes User Group
      • NCAR-NEON-Community Collaborations
    • Advisory Groups
      • Science, Technology & Education Advisory Committee (STEAC)
      • Innovation Advisory Committee (IAC)
      • Technical Working Groups (TWG)
    • NEON Ambassador Program
      • Exploring NEON-Derived Data Products Workshop Series
    • Partnerships
    • Community Engagement
    • Work Opportunities

    Get Involved

  • My Account
  • Search

Search

Posey Creek NEON

Lewis Run NEON

Blandy Experimental Farm NEON

Lower Hop Brook NEON

Harvard Forest & Quabbin Watershed NEON

Bartlett Experimental Forest NEON

NSF Joint NCAR/NEON Workshop: Predicting life in the Earth system – linking the geosciences and ecology

The NSF sponsored joint NCAR/NEON workshop, Predicting life in the Earth system – linking the geosciences and ecology, is an opportunity to bring together members of the atmospheric science and ecological communities to advance the capability of Earth system prediction to include terrestrial ecosystems and biological resources. The workshop’s overarching theme will focus on convergent research between the geosciences and ecology for ecological forecasting and prediction at subseasonal to seasonal, seasonal to decadal, and centennial timescales, including use of observations, required data services infrastructure, and models.

2020 SESYNC Pursuit Proposal Deadline

The National Socio-Environmental Synthesis Center (SESYNC) announces its Spring 2020 Request for Proposals for collaborative team-based research projects that synthesize existing data, methods, theories, and tools to address a pressing socio-environmental problem. The request includes a research topic focused on NEON-enabled Socio-Environmental Synthesis. Proposals are due March 30, 2020 at 5 p.m. ET.

Select pixels and compare spectral signatures in R

In this tutorial, we will learn how to plot spectral signatures of several different land cover types using an interactive click feature of the terra package.

Learning Objectives

After completing this activity, you will be able to:

  • Extract and plot spectra from an HDF5 file.
  • Work with groups and datasets within an HDF5 file.
  • Use the terra::click() function to interact with an RGB raster image

Things You’ll Need To Complete This Tutorial

To complete this tutorial you will need the most current version of R and, preferably, RStudio loaded on your computer.

R Libraries to Install:

  • rhdf5: install.packages("BiocManager"), BiocManager::install("rhdf5")
  • terra: install.packages('terra')
  • plyr: install.packages('plyr')
  • reshape2: install.packages('reshape2')
  • ggplot2: install.packages('ggplot2')
  • neonUtilities: install.packages('neonUtilities')

More on Packages in R - Adapted from Software Carpentry.

Data to Download

These hyperspectral remote sensing data provide information on the National Ecological Observatory Network's San Joaquin Experimental Range (SJER) field site in March of 2021. The data used in this lesson is the 1km by 1km mosaic tile named NEON_D17_SJER_DP3_257000_4112000_reflectance.h5. If you already completed the previous lesson in this tutorial series, you do not need to download this data again. The entire SJER reflectance dataset can be accessed from the NEON Data Portal.

Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded data.

An overview of setting the working directory in R can be found here.

Recommended Skills

This tutorial will require that you be comfortable navigating HDF5 files, and have an understanding of what spectral signatures are. For additional information on these topics, we highly recommend you work through the earlier tutorials in this Introduction to Hyperspectral Remote Sensing Data series before starting on this tutorial.

Getting Started

First, we need to load our required packages and set the working directory.

# load required packages

library(rhdf5)

library(reshape2)

library(terra)

library(plyr)

library(ggplot2)

library(grDevices)



# set working directory, you can change this if desired

wd <- "~/data/" 

setwd(wd)

Download the reflectance tile, if you haven't already, using neonUtilities::byTileAOP:

byTileAOP(dpID = 'DP3.30006.001',

          site = 'SJER',

          year = '2021',

          easting = 257500,

          northing = 4112500,

          savepath = wd)

And then we can read in the hyperspectral hdf5 data. We will also collect a few other important pieces of information (band wavelengths and scaling factor) while we're at it.

# define filepath to the hyperspectral dataset

h5_file <- paste0(wd,"DP3.30006.001/neon-aop-products/2021/FullSite/D17/2021_SJER_5/L3/Spectrometer/Reflectance/NEON_D17_SJER_DP3_257000_4112000_reflectance.h5")



# read in the wavelength information from the HDF5 file

wavelengths <- h5read(h5_file,"/SJER/Reflectance/Metadata/Spectral_Data/Wavelength")



# grab scale factor from the Reflectance attributes

reflInfo <- h5readAttributes(h5_file,"/SJER/Reflectance/Reflectance_Data" )



scaleFact <- reflInfo$Scale_Factor

Now, we will read in the RGB image that we created in an earlier tutorial and plot it.

# read in RGB image as a 'stack' rather than a plain 'raster'

rgbStack <- rast(paste0(wd,"NEON_hyperspectral_tutorial_example_RGB_image.tif"))



# plot as RGB image, with a linear stretch

plotRGB(rgbStack,
        r=1,g=2,b=3, scale=300, 
        stretch = "lin")

RGB image of a portion of the SJER field site using 3 bands from the raster stack. Brightness values have been stretched using the stretch argument to produce a natural looking image.

Interactive click Function from the terra Package

Next, we use an interactive clicking function to identify the pixels that we want to extract spectral signatures for. To follow along with this tutorial, we suggest the following six cover types (exact locations shown in the image below).

  1. Water
  2. Tree canopy (avoid the shaded northwestern side of the tree)
  3. Irrigated grass
  4. Bare soil (baseball diamond infield)
  5. Building roof (blue)
  6. Road

As shown here:

RGB image of a portion of the SJER field site using 3 bands from the raster stack. Also displayed are points labeled with numbers one through six, representing six land cover types selected using the interactive click function from the raster package. These are: 1. Water, 2. Tree Canopy, 3. Grass, 4. Soil (Baseball Diamond), 5. Building Roof, and 6. Road. Plotting parameters have been changed to enhance visibility.
Six different land cover types chosen for this study in the order listed above (red numbers). This image is displayed with a histogram stretch.

Data Tip: Note from the terra::click Description (which you can read by typing help("click"): click "does not work well on the default RStudio plotting device. To work around that, you can first run dev.new(noRStudioGD = TRUE) which will create a separate window for plotting, then use plot() followed by click() and click on the map."

For this next part, if you are following along in RStudio, you will need to enter these line below directly in the Console. dev.new(noRStudioGD = TRUE) will open up a separate window for plotting, which is where you will click the pixels to extract spectra, using the terra::click functionality.

dev.new(noRStudioGD = TRUE)

Now we can create our RGB plot, and start clicking on this in the pop-out Graphics window.

# change plotting parameters to better see the points and numbers generated from clicking

par(col="red", cex=2)



# use a histogram stretch in order to provide more contrast for selecting pixels

plotRGB(rgbStack, r=1, g=2, b=3, scale=300, stretch = "hist") 



# use the 'click' function

c <- click(rgbStack, n = 6, id=TRUE, xy=TRUE, cell=TRUE, type="p", pch=16, col="red", col.lab="red")

Once you have clicked your six points, the graphics window should close. If you want to choose new points, or if you accidentally clicked a point that you didn't intend to, run the previous 2 chunks of code again to re-start.

The click() function identifies the cell number that you clicked, but in order to extract spectral signatures, we need to convert that cell number into a row and column, as shown here:

# convert raster cell number into row and column (used to extract spectral signature below)

c$row <- c$cell%/%nrow(rgbStack)+1 # add 1 because R is 1-indexed

c$col <- c$cell%%ncol(rgbStack)

Extract Spectral Signatures from HDF5 file

Next, we will loop through each of the cells that and use the h5read() function to extract the reflectance values of all bands from the given pixel (row and column).

# create a new dataframe from the band wavelengths so that we can add the reflectance values for each cover type

pixel_df <- as.data.frame(wavelengths)

# loop through each of the cells that we selected

for(i in 1:length(c$cell)){
# extract spectral values from a single pixel
aPixel <- h5read(h5_file,"/SJER/Reflectance/Reflectance_Data",
                 index=list(NULL,c$col[i],c$row[i]))

# scale reflectance values from 0-1
aPixel <- aPixel/as.vector(scaleFact)

# reshape the data and turn into dataframe
b <- adply(aPixel,c(1))

# rename the column that we just created
names(b)[2] <- paste0("Point_",i)

# add reflectance values for this pixel to our combined data.frame called pixel_df
pixel_df <- cbind(pixel_df,b[2])
}

Plot Spectral signatures using ggplot2

Finally, we have everything that we need to plot the spectral signatures for each of the pixels that we clicked. In order to color our lines by the different land cover types, we will first reshape our data using the melt() function, then plot the spectral signatures.

# Use the melt() function to reshape the dataframe into a format that ggplot prefers

pixel.melt <- reshape2::melt(pixel_df, id.vars = "wavelengths", value.name = "Reflectance")

# Now, let's plot the spectral signatures!

ggplot()+
  geom_line(data = pixel.melt, mapping = aes(x=wavelengths, y=Reflectance, color=variable), lwd=1.5)+
  scale_colour_manual(values = c("blue3","green4","green2","tan4","grey50","black"),
                      labels = c("Water","Tree","Grass","Soil","Roof","Road"))+
  labs(color = "Cover Type")+
  ggtitle("Land cover spectral signatures")+
  theme(plot.title = element_text(hjust = 0.5, size=20))+
  xlab("Wavelength")

Plot of spectral signatures for the six different land cover types: Water, Tree, Grass, Soil, Roof, and Road. The x-axis is wavelength in nanometers and the y-axis is reflectance.

Nice! However, there seems to be something weird going on in the wavelengths near ~1400nm and ~1850 nm...

Atmospheric Absorption Bands

Those irregularities around 1400nm and 1850 nm are two major atmospheric absorption bands - regions where gasses in the atmosphere (primarily carbon dioxide and water vapor) absorb radiation, and therefore, obscure the reflected radiation that the imaging spectrometer measures. Fortunately, the lower and upper bound of each of those atmospheric absorption bands is specified in the HDF5 file. Let's read those bands and plot rectangles where the reflectance measurements are obscured by atmospheric absorption.

# grab reflectance metadata (which contains absorption band limits)

reflMetadata <- h5readAttributes(h5_file,"/SJER/Reflectance" )

ab1 <- reflMetadata$Band_Window_1_Nanometers

ab2 <- reflMetadata$Band_Window_2_Nanometers

# Plot spectral signatures again with grey rectangles highlighting the absorption bands

ggplot()+
  geom_line(data = pixel.melt, mapping = aes(x=wavelengths, y=Reflectance, color=variable), lwd=1.5)+
  geom_rect(mapping=aes(ymin=min(pixel.melt$Reflectance),ymax=max(pixel.melt$Reflectance), xmin=ab1[1], xmax=ab1[2]), color="black", fill="grey40", alpha=0.8)+
  geom_rect(mapping=aes(ymin=min(pixel.melt$Reflectance),ymax=max(pixel.melt$Reflectance), xmin=ab2[1], xmax=ab2[2]), color="black", fill="grey40", alpha=0.8)+
  scale_colour_manual(values = c("blue3","green4","green2","tan4","grey50","black"),
                      labels = c("Water","Tree","Grass","Soil","Roof","Road"))+
  labs(color = "Cover Type")+
  ggtitle("Land cover spectral signatures with\n atmospheric absorption bands greyed out")+
  theme(plot.title = element_text(hjust = 0.5, size=20))+
  xlab("Wavelength")

Plot of spectral signatures for the six different land cover types: Water, Tree, Grass, Soil, Roof, and Road. This plot includes two greyed-out rectangles in regions near 1400nm and 1850nm where the reflectance measurements are obscured by atmospheric absorption. The x-axis is wavelength in nanometers and the y-axis is reflectance.

Now we can clearly see that the noisy sections of each spectral signature are within the atmospheric absorption bands. For our final step, let's take all reflectance values from within each absorption band and set them to NA to remove the noisiest sections from the plot.

# Duplicate the spectral signatures into a new data.frame

pixel.melt.masked <- pixel.melt

# Mask out all values within each of the two atmospheric absorption bands

pixel.melt.masked[pixel.melt.masked$wavelengths>ab1[1]&pixel.melt.masked$wavelengths<ab1[2],]$Reflectance <- NA

pixel.melt.masked[pixel.melt.masked$wavelengths>ab2[1]&pixel.melt.masked$wavelengths<ab2[2],]$Reflectance <- NA



# Plot the masked spectral signatures

ggplot()+
  geom_line(data = pixel.melt.masked, mapping = aes(x=wavelengths, y=Reflectance, color=variable), lwd=1.5)+
  scale_colour_manual(values = c("blue3","green4","green2","tan4","grey50","black"),
                      labels = c("Water","Tree","Grass","Soil","Roof","Road"))+
  labs(color = "Cover Type")+
  ggtitle("Land cover spectral signatures with\n atmospheric absorption bands removed")+
  theme(plot.title = element_text(hjust = 0.5, size=20))+
  xlab("Wavelength")

Plot of spectral signatures for the six different land cover types. Values falling within the atmospheric absorption bands have been set to NA and ommited from the plot. The x-axis is wavelength in nanometers and the y-axis is reflectance.

There you have it, spectral signatures for six different land cover types, with the regions from the atmospheric absorption bands removed.

Challenge: Compare Spectral Signatures

There are many interesting comparisons to make with spectral signatures. Try these challenges to explore hyperspectral data further:

  1. Compare six different types of vegetation, and pick an appropriate color for each of their lines. A nice guide to the many different color options in R can be found here.

  2. What happens if you only click five points? What about ten? How does this change the spectral signature plots, and can you fix any errors that occur?

  3. Does shallow water have a different spectral signature than deep water?

Subsetting NEON HDF5 hyperspectral files to reduce file size

In this tutorial, we will subset an existing HDF5 file containing NEON hyperspectral data. The purpose of this exercise is to generate a smaller file for use in subsequent analyses to reduce the file transfer time and processing power needed.

Learning Objectives

After completing this activity, you will be able to:

  • Navigate an HDF5 file to identify the variables of interest.
  • Generate a new HDF5 file from a subset of the existing dataset.
  • Save the new HDF5 file for future use.

Things You’ll Need To Complete This Tutorial

To complete this tutorial you will need the most current version of R and, preferably, RStudio loaded on your computer.

R Libraries to Install:

  • rhdf5: install.packages("BiocManager"), BiocManager::install("rhdf5")
  • raster: install.packages('raster')

More on Packages in R - Adapted from Software Carpentry.

Data to Download

The purpose of this tutorial is to reduce a large file (~652Mb) to a smaller size. The download linked here is the original large file, and therefore you may choose to skip this tutorial and download if you are on a slow internet connection or have file size limitations on your device.

Download NEON Teaching Dataset: Full Tile Imaging Spectrometer Data - HDF5 (652Mb)

These hyperspectral remote sensing data provide information on the National Ecological Observatory Network's San Joaquin Exerimental Range field site.

These data were collected over the San Joaquin field site located in California (Domain 17) in March of 2019 and processed at NEON headquarters. This particular mosaic tile is named NEON_D17_SJER_DP3_257000_4112000_reflectance.h5. The entire dataset can be accessed by request from the NEON Data Portal.

Download Dataset

Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.

An overview of setting the working directory in R can be found here.

R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.


Recommended Skills

For this tutorial, we recommend that you have some basic familiarity with the HDF5 file format, including knowing how to open HDF5 files (in Rstudio or HDF5Viewer) and how groups and metadata are structured. To brush up on these skills, we suggest that you work through the Introduction to Working with HDF5 Format in R series before moving on to this tutorial.

Why subset your dataset?

There are many reasons why you may wish to subset your HDF5 file. Primarily, HDF5 files may contain a large amount of information that is not necessary for your purposes. By subsetting the file, you can reduce file size, thereby shrinking your storage needs, shortening file transfer/download times, and reducing your processing time for analysis. In this example, we will take a full HDF5 file of NEON hyperspectral reflectance data from the San Joaquin Experimental Range (SJER) that has a file size of ~652 Mb and make a new HDF5 file with a reduced spatial extent, and a reduced spectral resolution, yielding a file of only ~50.1 Mb. This reduction in file size will make it easier and faster to conduct your analysis and to share your data with others. We will then use this subsetted file in the Introduction to Hyperspectral Remote Sensing Data series.

If you find that downloading the full 652 Mb file takes too much time or storage space, you will find a link to this subsetted file at the top of each lesson in the Introduction to Hyperspectral Remote Sensing Data series.

Exploring the NEON hyperspectral HDF5 file structure

In order to determine what information that we want to put into our subset, we should first take a look at the full NEON hyperspectral HDF5 file structure to see what is included. To do so, we will load the required package for this tutorial (you can un-comment the middle two lines to load 'BiocManager' and 'rhdf5' if you don't already have it on your computer).

# Install rhdf5 package (only need to run if not already installed)
# install.packages("BiocManager")
# BiocManager::install("rhdf5")

# Load required packages
library(rhdf5)

Next, we define our working directory where we have saved the full HDF5 file of NEON hyperspectral reflectance data from the SJER site. Note, the filepath to the working directory will depend on your local environment. Then, we create a string (f) of the HDF5 filename and read its attributes.

# set working directory to ensure R can find the file we wish to import and where
# we want to save our files. Be sure to move the download into your working directory!
wd <- "~/Documents/data/" # This will depend on your local environment
setwd(wd)

# Make the name of our HDF5 file a variable
f_full <- paste0(wd,"NEON_D17_SJER_DP3_257000_4112000_reflectance.h5")

Next, let's take a look at the structure of the full NEON hyperspectral reflectance HDF5 file.

View(h5ls(f_full, all=T))

Wow, there is a lot of information in there! The majority of the groups contained within this file are Metadata, most of which are used for processing the raw observations into the reflectance product that we want to use. For demonstration and teaching purposes, we will not need this information. What we will need are things like the Coordinate_System information (so that we can georeference these data), the Wavelength dataset (so that we can match up each band with its appropriate wavelength in the electromagnetic spectrum), and of course the Reflectance_Data themselves. You can also see that each group and dataset has a number of associated attributes (in the 'num_attrs' column). We will want to copy over those attributes into the data subset as well. But first, we need to define each of the groups that we want to populate in our new HDF5 file.

Create new HDF5 file framework

In order to make a new subset HDF5 file, we first must create an empty file with the appropriate name, then we will begin to fill in that file with the essential data and attributes that we want to include. Note that the function h5createFile() will not overwrite an existing file. Therefore, if you have already created or downloaded this file, the function will throw an error! Each function should return 'TRUE' if it runs correctly.

# First, create a name for the new file
f <- paste0(wd, "NEON_hyperspectral_tutorial_example_subset.h5")

# create hdf5 file
h5createFile(f)

## [1] TRUE

# Now we create the groups that we will use to organize our data
h5createGroup(f, "SJER/")

## [1] TRUE

h5createGroup(f, "SJER/Reflectance")

## [1] TRUE

h5createGroup(f, "SJER/Reflectance/Metadata")

## [1] TRUE

h5createGroup(f, "SJER/Reflectance/Metadata/Coordinate_System")

## [1] TRUE

h5createGroup(f, "SJER/Reflectance/Metadata/Spectral_Data")

## [1] TRUE

Adding group attributes

One of the great things about HDF5 files is that they can contain data and attributes within the same group. As explained within the Introduction to Working with HDF5 Format in R series, attributes are a type of metadata that are associated with an HDF5 group or dataset. There may be multiple attributes associated with each group and/or dataset. Attributes come with a name and an associated array of information. In this tutorial, we will read the existing attribute data from the full hyperspectral tile using the h5readAttributes() function (which returns a list of attributes), then we loop through those attributes and write each attribute to its appropriate group using the h5writeAttribute() function.

First, we will do this for the low-level "SJER/Reflectance" group. In this step, we are adding attributes to a group rather than a dataset. To do so, we must first open a file and group interface using the H5Fopen and H5Gopen functions, then we can use h5writeAttribute() to edit the group that we want to give an attribute.

a <- h5readAttributes(f_full,"/SJER/Reflectance/")
fid <- H5Fopen(f)
g <- H5Gopen(fid, "SJER/Reflectance")

for(i in 1:length(names(a))){
  h5writeAttribute(attr = a[[i]], h5obj=g, name=names(a[i]))
}

# It's always a good idea to close the file connection immidiately
# after finishing each step that leaves an open connection.
h5closeAll()

Next, we will loop through each of the datasets within the Coordinate_System group, and copy those (and their attributes, if present) from the full tile to our subset file. The Coordinate_System group contains many important pieces of information for geolocating our data, so we need to make sure that the subset file has that information.

# make a list of all groups within the full tile file
ls <- h5ls(f_full,all=T)

# make a list of all of the names within the Coordinate_System group
cg <- unique(ls[ls$group=="/SJER/Reflectance/Metadata/Coordinate_System",]$name)

# Loop through the list of datasets that we just made above
for(i in 1:length(cg)){
  print(cg[i])
  
  # Read the inividual dataset within the Coordinate_System group
  d=h5read(f_full,paste0("/SJER/Reflectance/Metadata/Coordinate_System/",cg[i]))

  # Read the associated attributes (if any)
  a=h5readAttributes(f_full,paste0("/SJER/Reflectance/Metadata/Coordinate_System/",cg[i]))
    
  # Assign the attributes (if any) to the dataset
  attributes(d)=a
  
  # Finally, write the dataset to the HDF5 file
  h5write(obj=d,file=f,
          name=paste0("/SJER/Reflectance/Metadata/Coordinate_System/",cg[i]),
          write.attributes=T)
}

## [1] "Coordinate_System_String"
## [1] "EPSG Code"
## [1] "Map_Info"
## [1] "Proj4"

Spectral Subsetting

The goal of subsetting this dataset is to substantially reduce the file size, making it faster to download and process these data. While each AOP mosaic tile is not particularly large in terms of its spatial scale (1km by 1km at 1m resolution= 1,000,000 pixels, or about half as many pixels at shown on a standard 1080p computer screen), the 426 spectral bands available result in a fairly large file size. Therefore, we will reduce the spectral resolution of these data by selecting every fourth band in the dataset, which reduces the file size to 1/4 of the original!

Some circumstances demand the full spectral resolution file. For example, if you wanted to discern between the spectral signatures of similar minerals, or if you were conducting an analysis of the differences in the 'red edge' between plant functional types, you would want to use the full spectral resolution of the original hyperspectral dataset. Still, we can use far fewer bands for demonstration and teaching purposes, while still getting a good sense of what these hyperspectral data can do.

# First, we make our 'index', a list of number that will allow us to select every fourth band, using the "sequence" function seq()
idx <- seq(from = 1, to = 426, by = 4)

# We then use this index to select particular wavelengths from the full tile using the "index=" argument
wavelengths <- h5read(file = f_full, 
             name = "SJER/Reflectance/Metadata/Spectral_Data/Wavelength", 
             index = list(idx)
            )

# As per above, we also need the wavelength attributes
wavelength.attributes <- h5readAttributes(file = f_full, 
                       name = "SJER/Reflectance/Metadata/Spectral_Data/Wavelength")
attributes(wavelengths) <- wavelength.attributes

# Finally, write the subset of wavelengths and their attributes to the subset file
h5write(obj=wavelengths, file=f,
        name="SJER/Reflectance/Metadata/Spectral_Data/Wavelength",
        write.attributes=T)

Spatial Subsetting

Even after spectral subsetting, our file size would be greater than 100Mb. herefore, we will also perform a spatial subsetting process to further reduce our file size. Now, we need to figure out which part of the full image that we want to extract for our subset. It takes a number of steps in order to read in a band of data and plot the reflectance values - all of which are thoroughly described in the Intro to Working with Hyperspectral Remote Sensing Data in HDF5 Format in R tutorial. For now, let's focus on the essentials for our problem at hand. In order to explore the spatial qualities of this dataset, let's plot a single band as an overview map to see what objects and land cover types are contained within this mosaic tile. The Reflectance_Data dataset has three dimensions in the order of bands, columns, rows. We want to extract a single band, and all 1,000 columns and rows, so we will feed those values into the index= argument as a list. For this example, we will select the 58th band in the hyperspectral dataset, which corresponds to a wavelength of 667nm, which is in the red end of the visible electromagnetic spectrum. We will use NULL in the column and row position to indicate that we want all of the columns and rows (we agree that it is weird that NULL indicates "all" in this circumstance, but that is the default behavior for this, and many other, functions).

# Extract or "slice" data for band 58 from the HDF5 file
b58 <- h5read(f_full,name = "SJER/Reflectance/Reflectance_Data",
             index=list(58,NULL,NULL))
h5closeAll()

# convert from array to matrix
b58 <- b58[1,,]

# Make a plot to view this band
image(log(b58), col=grey(0:100/100))

As we can see here, this hyperspectral reflectance tile contains a school campus that is under construction. There are many different land cover types contained here, which makes it a great example! Perhaps the most unique feature shown is in the bottom right corner of this image, where we can see the tip of a small reservoir. Let's be sure to capture this feature in our spatial subset, as well as a few other land cover types (irrigated grass, trees, bare soil, and buildings).

While raster images count their pixels from the top left corner, we are working with a matrix, which counts its pixels from the bottom left corner. Therefore, rows are counted from the bottom to the top, and columns are counted from the left to the right. If we want to sample the bottom right quadrant of this image, we need to select rows 1 through 500 (bottom half), and columns 501 through 1000 (right half). Note that, as above, the index= argument in h5read() requires a list of three dimensions for this example - in the order of bands, columns, rows.

subset_rows <- 1:500
subset_columns <- 501:1000
# Extract or "slice" data for band 44 from the HDF5 file
b58 <- h5read(f_full,name = "SJER/Reflectance/Reflectance_Data",
             index=list(58,subset_columns,subset_rows))
h5closeAll()

# convert from array to matrix
b58 <- b58[1,,]

# Make a plot to view this band
image(log(b58), col=grey(0:100/100))

Perfect - now we have a spatial subset that includes all of the different land cover types that we are interested in investigating.

### Challenge: Pick your subset
  1. Pick your own area of interest for this spatial subset, and find the rows and columns that capture that area. Can you include some solar panels, as well as the water body?

  2. Does it make a difference if you use a band from another part of the electromagnetic spectrum, such as the near-infrared? Hint: you can use the 'wavelengths' function above while removing the index= argument to get the full list of band wavelengths.

Extracting a subset

Now that we have determined our ideal spectral and spatial subsets for our analysis, we are ready to put both of those pieces of information into our h5read() function to extract our example subset out of the full NEON hyperspectral dataset. Here, we are taking every fourth band (using our idx variabe), columns 501:1000 (the right half of the tile) and rows 1:500 (the bottom half of the tile). The results in us extracting every fourth band of the bottom-right quadrant of the mosaic tile.

# Read in reflectance data.
# Note the list that we feed into the index argument! 
# This tells the h5read() function which bands, rows, and 
# columns to read. This is ultimately how we reduce the file size.
hs <- h5read(file = f_full, 
             name = "SJER/Reflectance/Reflectance_Data", 
             index = list(idx, subset_columns, subset_rows)
            )

As per the 'adding group attributes' section above, we will need to add the attributes to the hyperspectral data (hs) before writing to the new HDF5 subset file (f). The hs variable already has one attribute, $dim, which contains the actual dimensions of the hs array, and will be important for writing the array to the f file later. We will want to combine this attribute with all of the other Reflectance_Data group attributes from the original HDF5 file, f. However, some of the attributes will no longer be valid, such as the Dimensions and Spatial_Extent_meters attributes, so we will need to overwrite those before assigning these attributes to the hs variable to then write to the f file.

# grab the '$dim' attribute - as this will be needed 
# when writing the file at the bottom
hsd <- attributes(hs)

# We also need the attributes for the reflectance data.
ha <- h5readAttributes(file = f_full, 
                       name = "SJER/Reflectance/Reflectance_Data")

# However, some of the attributes are not longer valid since 
# we changed the spatial extend of this dataset. therefore, 
# we will need to overwrite those with the correct values.
ha$Dimensions <- c(500,500,107) # Note that the HDF5 file saves dimensions in a different order than R reads them
ha$Spatial_Extent_meters[1] <- ha$Spatial_Extent_meters[1]+500
ha$Spatial_Extent_meters[3] <- ha$Spatial_Extent_meters[3]+500
attributes(hs) <- c(hsd,ha)

# View the combined attributes to ensure they are correct
attributes(hs)

## $dim
## [1] 107 500 500
## 
## $Cloud_conditions
## [1] "For cloud conditions information see Weather Quality Index dataset."
## 
## $Cloud_type
## [1] "Cloud type may have been selected from multiple flight trajectories."
## 
## $Data_Ignore_Value
## [1] -9999
## 
## $Description
## [1] "Atmospherically corrected reflectance."
## 
## $Dimension_Labels
## [1] "Line, Sample, Wavelength"
## 
## $Dimensions
## [1] 500 500 107
## 
## $Interleave
## [1] "BSQ"
## 
## $Scale_Factor
## [1] 10000
## 
## $Spatial_Extent_meters
## [1]  257500  258000 4112500 4113000
## 
## $Spatial_Resolution_X_Y
## [1] 1 1
## 
## $Units
## [1] "Unitless."
## 
## $Units_Valid_range
## [1]     0 10000

# Finally, write the hyperspectral data, plus attributes, 
# to our new file 'f'.
h5write(obj=hs, file=f,
        name="SJER/Reflectance/Reflectance_Data",
        write.attributes=T)

## You created a large dataset with compression and chunking.
## The chunk size is equal to the dataset dimensions.
## If you want to read subsets of the dataset, you should testsmaller chunk sizes to improve read times.

# It's always a good idea to close the HDF5 file connection
# before moving on.
h5closeAll()

That's it! We just created a subset of the original HDF5 file, and included the most essential groups and metadata for exploratory analysis. You may consider adding other information, such as the weather quality indicator, when subsetting datasets for your own purposes.

If you want to take a look at the subset that you just made, run the h5ls() function:

View(h5ls(f, all=T))

Pagination

  • First page
  • Previous page
  • …
  • Page 47
  • Page 48
  • Page 49
  • Page 50
  • Current page 51
  • Page 52
  • Page 53
  • Page 54
  • Page 55
  • …
  • Next page
  • Last page
Subscribe to
NSF NEON, Operated by Battelle

Follow Us:

Join Our Newsletter

Get updates on events, opportunities, and how NEON is being used today.

Subscribe Now

Footer

  • About Us
  • Contact Us
  • Terms & Conditions
  • Careers
  • Code of Conduct

Copyright © Battelle, 2026

The National Ecological Observatory Network is a major facility fully funded by the U.S. National Science Foundation.

Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the U.S. National Science Foundation.