Skip to main content
NSF NEON | Open Data to Understand our Ecosystems logo

Main navigation

  • About Us
    • Overview
      • Spatial and Temporal Design
      • History
    • Vision and Management
    • Advisory Groups
      • Advisory Committee: STEAC
      • Technical Working Groups (TWGs)
    • FAQ
    • Contact Us
      • Field Offices
    • User Accounts
    • Staff

    About Us

  • Data & Samples
    • Data Portal
      • Explore Data Products
      • Data Availability Charts
      • Spatial Data & Maps
      • Document Library
      • API & GraphQL
      • Prototype Data
      • External Lab Data Ingest (restricted)
    • Samples & Specimens
      • Discover and Use NEON Samples
        • Sample Types
        • Sample Repositories
        • Sample Explorer
        • Megapit and Distributed Initial Characterization Soil Archives
        • Excess Samples
      • Sample Processing
      • Sample Quality
      • Taxonomic Lists
    • Collection Methods
      • Protocols & Standardized Methods
      • AIrborne Remote Sensing
        • Flight Box Design
        • Flight Schedules and Coverage
        • Daily Flight Reports
        • Camera
        • Imaging Spectrometer
        • Lidar
      • Automated Instruments
        • Site Level Sampling Design
        • Sensor Collection Frequency
        • Instrumented Collection Types
          • Meteorology
          • Phenocams
          • Soil Sensors
          • Ground Water
          • Surface Water
      • Observational Sampling
        • Site Level Sampling Design
        • Sampling Schedules
        • Observation Types
          • Aquatic Organisms
            • Aquatic Microbes
            • Fish
            • Macroinvertebrates & Zooplankton
            • Periphyton, Phytoplankton, and Aquatic Plants
          • Terrestrial Organisms
            • Birds
            • Ground Beetles
            • Mosquitoes
            • Small Mammals
            • Soil Microbes
            • Terrestrial Plants
            • Ticks
          • Hydrology & Geomorphology
            • Discharge
            • Geomorphology
          • Biogeochemistry
          • DNA Sequences
          • Pathogens
          • Sediments
          • Soils
            • Soil Descriptions
    • Data Notifications
    • Data Guidelines and Policies
      • Acknowledging and Citing NEON
      • Publishing Research Outputs
      • Usage Policies
    • Data Management
      • Data Availability
      • Data Formats and Conventions
      • Data Processing
      • Data Quality
      • Data Product Revisions and Releases
        • Release 2021
        • Release 2022
        • Release 2023
      • NEON and Google
      • Externally Hosted Data

    Data & Samples

  • Field Sites
    • About Field Sites and Domains
    • Explore Field Sites
    • Site Management Data Product

    Field Sites

  • Impact
    • Observatory Blog
    • Case Studies
    • Spotlights
    • Papers & Publications
    • Newsroom
      • NEON in the News
      • Newsletter Archive

    Impact

  • Resources
    • Getting Started with NEON Data & Resources
    • Documents and Communication Resources
      • Papers & Publications
      • Document Library
      • Outreach Materials
    • Code Hub
      • Code Resources Guidelines
      • Code Resources Submission
      • NEON's GitHub Organization Homepage
    • Learning Hub
      • Science Videos
      • Tutorials
      • Workshops & Courses
      • Teaching Modules
      • Faculty Mentoring Networks
      • Data Education Fellows
    • Research Support and Assignable Assets
      • Field Site Coordination
      • Letters of Support
      • Mobile Deployment Platforms
      • Permits and Permissions
      • AOP Flight Campaigns
      • Excess Samples
      • Assignable Assets FAQs
    • Funding Opportunities

    Resources

  • Get Involved
    • Advisory Groups
    • Upcoming Events
    • Past Events
    • NEON Ambassador Program
    • Collaborative Works
      • EFI-NEON Ecological Forecasting Challenge
      • NCAR-NEON-Community Collaborations
      • NEON Science Summit
      • NEON Great Lakes User Group
    • Community Engagement
    • Science Seminars and Data Skills Webinars
    • Work Opportunities
      • Careers
      • Seasonal Fieldwork
      • Postdoctoral Fellows
      • Internships
        • Intern Alumni
    • Partners

    Get Involved

  • My Account
  • Search

Search

Learning Hub

  • Science Videos
  • Tutorials
  • Workshops & Courses
  • Teaching Modules
  • Faculty Mentoring Networks
  • Data Education Fellows

Breadcrumb

  1. Resources
  2. Learning Hub
  3. Tutorials
  4. Calculate Vegetation Biomass from LiDAR Data in Python

Tutorial

Calculate Vegetation Biomass from LiDAR Data in Python

Authors: Tristan Goulden

Last Updated: Apr 8, 2021

In this tutorial, we will calculate the biomass for a section of the SJER site. We will be using the Canopy Height Model discrete LiDAR data product as well as NEON field data on vegetation data. This tutorial will calculate Biomass for individual trees in the forest.

Objectives

After completing this tutorial, you will be able to:

  • Learn how to apply a guassian smoothing fernal for high-frequency spatial filtering
  • Apply a watershed segmentation algorithm for delineating tree crowns
  • Calculate biomass predictor variables from a CHM
  • Setup training data for Biomass predictions
  • Apply a Random Forest machine learning approach to calculate biomass

Install Python Packages

  • numpy
  • gdal
  • matplotlib
  • matplotlib.pyplot
  • os

Download Data

If you have already downloaded the data set for the Data Institute, you have the data for this tutorial within the SJER directory. If you would like to just download the data for this tutorial use the following link.

Download the Biomass Calculation teaching data subset

In this tutorial, we will calculate the biomass for a section of the SJER site. We will be using the Canopy Height Model discrete LiDAR data product as well as NEON field data on vegetation data. This tutorial will calculate Biomass for individual trees in the forest.

The calculation of biomass consists of four primary steps:

  1. Delineating individual tree crowns
  2. Calculating predictor variables for all individuals
  3. Collecting training data
  4. Applying a regression model to estiamte biomass from predictors

In this tutorial we will use a watershed segmentation algorithm for delineating tree crowns (step 1) and and a Random Forest (RF) machine learning algorithm for relating the predictor variables to biomass (part 4). The predictor variables were selected following suggestions by Gleason et al. (2012) and biomass estimates were determined from DBH (diamter at breast height) measurements following relationships given in Jenkins et al. (2003).

Get Started

First, we need to specify the directory where we will find and save the data needed for this tutorial. You will need to change this line to suit your local machine. I have decided to save my data in the following directory:

data_path = '/Users/olearyd/Git/data/'

Next, we will import several of the typical libraries.

import numpy as np
import os
import gdal, osr
import matplotlib.pyplot as plt
import sys
from scipy import ndimage as ndi
%matplotlib inline 

Next, we will add libraries from skilearn which will help with the watershed delination, determination of predictor variables and random forest algorithm

#Import biomass specific libraries
from skimage.morphology import watershed
from skimage.feature import peak_local_max
from skimage.measure import regionprops
from sklearn.ensemble import RandomForestRegressor

Define functions

Now we will define a few functions that allow us to more easily work with the NEON data.

  • plot_band_array: function to plot NEON spatial data.
#Define plot band array function

def plot_band_array(band_array,image_extent,title,cmap_title,colormap,colormap_limits):
    plt.imshow(band_array,extent=image_extent)
    cbar = plt.colorbar(); plt.set_cmap(colormap); plt.clim(colormap_limits)
    cbar.set_label(cmap_title,rotation=270,labelpad=20)
    plt.title(title); ax = plt.gca()
    ax.ticklabel_format(useOffset=False, style='plain') 
    rotatexlabels = plt.setp(ax.get_xticklabels(),rotation=90)

  • array2raster: function to output geotiff files.
def array2raster(newRasterfn,rasterOrigin,pixelWidth,pixelHeight,array,epsg):

    cols = array.shape[1]
    rows = array.shape[0]
    originX = rasterOrigin[0]
    originY = rasterOrigin[1]

    driver = gdal.GetDriverByName('GTiff')
    outRaster = driver.Create(newRasterfn, cols, rows, 1, gdal.GDT_Float32)
    outRaster.SetGeoTransform((originX, pixelWidth, 0, originY, 0, pixelHeight))
    outband = outRaster.GetRasterBand(1)
    outband.WriteArray(array)
    outRasterSRS = osr.SpatialReference()
    outRasterSRS.ImportFromEPSG(epsg)
    outRaster.SetProjection(outRasterSRS.ExportToWkt())
    outband.FlushCache()
    
  • raster2array: function to conver rasters to an array.
def raster2array(geotif_file):
    metadata = {}
    dataset = gdal.Open(geotif_file)
    metadata['array_rows'] = dataset.RasterYSize
    metadata['array_cols'] = dataset.RasterXSize
    metadata['bands'] = dataset.RasterCount
    metadata['driver'] = dataset.GetDriver().LongName
    metadata['projection'] = dataset.GetProjection()
    metadata['geotransform'] = dataset.GetGeoTransform()

    mapinfo = dataset.GetGeoTransform()
    metadata['pixelWidth'] = mapinfo[1]
    metadata['pixelHeight'] = mapinfo[5]

    metadata['ext_dict'] = {}
    metadata['ext_dict']['xMin'] = mapinfo[0]
    metadata['ext_dict']['xMax'] = mapinfo[0] + dataset.RasterXSize/mapinfo[1]
    metadata['ext_dict']['yMin'] = mapinfo[3] + dataset.RasterYSize/mapinfo[5]
    metadata['ext_dict']['yMax'] = mapinfo[3]

    metadata['extent'] = (metadata['ext_dict']['xMin'],metadata['ext_dict']['xMax'],
                          metadata['ext_dict']['yMin'],metadata['ext_dict']['yMax'])

    if metadata['bands'] == 1:
        raster = dataset.GetRasterBand(1)
        metadata['noDataValue'] = raster.GetNoDataValue()
        metadata['scaleFactor'] = raster.GetScale()

        # band statistics
        metadata['bandstats'] = {} # make a nested dictionary to store band stats in same 
        stats = raster.GetStatistics(True,True)
        metadata['bandstats']['min'] = round(stats[0],2)
        metadata['bandstats']['max'] = round(stats[1],2)
        metadata['bandstats']['mean'] = round(stats[2],2)
        metadata['bandstats']['stdev'] = round(stats[3],2)

        array = dataset.GetRasterBand(1).ReadAsArray(0,0,
                                                     metadata['array_cols'],
                                                     metadata['array_rows']).astype(np.float)
        array[array==int(metadata['noDataValue'])]=np.nan
        array = array/metadata['scaleFactor']
        return array, metadata

    elif metadata['bands'] > 1:
        print('More than one band ... need to modify function for case of multiple bands')
  • crown_geometric_volume_pth: function to get tree crown volumn.
def crown_geometric_volume_pth(tree_data,min_tree_height,pth):
    p = np.percentile(tree_data, pth)
    tree_data_pth = [v if v < p else p for v in tree_data]
    crown_geometric_volume_pth = np.sum(tree_data_pth - min_tree_height)
    return crown_geometric_volume_pth, p
  • get_predictors: function to get the trees from the biomass data.
def get_predictors(tree,chm_array, labels):
    indexes_of_tree = np.asarray(np.where(labels==tree.label)).T
    tree_crown_heights = chm_array[indexes_of_tree[:,0],indexes_of_tree[:,1]]
    
    full_crown = np.sum(tree_crown_heights - np.min(tree_crown_heights))
    
    crown50, p50 = crown_geometric_volume_pth(tree_crown_heights,tree.min_intensity,50)
    crown60, p60 = crown_geometric_volume_pth(tree_crown_heights,tree.min_intensity,60)
    crown70, p70 = crown_geometric_volume_pth(tree_crown_heights,tree.min_intensity,70)
    
        
    return [tree.label,
            np.float(tree.area),
            tree.major_axis_length,
            tree.max_intensity,
            tree.min_intensity, 
            p50, p60, p70,
            full_crown, crown50, crown60, crown70]

Canopy Height Data

With everything set up, we can now start working with our data by define the file path to our CHM file. Note that you will need to change this and subsequent filepaths according to your local machine.

chm_file = data_path+'NEON_D17_SJER_DP3_256000_4106000_CHM.tif'

When we output the results, we will want to include the same file information as the input, so we will gather the file name information.

#Get info from chm file for outputting results
just_chm_file = os.path.basename(chm_file)
just_chm_file_split = just_chm_file.split(sep="_")

Now we will get the CHM data...

chm_array, chm_array_metadata = raster2array(chm_file)

..., plot it, and save the figure.

#Plot the original CHM
plt.figure(1)

#Plot the CHM figure
plot_band_array(chm_array,chm_array_metadata['extent'],
                'Canopy height Model',
                'Canopy height (m)',
                'Greens',[0, 9])
plt.savefig(data_path+just_chm_file[0:-4]+'_CHM.png',dpi=300,orientation='landscape',
            bbox_inches='tight',
            pad_inches=0.1)

png

It looks like SJER primarily has low vegetation with scattered taller trees.

Create Filtered CHM

Now we will use a Gaussian smoothing kernal (convolution) across the data set to remove spurious high vegetation points. This will help ensure we are finding the treetops properly before running the watershed segmentation algorithm.

For different forest types it may be necessary to change the input parameters. Information on the function can be found in the SciPy documentation.

Of most importance are the second and fifth inputs. The second input defines the standard deviation of the Gaussian smoothing kernal. Too large a value will apply too much smoothing, too small and some spurious high points may be left behind. The fifth, the truncate value, controls after how many standard deviations the Gaussian kernal will get cut off (since it theoretically goes to infinity).

#Smooth the CHM using a gaussian filter to remove spurious points
chm_array_smooth = ndi.gaussian_filter(chm_array,2,
                                       mode='constant',cval=0,truncate=2.0)
chm_array_smooth[chm_array==0] = 0 

Now save a copy of filtered CHM. We will later use this in our code, so we'll output it into our data directory.

#Save the smoothed CHM
array2raster(data_path+'chm_filter.tif',
             (chm_array_metadata['ext_dict']['xMin'],chm_array_metadata['ext_dict']['yMax']),
             1,-1,
             np.array(chm_array_smooth,dtype=float),
             32611)

Determine local maximums

Now we will run an algorithm to determine local maximums within the image. Setting indices to 'False' returns a raster of the maximum points, as opposed to a list of coordinates. The footprint parameter is an area where only a single peak can be found. This should be approximately the size of the smallest tree. Information on more sophisticated methods to define the window can be found in Chen (2006).

#Calculate local maximum points in the smoothed CHM
local_maxi = peak_local_max(chm_array_smooth,indices=False, footprint=np.ones((5, 5)))

Our new object local_maxi is an array of boolean values where each pixel is identified as either being the local maximum (True) or not being the local maximum (False).

local_maxi
array([[False, False, False, ..., False, False, False],
       [False, False,  True, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

This is very helpful, but it can be difficult to visualizee boolean values using our typical numeric plotting procedures as defined in the plot_band_array function above. Therefore, we will need to convert this boolean array to an numeric format to use this function. Booleans convert easily to integers with values of False=0 and True=1 using the .astype(int) method.

local_maxi.astype(int)
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 1, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

Next ,we can plot the raster of local maximums bo coercing the boolean array into an array ofintegers inline. The following figure shows the difference in finding local maximums for a filtered vs. non-filtered CHM.

We will save the graphics (.png) in an outputs folder sister to our working directory and data outputs (.tif) to our data directory.

#Plot the local maximums
plt.figure(2)
plot_band_array(local_maxi.astype(int),chm_array_metadata['extent'],
                'Maximum',
                'Maxi',
                'Greys',
                [0, 1])

plt.savefig(data_path+just_chm_file[0:-4]+ '_Maximums.png',
            dpi=300,orientation='landscape',
            bbox_inches='tight',pad_inches=0.1)

array2raster(data_path+'maximum.tif',
             (chm_array_metadata['ext_dict']['xMin'],chm_array_metadata['ext_dict']['yMax']),
             1,-1,np.array(local_maxi,dtype=np.float32),32611)

png

If we were to look at the overlap between the tree crowns and the local maxima from each method, it would appear a bit like this raster.

The difference in finding local maximums for a filtered vs. non-filtered CHM. Source: National Ecological Observatory Network (NEON)

Apply labels to all of the local maximum points

#Identify all the maximum points
markers = ndi.label(local_maxi)[0]

Next we will create a mask layer of all of the vegetation points so that the watershed segmentation will only occur on the trees and not extend into the surrounding ground points. Since 0 represent ground points in the CHM, setting the mask to 1 where the CHM is not zero will define the mask

#Create a CHM mask so the segmentation will only occur on the trees
chm_mask = chm_array_smooth
chm_mask[chm_array_smooth != 0] = 1

Watershed segmentation

As in a river system, a watershed is divided by a ridge that divides areas. Here our watershed are the individual tree canopies and the ridge is the delineation between each one.

A raster classified based on watershed segmentation. Source: National Ecological Observatory Network (NEON)

Next, we will perform the watershed segmentation which produces a raster of labels.

#Perfrom watershed segmentation        
labels = watershed(chm_array_smooth, markers, mask=chm_mask)
labels_for_plot = labels.copy()
labels_for_plot = np.array(labels_for_plot,dtype = np.float32)
labels_for_plot[labels_for_plot==0] = np.nan
max_labels = np.max(labels)

/opt/anaconda3/envs/py37/lib/python3.7/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function ``watershed`` is deprecated and will be removed in version 0.19. Use ``skimage.segmentation.watershed`` instead.
  def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
#Plot the segments      
plot_band_array(labels_for_plot,chm_array_metadata['extent'],
                'Crown Segmentation','Tree Crown Number',
                'Spectral',[0, max_labels])

plt.savefig(data_path+just_chm_file[0:-4]+'_Segmentation.png',
            dpi=300,orientation='landscape',
            bbox_inches='tight',pad_inches=0.1)

array2raster(data_path+'labels.tif',
             (chm_array_metadata['ext_dict']['xMin'],
              chm_array_metadata['ext_dict']['yMax']),
             1,-1,np.array(labels,dtype=float),32611)

png

Now we will get several properties of the individual trees will be used as predictor variables.

#Get the properties of each segment
tree_properties = regionprops(labels,chm_array)

Now we will get the predictor variables to match the (soon to be loaded) training data using the function defined above. The first column will be segment IDs, the rest will be the predictor variables.

predictors_chm = np.array([get_predictors(tree, chm_array, labels) for tree in tree_properties])
X = predictors_chm[:,1:]
tree_ids = predictors_chm[:,0]

Training data

We now bring in the training data file which is a simple CSV file with no header. The first column is biomass, and the remaining columns are the same predictor variables defined above. The tree diameter and max height are defined in the NEON vegetation structure data along with the tree DBH. The field validated values are used for training, while the other were determined from the CHM and camera images by manually delineating the tree crowns and pulling out the relevant information from the CHM.

Biomass was calculated from DBH according to the formulas in Jenkins et al. (2003).

If you didn't download this training dataset above, you can Download the training dataset CSV here.

#Define the file of training data  
training_data_file = data_path+'SJER_Biomass_Training.csv'

#Read in the training data from a CSV file
training_data = np.genfromtxt(training_data_file,delimiter=',') 

#Grab the biomass (Y) from the first line
biomass = training_data[:,0]

#Grab the biomass prdeictors from the remaining lines
biomass_predictors = training_data[:,1:12]
    

Random Forest classifiers

We can then define parameters of the Random Forest classifier and fit the predictor variables from the training data to the Biomass estaimtes.

#Define paraemters for Random forest regressor
max_depth = 30

#Define regressor rules
regr_rf = RandomForestRegressor(max_depth=max_depth, random_state=2)

#Fit the biomass to regressor variables
regr_rf.fit(biomass_predictors,biomass)

RandomForestRegressor(max_depth=30, random_state=2)

We now apply the Random Forest model to the predictor variables to retreive biomass

#Apply the model to the predictors
estimated_biomass = regr_rf.predict(X)

For outputting a raster, copy the labels raster to a biomass raster, then cycle through the segments and assign the biomass estimate to each individual tree segment.

#Set an out raster with the same size as the labels
biomass_map =  np.array((labels),dtype=float)
#Assign the appropriate biomass to the labels
biomass_map[biomass_map==0] = np.nan
for tree_id, biomass_of_tree_id in zip(tree_ids, estimated_biomass):
    biomass_map[biomass_map == tree_id] = biomass_of_tree_id  

Calc Biomass

Collect some of the biomass statistics and then plot the results and save an output geotiff.

#Get biomass stats for plotting
mean_biomass = np.mean(estimated_biomass)
std_biomass = np.std(estimated_biomass)
min_biomass = np.min(estimated_biomass)
sum_biomass = np.sum(estimated_biomass)

print('Sum of biomass is ',sum_biomass,' kg')

#Plot the biomass!
plt.figure(5)
plot_band_array(biomass_map,chm_array_metadata['extent'],
                'Biomass (kg)','Biomass (kg)',
                'winter',
                [min_biomass+std_biomass, mean_biomass+std_biomass*3])

plt.savefig(data_path+just_chm_file_split[0]+'_'+just_chm_file_split[1]+'_'+just_chm_file_split[2]+'_'+just_chm_file_split[3]+'_'+just_chm_file_split[4]+'_'+just_chm_file_split[5]+'_'+'Biomass.png',
            dpi=300,orientation='landscape',
            bbox_inches='tight',
            pad_inches=0.1)

array2raster(data_path+'biomass.tif',
             (chm_array_metadata['ext_dict']['xMin'],chm_array_metadata['ext_dict']['yMax']),
             1,-1,np.array(biomass_map,dtype=float),32611)

Sum of biomass is  6785587.567415567  kg

png


Get Lesson Code

calc-biomass_py.ipynb

Questions?

If you have questions or comments on this content, please contact us.

Contact Us
NEON Logo

Follow Us:

Join Our Newsletter

Get updates on events, opportunities, and how NEON is being used today.

Subscribe Now

Footer

  • My Account
  • About Us
  • Newsroom
  • Contact Us
  • Terms & Conditions
  • Careers

Copyright © Battelle, 2019-2020

The National Ecological Observatory Network is a major facility fully funded by the National Science Foundation.

Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the National Science Foundation.