In this tutorial, we will calculate the biomass for a section of the SJER site. We will be using the Canopy Height Model discrete LiDAR data product as well as field data collected by the terrestrial observation group at NEON.
After completing this tutorial, you will be able to:
- Learn how to apply a guassian smoothing fernal for high-frequency spatial filtering
- Apply a watershed segmentation algorithm for delineating tree crowns
- Calculate biomass predictor variables from a CHM
- Setup training data for Biomass predictions
- Apply a Random Forest machine learning approach to calculate biomass
Install Python Packages
If you have already downloaded the data set for the Data Institute, you have the data for this tutorial within the SJER directory. If you would like to just download the data for this tutorial use the following link.
In this tutorial, we will calculate the biomass for a section of the SJER site. We will be using the Canopy Height Model discrete LiDAR data product as well as field data collected by the TOS group at NEON. This tutorial will calculate Biomass for individual trees in the forest.
The calculation of biomass consists of four primary steps:
- Delineating individual tree crowns
- Calculating predictor variables for all individuals
- Collecting training data
- Applying a regression model to estiamte biomass from predictors
In this tutorial we will use a watershed segmentation algorithm for delineating tree crowns (step 1) and and a Random Forest (RF) machine learning algorithm for relating the predictor variables to biomass (part 4). The predictor variables were selected following suggestions by Gleason et al. (2012) and biomass estimates were determined from DBH (diamter at breast height) measurements following relationships given in Jenkins et al. (2003).
First, we will import several of the typical libraries.
import numpy as np import os import gdal, osr import matplotlib.pyplot as plt import sys import matplotlib.pyplot as plt from scipy import ndimage as ndi %matplotlib inline
Next we will add libraries from skilearn which will help with the watershed delineation, determination of predictor variables and random forest algorithm.
#Import biomass specific libraries from skimage.morphology import watershed from skimage.feature import peak_local_max from skimage.measure import regionprops from sklearn.ensemble import RandomForestRegressor
Define a function that will allow us to plot our spatial data.
#Define plot band array function def plot_band_array(band_array,image_extent,title,cmap_title,colormap,colormap_limits): plt.imshow(band_array,extent=image_extent) cbar = plt.colorbar(); plt.set_cmap(colormap); plt.clim(colormap_limits) cbar.set_label(cmap_title,rotation=270,labelpad=20) plt.title(title); ax = plt.gca() ax.ticklabel_format(useOffset=False, style='plain') rotatexlabels = plt.setp(ax.get_xticklabels(),rotation=90)
Define a function that will allow us to output GeoTIFF files.
def array2raster(newRasterfn,rasterOrigin,pixelWidth,pixelHeight,array,epsg): cols = array.shape rows = array.shape originX = rasterOrigin originY = rasterOrigin driver = gdal.GetDriverByName('GTiff') outRaster = driver.Create(newRasterfn, cols, rows, 1, gdal.GDT_Float32) outRaster.SetGeoTransform((originX, pixelWidth, 0, originY, 0, pixelHeight)) outband = outRaster.GetRasterBand(1) outband.WriteArray(array) outRasterSRS = osr.SpatialReference() outRasterSRS.ImportFromEPSG(epsg) outRaster.SetProjection(outRasterSRS.ExportToWkt()) outband.FlushCache()
Now we will define the file path to our CHM file.
chm_file = 'C:/RSDI_2017/Day4/Biomass/NEON_D17_SJER_DP3_256000_4106000_CHM.tif'
We will want to output the results with the same file information as the input, so we will gather the file name information.
#Get info from chm file for outputting results just_chm_file = os.path.basename(chm_file) just_chm_file_split = just_chm_file.split(sep="_")
#Open the CHM file with GDAL chm_dataset = gdal.Open(chm_file) #Get the raster band object chm_raster = chm_dataset.GetRasterBand(1) #Get the NO DATA value noDataVal_chm = chm_raster.GetNoDataValue() #Get required metadata from CHM file cols_chm = chm_dataset.RasterXSize rows_chm = chm_dataset.RasterYSize bands_chm = chm_dataset.RasterCount mapinfo_chm =chm_dataset.GetGeoTransform() xMin = mapinfo_chm yMax = mapinfo_chm xMax = xMin + chm_dataset.RasterXSize/mapinfo_chm yMin = yMax + chm_dataset.RasterYSize/mapinfo_chm image_extent = (xMin,xMax,yMin,yMax)
Now we will get the CHM data, plot it and save the figure.
#Plot the original CHM plt.figure(1) chm_array = chm_raster.ReadAsArray(0,0,cols_chm,rows_chm).astype(np.float) #PLot the CHM figure plot_band_array(chm_array,image_extent,'Canopy height Model','Canopy height (m)','Greens',[0, 9]) plt.savefig(just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+'CHM.png',dpi=300,orientation='landscape',bbox_inches='tight',pad_inches=0.1)
Now we will run a Gaussian smoothing kernal (convolution) across the dataset to remove spurious high vegetation points. This will help ensure we are finding the treetops properly before running the watershed segmentation algorithm. For different forest types it may be necessary to change the input parameters. Information on the function can be found at SciPy Docs.
Of most importance are the second and fourth inputs. The second input defines the standard deviation of the Gaussian smoothing kernal. Too large a value will apply too much smoothing, to small and some spurious high points may be left behind. The truncate value controls after how many standard deviations the Gaussian kernal will get cut off (since it theoretically goes to infinity).
#Smooth the CHM using a gaussian filter to remove spurious points chm_array_smooth = ndi.gaussian_filter(chm_array,2,mode='constant',cval=0,truncate=2.0) chm_array_smooth[chm_array==0] = 0
Now save a copy of filtered CHM.
#Save the smoothed CHM array2raster('C:/RSDI_2017/Day4/Biomass/chm_filter.tif',(xMin,yMax),1,-1,np.array(chm_array_smooth/10000,dtype=float),32611)
Now we will run an algorithm to determine local maximums within the image. Setting indices to 'False' returns a raster of the maximum points, as opposed to a list of coordinates. The footprint parameter is an area where only a single peak can be found. This should be approximately the size of the smallest tree. Information on more sophisticated methods to define the window can be found in Chen (2006).
#Calculate local maximum points in the smoothed CHM local_maxi = peak_local_max(chm_array_smooth,indices=False, footprint=np.ones((5, 5)))
Plot the raster of local maximums. The following figure shows the difference in finding local maximums for a filtered vs. non-filtered CHM.
#Plot the local maximums plt.figure(2) plot_band_array(local_maxi,image_extent,'Maximum','Maxi','Greys',[0, 1]) plt.savefig(just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+'Maximums.png',dpi=300,orientation='landscape',bbox_inches='tight',pad_inches=0.1)
Apply labels to all of the local maximum points
#Identify all the maximum points markers = ndi.label(local_maxi)
Next we will create a mask layer of all of the vegettion points so that the watershed segmentation will only occur on the trees and not extend into the surrounding ground points. Since 0 represent ground points in the CHM, setting the mask to 1 where the CHM is not zero will define the mask.
#Create a CHM mask so the segmentation will only occur on the trees chm_mask = chm_array_smooth chm_mask[chm_array_smooth != 0] = 1
Next we will perform the watershed segmentation, which produces a raster of labels.
#Perfrom watershed segmentation labels = watershed(chm_array_smooth, markers, mask=chm_mask)
Now we will get several properties of the individual trees which are used as predictor variables.
#Get the properties of each segment tree_properties = regionprops(labels,chm_array, ['Area','BoundingBox','Centroid','Orientation','MajorAxisLength','MinorAxisLength','MaxIntensity','MinIntensity'])
It was found that occasionally the segmenting skippen an integer number. We want to be able to match our segments to the trees in later steps, so we will create an array with only the segment numbers used.
#Determine how many individual trees were identified max_labels = labels.max() segment_labels = np.zeros(max_labels+1) segment_id = np.zeros(max_labels+1) for counter in range (1,max_labels+1): segment_labels[counter] = len(labels[labels==counter]) segment_id[counter]=counter #Remove the non-zero elements segment_id = segment_id[np.nonzero(segment_labels)]
Next we will save the segments as a GeoTIFF and plot them.
#Change the lebels to flow and plot them and save as raster labels = np.array((labels),dtype=float) plt.figure(3) array2raster('C:/RSDI_2017/Day4/Biomass/SegmentedData.tif',(xMin,yMax),1,-1,labels,32611) #Change the zero labels to nans so they won't show up in the plot labels[labels==0] = np.nan #Plot the segments plot_band_array(labels,image_extent,'Crown Segmentation','Tree Crown Number','Spectral',[0, max_labels]) plt.savefig(just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+'Segmentation.png',dpi=300,orientation='landscape',bbox_inches='tight',pad_inches=0.1)
Now we will define the predictor variables and begin to fill out their values.
#Define several of the predictor variables area=np.zeros(len(tree_properties)) diameter=np.zeros(len(tree_properties)) max_tree_height=np.zeros(len(tree_properties)) min_tree_height=np.zeros(len(tree_properties)) #Retreive the predictor variables from the region properties for counter in range(0,len(tree_properties)): area[counter] = tree_properties[counter]['Area'] diameter[counter] = tree_properties[counter]['MajorAxisLength'] max_tree_height[counter] = tree_properties[counter]['MaxIntensity'] min_tree_height[counter] = tree_properties[counter]['MinIntensity']
Now we will define the remaining predictor variables
#Define the remaining predictor variables crown_geometric_volume_full=np.zeros(len(segment_id)) crown_geometric_volume_50th_percentile=np.zeros(len(segment_id)) crown_geometric_volume_60th_percentile=np.zeros(len(segment_id)) crown_geometric_volume_70th_percentile=np.zeros(len(segment_id)) percentile_50th=np.zeros(len(segment_id)) percentile_60th=np.zeros(len(segment_id)) percentile_70th=np.zeros(len(segment_id))
We will now run through a loop of all tree segments and gather the remaining predictor variables which include height percentiles and crown geometric volume percentiles. Inside the loop, we use logical indexing to retrieve each individual tree. We then calculate our predictor variables of interest.
#Cycle through all of the tree segments counter=0 for segment in segment_id: #Pull out the tree of interest indexes_of_tree = np.asarray(np.where(labels==segment)).T tree_data = chm_array[indexes_of_tree[:,0],indexes_of_tree[:,1]] #Calculate the geometric volume crown_geometric_volume_full[counter]=np.sum([tree_data-np.min(tree_data)]) #Pull out 50th percentile stats percentile_50th[counter]=np.percentile(tree_data,50) tree_data_50th = chm_array[indexes_of_tree[:,0],indexes_of_tree[:,1]] tree_data_50th[tree_data_50th>percentile_50th[counter]] = percentile_50th[counter] crown_geometric_volume_50th_percentile[counter]=np.sum([tree_data_50th-min_tree_height[counter]]) #Pull out 60th percentile stats percentile_60th[counter]=np.percentile(tree_data,60) tree_data_60th = chm_array[indexes_of_tree[:,0],indexes_of_tree[:,1]] tree_data_60th[tree_data_60th>percentile_60th[counter]] = percentile_60th[counter] crown_geometric_volume_60th_percentile[counter]=np.sum([tree_data_60th-min_tree_height[counter]]) #Pull out 60th percentile stats percentile_70th[counter]=np.percentile(tree_data,70) tree_data_70th = chm_array[indexes_of_tree[:,0],indexes_of_tree[:,1]] tree_data_70th[tree_data_70th>percentile_70th[counter]] = percentile_70th[counter] crown_geometric_volume_70th_percentile[counter]=np.sum([tree_data_70th-min_tree_height[counter]]) counter=counter+1
We now bring in the training data file which is a simple .csv file with no header. The first column is biomass, and the remaining columns are the same predictor variables defined above. The tree diameter and max height were defined in the TOS data along with the DBH. The field validated values are used for training, while the other were determined from the CHM and camera images by manually delineating the tree crowns and pulling out the relevant information from the CHM. Biomass was calculated from DBH accordaing to the formulas in Jenkins et al. (2003).
#Define the file of training data training_data_file = 'C:/RSDI_2017/Day4/Biomass/training/SJER_Biomass_Training.csv' #Read in the training data from a CSV file training_data = np.genfromtxt(training_data_file,delimiter=',') #Grab the biomass (Y) from the first line biomass = training_data[:,0] #Grab the biomass prdeictors from the remaining lines biomass_predictors = training_data[:,1:12]
We then define paraemters of the Random Forest classifier and fit the predictor variables from the training data to the Biomass estaimtes.
#Define paraemters for Random forest regressor max_depth = 30 #Define regressor rules regr_rf = RandomForestRegressor(max_depth=max_depth, random_state=2) #Fit the biomass to regressor variables regr_rf.fit(biomass_predictors,biomass)
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=30, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1, oob_score=False, random_state=2, verbose=0, warm_start=False)
Now we will gather the predictor variables gathered from all the segmented trees into a single array.
#Stack the predictor variables for all the individual trees all_training_data = np.stack([area,diameter,max_tree_height,min_tree_height,percentile_50th,percentile_60th,percentile_70th,crown_geometric_volume_full,crown_geometric_volume_50th_percentile,crown_geometric_volume_60th_percentile,crown_geometric_volume_70th_percentile],axis=-1)
We know apply the Random Forest model to the predictor variables to retrieve biomass.
#Apply the model to the pred_biomass = regr_rf.predict(all_training_data)
For outputting a raster, copy the labels raster to a biomass raster, then cycle through the segments and assign the biomass estaimte to each individual tree segment.
#Set an out raster with the same size as the labels biomass_out = labels #Set counter to zero counter = 0 #Assign each tree by the associated biomass for segment in segment_id: biomass_out[biomass_out==segment] = pred_biomass[counter] counter = counter+1
Collect some of the biomass statistics and then plot the results and save an output GeoTIFF.
#Get biomass stats for plotting mean_biomass = np.mean(pred_biomass) std_biomass = np.std(pred_biomass) min_biomass = np.min(pred_biomass) sum_biomass = np.sum(pred_biomass) print('Sum of biomass is ',sum_biomass,' kg') #Plot the biomass! plt.figure(5) plot_band_array(biomass_out,image_extent,'Biomass (kg)','Biomass (kg)','winter',[min_biomass+std_biomass, mean_biomass+std_biomass*3]) plt.savefig(just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+just_chm_file_split+'_'+'Biomass.png',dpi=300,orientation='landscape',bbox_inches='tight',pad_inches=0.1) array2raster('biomass.tif',(xMin,yMax),1,-1,np.array(biomass_out,dtype=float),32611)
Sum of biomass is 6978251.34548 kg
If you have questions or comments on this content, please contact us.Contact Us