Tutorial
Classification of Hyperspectral Data with Support Vector Machine (SVM) Using SciKit in Python
Authors: Paul Gader
Last Updated: Apr 1, 2021
In this tutorial, we will learn to classify spectral data using the Support Vector Machine (SVM) method.
Objectives
After completing this tutorial, you will be able to:
- Classify spectral remote sensing data using Support Vector Machine (SVM).
Install Python Packages
- numpy
- matplotlib
- scipy
- scikit-learn
Download Data
Download the spectral classification teaching data subset: Download Dataset
Additional Materials
This tutorial was prepared in conjunction with a presentation on spectral classification that can be downloaded:
- Download Dr. Paul Gader's Classification 1 PPT
- Download Dr. Paul Gader's Classification 2 PPT
- Download Dr. Paul Gader's Classification 3 PPT
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy import linalg
from scipy import io
from sklearn import linear_model as lmd
Note that you will need to update these filepaths according to your local machine.
InFile1 = '/Users/olearyd/Git/data/RSDI2017-Data-SpecClass/LinSepC1.mat'
InFile2 = '/Users/olearyd/Git/data/RSDI2017-Data-SpecClass/LinSepC2.mat'
C1Dict = io.loadmat(InFile1)
C2Dict = io.loadmat(InFile2)
C1 = C1Dict['LinSepC1']
C2 = C2Dict['LinSepC2']
NSampsClass = 200
NSamps = 2*NSampsClass
### Set Target Outputs ###
TargetOutputs = np.ones((NSamps,1))
TargetOutputs[NSampsClass:NSamps] = -TargetOutputs[NSampsClass:NSamps]
AllSamps = np.concatenate((C1,C2),axis=0)
AllSamps.shape
(400, 2)
#import sklearn
#sklearn.__version__
# In a Jupyter notebook, appending "?" displays a method's documentation:
# lmd.LinearRegression.fit?
M = lmd.LinearRegression()
print(M)
LinearRegression()
LinMod = M.fit(AllSamps, TargetOutputs)
R = LinMod.score(AllSamps, TargetOutputs)
print(R)
0.9112691769822485
LinMod
LinearRegression()
w = LinMod.coef_
w
array([[0.81592447, 0.94178188]])
w0 = LinMod.intercept_
w0
array([-0.01663028])
### Question: How would we compute the outputs of the regression model?
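One possible answer, as a minimal sketch using the fitted model above: call predict, or apply the learned weights and intercept directly.
TrainOuts = LinMod.predict(AllSamps)        # model outputs for the training samples
ManualOuts = AllSamps @ w.T + w0            # the same outputs computed by hand: Xw^T + w0
print(np.allclose(TrainOuts, ManualOuts))   # the two should agree to numerical precision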
Kernels
Now we'll use support vector machines (SVMs) for classification. By default, scikit-learn's SVC uses a radial basis function (RBF) kernel.
from sklearn.svm import SVC
### SVC wants a 1d array, not a column vector
Targets = np.ravel(TargetOutputs)
InitSVM = SVC()
InitSVM
SVC()
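For reference, other kernels can be selected explicitly; a sketch (the parameter values are illustrative, not tuned):
# InitSVM = SVC(kernel='linear')              # linear kernel
# InitSVM = SVC(kernel='poly', degree=3)      # polynomial kernel
# InitSVM = SVC(kernel='rbf', gamma='scale')  # RBF kernel (the default)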
TrainedSVM = InitSVM.fit(AllSamps, Targets)
y = TrainedSVM.predict(AllSamps)
plt.figure(1)
plt.plot(y)
plt.show()
d = TrainedSVM.decision_function(AllSamps)
plt.figure(1)
plt.plot(d)
plt.show()
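As a quick sanity check (a sketch using the arrays defined above), we can measure the fraction of training samples the SVM classifies correctly:
print(TrainedSVM.score(AllSamps, Targets))   # training accuracy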
Include Outliers
We can also try it with outliers.
Let's start by looking at some spectra.
### Look at some Pine and Oak spectra from
### NEON Site D03 Ordway-Swisher Biological Station
### at UF
### Pinus palustris
### Quercus virginiana
InFile1 = '/Users/olearyd/Git/data/RSDI2017-Data-SpecClass/Pines.mat'
InFile2 = '/Users/olearyd/Git/data/RSDI2017-Data-SpecClass/Oaks.mat'
C1Dict = io.loadmat(InFile1)
C2Dict = io.loadmat(InFile2)
Pines = C1Dict['Pines']
Oaks = C2Dict['Oaks']
WvFile = '/Users/olearyd/Git/data/RSDI2017-Data-SpecClass/NEONWvsNBB.mat'
WvDict = io.loadmat(WvFile)
Wv = WvDict['NEONWvsNBB']
Pines.shape
(809, 346)
Oaks.shape
(1731, 346)
NBands=Wv.shape[0]
print(NBands)
346
Notice that the classes are unbalanced: 809 pine spectra versus 1731 oak spectra. Below, we draw 600 training samples from each class so that the training set itself is balanced.
NTrainSampsClass = 600
NTestSampsClass = 200
Targets = np.ones((1200,1))
Targets[range(600)] = -Targets[range(600)]
Targets = np.ravel(Targets)
print(Targets.shape)
(1200,)
plt.figure(111)
plt.plot(Targets)
plt.show()
TrainPines = Pines[0:600,:]
TrainOaks = Oaks[0:600,:]
#TrainSet = np.concatenate?
TrainSet = np.concatenate((TrainPines, TrainOaks), axis=0)
print(TrainSet.shape)
(1200, 346)
plt.figure(3)
### Plot Pine Training Spectra ###
plt.subplot(121)
plt.plot(Wv, TrainPines.T)
plt.ylim((0.0,0.8))
plt.xlim((Wv[0], Wv[NBands-1]))
### Plot Oak Training Spectra ###
plt.subplot(122)
plt.plot(Wv, TrainOaks.T)
plt.ylim((0.0,0.8))
plt.xlim((Wv[0], Wv[NBands-1]))
plt.show()
InitSVM= SVC()
TrainedSVM=InitSVM.fit(TrainSet, Targets)
d = TrainedSVM.decision_function(TrainSet)
print(d)
[-0.26050536 -0.45009774 -0.4508219 ... 1.70930028 1.79781222
1.66711708]
plt.figure(4)
plt.plot(d)
plt.show()
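The sign of the decision value is the predicted class, so we can check the training accuracy directly (a quick sketch):
print(np.mean(np.sign(d) == Targets))   # fraction of training spectra on the correct side of the boundary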
Does this seem to be too good to be true?
TestPines = Pines[600:800,:]
TestOaks = Oaks[600:800,:]
TestSet = np.concatenate((TestPines, TestOaks), axis=0)
print(TestSet.shape)
(400, 346)
dtest = TrainedSVM.decision_function(TestSet)
plt.figure(5)
plt.plot(dtest)
plt.show()
Yes, too good to be true: the classifier separates the training spectra almost perfectly, but the test spectra are separated far less cleanly. What can we do?
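To quantify this, here is a sketch that assumes the test labels are built the same way as the training labels (pines first, labeled -1; oaks second, labeled +1):
TestTargets = np.ones(2*NTestSampsClass)
TestTargets[0:NTestSampsClass] = -TestTargets[0:NTestSampsClass]
print(TrainedSVM.score(TestSet, TestTargets))   # test accuracy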
Error Analysis
Error analysis can be used to identify the characteristics of the misclassified samples. You could also tune the SVM's hyperparameters (the "magic numbers" such as C and gamma) using cross-validation. Stay tuned for a tutorial on this topic.
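For instance, a minimal cross-validation sketch (the parameter grid below is illustrative, not a recommendation):
from sklearn.model_selection import GridSearchCV
ParamGrid = {'C': [0.1, 1, 10, 100], 'gamma': ['scale', 0.01, 0.001]}
Search = GridSearchCV(SVC(), ParamGrid, cv=5)
Search.fit(TrainSet, Targets)
print(Search.best_params_, Search.best_score_)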