## Support Vector Machine Background for Feature Extraction

Support Vector Machine (SVM) is a supervised classification method derived from statistical learning theory that often yields good classification results from complex and noisy data. It separates the classes with a decision surface that maximizes the margin between the classes. The surface is often called the optimal hyperplane, and the data points closest to the hyperplane are called support vectors. The support vectors are the critical elements of the training set because they alone determine the position of the hyperplane.
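To make the margin idea concrete, the following Python sketch computes the distance from each point to a given hyperplane `w·x + b = 0` and reports the closest points. Those points would be the support vectors if that hyperplane were the optimal one. The sketch does not train an SVM: the weight vector, bias, and sample points are made-up values chosen so that the hyperplane `x1 = 0` happens to be the maximum-margin separator, and the function names are hypothetical, not part of ENVI.

```python
import math

def distances_to_hyperplane(w, b, points):
    """Geometric distance |w.x + b| / ||w|| from each point to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return [abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w for x in points]

def closest_points(w, b, points, tol=1e-9):
    """Return (indices, distance) for the points nearest the hyperplane."""
    dists = distances_to_hyperplane(w, b, points)
    nearest = min(dists)
    return [i for i, d in enumerate(dists) if d - nearest <= tol], nearest

# Made-up data: class -1 lies at x1 <= -1 and class +1 at x1 >= 1, so x1 = 0 separates them.
points = [(-1, 0), (-2, 1), (-3, -1), (1, 0), (2, -1), (3, 2)]
w, b = (1.0, 0.0), 0.0
indices, half_margin = closest_points(w, b, points)
print(indices, half_margin)              # [0, 3] 1.0
print("margin width:", 2 * half_margin)  # margin width: 2.0
```

Moving any point other than indices 0 and 3 would not change the margin, which is why only the support vectors matter.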

You can adapt SVM to become a nonlinear classifier through the use of nonlinear kernels. While SVM is a binary classifier in its simplest form, it can function as a multiclass classifier by combining several binary SVM classifiers (creating a binary classifier for each possible pair of classes). ENVI’s implementation of SVM uses the pairwise classification strategy for multiclass classification.
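The pairwise (one-against-one) strategy can be sketched as majority voting: for k classes, train k(k-1)/2 binary classifiers, let each one vote for one of its two classes, and assign the class with the most votes. In the Python sketch below, the binary classifier is a hypothetical stand-in (it picks whichever of the two classes has the nearer 1-D center) rather than a trained SVM, the class names and centers are invented, and ties are broken by class order. ENVI's actual tie-breaking and probability estimation (see Wu et al. in References) may differ.

```python
from collections import Counter
from itertools import combinations

def predict_pairwise(x, classes, binary_decision):
    """Majority vote over every pair of classes.

    binary_decision(a, b, x) must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_decision(a, b, x)] += 1
    best = max(votes[c] for c in classes)
    # Tie-break (an assumption of this sketch): the earliest class in `classes` wins.
    return next(c for c in classes if votes[c] == best)

# Hypothetical stand-in for a trained binary SVM: choose the class whose
# 1-D center is closer to x.
CENTERS = {"water": 0.1, "vegetation": 0.5, "urban": 0.9}

def nearest_center(a, b, x):
    return a if abs(x - CENTERS[a]) <= abs(x - CENTERS[b]) else b

classes = list(CENTERS)
print(len(list(combinations(classes, 2))))              # 3 binary classifiers for 3 classes
print(predict_pairwise(0.45, classes, nearest_center))  # vegetation
print(predict_pairwise(0.95, classes, nearest_center))  # urban
```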

SVM includes a penalty parameter that allows a certain degree of misclassification, which is particularly important for non-separable training sets. The penalty parameter controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft margin that permits some misclassifications; that is, it allows some training points to fall on the wrong side of the hyperplane. Increasing the value of the penalty parameter increases the cost of misclassifying points and forces the creation of a model that fits the training data more closely but may not generalize well.
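One standard way to write this trade-off is the soft-margin primal objective `0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b))`, where `C` is the penalty parameter and the hinge term is zero for points on the correct side of the margin. The Python sketch below evaluates this objective for a fixed, made-up hyperplane and three made-up training points with labels ±1. It does not perform the optimization itself, and the function names are hypothetical.

```python
def hinge_losses(w, b, X, y):
    """max(0, 1 - y_i * (w.x_i + b)) for each training point; labels are +1 or -1."""
    losses = []
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        losses.append(max(0.0, 1.0 - yi * score))
    return losses

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 (rewards a wide margin) + C * total hinge loss (penalizes training errors)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    return regularizer + C * sum(hinge_losses(w, b, X, y))

w, b = (1.0, 0.0), 0.0
X = [(-1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]
y = [-1, 1, -1]  # the third point lies on the wrong side of the hyperplane

print(hinge_losses(w, b, X, y))                  # [0.0, 0.0, 1.5]
print(soft_margin_objective(w, b, X, y, C=1))    # 2.0
print(soft_margin_objective(w, b, X, y, C=100))  # 150.5
```

With `C=100`, the single misclassified point dominates the objective, so an optimizer would favor a boundary that accommodates that training point even at the cost of a narrower margin.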

1. Select the Kernel Type from the drop-down list. Depending on the option you select, additional fields may appear. Each option corresponds to a different kernel function, which determines how strongly nearby data points are weighted when estimating target classes. See References below for more information. The Radial Basis kernel type (default) works well in most cases.
2. The mathematical representation of each kernel is listed below:

- Linear: $K(x_i, x_j) = x_i^T x_j$
- Polynomial: $K(x_i, x_j) = (g\, x_i^T x_j + r)^d, \quad g > 0$
- RBF: $K(x_i, x_j) = \exp(-g \lVert x_i - x_j \rVert^2), \quad g > 0$
- Sigmoid: $K(x_i, x_j) = \tanh(g\, x_i^T x_j + r)$

where:

g is the gamma term in the kernel function for all kernel types except linear.

d is the polynomial degree term in the kernel function for the polynomial kernel.

r is the bias term in the kernel function for the polynomial and sigmoid kernels.
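The four kernel formulas translate directly into code. The following Python sketch implements them for plain tuples of attribute values, using the same g, r, and d symbols defined above. It illustrates the formulas only and is not ENVI's or LIBSVM's implementation; the function names are hypothetical.

```python
import math

def dot(a, b):
    """Inner product x_i^T x_j."""
    return sum(p * q for p, q in zip(a, b))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, g, r, d):
    if g <= 0:
        raise ValueError("g must be > 0")
    return (g * dot(xi, xj) + r) ** d

def rbf_kernel(xi, xj, g):
    if g <= 0:
        raise ValueError("g must be > 0")
    squared_distance = sum((p - q) ** 2 for p, q in zip(xi, xj))
    return math.exp(-g * squared_distance)

def sigmoid_kernel(xi, xj, g, r):
    return math.tanh(g * dot(xi, xj) + r)

xi, xj = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(xi, xj))                         # 11.0
print(polynomial_kernel(xi, xj, g=0.5, r=1.0, d=2))  # 42.25
print(rbf_kernel(xi, xj, g=0.5))                     # exp(-4), about 0.0183
print(sigmoid_kernel(xi, xj, g=0.1, r=0.0))          # tanh(1.1), about 0.8005
```

Note that the polynomial kernel with d = 1, g = 1, and r = 0 reduces to the linear kernel, and the RBF kernel of a point with itself is always 1.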

3. If the Kernel Type is Polynomial, set the Degree of Kernel Polynomial to specify the degree used for the SVM classification (the d term in the above kernel functions). The minimum value is 1, the maximum value is 6, and the default value is 2. A value of 1 represents a first-degree polynomial function, which produces a linear decision boundary (a straight line between two classes) and works well when you have two very distinctive classes. In most cases, however, you will be working with imagery that has a high degree of variation and mixed pixels. Increasing the degree lets the algorithm follow the contours between classes more closely, but you risk fitting the classification to noise.
4. If the Kernel Type is Polynomial or Sigmoid, set the Bias in Kernel Function (the r term in the above kernel functions). The default value is 1.00.
5. If the Kernel Type is Polynomial, Radial Basis Function, or Sigmoid, use the Gamma in Kernel Function field to set the gamma parameter (the g term in the above kernel functions). This is a floating-point value greater than 0.01. The default is the inverse of the number of computed attributes.
6. Specify the Penalty Parameter for the SVM algorithm to use. This is a floating-point value greater than 0.01; the default value is 100.0. As described above, the penalty parameter allows a certain degree of misclassification, which is particularly important for non-separable training sets, and controls the trade-off between allowing training errors and forcing rigid margins. Increasing this value raises the cost of misclassifying points and produces a model that fits the training data more closely but may not generalize well.
7. Use the Threshold slider to indicate your level of confidence that the closest segments of any given segment (in the segmentation image) represent the same class as that segment. Higher values mean more confidence, so only the nearest segments will be classified. As you increase the value of the Threshold slider, the Preview Window will show more unclassified segments. Lower values mean that you are unsure if the closest neighbors represent the same class, so more distant segments will be classified. As you decrease the value of the Threshold slider, the Preview Window will show fewer unclassified segments.
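The settings in steps 3 through 6 can be summarized as a small parameter object that applies the documented defaults and ranges and reports which parameters each kernel actually uses. This Python sketch is hypothetical: the class and field names are illustrative and do not correspond to an ENVI API, it checks only that gamma is positive (as the kernel formulas require) rather than enforcing the 0.01 lower bound of the Gamma in Kernel Function field, and it does not model the Threshold slider from step 7.

```python
from dataclasses import dataclass
from typing import Optional

KERNELS = ("linear", "polynomial", "rbf", "sigmoid")

@dataclass
class SVMParams:
    """Hypothetical container for the SVM settings described in steps 3-6."""
    n_attributes: int              # number of computed attributes
    kernel: str = "rbf"            # Radial Basis is the default kernel type
    degree: int = 2                # d: polynomial kernel only, 1-6
    bias: float = 1.0              # r: polynomial and sigmoid kernels
    gamma: Optional[float] = None  # g: all kernels except linear
    penalty: float = 100.0         # penalty parameter: must be greater than 0.01

    def __post_init__(self):
        if self.kernel not in KERNELS:
            raise ValueError(f"unknown kernel type: {self.kernel!r}")
        if self.n_attributes < 1:
            raise ValueError("n_attributes must be at least 1")
        if not 1 <= self.degree <= 6:
            raise ValueError("degree must be between 1 and 6")
        if self.penalty <= 0.01:
            raise ValueError("penalty must be greater than 0.01")
        if self.gamma is None:
            # Documented default: the inverse of the number of computed attributes.
            self.gamma = 1.0 / self.n_attributes
        if self.gamma <= 0:
            raise ValueError("gamma must be positive")

    def active_parameters(self):
        """Return only the parameters that the selected kernel uses."""
        params = {"penalty": self.penalty}
        if self.kernel != "linear":
            params["gamma"] = self.gamma
        if self.kernel == "polynomial":
            params["degree"] = self.degree
        if self.kernel in ("polynomial", "sigmoid"):
            params["bias"] = self.bias
        return params

print(SVMParams(n_attributes=4).active_parameters())
# {'penalty': 100.0, 'gamma': 0.25}
print(SVMParams(n_attributes=4, kernel="polynomial", degree=3).active_parameters())
# {'penalty': 100.0, 'gamma': 0.25, 'degree': 3, 'bias': 1.0}
```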

References

Chang, C.-C., and C.-J. Lin. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Hsu, C.-W., C.-C. Chang, and C.-J. Lin. (2010). A practical guide to support vector classification. National Taiwan University. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

Wu, T.-F., C.-J. Lin, and R. C. Weng. (2004). Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975-1005. http://www.csie.ntu.edu.tw/~cjlin/papers/svmprob/svmprob.pdf.