Softmax Regression Background

Softmax Regression is a generalization of logistic regression used for multi-class classification where the classes are mutually exclusive. An example is classifying an image into four different classes such as cloud, water, asphalt, and vegetation.

A logistic regression model is the simplest form of a neural network. It consists of an input layer with multiple attributes plus a bias unit, and a single output unit, so it is essentially a binary classifier. A weight (θ) is computed for each attribute using stochastic gradient descent. An activation function then determines whether the input belongs to the output class or not.
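The single-output model above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation; the toy data, learning rate, and helper names (`sigmoid`, `sgd_step`) are assumptions for the example.

```python
import math

def sigmoid(z):
    # logistic activation: maps any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(theta, x, y, lr=0.1):
    # x includes a leading 1.0 for the bias unit; theta holds one weight per attribute
    z = sum(t * xi for t, xi in zip(theta, x))
    p = sigmoid(z)
    # gradient of the log-loss for one example is (p - y) * x
    return [t - lr * (p - y) * xi for t, xi in zip(theta, x)]

# one pass over a tiny, made-up dataset (bias term 1.0 prepended to each item)
data = [([1.0, 2.0], 1), ([1.0, -1.5], 0)]
theta = [0.0, 0.0]
for x, y in data:
    theta = sgd_step(theta, x, y)
```

A prediction is then `sigmoid(θ · x)`, thresholded at 0.5 to decide class membership.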

The Softmax Regression algorithm applies binary logistic regression to multiple classes at once. Here is an example of the Softmax Regression model with five features and three classes:

The weights (θ) were omitted from the diagram for clarity, but weights are computed for each attribute-to-class mapping.
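The attribute-to-class weight structure can be made concrete with a small sketch: one weight vector per class, each covering the five features plus the bias unit. The random initialization and variable names here are illustrative assumptions.

```python
import random

NUM_FEATURES = 5   # five attributes, plus a bias unit
NUM_CLASSES = 3

# one weight vector per class; each maps every attribute (and the bias) to that class
random.seed(0)
theta = [[random.uniform(-0.5, 0.5) for _ in range(NUM_FEATURES + 1)]
         for _ in range(NUM_CLASSES)]

def activations(x, theta):
    # x is a feature vector with a leading 1.0 bias term;
    # returns one raw (unnormalized) score per class
    return [sum(t * xi for t, xi in zip(row, x)) for row in theta]

x = [1.0, 0.2, -0.4, 1.3, 0.0, 0.7]
scores = activations(x, theta)
```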

Softmax Regression differs from a set of multiple binary logistic regression classifiers. Multiple binary classifiers are appropriate when objects can naturally belong to more than one class, such as Green, Trees, and Grass: because each classifier evaluates membership in its class independently of the others, an object might fall into both Green and Trees, or Green and Grass.
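The independence of binary classifiers can be seen numerically. In this sketch the raw scores for the three hypothetical classes are made up; the point is that each class gets its own sigmoid, so the probabilities need not sum to 1 and an item can pass the threshold for several classes at once.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical raw scores from three independent binary classifiers
scores = {"Green": 2.0, "Trees": 1.5, "Grass": -0.5}

# each class is evaluated on its own, so the probabilities do not sum to 1
probs = {c: sigmoid(z) for c, z in scores.items()}

# an item can clear the 0.5 threshold for both Green and Trees at once
labels = [c for c, p in probs.items() if p > 0.5]
```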

Softmax Regression, by contrast, was designed for multiple classes that are mutually exclusive, such as Vegetation, Bare Soil, and Impervious. It does not evaluate membership in a class independently of the other classes.

For each data item, Softmax Regression computes an activation value for each output class, then normalizes those values into a set of probabilities that sum to 1. The data item is assigned to the class with the highest probability. Ideally, one class has a probability close to 1 and the others are close to 0.
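The normalization step described above is the softmax function itself. A minimal sketch, with made-up activation values for three classes (subtracting the maximum score before exponentiating is a standard trick to avoid overflow and does not change the result):

```python
import math

def softmax(scores):
    # subtract the max for numerical stability, then exponentiate and normalize
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]          # hypothetical activations for three classes
probs = softmax(scores)           # probabilities that sum to 1
winner = probs.index(max(probs))  # the data item is assigned to this class
```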

Reference: "Softmax Regression," Unsupervised Feature Learning and Deep Learning, updated March 2013; accessed September 2016.