How to interpret the "soft" and "max" in SoftMax regression?

Asked 4/6, 2015 at 7:45 Answered 4/6, 2015 at 16:13

I know the form of softmax regression, but I am curious why it has this name. Is it just for historical reasons?

London asked 4/6, 2015 at 7:45

The maximum of two numbers, max(x, y), has a sharp corner where x = y (it is not differentiable there), which is sometimes an unwanted property, e.g. if you want to compute gradients.

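To soften the edges of max(x, y), one can use a variant with smooth edges: the softmax function. At its core it is still about picking the maximum (to be precise, it is an approximation), just smoothed out. Strictly speaking, the smooth approximation of the value max(x, y) is log(e^x + e^y), often called LogSumExp; the softmax function is its gradient and is a smooth approximation of the one-hot indicator of which argument is largest (the arg max).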

If it's still unclear, here's a good read.
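As a rough illustration (not part of the original answer), the sketch below compares the hard max with its smooth counterparts in plain Python. The function names and the temperature parameter `t` are assumptions chosen for illustration: as `t` shrinks, `logsumexp` approaches the maximum value and `softmax` approaches a one-hot vector marking the largest entry.

```python
import math

def logsumexp(xs, t=1.0):
    """Smooth approximation of max(xs); smaller temperature t means a sharper max."""
    m = max(xs)  # shift by the max for numerical stability
    return m + t * math.log(sum(math.exp((x - m) / t) for x in xs))

def softmax(xs, t=1.0):
    """Smooth approximation of the one-hot arg max of xs; entries sum to 1."""
    m = max(xs)  # the shift cancels in the normalization
    exps = [math.exp((x - m) / t) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

xs = [1.0, 2.0, 3.0]
for t in (1.0, 0.1, 0.01):
    # As t -> 0: logsumexp -> max(xs) and softmax -> [0, 0, 1]
    print(t, logsumexp(xs, t), softmax(xs, t))
```

Unlike max, `logsumexp` is differentiable everywhere, including where entries tie, and its partial derivatives are exactly the `softmax` outputs.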

Alfieri answered 4/6, 2015 at 16:13

Let's say you have a set of scalars x_i and you want to calculate a weighted sum of them, giving a weight w_i to each x_i such that the weights sum to 1 (like a discrete probability distribution). One way to do this is to set w_i = exp(a*x_i) for some positive constant a and then normalize the weights so they sum to one, i.e. w_i = exp(a*x_i) / sum_j exp(a*x_j). If a = 0, all weights are equal and you get the regular sample average. On the other hand, for a very large value of a you get the max operator: almost all of the weight goes to the largest x_i, so the weighted sum is essentially that value. Varying a therefore gives you a "soft", continuous way to go from regular averaging to selecting the max. The functional form of this weighted average should look familiar if you already know what SoftMax regression is.
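A minimal sketch of the weighted average described above, assuming a ≥ 0 as in the answer; the function name `soft_weighted_average` is hypothetical. The normalized weights are exactly the softmax of `a * x_i`, so sweeping `a` moves the result from the plain mean toward the max.

```python
import math

def soft_weighted_average(xs, a):
    """Weighted average with w_i = exp(a*x_i) / sum_j exp(a*x_j), assuming a >= 0.

    a = 0 gives the plain mean; a large positive a approaches max(xs).
    """
    m = max(xs)
    # Shifting by m leaves the normalized weights unchanged but keeps every
    # exponent <= 0, which avoids overflow for large a.
    ws = [math.exp(a * (x - m)) for x in xs]
    total = sum(ws)
    return sum(w * x for w, x in zip(ws, xs)) / total

xs = [1.0, 2.0, 3.0]
for a in (0.0, 1.0, 10.0, 100.0):
    # The result climbs from the mean (2.0) toward the max (3.0)
    print(a, soft_weighted_average(xs, a))
```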

Chore answered 4/6, 2015 at 9:55