Softmax function of a numpy array by row

I am trying to apply a softmax function to a numpy array, but I am not getting the desired results. This is the code I have tried:

 import numpy as np
 x = np.array([[1001, 1002], [3, 4]])
 softmax = np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)))
 print(softmax)

I think the x - np.max(x) code is not subtracting the max of each row; it subtracts the single global max instead. The max needs to be subtracted from x to keep np.exp from overflowing on very large inputs.
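
For reference, np.exp overflows float64 once its argument exceeds about 709, so exponentiating 1001 directly just gives inf:

 np.exp(np.array([1001.0]))
 # RuntimeWarning: overflow encountered in exp
 # array([ inf])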

This is supposed to output

 np.array([
    [0.26894142, 0.73105858],
    [0.26894142, 0.73105858]])

But I am getting:

 np.array([
    [0.26894142, 0.73105858],
    [0, 0]])
Brammer answered 8/4, 2017 at 4:20 Comment(0)

A convenient way to preserve the axes that "reduce" operations such as max and sum would otherwise collapse is the keepdims keyword:

import numpy as np
x = np.array([[1001, 1002], [3, 4]])  # the array from the question

mx = np.max(x, axis=-1, keepdims=True)
mx
# array([[1002],
#        [   4]])
x - mx
# array([[-1,  0],
#        [-1,  0]])
numerator = np.exp(x - mx)
denominator = np.sum(numerator, axis=-1, keepdims=True)
denominator
# array([[ 1.36787944],
#        [ 1.36787944]])
numerator/denominator
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])
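
Putting the two keepdims reductions together, a compact helper (the name softmax_rows is just for illustration) might look like:

def softmax_rows(x):
    # shift by the row-wise max for numerical stability
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    # normalize each row by its own sum
    return e / np.sum(e, axis=-1, keepdims=True)

softmax_rows(x)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])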
Trachytic answered 8/4, 2017 at 4:42 Comment(0)

My 5-liner (which uses scipy logsumexp for the tricky bits):

import numpy as np

def softmax(a, axis=None):
    """
    Computes exp(a)/sumexp(a); relies on the scipy logsumexp implementation.
    :param a: ndarray/tensor
    :param axis: axis to sum over; default (None) sums over everything
    """
    from scipy.special import logsumexp
    lse = logsumexp(a, axis=axis)  # this reduces along axis
    if axis is not None:
        lse = np.expand_dims(lse, axis)  # restore that axis for subtraction
    return np.exp(a - lse)

You may have to use from scipy.misc import logsumexp if you have an older scipy version.
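
A quick check on the array from the question:

x = np.array([[1001, 1002], [3, 4]])
softmax(x, axis=1)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])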

Amylaceous answered 7/3, 2018 at 20:29 Comment(1)
Just beautiful. – Borrego

EDIT. As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html
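
With that version the whole question becomes a one-liner, since scipy.special.softmax accepts the same axis argument:

from scipy.special import softmax
import numpy as np

softmax(np.array([[1001, 1002], [3, 4]]), axis=1)
# array([[0.26894142, 0.73105858],
#        [0.26894142, 0.73105858]])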

I wrote a fairly general softmax function that operates over an arbitrary axis and includes the tricky max-subtraction bit. The function is below, and I wrote a blog post about it here.

import numpy as np

def softmax(X, theta=1.0, axis=None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0.
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y by the theta parameter
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis=axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis=axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1:
        p = p.flatten()

    return p
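
For example, on the array from the question, with axis=1 for a row-wise softmax:

softmax(np.array([[1001, 1002], [3, 4]]), axis=1)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])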
Waynewayolle answered 19/4, 2017 at 15:49 Comment(0)

The x - np.max(x) code is not doing row-wise subtraction. Let's do it step by step. First we make a 'maxes' array by tiling, i.e. duplicating the column of row maxima:

maxes = np.tile(np.max(x, 1), (2, 1)).T

This creates a 2x2 matrix holding the max of each row in both columns (a duplicated column, via tile). After this you can do:

 x = np.exp(x - maxes) / np.sum(np.exp(x - maxes), axis=1, keepdims=True)

You should get your result with this. keepdims=True keeps the row sums as a column vector so the division broadcasts across each row, and axis=1 gives the row-wise softmax you mentioned in the heading of your question. Hope this helps.
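
As a side note (my generalization, not part of the original code): the (2, 1) tile count hard-codes the shape. For an arbitrary 2D array you could tile by the number of columns instead:

maxes = np.tile(np.max(x, axis=1), (x.shape[1], 1)).T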

Brammer answered 8/4, 2017 at 4:26 Comment(0)

How about this?

To take the max along the rows, just specify axis=1 and then convert the result to a column vector (actually a 2D array) using np.newaxis/None.

In [40]: x
Out[40]: 
array([[1001, 1002],
       [   3,    4]])

In [41]: z = x - np.max(x, axis=1)[:, np.newaxis]

In [42]: z
Out[42]: 
array([[-1,  0],
       [-1,  0]])

In [44]: softmax = np.exp(z) / np.sum(np.exp(z), axis=1)[:, np.newaxis]

In [45]: softmax
Out[45]: 
array([[ 0.26894142,  0.73105858],
       [ 0.26894142,  0.73105858]])

In the last step, when you take the sum, again specify axis=1 to sum along the rows, and again reshape the result with [:, np.newaxis] so the division broadcasts row-wise.
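
As an extra sanity check (my own example, in the spirit of the comments below): a non-square input exercises the broadcasting properly, since square arrays can hide axis mistakes:

In [46]: w = np.array([[1001, 1002, 1003], [2, 3, 4]])

In [47]: zw = w - np.max(w, axis=1)[:, np.newaxis]

In [48]: np.exp(zw) / np.sum(np.exp(zw), axis=1)[:, np.newaxis]
Out[48]: 
array([[ 0.09003057,  0.24472847,  0.66524096],
       [ 0.09003057,  0.24472847,  0.66524096]])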

Wells answered 8/4, 2017 at 17:43 Comment(3)
You have to do the [:, np.newaxis] thing in the softmax line (44), too. With the given example you happen to get the right result, but that's essentially coincidence. (It works because the two row sums happen to have the same value, so it doesn't matter which way they are broadcast.) Try for example x = [[1001, 1002], [1, 4]] instead to get a wrong result. Or x = [[1001, 1002, 1003], [2, 3, 4]] to get an outright error. – Trachytic
@PaulPanzer Thank you very much! What's the best way to notice such bugs? It was so subtle for my understanding of NumPy. – Wells
Don't use square arrays in your toy examples ;-] Seriously, that catches at least half of them for me. – Trachytic
