Implementation of softmax function returns nan for high inputs

I am trying to implement softmax at the end of a CNN. The output I get is NaNs and zeros. I am passing very high input values to softmax, on the order of 10-20k; for example, the array X = [2345, 3456, 6543, -6789, -9234].

My function is:

import numpy as np

def softmax(X):
    B = np.exp(X)             # overflows to inf for large inputs
    C = np.sum(np.exp(X))
    return B / C

I am getting a true-divide RuntimeWarning:

C:\Anaconda\envs\deep_learning\lib\site-packages\ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
  after removing the cwd from sys.path.
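A minimal sketch reproducing the failure (values taken from the question): `np.exp` overflows float64 to `inf` for arguments past roughly 709, and `inf / inf` yields `nan` in the division.

```python
import numpy as np

# Suppress the overflow/invalid warnings so the values themselves are visible.
with np.errstate(over="ignore", invalid="ignore"):
    x = np.array([2345., 3456., 6543., -6789., -9234.])
    e = np.exp(x)          # -> [inf, inf, inf, 0., 0.]
    print(e / e.sum())     # -> [nan, nan, nan, 0., 0.]
```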
Affirm answered 26/2, 2019 at 7:28 Comment(1)
Possible duplicate of How to implement the Softmax function in Python – Balmuth

According to the softmax function, you need to compute the exponential of each element in the array and then divide it by the sum of the exponentials of all elements:

import numpy as np

a = [1, 3, 5]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))

0.015876239976466765
0.11731042782619837
0.8668133321973349

However, if the numbers are too big, the exponentials will overflow (the computer cannot represent such large numbers):

a = [2345, 3456, 6543]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))

__main__:2: RuntimeWarning: invalid value encountered in double_scalars
nan
nan
nan

To avoid this, first shift the values so the highest one becomes zero, then compute the softmax. For example, to compute the softmax of [1, 3, 5], use [1-5, 3-5, 5-5], which is [-4, -2, 0]. You may also choose to implement it in a vectorized way (as you intended to in the question):

def softmax(x):
    f = np.exp(x - np.max(x))  # shift values
    return f / f.sum(axis=0)

softmax([1,3,5])
# prints: array([0.01587624, 0.11731043, 0.86681333])

softmax([2345,3456,6543,-6789,-9234])
# prints: array([0., 0., 1., 0., 0.])

For more detail, check out the cs231n course page; the "Practical issues: Numeric stability" heading is exactly what I'm trying to explain.
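A related trick worth knowing: with inputs like [2345, 3456, 6543, -6789, -9234], the shifted softmax returns probabilities that underflow to exactly 0. If you need those tiny probabilities (e.g. for a cross-entropy loss), work in the log domain instead. A sketch of a log-softmax built on the same shift (the function name here is my own, not from the question):

```python
import numpy as np

def log_softmax(x):
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)                        # same stabilizing shift
    # log(softmax) = shifted - log(sum(exp(shifted)))
    return shifted - np.log(np.sum(np.exp(shifted)))

print(log_softmax([2345, 3456, 6543, -6789, -9234]))
# The tiny entries survive as large negative log-probabilities
# instead of flushing to 0; np.exp() of the result recovers the softmax.
```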

Staciestack answered 26/2, 2019 at 8:47 Comment(0)

When applying softmax to large numbers, you can use max normalization:

import numpy as np

def softmax(x):
    B = np.exp(x)
    C = np.sum(np.exp(x))
    return B / C

arr = np.array([1,2,3,4,5])

softmax(arr)
# array([0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865])

softmax(arr - max(arr))
# array([0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865])

As you can see, this does not affect the result of the softmax. Applying this to your softmax:

def softmax(x):
    B = np.exp(x - np.max(x))  # shift by the maximum for stability
    C = np.sum(B)
    return B / C
op_arr = np.array([2345,3456,6543,-6789,-9234])
softmax(op_arr)
# array([0., 0., 1., 0., 0.])
Farrica answered 26/2, 2019 at 8:15 Comment(0)

When I run the same code, I get:

RuntimeWarning: overflow encountered in exp
RuntimeWarning: overflow encountered in exp
RuntimeWarning: invalid value encountered in true_divide

This is not very surprising, since e^6543 is around 0.39 * 10^2842, which overflows and corrupts the following operations.

To fix this, normalize your data before passing it to softmax. For example, you could divide it by 1000, so that instead of inputs in [-20000, 20000] you would have floats in [-20, 20].
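A sketch of this scaling approach, using the question's values. One caveat worth hedging: unlike the max-shift trick in the other answers, dividing the inputs changes the resulting probabilities (it acts like a softmax temperature), so this is a modeling choice rather than a pure numerical fix. The stable softmax below is used only so the comparison itself doesn't overflow.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # numerically stable form
    return e / e.sum()

x = np.array([2345., 3456., 6543., -6789., -9234.])
print(softmax(x / 1000))
# No overflow, but the distribution is smoother than softmax(x):
# scaling the inputs is not softmax-invariant.
```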

Peregrine answered 26/2, 2019 at 8:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.