Keras softmax activation, categorical_crossentropy loss. But output is not 0, 1
I trained a CNN model for just one epoch with very little data. I use Keras 2.0.5.

Here are the (partial) last layers of the CNN model, with number_outputs = 201. The training labels are one-hot encoded over the 201 outputs.

model.add(Dense(200, activation='relu', name='full_2'))
model.add(Dense(40, activation='relu',  name='full_3'))
model.add(Dense(number_outputs, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

The model is saved to an h5 file. Then the saved model is loaded with the same architecture as above. batch_image is an image file.

prediction = loaded_model.predict(batch_image, batch_size=1)

I get a prediction like this:

ndarray: [[ 0.00498065  0.00497852  0.00498095  0.00496987  0.00497506  0.00496112
   0.00497585  0.00496474  0.00496769  0.0049708   0.00497027  0.00496049
   0.00496767  0.00498348  0.00497927  0.00497842  0.00497095  0.00496493
   0.00498282  0.00497441  0.00497477  0.00498019  0.00497417  0.00497654
   0.00498381  0.00497481  0.00497533  0.00497961  0.00498793  0.00496556
   0.0049665   0.00498809  0.00498689  0.00497886  0.00498933  0.00498056

Questions:

  1. Shouldn't the prediction array be 1s and 0s? Why do I get output that looks like a sigmoid activation with binary_crossentropy loss? What is wrong? I want to emphasize again that the model is not really trained well with data; it is almost just initialized with random weights.

  2. If I don't train the network well (it has not converged yet), e.g. the weights are just initialized with random numbers, should the prediction still be 1s and 0s?

  3. If I want to get the probability of a prediction, and then decide myself how to interpret it, how do I get the probability output after the CNN is trained?

Pettifogging answered 24/8, 2017 at 5:37 Comment(1)
np.argmax(preds, axis=1) is your friend. – Ellata
Your number of outputs is 201, which is why your output has shape (1, 201) rather than a single 0/1 value. You can easily get the class with the highest value using np.argmax, and that class is your model's prediction for the given input.
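A minimal sketch of that np.argmax step, using a made-up (1, 201) prediction array in place of the real model output:

```python
import numpy as np

# hypothetical prediction of shape (1, 201), near-uniform like in the question
preds = np.full((1, 201), 1.0 / 201)
preds[0, 42] = 0.01  # pretend class 42 scored slightly higher

# index of the highest score per sample = predicted class
predicted_class = np.argmax(preds, axis=1)
print(predicted_class)  # [42]
```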

And even though you have trained for only one epoch, your model has still learned something. It may be very little, but it is something, and the prediction is based on that.

You have used softmax as the activation in the last layer. It normalizes the output in a non-linear fashion so that the sum of the outputs over all classes is equal to 1. The value you get for each class can therefore be interpreted as the probability of that class for the given input. (For more clarity, you can look into how the softmax function works.)
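That normalization can be sketched in a few lines of NumPy (this is the standard softmax formula, not Keras's internal implementation):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result sums to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.sum())  # 1.0 (up to floating-point rounding)
```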

And lastly, each class has a value like 0.0049 because the model is not sure which class your input belongs to. It calculates a score for each class and then softmax normalizes them, which is why your output values are all in the range 0 to 1.

For example, say I have four classes; one probable output could be [0.223 0.344 0.122 0.311], which in the end we read as a confidence score for each class. Looking at the confidence scores, the predicted class is the second one, as it has the highest confidence score of 0.344.

Pushkin answered 24/8, 2017 at 6:25 Comment(3)
Thank you for your reply. Actually, I am trying to figure out why the array values are not 1 or 0, for example [0, 0, 1, ..., 0, 0] with list length 201. Instead they are values like 0.00498809. I know the total number of outputs is 201, but the values are not what I expected. – Pettifogging
This is because the model is not sure which class your input belongs to. It calculates values for each class, and then softmax normalizes them; that is why your output values are in the range 0 to 1. For example, say I have four classes; one probable output could be [0.223 0.344 0.122 0.311], which we read as a confidence score for each class. By looking at the confidence scores, the predicted class is the second one, as it has the highest confidence score of 0.344. – Pushkin
I got it! 0.005 (the array item value) * 201 is about 1. The output is a confidence score, and I have to decide how to interpret it. The values are near 0.005 because the model is not trained yet. – Pettifogging

The output of a softmax layer is not 0 or 1. It is a normalized layer: if you sum all the coefficients, they add up to 1. To get the prediction, take the one with the highest value. You can interpret the values as probabilities even if, technically, they are not. See https://en.wikipedia.org/wiki/Softmax_function for the definition.

This layer is used during training so that the prediction of a categorical classifier can be compared against the true label.

It is required for optimization because optimization is done on differentiable functions (ones having a gradient), and a hard 0/1 output would not be differentiable (not even continuous). The optimization is then performed over all these values.

An interesting example is the following: if your true target is [0 0 1 0] and your prediction output is [0.1 0.1 0.6 0.2], then even though the predicted class is correct, the model can still learn, because it gives non-zero probability to the other classes, on which you can compute a gradient.
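To make that concrete, here is the categorical cross-entropy loss for exactly those two vectors, computed directly from its definition (a sketch, not Keras's implementation, which also handles batching and clipping):

```python
import numpy as np

y_true = np.array([0, 0, 1, 0])
y_pred = np.array([0.1, 0.1, 0.6, 0.2])

# categorical cross-entropy: -sum(true * log(pred));
# only the true class's predicted probability contributes
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.51, non-zero even though argmax picks the right class
```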

Butterworth answered 24/8, 2017 at 12:9 Comment(1)
Another question, a little off the original one. For target [0 0 1 0], image 1's output is [0.1 0.1 0.6 0.2] and image 2's output is [0.25 0.3 0.2 0.25]. Image 1 is 'close' to the ground-truth image; image 2 is not. Actually, I want the CNN to give me such predictions: when I can see the model is not confident at all, I can throw the second prediction away. Take a self-driving car for example: image 1 is road; image 2 shows the car already on grass, which I have not trained it to drive on yet, so the car should stop. I know the CNN does not give me a good result here. Please comment, thanks. – Pettifogging

In order to get the prediction output as a class instead of probabilities, use:

model.predict_classes(x_train,batch_size)
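Note that predict_classes was removed from Keras in later releases; it was equivalent to taking argmax over the softmax probabilities from predict. A sketch of that equivalence, with a made-up probs array standing in for model.predict(x_train, batch_size=batch_size):

```python
import numpy as np

# probs stands in for model.predict(...) output: one row per sample
probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2]])

# equivalent of the old predict_classes: argmax over each row
classes = np.argmax(probs, axis=1)
print(classes)  # [1 0]
```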
Federative answered 15/7, 2018 at 23:9 Comment(0)

My understanding is that softmax gives the likelihood of the input landing in each of the 201 buckets. With complete certainty about the first bucket you would get [1, 0, 0, 0, 0, ...]. Since very little training/learning/weight adjustment has occurred, the 201 values are all about 0.00497, and together they sum to 1. There is a decent description of softmax on Google Developers.

The output size was specified as number_outputs, so you get 201 outputs, each of which tells you the likelihood (as a value between 0 and 1) of your prediction being that output.
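The near-uniform values in the question's output are simply 1/201, which is what softmax over 201 classes produces when the (nearly untrained) logits are all about the same:

```python
# with 201 classes and near-random weights, softmax is roughly uniform:
# each class gets about 1/201, and the 201 values sum to 1
n_classes = 201
uniform_prob = 1 / n_classes
print(round(uniform_prob, 5))   # 0.00498, matching the observed values
print(n_classes * uniform_prob) # 1.0
```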

Reign answered 15/5, 2019 at 14:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.