from_logits=True and from_logits=False give different training results for tf.losses.CategoricalCrossentropy with a UNet

I am doing image semantic segmentation with a UNet. If I set a softmax activation for the last layer like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False), the training does not converge, even for only one training image.

But if I do not set the softmax activation for the last layer, like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True), the training converges for one training image.
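
To my understanding the two setups should be mathematically equivalent. Here is a minimal standalone sketch (a toy logits tensor, not my actual UNet output) where the two loss values do agree:

import tensorflow as tf

# toy logits for one sample with 3 classes (illustration only)
logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([[1.0, 0.0, 0.0]])

# setup 1: softmax in the model, from_logits=False in the loss
probs = tf.nn.softmax(logits)
loss_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)(y_true, probs)

# setup 2: raw logits from the model, from_logits=True in the loss
loss_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(y_true, logits)

print(loss_probs.numpy(), loss_logits.numpy())  # both around 0.417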

My ground-truth dataset is generated like this:

import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)                     # input image
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):        # c = class/channel index of this mask
    mask = cv2.imread(spath, 0)             # read the class mask as grayscale
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))

Why? Is there something wrong with my usage?

Here is my experiment code on GitHub: https://github.com/honeytidy/unet You can check it out and run it (it runs on CPU). Change the Activation layer and the from_logits argument of CategoricalCrossentropy and you will see what I described.

Rubrician answered 29/7, 2019 at 12:24 Comment(4)
Calculate the pixel-wise outputs and loss for the single image from both models. The losses should be the same.Aerator
Are you using channels_first or channels_last?Adriannaadrianne
Are your paths exclusive? (Only one path is correct per pixel?)Adriannaadrianne
channels_last. Yes, the paths are exclusive (ground truth is one-hot).@Daniel MöllerRubrician

Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.

You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
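
A small illustration of the problem (a sketch with made-up extreme logits, not values from the actual UNet): once the softmax output for the true class underflows to ~0, the from_logits=False path has to clip the probability before taking the log, while the from_logits=True path computes the loss directly and exactly from the logits.

import tensorflow as tf

# made-up extreme logits: the true class (index 0) gets a very low score
logits = tf.constant([[-30.0, 30.0]])
y_true = tf.constant([[1.0, 0.0]])

# softmax first: the true-class probability underflows to ~0 and is clipped
# internally by Keras (roughly 1e-7) before the log is taken
probs = tf.nn.softmax(logits)
loss_from_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)(y_true, probs)

# directly from logits: computed in a numerically stable way
loss_from_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(y_true, logits)

print(loss_from_probs.numpy())   # ~16.1, limited by the internal clipping
print(loss_from_logits.numpy())  # ~60.0, the true cross-entropy

Because of that clipping, the loss (and its gradient) for badly misclassified pixels is capped, which is one plausible reason the from_logits=False run stalls while the from_logits=True run converges.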

Greed answered 1/8, 2019 at 8:10 Comment(2)
Yes, it is very likely that numerical stability plays a role here. This has also been mentioned in the source code documentation: "Note: Using from_logits=True may be more numerically stable."Citizenship
AFAIK Keras handles this by using an epsilon, which can turn off very-badly classified points.Aerator

By default, all of the loss functions implemented in TensorFlow for classification problems use from_logits=False. Remember that in a classification problem one usually wants the prediction to be expressed as probabilities.

Look at the image below: it shows the last layer of the network (just before the softmax function).

[image: the last layer of the network feeding into the softmax]

So the sequence is Neural Network ⇒ Last layer output ⇒ Softmax or Sigmoid function ⇒ Probability of each class.

For example, in a multi-class classification problem where the outputs are y1, y2, ..., yn, one wants to produce each output as a probability (see the output layer). This output layer is then compared against the true label in the cross-entropy loss function.

Let us take an example where the network produces an output for a classification task: you convert that output into probabilities with the softmax function and then calculate the loss with a cross-entropy loss function.

import tensorflow as tf

# output produced by the last layer of the NN (raw logits)
nn_output_before_softmax = [3.2, 1.3, 0.2, 0.8]

# convert the last-layer output into probabilities by applying softmax
nn_output_after_softmax = tf.nn.softmax(nn_output_before_softmax)

print(nn_output_after_softmax.numpy())
# [0.77514964 0.11593805 0.03859243 0.07031998]

# one-hot encoded true label
y_true = [1.0, 0.0, 0.0, 0.0]

Now there are two scenarios:

  1. One is explicitly using the softmax (or sigmoid) function

  2. One is not using the softmax function separately and wants to include it in the calculation of the loss function

1) One is explicitly using the softmax (or sigmoid) function

When one is explicitly using the softmax (or sigmoid) function for the classification task, the default option in the TensorFlow loss function, from_logits=False, applies. Here TensorFlow assumes that whatever inputs you feed to the loss function are already probabilities, so there is no need for it to apply the softmax function.

# By default from_logits=False
loss_taking_prob = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

loss_1 = loss_taking_prob(y_true, nn_output_after_softmax)
print(loss_1)
# tf.Tensor(0.25469932, shape=(), dtype=float32)

2) One is not using the softmax function separately and wants to include it in the calculation of the loss function. This means that whatever inputs you provide to the loss function are not scaled (they are just numbers from -inf to +inf, not probabilities). Here you are letting TensorFlow perform the softmax operation for you.

loss_taking_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

loss_2 = loss_taking_logits(y_true, nn_output_before_softmax)
print(loss_2)
# tf.Tensor(0.2546992, shape=(), dtype=float32)

Please do remember that a mismatch in either direction produces an incorrect model: passing raw logits with from_logits=False makes the loss treat unnormalized scores as probabilities, while passing softmax outputs with from_logits=True applies the softmax a second time.
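
A quick illustration of the second mismatch, reusing the toy values above: the inputs are already probabilities, but from_logits=True makes the loss apply softmax a second time, so the result is distorted.

# mismatched usage: nn_output_after_softmax already contains probabilities,
# but from_logits=True applies softmax to them again
wrong_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(
    y_true, nn_output_after_softmax)
print(wrong_loss)  # around 0.91 instead of the correct ~0.2547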

Kinesiology answered 5/3, 2022 at 18:47 Comment(0)

from_logits=True signifies that the values the model passes to the loss are not normalized; it is basically used when the model has no softmax function. For example, in https://www.tensorflow.org/tutorials/generative/dcgan no softmax activation function is used in the model. Using from_logits=True also helps with numerical stability.

Bor answered 26/10, 2020 at 6:43 Comment(1)
What I understand is that the input of softmax is known as logits, and the output of softmax is a multinomial probability. Therefore from_logits=True means the values from the output layer are not passed through a softmax function (not normalized, as you said) and are treated as real values, not probabilities. Am I correct?Paperhanger

I guess the problem comes from the softmax activation function. Looking at the docs, I found that softmax is applied to the last axis by default. Can you look at model.summary() and check whether that is what you want?

Jannelle answered 31/7, 2019 at 10:4 Comment(1)
From his code it looks like he is stacking binary masks along the image's channel dimension, which is what CategoricalCrossentropy would expect.Aerator

For softmax to work properly, you must make sure that (a quick check sketch follows this list):

  • You are using 'channels_last' as Keras default channel config.

    • This means the shapes in the model will be like (None, height, width, channels)
    • This seems to be your case because you are putting n_classes in the last axis. But it's also strange: since you are using Conv2D, your output Y should have shape (1, height, width, n_classes), not the flattened shape you are using.
  • Your Y has only zeros and ones (not 0 and 255 as usually happens to images)

    • Check that Y.max() == 1 and Y.min() == 0
    • You may need to have Y = Y / 255.
  • Only one class is correct (your data does not have more than one path/channel with value = 1).

    • Check that (Y.sum(axis=-1) == 1).all() is True
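
A minimal sketch of those checks, assuming Y is the stacked ground-truth array built as in the question (class channels in the last axis):

import numpy as np

Y = np.asarray(Y, dtype=np.float32)

# masks loaded with cv2 are often 0/255; rescale to 0/1 if needed
if Y.max() > 1:
    Y = Y / 255.0

assert Y.min() == 0 and Y.max() == 1      # only zeros and ones
assert (Y.sum(axis=-1) == 1).all()        # exactly one class per pixel
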
Alterant answered 3/8, 2019 at 2:54 Comment(0)
