What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?
I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I can not figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits.

Is the only difference that training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?

Reading the API, I was unable to find any other difference compared to softmax_cross_entropy_with_logits. But why do we need the extra function then?

Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits, if it is supplied with one-hot encoded training data/vectors?

Gadoid answered 19/5, 2016 at 1:15 Comment(2)
I'm interested in seeing a comparison of their performance if both can be used (e.g. with exclusive image labels); I'd expect the sparse version to be more efficient, at least memory-wise. – Bicorn
See also this question, which discusses all cross-entropy functions in tensorflow (turns out there are lots of them). – Uproar

Having two different functions is a convenience, as they produce the same result.

The difference is simple:

  • For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
  • For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.

Labels used in softmax_cross_entropy_with_logits are the one-hot version of labels used in sparse_softmax_cross_entropy_with_logits.
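
As a rough sketch of the two label formats (TF 1.x-style API with made-up numbers, not code from either answer):

import tensorflow as tf

# Sparse labels: one class index per example, shape [batch_size], dtype int32/int64.
sparse_labels = tf.constant([2, 0, 1], dtype=tf.int64)

# Dense labels: one-hot rows, shape [batch_size, num_classes], dtype float32/float64.
num_classes = 3
dense_labels = tf.one_hot(sparse_labels, depth=num_classes)  # [[0,0,1], [1,0,0], [0,1,0]]

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3],
                      [1.2, 0.7, 3.0]])

# Both return a loss tensor of shape [batch_size] with (numerically) the same values.
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels, logits=logits)
loss_dense  = tf.nn.softmax_cross_entropy_with_logits(labels=dense_labels, logits=logits)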

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to get a loss of 0 for that label.

Lava answered 19/5, 2016 at 8:3 Comment(3)
Is the -1 correct? The documentation reads: "Each entry in labels must be an index in [0, num_classes). Other values will raise an exception when this op is run on CPU, and return NaN for corresponding loss and gradient rows on GPU." – Paymar
[0, num_classes) = [0, num_classes-1] – Estabrook
Is this statement correct? "Labels used in softmax_cross_entropy_with_logits are the one-hot version of labels used in sparse_softmax_cross_entropy_with_logits." Is it backwards? Isn't the sparse loss function the one that takes an integer index, so isn't the sparse one the one-hot version? – Wickiup

I would just like to add two things to the accepted answer that you can also find in the TF documentation.

First:

tf.nn.softmax_cross_entropy_with_logits

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
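
In other words, the dense version accepts "soft" targets as long as each row of labels is a valid probability distribution. A small illustrative sketch (made-up numbers, not from the documentation):

import tensorflow as tf

# "Soft" targets: each row of labels is a valid probability distribution
# (non-negative, sums to 1), not necessarily one-hot.
soft_labels = tf.constant([[0.7, 0.2, 0.1]])
logits      = tf.constant([[2.0, 1.0, 0.1]])

loss = tf.nn.softmax_cross_entropy_with_logits(labels=soft_labels, logits=logits)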

Second:

tf.nn.sparse_softmax_cross_entropy_with_logits

NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
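
By contrast (again just a sketch with made-up numbers), the sparse version takes a single hard class index per row of logits, so soft targets cannot be expressed with it:

import tensorflow as tf

# The sparse op takes exactly one integer class index per row of logits;
# a probability distribution cannot be passed as a label here.
hard_labels = tf.constant([0])
logits      = tf.constant([[2.0, 1.0, 0.1]])

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=hard_labels, logits=logits)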

Ronnyronsard answered 29/6, 2016 at 13:57 Comment(3)
What should we use if the classes are not mutually exclusive? I mean, if we're combining multiple categorical labels? – Fibrin
I also read this. So it means we apply the class probabilities in the cross entropy rather than taking it as a one-hot vector. – Bul
@Fibrin - Do you mean you are unable to do one-hot encoding? I think you would have to look at a different model. This mentioned something like "it would be more appropriate to build 4 binary logistic regression classifiers" to first make sure you can separate the classes. – Rabia

Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of requiring them to be converted to one-hot encoding first.

You can verify this by running the following program:

import tensorflow as tf
from random import randint

dims = 8
pos  = randint(0, dims - 1)

# Random logits of length `dims`, and a one-hot label vector with a 1 at index `pos`.
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

# The dense op takes the one-hot labels; the sparse op takes the raw integer index.
res1 = tf.nn.softmax_cross_entropy_with_logits(       logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)

Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and the others are 0).

After that I calculate the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to make sure that it always produces the same output.

Vistula answered 24/4, 2017 at 0:10 Comment(0)
