How does TensorFlow SparseCategoricalCrossentropy work?

I'm trying to understand the SparseCategoricalCrossentropy loss function in TensorFlow, but I don't get it. Every other loss function needs outputs and labels of the same shape; this specific loss function doesn't.

Source code:

import tensorflow as tf

scce = tf.keras.losses.SparseCategoricalCrossentropy()
loss = scce(
  tf.constant([ 1,    1,    1,    2   ], tf.float32),
  tf.constant([[1,2],[3,4],[5,6],[7,8]], tf.float32)
)
print("Loss:", loss.numpy())

The error is:

InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2).  
Label values: 1 1 1 2 [Op:SparseSoftmaxCrossEntropyWithLogits]

How do I provide correctly shaped arguments to the SparseCategoricalCrossentropy loss function?

Gloat answered 17/1, 2020 at 13:00 Comment(0)

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded.

When using SparseCategoricalCrossentropy, the targets are represented by the index of the category (starting from 0). Your outputs have shape 4x2, which means you have two categories. Therefore, the targets should be a vector of length 4 whose entries are either 0 or 1. For example:

scce = tf.keras.losses.SparseCategoricalCrossentropy()
loss = scce(
  tf.constant([ 0,    0,    0,    1   ], tf.float32),
  tf.constant([[1,2],[3,4],[5,6],[7,8]], tf.float32)
)

This is in contrast to CategoricalCrossentropy, where the labels should be one-hot encoded:

cce = tf.keras.losses.CategoricalCrossentropy()
loss = cce(
  tf.constant([[1,0], [1,0], [1,0], [0,1]], tf.float32),
  tf.constant([[1,2], [3,4], [5,6], [7,8]], tf.float32)
)

SparseCategoricalCrossentropy is more efficient when you have a lot of categories, since each label is stored as a single integer rather than a one-hot vector with one entry per category.
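
As a quick sanity check (a minimal sketch, not part of the original answer; I use from_logits=True so the raw numbers are treated as logits, and tf.one_hot just as one way to convert between the two encodings), both loss functions return the same value once the labels are encoded to match:

import tensorflow as tf

logits = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
sparse_labels = tf.constant([0, 0, 0, 1])           # class indices
onehot_labels = tf.one_hot(sparse_labels, depth=2)  # [[1,0], [1,0], [1,0], [0,1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Both print the same value (about 1.0633).
print(scce(sparse_labels, logits).numpy())
print(cce(onehot_labels, logits).numpy())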

Jimmyjimsonweed answered 17/1, 2020 at 13:21 Comment(2)
And should I use the softmax activation function in the last layer, the same way as with CategoricalCrossentropy? (Chary)
@AralRoca Based on the example on the TensorFlow page, if you set from_logits=True then you don't need to specify an activation on the last layer (tensorflow.org/tutorials/images/…). It shouldn't change the result either way, but from_logits=True is more numerically stable (https://mcmap.net/q/454794/-from_logits-true-and-from_logits-false-get-different-training-result-for-tf-losses-categoricalcrossentropy-for-unet). (Decentralize)
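
To illustrate this comment (a minimal sketch with made-up tensors, not from the original thread): passing raw logits with from_logits=True gives the same loss as applying softmax yourself and using the default from_logits=False, but the former is more numerically stable:

import tensorflow as tf

logits = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
labels = tf.constant([0, 0, 0, 1])

# Option 1: no activation on the last layer; the loss applies softmax internally.
scce_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(scce_logits(labels, logits).numpy())

# Option 2: softmax activation on the last layer; the loss expects probabilities.
probs = tf.nn.softmax(logits)
scce_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
print(scce_probs(labels, probs).numpy())  # same value, but less numerically stable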

I wanted to add a few more things that may be confusing. SparseCategoricalCrossentropy has two arguments which are very important to specify. The first is from_logits; recall that logits are the raw outputs of a network that HAVEN'T been normalized via a softmax (or sigmoid). The second is reduction. It is normally set to 'auto', which computes the categorical cross-entropy as usual: the average over the batch of the per-example cross-entropy -log(pred[label]). Setting it to 'none' instead gives you each individual term -log(pred[label]) as a tensor of shape (batch_size,). Computing reduce_mean on that tensor gives the same result as with reduction='auto'.

# Assuming TF2.x
import tensorflow as tf

model_predictions = tf.constant([[1,2], [3,4], [5,6], [7,8]], tf.float32)
labels_sparse = tf.constant([1, 0, 0, 1 ], tf.float32)
labels_dense = tf.constant([[1,0], [1,0], [1,0], [0,1]], tf.float32)

loss_obj_scc = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_scc = loss_obj_scc(labels_sparse, model_predictions)


loss_obj_cc = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_cc = loss_obj_cc(labels_dense, model_predictions)


print(loss_from_scc, loss_from_cc)
>> (<tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>,
 <tf.Tensor: shape=(), dtype=float32, numpy=1.0632616>)
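# Note: the two losses differ above only because the labels differ:
# labels_sparse = [1, 0, 0, 1], but labels_dense one-hot encodes [0, 0, 0, 1].
# With matching labels, the two loss functions return identical values.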
# With `reduction='none'`
loss_obj_scc_red = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='none')

loss_from_scc_red = loss_obj_scc_red(labels_sparse, model_predictions)

print(loss_from_scc_red, tf.math.reduce_mean(loss_from_scc_red))

>> (<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.31326166, 1.3132616 , 
1.3132616 , 0.31326166], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
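
For completeness, a hand-rolled check (not part of the original answer): each per-example value above is just -log(softmax(logits)[label]), which you can recompute directly from the tensors defined earlier:

# Manual recomputation of the per-example losses.
probs = tf.nn.softmax(model_predictions)      # shape (4, 2)
idx = tf.cast(labels_sparse, tf.int32)        # class index per row
picked = tf.gather(probs, idx, batch_dims=1)  # probability of the true class
print(-tf.math.log(picked))
# ≈ [0.3133, 1.3133, 1.3133, 0.3133], matching reduction='none' above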
Les answered 23/1, 2020 at 6:12 Comment(1)
That third sentence was the reason my NN wasn't working! I've looked everywhere and this was the only place that clarified it that clearly! +1 (Choctaw)
