I wanted to add a few more things that may be confusing. SparseCategoricalCrossentropy has two arguments that are very important to specify. The first is from_logits. Recall that logits are the outputs of a network that HAVEN'T been normalized by a Softmax (or Sigmoid). The second is reduction. It is normally set to 'auto', which computes the categorical cross-entropy as usual: the per-sample loss -sum(label*log(pred)) averaged over the batch. Setting it to 'none' instead gives you the per-sample losses -sum(label*log(pred)) themselves, a tensor of shape (batch_size,). Computing a reduce_mean on this tensor gives the same result as reduction='auto'.
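To make both arguments concrete, here is a minimal NumPy sketch of what the loss computes. It assumes a hypothetical helper `sparse_cce` (not a Keras API) that imitates SparseCategoricalCrossentropy without Keras' epsilon clipping. With `from_logits=True` the softmax is applied inside the loss. With `from_logits=False` the inputs are assumed to already be probabilities. The `reduction` strings only mirror the Keras ones.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis: z_k - logsumexp(z)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sparse_cce(labels, preds, from_logits, reduction="auto"):
    # Hypothetical helper, NOT a Keras API: a NumPy imitation of
    # SparseCategoricalCrossentropy without Keras' epsilon clipping.
    if from_logits:
        log_probs = log_softmax(preds)  # softmax is applied inside the loss
    else:
        log_probs = np.log(preds)       # preds are assumed to already be probabilities
    # With a one-hot label, -sum_k label_k * log(pred_k) keeps only the true-class term
    per_sample = -log_probs[np.arange(len(labels)), labels]
    if reduction == "none":
        return per_sample               # shape (batch_size,)
    return per_sample.mean()            # "auto": average over the batch

logits = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=np.float64)
labels = np.array([1, 0, 0, 1])
probs = np.exp(log_softmax(logits))     # what a final Softmax layer would output

per_sample = sparse_cce(labels, logits, from_logits=True, reduction="none")
mean_loss = sparse_cce(labels, logits, from_logits=True)
probs_loss = sparse_cce(labels, probs, from_logits=False)

print(per_sample)
print(mean_loss, per_sample.mean(), probs_loss)
```

The per-sample values come out at roughly 0.3133 and 1.3133, and all three printed means are about 0.8133. This agrees, up to float32 rounding, with the TensorFlow output for the same logits and sparse labels. Feeding raw logits with `from_logits=False` would treat unnormalized scores as probabilities and give a wrong loss. That is the mistake the flag exists to prevent.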
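```python
# Assuming TF2.x (reduction='auto' is the tf.keras default there)
import tensorflow as tf

# Raw, un-normalized network outputs (logits): a batch of 4 samples, 2 classes
model_predictions = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
# Sparse labels: one integer class index per sample
labels_sparse = tf.constant([1, 0, 0, 1], tf.int32)
# Dense (one-hot) labels. NOTE: row 0 is class 0 here, but labels_sparse[0] is class 1,
# so these two label tensors are NOT equivalent and the two losses below differ.
# The one-hot form of labels_sparse would be [[0, 1], [1, 0], [1, 0], [0, 1]].
labels_dense = tf.constant([[1, 0], [1, 0], [1, 0], [0, 1]], tf.float32)

loss_obj_scc = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,   # model_predictions have not been through a softmax
    reduction='auto',   # average the per-sample losses over the batch
)
loss_from_scc = loss_obj_scc(labels_sparse, model_predictions)

loss_obj_cc = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True,
    reduction='auto',
)
loss_from_cc = loss_obj_cc(labels_dense, model_predictions)

print((loss_from_scc, loss_from_cc))
# (<tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>,
#  <tf.Tensor: shape=(), dtype=float32, numpy=1.0632616>)

# With reduction='none': one loss value per sample, shape (batch_size,)
loss_obj_scc_red = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='none',
)
loss_from_scc_red = loss_obj_scc_red(labels_sparse, model_predictions)

# The mean of the per-sample losses equals the reduction='auto' result
print((loss_from_scc_red, tf.math.reduce_mean(loss_from_scc_red)))
# (<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.31326166, 1.3132616 ,
#        1.3132616 , 0.31326166], dtype=float32)>,
#  <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
```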
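The gap between 0.8132617 (sparse) and 1.0632616 (dense) in the example comes from the labels, not the loss classes. The following NumPy sketch reproduces both numbers without TensorFlow. It also shows that one-hot labels built from the sparse ones give exactly the same per-sample cross-entropy. The variable names are illustrative, and `labels_dense_example` simply copies the `labels_dense` values from the example.

```python
import numpy as np

logits = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=np.float64)
labels_sparse = np.array([1, 0, 0, 1])

# Log-softmax over the class axis (what from_logits=True implies)
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Sparse form: -log p of the true class for each sample
sparse_loss = -log_probs[np.arange(len(labels_sparse)), labels_sparse]

# Dense form: -sum_k(label_k * log p_k) with one-hot labels built from the sparse ones
labels_onehot = np.eye(logits.shape[-1])[labels_sparse]  # [[0,1],[1,0],[1,0],[0,1]]
dense_loss = -(labels_onehot * log_probs).sum(axis=-1)

# The labels_dense values from the example (row 0 disagrees with labels_sparse)
labels_dense_example = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=np.float64)
dense_loss_example = -(labels_dense_example * log_probs).sum(axis=-1)

print(np.allclose(sparse_loss, dense_loss))  # True
print(sparse_loss.mean(), dense_loss_example.mean())
```

The two means come out at about 0.8133 and 1.0633, matching the TensorFlow output above. In the TensorFlow code, matching dense labels could be built from integer sparse labels with `tf.one_hot(labels_sparse, depth=2)`. With those labels, CategoricalCrossentropy and SparseCategoricalCrossentropy should agree.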