How is the categorical_crossentropy implemented in keras?

I'm trying to apply the concept of distillation, basically to train a new smaller network to do the same as the original one but with less computation.

I have the softmax outputs for every sample instead of the logits.

My question is: how is the categorical cross-entropy loss function implemented? Does it take only the label value at its maximum (one-hot) index and multiply it by the corresponding predicted value at the same index, or does it sum over all the classes (one-hot encoded), as the formula says:

loss = -sum_i( target_i * log(output_i) )

Tresa answered 29/5, 2017 at 7:6 Comment(0)

I see that you used the tensorflow tag, so I guess this is the backend you are using?

def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.

    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """

This code comes from the Keras source code. Looking directly at the code should answer all your questions :) If you need more info, just ask!
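As a side note (this is my own sketch, not part of the Keras source), the same choice is exposed through the from_logits argument of the current tf.keras API; note that the public loss takes (y_true, y_pred), the reverse of the backend function's (output, target) order:

import tensorflow as tf

# Soft teacher targets (already probabilities), one row per sample.
y_true = tf.constant([[0.7, 0.2, 0.1]])

# Case 1: the student output is already a softmax distribution.
probs = tf.constant([[0.6, 0.3, 0.1]])
loss_probs = tf.keras.losses.categorical_crossentropy(y_true, probs)

# Case 2: the student output is raw logits; the loss applies the softmax itself.
logits = tf.math.log(probs)  # log-probs are valid logits for the same distribution
loss_logits = tf.keras.losses.categorical_crossentropy(y_true, logits,
                                                        from_logits=True)

print(loss_probs.numpy(), loss_logits.numpy())  # both approximately 0.83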

EDIT:

Here is the part of the code that interests you:

# Note: tf.nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
    # manual computation of crossentropy
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = tf.clip_by_value(output, epsilon, 1. - epsilon)
    return - tf.reduce_sum(target * tf.log(output),
                           reduction_indices=len(output.get_shape()) - 1)

If you look at the return, you can see that it multiplies the target by the log of the predictions element-wise and then sums over the class axis, so every class contributes, not just the one at the maximum index. :)
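To make that concrete, here is a minimal NumPy sketch of the same computation (the helper name is mine, not Keras'): with a one-hot target the sum collapses to a single -log term, while with soft targets, as in distillation, every class contributes:

import numpy as np

def manual_categorical_crossentropy(target, output, eps=1e-7):
    # Same idea as the Keras snippet above: clip, then -sum(target * log(output)).
    output = np.clip(output, eps, 1.0 - eps)
    return -np.sum(target * np.log(output), axis=-1)

output  = np.array([[0.7, 0.2, 0.1]])   # softmax predictions
one_hot = np.array([[1.0, 0.0, 0.0]])   # hard label
soft    = np.array([[0.6, 0.3, 0.1]])   # soft (distillation) targets

print(manual_categorical_crossentropy(one_hot, output))  # [0.3567] == -log(0.7)
print(manual_categorical_crossentropy(soft, output))     # [0.9271] (every class contributes)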

Craze answered 29/5, 2017 at 19:55 Comment(2)
Do you happen to know what the epsilon and tf.clip_by_value are doing? – Unstuck
@Moondra: Most likely they exist for numerical stability. log(0) is undefined and the log of values close to 0 approaches -inf, so you want to avoid that. Not sure about log(1) though? – Assign

As an answer to "Do you happen to know what the epsilon and tf.clip_by_value are doing?":
it ensures that output is never exactly 0 (or 1), because tf.log(0) evaluates to -inf, which would turn the loss and its gradients into inf/NaN.
(I don't have points to comment but thought I'd contribute.)
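A quick NumPy illustration (my own, not from the answer) of what the clipping buys you:

import numpy as np

eps = 1e-7                        # same order of magnitude as Keras' _EPSILON
output = np.array([1.0, 0.0])     # a "perfectly confident" prediction

print(np.log(output))             # [0., -inf] plus a divide-by-zero warning
clipped = np.clip(output, eps, 1.0 - eps)
print(np.log(clipped))            # [~-1e-07, -16.118...] -> finite, well-behaved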

Judiciary answered 12/3, 2019 at 2:16 Comment(0)
