Calculating Cross Entropy in TensorFlow

I am having a hard time calculating cross entropy in TensorFlow. In particular, I am using the function:

tf.nn.softmax_cross_entropy_with_logits()

Using seemingly simple code, I can only get it to return zero:

import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

a = tf.placeholder(tf.float32, shape =[None, 1])
b = tf.placeholder(tf.float32, shape = [None, 1])
sess.run(tf.global_variables_initializer())
c = tf.nn.softmax_cross_entropy_with_logits(
    logits=b, labels=a
).eval(feed_dict={b:np.array([[0.45]]), a:np.array([[0.2]])})
print(c)

returns

0

My understanding of cross entropy is as follows:

H(p,q) = -sum_x p(x)*log(q(x))

Where p(x) is the true probability of event x and q(x) is the predicted probability of event x.

Therefore, if any two numbers are used for p(x) and q(x) such that

0<p(x)<1 AND 0<q(x)<1

there should be a nonzero cross entropy. I suspect that I am using TensorFlow incorrectly. Thanks in advance for any help.

Pelops answered 1/3, 2017 at 0:58 Comment(1)
So, interestingly, I got the idea of using cross entropy from this project: github.com/carpedm20/DCGAN-tensorflow/blob/master/model.py. They use it to identify whether or not a sample comes from a real distribution. However, it seems that a binary softmax regression is the same as a logistic regression. - Pelops

Like they say, you can't spell "softmax_cross_entropy_with_logits" without "softmax". Softmax of [0.45] is [1], and log(1) is 0.
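To make this concrete, here is a minimal pure-Python sketch of the per-row computation, with no TensorFlow involved. The helper names `softmax` and `cross_entropy_from_logits` are illustrative only, not TensorFlow APIs, and the two-class setup (fixing the second logit at 0) is an assumption made for the example. With a single class the softmax is always [1.0], so the loss is zero whatever the logit and label; with two classes the same numbers give a nonzero loss.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, labels):
    # -sum_i labels[i] * log(softmax(logits)[i]) for a single row.
    probs = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(labels, probs))

# One class, as in the question: softmax([0.45]) is [1.0], so the loss is zero.
one_class = cross_entropy_from_logits([0.45], [0.2])

# Two classes (assumption): 0.45 is the logit of class 0, the other logit is 0.
two_class = cross_entropy_from_logits([0.45, 0.0], [0.2, 0.8])

print(softmax([0.45]))
print(one_class)
print(two_class)
```

For a single real-versus-fake score like the one in the question, an op such as tf.nn.sigmoid_cross_entropy_with_logits is probably the closer fit, since it treats the one logit as a binary decision.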

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
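As a rough illustration of why a sparse variant exists: when the label row is one-hot, the sum over classes collapses to a single term, so only the index of the true class is needed. The sketch below is plain Python, and the names `log_softmax`, `dense_xent` and `sparse_xent` are hypothetical helpers, not the TensorFlow ops themselves.

```python
import math

def log_softmax(logits):
    # log softmax(z)_i = z_i - logsumexp(z), with the max factored out.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def dense_xent(logits, labels):
    # Labels given as a full probability row.
    return -sum(p * lp for p, lp in zip(labels, log_softmax(logits)))

def sparse_xent(logits, class_index):
    # Labels given as the index of the single true class.
    return -log_softmax(logits)[class_index]

logits = [2.0, 0.5, -1.0]
one_hot = [0.0, 1.0, 0.0]

dense = dense_xent(logits, one_hot)
sparse = sparse_xent(logits, 1)
print(dense, sparse)
```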

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float16, float32, or float64).

Cnemis answered 1/3, 2017 at 1:49 Comment(3)
Aha! So it seems my problems are caused by a misunderstanding of softmax! Thank you for your help! - Pelops
@DavidKaftan, if this solves your problem, it would be nice to mark this as accepted answer. :) - Cnemis
Thanks! I'm (obviously) new here! - Pelops

In addition to Don's answer (+1), this answer written by mrry may interest you, as it gives the formula to calculate the cross entropy in TensorFlow:

An alternative way to write:

xent = tf.nn.softmax_cross_entropy_with_logits(logits, labels)

...would be:

softmax = tf.nn.softmax(logits)
xent = -tf.reduce_sum(labels * tf.log(softmax), 1)

However, this alternative would be (i) less numerically stable (since the softmax may compute much larger values) and (ii) less efficient (since some redundant computation would happen in the backprop). For real uses, we recommend that you use tf.nn.softmax_cross_entropy_with_logits().
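The stability point can be sketched in plain Python. This is only an illustration of the idea, not how TensorFlow implements the fused op; `two_step_xent`, `fused_xent` and `log_or_neg_inf` are hypothetical names. The two-step version takes the log of a probability that has already underflowed to 0, while the fused version works with z_i - logsumexp(z) and never forms that probability.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_or_neg_inf(q):
    # Mimic floating-point log(0) = -inf instead of raising ValueError.
    return math.log(q) if q > 0.0 else -math.inf

def two_step_xent(logits, labels):
    # Softmax first, then log: loses information once a probability hits 0.
    probs = softmax(logits)
    return -sum(p * log_or_neg_inf(q) for p, q in zip(labels, probs))

def fused_xent(logits, labels):
    # Uses log softmax(z)_i = z_i - logsumexp(z) directly.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(p * (z - lse) for p, z in zip(labels, logits))

labels = [0.0, 1.0]

# Moderate logits: both forms agree.
small = [2.0, 1.0]
small_gap = abs(two_step_xent(small, labels) - fused_xent(small, labels))

# Extreme logits: the two-step form blows up, the fused form stays finite.
large = [1000.0, 0.0]
two_step_large = two_step_xent(large, labels)
fused_large = fused_xent(large, labels)

print(small_gap, two_step_large, fused_large)
```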

Pillar answered 1/3, 2017 at 1:58 Comment(1)
Thank you for the (no-softmax) cross-entropy formula. - Fussy

Here is an implementation in TensorFlow 2.0, in case somebody else (me, probably) needs it in the future. Note that it takes probabilities rather than logits and reports the result in bits (log base 2).

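import tensorflow as tf

@tf.function
def cross_entropy(x, y, epsilon=1e-9):
    # x: predicted probabilities, y: true probabilities, both [batch, num_classes].
    # epsilon guards against log(0); dividing by log(2) converts nats to bits.
    return -tf.reduce_sum(y * tf.math.log(x + epsilon), -1) / tf.math.log(2.)

x = tf.constant([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.75, 0.25],
], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    # Passing x twice gives H(x, x), i.e. the entropy of each row.
    y = cross_entropy(x, x)

tf.print(y)
tf.print(tape.gradient(y, x))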

Output

[-0 1 0.811278105]
[[-1.44269502 29.8973541]
 [-0.442695022 -0.442695022]
 [-1.02765751 0.557305]]
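The first line of that output can be cross-checked without TensorFlow. Because the call passes the same tensor as both arguments, each value is the Shannon entropy of the corresponding row in bits. The sketch below repeats the formula in plain Python; `cross_entropy_bits` is a hypothetical helper name, and it runs in float64, so the last digits may differ slightly from the float32 values printed above.

```python
import math

def cross_entropy_bits(x, y, epsilon=1e-9):
    # Same formula as the answer, applied to one row, in base 2.
    total = sum(yi * math.log(xi + epsilon) for xi, yi in zip(x, y))
    return -total / math.log(2.0)

rows = [[1.0, 0.0], [0.5, 0.5], [0.75, 0.25]]
values = [cross_entropy_bits(r, r) for r in rows]
print(values)
```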
Steels answered 8/9, 2020 at 4:20 Comment(0)
