Cross Entropy in PyTorch

Cross entropy formula:

H(p, q) = -\sum_x p(x) * log(q(x))

But why does the following give loss = 0.7437 instead of loss = 0 (since 1*log(1) = 0)?

import torch
import torch.nn as nn
from torch.autograd import Variable

output = Variable(torch.FloatTensor([0, 0, 0, 1])).view(1, -1)  # predicted values for 4 classes
target = Variable(torch.LongTensor([3]))                        # index of the target class

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
Haleyhalf answered 20/3, 2018 at 17:39 Comment(1)
Just increase the output tensor to: output = Variable(torch.FloatTensor([0,0,0,100])).view(1, -1) and you get your 0.Ephemerid
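For reference, a minimal check of that suggestion (a sketch; the only change from the question's code is the much larger score on index 3):

import torch
import torch.nn as nn

# With a much larger score on the target index, softmax puts almost all
# probability mass on class 3, so the loss is (numerically) zero.
output = torch.FloatTensor([0, 0, 0, 100]).view(1, -1)
target = torch.LongTensor([3])

loss = nn.CrossEntropyLoss()(output, target)
print(loss)  # prints (numerically) 0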

In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw scores that don't need to sum to 1; it first converts them into probabilities using the softmax function.

So H(p, q) becomes:

H(p, softmax(output))

Translating the output [0, 0, 0, 1] into probabilities:

softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]

whence:

-log(0.4754) = 0.7437
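As a quick check, this can be reproduced step by step (a sketch using the values from the question):

import torch
import torch.nn as nn
import torch.nn.functional as F

output = torch.FloatTensor([0, 0, 0, 1]).view(1, -1)  # raw scores, not probabilities
target = torch.LongTensor([3])

# What nn.CrossEntropyLoss does conceptually: softmax first,
# then the negative log of the probability of the target class.
probs = F.softmax(output, dim=1)   # [0.1749, 0.1749, 0.1749, 0.4754]
manual = -torch.log(probs[0, 3])   # 0.7437

print(manual)
print(nn.CrossEntropyLoss()(output, target))  # same value: 0.7437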
Bellman answered 15/4, 2018 at 8:25 Comment(3)
Great answer, but are there any loss functions available in PyTorch that compute the loss the way it's described in the original question?Bellow
@KevinLing Yes! NLLLoss does the same thing as CrossEntropyLoss without the softmax.Hartzog
isn't it H(p, log_softmax(output))? Thank you.Dinerman

Your understanding is correct, but PyTorch doesn't compute cross entropy that way. PyTorch uses the following formula:

loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
               = -x[class] + log(\sum_j exp(x[j]))

Since, in your scenario, x = [0, 0, 0, 1] and class = 3, if you evaluate the above expression, you would get:

loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1))
               = 0.7437

Note that PyTorch uses the natural logarithm.
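For completeness, the same formula written out in code (a sketch with the question's values):

import torch
import torch.nn as nn

x = torch.FloatTensor([0, 0, 0, 1]).view(1, -1)
cls = torch.LongTensor([3])

# loss(x, class) = -x[class] + log(sum_j exp(x[j])), natural log throughout
manual = -x[0, 3] + torch.log(torch.exp(x[0]).sum())

print(manual)                          # 0.7437
print(nn.CrossEntropyLoss()(x, cls))   # 0.7437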

Dugan answered 21/3, 2018 at 6:13 Comment(2)
Thank you for your answer, this was very helpful to me! So there is no way then to reach zero with CE loss?Haleyhalf
@Haleyhalf you can use NLLLoss instead.Hartzog

I would like to add an important note, as this often leads to confusion.

Softmax is not a loss function, nor is it really an activation function. It has a very specific task: it is used in multi-class classification to normalize the scores for the given classes. By doing so we get probabilities for each class that sum to 1.

Softmax is combined with Cross-Entropy-Loss to calculate the loss of a model.

Unfortunately, because this combination is so common, it is often abbreviated. Some use the term Softmax-Loss, whereas PyTorch calls it simply CrossEntropyLoss.
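To illustrate (a sketch; the score values here are arbitrary):

import torch
import torch.nn as nn

scores = torch.FloatTensor([2.0, 0.5, 0.3]).view(1, -1)  # arbitrary scores for 3 classes
target = torch.LongTensor([0])

# Softmax normalizes the scores into probabilities that sum to 1.
probs = torch.softmax(scores, dim=1)
print(probs, probs.sum())

# nn.CrossEntropyLoss applies the (log-)softmax internally, so it is fed
# the raw scores, not the probabilities above.
print(nn.CrossEntropyLoss()(scores, target))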

Ferullo answered 14/12, 2018 at 3:39 Comment(0)

The combination of nn.LogSoftmax and nn.NLLLoss is equivalent to using nn.CrossEntropyLoss. This terminology is a particularity of PyTorch, as nn.NLLLoss in fact computes the cross entropy, but with log-probability predictions as inputs, whereas nn.CrossEntropyLoss takes scores (sometimes called logits). Technically, nn.NLLLoss is the cross entropy between the Dirac distribution, which puts all the mass on the target, and the predicted distribution given by the log-probability inputs.

PyTorch's CrossEntropyLoss expects unbounded scores (interpretable as logits / log-odds) as input, not probabilities (as the CE is traditionally defined).
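A small sketch of the difference (values taken from the question):

import torch
import torch.nn as nn

scores = torch.FloatTensor([0, 0, 0, 1]).view(1, -1)  # unbounded scores / logits
target = torch.LongTensor([3])

# nn.CrossEntropyLoss takes the raw scores ...
ce = nn.CrossEntropyLoss()(scores, target)

# ... while nn.NLLLoss takes log-probabilities, so LogSoftmax + NLLLoss matches it.
log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)

print(ce, nll)  # both 0.7437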

Mushy answered 22/2, 2021 at 16:46 Comment(0)
