I understand that PyTorch's LogSoftmax function is basically just a more numerically stable way to compute log(softmax(x)). Softmax converts the output of a Linear layer into a categorical probability distribution.
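For example, if I'm right about this, the two calls below should print the same values (the logits are made up, just for illustration):

```python
import torch
import torch.nn as nn

# Made-up logits, as if they came out of a Linear layer
logits = torch.tensor([[2.0, 1.0, 0.1]])

# LogSoftmax(x) should match log(Softmax(x)), just computed more stably
print(nn.LogSoftmax(dim=1)(logits))
print(torch.log(nn.Softmax(dim=1)(logits)))
```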
The PyTorch documentation says that CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in a single class.
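As a sanity check on that claim, I'd expect something like this to print the same loss twice (random logits and arbitrary target labels, just for the check):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # fake outputs of a Linear layer
targets = torch.tensor([0, 2, 1, 2])  # arbitrary class labels

# CrossEntropyLoss applied to raw logits...
print(nn.CrossEntropyLoss()(logits, targets))
# ...should equal NLLLoss applied to LogSoftmax of the same logits
print(nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets))
```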
Looking at NLLLoss, I'm still confused... are there two logs being used? I think of the negative log as the information content of an event (as in entropy).
After a bit more digging, I think NLLLoss assumes you're actually passing in log probabilities rather than plain probabilities. Is this correct? It seems kind of weird if so...
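My reading so far is that, given log probabilities, NLLLoss just picks out the entry for the target class, negates it, and averages, with no extra log of its own. If that's right, this should match (hypothetical inputs again):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_probs = nn.LogSoftmax(dim=1)(torch.randn(4, 3))
targets = torch.tensor([0, 2, 1, 2])

# Built-in NLLLoss (default reduction='mean')
print(nn.NLLLoss()(log_probs, targets))
# Manual version: negate the log probability of each target class, then average
print(-log_probs[torch.arange(4), targets].mean())
```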