log loss output is greater than 1

I prepared several models for binary classification of documents in the fraud domain and calculated the log loss for all of them. I thought log loss essentially measures the confidence of the predictions and should therefore lie in the range [0, 1]. I believe it is an important measure in classification when the predicted class alone is not sufficient for evaluation. So if two models have accuracy, recall and precision that are quite close, but one has a lower log loss, it should be selected, assuming there are no other parameters/metrics (such as time or cost) in the decision process.

The log loss for the decision tree is 1.57; for all other models it is in the 0-1 range. How do I interpret this score?

Eire answered 26/1, 2016 at 12:25 Comment(0)

It's important to remember that log loss does not have an upper bound; it lives on the range [0, ∞).

From Kaggle we can find a formula for log loss.

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log(p_{ij})$$

Here $y_{ij}$ is 1 for the correct class of example $i$ and 0 for the other classes, and $p_{ij}$ is the probability the model assigns to class $j$ for example $i$; the sums run over $N$ examples and $M$ classes.
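To make the formula concrete, here is a minimal NumPy sketch of that double sum (the label and probability values are made up purely for illustration):

```python
import numpy as np

# One-hot true labels y_ij (3 examples, 2 classes) and predicted probabilities p_ij.
y = np.array([[1, 0],
              [0, 1],
              [1, 0]])
p = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.3, 0.7]])

# logloss = -(1/N) * sum_i sum_j y_ij * log(p_ij)
logloss = -np.mean(np.sum(y * np.log(p), axis=1))
print(logloss)  # ~0.51; the third example (p = 0.3 for the true class) dominates
```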

If we look at the case where the average log loss exceeds 1, it happens when $\log(p_{ij}) < -1$ for the true class $j$. This means the predicted probability for that class is below $\exp(-1) \approx 0.368$. So a log loss greater than 1 is to be expected whenever your model assigns, on average, less than about a 37% probability to the actual class.
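As a quick sanity check, here is a sketch using scikit-learn's `log_loss` (the probabilities are invented): a prediction just below exp(-1) for the true class already pushes the per-example loss above 1.

```python
import numpy as np
from sklearn.metrics import log_loss

# Probability columns are [class 0, class 1]; the true class is 1 in both cases.
print(np.exp(-1))                                     # 0.3678...
print(log_loss([1], [[0.65, 0.35]], labels=[0, 1]))   # 0.35 < exp(-1) -> ~1.05
print(log_loss([1], [[0.60, 0.40]], labels=[0, 1]))   # 0.40 > exp(-1) -> ~0.92

# Averaged over both examples the loss sits just below 1.
print(log_loss([1, 1], [[0.65, 0.35], [0.60, 0.40]], labels=[0, 1]))  # ~0.98
```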

We can also see this by plotting the log loss given various probability estimates.

[Plot: log loss, $-\log(p)$, as a function of the predicted probability $p$ for the true class]
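The curve can be reproduced with a few lines of matplotlib (a sketch, not the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

# Per-example log loss, -log(p), as the predicted probability for the true class varies.
p = np.linspace(0.001, 1.0, 500)
plt.plot(p, -np.log(p))
plt.axhline(1.0, linestyle="--", color="gray")         # loss = 1
plt.axvline(np.exp(-1), linestyle="--", color="gray")  # p = exp(-1) ≈ 0.368
plt.xlabel("Predicted probability for the true class")
plt.ylabel("Log loss")
plt.title("Log loss grows without bound as p -> 0")
plt.show()
```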

Zerk answered 26/1, 2016 at 13:41 Comment(1)
very nice answer, especially the picture – Kindliness
