What is the meaning of the word logits in TensorFlow? [duplicate]

Asked 4/1, 2017 at 2:2 Answered 5/3, 2020 at 10:44

tensorflow machine-learning neural-network deep-learning cross-entropy

445

In the following TensorFlow function, we must feed the activations of the artificial neurons in the final layer. That I understand. But I don't understand why it is called logits. Isn't that a mathematical function?

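import tensorflow as tf

# last_layer: raw (pre-softmax) outputs of the final layer; target_output: one-hot labels
loss_function = tf.nn.softmax_cross_entropy_with_logits(
    labels=target_output,
    logits=last_layer,
)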

Bestial answered 4/1, 2017 at 2:2 Comment(2)

see this: stats.stackexchange.com/questions/52825/… – Hekate 1/12, 2020 at 21:5

Comment edited; I'm still learning about this. Surprised nobody is mentioning log-odds from logistic regression. The term is shortened to 'logits' on Wikipedia, and is the mathematical input to the statistical softmax function that ends neural networks. en.wikipedia.org/wiki/Logistic_regression#Logistic_model – Convexoconcave 9/9, 2021 at 16:6

417

Logits is an overloaded term which can mean many different things:

In math, the logit is the function logit(p) = log(p / (1 - p)), which maps probabilities in [0, 1] to ℝ, i.e. (-inf, inf).

A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive ones to probabilities greater than 0.5.
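As a quick illustration of the math definition, here is a minimal pure-Python sketch of the logit. The helper name `logit` is just for illustration, and it only accepts p strictly between 0 and 1, because the endpoints map to -inf and +inf.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

for p in (0.1, 0.5, 0.9):
    print(p, logit(p))
# 0.5 maps to 0.0; 0.1 gives a negative logit, 0.9 a positive one
```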

In ML, it can be

the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.

Logits also sometimes refer to the element-wise inverse of the sigmoid function.
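To make the ML meaning concrete, here is a minimal pure-Python sketch of the normalization step: a vector of raw logits (one per class; the values are made up) goes through softmax and comes out as one probability per class. Subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(logits):
    """Map a list of raw scores (logits) to probabilities that sum to 1."""
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -0.5]               # hypothetical raw outputs for 3 classes
probs = softmax(logits)
print(probs)                            # one probability per class, summing to 1
```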

Piccadilly answered 23/4, 2017 at 22:51 Comment(7)

For TensorFlow: it's a name that is thought to imply that this Tensor is the quantity being mapped to probabilities by the softmax. – Fleece 14/5, 2017 at 21:46

Is this just the same as the thing that gets exponentiated before the softmax? I.e. softmax(logit) = exp(logit)/Z(logit), then logit = h_NN(x)? So logit is the same as "score"? – Hekate 22/1, 2018 at 19:29

My personal understanding is that, in the TensorFlow domain, logits are the values to be used as input to softmax. I came to this understanding based on this TensorFlow tutorial. – Outguess 26/1, 2018 at 8:37

I am not sure whether this answers the question. Maybe that is why it was never accepted. I understand what the logit function is, but it also puzzles me why TensorFlow calls these arguments logits. The same name is also used for several parameters in TensorFlow's functions. – Cinerator 29/1, 2018 at 3:1

Great! Can you give a simple example? Is this right? [1, 0.5, 0.5] becomes [0.5, 0.25, 0.25] through normalization, and then softmax becomes [0,] if one-hot [1, 0, 0]? Or does it just output [1, 0, 0], because the output should be a vector? – Blood 11/7, 2019 at 14:10

In TensorFlow, logit refers to the unscaled output of a layer, which can be input into any activation function, not just Softmax. This is illustrated by tf.nn.sigmoid_cross_entropy_with_logits. The with_logits suffix simply denotes that unscaled logit output should be passed, not the prediction output from the final layer activation function that is typically used by error functions such as tf.keras.losses.MSE or tf.keras.losses.CategoricalCrossentropy – Coaler 25/11, 2020 at 10:39
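The difference between passing logits and passing probabilities can be seen in a pure-Python sketch of per-element sigmoid cross-entropy. The stable formula below, max(x, 0) - x*z + log(1 + exp(-|x|)), is the one the TensorFlow documentation gives for tf.nn.sigmoid_cross_entropy_with_logits; the helper names are hypothetical and this is not TensorFlow's actual code.

```python
import math

def sigmoid_xent_from_logit(x: float, z: float) -> float:
    """Binary cross-entropy computed directly from a logit x and a label z in [0, 1]."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def sigmoid_xent_from_prob(p: float, z: float) -> float:
    """Same loss computed from an already-activated probability p."""
    return -(z * math.log(p) + (1.0 - z) * math.log(1.0 - p))

x, z = 1.5, 1.0
p = 1.0 / (1.0 + math.exp(-x))          # sigmoid activation
print(sigmoid_xent_from_logit(x, z), sigmoid_xent_from_prob(p, z))  # equal up to rounding

# With a large logit the probability rounds to exactly 1.0, and the
# probability-based version would hit log(0); the logit version stays finite.
print(sigmoid_xent_from_logit(40.0, 0.0))  # about 40.0
```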

Example: the last layer of a NN returns a tensor k = [1,2,3,4,1,2,3] of logits, which classifies an input as belonging to each of 7 possible categories. Note that the fourth logit is the biggest, i.e., the fourth category is the most probable, and that it is four times the first one. k is fed to a softmax function that returns the probabilities [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175] that the input belongs to each category (softmax is a normalized exponential function). Note that the fourth probability is about 20 times the first one (e^(4-1) = e^3 ≈ 20.1, due to the exponential term in softmax) and that they all sum to 1. – Tundra 7/5, 2021 at 21:20
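The numbers in this example can be checked with a few lines of plain Python (no TensorFlow needed); the softmax is written out by hand here.

```python
import math

k = [1, 2, 3, 4, 1, 2, 3]                 # logits from the example
exps = [math.exp(v) for v in k]
total = sum(exps)
probs = [e / total for e in exps]        # softmax: normalized exponentials

print([round(p, 3) for p in probs])       # [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
print(probs[3] / probs[0])                # e**3, roughly 20.09
print(sum(probs))                         # 1.0 up to floating-point rounding
```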

220

Just adding this clarification so that anyone who scrolls down this far can at least get it right, since there are so many wrong answers upvoted.

Diansheng's answer and JakeJ's answer get it right.
A new answer posted by Shital Shah is an even better and more complete answer.

Yes, logit is a mathematical function in statistics, but the logit used in the context of neural networks is different. The statistical logit doesn't even make sense here.

I couldn't find a formal definition anywhere, but logit basically means:

The raw predictions which come out of the last layer of the neural network.
1. This is the very tensor on which you apply the argmax function to get the predicted class.
2. This is the very tensor which you feed into the softmax function to get the probabilities for the predicted classes.

Also, from a tutorial on the official TensorFlow website:

Logits Layer

The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):
logits = tf.layers.dense(inputs=dropout, units=10)

If you are still confused, the situation is like this:

raw_predictions = neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

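where predicted_class_index_by_raw and predicted_class_index_by_prob will be equal, because softmax divides every exp(raw value) by the same total and therefore preserves the ordering of its inputs.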

Another name for raw_predictions in the above code is logit.
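The pseudocode above can be turned into a runnable pure-Python check. Here `neural_net` is replaced by randomly generated raw predictions (an assumption made purely for illustration), and the loop confirms that argmax picks the same class before and after softmax.

```python
import math
import random

def softmax(scores):
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

random.seed(0)
for _ in range(1000):
    # Stand-in for raw_predictions = neural_net(input_layer): 10 random class scores.
    raw_predictions = [random.gauss(0.0, 3.0) for _ in range(10)]
    probabilities = softmax(raw_predictions)
    assert argmax(raw_predictions) == argmax(probabilities)
print("argmax of logits and of softmax probabilities always agreed")
```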

~~As for the why logit... I have no idea. Sorry.~~
[Edit: See this answer for the historical motivations behind the term.]

Trivia

That said, if you want to, you can apply the statistical logit to the probabilities that come out of the softmax function.

If the probability of a certain class is p,
then the log-odds of that class is L = logit(p).

Also, the probability of that class can be recovered as p = sigmoid(L), using the sigmoid function.

Calculating log-odds this way is not very useful, though.
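The trivia can be checked numerically. A minimal sketch, assuming a two-class softmax with made-up raw outputs z0 and z1: since p1 = exp(z1) / (exp(z0) + exp(z1)) = sigmoid(z1 - z0), the log-odds of class 1 come out as z1 - z0, and sigmoid maps them back to the probability.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

z0, z1 = 0.3, 1.8                                   # hypothetical raw outputs for two classes
p1 = math.exp(z1) / (math.exp(z0) + math.exp(z1))   # softmax probability of class 1

L = logit(p1)                              # log-odds of class 1
print(L, z1 - z0)                          # both about 1.5
print(sigmoid(L), p1)                      # sigmoid recovers the probability
```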

Rik answered 24/5, 2018 at 14:19 Comment(0)

157

Summary

In the context of deep learning, the logits layer means the layer that feeds into softmax (or another such normalization). The outputs of the softmax are the probabilities for the classification task, and its input is the logits layer. The logits layer typically produces values from -infinity to +infinity, and the softmax layer transforms them to values from 0 to 1.

Historical Context

Where does this term come from? In the 1930s and 40s, several people were trying to adapt linear regression to the problem of predicting probabilities. However, linear regression produces output from -infinity to +infinity, while for probabilities our desired output is 0 to 1. One way to do this is to map the probabilities 0 to 1 to -infinity to +infinity and then use linear regression as usual. One such mapping is the inverse of the cumulative normal distribution, which was used by Chester Ittner Bliss in 1934; he called this the "probit" model, short for "probability unit". However, this function is computationally expensive and lacks some of the desirable properties for multi-class classification. In 1944 Joseph Berkson used the function log(p/(1-p)) to do this mapping and called it logit, short for "logistic unit". The term logistic regression is derived from this as well.

The Confusion

Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the above mapping. In deep learning, people started calling the layer that feeds into the logit function the "logits layer". Then people started calling the output values of this layer "logits", creating confusion with logit the function.

TensorFlow Code

Unfortunately, TensorFlow code adds further to the confusion with names like tf.nn.softmax_cross_entropy_with_logits. What does logits mean here? It just means the input of the function is supposed to be the output of the last neuron layer, as described above. The _with_logits suffix is redundant, confusing and pointless. Functions should be named without regard to such very specific contexts, because they are simply mathematical operations that can be performed on values derived from many other domains. In fact, TensorFlow has another similar function, sparse_softmax_cross_entropy, where they fortunately forgot to add the _with_logits suffix, creating inconsistency and adding to the confusion. PyTorch, on the other hand, simply names its functions without these kinds of suffixes.
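To see that both kinds of functions take the same logits and differ only in the label format, here is a pure-Python sketch of what softmax cross-entropy computes with dense (one-hot) labels and with sparse (integer index) labels. It is not TensorFlow's implementation, and the helper names are hypothetical; for a one-hot label the two give the same loss.

```python
import math

def softmax(logits):
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_xent(labels, logits):
    """Dense labels: a probability vector (e.g. one-hot) per example."""
    return -sum(y * math.log(p) for y, p in zip(labels, softmax(logits)))

def sparse_softmax_xent(label_index, logits):
    """Sparse labels: the integer index of the true class."""
    return -math.log(softmax(logits)[label_index])

logits = [1.0, 3.0, 0.5]                     # hypothetical raw outputs for 3 classes
print(softmax_xent([0.0, 1.0, 0.0], logits), sparse_softmax_xent(1, logits))  # same value
```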

Reference

The Logit/Probit lecture slides are one of the best resources for understanding logit. I have also updated the Wikipedia article with some of the above information.

Gaffer answered 31/8, 2018 at 8:11 Comment(1)

"From pure mathematical perspective logit is a function that performs above mapping." This section is wrong. It's common in statistics to call the logit of a probability itself the "logits". Also, regarding "that feeds in to logit function": the softmax function isn't the logit function, but its inverse, the (multinomial) logistic function. – Solitaire 25/3, 2021 at 19:40

Logit is a function that maps probabilities [0, 1] to [-inf, +inf].

Softmax is a function that maps [-inf, +inf] to [0, 1], similar to sigmoid. But softmax also normalizes the values (the output vector) so that they sum to 1.

TensorFlow "with logits": it means that the function applies softmax to the logit numbers to normalize them. The input vector (the logits) is not normalized and can range over [-inf, inf].

This normalization is used for multiclass classification problems. For multilabel classification problems, sigmoid normalization is used instead, i.e. tf.nn.sigmoid_cross_entropy_with_logits.
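The difference between the two normalizations can be sketched in plain Python with made-up logits: softmax makes the classes compete for a total probability of 1, while an element-wise sigmoid scores each label independently, so several labels can be likely at once.

```python
import math

def softmax(logits):
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.5, -1.0]                    # hypothetical raw outputs for 3 classes/labels

multiclass = softmax(logits)                 # classes compete: probabilities sum to 1
multilabel = [sigmoid(z) for z in logits]    # each label scored independently

print(multiclass, sum(multiclass))
print(multilabel, sum(multilabel))           # can exceed 1: several labels may be "on"
```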

Almsgiver answered 17/12, 2017 at 6:54 Comment(2)

so logit is the same as the "score" – Hekate 22/1, 2018 at 19:29

I suggest adding a line to your answer explicitly differentiating the logit function (statistics) from the logits layer (TensorFlow). – Rik 24/5, 2018 at 14:40

My personal understanding is that, in the TensorFlow domain, logits are the values to be used as input to softmax. I came to this understanding based on this TensorFlow tutorial.

https://www.tensorflow.org/tutorials/layers

Although it is true that logit is a function in maths (especially in statistics), I don't think that's the same 'logit' you are looking at. In the book Deep Learning by Ian Goodfellow et al., the authors write:

The function σ⁻¹(x) is called the logit in statistics, but this term is more rarely used in machine learning.

Here σ⁻¹(x) stands for the inverse of the logistic sigmoid function.

In TensorFlow, it is frequently seen as the name of the last layer. In Chapter 10 of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron, I came across this paragraph, which explains the logits layer clearly.

note that logits is the output of the neural network before going through the softmax activation function: for optimization reasons, we will handle the softmax computation later.

That is to say, although our design uses softmax as the activation function of the last layer, for ease of computation we keep the logits as a separate output. This is because it is more efficient (and numerically more stable) to calculate the softmax and the cross-entropy loss together. Remember that cross-entropy is a cost function; it is not used in forward propagation.
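Why computing softmax and cross-entropy together helps can be seen in a pure-Python sketch. It illustrates the general log-sum-exp idea rather than TensorFlow's actual kernel, and the logits are made up: with a large gap between logits, taking softmax first and then the log underflows to log(0), while working on the logits directly stays finite.

```python
import math

def xent_separate(logits, target):
    """Softmax first, then log: the target probability can underflow to 0."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    p = exps[target] / sum(exps)
    return -math.log(p) if p > 0.0 else math.inf

def xent_fused(logits, target):
    """Cross-entropy straight from logits: log-sum-exp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]

logits = [0.0, 800.0]             # hypothetical: the model is very confident in class 1
print(xent_separate(logits, 0))   # inf: exp(-800) underflowed to 0.0
print(xent_fused(logits, 0))      # 800.0, the correct loss up to rounding
```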

Outguess answered 30/10, 2017 at 8:34 Comment(0)


If you check the mathematical logit function, it converts the [0, 1] interval to the whole real line, [-inf, inf].

Sigmoid and softmax will do exactly the opposite thing. They will convert the [-inf, inf] real space to [0, 1] real space.

This is why, in machine learning, we may use the term logit for the values that go into the sigmoid and softmax functions (since their ranges match).

And this is why "we may call" anything in machine learning that goes in front of the sigmoid or softmax function the logit.

Here is a video of G. Hinton using this term.

Venality answered 27/6, 2019 at 11:1 Comment(0)

Here is a concise answer for future readers. TensorFlow's logit is defined as the output of a neuron without applying an activation function:

logit = w*x + b,

x: input, w: weight, b: bias. That's it.
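Applied to a whole layer, the formula above is a matrix-vector product plus a bias, which is what the tutorial's dense layer with linear activation computes. A minimal pure-Python sketch with made-up weights and biases (the helper name `dense_logits` is hypothetical):

```python
def dense_logits(x, W, b):
    """logits[j] = sum_i W[j][i] * x[i] + b[j]: a linear layer with no activation."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

x = [1.0, 2.0]                     # input features
W = [[0.5, -1.0],                  # hypothetical weights: one row per output unit
     [2.0, 0.25],
     [0.0, 1.0]]
b = [0.1, -0.2, 0.0]               # hypothetical biases
print(dense_logits(x, W, b))       # approximately [-1.4, 2.3, 2.0]
```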

The following is irrelevant to this question.

For historical lectures, read other answers. Hats off to TensorFlow's "creatively" confusing naming convention. In PyTorch, there is only one CrossEntropyLoss and it accepts un-activated outputs. Convolutions, matrix multiplications and activations are operations at the same level. The design is much more modular and less confusing. This is one of the reasons why I switched from TensorFlow to PyTorch.

Petry answered 7/9, 2018 at 13:50 Comment(1)

This might be correct but completely misses any reference/source of the information. – Golda 13/6, 2023 at 10:42

logits

The vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.

In addition, logits sometimes refer to the element-wise inverse of the sigmoid function. For more information, see tf.nn.sigmoid_cross_entropy_with_logits.

Source: official TensorFlow documentation

Insole answered 5/3, 2020 at 10:44 Comment(0)

They are basically the fullest learned model you can get from the network, before it's been squashed down to apply to only the number of classes we are interested in. Check out how some researchers use them to train a shallow neural net based on what a deep network has learned: https://arxiv.org/pdf/1312.6184.pdf

It's kind of like how, when learning a subject in detail, you learn a great many minor points, but when teaching a student, you try to compress it to the simplest case. If the student then tried to teach, it would be quite difficult, but they would be able to describe it just well enough to use the language.

Klos answered 14/11, 2017 at 5:51 Comment(0)

The logit (/ˈloʊdʒɪt/ LOH-jit) function is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics. When the function's variable represents a probability p, the logit function gives the log-odds, or the logarithm of the odds p/(1 − p).

See here: https://en.wikipedia.org/wiki/Logit

Oratory answered 27/10, 2017 at 5:2 Comment(1)

That's in statistics/maths. We are talking machine learning here, where logit has a different meaning. See this, this, this. – Rik 16/6, 2018 at 19:54
