In deep learning implementations for object detection and semantic segmentation, I have seen output layers that use either sigmoid or softmax. It is not clear to me when to use which, since both seem able to support these tasks. Are there any guidelines for this choice?
softmax() helps when you want a probability distribution, i.e. outputs that sum to 1. sigmoid is used when you want each output to range from 0 to 1, but the outputs need not sum to 1.
In your case, you wish to classify and choose between two alternatives. I would recommend using softmax(), as you will get a probability distribution to which you can apply a cross-entropy loss function.
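To make the difference concrete, here is a minimal sketch in plain Python (standard library only). The helper names `sigmoid` and `softmax` and the example logits are assumptions made for illustration, not taken from any particular framework.

```python
import math

def sigmoid(x):
    # Squashes one raw score into (0, 1), independently of other scores.
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs (logits) of a network's last layer.
logits = [2.0, 1.0, -1.0]

soft = softmax(logits)
sig = [sigmoid(v) for v in logits]

print("softmax:", soft, "sum =", sum(soft))  # sum is 1 up to float rounding
print("sigmoid:", sig, "sum =", sum(sig))    # sum is not constrained
```

Each sigmoid output lies between 0 and 1, but only the softmax outputs form a distribution over the classes.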
The sigmoid and the softmax function have different purposes. For a detailed explanation of when to use sigmoid vs. softmax in neural network design, you can look at this article: "Classification: Sigmoid vs. Softmax."
Short summary:
If you have a multi-label classification problem where there is more than one "right answer" (the outputs are NOT mutually exclusive) then you can use a sigmoid function on each raw output independently. The sigmoid will allow you to have high probability for all of your classes, some of them, or none of them.
If you instead have a multi-class classification problem where there is only one "right answer" (the outputs are mutually exclusive), then use a softmax function. The softmax will enforce that the sum of the probabilities of your output classes is equal to one, so in order to increase the probability of a particular class, your model must correspondingly decrease the probability of at least one of the other classes.
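The two cases in the summary can be sketched as follows. This is only an illustration in plain Python: the label names, the logits, and the 0.5 threshold are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and raw network outputs.
labels = ["cat", "dog", "outdoor"]
logits = [3.0, 2.5, -4.0]

# Multi-label: threshold every sigmoid output on its own,
# so zero, one, or several labels can be active.
multi_label = [name for name, z in zip(labels, logits) if sigmoid(z) >= 0.5]

# Multi-class: the softmax outputs compete, so pick the single best class.
probs = softmax(logits)
multi_class = labels[probs.index(max(probs))]

# Raising one logit lowers the other softmax probabilities,
# while the other sigmoid outputs stay exactly the same.
boosted = [logits[0] + 2.0] + logits[1:]
boosted_probs = softmax(boosted)

print(multi_label)                  # ['cat', 'dog']
print(multi_class)                  # cat
print(boosted_probs[1] < probs[1])  # True
```

With sigmoid both "cat" and "dog" are reported, whereas softmax forces a single winner.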
Object detection can be viewed as object classification applied to a sliding window over the image. In classification the goal is to find the correct output in some class space. E.g. if you distinguish 10 different object classes and want to know which one is most likely present in the window, softmax is a good fit because of its property that the whole layer sums to 1.
Semantic segmentation, on the other hand, segments the image in some way. I have done medical semantic segmentation, and there the output is a binary image. This means you can use sigmoid as the output to predict whether each pixel belongs to the specific class, because sigmoid values lie between 0 and 1 for each output class.
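As a rough sketch of the binary segmentation case, a sigmoid can be applied to every pixel's raw score and thresholded to get a mask. The 3x4 `logit_map`, the `predict_mask` helper, and the 0.5 threshold are made up for illustration; a real network would produce the scores.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-pixel raw scores for one class (3 rows, 4 columns).
logit_map = [
    [-3.0, -1.0, 0.5, 2.0],
    [-2.0,  0.2, 3.0, 4.0],
    [-4.0, -0.5, 1.5, 2.5],
]

def predict_mask(logit_map, threshold=0.5):
    # Each pixel is decided independently: 1 = belongs to the class, 0 = background.
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logit_map]

mask = predict_mask(logit_map)
for row in mask:
    print(row)
# [0, 0, 1, 1]
# [0, 1, 1, 1]
# [0, 0, 1, 1]
```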
In general, softmax is used (softmax classifier) when there are n classes. Either sigmoid or softmax can be used for binary (n = 2) classification.
Sigmoid: $S(x) = \frac{1}{1 + e^{-x}}$
Softmax: $\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$ for $j = 1, \dots, K$
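The two formulas above also show why either function works for n = 2: a softmax over two scores z0 and z1 gives class 1 the probability sigmoid(z1 - z0). Here is a small check in plain Python; the function names and the example scores are assumptions for illustration.

```python
import math

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    # sigma(z)_j = e^(z_j) / sum_k e^(z_k), computed with the max subtracted
    # for numerical stability (the result is unchanged).
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a two-class problem.
z0, z1 = 0.3, 1.7

p_softmax = softmax([z0, z1])[1]  # probability of class 1 from softmax
p_sigmoid = sigmoid(z1 - z0)      # same probability from a single sigmoid

print(p_softmax, p_sigmoid)
print(math.isclose(p_softmax, p_sigmoid))  # True
```

So a two-output softmax layer and a one-output sigmoid layer describe the same model; the sigmoid version just needs one score instead of two.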
Softmax is a kind of multi-class sigmoid, but if you look at the softmax function, the outputs of all softmax units are constrained to sum to 1. For sigmoid there is no such constraint.
Digging deeper, you can also use sigmoid for multi-class classification. When you use softmax, you get a probability for each class (a joint distribution and a multinomial likelihood), and these probabilities are bound to sum to one. If you use sigmoid for multi-class classification instead, it is like modelling marginal distributions with a Bernoulli likelihood: p(y0|x), p(y1|x), etc.
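In terms of loss functions, the two likelihoods correspond to categorical cross-entropy (softmax) and a sum of per-class binary cross-entropies (sigmoid). The sketch below is plain Python with hypothetical function names and example values; deep learning frameworks ship their own, more numerically robust versions.

```python
import math

def softmax_cross_entropy(logits, target_index):
    # Negative log of the softmax probability of the one true class
    # (multinomial likelihood), written in log-sum-exp form.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]

def sigmoid_binary_cross_entropy(logits, targets):
    # One independent Bernoulli term per class: -[t*log(p) + (1-t)*log(1-p)].
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total

# Hypothetical raw outputs for three classes.
logits = [2.0, 0.5, -1.0]

# Exactly one correct class: both losses can be used.
ce = softmax_cross_entropy(logits, target_index=0)
bce_one_hot = sigmoid_binary_cross_entropy(logits, targets=[1, 0, 0])

# Two correct classes at once: only the per-class sigmoid loss can express this.
bce_multi = sigmoid_binary_cross_entropy(logits, targets=[1, 1, 0])

print(ce, bce_one_hot, bce_multi)
```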