Does a Neural Network with Sigmoid Activation use Thresholds?

I'm a tad confused here. I just started on the subject of neural networks, and the first one I constructed used the step activation function with a threshold on each neuron. Now I want to implement the sigmoid activation, but it seems that this type of activation doesn't use thresholds, only weights between the neurons. Yet the material I find about it still mentions thresholds; I just can't see where they would fit into the activation function.

Are thresholds used in a sigmoid activation function in neural networks?

Nystrom answered 13/9, 2012 at 11:43 Comment(1)
It seems to me that the information you've found refers to the bias signal as a threshold. By default, the sigmoid's threshold could be considered to be 0 (i.e. where y=0.5, as mentioned in Kendall Frey's answer). The bias is a constant input to a neuron which effectively shifts the sigmoid function on the x-axis.Garlinda
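To illustrate the comment above, here is a minimal Python sketch (the sigmoid helper and its bias parameter are hypothetical names, not from the question) showing how a bias shifts the point where the output crosses 0.5 along the x-axis:

    import numpy as np

    def sigmoid(x, bias=0.0):
        # The bias shifts the curve along the x-axis: the point where
        # the output crosses 0.5 moves from x = 0 to x = -bias.
        return 1.0 / (1.0 + np.exp(-(x + bias)))

    print(sigmoid(0.0))             # 0.5   -> effective threshold at x = 0
    print(sigmoid(0.0, bias=-2.0))  # ~0.12 -> threshold shifted to x = 2
    print(sigmoid(2.0, bias=-2.0))  # 0.5   -> the curve now crosses 0.5 at x = 2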

There is no discrete jump as there is with step activation. The threshold could be considered to be the point where the sigmoid function's value is 0.5. Some sigmoid functions have this at 0, while others have it at a different 'threshold'.

The step function may be thought of as a version of the sigmoid function that has the steepness set to infinity. There is an obvious threshold in this case, and for less steep sigmoid functions, the threshold could be considered to be where the function's value is 0.5, or the point of maximum steepness.
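As a rough numerical sketch of that limit (Python; the steepness parameter is a hypothetical name for illustration), sampling just left and right of the threshold at x = 0 shows the outputs approaching a hard 0/1 step as the steepness grows:

    import numpy as np

    def sigmoid(x, steepness=1.0):
        # Larger steepness sharpens the transition around x = 0.
        return 1.0 / (1.0 + np.exp(-steepness * x))

    for k in (1.0, 10.0, 100.0):
        print(k, sigmoid(-0.1, k), sigmoid(0.1, k))
    # k = 1   -> 0.475..., 0.525...  (gentle slope)
    # k = 100 -> ~0.00005, ~0.99995  (effectively a step function)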

Knife answered 13/9, 2012 at 12:18 Comment(3)
Should I adjust the steepness factor while changing the weights during learning?Nystrom
If you do, change it an order of magnitude more slowly than you change the weights.Heirdom
Thresholding can also be seen as a tool for binary or multi-class classification tasks in which you want to trade precision against recall.Monikamoniker

The sigmoid function's value lies in the range (0, 1). 0.5 is taken as the threshold: if h(theta) < 0.5 we take the output to be 0, and if h(theta) >= 0.5 we take it to be 1.

Thresholds are used only on the output layer of the network, and only when classifying. So if you're classifying between 4 classes, the output layer has 4 nodes y = [y1, y2, y3, y4], and you use this threshold to assign each y[i] the value 1 or 0.
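A minimal sketch of that final thresholding step (Python; the output values are made up for illustration):

    import numpy as np

    # Hypothetical sigmoid outputs of a 4-node output layer:
    y = np.array([0.91, 0.08, 0.55, 0.40])

    # Apply the 0.5 threshold to turn activations into hard 0/1 decisions:
    labels = (y >= 0.5).astype(int)
    print(labels)  # [1 0 1 0]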

Reinforce answered 13/9, 2012 at 11:56 Comment(2)
Is it always "optimal" to use 0.5 as the threshold, or should thresholds be learned on some validation data? In my tests 0.5 works better than learning optimal thresholds (taking the midpoint between the average positive and negative scores), but I am not sure whether this is always the case.Hilburn
I think you don't need the sigmoid at all if you threshold at 0.5 on the output layer. Suppose the sigmoid is denoted as g(x). Then g(h(x)) <= 0.5 is the same as h(x) <= 0.Anticholinergic
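The equivalence in the last comment holds because the sigmoid is monotonically increasing with g(0) = 0.5, so comparing g(h) against 0.5 is the same as comparing h against 0. A quick numerical check (Python, illustrative only):

    import math

    def g(x):
        # The logistic sigmoid.
        return 1.0 / (1.0 + math.exp(-x))

    # g is monotonically increasing and g(0) = 0.5, so thresholding
    # g(h) at 0.5 is equivalent to thresholding h at 0:
    for h in (-3.0, -0.01, 0.0, 0.01, 3.0):
        assert (g(h) >= 0.5) == (h >= 0)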

It doesn't need to. The sigmoid curve itself can partly act as a threshold.

Heirdom answered 13/9, 2012 at 12:17 Comment(4)
So basically every neuron is the same, just with different connections to other neurons?Nystrom
It can be. And the network can still learn and do whatever you trained it to do. You only apply a real, hard threshold at the end, when making an all-or-nothing classification.Heirdom
@BorisStitnicky I tried to program a simple perceptron with a sigmoid and no threshold, but it doesn't seem to learn XOR. It has one hidden layer. Should I introduce the threshold only on the classifying neuron? If so, how do I update it?Essayist
That might solve your problem, or not, depending on your learning algorithm. Definitely it's an error, but in this case it has more than one possible solution. Obviously, if you want to teach the network XOR, you must apply a threshold at the end, since XOR has a 0/1 output. What worries me more is your learning algorithm: you never said whether you wrote it yourself or what you are using. If you happen to speak Ruby, try ai4r; they have example code you can change slowly to fit your needs.Heirdom
