Neural network bias for each neuron

I have been following Andrew Ng's videos on neural networks. In these videos, he doesn't associate a bias with each and every neuron. Instead, he adds a bias unit at the head of every layer after its activations have been computed, and uses this bias along with the activations to calculate the activations of the next layer (forward propagation).

However, in some other blogs on machine learning and videos like this, there is a bias associated with each neuron. What is this difference, why does it exist, and what are its implications?

Uphold answered 12/5, 2016 at 17:7

Both approaches represent the same bias concept. For each unit (excluding input nodes) you compute the value of the activation function applied to the dot product of the weight vector and the activation vector from the previous layer (in the case of a feed-forward network), plus a scalar bias value:

 f((w * a) + b)

In Andrew Ng's course this value is computed using a vectorisation trick, in which you concatenate your activations with a specified bias constant (usually 1), and that does the job (because this constant has its own weight for each node in the next layer, so it is exactly the same as having a separate bias value for each node).
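
A quick way to see the equivalence is to compute the pre-activations both ways and check that they match. This is a minimal NumPy sketch (the layer sizes and random values are just for illustration, not from either course):

    import numpy as np

    rng = np.random.default_rng(0)

    a = rng.standard_normal(3)        # activations from the previous layer (3 units)
    W = rng.standard_normal((4, 3))   # weights into a layer of 4 units
    b = rng.standard_normal(4)        # per-neuron biases (one bias per unit)

    # Per-neuron bias form: z = W a + b
    z_per_neuron = W @ a + b

    # Bias-unit form: prepend a constant 1 to the activations and absorb
    # the biases into an extra first column of the weight matrix.
    a_aug = np.concatenate(([1.0], a))    # [1, a_1, a_2, a_3]
    W_aug = np.hstack((b[:, None], W))    # bias weights become the first column
    z_bias_unit = W_aug @ a_aug

    print(np.allclose(z_per_neuron, z_bias_unit))  # True: identical pre-activations

Because the two parameterisations produce identical pre-activations, training one is equivalent to training the other.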

Greerson answered 12/5, 2016 at 22:8 Comment(3)
But in Andrew Ng's course, if we add a single bias unit, won't all neurons in the next layer have the same bias? This would not be the case if we initialized a bias for each neuron, because we could initialize different biases for different neurons.Uphold
The bias constant is the same, but every node has a different weight for it. So if e.g. some node has a bias weight w_0 and the bias constant is a_0, then the corresponding bias value is equal to w_0 * a_0. You can adjust every bias value simply by learning the correct weight w_0.Kasandrakasevich
Why must the bias unit be added only at the start of each layer? I.e. why must the ones vector be prepended? Why not appended at the end?Offshore

Regarding the differences between the two approaches, @Marcin has explained them beautifully.

It's interesting that in the Deep Learning specialization by deeplearning.ai, Andrew takes a different approach from his Machine Learning course (where he used one bias unit per hidden layer) and associates a bias term with each individual neuron.

Though both approaches achieve the same result, in my opinion, associating a bias with each neuron is much more explicit and helps immensely with hyperparameter tuning, especially when you're dealing with large architectures such as CNNs and other deep networks.
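
As a minimal sketch of the per-neuron-bias formulation (the layer sizes and activation choices below are made up for illustration), each layer carries its own bias vector with one entry per neuron, which is also how most deep learning frameworks parameterise dense layers:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)

    # Hypothetical sizes: 3 inputs -> 4 hidden units -> 1 output.
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # one bias per hidden neuron
    W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)  # one bias for the output neuron

    def forward(x):
        a1 = relu(W1 @ x + b1)        # each neuron adds its own bias term
        return sigmoid(W2 @ a1 + b2)

    print(forward(rng.standard_normal(3)))

Keeping the biases as separate vectors makes them easy to inspect, initialise, and regularise independently of the weights.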

Renzo answered 6/7, 2020 at 21:16
