Neural network bias for each neuron

I have been following Andrew Ng's videos on neural networks. In these videos, he doesn't associate a bias with each and every neuron. Instead, he adds a bias unit at the head of every layer after its activations have been computed, and uses this bias along with the activations to calculate the activations of the next layer (forward propagation).

However, in some other blogs on machine learning and videos like this, there is a bias associated with each neuron. What is this difference, why does it exist, and what are its implications?

Uphold answered 12/5, 2016 at 17:7

Both approaches represent the same bias concept. For each unit (excluding input nodes) you compute the value of the activation function applied to the dot product of the weight vector and the activation vector from the previous layer (in the case of a feed-forward network), plus a scalar bias value:

 f((w * a) + b)

In Andrew Ng's course this value is computed using a vectorisation trick, in which you concatenate your activations with a specified bias constant (usually 1), and that does the job (because this constant has its own weight for each node in the next layer, so it is exactly the same as having a separate bias value for each node).
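
A quick way to see the equivalence is to compute the pre-activations both ways and check that they match. This is a minimal NumPy sketch (the layer sizes and random values are just for illustration, not from either course):

    import numpy as np

    rng = np.random.default_rng(0)

    a = rng.standard_normal(3)        # activations from the previous layer (3 units)
    W = rng.standard_normal((4, 3))   # weights into a layer of 4 units
    b = rng.standard_normal(4)        # per-neuron biases (one bias per unit)

    # Per-neuron bias form: z = W a + b
    z_per_neuron = W @ a + b

    # Bias-unit form: prepend a constant 1 to the activations and absorb
    # the biases into an extra first column of the weight matrix.
    a_aug = np.concatenate(([1.0], a))    # [1, a_1, a_2, a_3]
    W_aug = np.hstack((b[:, None], W))    # bias weights become the first column
    z_bias_unit = W_aug @ a_aug

    print(np.allclose(z_per_neuron, z_bias_unit))  # True: identical pre-activations

Because the two parameterisations produce identical pre-activations, training one is equivalent to training the other.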

Greerson answered 12/5, 2016 at 22:8 Comment(3)
But in Andrew Ng's course, if we add a single bias unit, won't all neurons in the next layer have the same bias? This would not be the case if we initialized a bias for each neuron, because we could initialize different biases for different neurons.Uphold
The bias constant is the same, but every node has a different weight for it. So if e.g. some node has a bias weight w_0 and the bias constant is a_0, then the corresponding bias value is equal to w_0 * a_0. You can adjust every bias value simply by learning the correct weight w_0.Kasandrakasevich
Why must the bias unit be added only at the start of each layer? I.e. why must the ones vector be prepended? Why not appended at the end?Offshore

Regarding the differences between the two approaches, @Marcin has explained them beautifully.

It's interesting that in the Deep Learning specialization by deeplearning.ai, Andrew takes a different approach from his Machine Learning course (where he used one bias unit per hidden layer) and associates a bias term with each individual neuron.

Though both approaches achieve the same result, in my opinion, associating a bias with each neuron is much more explicit and helps immensely with hyperparameter tuning, especially when you're dealing with large architectures such as CNNs and other deep networks.
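
As a minimal sketch of the per-neuron-bias formulation (the layer sizes and activation choices below are made up for illustration), each layer carries its own bias vector with one entry per neuron, which is also how most deep learning frameworks parameterise dense layers:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)

    # Hypothetical sizes: 3 inputs -> 4 hidden units -> 1 output.
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # one bias per hidden neuron
    W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)  # one bias for the output neuron

    def forward(x):
        a1 = relu(W1 @ x + b1)        # each neuron adds its own bias term
        return sigmoid(W2 @ a1 + b2)

    print(forward(rng.standard_normal(3)))

Keeping the biases as separate vectors makes them easy to inspect, initialise, and regularise independently of the weights.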

Renzo answered 6/7, 2020 at 21:16
