Should there be one bias per layer or one bias for each node?

I am looking to implement a generic neural network, with 1 input layer consisting of input nodes, 1 output layer consisting of output nodes, and N hidden layers consisting of hidden nodes. Nodes are organized into layers, with the rule that nodes in the same layer cannot be connected.

I mostly understand the concept of the bias, but I have a question.

Should there be one bias value per layer (shared by all nodes in that layer), or should each node (except those in the input layer) have its own bias value?

I have a feeling it could be done both ways. I would like to understand the trade-offs of each approach, and to know which implementation is most commonly used.

Tiga asked 25/1, 2016 at 18:58 Comment(4)
Usually we have one bias value per neuron (except the input layer), i.e. you have a bias vector per layer, with the length of the vector being the number of neurons in that layer. – Leak
The biases are (almost always) individual to each neuron. The exception is in some modern neural networks with weight sharing. Take a look at this answer for an explanation of why the bias should be unique. TL;DR: biases are used to shift the activation functions (see the sketch below these comments), so it does not necessarily make sense to use the same bias for all the nodes within a layer. – Dionysiac
Interesting, thanks for the responses. I will make both individual biases and per-layer bias sharing configurable options when the neural network is created. – Tiga
You can think of the bias as a constant input. It has a single weight connecting it to every node in the layer (assuming fully-connected networks). After training, that is, once the weights have been updated, a constant value 1*weight[i] enters each node. – Heterogenetic
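
To make the comments' point about shifting concrete, here is a minimal sketch (assuming a sigmoid activation; the weight and bias values are made up purely for illustration): the node's output crosses 0.5 where w*x + b = 0, i.e. at x = -b/w, so changing the bias slides the activation function along the input axis, which is why each node generally needs its own value.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-4.0, 4.0, 9)   # sample inputs
    w = 1.0                          # illustrative weight
    for b in (-2.0, 0.0, 2.0):       # three illustrative bias values
        # The output crosses 0.5 where w*x + b == 0, i.e. at x = -b/w,
        # so the bias shifts the activation function along the x-axis.
        print(b, np.round(sigmoid(w * x + b), 2))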

Intuitive View

To answer this question properly, we should first establish exactly what we mean by "Bias Value", as used in the question. Neural Networks are typically viewed intuitively (and explained to beginners) as a network of nodes (neurons) with weighted, directed connections between them. In this view, Biases are very frequently drawn as additional "input" nodes that always have an activation level of exactly 1.0. This value of 1.0 may be what some people think of when they hear "Bias Value". Such a Bias Node would have connections to other nodes, with trainable weights, and other people may think of those weights as "Bias Values". Since the question was tagged with the bias-neuron tag, I'll answer under the assumption that we use the first definition, i.e. Bias Value = 1.0 for some Bias Node / neuron.

From this point of view, it absolutely does not matter mathematically how many Bias nodes/values we put in our network, as long as we make sure to connect them to the correct nodes. You could intuitively think of the entire network as having only a single bias node, with a value of 1.0, that does not belong to any particular layer and has connections to all nodes other than the input nodes. This may be difficult to draw, though; if you want to make a drawing of your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except the output layer, and to connect each of those bias nodes to all the nodes in the layer directly after it. Mathematically, these two interpretations are equivalent, since in both cases every non-input node has an incoming weighted connection from a node whose activation level is always 1.0.
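
As a minimal NumPy sketch of this equivalence (the layer size and values here are made up purely for illustration): treating the bias as an extra input node with constant activation 1.0, whose outgoing weights form an extra column of the weight matrix, produces exactly the same pre-activations as giving each node its own bias value.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 2))   # weights into a hidden layer of 3 nodes
    b = rng.normal(size=3)        # one trainable bias weight per hidden node
    x = np.array([0.5, -1.2])     # input activations

    # View 1: an explicit bias node with constant activation 1.0;
    # its outgoing weights become an extra column of the weight matrix.
    x_aug = np.append(x, 1.0)
    W_aug = np.hstack([W, b[:, None]])
    z_bias_node = W_aug @ x_aug

    # View 2: no bias node; each node simply has its own bias value.
    z_bias_vector = W @ x + b

    assert np.allclose(z_bias_node, z_bias_vector)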

Programming View

When Neural Networks are programmed, there typically aren't any explicit node "objects" at all (at least in efficient implementations). There will generally just be matrices for the weights. From this point of view, there is no longer any choice. We'll (almost) always want one "bias weight" (a weight being multiplied by a constant activation level of 1.0) going to every non-input node, and we'll have to make sure all those weights appear in the correct spots in our weight matrices.
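
For instance, a minimal sketch of such an implementation (NumPy; the layer sizes and the tanh activation are arbitrary choices for illustration): each non-input layer gets a weight matrix and a bias vector whose length equals the number of nodes in that layer, and the bias is simply added after the matrix product.

    import numpy as np

    def forward(x, weights, biases):
        # One weight matrix and one bias vector per non-input layer;
        # the bias vector holds one bias weight for every node in that layer.
        a = x
        for W, b in zip(weights, biases):
            a = np.tanh(W @ a + b)
        return a

    sizes = [4, 5, 3, 2]              # input, two hidden layers, output
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
    biases = [np.zeros(m) for m in sizes[1:]]

    y = forward(rng.normal(size=4), weights, biases)
    print(y.shape)                    # (2,)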

Rosierosily answered 13/1, 2018 at 18:50 Comment(0)
