I want to build a model that predicts the future response of an input signal. My network architecture is [3, 5, 1]:
- 3 inputs,
- 5 neurons in the hidden layer, and
- 1 neuron in the output layer.
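For reference, here is a minimal sketch of the [3, 5, 1] network I have in mind, written with NumPy. The weight shapes, the separate bias vector per layer, and the log-sigmoid activation are my assumptions for illustration, not a fixed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden layer: 5 neurons, each seeing 3 inputs; one bias per neuron.
W1 = rng.standard_normal((5, 3))
b1 = rng.standard_normal(5)

# Output layer: 1 neuron seeing 5 hidden activations; one bias.
W2 = rng.standard_normal((1, 5))
b2 = rng.standard_normal(1)

def logsig(z):
    """Log-sigmoid activation, maps to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Forward pass for a 3-element input vector x."""
    h = logsig(W1 @ x + b1)   # hidden activations, shape (5,)
    y = logsig(W2 @ h + b2)   # network output, shape (1,)
    return y

x = np.array([0.1, -0.2, 0.3])
print(forward(x).shape)  # prints (1,)
```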
My questions are:
- Should there be a separate bias for each of the hidden and output layers?
- Should we assign a weight to the bias at each layer? (The bias adds an extra value to the network; does that overburden it?)
- Why is the bias always set to one? If eta (the learning rate) can take different values, why don't we set the bias to different values as well?
- Why do we always use the log-sigmoid function as the nonlinearity? Can we use tanh?
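For context on the last question, these are the two activation functions I'm comparing. The identity tanh(z) = 2·logsig(2z) − 1 is standard, so tanh is just a rescaled, zero-centered log-sigmoid; this small sketch checks that numerically:

```python
import math

def logsig(z):
    # log-sigmoid: output range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # tanh: output range (-1, 1); math.tanh is the stdlib version
    return math.tanh(z)

# tanh is a scaled and shifted log-sigmoid: tanh(z) = 2*logsig(2z) - 1
z = 0.7
print(tanh(z), 2.0 * logsig(2.0 * z) - 1.0)  # the two values agree
```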