Why is ReLU used in regression with Neural Networks?
I am following the official TensorFlow Keras tutorial and got stuck here: Predict house prices: regression - Create the model

Why is an activation function used for a task where a continuous value is predicted?

The code is:

import tensorflow as tf
from tensorflow import keras

def build_model():
    # two hidden Dense layers with ReLU activation, then a single output unit
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)

    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
Muir answered 20/7, 2018 at 12:21

The general reason for using non-linear activation functions in the hidden layers is that, without them, the network collapses into a single linear unit, no matter how many layers or how many units per layer it has; the sketch below demonstrates this. It is also nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
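
To make this concrete, here is a minimal NumPy sketch (mine, not from the original answer): composing two layers with no activation in between computes exactly the same function as one suitably chosen linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5,))                    # one sample with 5 features

# two "layers" with random weights and biases, but no activation in between
W1, b1 = rng.normal(size=(64, 5)), rng.normal(size=(64,))
W2, b2 = rng.normal(size=(1, 64)), rng.normal(size=(1,))
two_layers = W2 @ (W1 @ x + b1) + b2

# the single linear layer they collapse into
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))    # True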

In your case, looking more closely, you'll see that the activation function of your final layer is not relu, as in your hidden layers, but linear (which is the default activation when you don't specify anything, as here):

keras.layers.Dense(1)

From the Keras docs:

Dense

[...]

Arguments

[...]

activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).

which is indeed what is expected for a regression network with a single continuous output.
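
In case it helps, here is a short sketch (not from the original answer) of three equivalent ways to spell that output layer in Keras; all of them apply no nonlinearity:

from tensorflow import keras    # same import as in the question's code

keras.layers.Dense(1)                        # default: no activation applied
keras.layers.Dense(1, activation=None)       # explicitly no activation
keras.layers.Dense(1, activation='linear')   # same thing: a(x) = x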

Garett answered 20/7, 2018 at 12:37
Thank you for taking the time to answer my question, that was a great video! – Muir
