Why is ReLU used in regression with Neural Networks?
I am following the official TensorFlow Keras tutorial and got stuck here: Predict house prices: regression - Create the model

Why is an activation function used for a task where a continuous value is predicted?

The code is:

import tensorflow as tf
from tensorflow import keras

def build_model():
    # two hidden Dense layers with ReLU activation, then a single output unit
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)

    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
Muir answered 20/7, 2018 at 12:21

The general reason for using non-linear activation functions in the hidden layers is that, without them, the network collapses into a single linear unit, no matter how many layers or how many units per layer it has; the sketch below demonstrates this. It is also nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
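
To make this concrete, here is a minimal NumPy sketch (mine, not from the original answer): composing two layers with no activation in between computes exactly the same function as one suitably chosen linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5,))                    # one sample with 5 features

# two "layers" with random weights and biases, but no activation in between
W1, b1 = rng.normal(size=(64, 5)), rng.normal(size=(64,))
W2, b2 = rng.normal(size=(1, 64)), rng.normal(size=(1,))
two_layers = W2 @ (W1 @ x + b1) + b2

# the single linear layer they collapse into
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))    # True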

In your case, looking more closely, you'll see that the activation function of your final layer is not relu, as in your hidden layers, but linear (which is the default activation when you don't specify anything, as here):

keras.layers.Dense(1)

From the Keras docs:

Dense

[...]

Arguments

[...]

activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).

which is indeed what is expected for a regression network with a single continuous output.
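
In case it helps, here is a short sketch (not from the original answer) of three equivalent ways to spell that output layer in Keras; all of them apply no nonlinearity:

from tensorflow import keras    # same import as in the question's code

keras.layers.Dense(1)                        # default: no activation applied
keras.layers.Dense(1, activation=None)       # explicitly no activation
keras.layers.Dense(1, activation='linear')   # same thing: a(x) = x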

Garett answered 20/7, 2018 at 12:37
Thank you for taking the time to answer my question, that was a great video! – Muir
