Advanced Activation layers in Keras Functional API

When setting up a neural network in Keras you can use either the Sequential model or the Functional API. My understanding is that the former is easy to set up and manage, operating as a linear stack of layers, while the functional approach is useful for more complex architectures, particularly those that involve sharing the output of an internal layer. I personally like using the Functional API for its versatility; however, I am having difficulties with advanced activation layers such as LeakyReLU. When using standard activations, in the Sequential model one can write:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Similarly, in the Functional API one can write the above as:

from keras.models import Model
from keras.layers import Input, Dense

inpt = Input(shape=(100,))
dense_1 = Dense(32, activation='relu')(inpt)
out = Dense(10, activation='softmax')(dense_1)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

However, when using advanced activations like LeakyReLU and PReLU, the Sequential model requires them to be added as separate layers. For example:

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(32, input_dim=100))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Now, I'm assuming one does the equivalent in the Functional API approach:

from keras.models import Model
from keras.layers import Input, Dense, LeakyReLU

inpt = Input(shape=(100,))
dense_1 = Dense(32)(inpt)
LR = LeakyReLU(alpha=0.1)(dense_1)
out = Dense(10, activation='softmax')(LR)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

My questions are:

  1. Is this the correct syntax in the functional approach?
  2. Why does Keras require a new layer for these advanced activation functions rather than allowing us to just replace 'relu'?
  3. Is there something fundamentally different about creating a new layer for the activation function, rather than assigning it to an existing layer definition (as in the first examples, where we wrote 'relu')? I realise you could always write your activation functions, including standard ones, as separate layers (see the sketch below), although I have read that this should be avoided.
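
To illustrate what I mean in question 3, here is a sketch using the built-in Activation layer, which (as far as I understand) is equivalent to passing activation='relu' to Dense directly:

from keras.models import Model
from keras.layers import Input, Dense, Activation

inpt = Input(shape=(100,))
dense_1 = Dense(32)(inpt)            # no activation set on the Dense layer
act_1 = Activation('relu')(dense_1)  # the standard activation applied as its own layer
out = Dense(10, activation='softmax')(act_1)
model = Model(inpt, out)
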
Xena answered 16/4, 2018 at 21:59 Comment(0)
  1. No, you forgot to connect the LeakyReLU to the dense layer:

    LR = LeakyReLU(alpha=0.1)(dense_1)

  2. Usually the advanced activations have tunable or learnable parameters, and these have to be stored somewhere; it makes more sense for them to be layers, since you can then access and save those parameters (see the sketch after this list).

  3. Do it only if there is an advantage, such as tunable parameters.
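
To make point 2 concrete, here is a minimal sketch (assuming the LeakyReLU and PReLU classes from keras.layers): PReLU stores one learnable alpha per unit as a layer weight, which you can inspect and save, while LeakyReLU's alpha is a fixed hyperparameter and the layer holds no trainable weights.

    from keras.layers import Input, Dense, PReLU, LeakyReLU

    inpt = Input(shape=(100,))
    dense_1 = Dense(32)(inpt)

    prelu = PReLU()               # alpha is a trainable weight of this layer
    act_1 = prelu(dense_1)        # building the layer creates one alpha per unit
    print(prelu.get_weights())    # [array of shape (32,)] -- stored on the layer

    leaky = LeakyReLU(alpha=0.1)  # alpha here is just a fixed hyperparameter
    act_2 = leaky(dense_1)
    print(leaky.get_weights())    # [] -- nothing learnable to store

Because those alpha values are layer weights, they are saved by model.save() and restored on load, which is why a dedicated layer is the natural place for them.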
Corsetti answered 16/4, 2018 at 22:49 Comment(1)
Thanks, I've edited my question to link them. That was just a mistake, but glad the rest is okay. And yes, okay, so you can learn the value of alpha for LeakyReLU, presumably, given some activation function from the LeakyReLU layer? – Xena
