When setting up a neural network in Keras you can use either the Sequential model or the Functional API. My understanding is that the former is easy to set up and manage, operating as a linear stack of layers, while the functional approach is useful for more complex architectures, particularly those that involve sharing the output of an internal layer. I personally prefer the Functional API for its versatility; however, I am having difficulties with advanced activation layers such as LeakyReLU. When using standard activations, in the Sequential model one can write:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Similarly, in the functional API one can write the above as:
from keras.models import Model
from keras.layers import Input, Dense

inpt = Input(shape=(100,))
dense_1 = Dense(32, activation='relu')(inpt)
out = Dense(10, activation='softmax')(dense_1)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
However, when using advanced activations like LeakyReLU and PReLU, in the Sequential model we write them as separate layers. For example:
from keras.layers import LeakyReLU

model = Sequential()
model.add(Dense(32, input_dim=100))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Now, I'm assuming one does the equivalent in the functional API approach:
inpt = Input(shape=(100,))
dense_1 = Dense(32)(inpt)
LR = LeakyReLU(alpha=0.1)(dense_1)
out = Dense(10, activation='softmax')(LR)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
My questions are:
- Is this the correct syntax in the functional approach?
- Why does Keras require a new layer for these advanced activation functions rather than allowing us to simply replace 'relu'?
- Is there something fundamentally different about creating a new layer for the activation function, rather than assigning it to an existing layer definition (as in the first examples, where we wrote 'relu')? I realise you could always write your activation functions, including standard ones, as new layers (a sketch of what I mean follows this list), although I have read that this should be avoided.
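To be concrete about that last point, here is a minimal sketch of what I mean by writing a standard activation as its own layer, using Keras's Activation layer; model_a and model_b are just my own names, and my assumption (which is part of the question) is that the two forms build the same computation:

from keras.models import Sequential
from keras.layers import Dense, Activation

# Standard activation passed as an argument to the layer:
model_a = Sequential()
model_a.add(Dense(32, activation='relu', input_dim=100))
model_a.add(Dense(10, activation='softmax'))

# The same activation written as a separate Activation layer:
model_b = Sequential()
model_b.add(Dense(32, input_dim=100))
model_b.add(Activation('relu'))
model_b.add(Dense(10, activation='softmax'))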