I've written an LSTM model using Keras, with LeakyReLU as an advanced activation:
from keras import optimizers
from keras.models import Sequential
from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense
import keras_metrics  # third-party package providing precision()/recall() metrics

# `data` is my input array of shape (samples, timesteps, features)

# Adam optimizer with learning rate decay
opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

# build the model
model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]  # the timestep dimension of (samples, timesteps, features)
model.add(LSTM(16, batch_input_shape=(None, num_samples, num_features),
               return_sequences=True, activation='linear'))
model.add(LeakyReLU(alpha=.001))
model.add(Dropout(0.1))
model.add(LSTM(8, return_sequences=True, activation='linear'))
model.add(Dropout(0.1))
model.add(LeakyReLU(alpha=.001))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt,
              metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])
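For completeness, f1 in the compile call is a custom metric built on the Keras backend. A minimal version of what it computes looks roughly like this (a sketch; the exact helper in my script may differ slightly):

from keras import backend as K

def f1(y_true, y_pred):
    # sketch of the custom F1 metric passed to model.compile above
    y_pred = K.round(y_pred)
    true_positives = K.sum(y_true * y_pred)
    precision = true_positives / (K.sum(y_pred) + K.epsilon())
    recall = true_positives / (K.sum(y_true) + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())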
My data is a balanced, binary-labeled set, i.e. 50% labeled 1 and 50% labeled 0. I've used activation='linear' for the LSTM layers that precede the LeakyReLU activation, similar to this example I found on GitHub.
In that configuration the model throws a Nan in summary histogram error. Changing the LSTM activations to activation='sigmoid' works well, but seems like the wrong thing to do.
Reading this StackOverflow question suggested "introducing a small value when computing the loss"; I'm just not sure how to do that with a built-in loss function.
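If I understand that suggestion correctly, it would mean wrapping the built-in loss and clipping the predictions away from 0 and 1 before the logs are taken; something along these lines (an untested sketch on my side, the epsilon value is a guess):

from keras import backend as K

def clipped_binary_crossentropy(y_true, y_pred):
    # keep predictions strictly inside (0, 1) so the log terms cannot blow up
    eps = 1e-7
    y_pred = K.clip(y_pred, eps, 1.0 - eps)
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

# passed in place of the string 'binary_crossentropy'
model.compile(loss=clipped_binary_crossentropy, optimizer=opt,
              metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])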
Any help/explanation would be appreciated.
Update: I can see that the loss is nan from the first epoch:
260/260 [==============================] - 6s 23ms/step -
loss: nan - acc: 0.5000 - precision: 0.5217 - recall: 0.6512 - f1: nan - val_loss: nan - val_acc: 0.0000e+00 - val_precision: -2147483648.0000 - val_recall: -49941480.1860 - val_f1: nan
Update 2: I've upgraded both TensorFlow and Keras, to versions 1.12.0 and 2.2.4 respectively. There was no effect.
I also tried adding a loss to the first LSTM layer, as suggested by @Oluwafemi Sule. It looks like a step in the right direction: the loss is no longer nan on the first epoch. However, I still get the same error, probably because of the other nan values, such as val_loss / val_f1.
[==============================] - 7s 26ms/step -
loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan
Update 3: I tried compiling the network with just the accuracy metric, with no success:
Epoch 1/300
260/260 [==============================] - 8s 29ms/step - loss: nan - acc: 0.5538 - val_loss: nan - val_acc: 0.0000e+00
Comments:

Have you tried 1e-3, 1e-4 or 1e-5 as the learning rate? Further, did you try clipnorm for clipping the gradients? Additionally, please use @user_name at the beginning of your comment when you are replying to a specific user, otherwise that user won't be notified of your comment (I was not notified of your previous comment; I just checked this question by chance and saw that you had answered). – Weis

You can add a clipnorm=1.0 argument to the optimizer, e.g. Adam(..., clipnorm=1.0). – Weis

clipnorm=1.0 did not solve my issue when using activation='linear'; I still get the Nan in summary histogram error (still the same TF & Keras versions). – Intimidate

Try keeping only the accuracy metric and see if you still get this error. – Weis
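For reference, the clipnorm suggestion from the comments was applied directly on the optimizer constructor, along these lines (a sketch; the remaining arguments are the same as in the original opt above):

# original optimizer with gradient clipping added, per the comment suggestion
opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08,
                      decay=0.0001, clipnorm=1.0)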