Keras - Nan in summary histogram LSTM

I've written an LSTM model using Keras, using the LeakyReLU advanced activation:

    from keras import optimizers
    from keras.models import Sequential
    from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense
    import keras_metrics  # third-party keras-metrics package for precision/recall

    # ADAM Optimizer with learning rate decay
    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

    # build the model
    model = Sequential()

    # data is a 3D array of shape (batch, timesteps, features)
    num_features = data.shape[2]
    num_samples = data.shape[1]

    model.add(
        LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='linear'))
    model.add(LeakyReLU(alpha=.001))
    model.add(Dropout(0.1))
    model.add(LSTM(8, return_sequences=True, activation='linear'))
    model.add(Dropout(0.1))
    model.add(LeakyReLU(alpha=.001))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    # f1 is a custom metric defined elsewhere
    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

My data is a balanced, binary-labeled set, i.e. 50% labeled 1 and 50% labeled 0. I've used activation='linear' for the LSTM layers preceding the LeakyReLU activations, similar to this example I found on GitHub.

The model throws a Nan in summary histogram error in that configuration. Changing the LSTM activations to activation='sigmoid' works well, but seems like the wrong thing to do.

This StackOverflow question suggests "introducing a small value when computing the loss", but I'm not sure how to do that with a built-in loss function.
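
My best guess at what that suggestion means is to wrap the built-in loss and clip the predictions away from exactly 0 and 1 before the loss is computed, roughly as in the sketch below (clipped_binary_crossentropy is just a name I made up, and I haven't verified that this is what the linked answer intends):

    from keras import backend as K
    from keras import losses

    def clipped_binary_crossentropy(y_true, y_pred):
        # keep predictions away from exactly 0 and 1 so log() never sees them
        eps = 1e-7
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        return losses.binary_crossentropy(y_true, y_pred)

    # compile with the wrapper instead of the built-in 'binary_crossentropy' string
    model.compile(loss=clipped_binary_crossentropy, optimizer=opt, metrics=['accuracy'])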

Any help/explanation would be appreciated.

Update: I can see that the loss is nan on the first epoch:

    260/260 [==============================] - 6s 23ms/step - loss: nan - acc: 0.5000 - precision: 0.5217 - recall: 0.6512 - f1: nan - val_loss: nan - val_acc: 0.0000e+00 - val_precision: -2147483648.0000 - val_recall: -49941480.1860 - val_f1: nan

Update 2: I've upgraded both TensorFlow and Keras to versions 1.12.0 and 2.2.4. There was no effect.

I also tried adding a loss to the first LSTM layer, as suggested by @Oluwafemi Sule. It looks like a step in the right direction: the loss is no longer nan on the first epoch. However, I still get the same error, probably because of the other nan values, like val_loss / val_f1.

    [==============================] - 7s 26ms/step - loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan

Update 3: I tried compiling the network with just the accuracy metric, without success:

    Epoch 1/300
    260/260 [==============================] - 8s 29ms/step - loss: nan - acc: 0.5538 - val_loss: nan - val_acc: 0.0000e+00
Intimidate answered 31/10, 2018 at 9:35 Comment(14)
I had a similar issue once, but mine was due to NaN values in the dataset. – Hydrozoan
I'm not really sure if your gradients are exploding, because LeakyReLU on its own is not enough to make it converge. But there is generally an option called 'clipnorm' or 'clipvalue' that you can pass to all the optimizers. This helps you clip gradients and is generally used to find a way out of local minima. You could try that here and see if it makes any difference? Source – Popple
What version of Keras and TensorFlow are you using? – Weis
Keras 2.2.2, TF 1.5.0 – Intimidate
@ShlomiSchwartz Have you tried upgrading TensorFlow and Keras to see if the issue is still there? If it is, then try using the Adam optimizer with default parameters and just modify the learning rate; try 1e-3, 1e-4 or 1e-5. Further, did you try clipnorm for clipping the gradients? Additionally, please use @user_name at the beginning of your comment when you are replying to a specific user, otherwise that user won't be notified of your comment (I was not notified of your previous comment; I just checked this question by chance and saw that you had answered). – Weis
@Weis thanks, I'll give it a go. I haven't tried clipnorm because I'm not sure exactly how; can you please add an answer with a code example? – Intimidate
@ShlomiSchwartz Just pass the clipnorm=1.0 argument to the optimizer, e.g. Adam(..., clipnorm=1.0) (see the sketch after this comment thread). – Weis
@Weis clipnorm=1.0 did not solve my issue when using activation='linear'; I still get the Nan in summary histogram error (still the same TF & Keras versions). – Intimidate
What happens when you increase the alpha argument (say to 0.3) of the LeakyReLUs? – Autointoxication
@Autointoxication unfortunately it does not help. – Intimidate
If the problem is caused by a -Inf from the LSTM layers' outputs, changing LeakyReLU to regular ReLU layers might fix it. I would also check the training set for NaN values. – Amyamyas
@ShlomiSchwartz Could you try compiling and training the network without those additional metrics? Only use accuracy and see if you still get this error. – Weis
@today, please see my edits. – Intimidate
Hi @ShlomiSchwartz, can you check how the weights are initialized and try printing them as the loss is being computed? Theoretically, LSTMs, or any kind of recursive network, are prone to NaNs due to the large number of recursive multiplications and bad initialization of weights. So a skewed dataset is a less probable cause of the NaNs than the recursive nature of LSTMs. – Lavenialaver
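
For reference, the clipnorm suggestion from the comments, applied to the optimizer in the question, would look roughly like this (a sketch only; as noted above, it did not resolve the error in this case):

    from keras import optimizers

    # same optimizer as in the question, with gradient-norm clipping added
    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08,
                          decay=0.0001, clipnorm=1.0)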

This answer starts from the suggestion to introduce a small value when computing the loss.

keras.layers.LSTM, like all layers that are direct or indirect subclasses of keras.engine.base_layer.Layer, has an add_loss method that can be used to set a starting value for the loss.

I suggest doing this for the LSTM layer and seeing if it makes any difference in your results.

    lstm_layer = LSTM(8, return_sequences=True, activation='linear')
    lstm_layer.add_loss(1.0)  # register a constant starting value for this layer's loss

    model.add(lstm_layer)
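
To check that the extra term is actually registered, you can inspect the model's losses collection after adding the layer (a quick sanity check, assuming a Keras 2.x model, which exposes a losses property):

    print(model.losses)  # should include the constant value passed to add_loss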
Fannyfanon answered 6/11, 2018 at 23:31 Comment(1)
Thanks for your answer. It looks like a step in the right direction; on the first epoch I now see 260/260 [==============================] - 7s 26ms/step - loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan. So the loss is no longer nan, however I still get the same error ... probably because of other nan values, like the val_loss / val_f1? – Intimidate
