Keras: Shuffling the dataset while using an LSTM
Correct me if I am wrong, but according to the official Keras documentation, the fit function's shuffle argument defaults to True, so the whole training dataset is shuffled at every epoch.

However, the point of using recurrent neural networks such as LSTMs or GRUs is to exploit the precise order of the data, so that the state from previous timesteps influences the current one.

If we shuffle all the data, all the logical sequences are broken. So I don't understand why there are so many LSTM examples where this argument is not set to False. What is the point of using an RNN without sequences?

Also, when I set the shuffle option to False, my LSTM model performs worse, even though there are dependencies between the data: I use the KDD99 dataset, where the connections are linked.
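To make this concrete, here is a minimal sketch of my setup; the data and model below are placeholders for illustration, not my actual KDD99 pipeline:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for my KDD99 features:
# 100 samples, 10 timesteps, 3 features each.
x_train = np.random.rand(100, 10, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(10, 3)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# shuffle defaults to True; this is the variant that performs worse for me.
model.fit(x_train, y_train, epochs=2, batch_size=32, shuffle=False)
```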

Robichaux answered 15/7, 2019 at 13:55 Comment(1)
Here is a useful article; it is not too complicated and has examples. It talks mostly about the stateful parameter, but also about how training is propagated in an LSTM/RNN, and there is a remark on shuffling too. There is also a suggestion on how to split your training sequences: philipperemy.github.io/keras-stateful-lstm Also, this answer might be helpful: https://mcmap.net/q/381131/-shuffling-training-data-with-lstm-rnn – Electrojet

If we shuffle all the data, all the logical sequences are broken.

No, the shuffling happens on the batch axis, not on the time axis. Usually, your data for an RNN has a shape like (batch_size, timesteps, features).

Usually, you give your network not just one sequence to learn from, but many sequences. Only the order in which these sequences are presented during training gets shuffled; the sequences themselves stay intact. Shuffling is almost always a good idea, because your network should learn the training examples themselves, not the order in which they appear.
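To illustrate, here is a small NumPy sketch of what "shuffling on the batch axis" means (toy data, made up for this example):

```python
import numpy as np

# 5 sequences of 4 timesteps with 1 feature: shape (batch_size, timesteps, features).
data = np.arange(20).reshape(5, 4, 1)

# Shuffling permutes axis 0 (the sequences), never axis 1 (time).
perm = np.random.permutation(data.shape[0])
shuffled = data[perm]

# Every individual sequence keeps its original temporal order,
# e.g. one row might print as [ 8  9 10 11] - four consecutive timesteps.
print(shuffled[0].ravel())
```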

That being said, there are cases where you have indeed only one huge sequence to learn from. In that case you still have the option of dividing your sequence into several batches. If you do, you are absolutely right to be concerned: shuffling would have a huge negative impact here, so don't do it in this case!
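For example, a hypothetical way such a long sequence might be cut into consecutive windows (names and sizes are made up for illustration):

```python
import numpy as np

# Hypothetical single long sequence: 1000 timesteps, 1 feature.
long_seq = np.random.rand(1000, 1).astype("float32")

window = 50
n_windows = len(long_seq) // window

# Consecutive, non-overlapping windows: window i+1 is the direct
# continuation of window i, so the order of the windows carries information.
batches = long_seq[: n_windows * window].reshape(n_windows, window, 1)

# Shuffling these windows would destroy that continuity, so pass
# shuffle=False to model.fit when training on them.
```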

Note: RNNs have a stateful parameter that you can set to True. In that case the last state of the previous batch is passed to the following one, which effectively makes your RNN see all batches as one huge sequence. So absolutely do this if you have one huge sequence spanning multiple batches.
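A minimal sketch of what that can look like with tf.keras (exact argument names can vary between Keras versions, so treat this as illustrative):

```python
import numpy as np
import tensorflow as tf

batch_size, timesteps, features = 4, 50, 1

model = tf.keras.Sequential([
    # stateful=True carries the final hidden state of each batch over
    # to the next batch, so consecutive batches are treated as one
    # continuous sequence. A fixed batch size must be declared.
    tf.keras.layers.LSTM(16, stateful=True,
                         batch_input_shape=(batch_size, timesteps, features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# The sample count must be a multiple of the fixed batch size.
x = np.random.rand(8 * batch_size, timesteps, features).astype("float32")
y = np.random.rand(8 * batch_size, 1).astype("float32")

# Never shuffle here: the batches together form one continuous sequence.
model.fit(x, y, epochs=1, batch_size=batch_size, shuffle=False)

# Reset the carried-over state before training on a new, unrelated sequence.
model.layers[0].reset_states()
```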

Oxcart answered 15/7, 2019 at 14:29 Comment(1)
Thank you very much for your very complete answer. You are right, I was indeed confused about how the shuffling was done. I have looked at the code of the fit function in Keras (github.com/keras-team/keras/blob/…) and it does indeed shuffle on the batch axis, one batch at a time. – Robichaux
