Correct me if I am wrong, but according to the official Keras documentation, the fit function has the argument shuffle=True by default, so it shuffles the whole training dataset at each epoch.
However, the point of using recurrent neural networks such as LSTM or GRU is to exploit the precise order of the data, so that the state computed from previous timesteps influences the current one.
If we shuffle all the data, all the logical sequences are broken. So I don't understand why there are so many LSTM examples where the argument is not set to False. What is the point of using an RNN without ordered sequences?
Also, when I set the shuffle option to False, my LSTM model performs worse even though there are dependencies between the data: I use the KDD99 dataset, where the connections are linked.
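One detail worth checking: shuffle=True in fit reorders only the sample axis, i.e. whole windows, while the timesteps inside each window keep their order. Here is a minimal numpy-only sketch (the series, window length, and names are illustrative, not from the question) that mimics this behavior:

```python
import numpy as np

# Hypothetical toy series, shaped into windows the way Keras expects:
# (samples, timesteps, features).
series = np.arange(20, dtype=float)
window = 5
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
X = X[..., np.newaxis]  # add a feature axis -> shape (15, 5, 1)

# shuffle=True permutes axis 0 only: whole windows move around,
# but the timesteps inside each window are untouched.
rng = np.random.default_rng(0)
X_shuffled = X[rng.permutation(len(X))]

# Every shuffled window is still internally ordered (strictly increasing).
assert all(np.all(np.diff(w[:, 0]) > 0) for w in X_shuffled)
print(X_shuffled.shape)
```

So unless each sample is supposed to depend on the samples before it (the stateful case), shuffling the windows does not break the temporal order the LSTM actually sees.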
That post explains not only the stateful parameter, but also how training is propagated in an LSTM/RNN, and it includes a remark on shuffling. There is also a suggestion on how to split your training sequences: philipperemy.github.io/keras-stateful-lstm Also, this answer might be helpful: https://mcmap.net/q/381131/-shuffling-training-data-with-lstm-rnn – Electrojet
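To make the stateful distinction concrete, here is a numpy-only sketch of a toy tanh RNN (all weights and sizes are illustrative, not Keras internals). With state carried across chunks, as in stateful=True, the order of the chunks matters; with the state reset each time, it does not:

```python
import numpy as np

rng = np.random.default_rng(1)
W_x, W_h = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))

def run_chunk(chunk, h):
    # One simple tanh-RNN pass over a chunk; returns the final hidden state.
    for x_t in chunk:
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

chunks = [rng.normal(size=(5, 3)) for _ in range(3)]  # consecutive pieces

# Stateless: the state resets to zero for every chunk, so chunk order
# is irrelevant and shuffling chunks is harmless.
h0 = np.zeros(4)
stateless = [run_chunk(c, h0) for c in chunks]

# Stateful: the final state of chunk i seeds chunk i+1, so shuffling
# the chunks would feed the wrong history into each one.
h = np.zeros(4)
stateful = []
for c in chunks:
    h = run_chunk(c, h)
    stateful.append(h.copy())

# The first chunk agrees (same zero initial state); later ones differ.
assert np.allclose(stateless[0], stateful[0])
assert not np.allclose(stateless[1], stateful[1])
```

This is why the shuffling question only really bites when stateful=True (where batch order carries the history) rather than in the default stateless setup.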