Padding time-series subsequences for LSTM-RNN training

I have a dataset of time series that I use as input to an LSTM-RNN for action anticipation. Each time series covers 5 seconds at 30 fps (i.e. 150 data points), and the data represents the position/movement of facial features.

I sample additional, shorter sub-sequences from my dataset in order to add redundancy to the dataset and reduce overfitting. In this case I know the starting and ending frame of each sub-sequence.
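
For concreteness, the sub-sequence sampling could look roughly like this (a hypothetical numpy sketch; the array shape and minimum length are assumptions, not taken from my actual data):

import numpy as np

T, n_features = 150, 20                     # assumed: 5 s at 30 fps, 20 facial features
sequence = np.random.randn(T, n_features)   # stand-in for one recorded time series

sub_len = np.random.randint(30, T)              # assumed minimum length of 1 second
start = np.random.randint(0, T - sub_len + 1)   # known starting frame
end = start + sub_len                           # known ending frame
subsequence = sequence[start:end]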

In order to train the model in batches, all time series need to have the same length, and according to many papers in the literature, padding should not affect the network's performance.

Example:

Original sequence:

 1 2 3 4 5 6 7 8 9 10

Subsequences:

4 5 6 7
8 9 10
2 3 4 5 6

Considering that my network is trying to anticipate an action (i.e. as soon as P(action) > threshold at some point between t = 0 and t = tmax, it predicts that action), does it matter where the padding goes?

Option 1: zeros substitute the original values outside the sub-sequence (keeping its original position)

0 0 0 4 5 6 7 0 0 0
0 0 0 0 0 0 0 8 9 10
0 2 3 4 5 6 0 0 0 0

Option 2: all zeros at the end

4 5 6 7 0 0 0 0 0 0 
8 9 10 0 0 0 0 0 0 0
2 3 4 5 6 0 0 0 0 0
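
For reference, both layouts can be produced with a few lines of numpy when the starting frame of each sub-sequence is known (a sketch; the function names are made up):

import numpy as np

def pad_option1(subseq, start, total_len=10):
    # zeros everywhere except at the sub-sequence's original positions
    padded = np.zeros(total_len)
    padded[start:start + len(subseq)] = subseq
    return padded

def pad_option2(subseq, total_len=10):
    # sub-sequence left-aligned, zeros appended at the end
    padded = np.zeros(total_len)
    padded[:len(subseq)] = subseq
    return padded

print(pad_option1([4, 5, 6, 7], start=3))  # 0 0 0 4 5 6 7 0 0 0
print(pad_option2([4, 5, 6, 7]))           # 4 5 6 7 0 0 0 0 0 0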

Moreover, some of the time series are missing a number of frames, and it is not known which ones: if we only have 60 frames, we don't know whether they cover 0 to 2 seconds, 1 to 3 seconds, etc. These need to be padded before the sub-sequences are even taken. What is the best practice for padding in this case?

Thank you in advance.

Bobseine asked 23/5, 2017 at 10:4 Comment(0)

The most powerful attribute of LSTMs, and RNNs in general, is that their parameters are shared across time frames (the parameters recur over time steps). That parameter sharing relies on the assumption that the same parameters can be used for different time steps, i.e. that the relationship between one time step and the next does not depend on t, as explained here on page 388, 2nd paragraph.
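
As a small illustration of that parameter sharing (a PyTorch sketch added for clarity; the layer sizes are arbitrary):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# The same weights (weight_ih_l0, weight_hh_l0, ...) are applied at every time
# step, so a 10-step and a 150-step sequence go through identical parameters.
out_short, _ = lstm(torch.randn(1, 10, 4))
out_long, _ = lstm(torch.randn(1, 150, 4))
print(out_short.shape, out_long.shape)  # torch.Size([1, 10, 8]) torch.Size([1, 150, 8])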

In short, padding zeros at the end should, theoretically, not change the accuracy of the model. I say theoretically because at each time step the LSTM's decision depends on its cell state, among other factors, and that cell state is a kind of short summary of the past frames. As far as I understand, those past frames may be missing in your case. So I think what you have here is a little trade-off.

I would rather pad zeros at the end, because it doesn't completely conflict with the underlying assumption of RNNs, and it is more convenient to implement and keep track of.

On the implementation side, I know TensorFlow can compute the loss correctly once you give it the sequences and the actual length of each sample (e.g. for 4 5 6 7 0 0 0 0 0 0 you also need to give it the actual length, which is 4 here), assuming you're implementing option 2. I don't know whether there is a ready-made implementation for option 1, though.
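
For example, with tf.keras one way to make the network ignore the trailing zeros of option 2 is a Masking layer (a minimal sketch, assuming post-padded sequences of 150 frames with 20 features and a padding value of 0.0; the layer sizes are made up). In the older TF 1.x API, tf.nn.dynamic_rnn took an explicit sequence_length argument instead.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(150, 20)),  # skip all-zero time steps
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(action)
])
model.compile(optimizer="adam", loss="binary_crossentropy")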

Soberminded answered 23/5, 2017 at 14:17 Comment(1)
Thanks, that's very helpful! – Bobseine

Better to pad zeros at the beginning, as the paper Effects of padding on LSTMs and CNNs suggests:

"Though the post-padding model peaked in efficiency at 6 epochs and started to overfit after that, its accuracy is way less than pre-padding."

Check Table 1, where the accuracy of pre-padding (padding zeros at the beginning) is around 80%, but for post-padding (padding zeros at the end) it is only around 50%.
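
For instance, Keras' pad_sequences pads at the beginning by default (a small sketch using the sub-sequences from the question; depending on the version the function may also live at tf.keras.utils.pad_sequences):

from tensorflow.keras.preprocessing.sequence import pad_sequences

subsequences = [[4, 5, 6, 7], [8, 9, 10], [2, 3, 4, 5, 6]]
pre = pad_sequences(subsequences, maxlen=10, padding="pre")    # zeros at the beginning (default)
post = pad_sequences(subsequences, maxlen=10, padding="post")  # zeros at the end
print(pre[0])   # [0 0 0 0 0 0 4 5 6 7]
print(post[0])  # [4 5 6 7 0 0 0 0 0 0]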

Lalita answered 21/4, 2019 at 9:54 Comment(1)
I can confirm that this has been my experience as well, i.e. for a forward RNN, padding at the beginning of the sequence is much better than padding at the end. Also, the shorter the actual sequence is relative to the sequence length of the input tensor, the more pronounced this effect is. – Adelleadelpho

If you have sequences of variable length, PyTorch provides the utility function torch.nn.utils.rnn.pack_padded_sequence. The general workflow with this function is:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

embedding = nn.Embedding(4, 5)   # vocabulary of 4 tokens, embedding size 5
rnn = nn.GRU(5, 5)

sequences = torch.tensor([[1, 2, 0], [3, 0, 0], [2, 1, 3]])  # 0 is the padding token
lens = [2, 1, 3]  # the actual length of each sequence

embeddings = embedding(sequences)
packed_seq = pack_padded_sequence(embeddings, lens, batch_first=True, enforce_sorted=False)

e, hn = rnn(packed_seq)  # e is a PackedSequence, hn the final hidden state

One can then collect the output for each token by

e, out_lens = pad_packed_sequence(e, batch_first=True)  # returns (padded outputs, lengths)

Using this function is better than feeding the padded tensor directly, because torch will limit the RNN to inspecting only the actual sequence and stop before the padded tokens.

Madrigalist answered 12/11, 2021 at 8:47 Comment(0)
