Multivariate LSTM with missing values

I am working on a time series forecasting problem using an LSTM. The input contains several features, so I am using a multivariate LSTM. The problem is that some values are missing, for example:

    Feature 1     Feature 2  ...  Feature n
 1    2               4             nan
 2    5               8             10
 3    8               8              5
 4    nan             7              7
 5    6              nan            12

I would rather not interpolate the missing values, because that can bias the results: sometimes a feature is missing for many consecutive timestamps. Is there a way to let the LSTM learn from data that contains missing values, for example with a masking layer? What is the best approach to this problem? I am using TensorFlow and Keras.

Tintinnabulation answered 29/9, 2018 at 16:13 Comment(0)

As suggested by François Chollet (creator of Keras) in his book, one way to handle missing values is to replace them with zero:

In general, with neural networks, it’s safe to input missing values as 0, with the condition that 0 isn’t already a meaningful value. The network will learn from exposure to the data that the value 0 means missing data and will start ignoring the value. Note that if you’re expecting missing values in the test data, but the network was trained on data without any missing values, the network won’t have learned to ignore missing values! In this situation, you should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the features that you expect are likely to be missing in the test data.
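The augmentation step described in the quote could be sketched roughly as follows. This is only an illustration: the array `x_train`, the placeholder `MISSING`, the rate `DROP_PROB`, and the helper `with_dropped_entries` are hypothetical names, and blanking entries uniformly at random is an assumption (the quote suggests dropping the features you expect to be missing at test time).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 4 complete samples, 5 timesteps, 3 features,
# with all values in [1, 2) so that 0 is free to mean "missing".
x_train = rng.uniform(1.0, 2.0, size=(4, 5, 3))

MISSING = 0.0    # placeholder value, assumed not to occur in the real data
DROP_PROB = 0.2  # assumed fraction of entries to blank out in each copy


def with_dropped_entries(x, drop_prob, rng):
    """Return a copy of x with randomly chosen entries set to MISSING."""
    x_aug = x.copy()
    drop = rng.random(x.shape) < drop_prob
    x_aug[drop] = MISSING
    return x_aug


# Keep the original samples and append one corrupted copy of each.
x_augmented = np.concatenate(
    [x_train, with_dropped_entries(x_train, DROP_PROB, rng)], axis=0
)
```

The targets would need to be duplicated in the same order so that each corrupted copy keeps the label of its original sample.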

So you can replace NaN elements with zero, provided that zero does not otherwise occur in your data. For example, you can normalize the data to a range such as [1, 2] and then assign zero to the NaN elements; alternatively, you can normalize all values to the range [0, 1] and use -1 instead of zero as the replacement.
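As a minimal sketch of the second variant (scale to [0, 1], then use -1 for NaN), assuming a NumPy array shaped (timesteps, features) that mirrors the table in the question. The name `SENTINEL` is illustrative only.

```python
import numpy as np

# Toy data shaped (timesteps, features), mirroring the table in the question.
x = np.array([
    [2.0, 4.0, np.nan],
    [5.0, 8.0, 10.0],
    [8.0, 8.0, 5.0],
    [np.nan, 7.0, 7.0],
    [6.0, np.nan, 12.0],
])

SENTINEL = -1.0  # assumed placeholder; must lie outside the scaled range [0, 1]

# Per-feature minimum and maximum, computed while ignoring NaN entries.
col_min = np.nanmin(x, axis=0)
col_max = np.nanmax(x, axis=0)

# Min-max scale the observed values to [0, 1]; NaN entries stay NaN here.
scaled = (x - col_min) / (col_max - col_min)

# Replace the remaining NaN entries with the sentinel.
filled = np.where(np.isnan(scaled), SENTINEL, scaled)
```

In a real pipeline the minimum and maximum would normally be computed on the training split only and then reused for validation and test data.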

Another option is to use a Masking layer in Keras. You give it a mask value, say 0, and it skips any timestep (i.e. row) in which all features are equal to the mask value. However, all subsequent layers must support masking, and you also need to pre-process your data so that every feature of a timestep containing one or more NaN values is set to the mask value. Example from the Keras documentation:

Consider a Numpy data array x of shape (samples, timesteps,features), to be fed to an LSTM layer. You want to mask timestep #3 and #5 because you lack data for these timesteps. You can:

  • set x[:, 3, :] = 0. and x[:, 5, :] = 0.

  • insert a Masking layer with mask_value=0. before the LSTM layer:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Masking

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(32))
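The pre-processing that the Masking layer relies on (setting every feature of an incomplete timestep to the mask value) might look like the sketch below. It uses NumPy only; `MASK_VALUE` and the toy batch are assumptions for illustration.

```python
import numpy as np

MASK_VALUE = 0.0  # must match the mask_value passed to the Masking layer

# Toy batch shaped (samples, timesteps, features); values are illustrative.
x = np.array([[
    [2.0, 4.0, np.nan],
    [5.0, 8.0, 10.0],
    [8.0, 8.0, 5.0],
    [np.nan, 7.0, 7.0],
    [6.0, np.nan, 12.0],
]])

# True for every (sample, timestep) that has at least one NaN feature.
incomplete = np.isnan(x).any(axis=-1)

# Overwrite the whole timestep so that every feature equals the mask value.
x_masked = x.copy()
x_masked[incomplete] = MASK_VALUE
```

Note the trade-off: the observed features in an incomplete timestep are discarded too. In this toy batch three of the five timesteps end up masked.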

Update (May 2021): According to an updated suggestion from François Chollet, it might be better to replace missing values with a more meaningful or informative value than zero. This value could be computed (e.g. the mean or median) or predicted from the data itself.
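A computed replacement value could be obtained as in this small sketch, which fills each NaN with the per-feature median of the observed values. The array `x_train` is a toy example mirroring the question's table, not real data.

```python
import numpy as np

# Toy training data shaped (timesteps, features).
x_train = np.array([
    [2.0, 4.0, np.nan],
    [5.0, 8.0, 10.0],
    [8.0, 8.0, 5.0],
    [np.nan, 7.0, 7.0],
    [6.0, np.nan, 12.0],
])

# Per-feature median of the observed (non-NaN) values.
medians = np.nanmedian(x_train, axis=0)

# Fill each NaN with the median of its own feature column.
x_imputed = np.where(np.isnan(x_train), medians, x_train)
```

The medians would typically be estimated on the training split and reused unchanged for validation and test data.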

Czech answered 29/9, 2018 at 16:25 Comment(5)
Thanks for your answer. Regarding the masking solution, can you also comment on the padding procedure afterwards? I assume that, after masking the NaN values, one needs to account for them by padding? If so, how does one inform the LSTM, for example? - Compare
I have posted my question. stats.stackexchange.com/questions/445254/… - Compare
In the newest edition of that book, Chollet no longer advises using an arbitrary value (like 0). Instead, he suggests imputing a more meaningful one (e.g. mean, median, or based on a prediction). - California
@California Thanks a lot for the update. I just added it to the answer. - Czech
@Czech "You should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the features that you expect are likely to be missing in the test data." I am confused by that part. Does this mean we create duplicates in the dataset, where a potential duplicate has a few missing values? If so, wouldn't that create a bias toward the timestamps that now have duplicate entries? - Platform

In my view, the right way to handle missing data depends on the nature of the dataset. From an economic standpoint, variables can be broadly classified into two categories: stock and flow. A stock variable represents a quantity measured at a specific point in time, while a flow variable denotes a quantity measured over a period.

Stock variables accumulate over time, as in the equation v_t = v_(t-1) + I_t. For missing values in a stock variable, it is therefore advisable to fill the gaps using an exponential function.
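The answer does not say which exponential function is meant. One possible reading is interpolation that is linear in log space (constant growth rate between the neighboring observed values), sketched below. The helper `fill_exponential` is a hypothetical name, and the sketch assumes strictly positive values.

```python
import numpy as np


def fill_exponential(values):
    """Fill NaN gaps by interpolating linearly in log space.

    Assumes all observed values are strictly positive. Gaps before the
    first or after the last observation are held at the nearest observed
    value, which is the default behavior of np.interp.
    """
    values = np.asarray(values, dtype=float)
    t = np.arange(values.size)
    observed = ~np.isnan(values)
    log_filled = np.interp(t, t[observed], np.log(values[observed]))
    return np.exp(log_filled)


# Toy stock series with a two-step gap between 100 and 800.
stock = np.array([100.0, np.nan, np.nan, 800.0, 1600.0])
filled = fill_exponential(stock)
```

In this toy series the gap is filled with 200 and 400, since the series doubles at each step between 100 and 800.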

Stephan answered 25/11, 2023 at 20:6 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. - Recountal
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review - Ethelinda
