Suppose we have an LSTM model for time-series forecasting. This is a multivariate case, so we use more than one feature to train the model.
from keras.layers import Input, Dropout, Dense, CuDNNLSTM

ipt = Input(shape=(shape[0], shape[1]))     # (timesteps, features)
x   = Dropout(0.3)(ipt)                     # Dropout before the LSTM
x   = CuDNNLSTM(10, return_sequences=False)(x)
out = Dense(1, activation='relu')(x)
We can add a `Dropout` layer before the LSTM (as in the code above) or after it. If we add it before the LSTM, does it apply dropout to the timesteps (different lags of the time series), to the input features, or to both?
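For intuition, here is a minimal NumPy sketch of standard element-wise dropout (a hand-rolled illustration, not Keras internals): with the default `noise_shape`, a `Dropout` layer zeroes individual (timestep, feature) entries independently, so it touches both lags and features.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, features, rate = 5, 3, 0.3
x = np.ones((timesteps, features))  # toy multivariate input for one sample

# Element-wise dropout: every (timestep, feature) entry is kept or zeroed
# independently, so both time lags and features are affected.
mask = rng.random((timesteps, features)) >= rate
dropped = np.where(mask, x / (1.0 - rate), 0.0)  # inverted-dropout scaling

# To drop whole feature channels (or whole timesteps) instead, Keras offers
# the `noise_shape` argument of Dropout, or SpatialDropout1D.
print(dropped.shape)  # (5, 3)
```

Surviving entries are scaled by `1 / (1 - rate)` so the expected activation is unchanged at inference time.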
If we add it after the LSTM, and since `return_sequences` is `False`, what is dropout doing there? Is there any difference between the `dropout` option in `LSTM` and a `Dropout` layer before the `LSTM` layer?
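Assuming the per-gate-mask behavior mentioned in the comment below, one way to picture the difference is this NumPy sketch (the gate names and masks are illustrative, not Keras internals): a `Dropout` layer before the LSTM applies one mask that all four gates see, whereas `LSTM(dropout=...)` draws an independent input mask per gate.

```python
import numpy as np

rng = np.random.default_rng(1)
features, rate = 4, 0.5
x_t = np.ones(features)  # input vector at one timestep

# Dropout layer before the LSTM: ONE mask; all four gates (i, f, c, o)
# receive the same masked copy of the input.
shared_mask = rng.random(features) >= rate
shared_inputs = [np.where(shared_mask, x_t, 0.0) for _ in range(4)]

# LSTM(dropout=rate): FOUR independent masks, one per gate, so each gate
# can see a differently-masked copy of the same input.
gate_masks = rng.random((4, features)) >= rate
per_gate_inputs = [np.where(m, x_t, 0.0) for m in gate_masks]

# With return_sequences=False the LSTM emits a single (units,) vector per
# sample, so a Dropout layer placed AFTER it simply masks entries of that
# final feature vector; no time axis is involved anymore.
```

This also answers the after-LSTM case: post-LSTM dropout is ordinary dropout on the final hidden-state vector.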
`dropout` as an argument generates per-gate masks. I wonder why CuDNN implementations found `recurrent_dropout` problematic though, or why there isn't a `CuDNNIndRNN` implementation yet. Guess 'funding' could answer both. – Photoplay