How to properly set the input_shape of LSTM layers?
I have input data with the following shape: (5395, 69, 1)

Should my input_shape be:

  • (69,1) or

  • (1,69) ?

With 69 neurons in the LSTM layer, the first input_shape gives 19'596 parameters to train and the second gives 38'364. Is that difference the result of receiving 1 and 69 input values per timestep, respectively? My question is: should the input dimension be 1 because I have 1 feature, or 69 because I have 69 timesteps, and why?

Boycie answered 11/4, 2020 at 8:12 Comment(0)

The input of an LSTM layer has a per-sample shape of (num_timesteps, num_features); the batch (sample) dimension is not part of input_shape. Therefore:

  • If each input sample has 69 timesteps, where each timestep consists of 1 feature value, then the input shape would be (69, 1).

  • If each input sample is a single timestep of 69 feature values, then it probably does not make sense to use an RNN layer at all, since the input is not really a sequence. Instead, it is better to flatten each input sample (i.e. reshape (1, 69) to (69,)) and use other architectures/layers (e.g. Dense).
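To make the two cases concrete, here is a minimal NumPy sketch. The arrays are zero-filled stand-ins for the real data (an assumption for illustration), and the variable names are hypothetical. It only shows how the per-sample shape is read off the data and how the single-timestep case would be flattened.

```python
import numpy as np

# Hypothetical stand-in for the asker's data:
# 5395 samples, 69 timesteps, 1 feature per timestep.
data = np.zeros((5395, 69, 1), dtype=np.float32)

# input_shape describes one sample, so the leading batch axis is dropped.
input_shape = data.shape[1:]  # (num_timesteps, num_features)
print(input_shape)  # (69, 1)

# The other case: each sample is a single timestep of 69 features.
single_step = np.zeros((5395, 1, 69), dtype=np.float32)

# Drop the length-1 time axis so each sample is a plain 69-value vector,
# which is the kind of input a Dense layer would typically take.
flat = single_step.reshape(len(single_step), -1)
print(flat.shape)  # (5395, 69)
```

With data shaped like the first array, the value to pass as input_shape would be (69, 1).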


As a side note, I might be wrong, but I have a feeling that you are mixing up the number of input timesteps and the number of units/neurons in the LSTM layer (specifically, I am referring to this sentence of yours: "With 69 neurons in the LSTM layer..."). These two have nothing to do with each other and need not be the same number. The number of units/neurons in an LSTM layer determines the representational capacity of that layer and should be set based on experiments/experience. This answer explains this point a bit further if you are interested.
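The parameter counts quoted in the question can be reproduced by hand, which also shows that the number of timesteps plays no role in them. The sketch below assumes a standard LSTM layer with bias terms and no peephole connections; the function name `lstm_param_count` is made up for this illustration.

```python
def lstm_param_count(units: int, num_features: int) -> int:
    """Trainable parameters of a standard LSTM layer with bias (assumed setup)."""
    # Each of the four gates (input, forget, cell candidate, output) has:
    #   an input kernel      of size num_features * units
    #   a recurrent kernel   of size units * units
    #   a bias vector        of size units
    per_gate = num_features * units + units * units + units
    return 4 * per_gate


# 69 units, 1 feature per timestep  -> input_shape (69, 1)
print(lstm_param_count(units=69, num_features=1))   # 19596

# 69 units, 69 features per timestep -> input_shape (1, 69)
print(lstm_param_count(units=69, num_features=69))  # 38364
```

Note that the function takes no timestep argument at all: the same weights are reused at every timestep, so only the number of units and the number of features per timestep affect the count.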

Load answered 11/4, 2020 at 10:19 Comment(7)
Well explained. Regarding your side note: I set the number of neurons in the LSTM layer equal to the number of timesteps because I am working on a simple example and do this for faster experimentation; I acknowledge that this aspect needs more work. - Tetanize
Regarding your comment in the linked answer, "Further, when the number of units is 3, it basically means that only 3 features is extracted from each input timestep, i.e. each input timestep will be represented by 3 features, and these 3 features will be fed to the next layer": does this mean that each timestep in the sequence will have 3 features, or that each sequence will have 3 features? - Tetanize
@DavidDiaz With 3 units in the LSTM layer, each timestep is represented as a 3-value vector by that layer; however, you may decide to use the representation of all timesteps (i.e. by passing the return_sequences=True argument to the LSTM layer) or just the representation of the last timestep (i.e. return_sequences=False, which is the default). - Load
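The difference between the two output modes can be seen in a stripped-down LSTM forward pass. This is only a sketch with random weights for a single sample, written in plain NumPy rather than with any library's LSTM class; `lstm_forward` and its arguments are hypothetical names, and the gate ordering is an arbitrary choice for the illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(x, units, return_sequences=False, seed=0):
    """Run one sample x of shape (timesteps, features) through a toy LSTM.

    Weights are random; only the output shapes are of interest here.
    """
    rng = np.random.default_rng(seed)
    timesteps, features = x.shape
    W = rng.normal(scale=0.1, size=(features, 4 * units))  # input kernel
    U = rng.normal(scale=0.1, size=(units, 4 * units))     # recurrent kernel
    b = np.zeros(4 * units)                                # bias

    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[t] @ W + h @ U + b
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)

    # All hidden states, or only the one from the last timestep.
    return np.stack(outputs) if return_sequences else h


x = np.ones((69, 1))  # 69 timesteps, 1 feature
all_steps = lstm_forward(x, units=3, return_sequences=True)
last_step = lstm_forward(x, units=3)

print(all_steps.shape)  # (69, 3)
print(last_step.shape)  # (3,)
```

So with 3 units, every timestep gets its own 3-value vector, and the default mode simply keeps the last of those vectors.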
I have a doubt: if I use an LSTM followed by a Dense layer and pass return_sequences=True to the LSTM, should I use a Flatten layer? And why? - Tetanize
@DavidDiaz It depends on what you want to achieve (i.e. application and problem formulation). For example, if you want to predict something for each timestep (and not just a single prediction for the entire sequence) then it makes sense to use return_sequences=True and a Dense layer. As I said, depending on the scenario, the architecture of the model might vary. - Load
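As a rough illustration of the two scenarios, the NumPy sketch below mimics what a dense (fully connected) layer does to a sequence of per-timestep vectors, with and without flattening first. A dense layer applied to a (timesteps, units) input acts on the last axis, so it yields one output per timestep. All arrays and names here are hypothetical stand-ins with random values.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, units, out_dim = 69, 3, 1

# Stand-in for the LSTM output of one sample with return_sequences=True.
seq_repr = rng.normal(size=(timesteps, units))

# Case 1: dense layer applied directly. The same weights are used at
# every timestep, giving one prediction per timestep.
W = rng.normal(size=(units, out_dim))
b = np.zeros(out_dim)
per_step = seq_repr @ W + b
print(per_step.shape)  # (69, 1)

# Case 2: flatten first, then a dense layer. All timesteps are merged
# into one vector, giving a single prediction for the whole sequence.
flat = seq_repr.reshape(-1)  # 69 * 3 = 207 values
W_flat = rng.normal(size=(flat.size, out_dim))
per_sequence = flat @ W_flat + b
print(per_sequence.shape)  # (1,)
```

Which of the two is appropriate depends on whether the target has one value per timestep or one value per sequence.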
Thanks for the explanations. Can you help with these questions? 1 - Tetanize
2 - Tetanize
