Understanding the structure of my LSTM model

I'm trying to solve the following problem:

  1. I have time series data from a number of devices.
  2. Each device recording is of length 3000.
  3. Every datapoint captured has 4 measurements.

Therefore, my data is shaped: (number of device recordings, 3000, 4).

I'm trying to produce a vector of length 3000 where each data point is one of 3 labels (y1, y2, y3), so my desired output dim is (number of device recordings, 3000, 1). I have labeled data for training.
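
For concreteness, here is a tiny sketch of those shapes with dummy NumPy arrays (all sizes other than 3000 and 4 are just placeholders):

import numpy as np

num_recordings = 10                                     # placeholder count
X = np.random.rand(num_recordings, 3000, 4)             # 3000 timesteps, 4 measurements each
y = np.random.randint(0, 3, (num_recordings, 3000, 1))  # one of 3 labels per timestep

print(X.shape, y.shape)  # (10, 3000, 4) (10, 3000, 1)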

I'm trying to use an LSTM model for this, as 'classification as I move along the time series data' seems like an RNN type of problem.

I have my network set up like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(3, input_shape=(3000, 4), return_sequences=True))
model.add(LSTM(3, activation='softmax', return_sequences=True))

model.summary()

and the summary looks like this:

Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_29 (LSTM)               (None, 3000, 3)           96        
_________________________________________________________________
lstm_30 (LSTM)               (None, 3000, 3)           84        
=================================================================
Total params: 180
Trainable params: 180
Non-trainable params: 0
_________________________________________________________________

All looks well and good in the output space, as I can use the result from each unit to determine which of my three categories belongs to that particular time step (I think).

But I only have 180 trainable parameters, so I'm guessing that I am doing something horribly wrong.

Questions:

  1. Can someone help me understand why I have so few trainable parameters?
  2. Am I misinterpreting how to set up this LSTM?
  3. Am I just worrying over nothing?
  4. Do the 3 units mean I only have 3 LSTM 'blocks'?
  5. And that it can only look back 3 observations?
Slough answered 10/4, 2020 at 22:21 Comment(0)

In a simplified view, you can think of an LSTM layer as an augmented Dense layer with a memory (which is what lets it process sequences efficiently). So the concept of "units" is the same for both: it is the number of neurons, or feature units, of the layer; in other words, the number of distinct features the layer can extract from its input.

Therefore, when you set the number of units to 3 for the LSTM layer, it more or less means that the layer can extract only 3 distinct features from each input timestep. Note that the number of units has nothing to do with the length of the input sequence: the entire sequence is processed by the LSTM layer regardless of how many units it has or how long the sequence is.
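
As a side note, this is also why you see only 180 trainable parameters: the count depends on the number of units and the input feature dimension, not on the sequence length. A quick sketch of the arithmetic (assuming the standard Keras LSTM, which has 4 gates, each with a kernel, a recurrent kernel and a bias):

# Parameters of a standard Keras LSTM layer: 4 * (units * input_dim + units * units + units)
def lstm_params(units, input_dim):
    return 4 * (units * input_dim + units * units + units)

print(lstm_params(3, 4))  # 96 -> first layer (4 input measurements)
print(lstm_params(3, 3))  # 84 -> second layer (its input is the 3 features from the first)
# 96 + 84 = 180 trainable parameters in total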

Usually this is sub-optimal (though it really depends on the difficulty of the specific problem and dataset you are working on; 3 units might be enough for your problem/dataset, and you should experiment to find out). Therefore, a higher number of units is often chosen (common choices: 32, 64, 128, 256), and the classification task is delegated to a dedicated Dense layer (sometimes called the "softmax layer") on top of the model.

For example, considering the description of your problem, a model with 3 stacked LSTM layers and a Dense classification layer at the top might look like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))  # applied per timestep -> output shape (None, 3000, 3)
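
And a possible way to compile and fit it. This is only a sketch: X_train/y_train, the optimizer and the epoch/batch numbers are placeholders, and y_train is assumed to hold integer labels 0-2 per timestep with shape (num_recordings, 3000) (squeeze the trailing axis first if yours is (num_recordings, 3000, 1)):

# X_train: (num_recordings, 3000, 4), y_train: (num_recordings, 3000) with integer labels 0..2
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=8, validation_split=0.1)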
Respect answered 10/4, 2020 at 23:6 Comment(7)
I see, so if I have a sequence of length 3000 and an LSTM layer with 3 units, does it just scan through the sequence like a sliding window, covering 3 time steps at a time before moving along by 1 time step? Or does it somehow read the whole sequence before passing the information on to the next layer? - Slough
@Slough The LSTM layer processes a sequence sequentially, i.e. one timestep at a time is processed and then the next timestep until all the 3000 timesteps have been processed and then it gives its output to the next layer. However, as I said, it has a memory: when processing the timestep t, it remembers all the (t-1) timesteps it has processed so far. Further, when the number of units is 3, it basically means that only 3 features are extracted from each input timestep, i.e. each input timestep will be represented by 3 features, and these 3 features will be fed to the next layer. - Respect
Thanks! That clarifies it greatly! - Slough
@Respect Just to clarify on that: would the LSTM layer also show this behaviour even if return_sequences were set to False in your sample code? - Chagrin
@AlbertoAgudoDominguez The return_sequences argument does not affect how the input sequence is processed by the LSTM layer; it just determines whether the output for all timesteps is returned (when set to True) or only the last output (when set to False, which is the default). - Respect
@Respect Thank you, that is clear. But I was originally referring to this paragraph in your answer (sorry for not being specific enough): "The LSTM layer processes a sequence sequentially, i.e. one timestep at a time is processed and then the next timestep until all the 3000 timesteps have been processed and then it gives its output to the next layer." Does this only happen in return_sequences = True LSTMs, or in any kind of LSTM? - Chagrin
@AlbertoAgudoDominguez That's how all LSTM layers work; as I said, return_sequences does NOT affect how the input is processed. - Respect
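
To make the return_sequences behaviour from the comments concrete, here is a minimal sketch (tf.keras assumed; the two toy models only illustrate the output shapes):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

x = np.random.rand(1, 3000, 4)  # one recording: 3000 timesteps, 4 measurements

seq = Sequential([LSTM(3, return_sequences=True, input_shape=(3000, 4))])
last = Sequential([LSTM(3, return_sequences=False, input_shape=(3000, 4))])

# Both layers step through all 3000 timesteps; only what is returned differs.
print(seq.predict(x).shape)   # (1, 3000, 3) -> one 3-feature vector per timestep
print(last.predict(x).shape)  # (1, 3)       -> only the final timestep's output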
