What is the architecture behind the Keras LSTM Layer implementation?
How do the input dimensions get converted to the output dimensions for the LSTM layer in Keras? From reading Colah's blog post, it seems as though the number of "timesteps" (AKA the input_dim, or the first value in the input_shape) should equal the number of neurons, which should equal the number of outputs from this LSTM layer (set by the units argument of the LSTM layer).

From reading this post, I understand the input shapes. What I am baffled by is how Keras plugs the inputs into each of the LSTM "smart neurons".

Keras LSTM reference

Example code that baffles me:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))  # input_shape = (timesteps, features)
model.add(Dense(2))

From this, I would think that the LSTM layer has 10 neurons and each neuron is fed a vector of length 64. However, it seems it has 32 neurons, and I have no idea what is being fed into each. I understand that for the LSTM to connect to the Dense layer, we can just plug all 32 outputs into each of the 2 neurons. What confuses me is the InputLayer to the LSTM.
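
Inspecting the built model confirms those shapes, though not why they are what they are (a quick check; output_shape is the standard Keras layer attribute):

print(model.layers[0].output_shape)  # (None, 32): the LSTM emits one 32-dim vector
print(model.layers[1].output_shape)  # (None, 2): Dense maps 32 -> 2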

(similar SO post but not quite what I need)

Slusher answered 18/4, 2018 at 6:21 Comment(0)
Revisited and updated in 2020: I was partially correct! The architecture is 32 neurons. The 10 is the number of timesteps. At each of the 10 timesteps, the layer is fed a vector of length 64 (perhaps a word embedding), i.e. 64 features per timestep.

The 32 is the number of neurons (the units argument). It is the dimensionality of the hidden state of this layer, and therefore also its output dimension, since the layer emits its hidden state at the final timestep.

Lastly, the 32-dimensional output vector produced at the last timestep is fed to a Dense layer of 2 neurons, which simply means plugging the full 32-length vector into each of the 2 neurons, each with its own input weights and activation.
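
To make this concrete, here is a minimal end-to-end sketch of the shapes on random placeholder data (assuming the standalone keras package, as in the question):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))  # 32 units; 10 timesteps of 64 features
model.add(Dense(2))

x = np.random.rand(1, 10, 64)  # one random sample: 10 timesteps, 64 features each
print(model.predict(x).shape)  # (1, 2): the final 32-dim hidden state fed through Dense(2)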


Slusher answered 24/4, 2018 at 8:45 Comment(2)
@Sticy I'm very surprised by your answer; it's basically against whatever I've learned till now... which makes me very worried. Based on what I have learned, input_shape=(10, 64) means that you have 10 timesteps and 64 features, and the number of samples can be anything. So the input is really 3D: (n_samples=?, timesteps=10, n_features=64). When we have model.add(LSTM(32, input_shape=(10, 64))), this adds an LSTM layer with 32 neurons; all of them receive each of the 10 timesteps one by one and produce their own output, so the output of the LSTM will be (?, 32). Feel free to correct me. – Ruyle
@Ruyle You're wrong here. Refer here. – Indolence
I don't think you are right. Actually, the number of timesteps does not affect the number of parameters in an LSTM:

from keras.layers import LSTM
from keras.models import Sequential

time_steps = 13
features = 5
hidden_units = 10

model = Sequential()
model.add(LSTM(hidden_units, input_shape=(time_steps, features)))
model.summary()

# Same layer with a much longer sequence: the parameter count does not change
time_steps = 100
model2 = Sequential()
model2.add(LSTM(hidden_units, input_shape=(time_steps, features)))
model2.summary()

The result:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10)                640       
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 10)                640       
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________
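
The 640 can be verified by hand: an LSTM has four gates, each with a weight matrix over the input, a weight matrix over the recurrent state, and a bias, and none of these depend on sequence length. A quick check of the standard formula:

units = 10     # hidden_units above
features = 5
# 4 gates x (input weights + recurrent weights + biases)
print(4 * (features * units + units * units + units))  # 640 in both summaries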
Illailladvised answered 19/3, 2019 at 6:45 Comment(1)
thank you! yep, you're right - updated my original answer. – Slusher
@Sticky, you are wrong in your interpretation. The full input tensor has shape (batch_size, timesteps, feature_size), so input_shape=(10, 64) means each sample is a 10x64 tensor (like 10 words, each with 64 features, just like a word embedding). The 32 is the number of neurons, which makes the output vector size 32.

The output will have the following shape, as the sketch below illustrates:

  1. (batch, timesteps, units) if return_sequences=True.
  2. (batch, units) if return_sequences=False.
  3. The memory states will have size units.
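
A small sketch of the two cases on random placeholder data, reusing the question's 10x64 shape:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

x = np.random.rand(4, 10, 64)  # batch=4, timesteps=10, features=64

seq = Sequential([LSTM(32, return_sequences=True, input_shape=(10, 64))])
print(seq.predict(x).shape)    # (4, 10, 32): one 32-dim hidden state per timestep

last = Sequential([LSTM(32, return_sequences=False, input_shape=(10, 64))])
print(last.predict(x).shape)   # (4, 32): only the final hidden state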
Kwangtung answered 2/1, 2019 at 7:06 Comment(1)
ah yep, agreed, updated my original answer - thanks! – Slusher
