LSTM architecture in Keras implementation?
I am new to Keras and am going through the LSTM layer and its implementation details in the Keras documentation. It was going smoothly, but then I came across this SO post and the comment on it, which have confused me about what the actual LSTM architecture is:

Here is the code:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))

As per my understanding, 10 denotes the number of time-steps, each of which is fed to its respective LSTM cell, and 64 denotes the number of features for each time-step.

But the comment on the above post and the actual answer have confused me about the meaning of 32.

Also, how is the output from the LSTM connected to the Dense layer?

A hand-drawn diagrammatic explanation would be quite helpful in visualizing the architecture.

EDIT:

As far as this other SO post is concerned, it means that 32 represents the length of the output vector produced by each of the LSTM cells if return_sequences=True.

If that's true, then how do we connect each of the 32-dimensional outputs produced by the 10 LSTM cells to the next dense layer?

Also, kindly tell me whether the first SO post's answer is ambiguous or not.

Kendrick answered 29/12, 2018 at 3:27 Comment(1)

how do we connect each of 32-dimensional output produced by each of the 10 LSTM cells to the next dense layer?

It depends on how you want to do it. Suppose you have:

model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))

Then, the output of that layer has shape (10, 32). At this point, you can either use a Flatten layer to get a single vector with 320 components, or use a TimeDistributed wrapper to work on each of the 10 vectors independently:

model.add(TimeDistributed(Dense(15)))

The output shape of this layer is (10, 15), and the same weights are applied to the output of every LSTM unit.
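As a sanity check on the shapes, here is a minimal NumPy sketch of the two wiring options; the random array is just a stand-in for the LSTM output, and the weights are made up for illustration rather than produced by Keras:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the LSTM output with return_sequences=True: 10 timesteps, 32 features.
lstm_out = rng.standard_normal((10, 32))

# Option 1: Flatten -> a single 320-component vector, ready for Dense(2).
flat = lstm_out.reshape(-1)
print(flat.shape)  # (320,)

# Option 2: TimeDistributed(Dense(15)) -> one shared (32 -> 15) map at each timestep.
w = rng.standard_normal((32, 15))  # shared weight matrix
b = rng.standard_normal(15)        # shared bias
td = lstm_out @ w + b
print(td.shape)  # (10, 15)
```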

it's easy to figure out the no. of LSTM cells required for the input (specified in timespan)

How to figure out the no. of LSTM units required in the output?

You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell, depending on the value of return_sequences. As for the dimensionality of the output vector, that's just a choice you have to make, just like the size of a dense layer or the number of filters in a conv layer.
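A small NumPy sketch of what return_sequences controls, with a zero array standing in for the per-timestep hidden states (purely for illustration, not actual LSTM output):

```python
import numpy as np

# Stand-in for the hidden states of an LSTM(32) over 10 timesteps.
hidden_states = np.zeros((10, 32))

# return_sequences=True: the layer emits every hidden state -> shape (10, 32).
full_sequence = hidden_states
print(full_sequence.shape)  # (10, 32)

# return_sequences=False (the default): only the last hidden state -> shape (32,).
last_state = hidden_states[-1]
print(last_state.shape)  # (32,)
```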

how does each of the 32-dim vectors from the 10 LSTM cells get connected to the TimeDistributed layer?

Following the previous example, you would have a (10, 32) tensor, i.e. a size-32 vector for each of the 10 LSTM cells. What TimeDistributed(Dense(15)) does is create a (15, 32) weight matrix and a bias vector of size 15, and then compute:

dense_outputs = []
for h_t in lstm_outputs:  # one 32-dim hidden state per timestep
    dense_outputs.append(
        activation(dense_weights.dot(h_t) + dense_bias)
    )

Hence, dense_outputs has size (10, 15), and the same weights were applied to every LSTM output, independently.
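The loop above can be executed concretely with NumPy; the random weights below are hypothetical stand-ins for what TimeDistributed(Dense(15)) would create, and tanh is just a placeholder activation (Keras Dense defaults to linear):

```python
import numpy as np

rng = np.random.default_rng(42)

lstm_outputs = rng.standard_normal((10, 32))   # one 32-dim vector per timestep
dense_weights = rng.standard_normal((15, 32))  # shared (15, 32) weight matrix
dense_bias = rng.standard_normal(15)           # shared bias vector of size 15

def activation(x):
    return np.tanh(x)  # placeholder nonlinearity for illustration

dense_outputs = []
for h_t in lstm_outputs:
    dense_outputs.append(activation(dense_weights.dot(h_t) + dense_bias))

dense_outputs = np.stack(dense_outputs)
print(dense_outputs.shape)  # (10, 15)
```

The same weights and bias were used on every iteration, which is exactly the weight sharing TimeDistributed provides.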

Note that everything still works when you don't know how many timesteps you need, e.g. for machine translation. In this case, you use None for the timestep dimension; everything I wrote still applies, with the only difference being that the number of timesteps is no longer fixed. Keras will repeat the LSTM, TimeDistributed, etc. as many times as necessary (which depends on the input).
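Since the shared per-timestep weights do not depend on the sequence length, the same matrix applies to sequences of any length. A NumPy sketch (with made-up weights and random stand-ins for the LSTM outputs) of why input_shape=(None, 64) can work:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared weights, as TimeDistributed(Dense(15)) would create once.
w = rng.standard_normal((32, 15))
b = rng.standard_normal(15)

# The same weights apply to a 10-step and a 37-step sequence alike.
shapes = []
for timesteps in (10, 37):
    seq = rng.standard_normal((timesteps, 32))  # stand-in for LSTM outputs
    out = seq @ w + b
    shapes.append(out.shape)

print(shapes)  # [(10, 15), (37, 15)]
```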

Alie answered 29/12, 2018 at 10:55 Comment(3)
So that means units is the size of the vector output by the LSTM cell at every timestep. But how does Keras know how many of these LSTM cells to use, or how many will be required to train on the data? I mean, it's easy to figure out the no. of LSTM cells required for the input (specified in timespan), but how do we figure out the no. of LSTM units required in the output? – Kendrick
Also, can you explain how each of the 32-dim vectors from the 10 LSTM cells gets connected to the TimeDistributed layer? – Kendrick
In your answer, you wrote: You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell. But I'm asking about cases such as machine translation, where the translated/output sentence length is variable and cannot be known in advance. So, in that case, how many LSTM cells play their role in outputting the translated sentence? – Kendrick
