LSTM architecture in Keras implementation?
I am new to Keras and am going through the LSTM layer and its implementation details in the Keras documentation. It was going smoothly, but then I came across this SO post and the comment on it, which have confused me about what the actual LSTM architecture is:

Here is the code:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))

As per my understanding, 10 denotes the number of time-steps, each of which is fed to its respective LSTM cell, and 64 denotes the number of features for each time-step.

But the comment on the above post and the actual answer have confused me about the meaning of 32.

Also, how is the output from the LSTM connected to the Dense layer?

A hand-drawn diagrammatic explanation would be quite helpful in visualizing the architecture.

EDIT:

As far as this other SO post is concerned, it means that 32 represents the length of the output vector produced by each of the LSTM cells if return_sequences=True.

If that's true, then how do we connect each of the 32-dimensional outputs produced by the 10 LSTM cells to the next dense layer?

Also, kindly tell me whether the first SO post's answer is ambiguous or not.

Kendrick answered 29/12, 2018 at 3:27 Comment(1)

how do we connect each of 32-dimensional output produced by each of the 10 LSTM cells to the next dense layer?

It depends on how you want to do it. Suppose you have:

model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))

Then, the output of that layer has shape (10, 32). At this point, you can either use a Flatten layer to get a single vector with 320 components, or use a TimeDistributed wrapper to work on each of the 10 vectors independently:

model.add(TimeDistributed(Dense(15)))

The output shape of this layer is (10, 15), and the same weights are applied to the output of every LSTM unit.
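As a sanity check on the shapes, here is a minimal NumPy sketch of the two wiring options; the random array is just a stand-in for the LSTM output, and the weights are made up for illustration rather than produced by Keras:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the LSTM output with return_sequences=True: 10 timesteps, 32 features.
lstm_out = rng.standard_normal((10, 32))

# Option 1: Flatten -> a single 320-component vector, ready for Dense(2).
flat = lstm_out.reshape(-1)
print(flat.shape)  # (320,)

# Option 2: TimeDistributed(Dense(15)) -> one shared (32 -> 15) map at each timestep.
w = rng.standard_normal((32, 15))  # shared weight matrix
b = rng.standard_normal(15)        # shared bias
td = lstm_out @ w + b
print(td.shape)  # (10, 15)
```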

it's easy to figure out the no. of LSTM cells required for the input (specified in timespan)

How to figure out the no. of LSTM units required in the output?

You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell, depending on the value of return_sequences. As for the dimensionality of the output vector, that's just a choice you have to make, just like the size of a dense layer or the number of filters in a conv layer.
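A small NumPy sketch of what return_sequences controls, with a zero array standing in for the per-timestep hidden states (purely for illustration, not actual LSTM output):

```python
import numpy as np

# Stand-in for the hidden states of an LSTM(32) over 10 timesteps.
hidden_states = np.zeros((10, 32))

# return_sequences=True: the layer emits every hidden state -> shape (10, 32).
full_sequence = hidden_states
print(full_sequence.shape)  # (10, 32)

# return_sequences=False (the default): only the last hidden state -> shape (32,).
last_state = hidden_states[-1]
print(last_state.shape)  # (32,)
```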

how does each of the 32-dim vectors from the 10 LSTM cells get connected to the TimeDistributed layer?

Following the previous example, you would have a (10, 32) tensor, i.e. a size-32 vector for each of the 10 LSTM cells. What TimeDistributed(Dense(15)) does is create a (15, 32) weight matrix and a bias vector of size 15, and then compute:

dense_outputs = []
for h_t in lstm_outputs:  # one 32-dim hidden state per timestep
    dense_outputs.append(
        activation(dense_weights.dot(h_t) + dense_bias)
    )

Hence, dense_outputs has size (10, 15), and the same weights were applied to every LSTM output, independently.
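The loop above can be executed concretely with NumPy; the random weights below are hypothetical stand-ins for what TimeDistributed(Dense(15)) would create, and tanh is just a placeholder activation (Keras Dense defaults to linear):

```python
import numpy as np

rng = np.random.default_rng(42)

lstm_outputs = rng.standard_normal((10, 32))   # one 32-dim vector per timestep
dense_weights = rng.standard_normal((15, 32))  # shared (15, 32) weight matrix
dense_bias = rng.standard_normal(15)           # shared bias vector of size 15

def activation(x):
    return np.tanh(x)  # placeholder nonlinearity for illustration

dense_outputs = []
for h_t in lstm_outputs:
    dense_outputs.append(activation(dense_weights.dot(h_t) + dense_bias))

dense_outputs = np.stack(dense_outputs)
print(dense_outputs.shape)  # (10, 15)
```

The same weights and bias were used on every iteration, which is exactly the weight sharing TimeDistributed provides.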

Note that everything still works when you don't know how many timesteps you need, e.g. for machine translation. In this case, you use None for the timestep dimension; everything I wrote still applies, with the only difference being that the number of timesteps is no longer fixed. Keras will repeat the LSTM, TimeDistributed, etc. as many times as necessary (which depends on the input).
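Since the shared per-timestep weights do not depend on the sequence length, the same matrix applies to sequences of any length. A NumPy sketch (with made-up weights and random stand-ins for the LSTM outputs) of why input_shape=(None, 64) can work:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared weights, as TimeDistributed(Dense(15)) would create once.
w = rng.standard_normal((32, 15))
b = rng.standard_normal(15)

# The same weights apply to a 10-step and a 37-step sequence alike.
shapes = []
for timesteps in (10, 37):
    seq = rng.standard_normal((timesteps, 32))  # stand-in for LSTM outputs
    out = seq @ w + b
    shapes.append(out.shape)

print(shapes)  # [(10, 15), (37, 15)]
```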

Alie answered 29/12, 2018 at 10:55 Comment(3)
So that means units is the size of the vector output by the LSTM cell at every timestep. But how does Keras know how many of these LSTM cells to use, or how many will be required to train on the data? I mean, it's easy to figure out the no. of LSTM cells required for the input (specified in timespan), but how do we figure out the no. of LSTM units required in the output? – Kendrick
Also, can you explain how each of the 32-dim vectors from the 10 LSTM cells gets connected to the TimeDistributed layer? – Kendrick
In your answer, you wrote: You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell. But I'm asking about cases such as machine translation, where the translated/output sentence length is variable and cannot be known in advance. So, in that case, how many LSTM cells play their role in outputting the translated sentence? – Kendrick
