In Keras, what exactly am I configuring when I create a stateful `LSTM` layer with N `units`?
The first argument in a normal Dense layer is also units, which is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following:

[Image: diagram of a standard LSTM unit]

(This is a reworked version of "Understanding LSTM Networks")

In Keras, when I create an LSTM object like this: LSTM(units=N, ...), am I actually creating N of these LSTM units? Or is it the size of the "neural network" layers inside the LSTM unit, i.e., the W's in the formulas? Or is it something else?

For context, I'm working based on this example code.

The documentation (https://keras.io/layers/recurrent/) says:

units: Positive integer, dimensionality of the output space.

It makes me think it is the number of outputs from the Keras LSTM "layer" object, meaning the next layer will have N inputs. Does that mean there actually exist N of these LSTM units in the LSTM layer, or that exactly one LSTM unit is run for N iterations, outputting N of these h[t] values, from, say, h[t-N] up to h[t]?

If it only defines the number of outputs, does that mean the input can still be, say, just one, or do we have to manually create lagging input variables x[t-N] to x[t], one for each LSTM unit defined by the units=N argument?

As I'm writing this, it occurs to me what the argument return_sequences does: if set to True, all N outputs are passed forward to the next layer, while if it is set to False, only the last h[t] output is passed to the next layer. Am I right?

Tyndale answered 30/5, 2017 at 23:5 Comment(3)
Possible duplicate of stats.stackexchange.com/questions/241985/… – Cymric
@Cymric I don't think tagging a question as "duplicate" across Stack Exchange sites is a thing. This question also pertains to Keras, an abstraction layer on top of TensorFlow. Anyway, the link is helpful and a good reference, so thanks. – Cyanamide
Check this: zhuanlan.zhihu.com/p/58854907. A pretty good explanation. – Strife

You can check this question for further information, although it is based on the Keras 1.x API.

Basically, units is the dimension of the inner cells in the LSTM. In an LSTM, the cell state (C_t and C_{t-1} in the graph), the output gate (o_t in the graph), and the hidden/output state (h_t in the graph) must all have the SAME dimension; therefore your output's dimension must be units as well.
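For reference, the standard LSTM update equations (in the notation of "Understanding LSTM Networks", which the question's figure reworks) make this explicit; here sigma is the logistic sigmoid and * is elementwise multiplication:

f_t = sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = sigma(W_i [h_{t-1}, x_t] + b_i)
C_t = f_t * C_{t-1} + i_t * tanh(W_C [h_{t-1}, x_t] + b_C)
o_t = sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)

Because h_t = o_t * tanh(C_t) is an elementwise product, f_t, i_t, o_t, C_t, and h_t are all vectors of the same length, namely units.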

An LSTM in Keras defines exactly one LSTM block, whose inner vectors are of length units. If you set return_sequences=True, it returns a tensor of shape (batch_size, timespan, units). If False, it returns just the last output, with shape (batch_size, units).

As for the input, you should provide input for every timestep. Basically, the shape is (batch_size, timespan, input_dim), where input_dim can be different from units. If you just want to provide input at the first step, you can simply pad your data with zeros at the other timesteps.
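A minimal sketch of those shapes, assuming the tf.keras API (the dimensions below are arbitrary examples):

import numpy as np
from tensorflow.keras.layers import LSTM

# 8 sequences, 5 timesteps, 3 features per timestep (arbitrary example sizes)
x = np.random.rand(8, 5, 3).astype("float32")

print(LSTM(units=4)(x).shape)                         # (8, 4): only the last h_t
print(LSTM(units=4, return_sequences=True)(x).shape)  # (8, 5, 4): h_t for every timestep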

Jewbaiting answered 31/5, 2017 at 6:59 Comment(1)
So that means units is the size of the vector output by the LSTM cell at every timestep. But how does Keras know how many of these LSTM cells to use, or how many will be required to train on the data? I mean, it's easy to figure out the number of LSTM cells required for the input (specified in timespan), but how do we figure out the number of LSTM units required in the output? – Hebetate

Does that mean there actually exist N of these LSTM units in the LSTM layer, or that exactly one LSTM unit is run for N iterations, outputting N of these h[t] values, from, say, h[t-N] up to h[t]?

The first is true. In that Keras LSTM layer there are N LSTM units, or cells.

keras.layers.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False) 

If you plan to create a simple LSTM layer with 1 cell, you will end up with this: [Image: diagram of an LSTM layer with a single unit] And this would be your model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

N = 1
model = Sequential()
model.add(LSTM(N))

For the other models you would need N > 1: [Image: diagram of an LSTM layer with N > 1 units]

Sym answered 24/1, 2019 at 22:54 Comment(7)
Why would we use an LSTM in a one-to-one model? – Fayth
If a neural network is a matrix transform followed by a non-linearity, there are several neural networks in an LSTM. I have no clear idea why I would use just a single LSTM cell in practice. – Sym
So N is the number of blue cells? – Pleuro
@dvdblk, yes, N should be the blue cells, or the LSTM output space. – Sym
Aren't the blue cells computed from the input timesteps? E.g., if we have an LSTM(32) layer and we input (2, 1, 24), which corresponds to (batch size, timesteps, features), then this will have only 1 blue cell. – Spray
Providing no explanation of the figures will only confuse people. – Energumen
This type of answer should be aggressively downvoted. – Energumen

How many instances of "LSTM chains"?

The proper intuitive explanation of the units parameter for Keras recurrent neural networks is that with units=1 you get an RNN as described in textbooks, and with units=n you get a layer which consists of n independent copies of such an RNN. They'll have identical structure, but since they'll be initialized with different weights, they'll compute something different.

Alternatively, you can consider that in an LSTM with units=1 the key values (f, i, C, h) are scalars, and with units=n they'll be vectors of length n.
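A small sketch of this view, assuming the tf.keras API (n and input_dim below are arbitrary examples): the layer's trainable weights stack the four gates side by side, so each of i, f, C, and o is a length-n vector.

import numpy as np
from tensorflow.keras.layers import LSTM

n, input_dim = 4, 3
layer = LSTM(units=n)
layer(np.zeros((1, 5, input_dim), dtype="float32"))  # call once so the weights get built

kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)            # (3, 16) = (input_dim, 4 * n): i, f, C, o stacked side by side
print(recurrent_kernel.shape)  # (4, 16) = (n, 4 * n)
print(bias.shape)              # (16,)   = (4 * n,)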

Cymric answered 9/3, 2019 at 4:2 Comment(2)
So if I have an input sequence with N time points, the number of cells is N (when unrolling the LSTM), and if I set units=1 then this is it. If I set it to 10, it will produce 10 independent copies of my LSTM chain of N cells. Is that right? – Halfassed
If units=1 and the input sequence is of length 10, this means that a single LSTM unit will be unrolled 10 times to process the 10 time points, right? – Halfassed

"Intuitively" just like a dense layer with 100 dim (Dense(100)) will have 100 neurons. Same way LSTM(100) will be a layer of 100 'smart neurons' where each neuron is the figure you mentioned and the output will be a vector of 100 dimensions

Victualler answered 15/2, 2020 at 13:56 Comment(0)
