Many-to-one and many-to-many LSTM examples in Keras

I am trying to understand LSTMs and how to build them with Keras. I found out that there are principally four modes to run an RNN (the four rightmost ones in the picture):

[Image: RNN sequence-processing modes. Image source: Andrej Karpathy]

Now I wonder what a minimalistic code snippet for each of them would look like in Keras. So something like

model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, data_dim)))
model.add(Dense(1))

for each of the 4 tasks, maybe with a little bit of explanation.

Wartburg answered 26/3, 2017 at 21:47 Comment(1)
For the diagram of the one-to-many architecture, the RNN units to the right of the first x input also require inputs. These can typically be set to the outputs (o or y) of the previous unit, or to a default zero vector. – Geothermal

So:

  1. One-to-one: you could use a Dense layer as you are not processing sequences:

    model.add(Dense(output_size, input_shape=input_shape))
    
  2. One-to-many: this option is not well supported, as chaining models is not easy in Keras, so the following version is the easiest one:

    model.add(RepeatVector(number_of_times, input_shape=input_shape))
    model.add(LSTM(output_size, return_sequences=True))
    
  3. Many-to-one: actually, your code snippet is (almost) an example of this approach:

    model = Sequential()
    model.add(LSTM(1, input_shape=(timesteps, data_dim)))
    
  4. Many-to-many: this is the easiest snippet, when the length of the input and output matches the number of recurrent steps (see the consolidated sketch after this list):

    model = Sequential()
    model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
    
  5. Many-to-many when the number of steps differs from the input/output length: this is freakishly hard in Keras. There are no easy code snippets for it.

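For a quick sanity check of points 2-4, here is a consolidated, runnable sketch; the sizes timesteps = 6, data_dim = 8 and output_size = 4 are toy values assumed purely for illustration:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

timesteps, data_dim, output_size = 6, 8, 4  # toy sizes (assumptions)

# 2. one-to-many: repeat a single input vector, then unroll an LSTM over it
one_to_many = Sequential()
one_to_many.add(RepeatVector(timesteps, input_shape=(data_dim,)))
one_to_many.add(LSTM(output_size, return_sequences=True))
print(one_to_many.output_shape)   # (None, 6, 4)

# 3. many-to-one: only the last hidden state is returned
many_to_one = Sequential()
many_to_one.add(LSTM(output_size, input_shape=(timesteps, data_dim)))
print(many_to_one.output_shape)   # (None, 4)

# 4. many-to-many (equal lengths): one output per input step
many_to_many = Sequential()
many_to_many.add(LSTM(output_size, input_shape=(timesteps, data_dim),
                      return_sequences=True))
print(many_to_many.output_shape)  # (None, 6, 4)
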
EDIT: Regarding point 5

In one of my recent applications, we implemented something similar to the many-to-many case from the fourth image. If you want a network with the following architecture (where the input is longer than the output):

                                        O O O
                                        | | |
                                  O O O O O O
                                  | | | | | | 
                                  O O O O O O

You could achieve this in the following manner:

from keras.models import Sequential
from keras.layers import LSTM, Lambda

model = Sequential()
model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
model.add(Lambda(lambda x: x[:, -N:, :]))  # select only the last N steps of the output

where N is the number of final steps you want to keep (in the image, N = 3).
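
To see the slicing at work, here is a quick shape check; the values timesteps = 6, data_dim = 2 and N = 3 are assumptions for illustration (with the TensorFlow backend the Lambda output shape is inferred automatically):

from keras.models import Sequential
from keras.layers import LSTM, Lambda

timesteps, data_dim, N = 6, 2, 3  # toy values; N = number of trailing steps to keep

model = Sequential()
model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
model.add(Lambda(lambda x: x[:, -N:, :]))  # keep only the last N time steps

print(model.output_shape)  # (None, 3, 1): N steps, one unit each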

From this point getting to:

                                        O O O
                                        | | |
                                  O O O O O O
                                  | | | 
                                  O O O 

is as simple as artificially padding the input sequence with e.g. zero vectors, in order to bring it up to the appropriate length.
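
A minimal sketch of that padding step in plain NumPy, reusing the toy sizes assumed above:

import numpy as np

timesteps, data_dim, N = 6, 2, 3           # same toy sizes as above
x = np.random.random((10, N, data_dim))    # 10 samples with only N real time steps

# append zero vectors so every sample reaches the full `timesteps` length
pad = np.zeros((x.shape[0], timesteps - N, data_dim))
x_padded = np.concatenate([x, pad], axis=1)
print(x_padded.shape)                      # (10, 6, 2)

The real steps come first and the zeros are appended, so the last-N slice in the model above lines up with the diagram.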

Arundinaceous answered 27/3, 2017 at 13:19 Comment(12)
One clarification: for many-to-one, for example, you use LSTM(1, input_shape=(timesteps, data_dim)). I thought the 1 stood for the number of LSTM cells/hidden nodes, but apparently not. How would you then code a many-to-one with, let's say, 512 nodes? (Because I read something similar, I thought it would be done with model.add(LSTM(512, input_shape=...)) followed by model.add(Dense(1)); what is that used for then?) – Wartburg
In this case, your code (after correcting a typo) should be OK. – Flita
Why do we use the RepeatVector, and not a vector with the first entry equal to x and all the other entries equal to 0? (According to the picture above, there is no input at all at the later states, and not always the same input, which is what RepeatVector would do, in my understanding.) – Wartburg
If you think carefully about this picture, it's only a conceptual presentation of the idea of one-to-many. All of these hidden units must accept something as input. So they might accept an input whose first step equals x and whose remaining steps equal 0, but, on the other hand, they might just as well accept the same x repeated many times. A different approach is to chain models, which is hard in Keras. The option I provided is the easiest case of a one-to-many architecture in Keras. – Flita
Nice! I am thinking about using an N-to-N LSTM in a GAN architecture. I will have an LSTM-based generator. I will give this generator (as the "latent variable" is used in GANs) the first half of the time series, and the generator will produce the second half. Then I will combine the two halves (real and generated) to produce the "fake" input for the GAN. Do you think using point 4 of your solution will work? Or, in other words, is solution 4 the right way to do this? – Devotion
@MarcinMożejko In your one-to-many scenario, how are you connecting the RepeatVector with the LSTM layer? How should I set the value of number_of_times in RepeatVector? Doesn't Keras by itself find the number of time steps required to build the model, and then repeat the input vector that many times? – Sinker
@MarcinMożejko So you don't need to explicitly tell the model the length of the output sequence? You just use return_sequences=True and it infers the rest? – Dragonnade
In the many-to-many example, there are two cases (in the OP): one with and one without an "offset". How do the models compare in a sample implementation? What is the difference when implementing them? – Clement
How do you do the second many-to-many? Do you put a mask on the last timesteps? – Dermatologist
How is your many-to-one different from many-to-many? I mean, what difference does adding return_sequences=True make? – Analyst
Could you help with how to correctly feed multidimensional data to a many-to-many or autoencoder model? Let's say we have a total data set stored in an array with shape (45000, 100, 6) = (Nsample, Ntimesteps, Nfeatures), i.e. 45000 samples with 100 time steps and 6 features. – Senhor
In many-to-many with unequal input and output lengths, can we use number 4 with padding? – Mangosteen

Great answer by @Marcin Możejko.

I would add the following to No. 5 (many-to-many with different input/output lengths):

A) as Vanilla LSTM

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
model.add(Dense(N_OUTPUTS))

B) as Encoder-Decoder LSTM

from keras.layers import Activation, RepeatVector, TimeDistributed

model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
model.add(RepeatVector(N_OUTPUTS))
model.add(LSTM(N_BLOCKS, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
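
Since the roles of RepeatVector and TimeDistributed can be confusing, here is the same encoder-decoder sketch with the intermediate shapes spelled out; the sizes N_INPUTS = 100, N_FEATURES = 6, N_OUTPUTS = 10 and N_BLOCKS = 64 are toy values assumed for illustration:

from keras.models import Sequential
from keras.layers import Dense, LSTM, RepeatVector, TimeDistributed

N_INPUTS, N_FEATURES, N_OUTPUTS, N_BLOCKS = 100, 6, 10, 64  # toy sizes (assumptions)

model = Sequential()
# encoder: compresses the whole input sequence into a single state vector
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))  # -> (None, 64)
# RepeatVector copies that vector once per desired output step
model.add(RepeatVector(N_OUTPUTS))                             # -> (None, 10, 64)
# decoder: unrolls over the repeated vector, emitting one state per step
model.add(LSTM(N_BLOCKS, return_sequences=True))               # -> (None, 10, 64)
# TimeDistributed applies the same Dense layer at every time step
model.add(TimeDistributed(Dense(1)))                           # -> (None, 10, 1)
model.summary()

Note that the two variants expect differently shaped targets: A needs y of shape (batch, N_OUTPUTS), while B needs (batch, N_OUTPUTS, 1).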
Lavernalaverne answered 9/7, 2019 at 7:43 Comment(2)
Could you please explain the details of the B) encoder-decoder LSTM architecture? I'm having issues understanding the roles of the RepeatVector / TimeDistributed steps. – Samale
Could you please help with how to correctly feed multidimensional data to a many-to-many or encoder-decoder model? I'm mostly struggling with the shapes. Let's say we have a total data set stored in an array with shape (45000, 100, 6) = (Nsample, Ntimesteps, Nfeatures), i.e. 45000 samples with 100 time steps and 6 features. – Senhor
