What is the difference between return state and return sequence in a keras GRU layer?

I can't seem to wrap my head around the difference between return state and return sequence in a keras GRU layer.

Since a GRU unit does not have a cell state (its state is equal to its output), how does return_state differ from return_sequences in a Keras GRU layer?

More specifically, I built an encoder-decoder LSTM model with one encoder layer and one decoder layer. The encoder layer returns its state (return_state = TRUE), and the decoder layer uses those states as its initial state (initial_state = encoder_states).
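Roughly, the LSTM pattern I mean looks like this (a minimal sketch; the sizes and names are just illustrative):

from keras.layers import Input, LSTM

latent_dim = 64    # illustrative
num_features = 32  # illustrative

encoder_inputs = Input(shape=(None, num_features))
# return_state=True: the LSTM also returns its final hidden state (h)
# and its final cell state (c) in addition to its output
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]  # an LSTM has two state tensors

decoder_inputs = Input(shape=(None, num_features))
# the decoder is initialised with the encoder's final states
decoder_outputs = LSTM(latent_dim, return_sequences=True)(decoder_inputs, initial_state=encoder_states)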

When trying to do this with GRU layers, I do not understand what states are passed between the encoder and decoder layer. Please let me know if you can clarify this. Thanks in advance.

Belie answered 26/2, 2019 at 14:10 Comment(1)
You can refer to this blog. It explains your question in the context of an LSTM, but it can still give you the basic idea, since both GRU and LSTM are recurrent units.Weintraub

The "state" of a GRU layer will usually be be same as the "output". However if you pass in return_state=True and return_sequence=True then the output of the layer will the output after each element of the sequence but the state will only be the state after the last element of the sequence is processed.

Here's an example of an encoder/decoder for a seq2seq network using GRU layers:

from keras.layers import Input, Embedding, GRU, Dense
from keras.models import Model

# vocab (the token list) and THOUGHT_VECTOR_SIZE (an int) are assumed
# to be defined elsewhere

# Create layers
encoder_input_layer = Input(shape=(None,))
encoder_embedding_layer = Embedding(len(vocab), THOUGHT_VECTOR_SIZE)
encoder_gru_layer = GRU(THOUGHT_VECTOR_SIZE, return_state=True)

decoder_input_layer = Input(shape=(None,))
decoder_embedding_layer = Embedding(len(vocab), THOUGHT_VECTOR_SIZE)
decoder_gru_layer = GRU(THOUGHT_VECTOR_SIZE, return_sequences=True)
decoder_dense_layer = Dense(len(vocab), activation='softmax')


# Connect the network
encoder = encoder_embedding_layer(encoder_input_layer)
# return_state=True -> the GRU also returns its final hidden state
encoder, encoder_state = encoder_gru_layer(encoder)

decoder = decoder_embedding_layer(decoder_input_layer)
# seed the decoder with the encoder's final state
decoder = decoder_gru_layer(decoder, initial_state=encoder_state)
decoder = decoder_dense_layer(decoder)

model = Model([encoder_input_layer, decoder_input_layer], decoder)
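If it helps, a hypothetical training call could look like this (dummy integer-encoded data; the targets here are just the decoder inputs shifted one step, a rough stand-in for teacher forcing):

import numpy as np

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

enc_in = np.random.randint(0, len(vocab), size=(64, 10))   # 64 samples, 10 tokens each
dec_in = np.random.randint(0, len(vocab), size=(64, 12))   # 64 samples, 12 tokens each
targets = np.expand_dims(np.roll(dec_in, -1, axis=1), -1)  # shift left by one step

model.fit([enc_in, dec_in], targets, epochs=1)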

But to your point, using return_state isn't really necessary here, as the output and state from encoder_gru_layer will be the same: return_sequences defaults to False, so the layer's output is just its final state.
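You can check that equivalence directly (a quick sketch with dummy data):

import numpy as np
from keras.layers import Input, GRU
from keras.models import Model

x = Input(shape=(5, 8))
# return_sequences defaults to False, so the output is the last step's output
out, state = GRU(16, return_state=True)(x)
m = Model(x, [out, state])

o, s = m.predict(np.random.rand(1, 5, 8).astype('float32'))
print(np.allclose(o, s))  # True -> for a GRU the output IS the state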

Impersonality answered 26/2, 2019 at 14:49 Comment(2)
Thanks for your quick reply. So a model with two hidden GRU layers, in which only return_sequences is set to True, is essentially the same as the encoder-decoder model that you presented?Belie
@Belie Yeah, setting return_state=True on the encoder GRU layer is not really necessary.Impersonality
