I can't seem to wrap my head around the difference between return_state and return_sequences in a keras GRU layer.
Since a GRU unit does not have a separate cell state (its state is equal to its output), how does return_state differ from return_sequences in a keras GRU layer?
More specifically, I built an encoder-decoder LSTM model with one encoder layer and one decoder layer. The encoder layer returns its state (return_state = TRUE), and the decoder layer uses these states as its initial state (initial_state = encoder_states).
When trying to do this with GRU layers, I do not understand what states are passed between the encoder and decoder layer. Please let me know if you can clarify this. Thanks in advance.
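To make the question concrete, here is a minimal sketch of the GRU version of my setup, written with the Python tf.keras API (the layer sizes and input shapes are arbitrary placeholders, not my real model). With return_state=True, the GRU call returns two tensors, the layer output and a single final hidden state, rather than the two states (hidden and cell) an LSTM returns:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 8  # arbitrary size, for illustration only

# Encoder: return_state=True makes the GRU return (output, final_state).
# Unlike an LSTM there is no cell state, so there is only ONE state tensor,
# and (with return_sequences=False) it is identical to the output.
encoder_inputs = layers.Input(shape=(None, 4))
encoder_outputs, encoder_state = layers.GRU(
    latent_dim, return_state=True
)(encoder_inputs)

# Decoder: seeded with the encoder's final hidden state via initial_state.
decoder_inputs = layers.Input(shape=(None, 4))
decoder_outputs, _ = layers.GRU(
    latent_dim, return_sequences=True, return_state=True
)(decoder_inputs, initial_state=encoder_state)

model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)

x = np.zeros((2, 5, 4), dtype="float32")  # batch of 2, 5 timesteps, 4 features
print(model([x, x]).shape)  # decoder output: (batch, timesteps, latent_dim)
```

So the only thing passed from encoder to decoder here is that single hidden-state tensor of shape (batch, latent_dim), whereas in the LSTM version initial_state would be a list of two tensors.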