I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions.
If you follow the character-level model, the input data is 3-dimensional: (#sequences, #max_seq_len, #num_char), since each character is one-hot encoded. When I plot the summary for the model as used in the tutorial, I get:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 94) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 359424 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 94) 24158 lstm_2[0][0]
==================================================================================================
This compiles and trains just fine.
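For context, the tutorial's 3-dimensional one-hot inputs are built roughly like this (a sketch with toy data and my own variable names, not the tutorial's exact code):
import numpy as np

texts = ['hi', 'yes']
chars = sorted({c for t in texts for c in t})
char_index = {c: i for i, c in enumerate(chars)}

num_sequences = len(texts)
max_seq_len = max(len(t) for t in texts)
num_chars = len(chars)

# one one-hot vector per character position
encoder_input_data = np.zeros((num_sequences, max_seq_len, num_chars), dtype='float32')
for i, text in enumerate(texts):
    for t, char in enumerate(text):
        encoder_input_data[i, t, char_index[char]] = 1.0

print(encoder_input_data.shape)   # (2, 3, 5): (#sequences, #max_seq_len, #num_char)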
Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. First, I encode all sequences using a word index. As such, the input and target data is now 2-dimensional: (#sequences, #max_seq_len), since I no longer one-hot encode but use Embedding layers.
encoder_input_data_train.shape => (90000, 9)
decoder_input_data_train.shape => (90000, 16)
decoder_target_data_train.shape => (90000, 16)
For example, a sequence might look like this:
[ 826. 288. 2961. 3127. 1260. 2108. 0. 0. 0.]
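The encoding itself is done roughly like this (a sketch with toy data; token_index stands in for my real word index):
import numpy as np

texts = ['what is your name', 'hello there']
vocab = sorted({w for t in texts for w in t.split()})
token_index = {w: i + 1 for i, w in enumerate(vocab)}   # 0 is reserved for padding

max_seq_len = 9
encoder_input_data = np.zeros((len(texts), max_seq_len), dtype='float32')
for i, text in enumerate(texts):
    for t, word in enumerate(text.split()):
        encoder_input_data[i, t] = token_index[word]

print(encoder_input_data.shape)   # (2, 9): (#sequences, #max_seq_len)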
When I use the listed code:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# encoder
encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# decoder
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
the model compiles and looks like this:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_35 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_36 (InputLayer) (None, None) 0
__________________________________________________________________________________________________
embedding_32 (Embedding) (None, None, 256) 914432 input_35[0][0]
__________________________________________________________________________________________________
embedding_33 (Embedding) (None, None, 256) 914432 input_36[0][0]
__________________________________________________________________________________________________
lstm_32 (LSTM) [(None, 256), (None, 525312 embedding_32[0][0]
__________________________________________________________________________________________________
lstm_33 (LSTM) (None, None, 256) 525312 embedding_33[0][0]
lstm_32[0][1]
lstm_32[0][2]
__________________________________________________________________________________________________
dense_21 (Dense) (None, None, 3572) 918004 lstm_33[0][0]
While compile works, training
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)
fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16)
with the latter being the shape of the decoder input/target data. Why does the Dense layer get an array of the shape of the decoder input data?
Things I've tried (see the sketch after this list):
- I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I cannot feed a sequence to a Dense layer (and the decoder of the original character-level model does not state this). However, simply removing it or setting return_sequences=False did not help. Of course, the Dense layer then has an output shape of (None, 3572).
- I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len,) and shape=(max_target_seq_len,) respectively, so that the summary doesn't show (None, None) but the respective values, e.g., (None, 16). No change.
- In the Keras docs I've read that an Embedding layer should be used with input_length, otherwise a Dense layer upstream cannot compute its outputs. But again, it still errors when I set input_length accordingly.
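For concreteness, the variants I tried look roughly like this (continuing from the snippet above; max_input_seq_len is 9 and max_target_seq_len is 16 in my case):
# each line replaces the corresponding line in the model code above
encoder_inputs = Input(shape=(max_input_seq_len,))       # instead of Input(shape=(None,))
decoder_inputs = Input(shape=(max_target_seq_len,))

x = Embedding(num_encoder_tokens, latent_dim,
              input_length=max_input_seq_len)(encoder_inputs)   # explicit input_length

x = LSTM(latent_dim, return_sequences=False)(x, initial_state=encoder_states)  # no sequence to Dense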
I'm a bit at a dead end right now. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.
UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). But I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
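Concretely, what I mean by one-hot encoding the targets is roughly this (a sketch; target_sequences is a placeholder for my integer-encoded target data, and num_decoder_tokens is 3572):
import numpy as np

decoder_target_data = np.zeros(
    (len(target_sequences), max_target_seq_len, num_decoder_tokens), dtype='float32')
for i, seq in enumerate(target_sequences):      # integer-encoded target sequences
    for t, token_id in enumerate(seq):
        decoder_target_data[i, t, int(token_id)] = 1.0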
Unfortunately, this throws a MemoryError. However, when, for debugging purposes, I assume a vocabulary size of 10:
decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')
and later in the decoder model:
x = Dense(10, activation='softmax')(x)
then the model trains without error. In case that's indeed my issue, I would have to train the model with manually generated batches so I can keep the full vocabulary size but reduce #samples, e.g., to 90 batches, each of shape (1000, 16, 3572). Am I on the right track here?
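If so, I imagine something like the following generator (a sketch assuming Keras's fit_generator; decoder_target_data_int stands for the 2-dim integer-encoded targets):
import numpy as np

def batch_generator(enc_in, dec_in, dec_target_int, batch_size, num_decoder_tokens):
    num_samples = enc_in.shape[0]
    while True:
        for start in range(0, num_samples, batch_size):
            end = start + batch_size
            target_int = dec_target_int[start:end]
            # one-hot encode only this batch: (batch, max_target_seq_len, num_decoder_tokens)
            target_onehot = np.zeros(
                (target_int.shape[0], target_int.shape[1], num_decoder_tokens), dtype='float32')
            for i, seq in enumerate(target_int):
                for t, token_id in enumerate(seq):
                    target_onehot[i, t, int(token_id)] = 1.0
            yield [enc_in[start:end], dec_in[start:end]], target_onehot

model.fit_generator(
    batch_generator(encoder_input_data, decoder_input_data, decoder_target_data_int,
                    batch_size=1000, num_decoder_tokens=3572),
    steps_per_epoch=90, epochs=1)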
(batchsize, num_encoder_tokens, latent_dim). Something wrong with the embedding perhaps? Can you try encoder_inputs = Input(shape=(num_encoder_tokens,)) and x = Embedding(vocab_size, latent_dim)(encoder_inputs)? – Leftover

num_decoder_tokens is my vocab_size, which is 3,572 in my case. According to the original tutorial, num_decoder_tokens is the number of unique output tokens. Regarding the input shape, I've tried encoder_inputs = Input(shape=(max_input_seq_len,)) since I assume the value reflects the length of the sequence. I just don't understand why the dense layer (or any layer for that matter) would ever see an array of shape (90000, 16), which is the shape of decoder_input_data. The dense layer is clearly connected to an LSTM layer which outputs 3 dims. – Incinerate

x = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))(x) had no effect. The decoder uses an embedding layer for decoder_input_data, but how would this work for decoder_target_data? I've checked the PyTorch Seq2Seq tutorial: at loss += self.criterion(decoder_output, target_variable[di]), when I check the dimensions, decoder_output is (1, #vocab_size) and target_variable[di] is of size 1. I assume that Keras needs target_variable[di] to be (1, #vocab_size) explicitly; PyTorch does it more cleverly. Hence, I think I need to one-hot encode decoder_target_data. – Incinerate

Try the sparse_categorical_crossentropy loss. Then you can skip one-hot encoding the targets. – Leftover

Did that work with the 2-dim decoder_target_data? It would be awesome if the Keras team could provide a complete working example at github.com/keras-team/keras/blob/master/examples – Aril
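For reference, a minimal sketch of the sparse_categorical_crossentropy route suggested by Leftover above (whether the integer targets need a trailing axis of size 1 depends on the Keras version; the expand_dims below is an assumption):
import numpy as np

model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

# decoder_target_data stays the original 2-dim integer array of shape (90000, 16);
# some Keras versions expect sparse sequence targets as (samples, seq_len, 1)
sparse_targets = np.expand_dims(decoder_target_data, -1)   # (90000, 16, 1)
model.fit([encoder_input_data, decoder_input_data], sparse_targets,
          batch_size=32, epochs=1, validation_split=0.2)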