Word-level Seq2Seq with Keras

I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in errors regarding wrong dimensions.

If you follow the character-level model, the input data has 3 dimensions: (#sequences, #max_seq_len, #num_char), since each character is one-hot encoded. When I print the summary for the model as used in the tutorial, I get:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, None, 71)     0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, None, 94)     0                                            
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, 256), (None, 335872      input_1[0][0]                    
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, None, 256),  359424      input_2[0][0]                    
                                                                 lstm_1[0][1]                     
                                                                 lstm_1[0][2]                     
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, None, 94)     24158       lstm_2[0][0]                     
==================================================================================================

This compiles and trains just fine.
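
For reference, the character-level tutorial builds these 3-D arrays roughly like this (token counts taken from the summary above; variable names are illustrative):

import numpy as np

# every character becomes a one-hot vector along the last axis
encoder_input_data  = np.zeros((num_sequences, max_encoder_seq_length, 71), dtype='float32')
decoder_input_data  = np.zeros((num_sequences, max_decoder_seq_length, 94), dtype='float32')
decoder_target_data = np.zeros((num_sequences, max_decoder_seq_length, 94), dtype='float32')  # decoder input shifted by one timestep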

Now this tutorial has a section "What if I want to use a word-level model with integer sequences?", and I've tried to follow those changes. Firstly, I encode all sequences using a word index. As such, the input and target data now have 2 dimensions: (#sequences, #max_seq_len), since I no longer one-hot encode but instead use Embedding layers.

encoder_input_data_train.shape   =>  (90000, 9)
decoder_input_data_train.shape   =>  (90000, 16)
decoder_target_data_train.shape  =>  (90000, 16)

For example, a sequence might look like this:

[ 826.  288. 2961. 3127. 1260. 2108.    0.    0.    0.]
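
Such padded integer sequences can be produced along these lines (a sketch using Keras' Tokenizer and pad_sequences; input_texts and max_input_seq_len are placeholder names):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# map each word to an integer index
tokenizer = Tokenizer()
tokenizer.fit_on_texts(input_texts)
num_encoder_tokens = len(tokenizer.word_index) + 1  # index 0 is reserved for padding

# convert sentences to index sequences and pad with trailing zeros to a fixed length
encoder_input_data = pad_sequences(tokenizer.texts_to_sequences(input_texts),
                                   maxlen=max_input_seq_len, padding='post')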

When I use the listed code:

# encoder: embed the integer sequences and keep only the final LSTM states
encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# decoder: embed the target sequences and run an LSTM initialised with the encoder states
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

the model compiles and looks like this:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_35 (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
input_36 (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_32 (Embedding)        (None, None, 256)    914432      input_35[0][0]                   
__________________________________________________________________________________________________
embedding_33 (Embedding)        (None, None, 256)    914432      input_36[0][0]                   
__________________________________________________________________________________________________
lstm_32 (LSTM)                  [(None, 256), (None, 525312      embedding_32[0][0]               
__________________________________________________________________________________________________
lstm_33 (LSTM)                  (None, None, 256)    525312      embedding_33[0][0]               
                                                                 lstm_32[0][1]                    
                                                                 lstm_32[0][2]                    
__________________________________________________________________________________________________
dense_21 (Dense)                 (None, None, 3572)   918004      lstm_33[0][0]                    
==================================================================================================

While compilation works, training

model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=32, epochs=1, validation_split=0.2)

fails with the following error: ValueError: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (90000, 16), the latter being the shape of the decoder input/target data. Why does the Dense layer get an array of the shape of the decoder input data?

Things I've tried:

  • I find it a bit strange that the decoder LSTM has return_sequences=True, since I thought I cannot feed sequences to a Dense layer (and the decoder of the original character-level model does not state this). However, simply removing it or setting return_sequences=False did not help. Of course, the Dense layer then has an output shape of (None, 3572).
  • I don't quite get the need for the Input layers. I've set them to shape=(max_input_seq_len, ) and shape=(max_target_seq_len, ) respectively, so that the summary doesn't show (None, None) but the respective values, e.g., (None, 16). No change.
  • In the Keras docs I've read that an Embedding layer should be used with input_length, otherwise a Dense layer upstream cannot compute its outputs. But again, I still get errors when I set input_length accordingly.

I'm a bit at a dead end right now. Am I even on the right track, or am I missing something more fundamental? Is the shape of my data wrong? Why does the last Dense layer get an array with shape (90000, 16)? That seems rather off.

UPDATE: I figured out that the problem seems to be decoder_target_data, which currently has the shape (#samples, max_seq_len), e.g., (90000, 16). I assume I need to one-hot encode the target output with respect to the vocabulary: (#samples, max_seq_len, vocab_size), e.g., (90000, 16, 3572).
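
In code, that would mean something like this (a sketch; it assumes decoder_target_data_train holds the padded word indices):

import numpy as np

# expand the integer targets (90000, 16) into one-hot vectors (90000, 16, vocab_size)
decoder_target_onehot = np.zeros(
    (len(decoder_target_data_train), max_target_seq_len, num_decoder_tokens), dtype='float32')
for i, seq in enumerate(decoder_target_data_train):
    for t, word_index in enumerate(seq):
        decoder_target_onehot[i, t, int(word_index)] = 1.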

Unfortunately, this throws a MemoryError. However, when, for debugging purposes, I assume a vocabulary size of 10:

decoder_target_data = np.zeros((len(input_sequences), max_target_seq_len, 10), dtype='float32')

and later in the decoder model:

x = Dense(10, activation='softmax')(x)

then the model trains without error. In case that's indeed my issue, I would have to train the model with manually generated batches so I can keep the vocabulary size but reduce the #samples, e.g., to 90 batches each of shape (1000, 16, 3572). Am I on the right track here?
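
Such a manually chunked training loop might look like this (a sketch; the chunk size and train_on_batch are just for illustration, with the targets one-hot encoded per chunk as above):

chunk_size = 1000

for start in range(0, len(encoder_input_data_train), chunk_size):
    end = start + chunk_size
    target_indices = decoder_target_data_train[start:end]
    # one-hot encode only this chunk of targets to keep memory usage bounded
    target_chunk = np.zeros((len(target_indices), max_target_seq_len, num_decoder_tokens),
                            dtype='float32')
    for i, seq in enumerate(target_indices):
        for t, word_index in enumerate(seq):
            target_chunk[i, t, int(word_index)] = 1.
    model.train_on_batch([encoder_input_data_train[start:end],
                          decoder_input_data_train[start:end]],
                         target_chunk)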

Incinerate answered 11/2, 2018 at 3:50 Comment(11)
Yes, the array passed to the dense layer is wrong. It should be (batch_size, num_encoder_tokens, latent_dim). Something wrong with the embedding perhaps. Can you try encoder_inputs = Input((num_encoder_tokens, )) and x = Embedding(vocab_size, latent_dim)(encoder_inputs)?Leftover
num_decoder_tokens is my vocab_size, which is 3,572 in my case. According to the original tutorial, num_decoder_tokens is the number of unique output tokens. Regarding the input shape, I've tried encoder_inputs = Input(shape=(max_input_seq_len, )) since I assume the value reflects the length of the sequence. I just don't understand why the dense layer (or any layer for that matter) would ever see an array of (90000, 16), which is the shape of the decoder_input_data. The dense layer is clearly connected to an LSTM layer whose output is 3-dimensional.Incinerate
Oh sorry, totally missed that you don't have a time-distributed dense layer. Don't one-hot encode. Use embeddings and turn your dense layer into a time-distributed dense, meaning each timestep has its own dense. Otherwise, you'd be getting one single output token, which apparently Keras doesn't even allow. I have a feeling there's something else missing too... have a go.Leftover
x = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))(x) had no effect. The decoder uses an embedding layer for decoder_input_data, but how would this work for decoder_target_data? I've checked the PyTorch Seq2Seq tutorial. At loss += self.criterion(decoder_output, target_variable[di]), when I check the dimensions, decoder_output is (1, #vocab_size) and target_variable[di] is of size 1. I assume that Keras needs target_variable[di] of shape (1, #vocab_size) explicitly; PyTorch handles this more cleverly. Hence, I think I need to one-hot encode decoder_target_data.Incinerate
The Keras tutorial weirdly doesn't talk about some important tools for working with RNNs; I assumed it used/mentioned the sparse_categorical_crossentropy loss. With that you can skip one-hot encoding the targets.Leftover
Did you ever figure it out @Christian? @Leftover how do you do it without one-hot encoding?Springhead
@Springhead No, I haven't. I usually work with PyTorch (which handles this more neatly), so I didn't spend too much time on it. What I ended up doing was splitting the training data into chunks so that each chunk fit into memory, and of course looping over all chunks.Incinerate
It seems the way to get the model to compile and train is to use sparse_categorical_crossentropy as the loss with a TimeDistributed(Dense) layer and to do np.expand_dims(data, -1) on the decoder target data. However, I now find myself with an issue where loss = nan.Springhead
Was there any update on this? Did one-hot encoding and time-shifting the decoder_target_data work? It would be awesome if the Keras team could do a complete working example at github.com/keras-team/keras/blob/master/examplesAril
Any update on this, @user2258651, @Christian? I have the same issue and asked the question a week ago but did not get any answer yet!Lampert
Any update on this? I have a similar error. What have you done to fix it?Zook
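
For reference, the TimeDistributed + sparse_categorical_crossentropy setup described in the last few comments would look roughly like this (an untested sketch; it avoids one-hot encoding the targets but does not address the loss = nan issue mentioned above):

from keras.layers import Input, LSTM, Dense, Embedding, TimeDistributed
from keras.models import Model
import numpy as np

# encoder (as in the question)
encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# decoder with a time-distributed softmax over the vocabulary
decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# sparse loss: targets stay integer indices; only an extra trailing axis is added
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data],
          np.expand_dims(decoder_target_data, -1),
          batch_size=32, epochs=1, validation_split=0.2)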

Recently I was also facing this problem. There is no other solution than creating small batches, say batch_size=64, in a generator, and then instead of model.fit using model.fit_generator. I have attached my generate_batch code below:

import numpy as np

def generate_batch(X, y, batch_size=64):
    '''Generate one batch of (inputs, targets) at a time to keep memory usage low.'''
    while True:
        for j in range(0, len(X), batch_size):
            # integer-index inputs for the encoder and decoder
            encoder_input_data = np.zeros((batch_size, max_encoder_seq_length), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_decoder_seq_length + 2), dtype='float32')
            # only the targets are one-hot encoded, and only for this batch
            decoder_target_data = np.zeros((batch_size, max_decoder_seq_length + 2, num_decoder_tokens), dtype='float32')

            for i, (input_text_seq, target_text_seq) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word_index in enumerate(input_text_seq):
                    encoder_input_data[i, t] = word_index  # encoder input sequence

                for t, word_index in enumerate(target_text_seq):
                    decoder_input_data[i, t] = word_index
                    # target is the decoder input shifted back by one timestep, one-hot encoded
                    if t > 0 and word_index <= num_decoder_tokens:
                        decoder_target_data[i, t - 1, word_index - 1] = 1.

            yield ([encoder_input_data, decoder_input_data], decoder_target_data)

And then training like this:

import math

batch_size = 64
epochs = 2

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit_generator(
    generator=generate_batch(X=X_train_sequences, y=y_train_sequences, batch_size=batch_size),
    steps_per_epoch=math.ceil(len(X_train_sequences) / batch_size),
    epochs=epochs,
    verbose=1,
    validation_data=generate_batch(X=X_val_sequences, y=y_val_sequences, batch_size=batch_size),
    validation_steps=math.ceil(len(X_val_sequences) / batch_size),
    workers=1,
    )

X_train_sequences is a list of lists like [[23, 34, 56], [2, 33544, 6, 10]], and similarly for the others.

I also took help from this blog: word-level-english-to-marathi-nmt

Catalog answered 16/4, 2020 at 17:54 Comment(0)
