Getting an error while adding an embedding layer to an LSTM autoencoder

I have a seq2seq model that works fine. I want to add an embedding layer to this network, but I ran into an error.

This is my architecture using pretrained word embeddings, which works fine (the code is almost the same as the code available here, except that I want to include the Embedding layer in the model rather than using the pretrained embedding vectors):

import os
# Imports assumed (standalone Keras); SEQUENCE_LEN, EMBED_SIZE, BATCH_SIZE and the
# data generators (train_gen, test_gen) are defined elsewhere in the original script.
from keras.layers import Input, LSTM, Bidirectional, Lambda, RepeatVector
from keras.models import Model
from keras.callbacks import ModelCheckpoint

LATENT_SIZE = 20

inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")

encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
encoded = Lambda(rev_ent)(encoded)  # rev_ent: the entropy function shown further below (rev_entropy)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

NUM_EPOCHS = 1

num_train_steps = len(Xtrain) // BATCH_SIZE
num_test_steps = len(Xtest) // BATCH_SIZE

checkpoint = ModelCheckpoint(filepath=os.path.join('Data/', "simple_ae_to_compare"), save_best_only=True)
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS,
                                    validation_data=test_gen, validation_steps=num_test_steps,
                                    callbacks=[checkpoint])

This is the summary:

Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 45, 50)            0         
_________________________________________________________________
encoder_lstm (Bidirectional) (None, 20)                11360     
_________________________________________________________________
lambda_1 (Lambda)            (512, 20)                 0         
_________________________________________________________________
repeater (RepeatVector)      (512, 45, 20)             0         
_________________________________________________________________
decoder_lstm (Bidirectional) (512, 45, 50)             28400  

When I change the code to add the embedding layer like this:

inputs = Input(shape=(SEQUENCE_LEN,), name="input")

embedding = Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)

I received this error:

expected decoder_lstm to have 3 dimensions, but got array with shape (512, 45)

So my question is: what is wrong with my model?

Update

So, this error is raised in the training phase. I also checked the dimensions of the data being fed to the model: it is (61598, 45), which clearly does not have the feature dimension (here, EMBED_SIZE).

But why is this error raised in the decoder part? In the encoder part I have included the Embedding layer, so that part is fine; but when the data reaches the decoder, which does not have an embedding layer, it cannot be correctly reshaped into three dimensions.

Now the question is why this does not happen in similar code. This is my view, correct me if I'm wrong: Seq2Seq code is usually used for translation or summarization, and in those codes the decoder part also has an input (in the translation case, the target-language input goes to the decoder, so the idea of having an embedding in the decoder part makes sense). Here I do not have a separate input, which is why I do not need a separate embedding in the decoder part. However, I don't know how to fix the problem; I just know why it is happening :|
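
As a quick sanity check (my addition, not from the original post), the mismatch can be made visible by printing the model's output shape, since the targets for the MSE loss must match it:

# Sketch only: the decoder output (and therefore the MSE target) is 3-D.
print(autoencoder.output_shape)   # (None, 45, 50) = (batch, SEQUENCE_LEN, EMBED_SIZE)
# A 2-D target array of shape (512, 45) can never match this, hence the error above.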

Update2

this is my data being fed to the model:

sent_wids = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'int32')
sample_seq_weights = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'float')
for index_sentence in range(len(parsed_sentences)):
    temp_sentence = parsed_sentences[index_sentence]
    temp_words = nltk.word_tokenize(temp_sentence)
    for index_word in range(SEQUENCE_LEN):
        if index_word < sent_lens[index_sentence]:
            sent_wids[index_sentence,index_word] = lookup_word2id(temp_words[index_word])
        else:
            sent_wids[index_sentence, index_word] = lookup_word2id('PAD')

def sentence_generator(X,embeddings, batch_size, sample_weights):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            temp_sents = X[sids, :]
            Xbatch = embeddings[temp_sents]
            weights = sample_weights[sids, :]
            yield Xbatch, Xbatch
LATENT_SIZE = 60

train_size = 0.95
split_index = int(math.ceil(len(sent_wids)*train_size))
Xtrain = sent_wids[0:split_index, :]
Xtest = sent_wids[split_index:, :]
train_w = sample_seq_weights[0: split_index, :]
test_w = sample_seq_weights[split_index:, :]
train_gen = sentence_generator(Xtrain, embeddings, BATCH_SIZE,train_w)
test_gen = sentence_generator(Xtest, embeddings , BATCH_SIZE,test_w)

and parsed_sentences is a list of 61598 sentences, which are padded.
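
(My addition, as a debugging aid.) Pulling one batch from the generator and printing its shapes shows what is actually being fed: with the pretrained-embedding setup, embeddings[temp_sents] is already 3-D, whereas the model with the Embedding layer needs the raw word IDs (2-D) as input and 3-D embedding vectors only as the target:

# Sketch: inspect one batch from the generator defined above.
Xbatch, Ybatch = next(train_gen)
print(Xbatch.shape, Ybatch.shape)
# Pretrained-embedding model: both should be (BATCH_SIZE, SEQUENCE_LEN, EMBED_SIZE).
# Embedding-layer model: the input must instead be word IDs of shape (BATCH_SIZE, SEQUENCE_LEN),
# while the target still has to be (BATCH_SIZE, SEQUENCE_LEN, EMBED_SIZE).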

Also, this is the function I use in the Lambda layer; I am adding it here in case it has any effect:

def rev_entropy(x):
    def row_entropy(row):
        # Shannon entropy of the value counts within one row
        _, _, count = tf.unique_with_counts(row)
        count = tf.cast(count, tf.float32)
        prob = count / tf.reduce_sum(count)
        prob = tf.cast(prob, tf.float32)
        rev = -tf.reduce_sum(prob * tf.log(prob))
        return rev

    nw = tf.reduce_sum(x, axis=1)
    rev = tf.map_fn(row_entropy, x)
    rev = tf.where(tf.is_nan(rev), tf.zeros_like(rev), rev)
    rev = tf.cast(rev, tf.float32)
    max_entropy = tf.log(tf.clip_by_value(nw, 2, LATENT_SIZE))
    # Scale each row by how far its entropy is from the (clipped) maximum
    concentration = max_entropy / (1 + rev)
    new_x = x * tf.reshape(concentration, [BATCH_SIZE, 1])
    return new_x

Any help is appreciated:)

Hagiography answered 3/6, 2019 at 20:16
I thought I was making a very obvious mistake! Yet I'm getting no answer or comment :( - Hagiography
When are you getting this error? When you start training, or when you are building the model? - Pawnbroker
@Pawnbroker Thanks for following up; it is raised during training. - Hagiography
I think I understand what the dimensional problem is; however, I do not know how to fix it. I will update my question, please have a look. - Hagiography
I saw the update. What is the problem you are solving with the model? Also, what are the labels? - Pawnbroker
@Pawnbroker I feed the 20newsgroup dataset into the seq2seq model. It is not for translation or summarization; rather, I am interested in the encoder layer. I want to work on the encoder layer to see which words are left there after, say, 50 epochs. The input is a sequence of 45 words. Does that make sense? - Hagiography
Can you tell us what is going on in rev_ent? The code can't be run without it. And can you please post the entire model (with all layers) after you change your code to use embeddings? - Jeffiejeffrey
@thushv89 Thank you for following up on my question. Actually, you can comment out that layer; even with it commented out the same error is raised, so that layer is (probably) not the problem. - Hagiography
@sariii Can you edit your question to include the train_gen function? Are you sure you are feeding in the labels/outputs with the correct shape? - Jeffiejeffrey

I tried the following example on Google Colab (TensorFlow version 1.13.1),

from tensorflow.python import keras
import numpy as np

SEQUENCE_LEN = 45
LATENT_SIZE = 20
EMBED_SIZE = 50
VOCAB_SIZE = 100

inputs = keras.layers.Input(shape=(SEQUENCE_LEN,), name="input")

embedding = keras.layers.Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)

encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

And then trained the model using some random data,


NUM_EPOCHS = 1  # not defined in the original snippet; value taken from the question

x = np.random.randint(0, 90, size=(10, 45))
y = np.random.normal(size=(10, 45, 50))
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS)

This solution worked fine. I feel like the issue might be the way you are feeding in labels/outputs for MSE calculation.

Update

Context

In the original problem, you are attempting to reconstruct word embeddings using a seq2seq model, where the embeddings are fixed and pre-trained. However, once you want to use a trainable embedding layer as part of the model, it becomes very difficult to model this problem, because you no longer have fixed targets (i.e. the targets change at every single iteration of the optimization, since your embedding layer is changing). This leads to a very unstable optimization problem, because the targets are changing all the time.
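
To make that concrete (my illustration, not part of the original answer): any target looked up from the model's own trainable embedding layer drifts after every weight update, so the regression target never stays still. A minimal sketch, assuming the embedding-based autoencoder and one batch (x_batch, y_batch) are already available:

# Sketch: the trainable embedding weights move at every step,
# so targets derived from them would move as well.
emb_layer = autoencoder.get_layer(index=1)      # assumed to be the Embedding layer
w_before = emb_layer.get_weights()[0].copy()

autoencoder.train_on_batch(x_batch, y_batch)    # one optimization step

w_after = emb_layer.get_weights()[0]
print(np.abs(w_after - w_before).max())         # > 0: the "targets" have already shifted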

Fixing your code

If you do the following, you should be able to get the code working. Here, embeddings is the pre-trained GloVe vectors as a numpy.ndarray.

def sentence_generator(X, embeddings, batch_size):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        embed_size = embeddings.shape[1]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            # Xbatch is a [batch_size, seq_length] array
            Xbatch = X[sids, :] 

            # Creating the Y targets
            Xembed = embeddings[Xbatch.reshape(-1),:]
            # Ybatch will be [batch_size, seq_length, embed_size] array
            Ybatch = Xembed.reshape(batch_size, -1, embed_size)
            yield Xbatch, Ybatch
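
For completeness, here is a sketch (mine, not part of the original answer) of how this generator could be wired into the embedding-based model from the question; variable names follow the question's code:

# Sketch: the model takes word IDs (batch, SEQUENCE_LEN) as input and is trained
# against the fixed pre-trained embeddings (batch, SEQUENCE_LEN, EMBED_SIZE) as targets.
train_gen = sentence_generator(Xtrain, embeddings, BATCH_SIZE)
test_gen = sentence_generator(Xtest, embeddings, BATCH_SIZE)

num_train_steps = len(Xtrain) // BATCH_SIZE
num_test_steps = len(Xtest) // BATCH_SIZE

history = autoencoder.fit_generator(train_gen,
                                    steps_per_epoch=num_train_steps,
                                    epochs=NUM_EPOCHS,
                                    validation_data=test_gen,
                                    validation_steps=num_test_steps)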

Jeffiejeffrey answered 7/6, 2019 at 8:14
Thank you so much for the effort. Can I ask why you fed embedding into two encoding layers? Shouldn't the second one take encoded, like a chain? - Hagiography
I have also updated the question with the way I prepared the data, which is the same as in the link I provided in the question, except that I changed the data to 20newsgroup. - Hagiography
It seems the preparation of my data has a problem. I would genuinely appreciate it if you could have a look and let me know what I can do; reshaping does not work, it says it cannot reshape it. On the other hand, I was wondering why that code works without the Embedding layer added. - Hagiography
Hi @sariii, about the duplicated encoding layers: that was a mistake. I have corrected it. - Jeffiejeffrey
I still do not get what is wrong with my code. Did you have a chance to look at the second update, where I included the data preparation? Sorry to take your time; I really need this, and the more I browse the less I find any reason for it. - Hagiography
Can you please have a look at the update? I have no clue what is wrong with the data preparation on my side! - Hagiography
Hi @sariii, what you're doing in the data generators is not very clear to me. Can you add more text/comments explaining what's going on? For example, I see Xbatch = embeddings[temp_sents]. Why would you use embeddings while generating data? The embedding lookup should happen while propagating through the model, not before feeding data into the model. - Jeffiejeffrey
Thanks for getting back to me. Actually, that is one approach to use the weights of already-trained embeddings; the code is like this one: github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/…. My purpose is to not use the already-trained embeddings, but to train the embedding during the training of the model. When that function just returns temp_sents instead, the same error is raised. - Hagiography
The embeddings will be trained during model training as long as you leave the embedding layer with trainable=True. That means that whether you want to train the embeddings or use pretrained embeddings, the embedding lookup should only happen during the forward propagation. I am not sure how to fix your existing code, as I don't know which problem you are solving, what X looks like, etc. But I can change my solution to work with a data generator if that helps. You will then have to compare and contrast it with my solution to see where things are going wrong. - Jeffiejeffrey
That also sounds good. The link I provided above is one of the chapters of the Keras deep learning book, where they design the model like this. They did not include the embedding layer in the model, but I want to include it. So the code is well commented as well. - Hagiography
I want to use the encoder part of the model for feature extraction. - Hagiography
About the shape of the input data: now temp_sents is (batch_size, sequence_len) and what the generator returns is (batch_size, seq_len, embed_size). Sorry for commenting so much; I really need this to work. I have tried various ways but am unable to figure it out! - Hagiography
I think I am beginning to understand what you are doing. It seems you are trying to reconstruct the embeddings of sentences using a seq2seq model, and in the book they are using the pretrained GloVe vectors for that (I still don't understand the purpose of doing that, but that's not relevant). You then want to use a trainable embedding layer for this instead of the pretrained vectors. I feel this is a very dangerous thing to do, because you are essentially using a changing set of values (i.e. embedding vectors) as your targets. This will usually lead to a very unstable optimization problem. - Jeffiejeffrey
Yeah, actually their idea of using the pre-trained vectors is not good; that's why I want to have an embedding trained independently in the model itself. So, a general question: I just want a simple seq2seq model that reconstructs the input at the output (the same as an autoencoder, but here we have a sequence of data). Don't you think this is doing that, apart from the idea of using the pretrained GloVe embeddings? - Hagiography
I am actually saying that using trainable embeddings alone is a bad idea. A probably better solution is to train the embedding layer of the model while using the GloVe embeddings as the targets. I think I can provide you a solution, but as said previously, that solution will require two different sets of embeddings (1 - the trainable embedding layer of the model, and 2 - the GloVe embeddings, which are pretrained and fixed). - Jeffiejeffrey
That would be great, really. Also, I see your point; in my final code I'm not using the pretrained GloVe. I trained word2vec on my data and am using those vectors as embeddings. Can you please also say whether this summary is correct: this model is a seq2seq model which tries to reconstruct the same input at the output, and here each feature has 50 dimensions (embed_size)? - Hagiography
I have already updated my solution with the fix you require. About the summary: yeah, I think it's correct. - Jeffiejeffrey
Thank you so much for all your help. I'll definitely try this tomorrow morning; I just forced myself to stay awake as it's 1 AM :) - Hagiography
I cannot thank you enough; I truly and genuinely appreciate your help :) - Hagiography
Regarding your point about being unstable: do you think it is doable to have the same trained embedding at the output? I mean, use the same layer that is at the input also at the output, transforming it into the y_target so the model won't be unstable? Just curious to know. - Hagiography
Not sure what you mean. Are you thinking of using the same word embeddings that are being trained as the output targets? That's actually what I believe will be unstable, because your targets are not fixed; they will change every single iteration. So you are trying to optimize your model against something that is not constant, which is a bad idea!!! But it is doable. You would have to define at least two models: one to get the embeddings of a given set of word IDs, and the other being the normal sequence-to-sequence model. Then you use the first model to get the target values (see the sketch after these comments). - Jeffiejeffrey
No worries. Happy to help :) - Jeffiejeffrey
Sorry, but I have one last question (I promise): why can't I set return_state=True here? With exactly this same code, when I add return_state=True I get an error (this is the last thing I need to add to see how the result changes). The reason is that I want to keep memory across batches: when the batch changes, I want my model to remember the features it has seen in the previous batch. I would genuinely appreciate your help with this last question, and I'm really sorry for the long back-and-forth in the comments. - Hagiography
Hey, sorry it took this long (I missed your comment). Can you tell me what you are planning to do with the state? You should be able to use return_state=True. The error that you got would also be helpful. - Jeffiejeffrey
@thushv89 Thank you so much for getting back to me. I will post a question with clear explanations; I hope you have time to take a look :) - Hagiography
So I had planned to ask a new question, but my question is kind of vague; I'd appreciate it if you could shed light on it, as from your explanations about the embedding I understood what you mean by instability and reconstructing the embedding. I followed the same approach as the book suggests, but instead of using the already pre-trained GloVe vectors I used word vectors trained on my data set. Now, when I look at the features in the middle layer (I saved all the features to a data frame in each epoch), there are only 50 unique features, which I think is because in the decoder part we have the embedding size as the number of neurons. - Hagiography
But I need to have the whole vocabulary in the decoder part, while at the same time having the embedding layer (trainable=False) in my model. Do you think this makes sense at all in terms of the architecture? I really appreciate your help; I wanted to start a new question, but I felt there is no coding issue here, I just don't know whether this makes sense or not! - Hagiography
Hi @sariii, I totally agree that this thread is now too long. Maybe you can post the question on the Data Science Stack Exchange if there's not much coding involved, and I can have a look? - Jeffiejeffrey
I know it sounds crazy, but can I ask you to have a look at the last inquiries I have regarding this model? #56938658 - Hagiography
Hey @sariii, sorry about the delay. I've been pretty busy. Will have a look today. - Jeffiejeffrey
Hey @sariii, it looks to me like that question has been answered, or am I missing something? - Jeffiejeffrey
Thanks a lot for following up on my question; I understand that you are busy and not checking frequently. It sounds like the thing I'm looking for is not logical. I want the output matrix of the encoder layer to be (latent_size, vocab_size), and at the same time I do not want to represent words as one-hot encodings. Based on his answer, I cannot do such a thing. - Hagiography
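
As referenced in the comments above, here is a rough sketch (my addition, not from the original thread) of the "two models" idea: one frozen embedding model that only turns word IDs into fixed target vectors, and the trainable seq2seq autoencoder itself. Names and shapes follow the question; treat this as an illustration under those assumptions:

from keras.layers import Input, Embedding
from keras.models import Model

# Model 1: frozen lookup used ONLY to produce targets (its weights never change).
target_in = Input(shape=(SEQUENCE_LEN,), name="target_ids")
target_emb = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_SIZE,
                       weights=[embeddings], trainable=False,
                       input_length=SEQUENCE_LEN)(target_in)
target_model = Model(target_in, target_emb)

# Model 2: the seq2seq autoencoder with its own *trainable* Embedding layer,
# built exactly as in the question / answer above.

# Training-loop sketch: the fixed targets come from model 1, not from the trainable layer.
for Xbatch in batches_of_word_ids:          # hypothetical iterable of (BATCH_SIZE, SEQUENCE_LEN) ID arrays
    Ybatch = target_model.predict(Xbatch)   # (BATCH_SIZE, SEQUENCE_LEN, EMBED_SIZE), constant over training
    autoencoder.train_on_batch(Xbatch, Ybatch)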
