How to build an embedding layer in a TensorFlow RNN?

I'm building an RNN LSTM network to classify texts based on the writers' age (binary classification - young / adult).

It seems like the network does not learn and then suddenly starts overfitting:

[Figure rnn_overfitting: accuracy curves, red = train, blue = validation]

One possibility could be that the data representation is not good enough. I just sorted the unique words by their frequency and gave them indices. E.g.:

unknown -> 0
the     -> 1
a       -> 2
.       -> 3
to      -> 4
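
In code, that frequency-based indexing looks roughly like this (a sketch; collections.Counter and whitespace tokenization are assumptions, not part of my actual pipeline):

from collections import Counter

def build_vocab(texts):
    # Count word frequencies across all texts (whitespace tokenization assumed)
    counts = Counter(word for text in texts for word in text.split())
    vocab = {"unknown": 0}  # index 0 reserved for unknown words
    # Assign indices by descending frequency
    for i, (word, _) in enumerate(counts.most_common(), start=1):
        vocab[word] = i
    return vocab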

So I'm trying to replace that with word embeddings. I saw a couple of examples, but I'm not able to implement it in my code. Most of the examples look like this:

embedding = tf.Variable(tf.random_uniform([vocab_size, hidden_size], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, input_data)

Does this mean we're building a layer that learns the embedding? I thought one should download some pretrained Word2Vec or GloVe vectors and just use those.

Anyway, let's say I want to build this embedding layer.
If I use these two lines in my code, I get an error:

TypeError: Value passed to parameter 'indices' has DataType float32 not in list of allowed values: int32, int64

So I guess I have to change the input_data type to int32. I do that (it's all indices, after all), and get this:

TypeError: inputs must be a sequence

I tried wrapping inputs (the argument to tf.contrib.rnn.static_rnn) in a list, [inputs], as suggested in this answer, but that produced another error:

ValueError: Input size (dimension 0 of inputs) must be accessible via shape inference, but saw value None.


Update:

I was unstacking the tensor x before passing it to embedding_lookup. I moved the unstacking to after the embedding lookup.

Updated code:

MIN_TOKENS = 10
MAX_TOKENS = 30
x = tf.placeholder("int32", [None, MAX_TOKENS, 1])
y = tf.placeholder("float", [None, N_CLASSES]) # 0.0 / 1.0
...
seqlen = tf.placeholder(tf.int32, [None]) #list of each sequence length*
embedding = tf.Variable(tf.random_uniform([VOCAB_SIZE, HIDDEN_SIZE], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, x) #x is the text after converting to indices
inputs = tf.unstack(inputs, MAX_TOKENS, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlen) #---> Produces error

*seqlen: I zero-padded the sequences so they all have the same length, but since the actual lengths differ, I prepared a list describing each sequence's length without the padding.
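
A minimal sketch of computing those lengths from the padded batches (this assumes index 0 appears only as padding; since index 0 also stands for unknown words above, a dedicated padding index would be safer):

import numpy as np

# batch: int array of shape [BATCH_SIZE, MAX_TOKENS], zero-padded word indices
seq_lengths = np.count_nonzero(batch, axis=1)  # number of tokens before the padding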

New error:

ValueError: Input 0 of layer basic_lstm_cell_1 is incompatible with the layer: expected ndim=2, found ndim=3. Full shape received: [None, 1, 64]

64 is the size of each hidden layer.
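
Tracing the shapes shows where the extra dimension comes from (a sketch using the names above):

# x: int32 placeholder of shape [None, MAX_TOKENS, 1]
inputs = tf.nn.embedding_lookup(embedding, x)  # [None, MAX_TOKENS, 1, HIDDEN_SIZE]
inputs = tf.unstack(inputs, MAX_TOKENS, 1)     # list of MAX_TOKENS tensors, each [None, 1, HIDDEN_SIZE]
# static_rnn expects each list element to be [batch_size, input_size] (ndim=2), hence the error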

It's obvious that I have a problem with the dimensions... How can I make the inputs fit the network after embedding?

Stowers asked 4/9/2018 at 14:10

From the documentation of tf.nn.static_rnn, we can see that the inputs argument should be:

A length T list of inputs, each a Tensor of shape [batch_size, input_size]

So your code should be something like:

x = tf.placeholder("int32", [None, MAX_TOKENS])
...
inputs = tf.unstack(inputs, axis=1)
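
Putting it together, a minimal end-to-end sketch under these shapes (the BasicLSTMCell is an assumption; the other names follow the question):

x = tf.placeholder(tf.int32, [None, MAX_TOKENS])  # indices, no trailing size-1 dimension
y = tf.placeholder(tf.float32, [None, N_CLASSES])
seqlen = tf.placeholder(tf.int32, [None])

embedding = tf.Variable(tf.random_uniform([VOCAB_SIZE, HIDDEN_SIZE], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, x)  # [None, MAX_TOKENS, HIDDEN_SIZE]
inputs = tf.unstack(inputs, axis=1)            # list of MAX_TOKENS tensors, each [None, HIDDEN_SIZE]

lstm_cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlen)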
Canaigre answered 12/9/2018 at 18:50
That solved my ValueError problem (plus adjusting the input accordingly: reshaping the batches from (BATCH_SIZE, MAX_TOKENS, 1) to (BATCH_SIZE, MAX_TOKENS) to match the new x shape). However, this didn't solve the learning problem; the graphs now look like: this. I guess you deserve the bounty though (and +1). – Stowers
Can you share the entire code so that I can try it out locally? – Canaigre
Instead of assigning unique values to the words, it's better to assign pretrained embedding vectors from GloVe or word2vec and not train them (see the sketch after these comments). – Canaigre
I also tried that (GloVe). The graphs looked similar to what I posted originally, so surely I'm doing something wrong. Here's the code (using GloVe): link – Stowers
From the code, I see that you collapse each GloVe embedding vector to a single value using np.linalg.norm; a single value may not be good enough to represent a word. I recommend you change this. – Canaigre
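
As a reference for the pretrained-embedding suggestion, here is a minimal sketch of loading full GloVe vectors into the embedding matrix (the file path, the vocab dict, and EMBEDDING_DIM are assumptions; words missing from GloVe keep small random vectors):

import numpy as np
import tensorflow as tf

EMBEDDING_DIM = 100  # must match the GloVe file used (assumption)

def load_glove(path, vocab):
    # Random init for words not found in the GloVe file
    matrix = np.random.uniform(-0.05, 0.05, (len(vocab), EMBEDDING_DIM)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in vocab:
                matrix[vocab[word]] = np.asarray(parts[1:], dtype=np.float32)
    return matrix

glove_matrix = load_glove("glove.6B.100d.txt", vocab)  # hypothetical path and vocab dict
# trainable=False keeps the pretrained vectors fixed, as suggested above
embedding = tf.Variable(glove_matrix, trainable=False)
inputs = tf.nn.embedding_lookup(embedding, x)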

tf.squeeze is a method that removes dimensions of size 1 from a tensor. If the end goal is to have the input shape [None, 64], then add a line like inputs = tf.squeeze(inputs), and that should fix your problem.
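
For example (axis=2 is an assumption based on the question's [None, MAX_TOKENS, 1] placeholder; naming the axis avoids accidentally squeezing a batch dimension of size 1):

# inputs after embedding_lookup: [None, MAX_TOKENS, 1, HIDDEN_SIZE]
inputs = tf.squeeze(inputs, axis=2)  # -> [None, MAX_TOKENS, HIDDEN_SIZE]
inputs = tf.unstack(inputs, axis=1)  # list of MAX_TOKENS tensors, each [None, HIDDEN_SIZE]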

Feudalism answered 8/9/2018 at 20:58
It didn't work. Where do I put that? If I put it just before the call to static_rnn, I get another error: Inputs must be a sequence. – Stowers
Try wrapping inputs in brackets to make it tf.contrib.rnn.static_rnn(lstm_cell, [inputs], dtype=tf.float32, sequence_length=seqlen). – Feudalism
I wrote that I tried that. Please read the full post. – Stowers
Did you do it with the tf.squeeze? Could you also print the dtype and shape of the inputs right before the rnn call? – Feudalism
Yes, as I said in my first comment, I tried your suggestion with squeeze and it didn't work. The types are tensorflow.python.framework.ops.Tensor for x, seqlen, and keep_prob, and dict for weights and biases. – Stowers
