TensorFlow embedding_lookup

I am trying to learn word representations for the IMDB dataset "from scratch" through the TensorFlow tf.nn.embedding_lookup() function. If I understand it correctly, I have to set up an embedding layer before the other hidden layers, and then when I perform gradient descent, this layer will "learn" a word representation in its weights. However, when I try to do this, I get a shape error between my embedding layer and the first fully-connected layer of my network.

def multilayer_perceptron(_X, _weights, _biases):
    with tf.device('/cpu:0'), tf.name_scope("embedding"):
        W = tf.Variable(
            tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
        embedding_layer = tf.nn.embedding_lookup(W, _X)
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(embedding_layer, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    return tf.matmul(layer_2, _weights['out']) + _biases['out']

x = tf.placeholder(tf.int32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

pred = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(cost)

init = tf.initialize_all_variables()

The error I get is:

ValueError: Shapes TensorShape([Dimension(None), Dimension(300), Dimension(128)])
and TensorShape([Dimension(None), Dimension(None)]) must have the same rank
Rebate answered 9/2, 2016 at 14:57

The shape error arises because you are using a two-dimensional tensor, x, to index into a two-dimensional embedding tensor W. Think of tf.nn.embedding_lookup() (and its close cousin tf.gather()) as taking each integer value i in x and replacing it with the row W[i, :]. From the error message, one can infer that n_input = 300 and embedding_size = 128. In general, the result of tf.nn.embedding_lookup() has a number of dimensions equal to rank(x) + rank(W) - 1, which in this case is 3. The error arises when you try to multiply this rank-3 result by _weights['h1'], which is a (two-dimensional) matrix.
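As a quick illustration of that rank rule (a minimal sketch, not part of the original answer; the sizes are made up, and the TF 1.x-style API matches the question's code):

import tensorflow as tf

vocab_size, embedding_size = 10000, 128   # illustrative sizes only

W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))

ids_1d = tf.placeholder(tf.int32, [None])        # rank-1 indices
ids_2d = tf.placeholder(tf.int32, [None, 300])   # rank-2 indices, like `x` in the question

print(tf.nn.embedding_lookup(W, ids_1d).get_shape())  # (?, 128)      -> rank 2
print(tf.nn.embedding_lookup(W, ids_2d).get_shape())  # (?, 300, 128) -> rank 3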

How to fix this code depends on what you're trying to do, and on why you are passing a matrix of inputs to the embedding. One common approach is to aggregate the embedding vectors for each input example into a single row per example, using an operation like tf.reduce_sum(). For example, you might do the following:

W = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
embedding_layer = tf.nn.embedding_lookup(W, _X)

# Reduce along dimension 1 (`n_input`) to get a single vector (row)
# per input example.
embedding_aggregated = tf.reduce_sum(embedding_layer, [1])

layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(
    embedding_aggregated, _weights['h1']), _biases['b1'])) 
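
After the reduction, embedding_aggregated has shape [batch_size, embedding_size], so the first fully-connected weight matrix must have embedding_size rows. A sketch of matching weight and bias definitions follows; n_hidden_1 and n_hidden_2 are hypothetical layer sizes, not values from the original post:

# Hypothetical hidden-layer sizes; only the first dimension of 'h1' is
# dictated by the aggregated embedding above.
n_hidden_1, n_hidden_2 = 256, 256

weights = {
    'h1':  tf.Variable(tf.random_normal([embedding_size, n_hidden_1])),
    'h2':  tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes])),
}
biases = {
    'b1':  tf.Variable(tf.zeros([n_hidden_1])),
    'b2':  tf.Variable(tf.zeros([n_hidden_2])),
    'out': tf.Variable(tf.zeros([n_classes])),
}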
Hundley answered 9/2, 2016 at 15:54. Comments:
Thanks! I think tf.nn.reduce_sum should be tf.reduce_sum? Also, when you reduce a dimension of the embedding layer, how did you choose which one to reduce between n_input = 300 and embedding_size = 128? (Rebate)
You're right about the typo, corrected it above, thanks! I chose to reduce along the n_input dimension because it seemed more likely to match your problem, and I assumed that (e.g.) the order of inputs was not important. It's fairly typical to do this for bag-of-words-type problems. You could reduce along embedding_size instead, but I think that would lose a lot of information from the embedding, so it probably wouldn't work as well. (Hundley)

Another possible solution: instead of adding the embedding vectors, concatenate them into a single vector and increase the number of neurons in the hidden layer.
I used:
embedding_aggregated = tf.reshape(embedding_layer, [-1, embedding_size * sequence_length])
I also changed the number of neurons in the hidden layer to embedding_size * sequence_length. Observation: accuracy also improved when using concatenation rather than addition.
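
For reference, a sketch of this concatenation variant (variable names are illustrative and not from the original answer; the essential constraint is that the first weight matrix's input dimension equals embedding_size * sequence_length):

embedding_layer = tf.nn.embedding_lookup(W, _X)   # [batch, sequence_length, embedding_size]

# Flatten each example's sequence of embedding vectors into one long row.
embedding_aggregated = tf.reshape(
    embedding_layer, [-1, embedding_size * sequence_length])

# The first hidden layer must accept the flattened width as its input size;
# its output size (the neuron count) can be increased as suggested above.
weights['h1'] = tf.Variable(
    tf.random_normal([embedding_size * sequence_length, n_hidden_1]))
layer_1 = tf.nn.sigmoid(
    tf.add(tf.matmul(embedding_aggregated, weights['h1']), biases['b1']))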

Colman answered 8/9, 2016 at 6:25
