I've been trying to understand the sample code with https://www.tensorflow.org/tutorials/recurrent which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py
(Using tensorflow 1.3.0.)
I've summarized (what I think are) the key parts, for my question, below:
size = 200
vocab_size = 10000
layers = 2
# input_.input_data is a 2D tensor [batch_size, num_steps] of
# word ids, from 1 to 10000
cell = tf.contrib.rnn.MultiRNNCell(
[tf.contrib.rnn.BasicLSTMCell(size) for _ in range(2)]
)
embedding = tf.get_variable(
"embedding", [vocab_size, size], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_.input_data)
inputs = tf.unstack(inputs, num=num_steps, axis=1)
outputs, state = tf.contrib.rnn.static_rnn(
cell, inputs, initial_state=self._initial_state)
output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
"softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b
# Then calculate loss, do gradient descent, etc.
My biggest question is how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence? Concretely, I imagine the flow is like this, but I cannot get my head around what the code for the commented lines would be:
prefix = ["What", "is", "your"]
state = #Zeroes
# Call static_rnn(cell) once for each word in prefix to initialize state
# Use final output to set a string, next_word
print(next_word)
My sub-questions are:
- Why use a random (uninitialized, untrained) word-embedding?
- Why use softmax?
- Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)
- How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?
(I'm asking them all as one question, as I suspect they are all connected, and connected to some gap in my understanding.)
What I was expecting to see here was loading an existing word2vec set of word embeddings (e.g. using gensim's KeyedVectors.load_word2vec_format()
), convert each word in the input corpus to that representation when loading in each sentence, and then afterwards the LSTM would spit out a vector of the same dimension, and we would try and find the most similar word (e.g. using gensim's similar_by_vector(y, topn=1)
).
Is using softmax saving us from the relatively slow similar_by_vector(y, topn=1)
call?
BTW, for the pre-existing word2vec part of my question Using pre-trained word2vec with LSTM for word generation is similar. However the answers there, currently, are not what I'm looking for. What I'm hoping for is a plain English explanation that switches the light on for me, and plugs whatever the gap in my understanding is. Use pre-trained word2vec in lstm language model? is another similar question.
UPDATE: Predicting next word using the language model tensorflow example and Predicting the next word using the LSTM ptb model tensorflow example are similar questions. However, neither shows the code to actually take the first few words of a sentence, and print out its prediction of the next word. I tried pasting in code from the 2nd question, and from https://mcmap.net/q/586236/-predicting-next-word-using-the-language-model-tensorflow-example (which comes with a github branch), but cannot get either to run without errors. I think they may be for an earlier version of TensorFlow?
ANOTHER UPDATE: Yet another question asking basically the same thing: Predicting Next Word of LSTM Model from Tensorflow Example It links to Predicting next word using the language model tensorflow example (and, again, the answers there are not quite what I am looking for).
In case it still isn't clear, what I am trying to write a high-level function called getNextWord(model, sentencePrefix)
, where model
is a previously built LSTM that I've loaded from disk, and sentencePrefix
is a string, such as "Open the", and it might return "pod". I then might call it with "Open the pod" and it will return "bay", and so on.
An example (with a character RNN, and using mxnet) is the sample()
function shown near the end of https://github.com/zackchase/mxnet-the-straight-dope/blob/master/chapter05_recurrent-neural-networks/simple-rnn.ipynb
You can call sample()
during training, but you can also call it after training, and with any sentence you want.
getNextWord(model, sentencePrefix)
from each suggested answer, but they each either ended in exceptions being thrown, or had a gap in the explanation that was really the point of the question. If/when I get this working, I will self-answer. (BTW, thanks for being first to answer - it really helped me clarify my question, which led to finding more related questions.) – Spermatozoid