I have been working toward the exact same goal, and just got it to work. You have many of the right modifications here, but I think you've missed a few steps.
First, for generating text you need to create a different version of the model which represents only a single timestep. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config which sets num_steps and batch_size both equal to 1.
class SmallGenConfig(object):
    """Small config. for generation"""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1  # this is the main difference
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000
I also added the output probabilities to the model with these lines:
self._output_probs = tf.nn.softmax(logits)
and
@property
def output_probs(self):
    return self._output_probs
Then, there are a few differences in my generate_text() function. The first one is that I load saved model parameters from disk using the tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config from above.
def generate_text(train_path, model_path, num_sentences):
    gen_config = SmallGenConfig()
    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        with tf.variable_scope("model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config)
        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)
The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).
        words = reader.get_vocab(train_path)
I set up the initial state the same way you do, but then I set up the initial token in a different manner. I want to use the "end of sentence" token so that I'll start my sentence with the right types of words. I looked through the word index and found that <eos> happens to have index 2 (the ordering is deterministic), so I just hard-coded that in. Finally, I wrap it in a 1x1 NumPy matrix so that it is the right type for the model inputs.
        state = m.initial_state.eval()
        x = 2  # the id for '<eos>' from the training set
        input = np.matrix([[x]])  # a 2D numpy matrix
Finally, here's the part where we generate sentences. Note that we tell session.run() to compute the output_probs and the final_state, and we give it the input and the state. In the first iteration the input is <eos> and the state is the initial_state, but on subsequent iterations we give as input our last sampled output, and we pass the state along from the last iteration. Note also that we use the words list to look up the word string from the output index.
        text = ""
        count = 0
        while count < num_sentences:
            output_probs, state = session.run([m.output_probs, m.final_state],
                                              {m.input_data: input,
                                               m.initial_state: state})
            x = sample(output_probs[0], 0.9)
            if words[x] == "<eos>":
                text += ".\n\n"
                count += 1
            else:
                text += " " + words[x]
            # now feed this new word as input into the next iteration
            input = np.matrix([[x]])
Then all we have to do is print out the text we accumulated.
        print(text)
    return
That's it for the generate_text() function.
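To see the feedback loop in isolation, here is a minimal sketch with no TensorFlow at all. The toy_step function and its hand-written transition table are hypothetical stand-ins for the session.run() call on the single-step model, and pick is a simplified stand-in for the sampling function; only the loop structure mirrors the code above.

```python
import random

# Hypothetical 4-word vocabulary; index 0 plays the role of '<eos>'.
WORDS = ["<eos>", "the", "cat", "sat"]

# Hypothetical stand-in for the trained model: next-word probabilities
# that depend only on the previous word id.
TRANSITIONS = {
    0: [0.0, 1.0, 0.0, 0.0],  # after <eos>: always "the"
    1: [0.0, 0.0, 1.0, 0.0],  # after "the": always "cat"
    2: [0.0, 0.0, 0.0, 1.0],  # after "cat": always "sat"
    3: [1.0, 0.0, 0.0, 0.0],  # after "sat": always <eos>
}


def toy_step(word_id, state):
    """Stand-in for one single-timestep model call: returns (probs, new_state)."""
    return TRANSITIONS[word_id], state + 1  # the state just counts steps here


def pick(probs):
    """Draw an index from a probability list by walking the cumulative sum."""
    r = random.random()
    total = 0.0
    for i, p in enumerate(probs):
        total += p
        if total > r:
            return i
    return len(probs) - 1


def generate(num_sentences):
    text, count = "", 0
    x, state = 0, 0  # start from <eos> and an initial state
    while count < num_sentences:
        probs, state = toy_step(x, state)  # feed last output and state back in
        x = pick(probs)
        if WORDS[x] == "<eos>":
            text += ".\n\n"
            count += 1
        else:
            text += " " + WORDS[x]
    return text, state
```

The important part is that both x and state are overwritten on every pass, so each call sees the previously sampled word and the state the previous call produced.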
Finally, let me show you the function definition for get_vocab(), which I put in reader.py.
def get_vocab(filename):
    data = _read_words(filename)
    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*count_pairs))
    return words
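As a quick check of how that ordering behaves, the sketch below applies the same sort to a hand-made token list instead of a file. The vocab_from_tokens name and the toy tokens are only for illustration. Words are ranked by descending count, with ties broken alphabetically, which is why the id of a given word stays the same from run to run on the same training data.

```python
import collections


def vocab_from_tokens(tokens):
    """Return words ordered by descending frequency, ties broken alphabetically."""
    counter = collections.Counter(tokens)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = zip(*count_pairs)
    return words


# Toy data: "the" and "<eos>" both appear three times, the rest once.
tokens = ["the", "cat", "<eos>", "the", "dog", "<eos>", "the", "end", "<eos>"]
words = vocab_from_tokens(tokens)

# Reverse mapping, word string -> id, built from the same ordering.
word_to_id = {w: i for i, w in enumerate(words)}
```

With the real vocabulary, a lookup such as words.index("<eos>") would probably be a safer way to get the start token id than hard-coding it.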
The last thing you need to do is save the model after training it, which looks like this:
save_path = saver.save(session, "/tmp/model.ckpt")
And that's the model that you'll load from disk later when generating text.
There was one more problem: I found that sometimes the probability distribution produced by the TensorFlow softmax function didn't sum exactly to 1.0. When the sum was larger than 1.0, np.random.multinomial() threw an error. So I had to write my own sampling function, which looks like this:
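import random
import numpy as np

def sample(a, temperature=1.0):
    # rescale by temperature, then renormalize so the values sum to 1
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    r = random.random()  # range: [0,1)
    total = 0.0
    # walk the cumulative sum until it passes r
    for i in range(len(a)):
        total += a[i]
        if total > r:
            return i
    return len(a) - 1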
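To make the effect of temperature concrete, here is a small standard-library sketch of just the rescaling step. The rescale helper is a name introduced here for illustration, and the input is assumed to contain no zero probabilities. Values below 1.0 sharpen the distribution toward the most likely word, while 1.0 leaves it unchanged apart from the renormalization.

```python
import math


def rescale(probs, temperature=1.0):
    """Apply temperature to a probability list and renormalize it."""
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.5, 0.3, 0.2]
sharper = rescale(probs, 0.5)  # emphasizes the most likely entry
same = rescale(probs, 1.0)     # unchanged apart from rounding
```

Because the result is always divided by its own sum, the rescaled values add up to 1.0 even when the input was slightly off, which is the property the sampling loop relies on.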
When you put all this together, the small model was able to generate some cool sentences.