Use LSTM tutorial code to predict next word in a sentence?

Asked 8/9, 2017 at 21:55 Answered 19/1, 2018 at 18:45

python tensorflow lstm word2vec word-embedding

I've been trying to understand the sample code with https://www.tensorflow.org/tutorials/recurrent which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py

(Using tensorflow 1.3.0.)

I've summarized (what I think are) the key parts, for my question, below:

size = 200
vocab_size = 10000
layers = 2
# input_.input_data is a 2D tensor [batch_size, num_steps] of
#    word ids, from 1 to 10000

cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(size) for _ in range(2)]
    )

embedding = tf.get_variable(
    "embedding", [vocab_size, size], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

inputs = tf.unstack(inputs, num=num_steps, axis=1)
outputs, state = tf.contrib.rnn.static_rnn(
    cell, inputs, initial_state=self._initial_state)

output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
    "softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b

# Then calculate loss, do gradient descent, etc.

My biggest question is how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence? Concretely, I imagine the flow is like this, but I cannot get my head around what the code for the commented lines would be:

prefix = ["What", "is", "your"]
state = #Zeroes
# Call static_rnn(cell) once for each word in prefix to initialize state
# Use final output to set a string, next_word
print(next_word)

My sub-questions are:

Why use a random (uninitialized, untrained) word-embedding?
Why use softmax?
Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)
How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

(I'm asking them all as one question, as I suspect they are all connected, and connected to some gap in my understanding.)

What I was expecting to see here was loading an existing word2vec set of word embeddings (e.g. using gensim's KeyedVectors.load_word2vec_format()), convert each word in the input corpus to that representation when loading in each sentence, and then afterwards the LSTM would spit out a vector of the same dimension, and we would try and find the most similar word (e.g. using gensim's similar_by_vector(y, topn=1)).

Is using softmax saving us from the relatively slow similar_by_vector(y, topn=1) call?

BTW, for the pre-existing word2vec part of my question Using pre-trained word2vec with LSTM for word generation is similar. However the answers there, currently, are not what I'm looking for. What I'm hoping for is a plain English explanation that switches the light on for me, and plugs whatever the gap in my understanding is. Use pre-trained word2vec in lstm language model? is another similar question.

UPDATE: Predicting next word using the language model tensorflow example and Predicting the next word using the LSTM ptb model tensorflow example are similar questions. However, neither shows the code to actually take the first few words of a sentence, and print out its prediction of the next word. I tried pasting in code from the 2nd question, and from https://mcmap.net/q/586236/-predicting-next-word-using-the-language-model-tensorflow-example (which comes with a github branch), but cannot get either to run without errors. I think they may be for an earlier version of TensorFlow?

ANOTHER UPDATE: Yet another question asking basically the same thing: Predicting Next Word of LSTM Model from Tensorflow Example It links to Predicting next word using the language model tensorflow example (and, again, the answers there are not quite what I am looking for).

In case it still isn't clear, what I am trying to write is a high-level function called getNextWord(model, sentencePrefix), where model is a previously built LSTM that I've loaded from disk, and sentencePrefix is a string, such as "Open the", and it might return "pod". I then might call it with "Open the pod" and it will return "bay", and so on.

An example (with a character RNN, and using mxnet) is the sample() function shown near the end of https://github.com/zackchase/mxnet-the-straight-dope/blob/master/chapter05_recurrent-neural-networks/simple-rnn.ipynb You can call sample() during training, but you can also call it after training, and with any sentence you want.

Spermatozoid answered 8/9, 2017 at 21:55 Comment(7)

Unfortunately, as of the time I needed to give the bounty, none of the answers worked for me; that is why I am leaving it un-ticked for the moment. I gave the bounty to the answer that appeared to be answering my key question most closely. – Spermatozoid 18/9, 2017 at 18:13

The answers didn't work for you because there is no generic answer for all language model implementations; each implementation is a little different. I think that this question should choose the level to ask at, either intuitive understanding or specific code implementation. Not that I'm against the question though, I did upvote it. Actually, if you understand the model and are fluent in Python, implementing it would not be difficult. It takes time though, so if you posted your solution for this specific language model here after implementing it, it would be very useful for others. – Harrumph 19/9, 2017 at 5:21

@Harrumph It was a bit more objective than that. By "didn't work" I meant I tried to implement the getNextWord(model, sentencePrefix) from each suggested answer, but they each either ended in exceptions being thrown, or had a gap in the explanation that was really the point of the question. If/when I get this working, I will self-answer. (BTW, thanks for being first to answer - it really helped me clarify my question, which led to finding more related questions.) – Spermatozoid 19/9, 2017 at 7:42

Did you manage to get it working? I am tackling the same problem! – Grosbeak 28/9, 2017 at 15:34

@Grosbeak No, not yet. So if you master it, please do post some code! – Spermatozoid 28/9, 2017 at 15:37

@DarrenCook I was very surprised that the tutorial did not include this; it is just the next logical step. I opened another question, let's hope I don't get patronized as a "duplicate!". What I am after is 1. save the net to disk 2. invoke it whenever I want to get the next 10 most probable words. Sure, I'll post some code, and if my question doesn't get answered I'll message the tensorflow guys for help. – Grosbeak 28/9, 2017 at 15:41

Did you ever figure this out? I am having the same problem! – Initiative 12/2, 2019 at 2:29

My biggest question is how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence?

I.e. I'm trying to write a function with the signature: getNextWord(model, sentencePrefix)

Before I explain my answer, first a remark about your suggestion to # Call static_rnn(cell) once for each word in prefix to initialize state: Keep in mind that static_rnn does not return a value like a numpy array, but a tensor. A tensor can only be evaluated to a value when it is run (1) in a session (a session keeps the state of your computational graph, including the values of your model parameters) and (2) with the input that is necessary to calculate the tensor value. Input can be supplied using input readers (the approach in the tutorial), or using placeholders (what I will use below).

Now follows the actual answer: The model in the tutorial was designed to read input data from a file. The answer of @user3080953 already showed how to work with your own text file, but as I understand it you need more control over how the data is fed to the model. To do this you will need to define your own placeholders and feed the data to these placeholders when calling session.run().

In the code below I subclassed PTBModel and made it responsible for explicitly feeding data to the model. I introduced a special PTBInteractiveInput that has an interface similar to PTBInput so you can reuse the functionality in PTBModel. To train your model you still need PTBModel.

class PTBInteractiveInput(object):
  def __init__(self, config):
    self.batch_size = 1
    self.num_steps = config.num_steps
    self.input_data = tf.placeholder(dtype=tf.int32, shape=[self.batch_size, self.num_steps])
    self.sequence_len = tf.placeholder(dtype=tf.int32, shape=[])
    self.targets = tf.placeholder(dtype=tf.int32, shape=[self.batch_size, self.num_steps])

class InteractivePTBModel(PTBModel):

  def __init__(self, config):
    input = PTBInteractiveInput(config)
    PTBModel.__init__(self, is_training=False, config=config, input_=input)
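    # logits has shape [batch_size, num_steps, vocab_size]; keep the last real step
    output = self.logits[:, self._input.sequence_len - 1, :]
    # output has shape [batch_size, vocab_size], so the vocabulary axis is 1
    self.top_word_id = tf.argmax(output, axis=1)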

  def get_next(self, session, prefix):
    prefix_array, sequence_len = self._preprocess(prefix)
    feeds = {
      self._input.sequence_len: sequence_len,
      self._input.input_data: prefix_array,
    }
    fetches = [self.top_word_id]
    result = session.run(fetches, feeds)
    return self._postprocess(result)

  def _preprocess(self, prefix):
    num_steps = self._input.num_steps
    seq_len = len(prefix)
    if seq_len > num_steps:
      raise ValueError("Prefix too large for model.")
    prefix_ids = self._prefix_to_ids(prefix)
    num_items_to_pad = num_steps - seq_len
    prefix_ids.extend([0] * num_items_to_pad)
    prefix_array = np.array([prefix_ids], dtype=np.int32)  # must match the int32 placeholder
    return prefix_array, seq_len

  def _prefix_to_ids(self, prefix):
    # should convert your prefix to a list of ids
    pass

  def _postprocess(self, result):
    # convert ids back to strings
    pass

In the __init__ function of PTBModel you need to add this line:

self.logits = logits
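The two stubbed helpers, _prefix_to_ids and _postprocess, could be filled in along these lines. This is only a sketch: the tiny vocabulary, the function names, and the choice to lowercase the words and fall back to an "<unk>" token are assumptions for illustration, not something the tutorial prescribes. In practice word_to_id would come from the tutorial's reader module.

```python
# Hypothetical vocabulary; in the tutorial this would be built by reader.py.
word_to_id = {"<unk>": 0, "open": 1, "the": 2, "pod": 3, "bay": 4}
id_to_word = {i: w for w, i in word_to_id.items()}


def prefix_to_ids(prefix, word_to_id, unk_token="<unk>"):
    """Map a list of words to ids, falling back to the unknown-word id."""
    unk_id = word_to_id[unk_token]
    # Lowercasing is an assumption (the PTB corpus is lowercase).
    return [word_to_id.get(word.lower(), unk_id) for word in prefix]


def ids_to_words(ids, id_to_word):
    """Map an iterable of ids back to their words."""
    return [id_to_word[int(i)] for i in ids]


ids = prefix_to_ids(["Open", "the", "door"], word_to_id)
words = ids_to_words(ids, id_to_word)
print(ids, words)
```

Words that are missing from the vocabulary ("door" here) end up as the unknown-word id, so the model still gets a valid input.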

Why use a random (uninitialized, untrained) word-embedding?

First note that, although the embeddings are random in the beginning, they will be trained with the rest of the network. The embeddings you obtain after training will have properties similar to those of the embeddings you obtain with word2vec models, e.g., the ability to answer analogy questions with vector operations (king - man + woman = queen, etc.). In tasks where you have a considerable amount of training data, like language modelling (which does not need annotated training data) or neural machine translation, it is more common to train embeddings from scratch.

Why use softmax?

Softmax is a function that normalizes a vector of similarity scores (the logits) to a probability distribution. You need a probability distribution to train your model with cross-entropy loss and to be able to sample from the model. Note that if you are only interested in the most likely words of a trained model, you don't need the softmax and you can use the logits directly.
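To make the last point concrete, here is a small NumPy sketch (the logits are made-up numbers for a four-word vocabulary, not output from the tutorial model). It shows that softmax produces a distribution that sums to 1 while leaving the ranking of the words unchanged, which is why argmax on the logits gives the same top word.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


logits = np.array([2.0, 0.5, -1.0, 3.0])  # made-up scores, one per word
probs = softmax(logits)

# The probabilities sum to 1, and the most likely word is the same either way.
same_top_word = int(np.argmax(logits)) == int(np.argmax(probs))
print(probs, probs.sum(), same_top_word)
```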

Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)

No, in principle it can be any value. Using a hidden state with a lower dimension than your embedding dimension does not make much sense, however.

How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

Here is a self-contained example of initializing an embedding with a given numpy array. If you want the embedding to remain fixed/constant during training, set trainable to False.

import tensorflow as tf
import numpy as np
vocab_size = 10000
size = 200
trainable=True
embedding_matrix = np.zeros([vocab_size, size]) # replace this with code to load your pretrained embedding
embedding = tf.get_variable("embedding",
                            initializer=tf.constant_initializer(embedding_matrix),
                            shape=[vocab_size, size],
                            dtype=tf.float32,
                            trainable=trainable)
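The np.zeros placeholder above still has to be replaced by a matrix whose rows line up with your word ids. A rough sketch of that step follows. It assumes the pre-trained vectors are available as a plain word -> vector mapping (the pretrained dict below is a hypothetical stand-in for whatever you load, e.g. with gensim) and that words without a pre-trained vector simply keep a small random row.

```python
import numpy as np

size = 4  # embedding dimension (200 or 300 in practice)
word_to_id = {"<unk>": 0, "the": 1, "pod": 2, "bay": 3}  # hypothetical vocabulary

# Stand-in for pre-trained vectors; "bay" and "<unk>" are missing on purpose.
pretrained = {
    "the": np.ones(size, dtype=np.float32),
    "pod": np.full(size, 2.0, dtype=np.float32),
}

rng = np.random.RandomState(0)
# Start from small random values so words without a vector still get a row.
embedding_matrix = rng.uniform(
    -0.1, 0.1, size=(len(word_to_id), size)).astype(np.float32)
for word, idx in word_to_id.items():
    if word in pretrained:
        embedding_matrix[idx] = pretrained[word]  # row index == word id

print(embedding_matrix)
```

The important part is that row i holds the vector for the word with id i, because tf.nn.embedding_lookup indexes the matrix by word id.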

Circumcision answered 14/9, 2017 at 10:18 Comment(5)

Thanks! I've pasted your code into the middle of ptb_word_lm.py. If I wanted to test it by, say, having it output its next word suggestion for a test prefix after each epoch of training, do I create one instance of InteractivePTBModel at the top of main (e.g. just after I have config), or create it fresh each time within the loop (e.g. at github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/… ) – Spermatozoid 15/9, 2017 at 10:41

I get "RuntimeError: Graph is finalized and cannot be modified." (on my first attempt to create the instance, inside the loop). Very long error message, but I think it is triggered by self.input_data = tf.placeholder(dtype=tf.float32, shape=[self.batch_size, self.num_steps]) – Spermatozoid 15/9, 2017 at 13:56

Every time you initialize InteractivePTBModel, new operations are added to the computational graph. The error you are getting is caused by the fact that you initialize the graph inside a managed_session, which does not allow the graph to be modified. You can create the model here, similarly to how the validation and test models are created. I hope this helps. – Circumcision 15/9, 2017 at 16:28

self.input_data and self.targets appear to want to be int32 not float32. Unfortunately fixing that only got me to the next error (talking about shapes must be equal rank and strided slices). I'm still only trying to create an object of InteractivePTBModel, not even calling get_next() on it yet! Was your code working for you? – Spermatozoid 15/9, 2017 at 20:31

You are right, the placeholders need to be int32 of course. I updated my answer. With the edits the equal rank error should also be fixed (which I believe was because self.sequence_len was defined as a vector instead of a scalar). I don't have time to test the code right now. – Circumcision 15/9, 2017 at 21:37

Main Question

Loading words

Load custom data instead of using the test set:

reader.py@ptb_raw_data

test_path = os.path.join(data_path, "ptb.test.txt")
test_data = _file_to_word_ids(test_path, word_to_id)  # change this line

test_data should contain word ids (print out word_to_id for a mapping). As an example, it should look like: [1, 52, 562, 246] ...

Displaying predictions

We need to return the output of the FC layer (logits) in the call to sess.run

ptb_word_lm.py@PTBModel.__init__

    logits = tf.reshape(logits, [self.batch_size, self.num_steps, vocab_size])
    self.top_word_id = tf.argmax(logits, axis=2)  # add this line

ptb_word_lm.py@run_epoch

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
      "top_word_id": model.top_word_id # add this line
  }

Later in the function, vals['top_word_id'] will hold an array of integers with the ID of the top word at each position. Since word_to_id maps words to IDs, look each ID up in the inverse mapping (ID to word) to get the predicted word. I did this a while ago with the small model, and the top-1 accuracy was pretty low (20-30% iirc), even though the perplexity matched what was predicted in the header.
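As a rough illustration of that lookup (the vocabulary and the array are made up here; the real array comes from sess.run), note that top_word_id has shape [batch_size, num_steps], so you get one prediction per input position rather than a single number:

```python
import numpy as np

word_to_id = {"<unk>": 0, "the": 1, "pod": 2, "bay": 3, "doors": 4}  # hypothetical
id_to_word = {i: w for w, i in word_to_id.items()}  # inverse mapping

# Made-up stand-in for vals["top_word_id"], shape [batch_size, num_steps].
top_word_id = np.array([[1, 2, 3],
                        [2, 3, 4]])

# Decode every prediction ...
decoded = [[id_to_word[int(i)] for i in row] for row in top_word_id]

# ... or only the prediction after the last step of the first sequence.
next_word = id_to_word[int(top_word_id[0, -1])]
print(decoded, next_word)
```

If the prefix is the only thing in the batch, the entry at the last position is the suggestion for the word that follows it.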

Subquestions

Why use a random (uninitialized, untrained) word-embedding?

You'd have to ask the authors, but in my opinion, training the embeddings makes this more of a standalone tutorial: instead of treating embedding as a black box, it shows how it works.

Why use softmax?

The final prediction is not determined by the cosine similarity to the output of the hidden layer. There is an FC layer after the LSTM that converts the hidden state into a vocabulary-sized vector of scores (logits), one per candidate next word; softmax then turns those scores into probabilities.

Here's a sketch of the operations and dimensions in the neural net:

word -> one hot code (1 x vocab_size) -> embedding (1 x hidden_size) -> LSTM -> FC layer (1 x vocab_size) -> softmax (1 x vocab_size)

Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)

Technically, no. If you look at the LSTM equations, you'll notice that x (the input) can be any size, as long as the weight matrix is adjusted appropriately.
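A plain NumPy sketch of a single LSTM step may make this easier to see. It is written from the standard LSTM equations with random weights and is not the TensorFlow implementation; the sizes 300 and 200 are arbitrary and deliberately different. The weight matrix has input_size + hidden_size rows, so only its shape changes when the input dimension changes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, b):
    """One LSTM step; W has shape [input_size + hidden_size, 4 * hidden_size]."""
    gates = np.dot(np.concatenate([x, h], axis=1), W) + b
    # input gate, candidate values, forget gate, output gate
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
    new_h = sigmoid(o) * np.tanh(new_c)
    return new_h, new_c


input_size, hidden_size, batch_size = 300, 200, 1  # deliberately different
rng = np.random.RandomState(0)
W = rng.normal(scale=0.1, size=(input_size + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

x = rng.normal(size=(batch_size, input_size))  # e.g. a 300-d word vector
h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```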

How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

I don't know, sorry.

Handful answered 11/9, 2017 at 1:12 Comment(3)

Thanks. I think this might be along the right lines, but it still doesn't answer my key question: once I have a model built, I want to load it from disk, give it a string (the first few words in a sentence), and ask it to suggest the next word in the sentence. I want to do that multiple times, with different prefix strings each time. I.e. I'm trying to write a function with the signature: getNextWord(model, sentencePrefix) – Spermatozoid 13/9, 2017 at 10:37

I followed your instructions, but when I do print(vals['top_word_id']) I see [[1 2] [1 1] [0 2] ... [1 1]] I.e. not a single number I can pass to word_to_id[]. (I also didn't see how I can specify the sentence prefix, with this approach.) – Spermatozoid 13/9, 2017 at 10:38

When you're printing vals['top_word_id'], are you using the test_config or the eval_config? (Check the variable num_steps.) You should be using the latter because you're evaluating the model. You can specify the sentence prefix by looking up the words of sentencePrefix in word_to_id. Admittedly this approach is not the cleanest way to write the code – Handful 15/9, 2017 at 15:25

There are many questions; I will try to clarify some of them.

how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence?

The key point here is, next word generation is actually word classification in the vocabulary. So you need a classifier, that is why there is a softmax in the output.

The principle is that, at each time step, the model outputs the next word based on the last word embedding and the internal memory of previous words. tf.contrib.rnn.static_rnn automatically combines the input into the memory, but we need to provide the last word embedding and classify the next word.

We can use a pre-trained word2vec model, just init the embedding matrix with the pre-trained one. I think the tutorial uses random matrix for the sake of simplicity. Memory size is not related to embedding size, you can use larger memory size to retain more information.

These tutorials are high-level. If you want to deeply understand the details, I would suggest looking at the source code in plain python/numpy.

Harrumph answered 9/9, 2017 at 9:39 Comment(3)

Thanks. I've just added some pseudo code to my question: what I'm hoping for is an answer that shows me the real code, so I can actually print out the answer. – Spermatozoid 9/9, 2017 at 11:10

Re: "using softmax as it is word classification": with word embeddings, the cosine similarity is used to find the nearest word to our 300-dimension vector input. What I don't get is why we are using softmax, instead of doing that. Is it for speed (and if so, is there a trade-off), to give a simpler tutorial (e.g. no gensim dependency), better quality results, it is the only way to train the LSTM, or something else? – Spermatozoid 9/9, 2017 at 11:15

@DarrenCook word classification is the straightforward way to get the next word. Sure, there are other ways, like your suggestion about embedding similarity, but there is no guarantee they would work better, as I don't see any more information being used. Not to mention it would be difficult to compute the gradient. This answer only gives an intuition; you may search for code in language model repos, I think. – Harrumph 9/9, 2017 at 18:22

You can find all the code at the end of the answer.

Most of your questions (why a Softmax, how to use pretrained embedding layer, etc...) were answered I reckon. However as you were still waiting for a concise code to produce generated text from a seed, here I try to report how I ended up doing it myself.

I struggled, starting from the official Tensorflow tutorial, to get to the point where I could easily generate words from a produced model. Fortunately, after taking bits from practically all the answers you mentioned in your question, I got a better view of the problem (and solutions). This might contain errors, but at least it runs and generates some text...

how do I use the produced model to actually generate a next word suggestion, given the first few words of a sentence?

I will wrap the next word suggestion in a loop, to generate a whole sentence, but you will easily reduce that to one word only.

Let's say you followed the current tutorial given by tensorflow (v1.4 at time of writing) here, which will save a model after training it.

Then what is left for us to do is to load it from disk, and to write a function which takes this model and some seed input and returns generated text.

Generate text from saved model

I assume we write all this code in a new python script. Whole script at the bottom as a recap, here I explain the main steps.

First necessary steps

FLAGS = tf.flags.FLAGS
FLAGS.model = "medium" # or whatever size you used

Now, quite importantly, we create dictionaries to map ids to words and vice-versa (so we don't have to read a list of integers...).

word_to_id = reader._build_vocab('../data/ptb.train.txt') # here we load the word -> id dictionary
id_to_word = dict(zip(word_to_id.values(), word_to_id.keys())) # and transform it into an id -> word dictionary
_, _, test_data, _ = reader.ptb_raw_data('../data')

Then we load the configuration class and set num_steps and batch_size to 1, since we want to sample one word at a time and have the LSTM process one word at a time. We also create the input instance on the fly:

eval_config = get_config()
eval_config.num_steps = 1
eval_config.batch_size = 1
model_input = PTBInput(eval_config, test_data)

Building graph

To load the saved model (as saved by the Supervisor.saver module in the tutorial), we need first to rebuild the graph (easy with the PTBModel class) which must use the same configuration as when trained:

sess = tf.Session()
initializer = tf.random_uniform_initializer(-eval_config.init_scale, eval_config.init_scale)
# not sure but seems to need the same name for variable scope as when saved ....!!
with tf.variable_scope("Model", reuse=None, initializer=initializer):
    tf.global_variables_initializer()
    mtest = PTBModel(is_training=False, config=eval_config, input_=model_input)

Restoring saved weights:

sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('../Whatever_folder_you_saved_in')) # the path must point to the hierarchy where your 'checkpoint' file is

... Sampling words from a given seed:

First we need the model to expose the logits outputs, or more precisely the probability distribution over the whole vocabulary. So in the ptb_lstm.py file add the line:

# the line goes somewhere below the reshaping "logits = tf.reshape(logits, [self.batch_size, ..."
self.probas = tf.nn.softmax(logits, name="probas")

Then we can design some sampling function (you're free to use whatever you like here, best approach is sampling with a temperature that tends to flatten or sharpen the distributions), here is a basic random sampling method:

def sample_from_pmf(probas):
    t = np.cumsum(probas)
    s = np.sum(probas)
    return int(np.searchsorted(t, np.random.rand() * s))  # scalar draw, so int() gets a scalar
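For the temperature-based sampling mentioned above, something along these lines could be swapped in for sample_from_pmf. It is a sketch, not part of the tutorial: the function name and the rng argument are my own, and the 1e-12 constant is only there to avoid taking log(0). A temperature below 1 sharpens the distribution (close to always picking the top word), a temperature above 1 flattens it.

```python
import numpy as np


def sample_with_temperature(probas, temperature=1.0, rng=None):
    """Sample an index from `probas` after rescaling by `temperature`."""
    rng = rng or np.random
    probas = np.asarray(probas, dtype=np.float64).ravel()
    # Work in log space; the small constant avoids log(0).
    logits = np.log(probas + 1e-12) / temperature
    logits -= np.max(logits)  # for numerical stability
    scaled = np.exp(logits)
    scaled /= np.sum(scaled)
    return int(rng.choice(len(scaled), p=scaled))


probas = np.array([0.1, 0.2, 0.7])  # made-up distribution over three words
rng = np.random.RandomState(0)
cool = [sample_with_temperature(probas, 0.1, rng) for _ in range(200)]
warm = [sample_with_temperature(probas, 5.0, rng) for _ in range(200)]
print(cool.count(2), warm.count(2))
```

With temperature=1.0 this samples from the original distribution, so it behaves like the basic method.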

And finally a function that takes a seed, your model, the dictionary that maps word to ids, and vice versa, as inputs and outputs the generated string of texts:

def generate_text(session, model, word_to_index, index_to_word, 
                  seed='</s>', n_sentences=10):
    sentence_cnt = 0
    input_seeds_id = [word_to_index[w] for w in seed.split()]
    state = session.run(model.initial_state)

    # Prime the network with every seed word except the last one:
    for x in input_seeds_id[:-1]:
        feed_dict = {model.initial_state: state,
                     model.input.input_data: [[x]]}
        # fetch the state directly (not wrapped in a list) so it can be fed back in
        state = session.run(model.final_state, feed_dict)

    text = seed
    # Generate a new sample from previous, starting at last word in seed
    input_id = [[input_seeds_id[-1]]]
    while sentence_cnt < n_sentences:
        feed_dict = {model.input.input_data: input_id,
                     model.initial_state: state}
        probas, state = session.run([model.probas, model.final_state],
                                 feed_dict=feed_dict)
        sampled_word = sample_from_pmf(probas[0])
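        # the stock reader.py marks sentence ends with '<eos>'; use that token if '</s>' is not in your vocabulary
        if sampled_word == word_to_index['</s>']: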
            text += '.\n'
            sentence_cnt += 1
        else:
            text += ' ' + index_to_word[sampled_word]
        input_id = [[sampled_word]]  # feed the sampled word back in on the next step

    return text
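Stripped of the TensorFlow details, the function does two things: it primes the state with the seed, then repeatedly samples a word and feeds it back in. The sketch below shows only that control flow. step_fn is a hypothetical stand-in for the session.run call (it takes a word id and a state and returns a probability vector and a new state), and toy_step is a made-up model used purely so the example runs.

```python
import numpy as np


def generate_ids(step_fn, initial_state, seed_ids, n_words, rng):
    """Prime the state with `seed_ids`, then sample `n_words` more ids."""
    state = initial_state
    # Priming: feed every seed word except the last; only the state is kept.
    for word_id in seed_ids[:-1]:
        _, state = step_fn(word_id, state)

    generated = []
    current = seed_ids[-1]
    for _ in range(n_words):
        probs, state = step_fn(current, state)
        current = int(rng.choice(len(probs), p=probs))
        generated.append(current)  # the sampled id becomes the next input
    return generated


def toy_step(word_id, state):
    """Hypothetical model: always predicts (word_id + 1) % 5."""
    probs = np.zeros(5)
    probs[(word_id + 1) % 5] = 1.0
    return probs, state + 1  # the "state" here just counts the steps taken


out = generate_ids(toy_step, 0, seed_ids=[0, 1, 2], n_words=4,
                   rng=np.random.RandomState(0))
print(out)
```

To get a single next-word suggestion instead of a whole text, call it with n_words=1.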

TL;DR

Do not forget to add the line:

self.probas = tf.nn.softmax(logits, name='probas')

In the ptb_lstm.py file, in the __init__ definition of PTBModel class, anywhere after the line logits = tf.reshape(logits, [self.batch_size, self.num_steps, vocab_size]).

The whole script, just run it from the same directory where you have reader.py, ptb_lstm.py:

import reader
import numpy as np
import tensorflow as tf
from ptb_lstm import PTBModel, get_config, PTBInput

FLAGS = tf.flags.FLAGS
FLAGS.model = "medium"

def sample_from_pmf(probas):
    t = np.cumsum(probas)
    s = np.sum(probas)
    return int(np.searchsorted(t, np.random.rand() * s))  # scalar draw, so int() gets a scalar

def generate_text(session, model, word_to_index, index_to_word, 
                  seed='</s>', n_sentences=10):
    sentence_cnt = 0
    input_seeds_id = [word_to_index[w] for w in seed.split()]
    state = session.run(model.initial_state)

    # Initiate network with seeds up to the before last word:
    for x in input_seeds_id[:-1]:
        feed_dict = {model.initial_state: state,
                     model.input.input_data: [[x]]}
        state = session.run(model.final_state, feed_dict)

    text = seed
    # Generate a new sample from previous, starting at last word in seed
    input_id = [[input_seeds_id[-1]]]
    while sentence_cnt < n_sentences:
        feed_dict = {model.input.input_data: input_id,
                     model.initial_state: state}
        probas, state = session.run([model.probas, model.final_state],
                                    feed_dict=feed_dict)
        sampled_word = sample_from_pmf(probas[0])
        if sampled_word == word_to_index['</s>']:
            text += '.\n'
            sentence_cnt += 1
        else:
            text += ' ' + index_to_word[sampled_word]
        input_id = [[sampled_word]]  # feed the sampled word back in on the next step

    return text

if __name__ == '__main__':

    word_to_id = reader._build_vocab('../data/ptb.train.txt') # here we load the word -> id dictionary
    id_to_word = dict(zip(word_to_id.values(), word_to_id.keys())) # and transform it into an id -> word dictionary
    _, _, test_data, _ = reader.ptb_raw_data('../data')

    eval_config = get_config()
    eval_config.batch_size = 1
    eval_config.num_steps = 1
    model_input = PTBInput(eval_config, test_data, name=None)

    sess = tf.Session()
    initializer = tf.random_uniform_initializer(-eval_config.init_scale,
                                            eval_config.init_scale)
    with tf.variable_scope("Model", reuse=None, initializer=initializer):
        tf.global_variables_initializer()
        mtest = PTBModel(is_training=False, config=eval_config, 
                         input_=model_input)

    sess.run(tf.global_variables_initializer())

    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint('../models'))

    while True:
        print(generate_text(sess, mtest, word_to_id, id_to_word, seed="this sentence is"))
        try:
            input('press Enter to continue ...\n')  # Python 3; use raw_input() on Python 2
        except KeyboardInterrupt:
            print('\b\bQuitting now...')
            break

Update

As for restoring old checkpoints (for me the model saved 6 months ago, not sure about exact TF version used then) with recent tensorflow (1.6 at least), it might raise an error about some variables not being found (see comment). In that case, you should update your checkpoints using this script.

Also, note that for me, I had to modify this even further, as I noticed the saver.restore function was trying to read lstm_cell variables although my variables were transformed into basic_lstm_cell which led also to NotFound Error. So an easy fix, just a small change in the checkpoint_convert.py script, line 72-73, is to remove basic_ in the new names.

A convenient way to check the names of the variables contained in your checkpoints is the following (CKPT_FILE is the path prefix that comes before .index, .data-00000-of-00001, etc.):

reader = tf.train.NewCheckpointReader(CKPT_FILE)
reader.get_variable_to_shape_map()

This way you can verify that you have indeed the correct names (or the bad ones in the old checkpoints versions).

Top answered 19/1, 2018 at 18:45 Comment(3)

Thanks a lot, stackoverflow.com/users/5303618/h-rev. The code looks promising, but I cannot get it to work. I get the following error. Any idea how to fix it? File "test.py", line 64, in <module> mtest = PTBModel(is_training=False, config=eval_config, input_=model_input) ... ValueError: Variable Model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope? – Buonarroti 7/6, 2018 at 13:56

At the time of writing it worked, and now indeed, I get the same error (with tensorflow 1.6+). The issue arises if you saved the model with an earlier version and restore with a recent one. The naming convention for LSTM parameters changed, e.g. cell_0/basic_lstm_cell/weights became cell_0/basic_lstm_cell/kernel. Which is why you cannot restore them if you try to restore old checkpoints with recent TF (so bad...). Use this script to update your checkpoints. (See update in answer) – Top 8/6, 2018 at 18:17

@NiklasHeidloff Were you able to solve this? I'm facing the same problem. I'm trying to use the checkpoint right after storing it. So at least in my case the reason can't be the difference between versions. – Franconian 9/9, 2019 at 22:14
