How to feed back RNN output to input in tensorflow

Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back into its input?

I have read a few related questions on this topic.

Theoretically it is clear to me that in TensorFlow we use truncated backpropagation, so we have to define the maximum number of steps we would like to "trace". We also reserve a dimension for batches, so if I want to train on a sine wave, I have to feed inputs of shape [None, num_steps, 1].

The following code works:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

tf.reset_default_graph()

n_samples = 100
state_size = 5

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)

# One sine period as the target series and zeros as the default input,
# both shaped [batch, time, features] = [1, n_samples, 1].
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]

X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)

# Project the LSTM output to a single value per time step.
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

# Target: the sine wave shifted by one step.
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)

opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Plots before training: raw LSTM output, then target vs. prediction.
plt.plot(output.eval()[0])
plt.show()
plt.plot(def_x.squeeze())
plt.plot(pred.eval().squeeze())
plt.show()

steps = 1001
for i in range(steps):
    p, l, _ = sess.run([pred, loss, opt])

The state size of the LSTM can be varied; I also experimented with feeding the network the sine wave and with feeding zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and it is up to me to feed them input as a time series. However, when generating samples the network explicitly depends on its previous output, meaning that I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:

with tf.variable_scope('sine', reuse=True):
    # build a second graph that takes the produced-so-far sequence as input
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

    # generate step by step, appending each result to the fed sequence
    test_vals = [0.]
    for i in range(1000):
        val = pred.eval({X_test: np.array(test_vals)[None, :, None]})
        test_vals.append(val)

However, in this model there seems to be no continuity between the LSTM cells. What is going on here?

Do I have to initialize a zero array with, say, 100 time steps and assign each run's result into the array? That is, feed the network like this (a rough sketch in code follows right after):

run 0: input_feed = [0, 0, 0 ... 0]; res1 = result

run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result

run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result

etc...
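
Something like this, reusing X, pred, sess and n_samples from the graph above (untested sketch):

# Untested sketch of the scheme above: keep a fixed-length input buffer of
# zeros and copy each step's prediction into the next step's input slot.
input_feed = np.zeros((1, n_samples, 1))       # [batch, time, feature]
for t in range(n_samples - 1):
    res = sess.run(pred, {X: input_feed})      # res has shape [1, n_samples, 1]
    input_feed[0, t + 1, 0] = res[0, t, 0]     # prediction at step t becomes input at step t+1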

What should I do if I want this trained network to use its own output as its input at the following time step?

Distrust answered 24/2, 2017 at 14:0 Comment(0)

If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy workaround that you can use at test time:

  1. Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
  2. Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
  3. Pass the initial state into dynamic_rnn.
  4. Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).

I.e., change the definition of the model to something like this:

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)

def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]

# [batch_size, seq_length, input dimension] -- the time dimension is None,
# so we can later feed sequences of length 1.
X = tf.placeholder_with_default(zero_x, [None, None, 1])
batch_size = tf.shape(X)[0]
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)

output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
                                        initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

Then you can perform inference like so:

fetches = {'final_state': last_states,
           'prediction': pred}

toy_initial_input = np.array([[[1.]]])  # put suitable data here, shape [1, 1, 1]
seq_length = 20  # put whatever is reasonable here for you

# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']

# at every further step, feed the previous prediction and the previous state
for i in range(1, seq_length):
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']

# outputs now contains the sequence you want

Note that this can also work for batches; however, it can be a bit more complicated if you have sequences of different lengths in the same batch.
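
For example, one way to handle a batch is to step all sequences together and carry the whole state (each example occupies its own rows of the [batch, state_size] state tensors), and simply stop collecting predictions for sequences that have already ended. A rough, untested sketch using the names from the snippets above (the end-of-sequence test is a made-up placeholder):

# Rough sketch of batched generation; the end-of-sequence check is a
# placeholder -- replace it with whatever criterion applies to your data.
n_batch = 4
current_input = np.zeros((n_batch, 1, 1))             # one time step per example
state = sess.run(initial_state, {X: current_input})   # zero state for this batch size
collected = [[] for _ in range(n_batch)]
finished = [False] * n_batch

for _ in range(seq_length):
    eval_out = sess.run(fetches, {X: current_input, initial_state: state})
    state = eval_out['final_state']
    step_pred = eval_out['prediction']                # shape [n_batch, 1, 1]
    for b in range(n_batch):
        if not finished[b]:
            collected[b].append(step_pred[b, 0, 0])
            # finished[b] = is_end_of_sequence(step_pred[b, 0, 0])  # hypothetical check
    current_input = step_pred                         # feed the predictions back in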

If you want to perform this kind of prediction not only at test time but also at training time, that is possible as well, but a bit more complicated to implement.

Padua answered 19/12, 2017 at 13:51 Comment(7)
I'm trying to do this at train time – is there an alternative to manually unrolling the RNN (i.e. is there some way to not have to specify X in advance)? – Pantelegraph
@Pantelegraph You could either create your own cell or use tf.nn.raw_rnn, which gives you more control over what happens before and after the call to the LSTM cell. Take a look at this blog post and carefully read the TF documentation of raw_rnn. – Padua
Thanks! tl;dr for anyone reading: use raw_rnn instead of dynamic_rnn, which allows you (among other flexibilities) to pass a lambda for the inputs. – Pantelegraph
@Pantelegraph A more detailed post about the workings of tf.nn.raw_rnn can be found here: #39681526 – Padua
@Padua How can we use this method when our input consists of more than one time series, we predict just one output from the model, and we want to use this predicted output along with the other inputs at time step t+1? – Slipsheet
@AtherCheema I don't quite understand your question – maybe consider opening a new post for that? – Padua
@Padua Thank you for your attention. Here #55005957 is the question. I would appreciate it very much if you can help me out. – Slipsheet

You can use its own output (last state) as the next-step input (initial state). One way to do this is to:

  1. use zero-initialized variables as the input state at every time step
  2. each time you complete a truncated sequence and get some output state, update the state variables with the output state you just got.

The second step can be done in either of two ways (a minimal sketch of the first is shown after this list):

  1. fetching the states into Python and feeding them back in the next run, as done in the ptb example in tensorflow/models
  2. building an update op in the graph and adding a control dependency, as done in the ptb example in tensorpack.
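
A minimal sketch of the first approach (not the ptb code itself, just an illustrative toy graph in which the cell size, seed input and number of steps are made up) showing the fetch-and-feed-back loop:

import numpy as np
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(5)
x = tf.placeholder(tf.float32, [1, None, 1])
init_state = cell.zero_state(batch_size=1, dtype=tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, x, initial_state=init_state)
pred = tf.contrib.layers.fully_connected(outputs, 1, activation_fn=tf.tanh)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(init_state)        # start from the zero state
    current = np.zeros((1, 1, 1))       # seed input: a single zero step
    samples = []
    for _ in range(100):
        # run one step, fetch the new state, and feed it back on the next run
        p, state = sess.run([pred, final_state],
                            {x: current, init_state: state})
        current = p                     # the prediction becomes the next input
        samples.append(p[0, 0, 0])
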
Exposure answered 24/2, 2017 at 17:32 Comment(7)
So is it like this: if I use time step 4, would the input feed look like [0, 0, 0, 0] [0, 0, 0, h0] [0, 0, h0, h1] [0, h0, h1, h2] [h0, h1, h2, h3] [h1, h2, h3, h4] ... etc.? – Distrust
Assuming you're talking about method 1: you only have to feed the initial state (0,0,0,0); the rnn function in tensorflow already computes the internal states. – Exposure
It computes the internal states; however, if I feed (0,0,0,0), then I get 4 output (hidden, h) values, of which I would like to use ONLY the first as an INPUT for the next step. By the way, I don't want to feed the states (c & h) of the network, but rather its last prediction, meaning that if state_size=N then I have to further process the state to get a single scalar prediction. I could not find a relevant answer in the mentioned official implementations. Could you please update your answer with the specific lines? Thanks :) – Distrust
I'm confused about what you're asking now, and about the notation as well. What are h0-h4? Are they hidden states or outputs of the model? What do you mean by 4 output (hidden, h) values? The rnn function gives you one output of size (batch x time x cell_size) and another of size (batch x state_size). – Exposure
Sorry for being unclear. h0-h4 means the hidden states, which for this example were single scalars h_t at time t. Maybe this github issue will clarify what I wanted to ask: github.com/craffel/nntools/issues/15 – Distrust
The notation is still not clear. A hidden state should never be a scalar in practice, and in your code you use a size of 5. What you're talking about sounds more like the "output state". Figure 1 and its comments in this tutorial may help with the notation confusion. And if that is the case, you don't need to feed a sequence but only the last output state and last hidden state to predict the next output. – Exposure
Okay, so from the beginning: I want to train an RNN to produce a time series, like sentences, stock prices, etc. I want to use it at test time in the following manner: I start it with a given initial state, and without any external input I want it to make samples on its own. However, the TF API requires me to feed input to the network, so I want to feed back its own produced-so-far sequence. At the moment I can do that, and I don't have to use the unrolled LSTM, because I only use the last state. However, the produced samples are improper, because at training time I didn't use its own output as input. – Distrust

I know I'm a bit late to the party but I think this gist could be useful:

https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31

It lets you auto-feed the output through a filter and back into the network as input. To make the shapes match up, processing can be set to a tf.layers.Dense layer.

Please ask any questions!

Edit:

In your particular case, create a lambda which processes the dynamic_rnn outputs into your character vector space. For example:

# if you have something like:
W = tf.Variable( ... )   # output projection weights, e.g. [cell_size, output_dim]
B = tf.Variable( ... )   # output projection bias, e.g. [output_dim]
Yo, Ho = tf.nn.dynamic_rnn(cell, inputs, initial_state=state)
logits = tf.matmul(Yo, W) + B   # project the outputs into your output space
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(Yo, W) + B
Yo, Ho = self_feeding_rnn(cell, seed, initial_state, processing=process_yo)
Ulcerate answered 1/2, 2019 at 19:32 Comment(0)
