Initializing LSTM hidden state Tensorflow/Keras

Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent autoencoder, and after that model is trained I want to transfer the learned hidden state of the unsupervised model to the hidden state of a supervised model. Is that even possible with the current API? This is the paper I am trying to recreate:

http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf

Readymix answered 23/2, 2017 at 12:34

Yes, this is possible, but cumbersome. Let's go through an example.

  1. Defining a model:

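    from keras.layers import LSTM, Input
    from keras.models import Model
    
    # A stateful layer needs a fixed batch size: (batch, timesteps, features)
    inputs = Input(batch_shape=(32, 10, 1))
    lstm_out = LSTM(10, stateful=True)(inputs)
    
    # model.layers[0] is the InputLayer, model.layers[1] is the LSTM
    model = Model(inputs, lstm_out)
    model.compile(optimizer="adam", loss="mse")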
    

    It's important to build and compile the model first, because compilation resets the initial states. You also need to specify a batch_shape that includes the batch size, since in this scenario the network has to be stateful (which is done by setting stateful=True).

  2. Now we could set the values of initial states:

    import numpy
    import keras.backend as K
    
    hidden_states = K.variable(value=numpy.random.normal(size=(32, 10)))
    cell_states = K.variable(value=numpy.random.normal(size=(32, 10)))
    
    model.layers[1].states[0] = hidden_states
    model.layers[1].states[1] = cell_states 
    

    Note that you need to provide the states as Keras variables: states[0] holds the hidden states and states[1] holds the cell states.
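To make "stateful" concrete, here is a small NumPy re-implementation of the standard LSTM recurrence (not Keras code). It assumes the weight columns are laid out as [input, forget, cell, output] gates and uses a plain sigmoid for the gates; the weights are random and the helper `lstm_forward` is hypothetical. It shows that running a sequence in two chunks and carrying (h, c) from the first chunk into the second, which is what stateful=True does between consecutive batches, gives the same result as one pass. Restarting the second chunk from zeros gives a different result.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(x, h, c, kernel, recurrent_kernel, bias):
    """Hypothetical helper: run a standard LSTM over x (batch, timesteps, features).

    h, c are the initial hidden and cell states, each (batch, units).
    Gate columns are assumed to be ordered [i, f, c, o]. Returns the final (h, c).
    """
    units = h.shape[1]
    for t in range(x.shape[1]):
        z = x[:, t, :] @ kernel + h @ recurrent_kernel + bias
        i = sigmoid(z[:, :units])                 # input gate
        f = sigmoid(z[:, units:2 * units])        # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])    # candidate cell state
        o = sigmoid(z[:, 3 * units:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
batch, timesteps, features, units = 32, 10, 1, 10
kernel = rng.normal(size=(features, 4 * units))
recurrent_kernel = rng.normal(size=(units, 4 * units))
bias = np.zeros(4 * units)

x = rng.normal(size=(batch, timesteps, features))
h0 = rng.normal(size=(batch, units))  # plays the role of states[0]
c0 = rng.normal(size=(batch, units))  # plays the role of states[1]

# One pass over the whole sequence.
h_full, c_full = lstm_forward(x, h0, c0, kernel, recurrent_kernel, bias)

# Two passes, carrying the state across calls (stateful behaviour).
h_mid, c_mid = lstm_forward(x[:, :5], h0, c0, kernel, recurrent_kernel, bias)
h_split, c_split = lstm_forward(x[:, 5:], h_mid, c_mid, kernel, recurrent_kernel, bias)

# Second pass restarted from zeros (what a stateless layer would do).
zeros = np.zeros((batch, units))
h_reset, _ = lstm_forward(x[:, 5:], zeros, zeros, kernel, recurrent_kernel, bias)
```

The same argument explains why the batch size must be fixed: sample k of one batch hands its state to sample k of the next batch.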

Hope that helps.

Ibbetson answered 23/2, 2017 at 15:51
If I want to set just the initial hidden state, would the code be model.layers[1].states[0][0] = h_0? - Londoner

As stated in the Keras API documentation for recurrent layers (https://keras.io/layers/recurrent/):

Note on specifying the initial state of RNNs

You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.

You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.

Since the LSTM layer has two states (the hidden state and the cell state), the value of initial_state and of states is a list of two tensors, in the order [hidden_state, cell_state].
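For reference, the standard LSTM recurrence (written without peephole connections) shows why exactly two tensors are needed: every step reads both the previous hidden state and the previous cell state. The values passed through initial_state, or through states in reset_states, are h_0 and c_0 for the first step t = 1. Here σ is the gate (recurrent) activation, ⊙ is element-wise multiplication, and T is the number of timesteps.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

With return_sequences=False the layer's output is h_T, the hidden state after the last timestep.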


Examples

Stateless LSTM

Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)

import tensorflow as tf
import numpy as np

inputs = np.random.random([1, 10, 1]).astype(np.float32)

lstm = tf.keras.layers.LSTM(8)

c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))

outputs = lstm(inputs, initial_state=[h_0, c_0])
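When initial_state is omitted, a stateless LSTM starts from all-zero hidden and cell states. The following two calls should therefore agree. This is a small sketch reusing the shapes above (batch 1, 8 units) and assumes TensorFlow 2.x.

```python
import numpy as np
import tensorflow as tf

inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8)

# Explicit all-zero hidden and cell states, each (batch, units) = (1, 8).
zeros = tf.zeros([1, 8])

out_default = lstm(inputs)                              # no initial_state given
out_zeros = lstm(inputs, initial_state=[zeros, zeros])  # explicit zero state
```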

Stateful LSTM

Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)

Note that for a stateful LSTM you also need to specify the batch size.

import tensorflow as tf
import numpy as np
from pprint import pprint

inputs = np.random.random([1, 10, 1]).astype(np.float32)

lstm = tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, 10, 1))  # (batch, timesteps, features)

c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))

outputs = lstm(inputs, initial_state=[h_0, c_0])

With a stateful LSTM, the states are not reset at the end of each sequence, and the output of the layer corresponds to the hidden state (i.e. lstm.states[0]) at the last timestep:

>>> pprint(outputs)
<tf.Tensor: id=821, shape=(1, 8), dtype=float32, numpy=
array([[ 0.07119043,  0.07012419, -0.06118739, -0.11008392,  0.00573938,
        -0.05663438,  0.11196419,  0.02663924]], dtype=float32)>
>>>
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.07119043,  0.07012419, -0.06118739, -0.11008392,  0.00573938,
        -0.05663438,  0.11196419,  0.02663924]], dtype=float32)>,
 <tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.14726108,  0.13584498, -0.12986949, -0.22309153,  0.0125412 ,
        -0.11446435,  0.22290672,  0.05397629]], dtype=float32)>]

Calling reset_states() resets the states to zeros:

>>> lstm.reset_states()
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>,
 <tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>]
>>>

or to set them to a specific value:

>>> lstm.reset_states(states=[h_0, c_0])
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
        0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>,
 <tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
        0.05909375, 0.68569875, 0.19087408]], dtype=float32)>]
>>>
>>> pprint(h_0)
<tf.Tensor: id=422, shape=(1, 8), dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
        0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>
>>>
>>> pprint(c_0)
<tf.Tensor: id=421, shape=(1, 8), dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
        0.05909375, 0.68569875, 0.19087408]], dtype=float32)>
>>>
Jamey answered 20/2, 2020 at 13:47
When doing this I get an error saying that the hidden states need to be symbolic. I am using the functional API. How can I convert the initial state to symbolic? - Motherwort

I used this approach, and it worked for me:

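from keras.layers import LSTM

# inputs: (batch, timesteps, features); h_prev, c_prev: (batch, cell_num)
lstm_layer = LSTM(cell_num, return_state=True)
output, h, c = lstm_layer(inputs, initial_state=[h_prev, c_prev])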
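This sketch addresses the functional-API question in the comment above, where the initial states have to be symbolic. One common pattern, assuming TensorFlow 2.x, is to give each state its own Input and feed the numeric values at predict/fit time. The sizes (batch 4, 8 units) are arbitrary.

```python
import numpy as np
import tensorflow as tf

timesteps, features, units = 10, 1, 8

# Symbolic inputs: the sequence plus one Input per state tensor.
seq_in = tf.keras.Input(shape=(timesteps, features))
h_in = tf.keras.Input(shape=(units,))  # initial hidden state
c_in = tf.keras.Input(shape=(units,))  # initial cell state

lstm = tf.keras.layers.LSTM(units, return_state=True)
output, h_out, c_out = lstm(seq_in, initial_state=[h_in, c_in])

model = tf.keras.Model([seq_in, h_in, c_in], [output, h_out, c_out])

# Numeric values are supplied at call time, one row per sample.
batch = 4
x = np.random.random((batch, timesteps, features)).astype("float32")
h0 = np.random.random((batch, units)).astype("float32")
c0 = np.random.random((batch, units)).astype("float32")
out_v, h_v, c_v = model.predict([x, h0, c0], verbose=0)
```

To start from the final state of another LSTM (for example a trained encoder), pass the h and c that layer returns with return_state=True instead of the two Input tensors.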
Meggie answered 19/8, 2019 at 0:23

Assuming the RNN is model.layers[1] and the hidden/cell states are numpy arrays, you can do this:

from keras import backend as K

K.set_value(model.layers[1].states[0], hidden_states)
K.set_value(model.layers[1].states[1], cell_states)

States can also be set by assigning to the list directly:

model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states

but when I did it this way, my state values stayed constant even after stepping the RNN.
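One plausible explanation, assuming the model's compiled step function captured the original state variable objects when it was built: rebinding a list slot puts a new object in the list that the step never reads or updates. K.set_value instead writes into the existing variable. The toy below uses hypothetical stand-in classes, not Keras classes, to reproduce that behaviour in plain Python.

```python
class ToyVariable:
    """Hypothetical stand-in for a backend state variable (not a Keras class)."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        # In-place update: every reference to this object sees the change.
        self.value = value


class ToyStatefulRNN:
    """Hypothetical stand-in for a compiled stateful RNN layer."""

    def __init__(self):
        self.states = [ToyVariable(0.0), ToyVariable(0.0)]
        # Like a compiled graph, the step captures the variable objects that
        # exist now, not the list slots that currently hold them.
        h_var, c_var = self.states

        def step(x):
            h_var.assign(h_var.value + x)  # toy update standing in for the LSTM math
            c_var.assign(c_var.value + x)
            return h_var.value

        self.step = step


# Rebinding the list slot: the step keeps using the original variable.
rebound = ToyStatefulRNN()
rebound.states[0] = ToyVariable(5.0)
rebound_out = rebound.step(1.0)       # computed from the old state (0.0)
rebound_h = rebound.states[0].value   # the new object stays at 5.0

# In-place assignment, analogous to K.set_value on a real variable.
in_place = ToyStatefulRNN()
in_place.states[0].assign(5.0)
in_place_out = in_place.step(1.0)     # computed from the assigned state (5.0)
in_place_h = in_place.states[0].value
```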

Relique answered 25/11, 2018 at 9:49
