As stated in the Keras API documentation for recurrent layers (https://keras.io/layers/recurrent/):
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state
. The value of initial_state
should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states
with the keyword argument states
. The value of states
should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
Since the LSTM layer has two states (hidden state and cell state) the value of initial_state
and states
is a list of two tensors.
Examples
Stateless LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
import tensorflow as tf
import numpy as np
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8)
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
Stateful LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
Note that for stateful lstm you need to specify also batch_size
.
import tensorflow as tf
import numpy as np
from pprint import pprint
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8, stateful=True, batch_size=(1, 10, 1))
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
With a Stateful LSTM, the states are not reset at the end of each sequence and we can notice that the output of the layer correspond to the hidden state (i.e. lstm.states[0]
) at the last timestep:
>>> pprint(outputs)
<tf.Tensor: id=821, shape=(1, 8), dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>
>>>
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.14726108, 0.13584498, -0.12986949, -0.22309153, 0.0125412 ,
-0.11446435, 0.22290672, 0.05397629]], dtype=float32)>]
Calling reset_states()
it is possible to reset the states:
>>> lstm.reset_states()
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>]
>>>
or to set them to a specific value:
>>> lstm.reset_states(states=[h_0, c_0])
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>]
>>>
>>> pprint(h_0)
<tf.Tensor: id=422, shape=(1, 8), dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>
>>>
>>> pprint(c_0)
<tf.Tensor: id=421, shape=(1, 8), dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>
>>>