How to use multilayered bidirectional LSTM in Tensorflow?

I want to know how to use a multilayered bidirectional LSTM in TensorFlow.

I have already implemented a bidirectional LSTM, but I want to compare that model with a version that has multiple layers added.

How should I add code to this part?

x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
#print(x[0].get_shape())

# Define lstm cells with tensorflow
# Forward direction cell
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Backward direction cell
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

# Get lstm cell output
try:
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                          dtype=tf.float32)
except Exception: # Old TensorFlow version only returns outputs not states
    outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                    dtype=tf.float32)

# Linear activation, using rnn inner loop last output
outputs = tf.stack(outputs, axis=1)
outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
outputs = tf.matmul(outputs, weights['out']) + biases['out']
outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
Emmanuelemmeline answered 13/9, 2017 at 5:12 Comment(0)

You can use two different approaches to build a multilayer BiLSTM model:

1) Use the output of the previous BiLSTM layer as the input to the next one. To start, create lists of forward and backward cells of length num_layers, then loop over the layers:

# Create one forward and one backward cell per layer, as described above
# (LSTMCell is just an illustrative choice; any RNNCell works).
cell_forw = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]
cell_back = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]

# output starts as the input tensor of shape [batch_size, max_time, features]
output = inputs

for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]

    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)

    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, output,
                                                                         initial_state_fw=state_fw,
                                                                         initial_state_bw=state_bw,
                                                                         scope='BLSTM_' + str(n),
                                                                         dtype=tf.float32)

    output = tf.concat([output_fw, output_bw], axis=2)

2) It is also worth taking a look at another approach, the stacked BiLSTM (tf.contrib.rnn.stack_bidirectional_dynamic_rnn), sketched below.
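For reference, a minimal sketch of that stacked approach, assuming inputs is a [batch_size, max_time, features] tensor and that n_hidden and num_layers are defined as above:

cells_fw = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]
cells_bw = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]

# outputs has shape [batch_size, max_time, 2 * n_hidden]
outputs, output_state_fw, output_state_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw=cells_fw,
    cells_bw=cells_bw,
    inputs=inputs,
    dtype=tf.float32)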

Muntjac answered 9/10, 2017 at 14:2 Comment(1)
I tried this and got this error: ValueError: Variable bidirectional_rnn/fw/lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Can you provide a working example? – Towland

This is essentially the same as the first answer, but with a slight variation in how the scope names are used and with dropout wrappers added. It also takes care of the variable-scope error reported under the first answer.

def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):

    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer),reuse=tf.AUTO_REUSE):

            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE

            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob = keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob = keep_prob)

            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                              cell_bw, 
                                                              output,
                                                              dtype=tf.float32)

            # Concat the forward and backward outputs
            output = tf.concat(outputs,2)

    return output
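A minimal usage sketch (the names time_steps, num_features and num_classes are illustrative placeholders, not part of the original answer), keeping only the last time step for classification:

# Hypothetical placeholders, for illustration only.
input_data = tf.placeholder(tf.float32, [None, time_steps, num_features])
labels = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)

encoded = bidirectional_lstm(input_data, num_layers=2, rnn_size=128, keep_prob=keep_prob)

# Keep only the last time step: [batch_size, 2 * rnn_size]
last_output = encoded[:, -1, :]

logits = tf.layers.dense(last_output, num_classes)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))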
Fifine answered 14/8, 2018 at 14:6 Comment(7)
I have a question related to that. I concatenated the outputs and reshaped them using output = tf.reshape(tf.concat(output, 1), [-1, 2 * rnn_size]), so the dimension is now (batch_size x timesteps, 2 * rnn_size). When I pass it through a dense layer using logits = tf.matmul(output, weight) + bias, the dimension becomes (batch_size x timesteps, num_classes). These are my logits. How can I then compute the loss using tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)), given that the shape of the Y placeholder is [None, num_classes]? – Casilde
You can't directly. You need to eliminate that timestep dimension. Is there any specific reason to use the output of all timesteps? Generally, we take the output at the last time step only. You can do this by returning output = output[:, -1, :]. The logits would then be [batch_size, num_classes]. – Fifine
Thank you very much for your quick response. To be honest, this is how I learned LSTM: in this example they flatten the output and use it to compute the logits, without eliminating the timesteps. I am a bit confused now. – Casilde
He did that because he's using tf.contrib.seq2seq.sequence_loss, which expects the time_step dimension. Notice that once the logits are calculated, he reshapes them back to the original shape. In your case, you want to use tf.nn.softmax_cross_entropy_with_logits, which won't take that shape; it requires the last time_step only. – Fifine
Oh, I understand. So you're saying that before the dense layer and softmax, I should take the last time step of the data points and go from there? – Casilde
If you want to use tf.nn.softmax_cross_entropy_with_logits, then yes. Though in that particular problem, you might want to use the seq2seq loss. – Fifine
For the one in the link? Yeah, I understand. – Casilde

On top of Taras's answer, here is another example using just a 2-layer bidirectional RNN with GRU cells:

    embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
    embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)

    #First BLSTM
    cell = tf.nn.rnn_cell.GRUCell(state_size)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1-dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_1')
    outputs = tf.concat([forward_output, backward_output], axis=2)

    #Second BLSTM using the output of previous layer as an input.
    cell2 = tf.nn.rnn_cell.GRUCell(state_size)
    cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1-dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                        sequence_length=lengths, dtype=tf.float32, scope='BLSTM_2')
    outputs = tf.concat([forward_output, backward_output], axis=2)

BTW, don't forget to give each BLSTM layer a different scope name. Hope this helps.
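For context, here is a minimal sketch of the inputs the snippet above assumes (the names and values are illustrative placeholders, not from the original answer):

vocabulary_size = 10000   # assumed vocabulary size
state_size = 128          # assumed embedding / GRU state size
dropout = 0.2             # assumed dropout rate

# Integer token ids padded to a common length, plus the true sequence lengths.
tokens = tf.placeholder(tf.int32, [None, None], name='tokens')
lengths = tf.placeholder(tf.int32, [None], name='lengths')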

Twitch answered 8/5, 2018 at 9:36 Comment(0)

As @Taras pointed out, you can use:

(1) tf.nn.bidirectional_dynamic_rnn()

(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().

All previous answers only cover (1), so I will give some details on (2), in particular since it usually outperforms (1). For an intuition about the different connectivity patterns, see here.

Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:

num_layers = 3
num_nodes = 64


# Define LSTM cells (LSTMCell here is assumed to come from tf.contrib.rnn)
enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]

# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)

# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)

output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)

In this case, I use the final states of the stacked biRNN rather than the states at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the above code was only the encoder.
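If you need per-timestep outputs instead (e.g. for sequence labeling), you can work with all_states directly. A minimal sketch, where num_classes is a hypothetical output dimension:

# all_states has shape [batch_size, max_time, 2 * num_nodes]
# (the concatenated forward/backward outputs of the top layer).
per_step_logits = tf.layers.dense(all_states, num_classes)  # num_classes is assumed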

Rugged answered 20/8, 2018 at 10:34 Comment(2)
Thank you for the detailed explanation. Can I ask about the "final states"? When the input sequences have different lengths, does "final states" contain the actual final state for each different-length input, or may it include zero padding? – Ability
This code snippet was written for tf==1.X and, if I remember correctly, it can't handle variable-length sequences out of the box. I always used zero-padding. TensorFlow 2.X may have a better solution for this, though. – Rugged
