What's the difference between a bidirectional LSTM and an LSTM?

Asked 26/3, 2017 at 23:31 Answered 22/3, 2020 at 12:12

Solved machine-learning neural-network keras lstm recurrent-neural-network

101

Can someone please explain this? I know bidirectional LSTMs have a forward and backward pass but what is the advantage of this over a unidirectional LSTM?

What is each of them better suited for?

Medulla answered 26/3, 2017 at 23:31 Comment(0)

152

An LSTM, at its core, preserves information from inputs that have already passed through it by means of its hidden state.

A unidirectional LSTM only preserves information from the past, because the only inputs it has seen are from the past.

A bidirectional LSTM runs your inputs in two directions, one from past to future and one from future to past. What distinguishes it from the unidirectional approach is that the LSTM that runs backwards preserves information from the future. By combining the two hidden states you have, at any point in time, information from both the past and the future.
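To make the "two hidden states" idea concrete, here is a minimal framework-free sketch. It is not a real LSTM: it assumes a toy scalar recurrence with made-up weights `w` and `u`, and the helper names `run_rnn` and `run_bidirectional` are only illustrative. It shows that the forward state at a given step ignores later inputs, while the backward state at the same step depends on them.

```python
import math

def run_rnn(inputs, w=0.5, u=0.8):
    """Toy scalar recurrence h_t = tanh(w * x_t + u * h_{t-1}); returns every h_t."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def run_bidirectional(inputs):
    """Return (forward, backward) state lists, both indexed by original time step."""
    forward = run_rnn(inputs)
    # Feed the sequence in reverse, then flip the outputs back so that
    # backward[t] lines up with forward[t].
    backward = run_rnn(inputs[::-1])[::-1]
    return forward, backward

xs = [1.0, -2.0, 0.5, 3.0]
fwd, bwd = run_bidirectional(xs)

# Change only the last input ("the future" as seen from step 1).
xs_changed = [1.0, -2.0, 0.5, -3.0]
fwd2, bwd2 = run_bidirectional(xs_changed)

print(fwd[1] == fwd2[1])  # forward state at step 1 is unaffected by the future
print(bwd[1] == bwd2[1])  # backward state at step 1 is affected by it
```

A real bidirectional LSTM does the same thing with vector states and gated updates, and then merges the two states at each step.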

What each of them is suited for is a complicated question, but BiLSTMs often show very good results because they can use context from both sides. I will try to explain through an example.

Let's say we try to predict the next word in a sentence. At a high level, what a unidirectional LSTM will see is

The boys went to ....

It will try to predict the next word from this context alone. With a bidirectional LSTM you are also able to see information further down the road, for example

Forward LSTM:

The boys went to ...

Backward LSTM:

... and then they got out of the pool

You can see that, using the information from the future, it could be easier for the network to work out what the next word is.
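The example above can be sketched in a few lines of plain Python. This only shows which tokens each direction has read by the time it reaches the gap; the `<?>` placeholder and the variable names are assumptions made for illustration, not part of any library.

```python
tokens = ["The", "boys", "went", "to", "<?>", "and", "then", "they",
          "got", "out", "of", "the", "pool"]
target = tokens.index("<?>")  # position of the word we want to predict

# A forward LSTM reads left to right, so at the gap it has only seen this:
forward_context = tokens[:target]

# A backward LSTM reads right to left, so it reaches the gap after having
# seen everything to the right of it, in reverse order:
backward_context = tokens[:target:-1]

print(" ".join(forward_context))
print(" ".join(backward_context))
```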

Yoon answered 20/5, 2017 at 6:51 Comment(7)

One doubt: when a sentence is run through a bidirectional LSTM and we are trying to encode the sentence, which output should we use, that of the forward or the backward LSTM cell? – Funambulist 25/7, 2017 at 11:5

I don't think there is one answer for that, but I believe that using both will be a good approach - maybe this article can be of help web.stanford.edu/class/cs224n/reports/2760320.pdf – Yoon 25/7, 2017 at 11:16

But then the uses of a bidirectional LSTM would be limited, right? Because when you are trying to predict a word you won't know the next words. Maybe you could point out some real-world examples of this? Thanks a lot, btw! – Scuttlebutt 29/1, 2018 at 14:43

what could be the uses, maybe translation or sentiment analysis? – Scuttlebutt 29/1, 2018 at 14:50

There are many uses, like you said, translation, sentiment analysis and other applications which are not NLP related. Also bidirectional LSTMs (or even more than 2 way LSTMs) can be applied to images or spectrograph inputs – Yoon 29/1, 2018 at 15:10

@Scuttlebutt the bi-LSTMs are usually employed in sequence-to-sequence applications, where you know the full input at prediction time, but you don't know what it corresponds to. As said, examples are translation (you have the full phrase), speech recognition (you have the full utterance), OCR (you have the full image) – Shun 11/3, 2018 at 16:25

<joke><troll> Am I the only person interested in knowing where the boys went? "River"? "To swim"? "Mars"?</troll></joke> – Cesya 23/1, 2020 at 23:43

Adding to Bluesummer's answer, here is how you would implement a bidirectional LSTM from scratch, without calling the BiLSTM module. This might show the contrast between unidirectional and bidirectional LSTMs more clearly. As you can see, we merge two LSTMs to create a bidirectional LSTM.

You can merge the outputs of the forward and backward LSTMs using one of the modes {'sum', 'mul', 'concat', 'ave'}.

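# Legacy Keras (0.x) API: Merge, TimeDistributedDense and these argument names are gone in Keras 2+.
# The import paths below are the ones assumed for that era.
from keras.models import Sequential
from keras.layers.core import Activation, Merge, TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.optimizers import SGD

# hidden_units, nb_classes, nb_epoches and the X_*/Y_* arrays are assumed to be defined elsewhere.
# Forward LSTM: reads each sequence from the first to the last time step.
left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13)))
# Backward LSTM: same layer, but go_backwards=True makes it read from last to first.
# Note: with return_sequences=True its outputs come out in reversed time order.
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

# Merge the outputs of the two directions element-wise.
model = Sequential()
model.add(Merge([left, right], mode='sum'))

# Softmax classifier applied at every time step.
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
# The same input is fed to both branches.
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)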
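To see what the four merge modes mentioned above do to the two output sequences, here is a small framework-free sketch. The `merge` helper and the sample numbers are made up for illustration and are not the Keras implementation. One assumption is spelled out in the code: a layer run with go_backwards=True emits its outputs in reversed time order, so the sketch flips them before merging so that index t refers to the same input position in both directions.

```python
def merge(forward, backward, mode="concat"):
    """Combine per-time-step output vectors of the two directions.

    Both arguments are lists of vectors (lists of floats), already aligned so
    that index t refers to the same input position in both.
    """
    merged = []
    for f, b in zip(forward, backward):
        if mode == "sum":
            merged.append([x + y for x, y in zip(f, b)])
        elif mode == "mul":
            merged.append([x * y for x, y in zip(f, b)])
        elif mode == "ave":
            merged.append([(x + y) / 2 for x, y in zip(f, b)])
        elif mode == "concat":
            merged.append(f + b)
        else:
            raise ValueError("unknown merge mode: " + repr(mode))
    return merged

fwd_out = [[1.0, 2.0], [3.0, 4.0]]                # two time steps, two units
bwd_out_reversed = [[30.0, 40.0], [10.0, 20.0]]   # as emitted: last step first
bwd_out = bwd_out_reversed[::-1]                  # realign with the forward outputs

for mode in ("sum", "mul", "ave", "concat"):
    print(mode, merge(fwd_out, bwd_out, mode))
```

Note that 'concat' doubles the width of each output vector, while the other three modes keep it unchanged.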

Orpha answered 24/6, 2018 at 22:36 Comment(1)

Will it differ if you use Bidirectional(LSTM(64)) instead of the left and right? If not, then is the Bidirectional thing internally implemented like that? – Mccraw 13/5, 2020 at 21:29

In comparison to an LSTM, a BLSTM or BiLSTM has two networks: one accesses past information in the forward direction and the other accesses future information in the reverse direction. wiki

Keras provides a Bidirectional wrapper class for this, as described in the official docs here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional

from keras.models import Sequential
from keras.layers import LSTM, Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 10)))

The recurrent activation function can be set like this:

model = Sequential()
model.add(Bidirectional(LSTM(num_channels, implementation=2, recurrent_activation='sigmoid'),
                        input_shape=(input_length, input_dim)))

A complete example using the IMDB data follows. First, the training output after 4 epochs:

Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
17465344/17464789 [==============================] - 4s 0us/step
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 78s 3ms/step - loss: 0.4219 - acc: 0.8033 - val_loss: 0.2992 - val_acc: 0.8732
Epoch 2/4
25000/25000 [==============================] - 82s 3ms/step - loss: 0.2315 - acc: 0.9106 - val_loss: 0.3183 - val_acc: 0.8664
Epoch 3/4
25000/25000 [==============================] - 91s 4ms/step - loss: 0.1802 - acc: 0.9338 - val_loss: 0.3645 - val_acc: 0.8568
Epoch 4/4
25000/25000 [==============================] - 92s 4ms/step - loss: 0.1398 - acc: 0.9509 - val_loss: 0.3562 - val_acc: 0.8606

BiLSTM or BLSTM

import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb


n_unique_words = 10000  # vocabulary size: keep only the most frequent words
maxlen = 200            # cut (or pad) each review to this number of words
batch_size = 128

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=n_unique_words)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
y_train = np.array(y_train)
y_test = np.array(y_test)

model = Sequential()
model.add(Embedding(n_unique_words, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
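As a rough check of what wrapping LSTM(64) in Bidirectional costs in the model above, the arithmetic can be sketched as follows. It assumes the standard LSTM formulation (four gates, each with an input kernel, a recurrent kernel and a bias) and that the two directions are concatenated, which is the default merge mode of the Keras wrapper. The helper name `lstm_param_count` is made up for this sketch.

```python
def lstm_param_count(input_dim, units):
    """Weights of one standard LSTM layer: 4 gates, each with an input
    kernel, a recurrent kernel and a bias vector."""
    return 4 * (units * input_dim + units * units + units)

embedding_dim = 128  # from Embedding(n_unique_words, 128, ...)
units = 64           # from LSTM(64)

one_direction = lstm_param_count(embedding_dim, units)
both_directions = 2 * one_direction  # independent forward and backward copies
output_width = 2 * units             # assuming the two outputs are concatenated

print(one_direction, both_directions, output_width)
```

So the bidirectional layer has twice the weights of a single LSTM(64), and the Dropout and Dense layers after it receive a 128-wide vector rather than a 64-wide one.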

Joelie answered 7/12, 2018 at 6:48 Comment(0)

Another use case for bidirectional LSTMs is classifying individual words in a text. They can see both the past and the future context of each word, which makes them much better suited to classifying it.

Sweptback answered 26/3, 2018 at 15:0 Comment(0)

It can also be helpful in time series forecasting problems, such as predicting the electricity consumption of a household. A plain LSTM can be used for this as well, but a bidirectional LSTM may do a better job.
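One way to read this, assuming the bidirectional layer is only ever applied to a window of readings that have already been observed, is sketched below. The `make_windows` helper and the consumption values are made up for illustration. Both directions read the same past window, and the value to be forecast is never part of the input.

```python
def make_windows(series, window):
    """Split a series into (past_window, next_value) training pairs."""
    pairs = []
    for t in range(window, len(series)):
        pairs.append((series[t - window:t], series[t]))
    return pairs

consumption = [3.1, 2.9, 3.4, 4.0, 3.8, 3.5]  # made-up hourly readings
pairs = make_windows(consumption, window=3)

past, target = pairs[0]
forward_view = past         # order in which the forward direction reads the window
backward_view = past[::-1]  # order in which the backward direction reads it

print(forward_view, backward_view, target)
```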

Emperor answered 22/3, 2020 at 12:12 Comment(1)

How can you say that if you do not know future values? The backward layer takes values from the future to predict the past. This is not possible for timeseries.... – Anglice 24/6, 2023 at 11:19
