PyTorch LSTM vs LSTMCell

What is the difference between LSTM and LSTMCell in PyTorch (currently version 1.1)? It seems that LSTMCell is a special case of LSTM (i.e., one with only a single layer, unidirectional, and no dropout).

Then, what's the purpose of having both implementations? Unless I'm missing something, it's trivial to use an LSTM object as an LSTMCell (or, alternatively, it's pretty easy to use multiple LSTMCells to build an LSTM).

Apostolate answered 15/7, 2019 at 23:3 Comment(0)

Yes, you can emulate one with the other (a sketch showing the equivalence follows the shape descriptions below); the reason for having them separate is efficiency.

LSTMCell is a cell that takes as arguments:

  • An input of shape batch × input dimension;
  • A tuple of LSTM hidden states (h, c), each of shape batch × hidden dimension.

It is a straightforward implementation of the equations.
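For reference, these are the standard LSTM cell equations, in the notation the PyTorch documentation also uses ($\sigma$ is the sigmoid and $\odot$ is element-wise multiplication):

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$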

LSTM is a layer that applies an LSTM cell (or multiple LSTM cells, for stacked layers) in a "for loop", but that loop is heavily optimized using cuDNN. Its inputs are:

  • A three-dimensional tensor of inputs of shape input length × batch × input dimension (or batch × input length × input dimension when batch_first=True);
  • Optionally, an initial state of the LSTM, i.e., a tuple of tensors (h_0, c_0), each of shape num_layers * num_directions × batch × hidden dimension.
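Here is a minimal sketch of the equivalence for a single-layer, unidirectional LSTM: copy the weights of an nn.LSTM into an nn.LSTMCell and run the time loop by hand (the tensor sizes are illustrative):

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)
cell = nn.LSTMCell(input_size=10, hidden_size=20)

# nn.LSTM stores the layer-0 weights as weight_ih_l0, weight_hh_l0, etc.;
# nn.LSTMCell uses the same gate layout, so the weights can be copied over.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(5, 3, 10)          # (seq_len, batch, input_size)
h0 = torch.zeros(1, 3, 20)         # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 3, 20)

out_lstm, _ = lstm(x, (h0, c0))    # the optimized internal loop

hx, cx = h0[0], c0[0]              # LSTMCell state is (batch, hidden_size)
outputs = []
for t in range(x.size(0)):         # the same loop, written by hand
    hx, cx = cell(x[t], (hx, cx))
    outputs.append(hx)
out_cell = torch.stack(outputs)

print(torch.allclose(out_lstm, out_cell, atol=1e-6))  # True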

You might often want to use the LSTM cell in a different context than applying it over a sequence, e.g., to build an LSTM that operates over a tree-like structure. Similarly, when you write a decoder for a sequence-to-sequence model, you call the cell in a loop and stop the loop when the end-of-sequence symbol is decoded, as in the sketch below.
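A hedged sketch of that decoder pattern; the names vocab_size, eos_id, embed, and proj are illustrative placeholders, not part of any specific API:

import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 32, 16, 20
sos_id, eos_id, max_len = 0, 1, 10

embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.LSTMCell(emb_dim, hidden)
proj = nn.Linear(hidden, vocab_size)    # maps hidden state to token logits

# Suppose the encoder produced this final state (zeros here as a placeholder).
hx, cx = torch.zeros(1, hidden), torch.zeros(1, hidden)

token = torch.tensor([sos_id])
decoded = []
for _ in range(max_len):
    hx, cx = cell(embed(token), (hx, cx))  # one decoding step
    token = proj(hx).argmax(dim=-1)        # greedy choice of next token
    if token.item() == eos_id:             # stop at end-of-sequence
        break
    decoded.append(token.item())
print(decoded)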

Kalsomine answered 16/7, 2019 at 12:2 Comment(2)
Your answer successfully helped me understand how to implement this paper: arxiv.org/pdf/1607.00148.pdf. I was having trouble understanding the decoder portion, but now that I know to use a single cell, I can do it. Thank you. – Condole
LSTMCell is also very useful when applied to sequences of different lengths, as the varying lengths can be handled by an outer for loop instead of the optimized one. – Scurrilous

Let me show some specific examples:

# LSTM example:
>>> rnn = nn.LSTM(10, 20, 2)            # input_size=10, hidden_size=20, num_layers=2
>>> input = torch.randn(5, 3, 10)       # (seq_len=5, batch=3, input_size=10)
>>> h0 = torch.randn(2, 3, 20)          # (num_layers=2, batch=3, hidden_size=20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
# LSTMCell example:
>>> rnn = nn.LSTMCell(10, 20)           # input_size=10, hidden_size=20
>>> input = torch.randn(5, 3, 10)       # (seq_len=5, batch=3, input_size=10)
>>> hx = torch.randn(3, 20)             # (batch=3, hidden_size=20)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(input.size(0)):      # loop over the seq_len=5 time steps
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)

The key difference:

  1. LSTM: the last argument, 2, stands for num_layers, the number of stacked recurrent layers. The layer performs seq_len * num_layers = 5 * 2 = 10 cell applications. There is no Python-level loop, but more cell computations.
  2. LSTMCell: in the for loop (seq_len = 5 iterations), the output hidden state of the i-th step is fed into the (i+1)-th step. There is only one cell, which is truly recurrent.

If we set num_layers=1 in the LSTM, or stack a second LSTMCell in the loop, the two snippets above become equivalent.

Obviously, it is easier to apply parallel computation with LSTM.
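A rough micro-benchmark sketch of this claim (the sizes are illustrative, and the exact gap depends on hardware and whether cuDNN is in use):

import time
import torch
import torch.nn as nn

seq_len, batch, d_in, d_h = 100, 32, 10, 20
x = torch.randn(seq_len, batch, d_in)

lstm = nn.LSTM(d_in, d_h)
cell = nn.LSTMCell(d_in, d_h)

t0 = time.perf_counter()
with torch.no_grad():
    lstm(x)                                   # fused loop inside nn.LSTM
t1 = time.perf_counter()
with torch.no_grad():
    hx = cx = torch.zeros(batch, d_h)
    for t in range(seq_len):                  # Python-level loop, one step at a time
        hx, cx = cell(x[t], (hx, cx))
t2 = time.perf_counter()
print(f"nn.LSTM: {t1 - t0:.4f}s, LSTMCell loop: {t2 - t1:.4f}s")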

Curious answered 17/4, 2021 at 11:7 Comment(3)
This is wrong. In the LSTM case, the weights are still shared across time steps. – Murrumbidgee
@Murrumbidgee Weights are shared across time in all recurrent cells, whether RNN, LSTM, or GRU. But I'm comparing LSTM and LSTMCell: LSTM applies more cells, so it can exploit parallel computation, while a single Cell can only be computed in a loop. – Curious
LSTMCell takes (input, (h_0, c_0)) with shapes ((batch, input_size), ((batch, hidden_size), (batch, hidden_size))), or unbatched (input_size, (hidden_size, hidden_size)). Make sure the input shape in the loop matches. – Kindergarten