Simple LSTM in PyTorch with Sequential module

In PyTorch, we can define architectures in multiple ways. Here, I'd like to create a simple LSTM network using the Sequential module.

In Lua's torch I would usually go with:

model = nn.Sequential()
model:add(nn.SplitTable(1,2))
model:add(nn.Sequencer(nn.LSTM(inputSize, hiddenSize)))
model:add(nn.SelectTable(-1)) -- last step of output sequence
model:add(nn.Linear(hiddenSize, classes_n))

However, in PyTorch, I don't find the equivalent of SelectTable to get the last output.

nn.Sequential(
  nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
  # what to put here to retrieve last output of LSTM ?,
  nn.Linear(hiddenSize, classe_n))
Truitt answered 23/5, 2017 at 9:26 Comment(0)

Define a class to extract the last time step of the LSTM's output sequence:

# nn.LSTM returns a tuple of (output, (h_n, c_n))
class extract_tensor(nn.Module):
    def forward(self, x):
        # output shape with batch_first=True: (batch, seq_len, hidden)
        tensor, _ = x
        # keep only the last time step: shape (batch, hidden)
        return tensor[:, -1, :]

nn.Sequential(
    nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
    extract_tensor(),
    nn.Linear(hiddenSize, classe_n)
)
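As a rough usage sketch (the sizes below are placeholder values I picked for illustration, not values from the question), the resulting model is used like any other Sequential model:

import torch
import torch.nn as nn

inputSize, hiddenSize, classe_n = 10, 20, 5   # placeholder sizes

model = nn.Sequential(
    nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
    extract_tensor(),
    nn.Linear(hiddenSize, classe_n)
)

x = torch.randn(8, 15, inputSize)   # (batch, seq_len, features)
logits = model(x)                   # (batch, classe_n)
print(logits.shape)                 # torch.Size([8, 5])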
Levirate answered 8/10, 2020 at 15:11 Comment(2)
Hi, how do I proceed from here? Something like: net = nn.Sequential( ... ) and then for epoch in epochs: outputs = net(inputs)? I tried the above and still got errors. – Fizz
What error do you get? Can you show me? The above is a toy example: you define the model and give values for the variables (inputSize, hiddenSize, classe_n) based on your work. – Levirate

According to the LSTM documentation, the output has a shape of (seq_len, batch, hidden_size * num_directions), so you can easily take the last element of the sequence this way:

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)          # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)     # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)        # initial hidden state
c0 = torch.randn(2, 3, 20)        # initial cell state
output, (hn, cn) = rnn(input, (h0, c0))
print(output[-1])                 # last time step: (batch, hidden_size)

Tensor manipulation and neural network design in PyTorch are considerably easier than in Torch, so you rarely have to use containers. In fact, as stated in the tutorial PyTorch for former Torch users, PyTorch is built around Autograd, so you no longer need to worry about containers. However, if you want to reuse your old Lua Torch code, you can have a look at the Legacy package.
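For instance, here is a minimal sketch (with placeholder sizes and a made-up class name, just for illustration) of the same LSTM-plus-Linear classifier written as a plain nn.Module, with no container at all:

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):   # hypothetical name, for illustration only
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        output, _ = self.lstm(x)          # output: (batch, seq_len, hidden_size)
        return self.fc(output[:, -1, :])  # last time step -> (batch, num_classes)

model = LSTMClassifier(10, 20, 5)
print(model(torch.randn(3, 7, 10)).shape)  # torch.Size([3, 5])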

Toggle answered 25/5, 2017 at 20:55 Comment(3)
I've coded an LSTM this exact way before. But my question is how you would do it within the nn.Sequential module. The LSTM returns two values, output and hn in your code; how do you retrieve output[-1] in the Sequential fashion? – Truitt
I don't think it's fair to downvote my answer; however, I have updated it with more evidence that using containers as in Lua Torch is outdated. – Toggle
According to this post, they wanted to get rid of the Sequential module in PyTorch but kept it for its convenience as a container. I guess it's not possible to access intermediate outputs within a Sequential container. – Truitt

As far as I know, there's nothing like a SplitTable or a SelectTable in PyTorch. That said, you can compose an arbitrary number of modules or blocks within a single architecture, and you can use this property to retrieve the output of a certain layer. Let's make it clearer with a simple example.

Suppose I want to build a simple two-layer MLP and retrieve the output of each layer. I can build a custom class inheriting from nn.Module:

class MyMLP(nn.Module):

    def __init__(self, in_channels, out_channels_1, out_channels_2):
        # first of all, calling base class constructor
        super().__init__()
        # now I can build my modular network
        self.block1 = nn.Linear(in_channels, out_channels_1)
        self.block2 = nn.Linear(out_channels_1, out_channels_2)

    # you MUST implement a forward(input) method whenever inheriting from nn.Module
    def forward(self, x):
        # first_out will now be your output of the first block
        first_out = self.block1(x)
        x = self.block2(first_out)
        # by returning both x and first_out, you can now access the first layer's output
        return x, first_out

In your main file you can now declare the custom architecture and use it:

import torch
from myFile import MyMLP

in_ch = out_ch_1 = out_ch_2 = 64
# some fake input instance (a batch of one sample)
x = torch.randn(1, in_ch)

my_mlp = MyMLP(in_ch, out_ch_1, out_ch_2)
# get your outputs
final_out, first_layer_out = my_mlp(x)

Moreover, you could compose two MyMLP modules in a more complex model definition and retrieve the output of each one in a similar way, as sketched below. I hope this clarifies things; if you have more questions, feel free to ask, since I may have omitted something.
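For example, a quick sketch of what such a composition could look like (the wrapper class and its attribute names are made up here for illustration):

import torch.nn as nn

class StackedMLP(nn.Module):   # hypothetical wrapper, for illustration only
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.mlp1 = MyMLP(in_ch, mid_ch, mid_ch)
        self.mlp2 = MyMLP(mid_ch, mid_ch, out_ch)

    def forward(self, x):
        out1, first1 = self.mlp1(x)      # final and first-layer output of block 1
        out2, first2 = self.mlp2(out1)   # final and first-layer output of block 2
        return out2, (first1, first2)    # expose the intermediate outputs as well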

Roam answered 8/10, 2020 at 15:37 Comment(0)
