Order of layers in hidden states in PyTorch GRU return

This is the API I am looking at: https://pytorch.org/docs/stable/nn.html#gru

It outputs:

  1. output of shape (seq_len, batch, num_directions * hidden_size)
  2. h_n of shape (num_layers * num_directions, batch, hidden_size)

For a GRU with more than one layer, I wonder how to fetch the hidden state of the last layer: should it be h_n[0] or h_n[-1]?

And if it's bidirectional, how do I slice h_n to obtain the last layer's hidden states for both directions?

Armenian answered 17/1, 2019 at 18:28 Comment(1)
I think it's h_n[-1]. Just confirmed it myself. – Armenian

The documentation for nn.GRU is clear about this. Here is a breakdown to make it more explicit:

For a unidirectional GRU/LSTM with more than one hidden layer:

output - contains the output features (i.e. the hidden states of the last layer) for every timestep t
h_n - contains the hidden state of every layer at the last timestep

To get the hidden state of a given layer at the last timestep, index into h_n along its first dimension, which runs from the first (bottom) layer to the last (top) layer:

    first_hidden_layer_last_timestep = h_n[0]
    last_hidden_layer_last_timestep = h_n[-1]

(The subscript n in h_n refers to the last timestep, i.e. the sequence length.)
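
Here is a minimal sketch to confirm the ordering (the sizes below are arbitrary and chosen only for illustration; the default batch_first=False layout is assumed):

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
    gru = nn.GRU(input_size, hidden_size, num_layers)  # unidirectional

    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)

    print(output.shape)  # torch.Size([5, 3, 20]): last layer, every timestep
    print(h_n.shape)     # torch.Size([2, 3, 20]): every layer, last timestep

    # The last layer's hidden state at the final timestep appears in both
    # tensors, so it is h_n[-1] (not h_n[0]) that holds the last layer:
    assert torch.allclose(h_n[-1], output[-1])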


This ordering also follows from the docs' description of the num_layers parameter:

num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results.

So it is natural and intuitive that the returned hidden states are stacked in the same order: h_n[0] is the first (bottom) layer and h_n[-1] is the last (top) layer.
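
The question also asks about the bidirectional case. The layer ordering is the same; to keep the directions straight, you can separate the layer and direction axes of h_n's (num_layers * num_directions, batch, hidden_size) shape with a view. A sketch under the same illustrative assumptions:

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
    gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)

    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)  # h_n: (num_layers * 2, batch, hidden_size)

    # Split the fused first axis into (layer, direction):
    h_n = h_n.view(num_layers, 2, batch, hidden_size)
    last_fwd = h_n[-1, 0]  # last layer, forward direction
    last_bwd = h_n[-1, 1]  # last layer, backward direction

    # Cross-check against output, whose feature axis is [forward | backward]:
    assert torch.allclose(last_fwd, output[-1, :, :hidden_size])  # forward run ends at t = seq_len - 1
    assert torch.allclose(last_bwd, output[0, :, hidden_size:])   # backward run ends at t = 0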

Popp answered 18/1, 2019 at 4:23 Comment(7)
I think what you were describing is output, the "tensor containing the output features h_t from the last layer of the GRU". h_n only contains the hidden state at the last timestep of every layer. Note the dimensions of these tensors. – Armenian
You're right, I meant to write output; I have now updated the answer. Thanks for correcting me! – Popp
It wasn't obvious to me. How do you know it's not the other way around, i.e. that the first layer is h_n[-1]? – Armenian
I have added some explanation based on the parameter descriptions from the docs. +1 – Popp
Intuition could be wrong, so I confirmed it myself: the last layer's hidden state in h_n is supposed to equal the output at the last timestep, and h_n[-1] is the one that matches. – Armenian
@Armenian good! I want to ask a follow-up question. For example, for a two-layer GRU, if we get the hidden state as a tensor of shape torch.Size([2, 1, 1500]), how can we get a single vector out of this, say for the last hidden layer? Should we reshape first and then take a mean to get a 1D vector? – Popp
Let us continue this discussion in chat. – Armenian
