Order of layers in hidden states in PyTorch GRU return

This is the API I am looking at: https://pytorch.org/docs/stable/nn.html#gru

It outputs:

  1. output of shape (seq_len, batch, num_directions * hidden_size)
  2. h_n of shape (num_layers * num_directions, batch, hidden_size)

For a GRU with more than one layer, I wonder how to fetch the hidden state of the last layer: should it be h_n[0] or h_n[-1]?

And if it's bidirectional, how do I slice h_n to obtain the last layer's hidden states for both directions?

Armenian answered 17/1, 2019 at 18:28 Comment(1)
I think it's h_n[-1]. Just confirmed it myself. – Armenian

The documentation for nn.GRU is clear about this. Here is a breakdown to make it more explicit:

For a unidirectional GRU/LSTM with more than one hidden layer:

output - contains the output features (i.e. the hidden states of the last layer) for every timestep t
h_n - contains the hidden state of every layer at the last timestep

To get the hidden state of a given layer at the last timestep, index into h_n along its first dimension, which runs from the first (bottom) layer to the last (top) layer:

    first_hidden_layer_last_timestep = h_n[0]
    last_hidden_layer_last_timestep = h_n[-1]

(The subscript n in h_n refers to the last timestep, i.e. the sequence length.)
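
Here is a minimal sketch to confirm the ordering (the sizes below are arbitrary and chosen only for illustration; the default batch_first=False layout is assumed):

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
    gru = nn.GRU(input_size, hidden_size, num_layers)  # unidirectional

    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)

    print(output.shape)  # torch.Size([5, 3, 20]): last layer, every timestep
    print(h_n.shape)     # torch.Size([2, 3, 20]): every layer, last timestep

    # The last layer's hidden state at the final timestep appears in both
    # tensors, so it is h_n[-1] (not h_n[0]) that holds the last layer:
    assert torch.allclose(h_n[-1], output[-1])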


This ordering also follows from the docs' description of the num_layers parameter:

num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results.

So it is natural and intuitive that the returned hidden states are stacked in the same order: h_n[0] is the first (bottom) layer and h_n[-1] is the last (top) layer.
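
The question also asks about the bidirectional case. The layer ordering is the same; to keep the directions straight, you can separate the layer and direction axes of h_n's (num_layers * num_directions, batch, hidden_size) shape with a view. A sketch under the same illustrative assumptions:

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
    gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)

    x = torch.randn(seq_len, batch, input_size)
    output, h_n = gru(x)  # h_n: (num_layers * 2, batch, hidden_size)

    # Split the fused first axis into (layer, direction):
    h_n = h_n.view(num_layers, 2, batch, hidden_size)
    last_fwd = h_n[-1, 0]  # last layer, forward direction
    last_bwd = h_n[-1, 1]  # last layer, backward direction

    # Cross-check against output, whose feature axis is [forward | backward]:
    assert torch.allclose(last_fwd, output[-1, :, :hidden_size])  # forward run ends at t = seq_len - 1
    assert torch.allclose(last_bwd, output[0, :, hidden_size:])   # backward run ends at t = 0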

Popp answered 18/1, 2019 at 4:23 Comment(7)
I think what you were describing is output, the "tensor containing the output features h_t from the last layer of the GRU". h_n only contains the hidden state at the last timestep of every layer. Note the dimensions of these tensors. – Armenian
You're right, I meant to write output; I have now updated the answer. Thanks for correcting me! – Popp
It wasn't obvious to me. How do you know it's not the other way around, i.e. that the first layer is h_n[-1]? – Armenian
I have added some explanation based on the parameter descriptions from the docs. +1 – Popp
Intuition could be wrong, so I confirmed it myself: the last layer's hidden state in h_n is supposed to equal the output at the last timestep, and h_n[-1] is the one that matches. – Armenian
@Armenian good! I want to ask a follow-up question. For example, for a two-layer GRU, if we get the hidden state as a tensor of shape torch.Size([2, 1, 1500]), how can we get a single vector out of this, say for the last hidden layer? Should we reshape first and then take a mean to get a 1D vector? – Popp
Let us continue this discussion in chat. – Armenian
