Hidden size vs input size in RNN

Premise 1:

Regarding neurons in an RNN layer: it is my understanding that at "each time step, every neuron receives both the input vector x(t) and the output vector from the previous time step y(t–1)" [1]:

https://static.mcmap.net/file/mcmap/ZG-AbGLDKwfnZ7-ocV9QWRft/ebolotin6/ebolotin6.github.io/blob/master/images/rnn.png
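In code, a single step of such a layer looks roughly like this (a hand-rolled sketch of the recurrence, not PyTorch's implementation; all sizes here are illustrative):

import torch

input_size, hidden_size, batch = 3, 3, 2
W_x = torch.randn(hidden_size, input_size)   # weights applied to the input
W_y = torch.randn(hidden_size, hidden_size)  # weights applied to the previous output
b = torch.zeros(hidden_size)

x_t = torch.randn(batch, input_size)       # input vector at time t
y_prev = torch.zeros(batch, hidden_size)   # output vector from time t-1

# every "neuron" sees both x(t) and y(t-1)
y_t = torch.tanh(x_t @ W_x.T + y_prev @ W_y.T + b)
print(y_t.shape)  # torch.Size([2, 3])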

Premise 2:

It is also my understanding that in PyTorch's GRU layer, input_size and hidden_size mean the following [2]:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h

So naturally, hidden_size should represent the number of neurons in a GRU layer.
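As a quick check on this reading (a minimal sketch; the sizes are illustrative), the shapes of a GRU's weight matrices show exactly where input_size and hidden_size enter:

import torch.nn as nn

# A GRU whose input and hidden features are both 3, as in the question below.
gru = nn.GRU(input_size=3, hidden_size=3)

# weight_ih_l0 multiplies the input x(t); weight_hh_l0 multiplies the previous
# hidden state. Each stacks the reset/update/new gates, hence the factor of 3.
print(gru.weight_ih_l0.shape)  # torch.Size([9, 3]) = (3 * hidden_size, input_size)
print(gru.weight_hh_l0.shape)  # torch.Size([9, 3]) = (3 * hidden_size, hidden_size)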

My question:

Given the following GRU layer:

# assume that hidden_size = 3

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size)

Assuming a hidden_size of 3, my understanding is that the GRU layer above would have 3 neurons, each of which accepts an input vector of size 3 at every timestep.

My question is: why do the arguments hidden_size and input_size have to be equal? That is, why can't each of the 3 neurons accept, say, an input vector of size 5?

Case in point: both of the following produce a size mismatch:

self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size-1)
self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size+1)

[1] Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow (p. 388). O'Reilly Media. Kindle Edition.

[2] PyTorch documentation, torch.nn.GRU: https://pytorch.org/docs/stable/nn.html#torch.nn.GRU


Adding full code for reproducibility:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size-1)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.

        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)

        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 5
    encoder = Encoder(src_dictionary_size=5, hidden_size=hidden_size)

    # maximum word count
    max_seq_length = 4

    # num sentences
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)

    # These are padded source sequences: 2 sentences (batch_size = 2), each padded to a maximum of 4 words.
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)

    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
Nitro answered 4/12, 2019 at 18:23
Just tried this and did not receive any error. Perhaps your input data shape does not match the expected shape. – Amaliaamalie
Just updated my question to include code for reproducibility. There is clearly a shape mismatch, but I'm struggling to understand why. – Nitro

I just resolved this and the mistake was self-inflicted.

Conclusion: input_size and hidden_size can differ, and there is no inherent problem with this. The premises in the question are stated correctly.

The problem with the (full) code above was that the initial hidden state fed to the GRU did not have the correct dimensions: the initial hidden state must have the same shape as the hidden states the GRU produces, namely (num_layers, batch_size, hidden_size). In my case it had shape (1, 2, 5) instead of (1, 2, 4); the 5 was the dimensionality of the embedding vector (the GRU's input_size), while 4 was the GRU's hidden_size (its number of neurons).
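To see the point in isolation (a minimal sketch with arbitrary sizes, separate from the encoder): a GRU with input_size=5 and hidden_size=4 runs without complaint, provided the initial hidden state is sized by hidden_size rather than input_size:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=4)  # input and hidden sizes differ

x = torch.randn(7, 2, 5)   # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 2, 4)  # (num_layers, batch_size, hidden_size) -- 4, not 5

output, h_n = gru(x, h0)
print(output.shape)  # torch.Size([7, 2, 4])
print(h_n.shape)     # torch.Size([1, 2, 4])

The corrected encoder code is below: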

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, input_size)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.

        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)

        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 4
    embedding_size = 5
    encoder = Encoder(src_dictionary_size=5, input_size=embedding_size, hidden_size=hidden_size)
    print(encoder)

    max_seq_length = 4
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)

    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
Nitro answered 4/12, 2019 at 19:24
