Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions on here (1, 2, 3), but I am still struggling to define the right shape for input to PyTorch conv1D.

I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.

So my input tensor to conv1D is of shape [6, 512, 768].

input = torch.randn(6, 512, 768) 

Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using the conv1D layer from PyTorch.

Understanding 1:

I assumed that "in_channels" is the embedding dimension of the conv1D layer. If so, then the conv1D layer would be defined this way, where

in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2

convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)

But with this assumption, I get the following error:

RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead
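
A minimal sketch reproducing this mismatch (shapes taken from the question; nn.Conv1d expects the channel dimension in position 1):

import torch
import torch.nn as nn

x = torch.randn(6, 512, 768)   # [batch, seq_len, embedding_dim]
conv = nn.Conv1d(768, 100, 2)  # expects input of shape [batch, 768, length]

try:
    conv(x)                    # dimension 1 is 512, not 768, hence the channel mismatch
except RuntimeError as err:
    print(err)                 # "... expected ... to have 768 channels, but got 512 channels instead"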

Understanding 2:

Then I assumed that "in_channels" is the sequence length of the input sequence. If so, then the conv1D layer would be defined this way, where

in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2

convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)

This works fine and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused. Shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?
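
For reference, a minimal sketch reproducing what Understanding 2 actually computes (this is only to show where the 767 comes from, not a recommended setup):

import torch
import torch.nn as nn

x = torch.randn(6, 512, 768)   # [batch, 512, 768]
conv = nn.Conv1d(512, 100, 2)  # dim 1 (512) is taken as channels, dim 2 (768) as the length

out = conv(x)
print(out.shape)               # torch.Size([6, 100, 767]): the kernel slides over the last dimension (768)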

I will be really grateful for your help.

Introrse answered 14/6, 2020 at 13:7 Comment(1)
In general, PyTorch nn modules take input of shape (N, C_in, *) and produce output of shape (N, C_out, *), where C_in and C_out are the channel dimensions and the * dimensions are the ones over which the operation takes place. This is true for all Conv* layers. – Pythagoras

In PyTorch, your input of shape [6, 512, 768] should actually be [6, 768, 512], where the feature (embedding) length is represented by the channel dimension and the sequence length by the length dimension. You can then define your Conv1d with in/out channels of 768 and 100 respectively to get an output of shape [6, 100, 511].

Given an input of shape [6, 512, 768] you can convert it to the correct shape with Tensor.transpose.

input = input.transpose(1, 2).contiguous()

The .contiguous() call ensures that the tensor's memory is laid out contiguously after the transpose; some later operations (for example .view()) require a contiguous tensor.
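
For completeness, a minimal end-to-end sketch of this answer (using the shapes from the question and the arbitrary 100 output channels chosen there):

import torch
import torch.nn as nn

x = torch.randn(6, 512, 768)        # [batch, seq_len, embedding_dim]
x = x.transpose(1, 2).contiguous()  # [6, 768, 512]: embedding dim becomes the channel dim

conv = nn.Conv1d(in_channels=768, out_channels=100, kernel_size=2)
out = conv(x)

print(out.shape)                    # torch.Size([6, 100, 511])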

Gey answered 14/6, 2020 at 13:54 Comment(0)
I
3

I found an answer to it (source).

So, usually, BERT outputs vectors of shape

[batch_size, sequence_length, embedding_dim].

where,

sequence_length = the number of tokens in a sequence (the maximum sequence length BERT can handle is 512)
embedding_dim = the length of the vector representing each token (768 in the case of BERT).

thus, input = torch.randn(batch_size, 512, 768)

Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.

So, we define a PyTorch conv1D layer as follows,

convolution_layer = nn.Conv1d(in_channels, out_channels, kernel_size)

where,

in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)

thus, convolution_layer = nn.Conv1d(768, 100, 2)

Now we need to reconcile the input that convolution_layer expects with the actual input:

current input shape: [batch_size, 512, 768]
expected input shape: [batch_size, 768, 512]

To get the expected shape, we use PyTorch's transpose function.

input_transposed = input.transpose(1, 2)
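
As a worked example of the steps above, here is a minimal sketch that wraps them in a small module (the name BigramConv and the default layer sizes are illustrative, not from any library):

import torch
import torch.nn as nn

class BigramConv(nn.Module):
    # Illustrative module: Conv1d over the token dimension of BERT-style output.
    def __init__(self, embedding_dim=768, out_channels=100, kernel_size=2):
        super().__init__()
        # in_channels must be the embedding dimension
        self.conv = nn.Conv1d(embedding_dim, out_channels, kernel_size)

    def forward(self, x):
        # x: [batch_size, seq_len, embedding_dim]
        x = x.transpose(1, 2)  # [batch_size, embedding_dim, seq_len]
        return self.conv(x)    # [batch_size, out_channels, seq_len - kernel_size + 1]

x = torch.randn(6, 512, 768)
print(BigramConv()(x).shape)   # torch.Size([6, 100, 511])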
Introrse answered 14/6, 2020 at 14:2 Comment(0)

I have a suggestion which may not be exactly what you asked for, but it might help. Because your input is (6, 512, 768), you can use Conv2d instead of Conv1d.

All you need to do is add a dimension of size 1 at index 1, input.unsqueeze(1), which acts as your channel dimension (think of the input as a grayscale image).

def forward(self, x):
    x = self.embedding(x)      # [batch, seq_len, embedding] = [5, 512, 768]
    x = torch.unsqueeze(x, 1)  # [5, 1, 512, 768], like a grayscale image

And your Conv2d layer can be defined like this:

window_size=3 # for trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10 # or whatever you want
self.conv = nn.Conv2d(in_channels = 1,
                      out_channels = NUM_FILTERS,
                      kernel_size = [window_size, EMBEDDING_SIZE], 
                      padding=(window_size - 1, 0))
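
To see the resulting shapes, a minimal sketch of this Conv2d approach (the batch size of 5 and NUM_FILTERS of 10 are the illustrative values used above):

import torch
import torch.nn as nn

window_size = 3     # trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10

conv = nn.Conv2d(in_channels=1,
                 out_channels=NUM_FILTERS,
                 kernel_size=(window_size, EMBEDDING_SIZE),
                 padding=(window_size - 1, 0))

x = torch.randn(5, 512, 768)  # [batch, seq_len, embedding]
x = x.unsqueeze(1)            # [5, 1, 512, 768]

out = conv(x)                 # [5, 10, 514, 1]: the kernel spans the full embedding, so the width collapses to 1
out = out.squeeze(3)          # [5, 10, 514]

print(out.shape)              # torch.Size([5, 10, 514])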
Fussbudget answered 23/11, 2020 at 3:4 Comment(0)
