Understanding input and output size for Conv2d

I'm learning image classification with PyTorch (using the CIFAR-10 dataset), following this link.

I'm trying to understand the input and output parameters of the Conv2d layers in the following code:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 pooling, stride 2 (halves height and width)
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 input channels, 16 output channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 channels * 5 * 5 spatial size after conv/pool
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes (CIFAR-10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten to (batch, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

My understanding of Conv2d (please correct me if I'm wrong or missing anything):

  • Since the image has 3 channels, the first parameter is 3; 6 is the number of filters (an arbitrary choice).
  • 5 is the kernel size, i.e. (5, 5) (also an arbitrary choice).
  • Likewise we create the next layer (the previous layer's output channels become this layer's input channels).
  • Now we create a fully connected layer with: self.fc1 = nn.Linear(16 * 5 * 5, 120)

16 * 5 * 5: here 16 is the number of output channels of the last Conv2d layer, but what is the 5 * 5?

Is it the kernel size, or something else? How do we know whether to multiply by 5*5, 4*4, 3*3, and so on?

From my research I found that, since the image size is 32*32 and a max pool of 2 is applied twice, the image size should go 32 -> 16 -> 8, so we should multiply by last_output_size * 8 * 8. But in this link it's 5*5.

Could anyone please explain?

Stephenson asked 29/3, 2021 at 6:57

These are the spatial dimensions of the feature map at that point in the network (i.e. height x width).

Unpadded convolutions

Unless you pad your image with zeros, a convolutional filter will shrink the size of your output image by filter_size - 1 across the height and width:

[Figures: a 3x3 filter takes a 5x5 image to a (5-(3-1)) x (5-(3-1)) = 3x3 image; with zero padding, the image dimensions are preserved]

You can add padding in PyTorch by setting Conv2d(padding=...).
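
As a quick illustration (a minimal sketch, not from the original answer): the standard output-size formula for a convolution or pooling layer is (size + 2*padding - kernel) // stride + 1, and you can check the effect of padding directly:

import torch
import torch.nn as nn

def out_size(size, kernel, stride=1, padding=0):
    # Standard Conv2d/MaxPool2d output-size formula (dilation = 1).
    return (size + 2 * padding - kernel) // stride + 1

x = torch.randn(1, 3, 32, 32)  # a batch of one 3-channel 32x32 image

unpadded = nn.Conv2d(3, 6, kernel_size=5)           # shrinks by kernel_size - 1
padded = nn.Conv2d(3, 6, kernel_size=5, padding=2)  # preserves the 32x32 size

print(unpadded(x).shape)           # torch.Size([1, 6, 28, 28])
print(padded(x).shape)             # torch.Size([1, 6, 32, 32])
print(out_size(32, 5))             # 28
print(out_size(32, 5, padding=2))  # 32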

Chain of transformations

Since the input has gone through:

Layer                                  Shape transformation
one conv layer (5x5, no padding)       (h, w) -> (h-4, w-4)
a MaxPool (2x2)                        (h-4, w-4) -> ((h-4)//2, (w-4)//2)
another conv layer (5x5, no padding)   ((h-4)//2, (w-4)//2) -> ((h-12)//2, (w-12)//2)
another MaxPool (2x2)                  ((h-12)//2, (w-12)//2) -> ((h-12)//4, (w-12)//4)
a Flatten                              ((h-12)//4, (w-12)//4) -> 16 * (h-12)//4 * (w-12)//4 features

For the original image size of (32,32), this goes (32,32) -> (28,28) -> (14,14) -> (10,10) -> (5,5), and flattening the 16 channels of 5x5 feature maps gives the 16 * 5 * 5 = 400 inputs of fc1.
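
You can also verify these shapes yourself by pushing a dummy batch through each stage of the Net class from the question and printing the result (a quick sketch, not part of the original answer):

import torch
import torch.nn.functional as F

net = Net()
x = torch.randn(1, 3, 32, 32)       # dummy batch: one 3-channel 32x32 image
x = net.pool(F.relu(net.conv1(x)))
print(x.shape)                      # torch.Size([1, 6, 14, 14])
x = net.pool(F.relu(net.conv2(x)))
print(x.shape)                      # torch.Size([1, 16, 5, 5])
x = x.view(-1, 16 * 5 * 5)
print(x.shape)                      # torch.Size([1, 400])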


To visualise this you can use the torchsummary package:

from torchsummary import summary

input_shape = (3,32,32)
summary(Net(), input_shape)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
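Note that torchsummary is a third-party package (installable with pip install torchsummary). In the summary above, the [-1, 16, 5, 5] output of MaxPool2d-4 is exactly what gets flattened into the 16 * 5 * 5 = 400 inputs of fc1.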
Saida answered 29/3, 2021 at 7:05
