Understanding input and output size for Conv2d

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

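The 5 * 5 in `16 * 5 * 5` is the spatial size of the image itself (i.e. Height x Width) at the point where it enters fc1; the 16 is the number of channels produced by conv2.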

Unpadded convolutions

Unless you pad your image with zeros, a convolutional filter (at the default stride of 1) shrinks the output image by filter_size - 1 along both the height and the width:


[Figure: a 3-filter takes a 5x5 image to a (5-(3-1)) x (5-(3-1)) image]
[Figure: zero padding preserves image dimensions]
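
The rule illustrated in the figures can be written as a one-line helper. This is only a sketch, assuming no dilation; `conv_out_size` is a hypothetical name chosen for illustration, not a PyTorch function.

```python
def conv_out_size(size, kernel_size, padding=0, stride=1):
    """Output length along one spatial axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_out_size(5, 3))             # 3: a 3-filter shrinks 5 by 3 - 1
print(conv_out_size(5, 3, padding=1))  # 5: one zero per side preserves the size
print(conv_out_size(32, 5))            # 28: the first conv layer in the question
```

The same function applies independently to the height and the width.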

You can add padding in PyTorch by setting Conv2d(..., padding=...).
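
As a quick check of the effect of padding, the sketch below passes a dummy tensor through two 5x5 convolutions, one unpadded and one with `padding=2`. The 32x32 input and the variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Dummy batch: 1 image, 3 channels, 32x32 pixels (values do not matter here).
x = torch.zeros(1, 3, 32, 32)

unpadded = nn.Conv2d(3, 6, kernel_size=5)           # padding defaults to 0
padded = nn.Conv2d(3, 6, kernel_size=5, padding=2)  # (5 - 1) / 2 = 2 zeros per side

unpadded_shape = tuple(unpadded(x).shape)
padded_shape = tuple(padded(x).shape)

print(unpadded_shape)  # (1, 6, 28, 28)
print(padded_shape)    # (1, 6, 32, 32)
```

For an odd kernel size k and stride 1, a padding of (k - 1) / 2 keeps the height and width unchanged.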

Chain of transformations

By the time the image reaches the first fully connected layer, it has gone through:

| Layer | Shape transformation |
|---|---|
| one conv layer (without padding) | `(h, w) -> (h-4, w-4)` |
| a MaxPool | `-> ((h-4)//2, (w-4)//2)` |
| another conv layer (without padding) | `-> ((h-4)//2 - 4, (w-4)//2 - 4) = ((h-12)//2, (w-12)//2)` |
| another MaxPool | `-> ((h-12)//4, (w-12)//4)` |
| a Flatten | `-> ((h-12)//4) * ((w-12)//4)` |

We go from the original image size of (32, 32) to (28, 28) to (14, 14) to (10, 10) to (5, 5), which flattens to 5 * 5 = 25 values per channel, or 16 * 5 * 5 = 400 features across the 16 channels.
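
The chain can also be followed in plain Python, one layer per line. This is a minimal sketch, assuming the 5x5 unpadded convolutions and 2x2 pooling from the network above; `feature_map_size` is a hypothetical helper, not part of PyTorch.

```python
def feature_map_size(h, w):
    """Apply conv -> pool -> conv -> pool to a (h, w) image size."""
    h, w = h - 4, w - 4      # conv1: 5x5 kernel, no padding
    h, w = h // 2, w // 2    # MaxPool2d(2, 2)
    h, w = h - 4, w - 4      # conv2: 5x5 kernel, no padding
    h, w = h // 2, w // 2    # MaxPool2d(2, 2)
    return h, w


h, w = feature_map_size(32, 32)
fc1_inputs = 16 * h * w      # 16 channels come out of conv2

print((h, w), fc1_inputs)                     # (5, 5) 400
print(feature_map_size(64, 64))               # (13, 13)
```

Note that a different input size changes the result: a 64x64 image would reach the flatten step as 13x13, so `fc1` would need 16 * 13 * 13 inputs instead of 16 * 5 * 5.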

To visualise this you can use the torchsummary package:

from torchsummary import summary

input_shape = (3,32,32)
summary(Net(), input_shape)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
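
The Param # column can be reproduced by hand: each layer has one weight per input connection plus one bias per output. The sketch below assumes biases are enabled (the PyTorch default); the helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # Each output channel has in_channels * k * k weights plus one bias.
    return (in_channels * kernel_size * kernel_size + 1) * out_channels


def linear_params(in_features, out_features):
    # Each output unit has in_features weights plus one bias.
    return (in_features + 1) * out_features


counts = {
    "conv1": conv2d_params(3, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")  # total: 62,006
```

The pooling layers have no parameters, which is why they show 0 in the table.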
