Calculate the output size in convolution layer [closed]

Asked 2/12, 2018 at 12:9 Answered 21/1, 2021 at 19:39

machine-learning deep-learning pytorch conv-neural-network

102

How do I calculate the output size in a convolution layer?

For example, I have a 2D convolution layer that takes a 3x128x128 input and has 40 filters of size 5x5.

Industrious answered 2/12, 2018 at 12:9 Comment(2)

I’m voting to close this question because it is not about programming as defined in the help center but about ML theory and/or methodology - please see the intro and NOTE in the machine-learning tag info. – Paget 30/9, 2021 at 9:40

According to Pytorch Conv2D docs, Lout = ⌊ (Lin + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1 ⌋, where Lin is input length/width/height, Lout is output length. – Acrefoot 12/12, 2023 at 5:1

190

You can use this formula: [(W−K+2P)/S]+1, where the square brackets mean rounding down (floor).

W is the input size (width or height) - in your case 128
K is the kernel size - in your case 5
P is the padding - in your case 0, I believe
S is the stride - which you have not provided.

So, we input into the formula:

Output_Size = (128-5+0)/1+1 = 124 (per spatial dimension)

Output_Shape = (124,124,40)

NOTE: Stride defaults to 1 if not provided and the 40 in (124, 124, 40) is the number of filters provided by the user.
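The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `conv_output_size` and its argument names are made up here, and a stride of 1 and padding of 0 are assumed because the question does not give them.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output length along one spatial axis: floor((W - K + 2P) / S) + 1."""
    # Integer floor division implements the rounding-down step.
    return (w - k + 2 * p) // s + 1


# Values from the question: 128x128 input, 5x5 kernel, 40 filters.
side = conv_output_size(128, 5)      # padding 0 and stride 1 are assumed
output_shape = (side, side, 40)      # last entry is the number of filters
print(output_shape)                  # (124, 124, 40)
```

The same function is applied once per spatial dimension, so a non-square input or kernel would simply need two calls with different arguments.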

Cornejo answered 2/12, 2018 at 12:16 Comment(10)

Further reading: en.wikipedia.org/wiki/… – Sirois 2/12, 2018 at 12:32

what if the calculated size wasn't an integer number? how should the number be rounded? – Isidora 24/6, 2020 at 21:1

@Isidora I just ran a small test and it seems to round down in my case. Feel free to create a model with an input shape of 224 and replicate! – Cornejo 12/7, 2020 at 23:52

Doesn't the number of input channels have an effect? – Garay 13/10, 2020 at 17:57

@PyWalker2797 afaik it doesn't, as the operations are applied to the input plane of each channel in the same way, no matter the number of input channels. – Cornejo 13/10, 2020 at 22:34

The square brackets "[ ]" should in fact be the floor function – Pacificism 16/12, 2020 at 21:44

@Isidora - In this wiki link it says: "If this number is not an integer, then the strides are incorrect and the neurons cannot be tiled to fit across the input volume in a symmetric way." – Enchondroma 25/6, 2021 at 7:3

Thank you. Is there a proof for this formula please? – Kimkimball 25/2, 2022 at 6:42

Shouldn't it be: Math.roundUp(W−K+2P+1/S)? – Groovy 12/6, 2022 at 16:54

@asalimih: The AlexNet has the input of size 224x224x3, filter of size 11x11x3 with stride 4 in the first convolutional layer. The spatial size of the next layer is 54.25. I guess padding could be used in the case. – Dioptric 20/12, 2023 at 3:31
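The rounding question raised in these comments can be checked with a small sketch. It uses the AlexNet numbers quoted above (input 224, kernel 11, stride 4) and compares the exact value of the formula with its floored value, following the floor in the PyTorch formula quoted under the question. The helper names `exact_size` and `floored_size` are hypothetical, and the padding of 2 in the last line is only an assumed value for illustration.

```python
from fractions import Fraction


def exact_size(w, k, p=0, s=1):
    """Formula value before rounding, kept as an exact fraction."""
    return Fraction(w - k + 2 * p, s) + 1


def floored_size(w, k, p=0, s=1):
    """Formula value with the division rounded down."""
    return (w - k + 2 * p) // s + 1


print(float(exact_size(224, 11, s=4)))    # 54.25
print(floored_size(224, 11, s=4))         # 54
print(floored_size(224, 11, p=2, s=4))    # 55 (padding of 2 is an assumption)
```

A fractional exact value means the last window position does not fit, so rounding down drops it and a few input pixels at the border go unused.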

You can find it in two ways. Simple method: input_size - (filter_size - 1)

W - (K-1)
Here W = Input size
     K = Filter size

This shortcut only holds for a stride of 1 and no padding. The second method is the standard way to find the output size.

Second method: (((W - K + 2P)/S) + 1)
Here W = Input size
     K = Filter size
     S = Stride
     P = Padding
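As a quick sanity check, the two methods can be compared in Python. The function names `simple_size` and `general_size` are made up for this sketch, and the division in the second method is rounded down, which is an assumption the answer itself does not state.

```python
def simple_size(w, k):
    """First method: W - (K - 1)."""
    return w - (k - 1)


def general_size(w, k, p=0, s=1):
    """Second method: (W - K + 2P) / S + 1, with the division rounded down."""
    return (w - k + 2 * p) // s + 1


# With stride 1 and no padding both methods give the same result.
for w, k in [(128, 5), (32, 3), (7, 7)]:
    assert simple_size(w, k) == general_size(w, k, p=0, s=1)

print(simple_size(128, 5), general_size(128, 5))        # 124 124
# With any other stride the simple method no longer applies.
print(simple_size(128, 5), general_size(128, 5, s=2))   # 124 62
```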

Riancho answered 21/1, 2021 at 19:39 Comment(2)

For other readers, you can do a WolframAlpha computation of this formula to quickly check the effect of some of these parameters. – Thetis 12/7, 2021 at 10:44

Is there a derivation for this equation? I can't understand the logic behind even if it works. – Voelker 8/2 at 3:2

Let me start simple: since you have square matrices for both input and filter, let me take one dimension. Then you can apply the same to the other dimension(s). Imagine you are building fences between trees: if there are N trees, you have to build N-1 fences. Now apply that analogy to convolution layers.

Your output size will be: input size - filter size + 1

Because, like the fences I mentioned, your filter can only take (input size - filter size) steps from its first position, which gives one more position than steps.

Let's calculate your output with that idea: 128 - 5 + 1 = 124. The same holds for the other dimension, so now you have a 124 x 124 image.

That is for one filter.

If you apply this 40 times you will have another dimension: 124 x 124 x 40

Here is a great guide if you want to know more about advanced convolution arithmetic: https://arxiv.org/pdf/1603.07285.pdf
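The counting argument can also be made concrete by listing every position where the filter fits along one dimension. This is a minimal sketch, not part of the original answer; `window_starts` is a hypothetical helper, and the stride argument is added only to show how the count changes.

```python
def window_starts(w, k, s=1):
    """Start indices at which a length-k window fits inside a length-w input."""
    # The last valid start is w - k, so the range stops just after it.
    return list(range(0, w - k + 1, s))


starts = window_starts(128, 5)
print(starts[0], starts[-1], len(starts))   # 0 123 124
print(len(window_starts(128, 5, s=2)))      # 62
```

The number of start positions is exactly the output size along that dimension, which is where input size - filter size + 1 comes from when the stride is 1.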

Fimble answered 13/11, 2020 at 15:7 Comment(0)

Formula : n[i]=(n[i-1]−f[i]+2p[i])/s[i]+1

where,

n[i-1]=128

f[i]=5

p[i]=0

s[i]=1

so,

n[i]=(128-5+0)/1+1 =124

So the size of the output layer is 124x124x40, where '40' is the number of filters.
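Because the formula is written per layer (n[i] from n[i-1]), it can be applied repeatedly to follow the size through several layers. The sketch below assumes the division is rounded down. The first layer uses the values from the question; the second layer (f=3, p=1, s=2) is purely hypothetical and only there to show the chaining.

```python
def next_size(n_prev, f, p=0, s=1):
    """n[i] = floor((n[i-1] - f[i] + 2 p[i]) / s[i]) + 1."""
    return (n_prev - f + 2 * p) // s + 1


layers = [
    {"f": 5, "p": 0, "s": 1},   # layer from the question
    {"f": 3, "p": 1, "s": 2},   # hypothetical second layer, for illustration
]

n = 128
sizes = [n]
for layer in layers:
    n = next_size(n, layer["f"], layer["p"], layer["s"])
    sizes.append(n)

print(sizes)   # [128, 124, 62]
```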

Attractant answered 21/2, 2019 at 10:17 Comment(0)

(124*124*3)*40 = 1845120

width = 124
height = 124
depth = 3
no. of filters = 40
stride = 1
padding = 0

Sculpt answered 10/4, 2020 at 17:27 Comment(0)
