How to count the number of layers in a CNN?

The PyTorch implementation of ResNet-18 has the following structure, which appears to contain 54 layers, not 18.

So why is it called "18"? How many layers does it actually have?


ResNet (
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (relu): ReLU (inplace)
  (maxpool): MaxPool2d (size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1))
  (layer1): Sequential (
    (0): BasicBlock (
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    )
    (1): BasicBlock (
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    )
  )
  (layer2): Sequential (
    (0): BasicBlock (
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
      (downsample): Sequential (
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
      )
    )
    (1): BasicBlock (
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    )
  )
  (layer3): Sequential (
    (0): BasicBlock (
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
      (downsample): Sequential (
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
      )
    )
    (1): BasicBlock (
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    )
  )
  (layer4): Sequential (
    (0): BasicBlock (
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
      (downsample): Sequential (
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
      )
    )
    (1): BasicBlock (
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
      (relu): ReLU (inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    )
  )
  (avgpool): AvgPool2d (
  )
  (fc): Linear (512 -> 1000)
)
Festive asked 3/4, 2017 at 2:43
It was designed to have 18 layers, probably through experimentation, so there is no "why" we can tell you. – Nichrome
Thanks, but there has to be a way to count the layers from the code. For a plain CNN we can count the layers in __init__(), but ResNet-18 has layer1 through layer4, and each of them calls _make_layer(); as in the output above, that gives 54 layers. – Festive
What I want to do is record each layer's gradient, so I need to figure out how many layers are in the net. – Festive
I think what you want and what you asked are not the same. – Nichrome

From your output, we can see that there are 20 convolution layers (one 7x7 conv, sixteen 3x3 convs, plus three 1x1 convs for the downsample shortcuts). If you ignore the 1x1 convs and count the FC (linear) layer, the number of layers is 18 (1 + 16 + 1).
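For reference, here is a minimal sketch of doing that count programmatically (an assumption on my part: it uses the torchvision resnet18 model, which matches the printout above):

import torch.nn as nn
from torchvision import models

model = models.resnet18()

# Collect the layer types the paper counts: convolutions and fully connected layers.
convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
fcs = [m for m in model.modules() if isinstance(m, nn.Linear)]

# The 1x1 convolutions are the downsample (projection) shortcuts.
projections = [m for m in convs if m.kernel_size == (1, 1)]

print(len(convs), len(fcs))                      # 20 1
print(len(convs) - len(projections) + len(fcs))  # 17 + 1 = 18

Those same module references can also be used to attach backward hooks (e.g. register_full_backward_hook in recent PyTorch versions) if, as mentioned in the comments, the real goal is to record each layer's gradient.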

I've also made an example of how to visualize your architecture in PyTorch via graphviz; I hope it will help you understand the architecture.
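For instance, here is a minimal sketch using the third-party torchviz package (an assumption on my part, one of several graphviz-based options) to render the autograd graph of a forward pass:

import torch
from torchvision import models
from torchviz import make_dot  # pip install torchviz (also needs graphviz installed)

model = models.resnet18()
x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input
y = model(x)

# Trace the forward pass and write the graph to resnet18.pdf
make_dot(y, params=dict(model.named_parameters())).render("resnet18")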

Meemeece answered 12/4, 2017 at 13:32
Thanks, so the number of layers is not strictly defined? – Festive
Basically, we usually just count the number of convolution and fully connected layers. But in ResNet, I am not sure whether the shortcut convolution layers should be counted; it is up to the authors' definition. – Meemeece

Why does ResNet-18 have 18 layers?

Well, the answer is pretty straightforward: the number of layers in a neural net is a hyperparameter (meaning you can tune it as you want). In the ResNet paper, the authors trained multiple models of various depths (e.g. 18, 34, and 50 layers) to conduct a proper study of accuracy, error rate, etc.; hence the naming convention ResNet-18, ResNet-34, ResNet-50, and so on.

Why does the architecture of ResNet-18 (that you've provided in your question) have more than 18 layers?

There are a number of ways people count the layers of a deep neural net model: some count the input/output layers as well, and some count the pooling layers.

But what the authors did in the ResNet paper is count only the convolution layers and the fully connected layers, nothing else. However, in the model architecture you've given there are more than 18 such layers! That is simply because of the 1x1 convolution layers, which the authors call projection layers; they are used only to match the input dimension (x) with the residual block's output dimension (F(x)) so that the two can be summed (y = F(x) + x). If you count without those projections (1x1 convs), you'll see there are 18 layers, hence the name ResNet-18.
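To make the projection's role concrete, here is a simplified sketch of a basic residual block in the spirit of torchvision's BasicBlock (an illustration, not the exact library code):

import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut: only needed when x and F(x) differ in shape,
        # i.e. when the block changes the spatial size or channel count.
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # y = F(x) + x

The 1x1 conv does no feature extraction of its own; it only reshapes the identity path, which is why it is natural to leave it out of the depth count.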

Venitavenite answered 29/11, 2020 at 11:32
Your explanation is good, but in ResNet-18 the original paper has no projection layers. Please review my comment and revise your answer accordingly. For ResNet-50, all conv layers including the projections, plus the fully connected layer, make the complete count of 50. Correct me if I am wrong... thank you! – Distemper
I'm quoting from the original paper: "we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions)." (page 4). So as you can see, they proposed two options: if you use zero padding, the projection isn't required and costs no extra parameters or 1x1 convolution layers; with the projection option, however, extra parameters and conv layers are incurred. – Venitavenite
