What's the difference between Conv layer and Pooling layer in CNN?
D

3

17

Pooling can be considered a kind of convolution, whether it's max or average pooling, right?

The difference is that conv has parameters to optimize, but pooling doesn't, right? E.g. the weights of the pooling filter are not changed during learning.

I'd also like to know the difference between the aims of conv and pooling.

Why do we use each layer? What happens if we leave one of them out?

Damning answered 19/4, 2017 at 2:36 Comment(0)
E
19

Convolutional layer

The convolutional layer serves to detect (multiple) patterns in multiple sub-regions of the input field using receptive fields.

Pooling layer

The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters and amount of computation in the network, and hence to also control overfitting.

The intuition is that the exact location of a feature is less important than its rough location relative to other features.


Also, you said 'weights that filter in pooling has are not changed during learning', but there don't always have to be weights. For example, a MAX_POOLING layer needs no weights:

[Image: max pooling example]
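To make the "no weights" point concrete, here is a minimal NumPy sketch of max pooling. The function name `max_pool2d`, the 2x2 window with stride 2, and the 4x4 input values are my own assumptions for illustration, not taken from the figure.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2-D array. Note there is nothing to learn here."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])

pooled = max_pool2d(feature_map)
print(pooled)  # [[6 8]
               #  [3 4]]
```

The 4x4 input shrinks to 2x2, and each output keeps only the strongest value in its window, not its exact position.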

Eudemonia answered 19/4, 2017 at 9:5 Comment(0)
W
12

A conv layer has parameters to learn (the weights you update at each step), whereas the pooling layer does not: it just applies a given function, e.g. the max function.
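As a rough illustration, you can count the learnable values directly. This is only a sketch: the helper names and the example layer sizes (3 input channels, 16 filters, 3x3 kernels) are made up, and it assumes a standard 2-D conv layer with one bias per filter.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable values in a standard 2-D conv layer (assumed square kernels)."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

def pool2d_param_count():
    """Max/average pooling applies a fixed function, so nothing is learned."""
    return 0

# Hypothetical layer: 3 input channels, 16 filters of size 3x3.
conv_params = conv2d_param_count(3, 16, 3)  # 16*3*3*3 weights + 16 biases = 448
pool_params = pool2d_param_count()          # 0
print(conv_params, pool_params)
```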

Walton answered 23/11, 2018 at 16:49 Comment(0)
M
4

The difference can be summarized in (1) how you compute them and (2) what they are used for.

  1. How you compute them:

Take, for example, input data that is a (5x5) matrix (think of an image of 5 by 5 pixels). The pooling layer and the convolution layer are operations that are applied to each of the input "pixels". Let's take a pixel in the center of the image (to avoid discussing what happens at the corners, which I elaborate on later) and define a (3x3) "kernel" for both the pooling layer and the convolution layer.

Pooling layer: you superimpose the pooling kernel on the input pixel (in the figure, you put the center of the blue matrix on top of the black X_00) and take the maximum.

Convolutional layer: you superimpose the convolutional kernel on the input pixel (in the figure, you put the center of the orange matrix on top of the black X_00) and then perform the element-wise multiplication followed by the summation, as indicated in the figure.

Where are the convolution coefficients, F_.., taken from? They are learned when training the network. For max pooling, you do not have to learn anything; you just take the maximum. You can think of max pooling as being like a convolution with fixed coefficients that takes the maximum instead of summing.

You perform this for each input element. What happens at the corners and edges of the input image depends on your choice: discard the input elements at the sides/corners, pad, etc. Also, you need not move continuously, pixel by pixel; you can jump several pixels at a time (a stride), etc.
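The computation described above can be sketched in NumPy as follows. Everything here is illustrative: the input values, the kernel coefficients (a stand-in for learned F values), and the helper names `slide`, `max_fn`, and `conv_fn` are assumptions of mine, not part of the figure.

```python
import numpy as np

# Hypothetical 5x5 input; the values are arbitrary.
x = np.arange(25, dtype=float).reshape(5, 5)

# Stand-in for the learned coefficients F; in a real network they come from training.
conv_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

def slide(x, size, stride, window_fn):
    """Apply window_fn to each size x size window, moving `stride` pixels at a time."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = window_fn(x[r:r + size, c:c + size])
    return out

def max_fn(window):
    # Max pooling: nothing learned, just take the maximum.
    return window.max()

def conv_fn(window):
    # Element-wise multiplication, then summation (no kernel flip, as described above).
    return (window * conv_kernel).sum()

# Center pixel only: the 3x3 window centered on x[2, 2].
center = x[1:4, 1:4]
pooled_center = max_fn(center)   # 18.0
conv_center = conv_fn(center)    # -6.0

# Border handling and step size are choices:
valid_out = slide(x, 3, 1, conv_fn)              # discard borders: 3x3 output
padded_out = slide(np.pad(x, 1), 3, 1, conv_fn)  # zero-pad by 1: 5x5 output
strided_out = slide(x, 3, 2, max_fn)             # jump 2 pixels: 2x2 output
```

The same `slide` loop serves both layers; only the function applied to each window differs, which is the sense in which pooling resembles a convolution with nothing to learn.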

  2. What they are used for: max pooling reduces the size of the input and performs a kind of summarization of the data, and at the same time provides some invariance to translations (e.g. if the object moves left-right or up-down). Convolution, depending on the conditions on the filter coefficients (e.g. one column must be negative while another is positive), can be regarded as a filter that extracts patterns such as vertical lines, horizontal lines, etc.

[Figure: input image, max_pool_kernel, conv_kernel]
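Both uses can be checked on a toy case. This is a sketch under my own assumptions: a hand-picked kernel with a positive center column and negative side columns (in a network it would be learned), 6x6 images containing a single line, and 2x2 pooling with stride 2. The invariance shown only holds for a shift small enough to stay inside one pooling window.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide the kernel over x (borders discarded): multiply element-wise and sum."""
    k = kernel.shape[0]
    out = np.empty((x.shape[0] - k + 1, x.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + k, j:j + k] * kernel).sum()
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    out = np.empty((x.shape[0] // size, x.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

# Hand-picked vertical-line detector: positive center column, negative sides.
vertical_kernel = np.array([[-1.0, 2.0, -1.0]] * 3)

def vertical_line(col, size=6):
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img

horizontal = np.zeros((6, 6))
horizontal[2, :] = 1.0

resp_a = conv2d_valid(vertical_line(1), vertical_kernel)  # each row: [ 6, -3,  0, 0]
resp_b = conv2d_valid(vertical_line(2), vertical_kernel)  # each row: [-3,  6, -3, 0]
resp_h = conv2d_valid(horizontal, vertical_kernel)        # all zeros: no response

pool_a = max_pool2d(resp_a)  # [[6, 0], [6, 0]]
pool_b = max_pool2d(resp_b)  # [[6, 0], [6, 0]]
print(np.array_equal(pool_a, pool_b))  # True
```

The filter responds strongly to the vertical line and not at all to the horizontal one. Moving the line by one pixel changes the conv output but not the pooled output; a larger shift that crosses a pooling window boundary would change it.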

Maltreat answered 25/1, 2021 at 7:4 Comment(1)
Is there any way (weight values followed by a non-linearity such as ReLU) to make the conv exactly the same as max pool? - Fright
