Why is the convolutional filter flipped in convolutional neural networks? [closed]
N

3

16

I don't understand why there is the need to flip filters when using convolutional neural networks.

According to the lasagne documentation,

flip_filters : bool (default: True)

Whether to flip the filters before sliding them over the input, performing a convolution (this is the default), or not to flip them and perform a correlation. Note that for some other convolutional layers in Lasagne, flipping incurs an overhead and is disabled by default – check the documentation when using learned weights from another layer.

What does that mean? I never read about flipping filters when convolving in any neural network book. Would someone clarify, please?

Nightwalker answered 17/7, 2017 at 19:47 Comment(2)
It's some sort of correction for different layer types, see github.com/Lasagne/Recipes/issues/39 (Championship)
I’m voting to close this question because it's an ML theory question, and not about programming. (Islam)
R
22

The underlying reason for flipping a convolutional filter is the definition of the convolution operation, which comes from signal processing. When performing a convolution, the kernel is flipped along each axis over which you convolve; if you don't flip it, you are computing a cross-correlation of the signal with the kernel rather than a convolution. It's a bit easier to understand if you think about applying a 1D convolution to a time series in which the function in question changes very sharply: you don't want your convolution to be skewed by, or correlated with, your signal.
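For reference, the two operations can be written side by side in the 1D discrete case. The symbols here (x for the input signal, w for the filter, y for the output) are chosen for this sketch and do not come from the Lasagne documentation.

```latex
% x: input signal, w: filter (kernel), y: output (notation assumed for this sketch)
\begin{align}
  \text{convolution:}       \quad y_{\mathrm{conv}}[n] &= (x * w)[n] = \sum_{m} w[m]\, x[n - m] \\
  \text{cross-correlation:} \quad y_{\mathrm{corr}}[n] &= \sum_{m} w[m]\, x[n + m]
\end{align}
```

Substituting m with -m in the second sum shows that cross-correlation with w is the same as convolution with the flipped filter w[-m], which is exactly the flip that the flip_filters option controls.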

This answer from the digital signal processing stack exchange site gives an excellent explanation that walks through the mathematics of why convolutional filters are defined to go in the reverse direction of the signal.

This page walks through a detailed example where the flip is done. This is a particular type of filter used for edge detection called a Sobel filter. It doesn't explain why the flip is done, but is nice because it gives you a worked-out example in 2D.

I mentioned that it is a bit easier to understand the why (as in, why convolution is defined this way) in the 1D case (the answer from the DSP SE site is really a great explanation), but this convention applies to 2D and 3D as well (the Conv2DDNN and Conv3DDNN layers both have the flip_filters option). Ultimately, however, because the convolutional filter weights are not programmed by a human but are "learned" by the network, the choice is entirely arbitrary, unless you are loading weights from another network, in which case you must be consistent with the definition of convolution in that network. If convolution was defined correctly (i.e., according to convention), the filter will be flipped. If it was defined incorrectly (in the more "naive" and "lazy" way), it will not.

The broader field that convolutions are a part of is "linear systems theory" so searching for this term might turn up more about this, albeit outside the context of neural networks.

Note that the convolution/correlation distinction is also mentioned in the docstrings in Lasagne's corrmm.py module:

flip_filters : bool (default: False) Whether to flip the filters and perform a convolution, or not to flip them and perform a correlation. Flipping adds a bit of overhead, so it is disabled by default. In most cases this does not make a difference anyway because the filters are learnt. However, flip_filters should be set to True if weights are loaded into it that were learnt using a regular :class:lasagne.layers.Conv2DLayer, for example.

Restaurateur answered 22/10, 2017 at 9:44 Comment(0)
A
5

I never read about flipping filters when convolving in any neural network book.

You can try a simple experiment. Take an image whose centermost pixel has value 1 and all other pixels have value 0. Now take any filter smaller than the image (say, a 3 by 3 filter with values from 1 to 9). Now do a simple correlation instead of a convolution. You end up with the flipped filter as the output of the operation.

Now flip the filter yourself and then do the same operation. You obviously end up with the original filter as the output.
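The experiment described above can be sketched in NumPy. This is a minimal illustration rather than code from any library discussed here: the helpers `correlate2d_same` and `convolve2d_same` are hypothetical names written for this example, and they assume an odd-sized kernel and zero padding.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Slide the kernel over the image WITHOUT flipping it (cross-correlation)."""
    kh, kw = kernel.shape
    # Zero-pad so the output has the same shape as the input (odd kernel sizes assumed).
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_same(image, kernel):
    """True convolution: flip the kernel along both axes, then correlate."""
    return correlate2d_same(image, kernel[::-1, ::-1])

# 5x5 image with a single 1 at the centre, and a 3x3 filter holding 1..9.
image = np.zeros((5, 5))
image[2, 2] = 1.0
kernel = np.arange(1, 10, dtype=float).reshape(3, 3)

corr_out = correlate2d_same(image, kernel)
conv_out = convolve2d_same(image, kernel)

print(corr_out[1:4, 1:4])  # the filter flipped along both axes
print(conv_out[1:4, 1:4])  # the original filter
```

The centre 3 by 3 region of the correlation output is the filter rotated by 180 degrees, while the convolution output reproduces the filter unchanged, matching the "multiplying by 1" intuition.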

The second operation somehow seems neater. It is like multiplying by 1 and getting the same value back. However, the first one is not necessarily wrong. It works most of the time even though it may not have nice mathematical properties. After all, why would the program care whether the operation is associative or not? It just does the job it is told to do. Moreover, the filter could be symmetrical: flipping it returns the same filter, so the correlation and convolution operations return the same output.

Is there a case where these mathematical properties help? Well sure, they do! If (ab)c is not equal to a(bc), then I wouldn't be able to combine 2 filters and then apply the result to an image. To clarify, imagine I have 2 filters 'a' and 'b' and an image 'c'. In the case of correlation, I would have to first apply 'b' to the image 'c' and then apply 'a' to that result. In the case of convolution, I could combine 'a' and 'b' first and then apply the combined filter to the image 'c'. If I have a million images to process, the efficiency gained by combining the filters 'a' and 'b' starts becoming obvious.
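The associativity argument can be checked numerically. The sketch below uses 1D arrays and NumPy's `np.convolve` and `np.correlate` for brevity; the arrays `a`, `b`, and `c` are made-up values, with both filters deliberately asymmetric so that flipping matters.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])               # asymmetric filter
b = np.array([0.0, 1.0, 4.0])               # asymmetric filter
c = np.array([2.0, -1.0, 0.0, 5.0, 3.0])    # the "image" (a 1D signal here)

# Convolution: applying b then a equals one pass with the pre-combined filter.
conv_sequential = np.convolve(np.convolve(c, b), a)
conv_combined = np.convolve(c, np.convolve(a, b))

# Correlation: pre-combining the filters by correlation does not give the same result.
corr_sequential = np.correlate(np.correlate(c, b, mode="full"), a, mode="full")
corr_combined = np.correlate(c, np.correlate(b, a, mode="full"), mode="full")

print(np.allclose(conv_sequential, conv_combined))  # convolution is associative
print(np.allclose(corr_sequential, corr_combined))  # correlation is not
```

With convolution, the combined filter `np.convolve(a, b)` can be computed once and reused for every image. Only one way of combining the filters by correlation is tried here; combining them in the other order also fails to match in general unless the filters are symmetric.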

Every mathematical property that convolution satisfies gives certain benefits, and hence if we have a choice (and we certainly do) we should prefer convolutions to correlations. The only difference between them is that in convolution we flip the filter before doing the multiplication, whereas in correlation we do the multiplication directly.

Applying convolution satisfies the mathematician inside all of us and also gives us some tangible benefits.

Though nowadays feature engineering on images is done end-to-end by Mrs DL herself and we need not even bother about it, there are other traditional image operations that may need this kind of operation.

Auckland answered 4/3, 2022 at 12:37 Comment(0)
M
1

Firstly, since CNNs are trained from scratch rather than designed by hand, if the flip were necessary the network would simply learn the flipped filters, and cross-correlation with those flipped filters is what would be implemented. Secondly, flipping is necessary in 1D time-series processing, because past inputs affect the current system output given the "current" input. But in 2D/3D spatial convolution over images there is no concept of "time", so there is no "past" input and no impact of it on "now". Therefore we don't need to consider the relationship between "signal" and "system"; there is only the relationship between "signal" (image patch) and "signal" (image patch), which means we only need cross-correlation instead of convolution (although DL borrows this concept from signal processing). Therefore, the flip operation is actually not needed. (I guess.)

Mastic answered 9/3, 2021 at 3:32 Comment(0)
