What does the filters parameter mean in a Conv2D layer?
I am confused by the filters parameter, which is the first parameter of the Conv2D() layer in Keras. As I understand it, filters are supposed to do things like edge detection, sharpening, or blurring the image, but when I define the model as

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

input_shape = (32, 32, 3)
num_classes = 10  # assumed value; not given in the question

model = Sequential()
# input_shape is only needed on the first layer; later layers infer their input shape
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1, 1), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu', strides=(1, 1), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5, 5), activation='relu', strides=(1, 1), padding='same'))
model.add(Flatten())
model.add(Dense(3072, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

I am not mentioning edge detection, blurring, or sharpening anywhere in the Conv2D call. The input images are 32 by 32 RGB images.

So my question is: when I define the convolution layer as Conv2D(64, ...), does this 64 mean 64 different types of filters, such as vertical edge, horizontal edge, etc., which are chosen by Keras at random? If so, is the output of the convolution layer (with 64 filters, a 5x5 kernel, and a 1x1 stride) on a 32x32 1-channel image 64 images of 28x28 each? How are these 64 images combined to form a single image for further layers?

Melloney answered 7/5, 2021 at 17:5
B
15

The filters argument sets the number of convolutional filters in that layer. These filters are initialized to small, random values, using the method specified by the kernel_initializer argument. During network training, the filters are updated in a way that minimizes the loss. So over the course of training, the filters will learn to detect certain features, like edges and textures, and they might become something like the image below (from here).

[Image: a set of CNN filters]

It is very important to realize that one does not hand-craft filters. These are learned automatically during training -- that's the beauty of deep learning.
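As a minimal sketch of the point above (assuming a working Keras install and the default kernel_initializer, which is glorot_uniform), you can build a single Conv2D layer and inspect its untrained weights. The variable names here are only for illustration:

```python
import numpy as np
from keras.layers import Conv2D

# One convolutional layer with 64 filters of size 5x5, as in the question.
layer = Conv2D(64, kernel_size=(5, 5), padding='same')

# Create the weights for a 32x32 RGB input (batch size left unspecified).
layer.build((None, 32, 32, 3))

kernel, bias = layer.get_weights()
print(kernel.shape)  # (5, 5, 3, 64): 64 filters, each 5x5 and spanning 3 input channels
print(bias.shape)    # (64,): one bias per filter

# Before training, the filter values are just small random numbers.
max_abs_weight = float(np.abs(kernel).max())
print(max_abs_weight)
```

So filters=64 means 64 filters of shape 5x5x3, none of which is an edge detector or a blur to begin with. Whatever they end up detecting comes from training.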

I would highly recommend going through some deep learning resources, particularly https://cs231n.github.io/convolutional-networks/ and https://www.youtube.com/watch?v=r5nXYc2wYvI&list=PLypiXJdtIca5sxV7aE3-PS9fYX3vUdIOX&index=3&t=3122s.

Bawbee answered 7/5, 2021 at 17:18
B
3

Just wanted to clarify what the output shape was.

Although jakub's answer was good, I don't think it addressed the "single image for further layers" part of the question.

I did a model.summary() to find out more.

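I found that the shape returned from a Conv2D is (None, img_height, img_width, num_filters) with the default channels_last data format, where None is the batch size.

So when you pass the output of the Conv2D to MaxPooling2D, you are passing that whole shape: every convolved feature map is passed along, stacked on the last axis as channels, rather than being combined into a single image. A following Conv2D treats those num_filters maps as its input channels, the same way the first layer treats the 3 RGB channels.

The other layers handle this gracefully. MaxPooling2D(2, 2) keeps the number of channels but halves the spatial size: (None, img_height / 2, img_width / 2, num_filters).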
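Here is a small sketch of those shapes, assuming a working Keras install and a dummy all-zeros image in place of real data. It also compares padding='same' (used in the question's model) with padding='valid', which is where the 28x28 size mentioned in the question would come from:

```python
import numpy as np
from keras.layers import Conv2D, MaxPooling2D

# A dummy batch holding one 32x32 RGB image.
x = np.zeros((1, 32, 32, 3), dtype="float32")

conv_same = Conv2D(64, kernel_size=(5, 5), padding="same")
conv_valid = Conv2D(64, kernel_size=(5, 5), padding="valid")
pool = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
conv_next = Conv2D(64, kernel_size=(5, 5), padding="same")

y_same = conv_same(x)      # 'same' padding keeps the 32x32 spatial size
y_valid = conv_valid(x)    # 'valid' padding shrinks it to 32 - 5 + 1 = 28
y_pool = pool(y_same)      # pooling halves height and width, keeps the channels
y_next = conv_next(y_pool)

print(tuple(y_same.shape))   # (1, 32, 32, 64)
print(tuple(y_valid.shape))  # (1, 28, 28, 64)
print(tuple(y_pool.shape))   # (1, 16, 16, 64)

# Each filter of the next layer spans all 64 incoming channels.
next_kernel_shape = conv_next.get_weights()[0].shape
print(next_kernel_shape)     # (5, 5, 64, 64)
```

The 64 feature maps are never merged into one image. The next layer's filters are 5x5x64, so each one reads all 64 maps at once.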

Side note: I wish the filters argument were named num_filters, because filters seems to imply you're passing in a list of filters with which to convolve the image.

Bargeman answered 7/4, 2022 at 21:46
