Cropping/Scaling ImageNet Images

ImageNet images come in many different sizes, but neural networks need a fixed-size input.

One solution is to take the largest crop that fits in the image, centered on the image's center point. This works but has some drawbacks. Often, important parts of the object of interest are cut out, and there are even cases where the correct object is missing entirely while an object of a different class is visible, meaning the model is trained on a wrong label for that image.
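
To make the center-crop idea concrete, here is a minimal sketch that only computes the crop box. It assumes a square crop and a (left, top, right, bottom) box convention, which is the one Pillow's `Image.crop` accepts; the function name `center_square_box` is made up for illustration.

```python
def center_square_box(width, height):
    """Return the largest centered square crop as (left, top, right, bottom)."""
    side = min(width, height)        # largest square that fits in the image
    left = (width - side) // 2       # split the leftover width evenly
    top = (height - side) // 2       # split the leftover height evenly
    return (left, top, left + side, top + side)


# A 500x375 landscape image loses a strip on the left and right edges.
box = center_square_box(500, 375)
print(box)
```

Anything that sits only in the trimmed strips is lost, which is the drawback described above.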

Another solution would be to use the entire image and zero-pad it so that every image has the same dimensions. This seems like it would interfere with training, though: the model might learn to look for vertical or horizontal bands of black near the image edges.
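
A rough sketch of the zero-padding idea, assuming a single-channel image stored as a list of rows and padding that is centered around the original content. The helper name `pad_to_size` and the centering choice are assumptions for illustration, not something prescribed by any library.

```python
def pad_to_size(pixels, target_height, target_width, fill=0):
    """Pad a 2D list of pixel values with `fill` so it becomes target_height x target_width."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height > target_height or width > target_width:
        raise ValueError("image is larger than the target size")

    top = (target_height - height) // 2
    bottom = target_height - height - top
    left = (target_width - width) // 2
    right = target_width - width - left

    padded = [[fill] * target_width for _ in range(top)]
    for row in pixels:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * target_width for _ in range(bottom))
    return padded


# A 2x2 image padded to 4x4 gets a one-pixel border of zeros.
for row in pad_to_size([[1, 2], [3, 4]], 4, 4):
    print(row)
```

The constant border rows and columns are exactly the black bands the model could end up keying on.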

What is commonly done?

Zobias answered 3/5, 2016 at 23:58

There are several approaches:

  • Multiple crops. For example, AlexNet was evaluated on 5 different crops (center, top-left, top-right, bottom-left, bottom-right) plus their horizontal reflections, averaging the predictions over all of them.
  • Random crops. Just take a number of random crops from the image and hope that the neural network will not be biased.
  • Resize and deform. Resize the image to a fixed size without preserving the aspect ratio. This will deform the image contents, but it guarantees that no content is cut off.
  • Variable-sized inputs. Do not crop; train the network on variable-sized images, using something like Spatial Pyramid Pooling to extract a fixed-size feature vector that can be fed to fully connected layers.
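
The first two approaches in the list can be sketched as crop-box calculations. This is only an illustration under a few assumptions: square crops, boxes given as (left, top, right, bottom), and hypothetical helper names `five_crop_boxes` and `random_crop_box`.

```python
import random


def five_crop_boxes(width, height, crop):
    """Return the four corner crops and the center crop as named boxes."""
    if crop > width or crop > height:
        raise ValueError("crop does not fit in the image")
    max_x, max_y = width - crop, height - crop
    offsets = {
        "top_left": (0, 0),
        "top_right": (max_x, 0),
        "bottom_left": (0, max_y),
        "bottom_right": (max_x, max_y),
        "center": (max_x // 2, max_y // 2),
    }
    return {name: (x, y, x + crop, y + crop) for name, (x, y) in offsets.items()}


def random_crop_box(width, height, crop, rng=random):
    """Return one crop box whose position is drawn uniformly at random."""
    if crop > width or crop > height:
        raise ValueError("crop does not fit in the image")
    x = rng.randint(0, width - crop)   # randint includes both endpoints
    y = rng.randint(0, height - crop)
    return (x, y, x + crop, y + crop)


print(five_crop_boxes(256, 256, 224))
print(random_crop_box(256, 256, 224, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the random crops reproducible, which can help when debugging a data pipeline.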

You could take a look at how the latest ImageNet networks, such as VGG and ResNet, are trained. Their papers usually describe this step in detail.

Feeney answered 4/5, 2016 at 23:55 Comment(3)
Is it common to see a method that crops before resizing? - Gremial
@Gremial I imagine that order doesn't matter as long as you're keeping a sufficiently large fraction of the image in your crop. - Sawfish
Some of these methods sound like they're for training. For validation, I checked this example which seems to resize to 256 then center-crop 224x224. (I believe this is how AlexNet originally handled it, right?) - Sawfish
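
For reference, the resize-then-center-crop pipeline mentioned in the last comment can be sketched as pure size arithmetic. This assumes "resize to 256" means scaling the shorter side to 256 while keeping the aspect ratio; the function names are hypothetical and the rounding choice is an assumption.

```python
def resize_shorter_side(width, height, target=256):
    """Return the new (width, height) after scaling the shorter side to `target`."""
    if width <= height:
        return (target, round(height * target / width))
    return (round(width * target / height), target)


def center_crop_box(width, height, crop=224):
    """Return a centered crop x crop box as (left, top, right, bottom)."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return (left, top, left + crop, top + crop)


# A 500x375 image becomes 341x256, then the middle 224x224 region is kept.
new_size = resize_shorter_side(500, 375)
print(new_size, center_crop_box(*new_size))
```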
