What is the expected input range for working with Keras VGG models?

I'm trying to use a pretrained VGG16 model from Keras, but I'm unsure what the input range should be.

For a quick answer: which color order does it expect, RGB or BGR?

And which range?

0 to 255?
balanced from about -125 to about +130?
0 to 1?
-1 to 1?

I notice the file where the model is defined imports an input preprocessor:

from .imagenet_utils import preprocess_input

But this preprocessor is never used in the rest of the file.

Also, when I check the code for this preprocessor, it has two modes: caffe and tf (tensorflow).

Each mode works differently.

Finally, I can't find consistent documentation on the internet.

So, what range should I use? What input range were the model weights trained on?

The model weights were ported from Caffe, so the model expects input in BGR format.

Caffe uses a BGR color channel scheme for reading image files. This is due to the underlying OpenCV implementation of imread. The assumption of RGB is a common mistake.

You can find the original Caffe model weight files on the VGG website. The link is also given in the Keras documentation.

Of the ranges you list, I think the second is the closest. There is no scaling during training, but the authors subtracted the mean value of the ILSVRC2014 training set. As stated in the original VGG paper, section 2.1:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

This is exactly what imagenet_utils.preprocess_input(mode='caffe') does, in two steps:

Convert from RGB to BGR: because keras.preprocessing.image.load_img() loads images in RGB format, this conversion is required for VGG16 (and for all models ported from Caffe).
Subtract the mean BGR values: (103.939, 116.779, 123.68) are subtracted from the image array, channel by channel.
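To make the two steps and the resulting range concrete, here is a minimal NumPy sketch. The helper name caffe_style_preprocess is hypothetical and only mimics the steps described above; it is not the Keras implementation. Since nothing is rescaled, an 8-bit image ends up roughly between -124 and +151, which is why the "balanced" option in the question is the closest match.

```python
import numpy as np

# Per-channel means in BGR order, as quoted in the answer.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(rgb_image):
    """Hypothetical helper: apply the two 'caffe' mode steps to an RGB array in [0, 255]."""
    x = np.asarray(rgb_image, dtype=np.float32)
    bgr = x[..., ::-1]      # step 1: reverse the channel axis, RGB -> BGR
    return bgr - BGR_MEAN   # step 2: subtract the per-channel mean, no scaling


# The extremes of an 8-bit image give the extremes of the output range.
black = np.zeros((2, 2, 3), dtype=np.uint8)
white = np.full((2, 2, 3), 255, dtype=np.uint8)
low = caffe_style_preprocess(black).min()    # 0 - 123.68
high = caffe_style_preprocess(white).max()   # 255 - 103.939
print(low, high)
```

The helper returns a new float32 array and leaves the input untouched, so it is safe to call on an image you still need in RGB form.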

The preprocessor is not used inside vgg16.py itself. It is imported there so that users can call keras.applications.vgg16.preprocess_input(rgb_img_array) without caring about where the model weights come from. The argument to preprocess_input() is always an image array in RGB format. If the model was trained with Caffe, preprocess_input() converts the array to BGR format.

Note that preprocess_input() is not intended to be called from the imagenet_utils module directly. If you are using VGG16, call keras.applications.vgg16.preprocess_input() and the images will be converted to the format and range that VGG16 was trained on. Similarly, if you are using Inception V3, call keras.applications.inception_v3.preprocess_input() and the images will be converted to the range that Inception V3 was trained on.
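As a usage sketch of this advice: the example below assumes TensorFlow's bundled Keras (the tensorflow.keras import path is my assumption; the text above only names keras.applications) and uses a random array as a stand-in for a real RGB image.

```python
import numpy as np
from tensorflow.keras.applications import inception_v3, vgg16

# Random stand-in for a batch of one RGB image with values in [0, 255].
rng = np.random.default_rng(0)
rgb = rng.uniform(0, 255, size=(1, 8, 8, 3)).astype(np.float32)

# Pass copies: for float NumPy input these functions may modify the array in place.
vgg_out = vgg16.preprocess_input(rgb.copy())          # BGR order, mean-subtracted, unscaled
inc_out = inception_v3.preprocess_input(rgb.copy())   # RGB order, scaled to [-1, 1]

print(vgg_out.min(), vgg_out.max())
print(inc_out.min(), inc_out.max())
```

Each model's own preprocess_input() picks the right mode, so the same RGB array can be fed to either one without knowing how the weights were trained.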
