Very good validation accuracy but bad predictions

I'm building a Keras model to classify cats and dogs. I used transfer learning with bottleneck features, followed by fine-tuning of a VGG model. I now get very good validation accuracy (about 97%), but when I run predictions, the classification report and confusion matrix show very bad results. What could be the problem?

Here is the code of fine tuning and the results I get

base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(150,150,3))
print('Model loaded.')

# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='sigmoid'))

# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)

# add the model on top of the convolutional base
# model.add(top_model)
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))

# set the first 15 layers (everything before the last conv block)
# to non-trainable (weights will not be updated)
for layer in model.layers[:15]:
    layer.trainable = False

# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')

model.summary()

# fine-tune the model
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=2)
scores = model.evaluate_generator(
    generator=validation_generator,
    steps=nb_validation_samples // batch_size)
print("Accuracy = ", scores[1])

Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size)

y_pred = np.argmax(Y_pred, axis=1)

print('Confusion Matrix')

print(confusion_matrix(validation_generator.classes, y_pred))

print('Classification Report')

target_names = ['Cats', 'Dogs']

print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
model.save("model_tuned.h5")

Accuracy = 0.974375

Confusion Matrix
[[186 214]
 [199 201]]

Classification Report

          precision    recall  f1-score   support

    Cats       0.48      0.47      0.47       400
    Dogs       0.48      0.50      0.49       400

   micro avg       0.48      0.48      0.48       800
   macro avg       0.48      0.48      0.48       800
weighted avg       0.48      0.48      0.48       800
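
As a quick sanity check (not part of the original post), the accuracy implied by the confusion matrix above can be computed directly and compared with the 0.974375 reported by evaluate_generator. The sketch assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are only illustrative.

```python
import numpy as np

# Confusion matrix reported above (rows: true class, columns: predicted class).
cm = np.array([[186, 214],
               [199, 201]])

# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy_from_cm = np.trace(cm) / cm.sum()
print(f"accuracy implied by the confusion matrix: {accuracy_from_cm:.4f}")
```

The result is close to chance for two balanced classes, which suggests that the predictions and the labels passed to confusion_matrix are not aligned, rather than that the model itself is poor.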

Pilloff answered 29/6, 2019 at 6:14 Comment(1)
I have the same problem. Basically, the title of your question here is incorrect: your training accuracy is high, but the classification report does not reflect the truth. - Jonell

I think the fix is to add shuffle=False to your validation generator:

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)

The default behaviour is to shuffle the images, so the label order in validation_generator.classes (which follows directory order) doesn't match the order in which the generator yields the images that predict_generator scores.
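
The effect can be reproduced without Keras. The following is a minimal simulation under stated assumptions: a hypothetical "perfect" model, 400 cats followed by 400 dogs in directory order, and a random permutation standing in for the shuffling generator. None of these names come from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation labels in directory order (0 = cat, 1 = dog),
# which is how generator.classes is laid out.
true_labels = np.array([0] * 400 + [1] * 400)

# Stand-in for a shuffling generator: images are served in random order.
served_order = rng.permutation(len(true_labels))

# Stand-in for a perfect model: every image gets its correct class,
# but predictions come back in the served (shuffled) order.
predictions = true_labels[served_order]

# Evaluation pairs each served image with its own label, so it looks perfect.
acc_matched = np.mean(predictions == true_labels[served_order])

# Comparing with directory-order labels pairs predictions with the wrong images.
acc_mismatched = np.mean(predictions == true_labels)

print("accuracy with aligned labels:   ", acc_matched)
print("accuracy with misaligned labels:", acc_mismatched)
```

With aligned labels the simulated accuracy is perfect, while the misaligned comparison lands near chance, which is the same pattern as a high evaluation accuracy next to a confusion matrix with roughly equal counts in every cell.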

War answered 28/10, 2019 at 11:25 Comment(0)

There are two issues with your model. First, you need to use a softmax activation if you have more than one output neuron:

top_model.add(Dense(2, activation='softmax'))

Second, you have to use the categorical_crossentropy loss; binary_crossentropy is only for the case of a single output neuron with a sigmoid activation:

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
Amateurish answered 29/6, 2019 at 8:13 Comment(5)
Actually, I'm confused. I have two classes; is it okay to use softmax? - Pilloff
Yes, it is okay. - Amateurish
I've made the changes you recommended but I still get bad predictions. Are there other suggestions? I appreciate your help, sir. - Pilloff
With two classes that are mutually exclusive, you should just use 1 output neuron with a sigmoid activation function and binary_crossentropy loss. The softmax can actually be seen as an extension of the sigmoid when you have more than 2 classes. - Bridgetbridgetown
@GiuseppeMarra No, a two-class softmax can also be used; there is no single option as you are implying. - Amateurish
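
The relationship discussed in these comments can be checked numerically. This is a small NumPy sketch, not Keras code: for two classes, the softmax probability of class 1 equals a sigmoid applied to the difference of the two logits. The logit values are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Made-up logits for three samples, columns = (class 0, class 1).
logits = np.array([[2.0, -1.0],
                   [0.3, 0.8],
                   [-4.0, 1.5]])

# Probability of class 1 from a two-way softmax ...
p_softmax = softmax(logits)[:, 1]
# ... equals a sigmoid applied to the logit difference z1 - z0.
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

print(p_softmax)
print(p_sigmoid)
```

Both formulations can therefore represent the same two-class probabilities; what matters is that the output layer, the loss, and the label encoding are chosen consistently.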

I am doing skin cancer classification and the data are balanced, so this cannot be a case of data imbalance, yet the confusion matrix below still does not match the accuracy. Dataset: https://kaggle.com/hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images

With test_pred = model.predict(test_generator), the reported accuracy is 89%, but the confusion matrix is array([[267, 271], [233, 229]]). The two do not match at all.

Hoeve answered 10/4, 2023 at 4:55 Comment(0)

There are usually two reasons for this problem:

  1. The most common one is that the images are fed to the model at prediction time in a different form than during training (for example, forgetting to normalize them, or mixing up height and width). This does not seem to be the case here.

  2. The second one is that there are many more samples of one class than of the others. Say there are 1000 samples of A and 100 samples of B. If the model only guesses A, it will be correct about 91% of the time. The optimizer can get stuck in such a solution (a poor "local minimum" of the loss), and even if the validation result yields about 0.9 accuracy, the model will be horrible in practice.

In short, are you dealing with imbalanced data? It is sometimes hard to avoid local minima in this case. Could this be the issue here?
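
The imbalance scenario from point 2 can be worked through in a few lines. This is an illustrative sketch using the 1000/100 split mentioned above and a hypothetical classifier that always predicts class A; it is not derived from the question's data, which is balanced (400 samples per class).

```python
import numpy as np

# 1000 samples of class A (encoded 0) and 100 samples of class B (encoded 1).
y_true = np.array([0] * 1000 + [1] * 100)

# Hypothetical degenerate classifier: always predicts class A.
y_pred = np.zeros_like(y_true)

# Overall accuracy looks good because class A dominates.
accuracy = np.mean(y_pred == y_true)

# Recall for class B: the share of true B samples predicted as B.
recall_b = np.mean(y_pred[y_true == 1] == 1)

print(f"accuracy: {accuracy:.3f}, recall for B: {recall_b:.3f}")
```

A high accuracy paired with zero recall on the minority class is the signature of this failure mode, which is why a per-class report is worth checking whenever the classes are unevenly sized.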

Inborn answered 27/10, 2021 at 11:51 Comment(2)
As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. - Sileas
I am doing skin cancer classification and the data is balanced. Now consider the confusion matrix below and its accuracy. They still do not match, and it cannot be a case of data imbalance. Dataset: kaggle.com/hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images. With test_pred = model.predict(test_generator), the output is 89% and the confusion matrix is array([[267, 271], [233, 229]]). - Hoeve

Somehow, the predict_generator() of Keras' model does not work as expected for me. I would rather loop through all test images one by one and get the prediction for each image in each iteration. I am using Plaid-ML Keras as my backend, and I get predictions with the following code.

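import os
from PIL import Image
import keras
import numpy

# NOTE: `model` is assumed to be an already trained/loaded Keras model.
print("Prediction result:")
test_dir = "/path/to/test/images"
files = os.listdir(test_dir)
correct = 0
total = 0
# map each predicted class index to a readable label
classes = {
    0: 'This is Cat',
    1: 'This is Dog',
}
for file_name in files:
    total += 1
    image = Image.open(os.path.join(test_dir, file_name)).convert('RGB')
    # must match the input size the model was trained with
    image = image.resize((100, 100))
    # scale to [0, 1] and add a batch dimension: shape (1, 100, 100, 3)
    image = numpy.expand_dims(numpy.array(image) / 255, axis=0)
    # predict_classes() was removed in recent TensorFlow/Keras releases;
    # argmax over the outputs is the equivalent for a multi-output classifier
    pred = int(numpy.argmax(model.predict(image), axis=1)[0])
    sign = classes[pred]
    # compare case-insensitively: the labels are capitalized ('Cat', 'Dog'),
    # so the original lowercase substring test never matched
    name = file_name.lower()
    if ("cat" in name) and ("cat" in sign.lower()):
        print(correct, ". ", file_name, sign)
        correct += 1
    elif ("dog" in name) and ("dog" in sign.lower()):
        print(correct, ". ", file_name, sign)
        correct += 1
# guard against an empty directory
print("accuracy: ", (correct / total) if total else 0.0)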

Clements answered 17/11, 2022 at 16:23 Comment(0)
