Loss is NaN on image classification task
I'm trying to train a basic CNN on an image dataset containing celebrity faces, where each person is assigned a class. Since there are about 10,000 classes, I used sparse_categorical_crossentropy rather than one-hot encoding the classes. However, as soon as the network starts training, the loss gets stuck at one value and after several batches it goes to NaN. I tried different scalings of the images and a smaller network, but with no luck. Any clues on what might be causing the NaN?

Function that generates batches:

import cv2
import numpy as np
from sklearn.utils import shuffle

def Generator(data, label, batch_size):
    url = "../input/celeba-dataset/img_align_celeba/img_align_celeba/"
    INPUT_SHAPE = (109, 109)
    i = 0
    while True:
        image_batch = []
        label_batch = []
        for b in range(batch_size):
            # Reshuffle once a full pass over the data is complete
            if i == len(data):
                i = 0
                data, label = shuffle(data, label)
            sample = data[i]
            label_batch.append(label[i])
            i += 1
            # Read the image, resize to the network input size, and scale to [0, 1]
            image = cv2.resize(cv2.imread(url + sample), INPUT_SHAPE)
            image_batch.append(image.astype(float) / 255)

        yield (np.array(image_batch), np.array(label_batch))

The model:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

class CNN():

    def __init__(self, train, val, y_train, y_val, batch_size):
        ## Load the batch generators
        self.train_batch_gen = Generator(train, y_train, batch_size)
        self.val_batch_gen = Generator(val, y_val, batch_size)

        self.input_shape = (109, 109, 3)
        self.num_classes = len(np.unique(y_train))
        self.len_train = len(train)
        self.len_val = len(val)

        self.batch_size = batch_size
        self.model = self.buildModel()

    def buildModel(self):
        # Stacked Conv2D blocks with max pooling, then average pooling,
        # a small dense layer, and a softmax classification layer
        model = models.Sequential()
        model.add(layers.Conv2D(32, (3, 3), activation='relu', padding="same", input_shape=self.input_shape))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(192, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(256, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(160, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(320, (3, 3), activation='relu', padding="same"))
        model.add(layers.AveragePooling2D(pool_size=(4, 4)))
        model.add(layers.Flatten())
        model.add(layers.Dense(128, activation='tanh'))
        model.add(layers.Dropout(rate=0.1))
        model.add(layers.Dense(self.num_classes, activation="softmax"))  # Classification (output) layer
        opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
        model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

        return model

    def trainModel(self, epochs):
        # fit_generator is deprecated in newer TensorFlow versions; model.fit also accepts generators
        self.model.fit_generator(generator=self.train_batch_gen,
                                 steps_per_epoch=int(self.len_train // self.batch_size),
                                 epochs=epochs,
                                 validation_data=self.val_batch_gen,
                                 validation_steps=int(self.len_val // self.batch_size))
Episode answered 24/7, 2019 at 12:2 Comment(1)
Did you try varying the learning rate of the Adam optimizer? It could be too small; you should prefer the default values. — Procession
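
For reference, a minimal sketch of that suggestion applied to the question's compile step (Keras' Adam defaults to a learning rate of 0.001):

    # Sketch only: use Adam's default learning rate (0.001) instead of 1e-5
    opt = tf.keras.optimizers.Adam()
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])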
In my case, I used sparse_categorical_crossentropy with labels numbered [1, 2, 3] (3 classes), and it produced NaNs from the start.

When I changed the labels from [1, 2, 3] to [0, 1, 2], the problem disappeared.
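
This happens because sparse_categorical_crossentropy expects integer labels in the range [0, num_classes), so a label equal to num_classes points past the last softmax output and the loss becomes undefined. A minimal sketch of shifting 1-based labels to 0-based (the array name y is hypothetical):

    import numpy as np

    y = np.array([1, 2, 3, 1, 2])   # hypothetical 1-based labels
    y = y - 1                        # shift to [0, 1, 2]

    # Sanity check before training: labels must lie in [0, num_classes)
    num_classes = len(np.unique(y))
    assert y.min() >= 0 and y.max() < num_classes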

Crosscountry answered 16/10, 2020 at 12:13 Comment(1)
That's correct, especially when the last layer uses 'softmax' in multiclass classification problems. — Pepsinogen
Not sure why you are seeing those NaNs. I suspect it has something to do with the tanh activation on your dense layer; I would replace it with relu. I also suggest using more neurons on that dense layer, because 128 is probably too small for a 10,000-class output.

If I were you, I would also try a pre-trained model and/or Siamese networks.
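
A sketch of what that change might look like inside buildModel (relu instead of tanh and a wider dense layer; the 1024-unit width is an assumption for illustration, not a value from the answer):

    # Replace the tanh head with a wider relu layer before the softmax output
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation='relu'))   # wider than the original 128 units
    model.add(layers.Dropout(rate=0.1))
    model.add(layers.Dense(self.num_classes, activation='softmax'))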

Herson answered 24/7, 2019 at 12:24 Comment(1)
Thanks, yeah, I've tried relu and a larger dense layer; it didn't help. Any chance the large number of classes and the small number of images per class could cause this issue? — Episode
This looks like an exploding gradients problem. I would recommend checking how the weights and gradients vary during training. See this: https://github.com/keras-team/keras/issues/2226

Check https://www.dlology.com/blog/how-to-deal-with-vanishingexploding-gradients-in-keras/ for how to spot the exploding gradient problem and solutions to it. Also try Xavier initialization in your dense layers to help prevent exploding gradients.
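
A minimal sketch of those two suggestions in Keras (Xavier/Glorot initialization on a dense layer and gradient clipping via the optimizer's clipnorm argument; the specific values here are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Xavier (Glorot) initialization on a dense layer
    dense = layers.Dense(128, activation='relu',
                         kernel_initializer='glorot_uniform')

    # Clip gradient norms in the optimizer to limit exploding gradients
    opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)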

Ruiz answered 24/7, 2019 at 17:49 Comment(0)