Keras validation very slow when using model.fit_generator

When I fine-tune ResNet-50 on my dataset in Keras (TensorFlow backend), I find it very odd that after each epoch, validation is much slower than training. I don't know why. Is it because my GPU does not have enough memory? My GPU is a K2200, which has 4 GB of memory. Or am I misunderstanding the parameters' meaning?

I have 35946 training pictures, so I use:

samples_per_epoch=35946,

I have 8986 validation pictures, so I use:

nb_val_samples=8986,

The following is part of my code:

train_datagen = ImageDataGenerator(
    rescale=1./255,
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False,
    zoom_range=0.1,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
)
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    'data/val',
    batch_size=batch_size,
    class_mode='categorical')
model.fit_generator(train_generator,
                    # steps_per_epoch=X_train.shape[0] // batch_size,
                    samples_per_epoch=35946,
                    epochs=epochs,
                    validation_data=validation_generator,
                    verbose=1,
                    nb_val_samples=8986,
                    callbacks=[earlyStopping,saveBestModel,tensorboard])
Boanerges answered 1/4/2017 at 10:11

Comments:
What do you mean by "val is slower than train"? – Cyclostyle
The time it takes to test the validation accuracy and loss is very long. – Boanerges
I also find that it seems to wait a long time at this point in the Keras generator code (see below). – Boanerges

def data_generator_task():
    while not self._stop_event.is_set():
        try:
            if self._pickle_safe or self.queue.qsize() < max_q_size:
                generator_output = next(self._generator)
                self.queue.put(generator_output)
            else:
                time.sleep(wait_time)
        except Exception:
            self._stop_event.set()
            raise
I find that it takes a long time in the first epoch and in the first validation pass. After the first epoch it becomes faster. – Boanerges
Could you show us the logs? – Moonrise
IMO, and in my experience as well, one needs to tune batch_size in accordance with the available hardware. For example, on my specific hardware, for a specific case, a batch size of 128 takes ~60 seconds per epoch. Everything else being exactly the same, a batch size of 64 takes ~14 seconds per epoch. I roughly understand the reason behind this. – Zavras

@Yanning As you mentioned in your comment, the first epoch is slow because the ImageDataGenerator is reading data from disk into RAM. That part is very slow. Once the data has been read into RAM, it is just a matter of transferring it from RAM to the GPU.

Therefore, if your dataset is not huge and fits into your RAM, you can make a single NumPy file out of the whole dataset and load it once at the beginning. This saves a lot of disk seek time.
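A minimal sketch of that caching idea (the cache file name, the target_size, and the directory-walking helper here are illustrative assumptions, not from the original answer):

import os
import numpy as np
from keras.preprocessing import image

CACHE_FILE = 'train_cache.npz'  # hypothetical cache file name

def build_cache(data_dir, target_size=(224, 224)):
    # One-time pass over the directory tree: every image is read from
    # disk exactly once and the whole set is stored in a single archive.
    # Sorted class folders match flow_from_directory's label ordering.
    xs, ys = [], []
    for label, class_name in enumerate(sorted(os.listdir(data_dir))):
        class_dir = os.path.join(data_dir, class_name)
        for fname in sorted(os.listdir(class_dir)):
            img = image.load_img(os.path.join(class_dir, fname),
                                 target_size=target_size)
            xs.append(image.img_to_array(img))
            ys.append(label)
    np.savez_compressed(CACHE_FILE,
                        x=np.asarray(xs, dtype='uint8'),
                        y=np.asarray(ys, dtype='int32'))

if not os.path.exists(CACHE_FILE):
    build_cache('data/train')

# Subsequent runs and epochs read from RAM instead of seeking on disk.
data = np.load(CACHE_FILE)
x_train = data['x'].astype('float32') / 255.0
y_train = data['y']

The arrays can then be fed to model.fit directly, or wrapped with train_datagen.flow(x_train, y_train, batch_size=batch_size) to keep the same augmentation without touching the disk again.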

Please check out this post for a comparison of the time taken by different operations:

Latency Numbers Every Programmer Should Know

Latency Comparison Numbers

Main memory reference                         100 ns
Read 1 MB sequentially from memory        250,000 ns 
Read 1 MB sequentially from SSD         1,000,000 ns
Read 1 MB sequentially from disk       20,000,000 ns
Brain answered 3/12/2017 at 1:44

I think the answer lies in the choice of arguments to the fit_generator function. I was having the same issue and fixed it by using the following arguments in fit_generator:

steps_per_epoch=training_samples_count // batch_size,
validation_steps=validation_samples_count // batch_size,

Note that I have specified steps for both validation and training, and this makes validation blazing fast.
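For reference, here is roughly what the question's call would look like in Keras 2 terms, where steps are counted in batches rather than samples (a sketch assuming the batch_size, generators, and callbacks defined in the question):

# Keras 2 style: steps_per_epoch and validation_steps count batches.
model.fit_generator(train_generator,
                    steps_per_epoch=35946 // batch_size,      # 35946 training images
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=8986 // batch_size,      # 8986 validation images
                    verbose=1,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])

If a raw sample count such as 8986 is passed where a batch count is expected, Keras runs 8986 validation batches per epoch, i.e. batch_size times more data than intended, which is exactly the kind of slow validation described in the question.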

Hamburg answered 12/7/2020 at 14:19
