Can flow_from_directory get train and validation data from the same directory in Keras?

I got the following example from here.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

There are two separate directories for the training and validation data. I'm just curious whether I can get a train/validation split from a single directory instead of two separate directories. Is there an example?

Hershelhershell answered 29/10, 2018 at 1:3 Comment(0)

You can pass the validation_split argument (a number between 0 and 1) to the ImageDataGenerator to split the data into training and validation sets:

generator = ImageDataGenerator(..., validation_split=0.3)

Then pass the subset argument to flow_from_directory to get the training and validation generators:

train_gen = generator.flow_from_directory(dir_path, ..., subset='training')
val_gen = generator.flow_from_directory(dir_path, ..., subset='validation')

Note: If you have set augmentation parameters on the ImageDataGenerator, then with this approach both the training and validation images will be augmented.
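
For concreteness, here is a minimal end-to-end sketch of this approach; the directory path 'data/all', image size, and batch size are hypothetical placeholders:

from keras.preprocessing.image import ImageDataGenerator

# One generator with a 30% validation split; note that the augmentation
# settings apply to both subsets.
datagen = ImageDataGenerator(rescale=1./255,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             validation_split=0.3)

# Both flows read from the same directory; 'subset' selects the split.
train_gen = datagen.flow_from_directory('data/all',  # hypothetical path
                                        target_size=(150, 150),
                                        batch_size=32,
                                        class_mode='binary',
                                        subset='training')

val_gen = datagen.flow_from_directory('data/all',
                                      target_size=(150, 150),
                                      batch_size=32,
                                      class_mode='binary',
                                      subset='validation')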

Downthrow answered 29/10, 2018 at 14:32 Comment(0)

The above solution requires you to apply the same augmentations to both the training and validation sets, which might not be what you want (you probably don't want to apply shear, rotation, zoom, etc. to the validation data). Using separate training and validation augmentations on data in the same folder is not yet natively available.

See https://github.com/keras-team/keras/issues/5862 for the full discussion (and some possible ways to handle this). People have usually resorted to scripts that create a new folder for the validation split, but that isn't an exact answer to this question.
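
For reference, a minimal sketch of such a folder-splitting script, using only the Python standard library; the source/target paths and the 20% split ratio are assumptions for illustration:

import os
import random
import shutil

src_root = 'data/train'        # hypothetical source: one subfolder per class
val_root = 'data/validation'   # hypothetical target for the validation split
val_fraction = 0.2             # assumed split ratio

random.seed(42)
for class_name in os.listdir(src_root):
    class_dir = os.path.join(src_root, class_name)
    if not os.path.isdir(class_dir):
        continue
    files = sorted(os.listdir(class_dir))
    random.shuffle(files)
    n_val = int(len(files) * val_fraction)
    os.makedirs(os.path.join(val_root, class_name), exist_ok=True)
    # Move a random subset of each class into the validation folder.
    for fname in files[:n_val]:
        shutil.move(os.path.join(class_dir, fname),
                    os.path.join(val_root, class_name, fname))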

Thriller answered 25/6, 2019 at 13:41 Comment(0)

As @dapperdan mentioned, the currently accepted solution by @today means that both the training and validation sets go through the same transformations, which is fine if you are not planning to do data augmentation. If you do want data augmentation, you would want to transform the training data and leave the validation data 'unaugmented'.

To do that, create two ImageDataGenerator instances with the transformations required for each split, and then select the subsets using flow_from_directory with the same seed:

from keras.preprocessing.image import ImageDataGenerator

# Validation ImageDataGenerator with rescaling only (no augmentation).
valid_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Training ImageDataGenerator with augmentation transformations.
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2,
                                   rotation_range=15, shear_range=10,
                                   zoom_range=0.1, fill_mode='nearest',
                                   height_shift_range=0.05, width_shift_range=0.1)

# Create a flow from the directory for the validation subset.
# The same seed is used for both flows so the train/validation split matches.
valid_gen = valid_datagen.flow_from_directory(dir_path, subset='validation',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)

# Create a flow from the same directory for the training subset, same seed.
train_gen = train_datagen.flow_from_directory(dir_path, subset='training',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)
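
Once both generators exist, they can be passed to training as usual; the sketch below assumes a compiled Keras model named model (not defined in this answer) and an arbitrary number of epochs. Recent Keras/TensorFlow versions accept generators directly in model.fit; older versions used model.fit_generator.

# Train on the augmented subset while validating on the unaugmented one.
model.fit(train_gen,
          validation_data=valid_gen,
          epochs=10)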
Flight answered 15/12, 2019 at 9:17 Comment(1)
have you done any tests to make sure there is no leakage/overlap? – Sweptwing

Using the same seed makes sure the data is randomized in the same way for both generators.

import tensorflow as tf

# Training generator: rescaling plus augmentation, reserving 20% for validation.
datagen_train = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest')

# Validation generator: rescaling only, with the same validation_split.
datagen_val = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)

# Same directory and same seed for both flows so the split is consistent.
train_generator = datagen_train.flow_from_directory(
    data_root,
    seed=66,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=True,
    subset='training')

val_generator = datagen_val.flow_from_directory(
    data_root,
    seed=66,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=True,
    subset='validation')
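
To check that the two subsets do not overlap (the leakage concern raised in an earlier comment), one quick sanity check is to compare the file lists of the two iterators; both expose a filenames attribute:

# The intersection should be empty if the split has no leakage.
train_files = set(train_generator.filenames)
val_files = set(val_generator.filenames)
print('overlapping files:', len(train_files & val_files))  # expect 0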
Interatomic answered 8/3, 2022 at 6:12 Comment(1)
github.com/keras-team/keras/issues/5862 – Interatomic
