How to load images from paths in dataframe columns for a dual-input model using a data generator
I get a MemoryError when I load all the images directly from the paths listed in a dataframe, because there are too many of them to fit in memory.

The training dataframe (df_train_data) looks like this:

class_id    ID      uu            vv
Abnormal    1001    1001_05.png   1001_06.png
Abnormal    1002    1002_05.png   1002_06.png
Abnormal    1003    1003_05.png   1003_06.png
Normal      1554    1554_05.png   1554_06.png
Normal      1555    1555_05.png   1555_06.png
Normal      1556    1556_05.png   1556_06.png
...

Note that all Normal instances come after all Abnormal instances; the rows are ordered by class.

I am reading the images and their IDs in the following form:

X_uu_train = read_imgs(df_train_data.uu.values, img_height, img_width, channels)
X_vv_train = read_imgs(df_train_data.vv.values, img_height, img_width, channels)
train_labels = df_train_data.ID.values

where read_imgs returns all of the images as a single NumPy array.

The MemoryError is raised on the very first call, X_uu_train = read_imgs(df_train_data.uu.values, img_height, img_width, channels).
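For a rough sense of scale, the size of one such array can be estimated from its shape. The sketch below assumes float32 images of 224 x 224 x 3 (the shapes shown in the EDIT further down) and a hypothetical count of 10,000 images; the real count is not stated in this post, and array_bytes is an illustrative helper.

```python
def array_bytes(n_images, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold n_images as one dense array (float32 by default)."""
    return n_images * height * width * channels * bytes_per_value


n_images = 10_000  # hypothetical count, not taken from the question
total = array_bytes(n_images, 224, 224, 3)
print(f"{total / 2**30:.2f} GiB per input array")  # prints "5.61 GiB per input array"
```

With two input arrays (uu and vv) the requirement doubles, which is why loading batch by batch is the usual way out.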

I have seen solutions that recommend using ImageDataGenerator to load the images in batches. However, I am not loading images from a directory, as most examples do. It turns out that images can also be loaded from a dataframe with .flow_from_dataframe.

Here is the training stage:

hist = base_model.fit([X_uu_train, X_vv_train], train_labels,
                         batch_size=batch_size, epochs=epochs,  verbose=1,
                         validation_data=([X_uu_val, X_vv_val], val_labels), shuffle=True)
preds = base_model.predict([X_uu_val, X_vv_val])

The problem is that .flow_from_dataframe handles only a single input, whereas my generator has to deliver image batches for two inputs.

Could someone help me construct an ImageDataGenerator so that I can load the images without running into a MemoryError?

When loading from the uu and vv columns, each image must be fed to the network together with its corresponding pair, and the pairs should be shuffled.
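The requirement above (aligned pairs, shuffled order, one batch in memory at a time) can be sketched as a plain Python generator. This is only an illustration, not ImageDataGenerator itself: paired_batches, load_fn and collate are hypothetical names, and the identity loader in the demo stands in for a real image reader.

```python
import random


def paired_batches(rows, batch_size, load_fn, collate=list, seed=None):
    """Yield ([uu_batch, vv_batch], labels) forever, reshuffling every epoch.

    rows:    list of (uu_path, vv_path, label) tuples, e.g. built from the dataframe.
    load_fn: hypothetical function that reads one image from a path.
    collate: turns a list of samples into a batch (np.array in practice).
    """
    rng = random.Random(seed)
    order = list(range(len(rows)))
    while True:
        # Shuffle row indices, so each uu image stays with its vv partner and label.
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            chunk = [rows[i] for i in order[start:start + batch_size]]
            uu = collate([load_fn(u) for u, _, _ in chunk])
            vv = collate([load_fn(v) for _, v, _ in chunk])
            labels = collate([y for _, _, y in chunk])
            yield [uu, vv], labels


# Demo with an identity "loader" so the pairing is easy to inspect.
rows = [
    ("1001_05.png", "1001_06.png", 1),
    ("1002_05.png", "1002_06.png", 1),
    ("1554_05.png", "1554_06.png", 0),
    ("1555_05.png", "1555_06.png", 0),
]
gen = paired_batches(rows, batch_size=2, load_fn=lambda path: path, seed=0)
(uu_batch, vv_batch), label_batch = next(gen)
print(uu_batch, vv_batch, label_batch)
```

Because the generator never stops, it would be passed to fit (or fit_generator in older Keras) together with steps_per_epoch, typically the number of rows divided by the batch size, rounded up.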

P.S. I can provide more info if necessary.

Thank you.

EDIT:

<BatchDataset shapes: (((None, 224, 224, 3), (None, 224, 224, 3)), (None,)), types: ((tf.float32, tf.float32), tf.int32)>

EDIT-2:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-4ae4c12b2b76> in <module>
     43 
     44                 base_model = combined_net()
---> 45                 hist = base_model.fit(ds_train, epochs=epochs,  verbose=1,  validation_data=ds_val, shuffle=True)
     46 
     47                 preds = base_model.predict(ds_val)

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
   1152             sample_weight=sample_weight,
   1153             class_weight=class_weight,
-> 1154             batch_size=batch_size)
   1155 
   1156         # Prepare validation data.

~\Anaconda3\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    577             feed_input_shapes,
    578             check_batch_axis=False,  # Don't enforce the batch size.
--> 579             exception_prefix='input')
    580 
    581         if y is not None:

~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
     97         data = data.values if data.__class__.__name__ == 'DataFrame' else data
     98         data = [data]
---> 99     data = [standardize_single_array(x) for x in data]
    100 
    101     if len(data) != len(names):

~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in <listcomp>(.0)
     97         data = data.values if data.__class__.__name__ == 'DataFrame' else data
     98         data = [data]
---> 99     data = [standardize_single_array(x) for x in data]
    100 
    101     if len(data) != len(names):

~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_single_array(x)
     32                 'Got tensor with shape: %s' % str(shape))
     33         return x
---> 34     elif x.ndim == 1:
     35         x = np.expand_dims(x, 1)
     36     return x

AttributeError: 'BatchDataset' object has no attribute 'ndim'
Swollen answered 28/8, 2020 at 15:6 Comment(0)

Instead of ImageDataGenerator, you can build a tf.data.Dataset directly, which gives you more flexibility. You pass it the list of filenames and it loads the images lazily, one batch at a time, so the full set never has to fit in memory.

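import os
# Hide the GPU so the example runs on CPU only (optional).
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import pandas as pd
import tensorflow as tf

# Reads the table from the question after copying it to the clipboard.
df = pd.read_clipboard()

x = df.uu
y = df.vv
z = df.class_id

def load(file_path):
    # Read, decode and resize one PNG; this runs lazily inside the pipeline.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # scales to [0, 1]
    img = tf.image.resize(img, size=(100, 100))
    return img

# Only the file paths are held in memory; images are loaded batch by batch.
# The two images are nested so that each element is ((image1, image2), label),
# which is the structure a two-input Keras model expects.
ds = (
    tf.data.Dataset.from_tensor_slices((x, y, z))
    .map(lambda xx, yy, zz: ((load(xx), load(yy)), zz))
    .batch(4)
)

next(iter(ds))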
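The question also asks for the pairs to arrive in shuffled order, which the pipeline above does not do. A minimal sketch of one way to add it: shuffle the (path, path, label) rows before decoding, so that only short strings sit in the shuffle buffer. The helper names make_pair_dataset and write_png, the tiny placeholder images, and the 1/0 label encoding are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def load(file_path, size=(100, 100)):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return tf.image.resize(img, size=size)


def make_pair_dataset(uu_paths, vv_paths, labels, batch_size, training):
    ds = tf.data.Dataset.from_tensor_slices((uu_paths, vv_paths, labels))
    if training:
        # Shuffle whole rows, so every uu image keeps its vv partner and label.
        ds = ds.shuffle(buffer_size=len(labels), reshuffle_each_iteration=True)
    # Nest the images: each element becomes ((uu_image, vv_image), label).
    ds = ds.map(lambda u, v, y: ((load(u), load(v)), y))
    return ds.batch(batch_size).prefetch(1)


# Placeholder data: constant-colour PNGs, where each vv image is 5 grey
# levels brighter than its uu partner, which makes the pairing checkable.
tmp_dir = tempfile.mkdtemp()


def write_png(name, value):
    path = os.path.join(tmp_dir, name)
    pixels = np.full((8, 8, 3), value, dtype=np.uint8)
    tf.io.write_file(path, tf.io.encode_png(pixels))
    return path


uu_paths = [write_png(f"{i}_05.png", 10 * i) for i in range(6)]
vv_paths = [write_png(f"{i}_06.png", 10 * i + 5) for i in range(6)]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = Abnormal, 0 = Normal

ds_train = make_pair_dataset(uu_paths, vv_paths, labels, batch_size=4, training=True)
(uu_batch, vv_batch), y_batch = next(iter(ds_train))
print(uu_batch.shape, vv_batch.shape, y_batch.shape)
```

A validation dataset would be built the same way with training=False, so that predictions come back in dataframe order.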

Here's a complete example starting from a list of files (it's easy when you have a data frame), all the way to model training.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # run on CPU only (optional)
import numpy as np
import cv2
from skimage import data
import tensorflow as tf

# Create placeholder images on disk: five copies each of two sample pictures.
coffee = data.coffee()
cat = data.chelsea()

for image, name in zip([coffee, cat], ['coffee', 'cat']):
    for i in range(5):
        cv2.imwrite(f'{name}_{i}.png', image)

# Sort both lists so that cat_i is paired with coffee_i.
cat_files = sorted(f for f in os.listdir() if f.startswith('cat') and f.endswith('.png'))
coffee_files = sorted(f for f in os.listdir() if f.startswith('coffee') and f.endswith('.png'))


def load(file_path):
    # Read, decode and resize one PNG; this runs lazily inside the pipeline.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(100, 100))
    return img


def label(string):
    # 1 for 'abnormal', 0 for anything else.
    return tf.cast(tf.equal(string, 'abnormal'), tf.int32)


x = cat_files
y = coffee_files
z = np.random.choice(['normal', 'abnormal'], 5)  # random placeholder labels

# Each element is ((image1, image2), label): the structure a two-input model expects.
inputs = tf.data.Dataset.from_tensor_slices((x, y)).map(lambda a, b: (load(a), load(b)))
labels = tf.data.Dataset.from_tensor_slices(z).map(label)

ds = tf.data.Dataset.zip((inputs, labels)).batch(4)

next(iter(ds))

inputs1 = tf.keras.layers.Input(shape=(100, 100, 3), name='input1')
inputs2 = tf.keras.layers.Input(shape=(100, 100, 3), name='input2')

xx = tf.keras.layers.Flatten()(inputs1)
yy = tf.keras.layers.Flatten()(inputs2)
x = tf.keras.layers.Concatenate()([xx, yy])
x = tf.keras.layers.Dense(32, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=[inputs1, inputs2], outputs=output)

model.compile(loss='binary_crossentropy', optimizer='adam')

history = model.fit(ds)
Train for 2 steps
1/2 [==============>...............] - ETA: 0s - loss: 0.7527
2/2 [==============================] - 1s 251ms/step - loss: 5.5188

Then you can also predict:

model.predict(ds)
array([[4.7391814e-26],
       [4.7391814e-26],
       [4.7391814e-26],
       [4.7391814e-26],
       [4.7390730e-26]], dtype=float32)
Alsatian answered 28/8, 2020 at 15:51 Comment(10)
Thank you for your help, but I am struggling to put things together properly. Do I put x.values in the place of xx at load(xx)? And if ds returns only one expression, then what should I put at .predict([X_uu_val, X_vv_val]) to get predictions? Please simplify your answer a bit more :xD - Swollen
OK, I'm afraid I didn't simplify it, but I made it reproducible. I made bogus pictures and loaded them back iteratively. Now you can customize it by replacing the filenames. Hope that works. - Alsatian
It helped, thanks, but hist = base_model.fit(ds_train, epochs=epochs, verbose=1, validation_data=ds_val, shuffle=True) gives an AttributeError: 'BatchDataset' object has no attribute 'ndim' due to batch_size. - Swollen
Maybe remove the shuffle=True? - Alsatian
I posted a related question again here, please have a look. Thanks. - Swollen
Oh, you don't have TF 2.x! Then I have no idea. Besides, the code you posted in the other question doesn't really look like what I suggested. - Alsatian
Well, I tried the code in both TF versions (2.1.0 on a local Jupyter notebook and 1.9.0 on a remote Ubuntu server). I have been experimenting on both, so you can still contribute right here. Thanks. - Swollen
Did you get the code to run by copy/pasting my example above? - Alsatian
If my example runs and doesn't work for you, it's quite difficult to know what you did differently... - Alsatian
I edited the question, have a look. Let me know if there is something you might need. - Swollen
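A possible reading of the traceback in EDIT-2, offered as a guess rather than a confirmed diagnosis: the file paths point at site-packages\keras\engine\training.py, which suggests the model was built with the standalone keras package rather than tf.keras. Standalone Keras of that period standardizes its inputs as NumPy arrays, which would explain the .ndim lookup failing on a BatchDataset. The check below is a rough heuristic for TF 2.1-era installs, where tf.keras classes live in modules starting with "tensorflow"; in later releases tf.keras is itself backed by a keras package, so the heuristic does not carry over. keras_flavor is a hypothetical helper, and the stand-in classes exist only so the sketch runs without either library.

```python
def keras_flavor(obj):
    """Guess which Keras implementation a model or layer object comes from."""
    module = type(obj).__module__
    if module.startswith("tensorflow"):
        return "tf.keras"
    if module.startswith("keras"):
        return "standalone keras"
    return "unknown"


# Stand-in classes that mimic the module paths of the two implementations.
FakeTfModel = type("Model", (), {"__module__": "tensorflow.python.keras.engine.training"})
FakeKerasModel = type("Model", (), {"__module__": "keras.engine.training"})

print(keras_flavor(FakeTfModel()))
print(keras_flavor(FakeKerasModel()))
```

If the real model reports standalone keras, building it from tf.keras layers instead (as in the answer's example) would be the first thing to try before passing it a tf.data.Dataset.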
