model.predict() with multiple datasets as inputs

I'm trying to feed a TensorFlow dataset (read from .csv files) into a multi-input tf.keras model defined with the functional API. Training works fine when I pass the datasets zipped together with the labels, but when I call predict() (presumably on some different dataset that is not labelled), it throws an error, in both eager and non-eager execution.

Here's my current code:

import tensorflow as tf
import numpy as np

tf.enable_eager_execution()

# Define model.
input_A = tf.keras.layers.Input(shape=(None, 5), name='sensor_A_input')
x_1 = tf.keras.layers.LSTM(5, return_sequences=False, recurrent_initializer='glorot_uniform')(input_A)

input_B = tf.keras.layers.Input(shape=(None, 4), name='sensor_B_input')
x_2 = tf.keras.layers.LSTM(5, return_sequences=False, recurrent_initializer='glorot_uniform')(input_B)

x = tf.keras.layers.concatenate([x_1, x_2], name='concat_test')
output = tf.keras.layers.Dense(1, activation='sigmoid', name='output')(x)

model = tf.keras.Model(inputs=[input_A, input_B], outputs=output)

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.run_eagerly = tf.executing_eagerly()

# Define input data.
# Dataset 1 is read from 10 .csv files where one file is one timeseries observation sequence of length 100 and 5 dimensions.
dataset_1 = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 100, 5))
# Dataset 2 is read from 10 .csv files where one file is one timeseries observation sequence of length 300 and 4 dimensions.
dataset_2 = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 300, 4))

# Define labels.
labels = tf.data.Dataset.from_tensor_slices(np.random.randint(0, 2, (10, 1)))

# Zip inputs and output into one dataset.
input_with_labels = tf.data.Dataset.zip(((dataset_1, dataset_2), labels)).batch(10)
model.fit(input_with_labels)

# Here's the problem - how should the input be arranged?
zipped_input = tf.data.Dataset.zip((dataset_1, dataset_2)).batch(10)
predictions = model.predict(zipped_input)
print(predictions)

Here's the error:

ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [<tf.Tensor: id=71049, shape=(10, 100, 5), dtype=float64, numpy=
array([[[0.54049765, 0.64218937, 0.31734092, 0.81307839, 0.75465237],
        [0.32371089, 0.85923477, 0.60619924, 0.68692891, 0.186234...

Full traceback:

Traceback (most recent call last):
  File "C:/xxx/debug_multiple_input_model.py", line 39, in <module>
    model.predict(zipped_input)
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1054, in predict
    callbacks=callbacks)
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 536, in predict_on_batch
    return model.predict_on_batch(x)
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1281, in predict_on_batch
    x, extract_tensors_from_dataset=True)
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2651, in _standardize_user_data
    exception_prefix='input')
  File "C:\env_path\lib\site-packages\tensorflow\python\keras\engine\training_utils.py", line 346, in standardize_input_data
    str(len(data)) + ' arrays: ' + str(data)[:200] + '...')
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [<tf.Tensor: id=71049, shape=(10, 100, 5), dtype=float64, numpy=
array([[[0.54049765, 0.64218937, 0.31734092, 0.81307839, 0.75465237],
        [0.32371089, 0.85923477, 0.60619924, 0.68692891, 0.186234...

I've also tried calling the predict() function like this:

1:

model.predict_generator(zipped_input)

results in the same error.

2:

model.predict((dataset_1, dataset_2))

throws this error:

AttributeError: 'DatasetV1Adapter' object has no attribute 'shape'
Alcott answered 20/7, 2019 at 13:15 Comment(2)
Are you sure you want zip, not concatenate? – Spathose
@Spathose As far as I know, concatenate creates one dataset by appending multiple datasets together. That's not what I want - dataset_1 needs to go into sensor_A_input and dataset_2 into sensor_B_input, as they've got different shapes (length of the sequence coming into the LSTM). – Alcott
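
To make the distinction concrete, here is a minimal sketch, using the same toy shapes as the question: concatenate would chain the two datasets into one stream of 20 elements for a single input, whereas zip pairs them element-wise so each sensor's sequence can reach its own Input layer.

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x, matching the question

ds_a = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 100, 5))
ds_b = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 300, 4))

# zip pairs the datasets element-wise: 10 elements, each an (a, b) tuple,
# so the two sequences can be routed to the two Input layers.
paired = tf.data.Dataset.zip((ds_a, ds_b))
for a, b in paired.take(1):
    print(a.shape, b.shape)  # (100, 5) (300, 4)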

It turned out that a 1-element tuple should be passed to the tf.data.Dataset.zip() function:

zipped_input = tf.data.Dataset.zip(((dataset_1, dataset_2), )).batch(10)

This produces the expected result.
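
One way to see why the extra tuple matters (a small sketch using the question's toy data): with a plain two-element zip, each dataset element is a 2-tuple, which Keras unpacks as (inputs, targets); that is why the error above reports only one array, of shape (10, 100, 5). Wrapping the pair in a 1-element tuple makes the whole pair the inputs structure.

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x, as in the question

dataset_1 = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 100, 5))
dataset_2 = tf.data.Dataset.from_tensor_slices(np.random.rand(10, 300, 4))

# Plain zip: each element is (x1, x2), read by Keras as (inputs, targets).
two_tuple = tf.data.Dataset.zip((dataset_1, dataset_2)).batch(10)
for x1, x2 in two_tuple.take(1):
    print(x1.shape, x2.shape)  # (10, 100, 5) (10, 300, 4)

# 1-element tuple: each element is ((x1, x2),), so the pair is the inputs.
one_tuple = tf.data.Dataset.zip(((dataset_1, dataset_2),)).batch(10)
for element in one_tuple.take(1):
    (x1, x2), = element
    print(x1.shape, x2.shape)  # (10, 100, 5) (10, 300, 4)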

Alcott answered 26/7, 2019 at 22:43 Comment(3)
Are all the ( ) parentheses necessary, or can it just be tf.data.Dataset.zip((dataset_1, dataset_2)).batch(10)? – Frydman
Hmmm, it's been a while. As far as I remember, the shape of the data had to match whatever tf.keras.Model.predict expects (see case 2 in the second part of my question, where I mention the AttributeError). Doesn't the model throw any exceptions when you feed it data of that shape? – Alcott
Yes, I confirmed, the extra parentheses are needed. Thanks! – Frydman

One more possible workaround:

ds1_iter = dataset_1.make_one_shot_iterator()
ds1_next = ds1_iter.get_next()

ds2_iter = dataset_2.make_one_shot_iterator()
ds2_next = ds2_iter.get_next()

# The dictionary keys must match the names given to the model's Input layers.
prediction = model.predict(x={'first_input_name': ds1_next, 'second_input_name': ds2_next}, steps=N)

where N is the length of the datasets, i.e. the number of prediction steps to run.
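
Applied to the model in the question, the dictionary keys would be its Input layer names, 'sensor_A_input' and 'sensor_B_input'. A rough sketch under that assumption (graph mode, since the one-shot-iterator pattern is a TF 1.x graph-mode idiom; batch size 1, so steps covers the 10 elements):

# Hypothetical adaptation; dict keys match the question's Input layer names.
batched_1 = dataset_1.batch(1)  # add a batch dimension of 1
batched_2 = dataset_2.batch(1)

ds1_next = batched_1.make_one_shot_iterator().get_next()
ds2_next = batched_2.make_one_shot_iterator().get_next()

prediction = model.predict(
    x={'sensor_A_input': ds1_next, 'sensor_B_input': ds2_next},
    steps=10)  # 10 elements at batch size 1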

Euphemie answered 27/8, 2019 at 21:34 Comment(1)
That's then limited to the scenario in which I know N, right? It wouldn't work if I were reading the dataset from all files in a directory where every file has a different length, e.g. using tf.data.experimental.CsvDataset. – Alcott
