tf.data with multiple inputs / outputs in Keras

Asked 30/9, 2018 at 21:19 Answered 7/10, 2018 at 12:29

Solved tensorflow keras tensorflow-datasets

For applications such as text-pair similarity, the input consists of two parts, e.g. pair_1 and pair_2, so the model has multiple inputs. Previously, I trained my models successfully with:

model.fit([pair_1, pair_2], labels, epochs=50)

I decided to replace my input pipeline with the tf.data API. To this end, I created a Dataset like this:

dataset = tf.data.Dataset.from_tensor_slices((pair_1, pair_2, labels))

The model compiles successfully, but when training starts it throws the following exception:

AttributeError: 'tuple' object has no attribute 'ndim'

My Keras and TensorFlow versions are 2.1.6 and 1.11.0, respectively. I found a similar issue in the TensorFlow repository: tf.keras multi-input models don't work when using tf.data.Dataset.

Does anyone know how to fix the issue?

Here is the main part of the code:

(q1_test, q2_test, label_test) = test
(q1_train, q2_train, label_train) = train

def tfdata_generator(sent1, sent2, labels, is_training, batch_size):
    '''Construct a data generator using tf.data.Dataset'''

    dataset = tf.data.Dataset.from_tensor_slices((sent1, sent2, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # The callers below pass batch_size, so batch before repeating
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

    return dataset

train_dataset = tfdata_generator(q1_train, q2_train, label_train, is_training=True, batch_size=_BATCH_SIZE)
test_dataset = tfdata_generator(q1_test, q2_test, label_test, is_training=False, batch_size=_BATCH_SIZE)


inps1 = keras.layers.Input(shape=(50,))
inps2 = keras.layers.Input(shape=(50,))

embed = keras.layers.Embedding(input_dim=nb_vocab, output_dim=300, weights=[embedding], trainable=False)
embed1 = embed(inps1)
embed2 = embed(inps2)

gru = keras.layers.CuDNNGRU(256)
gru1 = gru(embed1)
gru2 = gru(embed2)

concat = keras.layers.concatenate([gru1, gru2])

preds = keras.layers.Dense(1, 'sigmoid')(concat)

model = keras.models.Model(inputs=[inps1, inps2], outputs=preds)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

model.fit(
    train_dataset.make_one_shot_iterator(),
    steps_per_epoch=len(q1_train) // _BATCH_SIZE,
    epochs=50,
    validation_data=test_dataset.make_one_shot_iterator(),
    validation_steps=len(q1_test) // _BATCH_SIZE,
    verbose=1)

Mano asked 30/9, 2018 at 21:19 Comment(5)

Maybe the error is related to nesting tuple inside another tuple? It does not recognize the inner tuple as a Tensor object? Can you try feeding it something like (pair1, pair2, labels) and then feed the pairs yourself to the fit to see if that works? – Lugsail 2/10, 2018 at 2:32

I modified my example code, which should work now. Instead of tuples, you can pass a dictionary with the keys: "input_1" and "input_2" . – Irrespective 10/10, 2018 at 8:1

@Irrespective Can I do the same things with from_tensor_slices()? – Mano 10/10, 2018 at 22:2

@AmirHadifar yes, see my edit – Irrespective 11/10, 2018 at 8:40

Try dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels)) – Confess 11/5, 2020 at 19:14

I'm not using Keras, but I would go with tf.data.Dataset.from_generator(), like this:

import numpy as np
import tensorflow as tf

def _input_fn():
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1, 1))
  sent2 = np.reshape(sent2, (8, 1, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8, 1))

  def generator():
    for s1, s2, l in zip(sent1, sent2, labels):
      yield {"input_1": s1, "input_2": s2}, l

  dataset = tf.data.Dataset.from_generator(generator, output_types=({"input_1": tf.int64, "input_2": tf.int64}, tf.int64))
  dataset = dataset.batch(2)
  return dataset

...

model.fit(_input_fn(), epochs=10, steps_per_epoch=4)

This generator can iterate over, e.g., your text files or NumPy arrays and yield one example on every call. In this example, I assume that the words of the sentences have already been converted to their indices in the vocabulary.

Edit: Since the OP asked, it should also be possible with Dataset.from_tensor_slices():

def _input_fn():
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1))
  sent2 = np.reshape(sent2, (8, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8))

  dataset = tf.data.Dataset.from_tensor_slices(({"input_1": sent1, "input_2": sent2}, labels))
  dataset = dataset.batch(2, drop_remainder=True)
  return dataset
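
For anyone who wants something to copy and paste, here is a minimal sketch of a two-input model fed by such a dictionary dataset. It assumes TensorFlow 2.x with tf.keras; the toy data, the embedding sizes and the binary labels are made up for illustration and are not from the question. The important part is that the Input layer names match the dictionary keys.

```python
import numpy as np
import tensorflow as tf

# Toy data (hypothetical): 8 examples, one token index per input.
sent1 = np.arange(1, 9, dtype=np.int64).reshape(8, 1)
sent2 = np.arange(11, 19, dtype=np.int64).reshape(8, 1)
labels = np.array([0, 1, 0, 1, 1, 0, 1, 0], dtype=np.float32)

# Features are a dict keyed by input name; labels are the second element.
dataset = tf.data.Dataset.from_tensor_slices(
    ({"input_1": sent1, "input_2": sent2}, labels)
).batch(2, drop_remainder=True).repeat()

# The Input names must match the dictionary keys used in the dataset.
inp1 = tf.keras.layers.Input(shape=(1,), dtype="int64", name="input_1")
inp2 = tf.keras.layers.Input(shape=(1,), dtype="int64", name="input_2")

# Shared embedding; input_dim/output_dim are arbitrary illustrative sizes.
embed = tf.keras.layers.Embedding(input_dim=20, output_dim=4)
x1 = tf.keras.layers.Flatten()(embed(inp1))
x2 = tf.keras.layers.Flatten()(embed(inp2))

merged = tf.keras.layers.concatenate([x1, x2])
preds = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.models.Model(inputs=[inp1, inp2], outputs=preds)
model.compile(optimizer="adam", loss="binary_crossentropy")

# The dataset repeats forever, so steps_per_epoch bounds each epoch.
history = model.fit(dataset, epochs=2, steps_per_epoch=4, verbose=0)
```

The .repeat() call is there because the dataset only holds four batches; without it, training over several epochs may run out of data.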

Irrespective answered 5/10, 2018 at 8:13 Comment(5)

Thanks for your response. My dataset is relatively small and I prefer to keep all of it in memory. Do you have any suggestion to fix the issue with from_tensor_slices? – Mano 5/10, 2018 at 20:13

Hi Amir. Two questions, and sorry if they are kind of stupid: one of the people on the GitHub issue mentioned: 'So the new features of feeding the iterator directly to model.fit() is valid only when you are using tf.Keras not the standalone Keras.' (He had the same error as you and fixed it by importing the "correct" Keras.) The other question: you posted from_tensor_slices() twice, once with a tuple and once with a triplet. Which one is the line you use? – Irrespective 5/10, 2018 at 21:32

I used the tf.keras API. You are right, but in both cases, tuple or triplet, it did not work. – Mano 7/10, 2018 at 14:50

Thanks this saved me a lot of effort – Outface 21/9, 2020 at 20:0

Thanks for your response but something is not working for me on tensorflow 2.3.2. If it is not too much to ask, could you please update your answer to include the model so that it is copy/paste testing? thanks again – Foregone 25/1, 2021 at 17:4

One way to solve your issue could be to use the zip dataset to combine your various inputs:

sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.float32)
sent1 = np.reshape(sent1, (8, 1, 1))
sent2 = np.reshape(sent2, (8, 1, 1))

labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.float32)
labels = np.reshape(labels, (8, 1))

dataset_12 = tf.data.Dataset.from_tensor_slices((sent1, sent2))
dataset_label = tf.data.Dataset.from_tensor_slices(labels)

dataset = tf.data.Dataset.zip((dataset_12, dataset_label)).batch(2).repeat()
model.fit(dataset, epochs=10, steps_per_epoch=4)

will print: Epoch 1/10 4/4 [==============================] - 2s 503ms/step...
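
To see why the nesting matters, it can help to inspect the element structure of the dataset. The sketch below assumes TensorFlow 2.x (where Dataset.element_spec is available) and uses made-up zero/one arrays. It compares the flat triple from the question with the nested ((input_1, input_2), labels) structure that the zip approach produces. In recent tf.keras versions, a flat 3-tuple is read by fit() as (inputs, targets, sample_weights), which is not what is intended here.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays (hypothetical): 8 pairs of length-50 sequences.
sent1 = np.zeros((8, 50), dtype=np.int64)
sent2 = np.ones((8, 50), dtype=np.int64)
labels = np.zeros((8, 1), dtype=np.float32)

# Flat triple, as in the question: three top-level components.
flat = tf.data.Dataset.from_tensor_slices((sent1, sent2, labels))

# Nested tuple: (inputs, labels), where inputs is itself a pair.
nested = tf.data.Dataset.from_tensor_slices(((sent1, sent2), labels)).batch(2)

print(len(flat.element_spec))    # 3
print(len(nested.element_spec))  # 2

inputs_spec, label_spec = nested.element_spec
print(len(inputs_spec))          # 2 (one spec per model input)

# For prediction without labels, wrapping the inputs in a one-element
# tuple keeps the pair from being read as (inputs, targets).
predict_ds = tf.data.Dataset.from_tensor_slices(((sent1, sent2),)).batch(2)
print(len(predict_ds.element_spec))  # 1
```

The label-free variant at the end is only a suggestion for use with model.predict(); I have not checked it against every TensorFlow release.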

Runty answered 7/10, 2018 at 12:29 Comment(3)

Thank you @pfm. It sounds like a good idea. I'll accept it if nobody offers a more elegant way to solve the issue. – Mano 7/10, 2018 at 14:56

@Runty I have a similar issue, could you help me here – Doze 1/9, 2020 at 4:36

can you please create the same without label? I am trying for the test data and I couldn't figure out how to do it. – Stuyvesant 18/11, 2022 at 0:58
