WARNING: WARNING:tensorflow:Model was constructed with shape (None, 150) , but it was called on an input with incompatible shape (None, 1)
Asked Answered
V

1

6

So I'm trying to build a word embedding model but I keep getting this error. During training, the accuracy does not change and the val_loss remains "nan"

The raw shape of the data is

x.shape, y.shape
((94556,), (94556, 2557))

Then I reshape it so:

xr= np.asarray(x).astype('float32').reshape((-1,1))
yr= np.asarray(y).astype('float32').reshape((-1,1))
((94556, 1), (241779692, 1))

Then I run it through my model

model = Sequential()
model.add(Embedding(2557, 64, input_length=150, embeddings_initializer='glorot_uniform'))
model.add(Flatten())
model.add(Reshape((64,), input_shape=(94556, 1)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='relu'))
# compile the mode
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
plot_model(model, show_shapes = True, show_layer_names=False)

After training, I get a constant accuracy and a val_loss nan for every epoch

history=model.fit(xr, yr, epochs=20, batch_size=32, validation_split=3/9)

Epoch 1/20
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1960/1970 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.9996WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 2/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 3/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 4/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 5/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 6/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 7/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 8/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 9/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 10/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 11/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 12/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 13/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 14/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 15/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 16/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 17/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 18/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 19/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 20/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996

I think it has to do whit the input/output shape but I'm not certain. I tried modifying the model in various ways, adding layers/ removing layers/ different optimizers/ different batch sizes and nothing worked so far.

Verdieverdigris answered 7/5, 2020 at 11:17 Comment(3)
Clearly, there are a few things wrong, at least in the way you do the calls to reshape, but I am not familiar enough with the domain to understand exactly what. Can you explain more precisely what kind of data is contained in x and y? (integer or float, which range, what do they represent, etc...)Asyut
I am trying to create a skipgram word embedding model. x and y contain words converted to unique number identifiers(integers). I have a function that outputs them as NumPy arrays: x (input, i.e. target word) and y (output, i.e. context word).Verdieverdigris
model.add(Reshape((64,), input_shape=(94556, 1))) input_shape Shape tuple (not including the batch axis), or TensorShape instance (not including the batch axis). tensorflow.org/api_docs/python/tf/keras/layers/InputLayerRamiform
A
11

Ok so, here is what I understood, correct me if I'm wrong:

  • x contains 94556 integers, each being the index of one out of 2557 words.
  • y contains 94556 vectors of 2557 integers, each containing also the index of one word, but this time it is a one-hot encoding instead of a categorical encoding.
  • Finally, a corresponding pair of words from x and y represents two words that are close by in the original text.

If I am correct so far, then the following runs correctly:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *

x = np.random.randint(0,2557,94556)
y = np.eye((2557))[np.random.randint(0,2557,94556)]
xr = x.reshape((-1,1))


print("x.shape: {}\nxr.shape:{}\ny.shape: {}".format(x.shape, xr.shape, y.shape))


model = Sequential()
model.add(Embedding(2557, 64, input_length=1, embeddings_initializer='glorot_uniform'))
model.add(Reshape((64,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(2557, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

history=model.fit(xr, y, epochs=20, batch_size=32, validation_split=3/9)

The most import modifications:

  • The y reshaping was losing the relationship between elements from x and y.
  • The input_length in the Embedding layer should correspond to the second dimension of xr.
  • The output of the last layer from the network should be the same dimension as the second dimension of y.

I am actually surprised the code ran without crashing.

Finally, from my research, it seems that people are not training skipgrams like this in practice, but rather they are trying to predict whether a training example is correct (the two words are close by) or not. Maybe this is the reason you came up with an output of dimension one.

Here is a model inspired from https://github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/Chapter05/keras_skipgram.py :

word_model = Sequential()
word_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
word_model.add(Reshape((embed_size,)))

context_model = Sequential()
context_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
context_model.add(Reshape((64,)))

model = Sequential()
model.add(Merge([word_model, context_model], mode="dot", dot_axes=0))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))

In that case, you would have 3 vectors, all from the same size (94556, 1) (or probably even bigger than 94556, since you might have to generate additional negative samples):

  • x containing integers from 0 to 2556
  • y containing integers from 0 to 2556
  • output containing 0s and 1s, whether each pair from x and y is a negative or a positive example

and the training would look like:

history = model.fit([x, y], output, epochs=20, batch_size=32, validation_split=3/9)
Asyut answered 7/5, 2020 at 15:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.