I am trying to use an LSTM model for next basket recommendation. I would like to apply the same approach as this article: A Dynamic Recurrent Model for Next Basket Recommendation.
In my case, I have users who buy items at different times. So I have designed my X data like:
user ID   timestep   sequence items
user1     1          array(1, 20)
user1     2          ...
user2     1          ...
user2     2          ...
user2     3          ...
user3     1          ...
user3     2          ...
Each entry of the "sequence items" column is an array of shape (1, 20). These vectors are the mean of the representations (generated with word2vec) of the items purchased during that order.
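For clarity, here is a minimal sketch of how such a basket vector could be built, assuming a gensim Word2Vec model trained on the purchase sequences with 20-dimensional item embeddings (the file name and the basket_vector function are illustrative):

import numpy as np
from gensim.models import Word2Vec

# hypothetical: word2vec model trained on item purchase sequences,
# with 20-dimensional vectors so each embedding matches the LSTM input
w2v = Word2Vec.load("items_w2v.model")

def basket_vector(basket):
    """Mean of the item vectors of one order -> array of shape (1, 20)."""
    vectors = [w2v.wv[item] for item in basket if item in w2v.wv]
    return np.mean(vectors, axis=0).reshape(1, 20)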
Then I designed my labels y like:
user ID   label
user1     np.array(1, 6000)
user2     ...
user3     ...
Each label represents the next order of the user, the one that follows the past orders stored in the X data. The labels are multi-hot vectors like [1 0 1 0 0 0 .. 1], where 1 indicates that the user purchased the item and 0 otherwise.
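To make the encoding concrete, here is a minimal sketch of building such a label, assuming the items are indexed 0..5999 (the multi_hot function is illustrative):

import numpy as np

NB_ITEMS = 6000  # size of the item catalogue

def multi_hot(next_basket):
    """Encode the next order (list of item indices) as a (1, 6000) vector."""
    label = np.zeros((1, NB_ITEMS), dtype=np.float32)
    label[0, next_basket] = 1.0
    return label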
So, I would like to use the LSTM to learn from the past sequences of each user in order to predict the next purchase sequence. Below, I define an LSTM model where I don't return the sequences, because I have one label per user.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model_rnn = Sequential()
# variable-length sequences of 20-dimensional basket vectors
model_rnn.add(LSTM(20, return_sequences=False, input_shape=(None, 20)))
model_rnn.add(Dropout(0.2))
model_rnn.add(Dense(nb_classes))        # nb_classes = 6000 items
model_rnn.add(Activation("sigmoid"))    # multi-label output
model_rnn.compile(loss='binary_crossentropy', optimizer="Adagrad")
n_index = X.index.values
n_sample = int(len(X.index.values) * 0.7)
user_index = np.random.choice(n_index, n_sample, replace=False)

n_epochs = 10
for _ in range(n_epochs):
    for index in user_index:
        X_train = np.asarray(X.loc[index, "sequence_items"])
        # add the batch dimension: (1, timesteps, 20)
        X_train = X_train.reshape(1, X_train.shape[0], X_train.shape[1])
        y_train = y[index, :].toarray()
        model_rnn.fit(X_train, y_train, batch_size=1, epochs=1, shuffle=False)
As you can see, I train my LSTM with batch_size=1 because the number of timesteps differs between users. I fit the model on 70% of the users and test it on the rest.
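For reference, here is roughly how the top-n recommendations are extracted for a test user (a sketch; test_index and n=10 are illustrative):

# users held out from training
test_index = np.setdiff1d(n_index, user_index)

# predict the next basket for one test user
X_test = np.asarray(X.loc[test_index[0], "sequence_items"])
X_test = X_test.reshape(1, X_test.shape[0], X_test.shape[1])
probas = model_rnn.predict(X_test)[0]    # shape (nb_classes,)
top_n = np.argsort(probas)[::-1][:10]    # indices of the 10 most probable items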
My results are very poor: the top-n items recommended by the model are almost the same for every test user. For example, for a given user, the model recommends items that never appear in their past sequences, whereas it should assign high probabilities to the items the user has purchased before.
Evidently, my approach seems wrong. Maybe the design of the training data isn't suited to my goal. Do you have any idea or advice on how to prepare the data to reach my goal?
Note: when I fit an LSTM model on a single user, with their sequences and a label at each timestep (the next order at each step), I get good results predicting the next order from the user's last order. But this approach forces me to train one LSTM model per user, which isn't right.
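That per-user variant looks roughly like this (a sketch using return_sequences=True; X_user and y_user are illustrative names for one user's data):

# one model per user: a label at every timestep
model_user = Sequential()
model_user.add(LSTM(20, return_sequences=True, input_shape=(None, 20)))
model_user.add(Dense(nb_classes, activation="sigmoid"))  # applied per timestep
model_user.compile(loss="binary_crossentropy", optimizer="Adagrad")

# X_user: (1, timesteps, 20); y_user: (1, timesteps, nb_classes)
model_user.fit(X_user, y_user, epochs=10, verbose=0)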
Thank you,