How to train an LSTM model on sequences of items?

I'm trying to use an LSTM model for next basket recommendation. I would like to apply the same approach as in this article: A Dynamic Recurrent Model for Next Basket Recommendation.

In my case, I have users who buy items at different times, so I have designed my X data like:

    user ID       timestep     sequence items    
    user1            1          array(1, 20)
    user1            2            ...       

    user2            1            ...
    user2            2            ...
    user2            3            ...

    user3            1            ...
    user3            2            ...

Each sequence item is an array of shape (1, 20). These vectors are the mean of the word2vec representations of the items purchased during that sequence.
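
For context, a basket vector of that shape can be built by averaging the word2vec vectors of the items in the basket. This is a minimal sketch, assuming a trained gensim Word2Vec model named w2v with 20-dimensional vectors and string item IDs (both names are assumptions, not from the question):

    import numpy as np

    def basket_vector(item_ids, w2v, dim=20):
        # average the word2vec vectors of the items bought in one basket
        vectors = [w2v.wv[item] for item in item_ids if item in w2v.wv]
        if not vectors:
            return np.zeros((1, dim), dtype=np.float32)
        return np.mean(vectors, axis=0).reshape(1, dim)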

Then I design my labels y like:

    user ID       label    
    user1         array(1, 6000)
    user2         ...
    user3         ... 

Each label represents a user's next order, following the past orders represented in the X data. The labels are multi-hot vectors like [1 0 1 0 0 0 .. 1], where 1 indicates that the user purchased the item and 0 otherwise.
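
For illustration, such a multi-hot label vector could be built like this (the item indices here are hypothetical):

    import numpy as np

    nb_classes = 6000                     # total number of distinct items
    next_order_items = [0, 2, 17, 5999]   # hypothetical indices of the items in the next order

    y_user = np.zeros((1, nb_classes), dtype=np.float32)
    y_user[0, next_order_items] = 1.0     # 1 where the item was purchased, 0 otherwise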

So I would like to train the LSTM on each user's past sequences to predict their next purchase. Below I define an LSTM model, where I don't return sequences because I have one label per user.

    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense, Activation
    import numpy as np

    model_rnn = Sequential()
    # variable-length sequences of 20-dimensional basket vectors
    model_rnn.add(LSTM(20, return_sequences=False, input_shape=(None, 20)))
    model_rnn.add(Dropout(0.2))
    model_rnn.add(Dense(nb_classes))
    # sigmoid + binary cross-entropy for multi-hot (multi-label) targets
    model_rnn.add(Activation("sigmoid"))

    model_rnn.compile(loss='binary_crossentropy', optimizer="Adagrad")

    n_index = X.index.values
    n_sample = int(len(n_index) * 0.7)
    user_index = np.random.choice(n_index, n_sample, replace=False)
    n_epochs = 10
    for _ in range(n_epochs):
        for index in user_index:
            X_train = X.loc[index, "sequence_items"]  # .ix is deprecated, use .loc
            # reshape returns a new array, so it must be assigned back
            X_train = X_train.reshape(1, X_train.shape[0], X_train.shape[1])
            y_train = y[index, :].toarray()
            model_rnn.fit(X_train, y_train, batch_size=1, epochs=1)

As you can see, I train my LSTM with batch_size=1 because the number of timesteps differs between users. I fit the model on 70% of the users and test it on the rest.

My results are very poor: the top-n items recommended by the model are nearly identical for every test user. For example, for a given user, the model recommends items that never appear in their past sequences, whereas it should condition on those past sequences and assign high probabilities to the items the user has purchased before.

Evidently, my approach seems wrong. Maybe the design of the training data isn't suited to my goal. Do you have any ideas or advice on how to prepare the data to reach it?

Note: when I fit an LSTM model on a single user, with their sequences and a label at each timestep (the next order at each step of the sequence), I get good results predicting the next order from the user's last order. But that approach forces me to train one LSTM model per user, which isn't right.

Thank you,

Unwatched asked 13/6, 2017 at 20:20

I am not an expert, but I am not sure about the batch size. As far as I know, a Keras LSTM resets its state after each batch, so when your batch size is 1 the LSTM resets its memory: you are forgetting what user 1 did at timestep 1 when processing timestep 2. The maximum number of purchases can be your batch size, and you can use masking to avoid the effect of the padding; see the sketch below.
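
As a rough illustration of the padding-plus-masking idea, here is a minimal sketch; it assumes the per-user basket sequences are available as a Python list user_sequences of (timesteps, 20) arrays and the multi-hot labels as an (n_users, nb_classes) array Y (both names are assumptions, not from the question):

    from keras.models import Sequential
    from keras.layers import Masking, LSTM, Dense, Activation
    from keras.preprocessing.sequence import pad_sequences

    nb_classes = 6000  # as in the question's label vectors

    # pad every user's sequence to the same length with all-zero timesteps
    X_padded = pad_sequences(user_sequences, dtype='float32',
                             padding='pre', value=0.0)   # (n_users, max_t, 20)

    model = Sequential()
    # Masking tells the LSTM to skip the all-zero padded timesteps
    model.add(Masking(mask_value=0.0, input_shape=(X_padded.shape[1], 20)))
    model.add(LSTM(20))
    model.add(Dense(nb_classes))
    model.add(Activation("sigmoid"))
    model.compile(loss='binary_crossentropy', optimizer='Adagrad')

    # all users can now be trained together in larger batches
    model.fit(X_padded, Y, batch_size=32, epochs=10)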

Allfired answered 19/6, 2017 at 7:48

By fitting the network to all users in your loop, you are creating a single generalized model for all users. That's probably why you are getting similar results on the test data.

The paper you mentioned aims to capture: 1) the general interest of each user from past basket data AND 2) sequential information in purchases (e.g., a user who bought bread will buy butter next time).

Take a look at the description of Figure 1:

The input layer comprises a series of basket representations of a user. Dynamic representation of the user can be obtained in the hidden layer. Finally the output layer shows scores of this user towards all items.

I believe they train a model for each user on the fly and predict from that. What makes this feasible is pooling the items in each basket.

For their data, max pooling worked better, but you could also try average pooling, just as in the paper; a sketch of both is below. Hope this helps. I'm trying to implement this paper myself, so if you make any progress please let us know.
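
For reference, a minimal sketch of the two pooling options over the item vectors of one basket (assuming item_vectors is an (n_items, 20) NumPy array; the name is an assumption):

    import numpy as np

    def pool_basket(item_vectors, mode="max"):
        # collapse an (n_items, dim) basket into a single (dim,) representation
        if mode == "max":
            return item_vectors.max(axis=0)   # element-wise max over items
        return item_vectors.mean(axis=0)      # element-wise average over items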

Ss answered 24/1, 2018 at 0:58
