GRU/LSTM in Keras with input sequence of varying length

I'm working on a small project to better understand RNNs, in particular LSTM and GRU. I'm not at all an expert, so please bear that in mind.

The problem I'm facing involves data of the following form:

>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3],[1, 2, 1], [1, 3, 2],[2, 3, 1],[3, 1, 1],[3, 3, 2],[4, 3, 3]], columns=['person', 'interaction', 'group'])
   person  interaction  group
0       1            2      3
1       1            2      1
2       1            3      2
3       2            3      1
4       3            1      1
5       3            3      2
6       4            3      3

This is just for explanation. Different persons interact with different groups in different ways. I've already encoded the various features. The last interaction of a person is always a 3, which means selecting a certain group. In the short example above, person 1 chooses group 2, person 2 chooses group 1, and so on.

My whole data set is much bigger, but I would like to understand the conceptual part first before throwing models at it. The task I would like to learn is: given a sequence of interactions, which group does the person choose? More concretely, I would like the output to be a list of all groups (there are 3 groups: 1, 2, 3) sorted by likelihood, with the most likely choice first, followed by the second and third most likely groups. The loss function would therefore be the mean reciprocal rank.

I know that in Keras, GRUs/LSTMs can handle variable-length input. So here are my three questions.

The input is of the format:

(samples, timesteps, features)

Writing high-level code:

import keras.layers as L
import keras.models as M
model_input = L.Input(shape=(?, None, 2))

timestep=None should imply the varying length, and 2 is for the features interaction and group. But what about the samples? How do I define the batches?

I'm a bit puzzled about what the output should look like in this example. I think that for the last interaction of each person I would like to have a list of length 3. Assuming I've set up the output

model_output = L.LSTM(3, return_sequences=False)

I then want to compile it. Is there a way of using the mean reciprocal rank?

model.compile('adam', '?')

I know the questions are fairly high level, but I would like to understand first the big picture and start to play around. Any help would therefore be appreciated.

Elfreda answered 2/4, 2019 at 20:19 Comment(0)

The concept you've drawn in your question is a pretty good start already. I'll add a few things to make it work, as well as a code example below:

  • You can specify LSTM(n_hidden, input_shape=(None, 2)) directly, instead of inserting an extra Input layer; the batch dimension is to be omitted for the definition.
  • Since your model is going to perform some kind of classification (based on time series data), the final layer is what we'd expect from "normal" classification as well: a Dense(num_classes, activation='softmax'). Chaining the LSTM and the Dense layer together will first pass the time series input through the LSTM layer and then feed its output (whose size is determined by the number of hidden units) into the Dense layer. activation='softmax' allows us to compute a class score for each class (we're going to use one-hot encoding in a data preprocessing step, see code example below). This means the class scores are not ordered, but you can always sort them via np.argsort or pick the top class via np.argmax.
  • Categorical crossentropy loss is suited for comparing the classification score, so we'll use that one: model.compile(loss='categorical_crossentropy', optimizer='adam').
  • Since the number of interactions, i.e. the length of the model input, varies from sample to sample, we'll use a batch size of 1 and feed in one sample at a time.

The following is a sample implementation based on the above considerations. Note that I modified your sample data a bit, in order to provide more "reasoning" behind group choices. Also, each person needs to perform at least one interaction before choosing a group (i.e. the input sequence cannot be empty); if this is not the case for your data, then introducing an additional no-op interaction (e.g. 0) can help.

import pandas as pd
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(10, input_shape=(None, 2)))  # LSTM for arbitrary length series.
model.add(tf.keras.layers.Dense(3, activation='softmax'))   # Softmax for class probabilities.
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Example interactions:
#   * 1: Likes the group,
#   * 2: Dislikes the group,
#   * 3: Chooses the group.
df = pd.DataFrame([
    [1, 1, 3],
    [1, 1, 3],
    [1, 2, 2],
    [1, 3, 3],
    [2, 2, 1],
    [2, 2, 3],
    [2, 1, 2],
    [2, 3, 2],
    [3, 1, 1],
    [3, 1, 1],
    [3, 1, 1],
    [3, 2, 3],
    [3, 2, 2],
    [3, 3, 1]],
    columns=['person', 'interaction', 'group']
)
data = [person[1][['interaction', 'group']].values for person in df.groupby('person')]
x_train = [x[:-1] for x in data]
y_train = tf.keras.utils.to_categorical([x[-1, 1]-1 for x in data])  # Expects class labels from 0 to n-1 (-> subtract 1).
print(x_train)
print(y_train)

class TrainGenerator(tf.keras.utils.Sequence):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Need to expand arrays to have batch size 1.
        return self.x[index][None, :, :], self.y[index][None, :]

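# Note: fit_generator is deprecated in newer TensorFlow releases, where model.fit accepts a Sequence directly.
model.fit_generator(TrainGenerator(x_train, y_train), epochs=1000)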
pred = [model.predict(x[None, :, :]).ravel() for x in x_train]
for p, y in zip(pred, y_train):
    print(p, y)

And the corresponding sample output:

[...]
Epoch 1000/1000
3/3 [==============================] - 0s 40ms/step - loss: 0.0037
[0.00213619 0.00241093 0.9954529 ] [0. 0. 1.]
[0.00123938 0.99718493 0.00157572] [0. 1. 0.]
[9.9632275e-01 7.5039308e-04 2.9268670e-03] [1. 0. 0.]
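
As a rough sketch of how these class scores could be turned into the ranked list asked for in the question, the snippet below sorts the groups by score and computes the mean reciprocal rank as an evaluation metric. The helper names rank_groups and mean_reciprocal_rank are made up for illustration; the scores are copied from the sample output above. Since a rank is not differentiable, the metric is computed after training here rather than used as the loss, which stays categorical crossentropy.

```python
# Class scores copied from the sample output above (one row per person).
pred = [
    [0.00213619, 0.00241093, 0.9954529],
    [0.00123938, 0.99718493, 0.00157572],
    [9.9632275e-01, 7.5039308e-04, 2.9268670e-03],
]
true_groups = [3, 2, 1]  # Groups actually chosen by persons 1, 2 and 3.


def rank_groups(scores):
    """Return the 1-based group labels sorted from most to least likely."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i + 1 for i in order]


def mean_reciprocal_rank(score_rows, targets):
    """Average of 1/rank of the true group over all samples."""
    total = 0.0
    for scores, target in zip(score_rows, targets):
        rank = rank_groups(scores).index(target) + 1  # Position of the true group.
        total += 1.0 / rank
    return total / len(targets)


rankings = [rank_groups(p) for p in pred]
print(rankings)                                 # [[3, 2, 1], [2, 3, 1], [1, 3, 2]]
print(mean_reciprocal_rank(pred, true_groups))  # 1.0
```

With NumPy arrays, np.argsort(-scores) + 1 should give the same ordering as rank_groups.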

Using custom generator expressions: According to the documentation, we can use any generator to yield the data. The generator is expected to yield batches of the data and loop over the whole data set indefinitely. When using tf.keras.utils.Sequence we do not need to specify the parameter steps_per_epoch, as it defaults to the length of the Sequence, i.e. len(train_generator). With a plain generator there is no such length, so we need to provide this parameter ourselves:

import itertools as it

model.fit_generator(((x_train[i % len(x_train)][None, :, :],
                      y_train[i % len(y_train)][None, :]) for i in it.count()),
                    epochs=1000,
                    steps_per_epoch=len(x_train))
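
To illustrate why an endless generator needs steps_per_epoch, here is a minimal sketch in plain Python, without Keras. The function endless_batches and the toy data are hypothetical; nested lists stand in for the NumPy arrays used above. The generator never stops on its own, so it is the consumer that decides how many batches make up one epoch.

```python
import itertools


def endless_batches(xs, ys):
    """Yield (x, y) batches of size 1 forever, cycling through the data."""
    for i in itertools.count():
        j = i % len(xs)
        yield [xs[j]], [ys[j]]  # Wrapping in a list adds the batch dimension.


# Hypothetical toy data: three sequences of different lengths and their labels.
xs = [[[1, 3], [1, 3], [2, 2]], [[2, 1], [2, 3], [1, 2]], [[1, 1], [2, 3]]]
ys = [2, 1, 0]

steps_per_epoch = len(xs)
gen = endless_batches(xs, ys)
for epoch in range(2):
    # Drawing exactly steps_per_epoch batches visits every sample once.
    labels = [next(gen)[1][0] for _ in range(steps_per_epoch)]
    print(epoch, labels)
```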
Turgeon answered 5/4, 2019 at 23:48 Comment(4)
Many thanks! That helped a lot. One question: is it always necessary to define your own class for fit_generator? I tried to apply it to my real data set and it looks like I managed to train it. I might have a small question over the next couple of days, in which case I'll just drop a comment here. But I want to award the bounty before it expires, as your answer really helped me. Thanks again. - Elfreda
@Elfreda According to the documentation we can use any generator for that purpose, but then we also need to specify steps_per_epoch for fit_generator, since otherwise the training loop doesn't know when one epoch finishes; for the Sequence utility class this defaults to len(train_generator). In general I find that using this class helps structure the code. Please see my updated answer for an example with a generator. - Turgeon
I was just wondering, is there an upper bound on the number of groups from a practical point of view? - Elfreda
@Elfreda If you have more groups, you most likely need more samples in order to train the network. This clearly comes at the cost of increased training times. You might also need a more complex network architecture (more hidden nodes, for example), and here too you might reach practical bounds (both for compute and memory). I guess the most immediate challenge comes from the need for larger data sets, also because if some groups appear in only a few samples, predictions concerning these groups might be poor. But in the end all of that really depends on your specific use case. - Turgeon
