I'm trying to solve a time series problem. In short, for each customer and material (SKU code), I have a history of orders placed in the past. I need to build a model that predicts the number of days until the next order for each customer and material.
What I'm trying to do is to build an LSTM model in Keras, where for each customer and material I have a history padded to a maximum of 50 timesteps, and I'm using a mix of numeric features (# of days since the previous order, average days between orders in the last 60 days, etc.) and categorical features (SKU code, customer code, type of SKU, etc.).
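To give an idea of the input layout, each (customer, material) history is padded with zeros up to MAX_TIMESTEP = 50. Roughly like this, simplified to a single numeric column (the grouping shown is just illustrative, the rest of my preprocessing is omitted):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

MAX_TIMESTEP = 50
# One sequence of past orders per (customer, material); padded with 0 so that the
# Masking / mask_zero layers in the model can skip the padded timesteps.
sequences = [grp['NUM_Data_diff_comprato'].values
             for _, grp in dataset_train.groupby(['CAT_Cliente_le', 'CAT_Materiale_le'])]
X_num_col = pad_sequences(sequences, maxlen=MAX_TIMESTEP, dtype='float32',
                          padding='pre', value=0.0)   # shape: (n_keys, MAX_TIMESTEP)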
For the categorical features, I'm trying to use the popular entity embedding technique. I started from an example published on GitHub, which did not use an LSTM (it embedded with input_length = 1), and generalized it to embeddings over full sequences that I could feed to an LSTM.
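The original (non-LSTM) pattern was roughly this, with one category value per sample (shown from memory, so the exact layers are approximate):

from keras.layers import Input, Embedding, Reshape

inp = Input(shape=(1,))                               # a single category index per sample
emb = Embedding(no_of_unique_cat + 1, embedding_size,
                input_length=1)(inp)                  # output shape: (batch, 1, embedding_size)
emb = Reshape(target_shape=(embedding_size,))(emb)    # flattened and fed to Dense layers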
Below is my code.
import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, Masking, Concatenate, LSTM, TimeDistributed, Dense
from keras.initializers import he_normal
from keras.optimizers import SGD
from keras.regularizers import l2, l1

input_models = []
output_embeddings = []
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# The first 5 entries are the categorical features, the last 5 are the numeric ones
features = ['CAT_Cliente_le', 'CAT_Famiglia_le', 'CAT_Materiale_le', 'CAT_Settimana',
            'CAT_Sotto_Famiglia_le', 'NUM_Data_diff_comprato', 'NUM_Data_diff_comprato_avg',
            'NUM_Data_diff_comprato_avg_sf', 'NUM_Qty', 'NUM_Rank']

# One Input + Embedding pair for each categorical variable
for categorical_var in range(len(features) - 5):
    # Name of the categorical variable that will be used in the Keras Embedding layer
    cat_emb_name = features[categorical_var] + '_Embedding'
    # Define the embedding size: half the cardinality, capped at 10
    no_of_unique_cat = dataset_train.loc[:, features[categorical_var]].nunique()
    embedding_size = int(min(np.ceil((no_of_unique_cat + 1) / 2), 10))
    # One sequence of category indices per timestep; index 0 is reserved for padding
    input_model = Input(shape=(MAX_TIMESTEP,))
    output_model = Embedding(no_of_unique_cat + 1, embedding_size, name=cat_emb_name,
                             input_length=MAX_TIMESTEP, mask_zero=True)(input_model)
    # Collect all the categorical inputs and their embeddings
    input_models.append(input_model)
    output_embeddings.append(output_model)

# Single input for the 5 non-categorical (numerical) columns, masked on the padded timesteps
input_numeric = Input(shape=(MAX_TIMESTEP, 5))
mask_numeric = Masking(mask_value=0.)(input_numeric)
input_models.append(input_numeric)
output_embeddings.append(mask_numeric)

# Concatenate embeddings and numeric features along the feature axis
output = Concatenate(axis=2)(output_embeddings)
output = LSTM(units=25,
              use_bias=True,
              kernel_initializer=he_normal(seed=14),
              recurrent_initializer=he_normal(seed=14),
              unit_forget_bias=True,
              return_sequences=True)(output)
# One prediction per timestep
output = TimeDistributed(Dense(1))(output)

model = Model(inputs=input_models, outputs=output)
model.compile(loss='mae',
              optimizer=SGD(lr=0.2, decay=0.001, momentum=0.9, nesterov=False),
              metrics=['mape'])
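For completeness, I then feed the model one array per input, in the same order as input_models; roughly like this (the array names are just placeholders):

# One (n_samples, MAX_TIMESTEP) array of integer codes per categorical feature,
# plus one (n_samples, MAX_TIMESTEP, 5) array with the numeric features;
# y holds the per-timestep targets with shape (n_samples, MAX_TIMESTEP, 1).
X_cat = [X_cliente, X_famiglia, X_materiale, X_settimana, X_sotto_famiglia]
model.fit(X_cat + [X_num], y, epochs=50, batch_size=256, validation_split=0.1)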
What I observed is that:

- the model shows good performance with the numeric features only
- adding the categorical features does nothing to improve performance (I would at least expect the model to overfit by learning very specific rules, like "client X ordered material Y in week Z after 5 days"), but this never happens
My question is: is there something conceptually wrong with using entity embeddings in an LSTM like this? Should I change something?
Thanks a lot in advance