How to configure a very simple LSTM with Keras / Theano for Regression

I am struggling to configure a Keras LSTM for a simple regression task. There is some very basic explanation at the official page: Keras RNN documentation

But to fully understand it, example configurations with example data would be extremely helpful.

I have found hardly any examples of regression with a Keras LSTM; most examples cover classification (of text or images). I've studied the LSTM examples that come with the Keras distribution, plus one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, though the author admits the approach is quite memory-inefficient, since data samples have to be stored very redundantly.

Although an improvement was introduced by a commenter (Taha), data storage is still redundant, and I doubt this is how the Keras developers meant it to be done.

I've downloaded some simple example sequential data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

The table consists of more than 8900 such lines of Apple stock data, with 7 columns (data points) for each day. The value to predict would be "AdjClose", which is the value at the end of the day.

So the goal would be to predict the AdjClose for the next day, based on the sequence of the previous few days. (This is probably next to impossible, but it is always good to see how a tool behaves under challenging conditions.)

I think this should be a very standard prediction/regression case for an LSTM, and easily transferable to other problem domains.

So, how should the data be formatted (X_train, y_train) for minimum redundancy and how do I initialize the Sequential model with only one LSTM layer and a couple of hidden neurons?

Kind Regards, Theo

PS: I started coding this:

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

So far, the data is ready. There is no redundancy introduced. Now the question is, how to describe a Keras LSTM model / training process on this data.

EDIT 3:

Here is the updated code with the 3D data structure required for recurrent networks (see the answer by Lorrit). It does not work, though.

EDIT 4: removed the extra comma after Activation('sigmoid') and shaped Y_train in the correct way. Still the same error.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps; convert the rest to float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train into one 3D block: (1, number_of_rows, nb_features)
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# https://mcmap.net/q/408415/-keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

There still seems to be an issue with the data, Keras says:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2
Unprofessional answered 19/5, 2016 at 10:29 Comment(1)

In your model definition you placed a Dense layer before the LSTM layer. You need to wrap the Dense layer in a TimeDistributed layer.

Try to change

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

to

model = Sequential([
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
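
Note that TimeDistributed has to be imported as well (from keras.layers import TimeDistributed). The wrapper applies the Dense layer to every timestep separately, so its output stays 3D and can be fed into the LSTM.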
Caseate answered 30/5, 2016 at 9:46 Comment(0)

You are still missing one preprocessing step before feeding the data to the LSTM. You will have to decide how many previous data samples (previous days) you want to include in the calculation of the current day's AdjClose. See my answer here on how to do that. Your data should then be 3-dimensional of shape (nb_samples, nb_included_previous_days, features).
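
For illustration, here is a minimal NumPy sketch of such windowing (the random toy data and nb_timesteps = 4 are just placeholders, not taken from the question):

import numpy as np

# toy data: 100 days, 5 features per day (e.g. Open, High, Low, Close, Volume)
raw = np.random.rand(100, 5).astype(np.float32)
adj_close = np.random.rand(100).astype(np.float32)

nb_timesteps = 4  # number of previous days included per sample

# overlapping windows: sample i covers days i .. i+nb_timesteps-1
X = np.array([raw[i:i + nb_timesteps] for i in range(len(raw) - nb_timesteps)])
# target: the AdjClose of the day immediately after each window
y = adj_close[nb_timesteps:]

print(X.shape)  # (96, 4, 5) = (nb_samples, nb_included_previous_days, features)
print(y.shape)  # (96,)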

Then you can feed the 3D data to a standard LSTM layer with one output. You can compare this value to y_train and try to minimize the error. Remember to pick a loss function that is appropriate for regression, e.g. mean squared error.
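
A minimal model along these lines might look as follows (just a sketch: the 16 LSTM units are an arbitrary choice, and X, y are the arrays from the sketch above):

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
# the LSTM consumes the 3D input directly: (nb_timesteps, nb_features) per sample
model.add(LSTM(16, input_shape=(4, 5)))
# one linear output unit for regression (no sigmoid, since AdjClose is not in [0, 1])
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
# nb_epoch matches the old Keras version used in the question (epochs= in newer Keras)
model.fit(X, y, batch_size=32, nb_epoch=15)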

Lithopone answered 19/5, 2016 at 14:5 Comment(9)
So compared to the linked example, in my case the number of samples would be 1. If I only want to predict the output using the most recent 5 inputs/timesteps, the shape of my data should be (8900, 5, 6). Doesn't that mean a redundancy factor of almost 5 for storing the data?! – Unprofessional
Yes, it does. In this example, the redundancy should not be a problem, since a (8900, 5, 6) dataset of floats only occupies around 1 MB of RAM. When working with larger datasets (especially ones that have more features), you might want to consider using a lookup table for the actual values and only referencing them in the input of your LSTM. The Keras Embedding layer can help you do this. – Lithopone
Ok, if redundancy is not avoidable, so be it. I've used your code to preprocess X_train and Y_train now. Every sample of X_train is now a sequence of 4 timesteps and 5 features, and there are 4 Y-values (as example output, one for each timestep) accordingly. Please see the updated code. It does not work. Keras says "Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2". Btw, the data is freely available from Yahoo Finance: real-chart.finance.yahoo.com/table.csv?s=AAPL&a=11&b=12&c=1980&d=04&e=23&f=2016&g=d&ignore=.csv – Unprofessional
Each sample should consist of 4 timesteps of 5 features each as an input and only 1 output, which should be the AdjClose of the next day. After all, this is what you want to predict. – Lithopone
Regarding your error: you have probably made a mistake during the reshaping of the input data. Check X_train.shape right before you call the fit() function to be sure it has the shape (nb_training_samples, nb_included_previous_days, features). – Lithopone
Thanks, Lorrit, really saluting you for your patience. So if example inputs have the dimensions (8900, 5, 6), do the example outputs have the dimension (8900), or does it need to be something else, like (8900, 1, 1)? I also checked X_train.shape before fit(), and it actually is (nb_training_samples, nb_included_previous_days, features). It is (4466L, 4L, 5L). – Unprofessional
1) The output should be of shape (8900, 1). You are deleting some lines from X in your code. You should do the same with Y, or else they will not be of the same length. 2) The error does not come from the dimensions of the input but from an extra comma in the line where you build your model: Activation('sigmoid') , – Lithopone
Thank you! I fixed 1) the output shape and 2) the comma, but it's still the same error. You can download the data from the link and try the script for yourself :-) – Unprofessional
See @Caseate's answer: the first two layers of your NN (Dense and Activation) expect a 2D input and give a 2D output. However, you have a 3D input, and you want a 3D output to pass to the LSTM (hence the error message). So either omit the first two layers or replace them with a TimeDistributedDense layer (3D -> 3D), as sytrus suggests. – Lithopone

Not sure if this is still relevant, but there is a great example of how to use LSTM networks for predicting time series on Dr. Jason Brownlee's blog here

I prepared an example on three noisy, phase-shifted sinusoids with different amplitudes. It is not market data, but I assume the analogy holds: you assume one stock's values say something about another's.

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Reshape
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# generate a noisy, phase-shifted sine wave
def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
    x = numpy.arange(_start, _stop, step = _step)
    noise = numpy.random.uniform(-0.1, 0.1, size = len(x))
    y = gain*0.5*numpy.sin(x+_phase_shift)
    y = numpy.add(noise, y)
    return x, y
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, look_ahead=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        b = dataset[(i + look_back):(i + look_back + look_ahead), :]
        dataY.append(b)
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# generate sine wave
x1, y1 = make_sine_with_noise(0, 200, 1/24, 0, 1)
x2, y2 = make_sine_with_noise(0, 200, 1/24, math.pi/4, 3)
x3, y3 = make_sine_with_noise(0, 200, 1/24, math.pi/2, 20)
# plt.plot(x1, y1)
# plt.plot(x2, y2)
# plt.plot(x3, y3)
# plt.show()
#transform to pandas dataframe
dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
#split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 10
look_ahead = 5
trainX, trainY = create_dataset(train, look_back, look_ahead)
testX, testY = create_dataset(test, look_back, look_ahead)
print(trainX.shape)
print(trainY.shape)
# input from create_dataset is already shaped as [samples, time steps, features]
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
model.add(LSTM(look_ahead))
model.add(Dense(trainY.shape[1]*trainY.shape[2]))
model.add(Reshape((trainY.shape[1], trainY.shape[2])))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)
# make prediction
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

#save model
model.save('my_sin_prediction_model.h5')

trainPredictPlottable = trainPredict[::look_ahead]
trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))
# create a single testPredict array concatenating every 'look_ahead' prediction array
testPredictPlottable = testPredict[::look_ahead]
testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
# testPredictPlottable = testPredictPlottable[:-look_ahead]
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredictPlottable)+look_back, :] = trainPredictPlottable
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(dataset)-len(testPredictPlottable):len(dataset), :] = testPredictPlottable
# plot baseline and predictions
dataset = scaler.inverse_transform(dataset)
plt.plot(dataset, color='k')
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
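
For completeness, a short sketch of how the saved model can be reloaded later with Keras' load_model, so the network does not have to be retrained for new predictions:

from keras.models import load_model

# restore the trained network from the HDF5 file written by model.save() above
model = load_model('my_sin_prediction_model.h5')
testPredict = model.predict(testX)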
Adalia answered 29/8, 2017 at 9:12 Comment(0)
