Prediction using Recurrent Neural Network on Time series dataset
Description

I have a dataset of 10 sequences, where each sequence corresponds to one day of stock value recordings. Each sequence consists of 50 samples recorded at 5-minute intervals, starting in the morning at 9:05 am. There is also one extra recording (the 51st sample), taken 2 hours, not 5 minutes, after the last of the 50 samples, and it is available only in the training set. That 51st sample must be predicted for the testing set, where the first 50 samples are given.

I am using the pybrain recurrent neural network for this problem, which groups samples into sequences. The label (commonly known as the target y) of each sample x_i is the sample at the next time step, x_(i+1), which is a typical formulation in time series prediction.
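To make the formulation above concrete, here is a minimal sketch in plain Python (not pybrain) of how each sample can be paired with its next-step target. The helper name `next_step_pairs` and the sample values are assumptions made for illustration only.

```python
def next_step_pairs(sequence):
    """Pair each sample x_i with the following sample x_(i+1) as its target."""
    # zip stops at the shorter argument, so the last sample has no target.
    return list(zip(sequence[:-1], sequence[1:]))


# Hypothetical shortened day of recordings (values invented for illustration).
day = [23, 31, 24, 15]
pairs = next_step_pairs(day)
print(pairs)  # [(23, 31), (31, 24), (24, 15)]
```

A sequence of 50 samples yields 49 such pairs, because the last sample has no 5-minute successor.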

Example

A sequence for one day is something like:

    Signal id    Time      value
        1     -  9:05   -   23
        2     -  9:10   -   31
        3     -  9:15   -   24
       ...    -  ...    -   ...
       50     -  13:15  -   15

Below is the 2 hour later label 'target' given for the training set 
and is required to be predicted for the testing set
       51     -  15:15   -   11

Question

Now that my recurrent neural network (RNN) has been trained on these 10 sequences, how would I use it on a new sequence to predict the stock value 2 hours after the last sample in that sequence?

Please note that I also have "2 hours later than the last sample stock values" for each of the training sequences but I am not sure how to incorporate that in training the RNN since it expects identical time intervals between samples. Thanks!

Callus answered 7/9, 2013 at 6:40 Comment(3)
I didn't quite understand your explanation. Do all your training sequences contain 50 input signals, with the same time delta between all the samples? Is your question actually: how do you predict the next sequence output? - Mead
Yes, the first 50 input signals have the same time delta (a 5-minute difference). However, each sequence in fact has 51 input signals; the last signal has a much larger delta (a 2-hour difference) than the rest, and I'm required to predict that last signal given the first 50 signals. So the questions are how to train the RNN with a signal that has a different delta from the others, and how to predict that signal given the first 50 signals. - Callus
@jorgenkg, I have updated the question to clarify; sorry for the confusion. - Callus

I hope this will help you out.

The recurrent network structure

[Image: diagram of the recurrent network structure]

A few tips

Choosing your recurrent network

The more mature Long Short-Term Memory (LSTM) neural network is a great fit for this kind of task. An LSTM is able to detect common "shapes" and "variations" in the stock value "graph", and there is a lot of research that tries to prove that such shapes actually occur in real life! See this link for an example.

Accuracy

If you want the network to achieve higher accuracy, I would recommend also feeding it the stock values from the previous year (on the exact same date), so that the number of inputs doubles from 50 to 100. Though the network might be well optimised on your dataset, it will never be able to predict the unpredictable behaviour of the future ;)
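As a rough sketch of the doubled input described above, the snippet below concatenates the 50 samples of the current day with the 50 samples from the same date one year earlier. The function name `build_input` and all values are hypothetical; how the previous year's data is looked up is not covered here.

```python
def build_input(today, last_year):
    """Concatenate two 50-sample days into one 100-value input vector."""
    if len(today) != 50 or len(last_year) != 50:
        raise ValueError("each day must contain exactly 50 samples")
    return list(today) + list(last_year)


# Invented placeholder values standing in for real stock recordings.
today = [float(v) for v in range(50)]
last_year = [float(v) for v in range(100, 150)]

features = build_input(today, last_year)
print(len(features))  # 100
```

The first 50 entries of `features` are the current day's samples and the last 50 are the previous year's, so the order of the two blocks must stay the same for training and prediction.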

Mead answered 9/9, 2013 at 6:24 Comment(5)
Thanks a tonne, wonderful description! One small concern, though: since only around the first half of the samples and the last sample (the 50th) have a two-hour-later target sample, is it okay to combine them in one sequence for the LSTM? The first half of the samples are 5 minutes apart, but the last sample (the 50th) is far from that first half. What do you think? The 51st sample is the 2-hour-later value for the 50th sample. - Callus
That shouldn't be a problem in theory. Just make sure you have enough data samples! The dataset should probably contain at least 20,000 entries (split into a 2/3 training set and a 1/3 validation set) to make the network more accurate and robust. Best wishes! - Mead
I am also curious: is including lagged variables an alternative to LSTM? - Oshaughnessy
Did you create the image yourself? If not, could you please provide a source? This is visually one of the best images avoiding unnecessary complexity. - Jari
Yes, I made it with OmniGraffle, but unfortunately that file has been deleted. - Mead
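The 2/3 training and 1/3 validation split suggested in the comments could be sketched as follows. This is only an illustration under the assumption that whole sequences are kept in their original order; the helper name `split_train_validation` and the placeholder data are hypothetical.

```python
def split_train_validation(sequences):
    """Split a list of sequences into 2/3 training and 1/3 validation."""
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(sequences) * 2 // 3
    return sequences[:cut], sequences[cut:]


# Nine placeholder "days", each a short invented sequence.
days = [[day, day + 1, day + 2] for day in range(9)]
train, validation = split_train_validation(days)
print(len(train), len(validation))  # 6 3
```

Splitting by whole sequences keeps all samples of one day on the same side of the split.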
