Simple MLP time series training yields unexpected mean line results

I'm trying to play around with simple time series predictions. Given a number of inputs (1-minute ticks), the net should attempt to predict the next one. I've trained 3 nets with different settings to illustrate my problem:

On the right you can see the 3 trained MLPs, randomly named and color coded, with some training stats. On the left is a plot of the predictions made by those nets, with the actual validation data in white. The plot was made by going through each tick of the validation data (white), feeding the 30|4|60 (Nancy|Kathy|Wayne) previous ticks to the net and plotting its prediction in place of the current tick.

Multilayer perceptron's settings (Nancy|Kathy|Wayne settings):

Geometry: 2x30|4|60 input nodes -> 30|4|60 hidden layer nodes -> 2 outputs
Number of epochs: 10|5|10
Learning rate: 0.01
Momentum: 0.5|0.9|0.5
Nonlinearity: Rectify
Loss: Squared Error

It seems that the more training is applied, the more the predictions converge to some kind of mean line, which is not what I was expecting at all. I was expecting the predictions to stay somewhat close to the validation data, with some margin of error.
Am I picking the wrong model, misunderstanding some core concepts of machine learning, or doing something wrong in lasagne/theano?

Quick links to most relevant (in my opinion) code parts:

And here are the full, more or less, sources:

Data used for training, in the format date;open;high;low;close;volume (only date, high and low are used)
MLP module
Gui module's relevant MLP interaction parts

First of all, I want to commend you for using the rectifier nonlinearity. According to Geoffrey Hinton, co-inventor of the Boltzmann machine, the rectifier is the best fit for the activity of neurons in the human brain.

As for the other choices you've made, I suggest changing the NN architecture. For stock market predictions you should use a recurrent NN: the easiest candidates are Elman or Jordan networks. Or you can try something more complicated, like an LSTM network.
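To make the difference from an MLP concrete, here is a minimal sketch of the forward pass of an Elman network in plain Python. It is only an illustration under assumptions: the function names, layer sizes and random weights are hypothetical, the input is random stand-in data rather than real prices, and training is left out entirely.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def elman_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Forward pass of a single-layer Elman network over one sequence.

    x_seq is a list of T input vectors; the result is a list of T output vectors.
    """
    h = [0.0] * len(b_h)  # context units start at zero
    outputs = []
    for x_t in x_seq:
        # The new hidden state mixes the current input with the previous state.
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        # Linear read-out, suitable for regression targets.
        outputs.append([o + b for o, b in zip(matvec(W_hy, h), b_y)])
    return outputs

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a placeholder for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Illustrative sizes: 2 features per tick (high, low), 30 ticks, 2 outputs.
rng = random.Random(0)
n_in, n_hidden, n_out, T = 2, 8, 2, 30
W_xh = random_matrix(n_hidden, n_in, rng)
W_hh = random_matrix(n_hidden, n_hidden, rng)
b_h = [0.0] * n_hidden
W_hy = random_matrix(n_out, n_hidden, rng)
b_y = [0.0] * n_out

ticks = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(T)]  # stand-in data
predictions = elman_forward(ticks, W_xh, W_hh, b_h, W_hy, b_y)
print(len(predictions), len(predictions[0]))  # 30 2
```

The point of the sketch is the `W_hh` term: each output depends on the previous hidden state as well as the current tick, which a plain feed-forward MLP cannot do. In practice you would use the recurrent layers of your framework rather than hand-written loops.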

Another piece of advice: I suggest changing what you feed into the NN. In general, I recommend applying scaling and normalization. For example, don't feed the raw price into the NN. Modify it in one of the following ways (these proposals are not set in stone):
1. Feed the NN percentage changes of the price.
2. If you feed 30 values into the NN and want to predict 2 values, subtract the minimum of all 32 values from each of the 30 + 2 values, and train the NN to predict the 2 values based on the 30. Then just add the minimum of the 32 values back to the result.

Don't feed bare dates into the NN. A raw date tells the NN nothing useful for making a prediction. Instead, feed the date and time in as categorical values. Categorical means that you transform the datetime into more than one input. For example, instead of giving the NN 2016/09/10, consider some of the following.

The year of trading most probably will not give any useful information, so you can omit it.
09 is the month number, i.e. September. You could feed the month number into the NN directly, but I strongly recommend making 12 inputs instead: for January, give the first input 1 and the other eleven 0. This way you train the network to separate the trading period in January from the trading period in June or December. I also suggest encoding the day of the week categorically in the same way, because trading on Monday differs from trading on Friday, especially on NFP day.
For hours, I suggest encoding by periods of 6 - 8 hours. This helps the network take into account the different trading sessions: Asia, Frankfurt, London, New York.
If you decide to feed some indicators into the NN, consider thermometer encoding for some of them. Thermometer encoding is usually needed for indicators like ADX.
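The categorical encoding described above could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the hour is split into four fixed 6-hour blocks as a crude stand-in for trading sessions, and the timestamps are assumed to be in one consistent timezone.

```python
from datetime import datetime

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_timestamp(ts):
    """Turn a datetime into month, weekday and 6-hour-block one-hot inputs."""
    month = one_hot(ts.month - 1, 12)    # January -> first input set to 1
    weekday = one_hot(ts.weekday(), 7)   # Monday is 0, Sunday is 6
    block = one_hot(ts.hour // 6, 4)     # 00-05, 06-11, 12-17, 18-23
    return month + weekday + block       # the year is deliberately left out

features = encode_timestamp(datetime(2016, 9, 10, 14, 30))
print(len(features))  # 23
```

These 23 values would be appended to the price inputs of each training sample. If your data only contains weekdays, 5 weekday inputs are enough.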

Regarding your question in the comments about how to use the minimum, here is a simplified example. Let's say you want to train the NN on the following close prices for EUR/USD:
1.1122, 1.1132, 1.1152, 1.1156, 1.1166, 1.1173, 1.1153, 1.1150, 1.1152, 1.1159. Instead of a learning window of size 30 I'll demonstrate with a window of size 3 (just for simplicity's sake) and a prediction window of size 2.
So each prediction is based on 3 input values and produces 2 output values. For learning we use the first 5 values:
1.1122, 1.1132, 1.1152, 1.1156, 1.1166
then the next 5 values, shifted by one:
1.1132, 1.1152, 1.1156, 1.1166, 1.1173
In the first window the minimal value is 1.1122.
Then you subtract 1.1122 from each value:
0, 0.0010, 0.0030, 0.0034, 0.0044. As input you feed the NN 0, 0.0010, 0.0030. As output from the NN you expect 0.0034, 0.0044. If you want it to learn much faster, feed the NN normalized and scaled values. Then each time you'll need to de-normalize and de-scale the predictions to get prices back.
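The windowing and minimum subtraction from this example can be sketched as follows. The function name and the rounding to 4 decimals are my own assumptions for illustration; the prices are the ones listed above.

```python
def make_windows(prices, n_in=3, n_out=2):
    """Slide a window of n_in + n_out prices and subtract each window's minimum.

    Returns a list of (inputs, targets, base) tuples, where base is the
    minimum that has to be added back to the predictions.
    """
    span = n_in + n_out
    samples = []
    for start in range(len(prices) - span + 1):
        window = prices[start:start + span]
        base = min(window)
        # Round to 4 decimals to hide floating-point noise in the differences.
        shifted = [round(p - base, 4) for p in window]
        samples.append((shifted[:n_in], shifted[n_in:], base))
    return samples

closes = [1.1122, 1.1132, 1.1152, 1.1156, 1.1166,
          1.1173, 1.1153, 1.1150, 1.1152, 1.1159]
samples = make_windows(closes)
inputs, targets, base = samples[0]
print(inputs, targets, base)  # [0.0, 0.001, 0.003] [0.0034, 0.0044] 1.1122
```

Note that here the minimum is taken over the whole window, targets included, as in the example. When you predict on live data the targets are unknown, so you can only subtract the minimum of the inputs; in that case it is safer to compute the base the same way during training.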

Another way is to feed the NN percentage changes of the price. Let me know if you need a sample for it.
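A minimal sketch of the percentage-change variant, using the same close prices as above. The helper names are hypothetical, and the second function shows how predicted changes would be turned back into prices.

```python
def pct_changes(prices):
    """Percentage change of each price relative to the previous one."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(prices, prices[1:])]

def rebuild_prices(last_price, changes):
    """Turn a list of percentage changes back into prices, starting after last_price."""
    prices = []
    for change in changes:
        last_price = last_price * (1.0 + change / 100.0)
        prices.append(last_price)
    return prices

closes = [1.1122, 1.1132, 1.1152, 1.1156, 1.1166,
          1.1173, 1.1153, 1.1150, 1.1152, 1.1159]
changes = pct_changes(closes)                  # 9 values for 10 prices
restored = rebuild_prices(closes[0], changes)  # matches closes[1:] up to rounding
```

You then window `changes` the same way as the prices: feed 3 changes in, expect 2 changes out, and apply `rebuild_prices` to the predictions using the last known price.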

And one more important piece of advice: don't use just a NN for trading. Never!!! A better way is to devise some system with some percentage of success, for example 30%, and then use the NN to increase the success percentage to 60%.

I also want to give you an example of thermometer encoding for some indicators. Consider the ADX indicator and the following examples:

a. >10 >20 >30 >40
    1   0   0   0
b. >10 >20 >30 >40
    1   1   0   0
Example a is the input for an ADX greater than 10. Example b is the input for an ADX greater than 20.
You can adapt thermometer encoding to provide inputs for the stochastic oscillator. Usually the stochastic is meaningful in the ranges 0 - 20 and 80 - 100, and only seldom in the range 20 - 80. But as always, you can try and see.
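In code, the thermometer encoding from the ADX examples could be sketched like this. The function name is hypothetical and the default thresholds are the 10, 20, 30, 40 levels from the examples above.

```python
def thermometer(value, thresholds=(10, 20, 30, 40)):
    """Set one input to 1 for every threshold the value exceeds."""
    return [1 if value > t else 0 for t in thresholds]

print(thermometer(15))  # [1, 0, 0, 0]  (example a: ADX above 10)
print(thermometer(25))  # [1, 1, 0, 0]  (example b: ADX above 20)
```

For the stochastic you would pass different thresholds, for example `thermometer(value, thresholds=(20, 80))`, and see whether that helps.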
