In an attempt to further explore the Keras/TensorFlow RNN capabilities and their different parameters, I decided to solve a toy problem as described below:
- build a source dataset composed of a sequence of random numbers
- build a "labels" dataset by applying the EWMA formula to the source dataset.
The idea behind this is that EWMA has a very clear and simple definition of how it uses the "history" of the sequence:
EWMA_t = (1 - alpha) * EWMA_{t-1} + alpha * x_t
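For example, with alpha = 0.2, a previous average of 0.5, and a current sample x_t = 1.0, the new average is 0.8 * 0.5 + 0.2 * 1.0 = 0.6.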
My assumption is that, when looking at a simple RNN cell with a single weight for the current input and a single weight for the previous state, the (1 - alpha) part of the equation becomes the weight of the previous hidden state and the alpha part becomes the weight of the current input, once the network is fully trained.
So, for example, for alpha = 0.2 I expect the weights of the trained network to be:
Waa = [0.8] (weight parameter for previous state)
Wxa = [0.2] (weight parameter for current input)
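To be explicit about how I understand these two weights mapping onto a single-unit SimpleRNN in tf.keras, here is a minimal sketch (the names and ordering follow layer.get_weights(); this is only how I read the layer, not something taken from my training runs):

import numpy as np
from tensorflow import keras

# single linear unit: h_t = kernel * x_t + recurrent_kernel * h_{t-1} + bias
layer = keras.layers.SimpleRNN(1, activation="linear")
layer(np.zeros((1, 1, 1), dtype="float32"))  # build the layer on a dummy (batch, timesteps, features) input
kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (1, 1) (1, 1) (1,)
# kernel           plays the role of Wxa (weight on the current input)
# recurrent_kernel plays the role of Waa (weight on the previous state)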
I simulated the dataset and labels in a fairly straightforward way using numpy.
I have currently implemented my own simple RNN with backpropagation. I used MSE for the loss and SGD, and it converges to the said parameters pretty fast. It works on a single input at a time.
I've tried different network configurations using Keras and TensorFlow, but none seem to hit the nail on the head. I am wondering what your best suggested way to replicate the behavior of the toy RNN would be.
Here is my toy neural network:
import numpy as np

np.random.seed(1337)  # for reproducibility


def run_avg(signal, alpha=0.2):
    # EWMA of the signal; NaN or zero samples are replaced by the running average
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)


X = np.random.rand(10000)
Y = run_avg(X)


def train(X, Y):
    # single linear RNN cell: y_hat = W_x * x + W_a * a + b, where a is the previous state
    W_a = np.random.rand()
    W_x = np.random.rand()
    b = np.random.rand()
    a = np.random.rand()  # initial hidden state
    lr = 0.001
    for i in range(100):
        for x, y in zip(X, Y):
            y_hat = W_x * x + W_a * a + b
            L = (y - y_hat) ** 2  # squared error for this sample
            # these are -(1/2) * dL/dW, so adding them with a positive lr is gradient descent
            dL_dW_a = (y - y_hat) * a
            dL_dW_x = (y - y_hat) * x
            dL_db = (y - y_hat) * 1
            W_a = W_a + dL_dW_a * lr
            W_x = W_x + dL_dW_x * lr
            b = b + dL_db * lr
            a = y_hat  # feed the prediction back as the next hidden state
        print("epoch", i, " LOSS =", L, " W_a =", W_a, " W_x =", W_x, " b =", b)


train(X, Y)
A few remarks on the implementation, compared to the Keras/TensorFlow SimpleRNN:
- the "timesteps" of this network is 1, and the "batch size" is also 1.
- this network is probably similar to what TensorFlow offers with the "stateful" parameter, since the last state prediction is used in the current step ("a = y_hat" in the loop); see the sketch right after this list.
- I think it is safe to say this is a "one-to-one" kind of training, in terms of inputs used per label.
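For concreteness, here is a minimal sketch of the kind of stateful setup I mean, assuming tf.keras; the shapes mirror the remarks above (batch size 1, one timestep, one feature), and the layer and compile choices are only my guesses:

from tensorflow import keras

stateful_model = keras.Sequential([
    keras.layers.SimpleRNN(
        1,                            # a single recurrent unit, like the toy cell
        activation="linear",          # linear cell: h_t = W_x * x_t + W_a * h_{t-1} + b
        stateful=True,                # carry the hidden state across consecutive batches
        batch_input_shape=(1, 1, 1),  # (batch size, timesteps, features)
    )
])
stateful_model.compile(optimizer="sgd", loss="mse")
# training would feed one sample per batch with shuffle=False, calling
# stateful_model.reset_states() between passes over the sequence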
There is, of course, a lot more to be said about the nature of the EWMA algorithm, given that it holds information on the entire history of the sequence and not just a window, but to keep things short and to conclude: how would you go about predicting EWMA with a simple RNN, or with any neural network for that matter?
How can I replicate the behavior of the toy neural network in Keras?
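For reference, this is a minimal sketch of the kind of non-stateful, windowed setup I would expect to be comparable, assuming tf.keras and the X, Y arrays from the code above; the window length, optimizer, and epoch count are arbitrary choices of mine:

import numpy as np
from tensorflow import keras

# assuming X (shape (10000,)) and Y = run_avg(X) from the code above
timesteps = 100
n_windows = len(X) // timesteps

# reshape into (samples, timesteps, features) with a matching per-step target
X_seq = X[:n_windows * timesteps].reshape(n_windows, timesteps, 1)
Y_seq = Y[:n_windows * timesteps].reshape(n_windows, timesteps, 1)

model = keras.Sequential([
    keras.layers.SimpleRNN(
        1,                      # a single recurrent unit, like the toy cell
        activation="linear",    # linear cell: y_t = W_x * x_t + W_a * y_{t-1} + b
        return_sequences=True,  # one prediction per timestep ("one-to-one")
        input_shape=(timesteps, 1),
    )
])
model.compile(optimizer=keras.optimizers.Adam(0.01), loss="mse")
model.fit(X_seq, Y_seq, epochs=50, batch_size=32, verbose=0)

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print("W_x ~", kernel.ravel(), "W_a ~", recurrent_kernel.ravel(), "b ~", bias)

One difference from run_avg above: the RNN's initial state in each window is zero, while run_avg seeds the average with the mean of the signal, so the first few steps of every window cannot be matched exactly; with long enough windows the transient (decaying like 0.8^t) should become negligible.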
Update: it seems the main problem preventing me from solving this was using "native" Keras (import keras) rather than the TensorFlow implementation (from tensorflow import keras). I posted a more specific question about it here.