What is the best activation function to use for time series prediction

I am using the Sequential model from Keras with Dense layers. I wrote a function that recursively calculates predictions, but the predictions are way off. I am wondering what the best activation function is for my data. Currently I am using the hard_sigmoid function. The output values range from 5 to 25, the input data has the shape (6,1), and the output is a single value. When I plot the predictions they never decrease. Thank you for the help!!

# create and fit a Multilayer Perceptron model
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(20, input_dim=look_back, activation='hard_sigmoid'))
model.add(Dense(16, activation='hard_sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=0)

# function to predict recursively, feeding each prediction back in as input
# (origAndPredictions holds the input windows, predictions accumulates outputs)
import numpy

numOfPredictions = 96

for i in range(numOfPredictions):
    # current window of the last 6 values
    temp = numpy.array([origAndPredictions[i, 0:6]])
    temp1 = model.predict(temp)
    predictions = numpy.append(predictions, temp1, axis=0)
    # slide the window: drop the oldest value, append the new prediction
    temp2 = numpy.array([[origAndPredictions[i, 1], origAndPredictions[i, 2],
                          origAndPredictions[i, 3], origAndPredictions[i, 4],
                          origAndPredictions[i, 5], predictions[i, 0]]])
    origAndPredictions = numpy.vstack((origAndPredictions, temp2))

[plot of the predictions]

Update: I used this code to implement Swish.

from keras import backend as K
from keras.layers import Activation
from keras.utils.generic_utils import get_custom_objects

def swish(x, beta=1):
    return x * K.sigmoid(beta * x)

get_custom_objects().update({'swish': Activation(swish)})

model.add(Activation(swish, name="swish"))

New plot of predictions using Swish.

Update: using this code:

from keras import backend as K

def swish(x):
    return K.sigmoid(x) * x

[plot of the predictions using the updated code]

Thanks for all the help!!

Asked by Tried on 8/11/2019 at 6:02. Comments (1):
What makes you think there is a best activation function given some data? There isn't. – Leventhal

Although there is no single best activation function as such, I find Swish to work particularly well for time-series problems. AFAIK Keras doesn't provide Swish built-in, but you can define it yourself:

from keras.utils.generic_utils import get_custom_objects
from keras import backend as K
from keras.layers import Activation

def custom_activation(x, beta=1):
    return K.sigmoid(beta * x) * x

get_custom_objects().update({'custom_activation': Activation(custom_activation)})

Then use it in the model:

model.add(Activation(custom_activation, name="Swish"))
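
As a side note (not part of the original answer), the same function can also be passed directly as the activation argument of a Dense layer, which avoids the extra Activation layer; a minimal sketch:

from keras.layers import Dense

# sketch: pass the custom Swish function directly as the layer activation
model.add(Dense(16, activation=custom_activation))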
Answered by Frugivorous on 8/11/2019 at 7:54. Comments (4):
Can't find the paper at the moment, but at least for my usage Swish has consistently beaten every other activation function for time-series analysis. I think it owes to the fact that it has the properties of ReLU as well as a continuous derivative at zero, so it tackles the "dying ReLU" problem better than PReLU, LReLU and others. – Frugivorous
Hi, thanks so much for the help!! I ran the above code with the added line from keras.utils.generic_utils import get_custom_objects, but I am getting the error NameError: name 'Activation' is not defined. – Tried
Use from keras.layers import Activation. – Frugivorous
The Swish function can also be accessed through Keras as tf.keras.activations.swish. – Dooryard
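
Building on that last comment, here is a minimal sketch (not from the original post) of the question's model using the built-in Swish activation, assuming a recent TensorFlow 2.x release where tf.keras.activations.swish is available:

import tensorflow as tf

# sketch: the question's MLP with the built-in Swish activation
# (look_back, trainX, trainY are the same variables as in the question)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation=tf.keras.activations.swish,
                          input_shape=(look_back,)),
    tf.keras.layers.Dense(16, activation=tf.keras.activations.swish),
    tf.keras.layers.Dense(1),
])
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=0)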

Your output data ranges from 5 to 25, while a ReLU output activation would give you values from 0 to inf. So what you can try is to rescale your outputs, or equivalently normalize your labels: use sigmoid as the output activation (outputs in (0,1)) and transform your labels by subtracting 5 and dividing by 20, so they lie in (almost) the same interval as your outputs, [0,1]. Alternatively, use sigmoid and multiply your outputs by 20 and add 5 before calculating the loss.

Would be interesting to see the results.
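
A minimal sketch of the label-scaling idea described above (testX is an illustrative name for held-out input windows, not from the original post):

from keras.models import Sequential
from keras.layers import Dense

# scale labels from [5, 25] into [0, 1]
trainY_scaled = (trainY - 5.0) / 20.0

model = Sequential()
model.add(Dense(20, input_dim=look_back, activation='hard_sigmoid'))
model.add(Dense(16, activation='hard_sigmoid'))
model.add(Dense(1, activation='sigmoid'))   # outputs now lie in (0, 1)
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY_scaled, epochs=200, batch_size=2, verbose=0)

# undo the scaling to get predictions back in the original 5..25 range
preds = model.predict(testX) * 20.0 + 5.0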

Answered by Joann on 8/11/2019 at 8:32.
