I am testing LSTM networks in Keras and I am getting much faster training on the CPU (5 seconds/epoch on an i7-2600K, 16 GB RAM) than on the GPU (35 seconds/epoch on an Nvidia GTX 1060 6GB). GPU utilisation runs at around 15%, and I never see it go over 30% when trying other LSTM networks, including the Keras examples. When I run other types of networks (MLP and CNN), the GPU is much faster. I am using the latest Theano (0.9.0dev4) and Keras (1.2.0).
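In case it matters, this is roughly how I launch the script so that Theano targets the GPU (the flag values are just my local setup, and lstm_test.py is a placeholder name):

THEANO_FLAGS=device=gpu,floatX=float32 python lstm_test.py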
The sequence has 50,000 timesteps, each with 3 integer inputs.
If the 3 inputs are descending (e.g. 3, 2, 1) the output is 0; if they are ascending the output is 1, except when the previous two outputs were also ascending, in which case the output is 0 instead of 1.
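To make the rule concrete, here is a made-up illustration (one-hot labels as [descending, ascending], matching the code below):

# (5, 4, 3) -> [1, 0]  descending
# (7, 8, 9) -> [0, 1]  ascending
# (2, 3, 4) -> [0, 1]  ascending again
# (6, 7, 8) -> [1, 0]  ascending, but the previous two were ascending, so it flips to 0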
After 250 epochs I get 99.97% accuracy, but why is the GPU so much slower? Am I doing something wrong in the model? I tried various batch sizes and still had the same issue.
import random
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from keras.optimizers import RMSprop

def generate_data():
    # 50,000 samples of 3 consecutive ints, either ascending or descending
    X = []
    Y = []
    for i in range(50000):
        start = random.randint(1, 100)
        d = random.randrange(-1, 2, 2)  # -1 or 1
        param = [start, start + d, start + d + d]
        X.append(np.array(param))
        if d < 0:                       # descending -> class 0
            Y.append([1, 0])
        elif len(Y) > 2 and d > 0 and Y[-1][1] == 1 and Y[-2][1] == 1:
            Y.append([1, 0])            # third ascending in a row -> class 0
        elif d > 0:                     # ascending -> class 1
            Y.append([0, 1])
    X = np.array(X)
    Y = np.array(Y)
    return X, Y

X, Y = generate_data()
X = np.asarray(X, 'float32')
Y = np.asarray(Y, 'float32')
# Reshape into a single sample containing the whole 50,000-timestep sequence
X = np.reshape(X, (1, len(X), 3))
Y = np.reshape(Y, (1, len(Y), 2))

model = Sequential()
model.add(LSTM(20, input_shape=(50000, 3), return_sequences=True))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
history = model.fit(X, Y, batch_size=100, nb_epoch=250, verbose=2)
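One side note: since X is reshaped to (1, 50000, 3), there is effectively only a single sample, so I suspect batch_size has no real effect here, which may be why changing it made no difference. A variation I considered but have not benchmarked (just a sketch; seq_len is my own name) splits the data into many shorter sequences instead:

seq_len = 100  # hypothetical: 500 samples of 100 timesteps instead of 1 sample of 50,000
X2, Y2 = generate_data()
X2 = np.asarray(X2, 'float32').reshape(-1, seq_len, 3)
Y2 = np.asarray(Y2, 'float32').reshape(-1, seq_len, 2)
model2 = Sequential()
model2.add(LSTM(20, input_shape=(seq_len, 3), return_sequences=True))
model2.add(Dense(2))
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
model2.fit(X2, Y2, batch_size=100, nb_epoch=250, verbose=2)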
Any thoughts? Thank you!