Keras - Input a 3 channel image into LSTM

Asked 6/12, 2017 at 10:12 Answered 12/3, 2021 at 10:51

Solved python keras lstm recurrent-neural-network

I have read a sequence of images into a numpy array with shape (7338, 225, 1024, 3), where 7338 is the sample size, 225 is the number of time steps, and 1024 (32x32) is the number of flattened image pixels, each with 3 channels (RGB).

I have a sequential model with an LSTM layer:

from keras.models import Sequential
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))

But this results in the error:

Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

The documentation mentions that the input tensor for LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case my input_dim is 2D.

What is the suggested way to input a 3 channel image into an LSTM layer in Keras?

Sonni answered 6/12, 2017 at 10:12 Comment(2)

Have you tried passing input_shape=X_train.shape[1:], assuming X_train is your input array? – Dorking 6/12, 2017 at 10:44

Yes, I have. X_train.shape[1:] gives me (225, 1024, 3) which is what was hard-coded as the input_shape param – Sonni 6/12, 2017 at 10:53

If you want to treat the images as a sequence (like the frames of a movie), you need to merge pixels AND channels into a single feature axis:

input_shape = (225, 3072)  # the full input is 3D; the batch size (7338) is not part of input_shape
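As a minimal sketch of the reshape this implies (assuming the data is a contiguous numpy array; the variable names and the 4-sample stand-in are only for illustration), the pixel and channel axes can be merged like this:

```python
import numpy as np

# Small stand-in for the real data: 4 samples instead of 7338.
x = np.zeros((4, 225, 1024, 3), dtype=np.float32)

# Merge the pixel and channel axes:
# (samples, 225, 1024, 3) -> (samples, 225, 3072)
x_flat = x.reshape(x.shape[0], x.shape[1], -1)

print(x_flat.shape)                 # (4, 225, 3072)
print(np.shares_memory(x, x_flat))  # True: a view, no data is copied
```

For a contiguous array the reshape returns a view, so it should be cheap even for the full data set.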

If you want more processing before feeding 3072 features into an LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model (not necessarily a better one, though; each application behaves differently).

You can also try to use the new ConvLSTM2D, which will take the five dimensional input:

input_shape = (225, 32, 32, 3)  # the full input is 5D; the batch size (7338) is not part of input_shape
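ConvLSTM2D needs the spatial dimensions back, so the flattened pixel axis has to be unflattened first. A rough sketch, assuming each 1024-long pixel axis was flattened row by row from a 32x32 image (the question does not say how the flattening was done, so check this against your own loading code):

```python
import numpy as np

# Small stand-in for the real data: 2 samples instead of 7338.
x = np.arange(2 * 225 * 1024 * 3, dtype=np.float32).reshape(2, 225, 1024, 3)

# Assumption: pixels were flattened row-major, so pixel p is (p // 32, p % 32).
# (samples, 225, 1024, 3) -> (samples, 225, 32, 32, 3)
x_frames = x.reshape(x.shape[0], x.shape[1], 32, 32, x.shape[-1])

print(x_frames.shape)  # (2, 225, 32, 32, 3)

# The pixel at row 1, column 2 is the same as flat pixel 1 * 32 + 2.
print(np.array_equal(x_frames[0, 0, 1, 2], x[0, 0, 1 * 32 + 2]))  # True
```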

I'd probably create a convolutional net with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) before adding a TimeDistributed(Flatten()) and finally the LSTM(). This will very probably improve both your image understanding and the performance of the LSTM.
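A sketch of what such a stack could look like with tf.keras. The filter counts, kernel sizes and number of blocks are arbitrary placeholders, not tuned values, and the input is assumed to be reshaped to (225, 32, 32, 3) per sample:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(225, 32, 32, 3)),  # (timesteps, rows, cols, channels)
    # TimeDistributed applies the same 2D layer to every frame independently.
    layers.TimeDistributed(layers.Conv2D(16, 3, padding="same", activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),   # frames become 16x16
    layers.TimeDistributed(layers.Conv2D(32, 3, padding="same", activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),   # frames become 8x8
    layers.TimeDistributed(layers.Flatten()),         # 8 * 8 * 32 = 2048 features per step
    layers.LSTM(128),                                 # consumes (225, 2048)
])

print(model.output_shape)  # (None, 128)
```

The LSTM then sees 2048 learned features per time step instead of 3072 raw pixel values.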

Nordstrom answered 6/12, 2017 at 11:30 Comment(3)

I thought of reshaping my data from (1024, 3) to 3072, but I already had the data in batch size of 7338, and reshaping was taking a lot of time. And the LSTM is part of an auto encoder, so wasn't sure if this reshaping would help my cause. Will try reshaping first, then with ConvLSTM2D and TimeDistributed layers. Thanks for your answer. – Sonni 6/12, 2017 at 12:44

Reshaping taking time??? That doesn't sound ok.... the LSTM would be very very slow, though.... – Summon 6/12, 2017 at 12:54

Yes, I think that's because I'll be reshaping 1651050 (7338*225) instances. So, instead of doing it all at once, I resorted to the Keras model method fit_generator(), where I create a generator that reshapes the data set during training. – Sonni 8/12, 2017 at 15:23

The Keras RNN guide now has a section on creating RNNs with nested structures, which allow arbitrary input types for each timestep: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs

Agustin answered 12/3, 2021 at 10:51 Comment(0)
