How exactly does Keras take dimension argumentsfor LSTM / time series problems?

I can't seem to find a concrete answer to the question of how to feed data into Keras. Most examples seem to work off image / text data and have clearly defined data points.

I'm trying to feed music into an LSTM neural network. I want the network to take ~3 seconds of music and nominate the next 2 seconds. I have my music prepared into .wav files and partitioned into 5 second intervals that I've decomposed into my X (first 3 seconds) and Y (last two seconds). I've sampled my music at 44,100 hz so my X is 132,300 observations 'long' and my Y is '88,200' observations long.

But I can't figure out exactly how to bridge Keras to my data structure. I'm using a Tensorflow backend.

In the interest of generalizing the problem and answer, I'll use A,B,C to denote dimensions. The only difference between this example data and my real data is that these are random values distributed from 0 to 1, and my data is an array of integers.

import numpy as np
#using variables to make it easy to generalize the answer

#a = the number of observations I have
a       = 411

#b = the duration of the sample, 44.1k observations per second of music
b_train = 132300
b_test  = 88200

#c = the number of channels in the music, this is 2 channel stereo
c       = 2

#now create sample data with the dimensionality given above:
X = np.random.rand(a,b_train,c)
y = np.random.rand(a,b_test ,c)

#split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.20, random_state=42)

However, I don't really know how to configure a model to understand that the 'first' (A) dimension contains observations and that I want to more or less break out the music (B) by channel (C).

I know that it'd probably be easier to convert this to mono (and a 2d problem) but I'm very curious to see whether or not this has a 'simple' solution - whether that mostly takes the shape of what I have below or whether I should think of the model in another way.

The primary question is this: how would I construct a model that would allow me to transform my X data into my Y data?

Ideally, an answer would show how to modify the model below to fit the data structure above.

import keras
import math, time
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model

def build_model(layers):
    d = 0.3
    model = Sequential()

    model.add(LSTM(256, input_shape=(layers), return_sequences=True))
    model.add(Dropout(d))

    model.add(LSTM(256, input_shape=(layers), return_sequences=False))
    model.add(Dropout(d))

    model.add(Dense(32,kernel_initializer="uniform",activation='relu'))        
    model.add(Dense(1,kernel_initializer="uniform",activation='linear'))


    start = time.time()
    model.compile(loss='mse',optimizer='adam', metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model


#build model... 
model = build_model([328,132300,2])

model.fit(X_train,y_train,batch_size=512,epochs=30,validation_split=0.1,verbose=1)

However, this yields an error (at the model = ... step):

 ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=4

I can't figure out where Keras gets the expectation to see ndim=4 data. Also, I don't know to how to ensure that I feed data into the model such that the model 'understands' observations are distributed across the A-axis and the data itself is distributed on the B- and C-axis.

If anything is unclear, please leave a comment. I'll watch this diligently until Sept '17 or so and I'll be sure to update this question to reflect advice / comments left.

Thanks!

Recommended topics

Hot tags