How exactly does Keras take dimension arguments for LSTM / time series problems?

I can't seem to find a concrete answer to the question of how to feed data into Keras. Most examples seem to work off image / text data and have clearly defined data points.

I'm trying to feed music into an LSTM neural network. I want the network to take ~3 seconds of music and predict the next 2 seconds. I have my music prepared as .wav files and partitioned into 5-second intervals, which I've split into my X (first 3 seconds) and Y (last 2 seconds). I've sampled my music at 44,100 Hz, so my X is 132,300 observations 'long' and my Y is 88,200 observations 'long'.

But I can't figure out exactly how to bridge Keras to my data structure. I'm using a TensorFlow backend.

In the interest of generalizing the problem and answer, I'll use A,B,C to denote dimensions. The only difference between this example data and my real data is that these are random values distributed from 0 to 1, and my data is an array of integers.

import numpy as np
#using variables to make it easy to generalize the answer

#a = the number of observations I have
a       = 411

#b = the duration of the sample, 44.1k observations per second of music
b_train = 132300
b_test  = 88200

#c = the number of channels in the music, this is 2 channel stereo
c       = 2

#now create sample data with the dimensionality given above:
X = np.random.rand(a,b_train,c)
y = np.random.rand(a,b_test ,c)

#split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.20, random_state=42)    

However, I don't really know how to configure a model to understand that the 'first' (A) dimension contains observations and that I want to more or less break out the music (B) by channel (C).

I know that it'd probably be easier to convert this to mono (and a 2d problem) but I'm very curious to see whether or not this has a 'simple' solution - whether that mostly takes the shape of what I have below or whether I should think of the model in another way.

The primary question is this: how would I construct a model that would allow me to transform my X data into my Y data?

Ideally, an answer would show how to modify the model below to fit the data structure above.

import keras
import math, time
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model

def build_model(layers):
    d = 0.3
    model = Sequential()

    model.add(LSTM(256, input_shape=(layers), return_sequences=True))
    model.add(Dropout(d))

    model.add(LSTM(256, input_shape=(layers), return_sequences=False))
    model.add(Dropout(d))

    model.add(Dense(32,kernel_initializer="uniform",activation='relu'))        
    model.add(Dense(1,kernel_initializer="uniform",activation='linear'))


    start = time.time()
    model.compile(loss='mse',optimizer='adam', metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model


#build model... 
model = build_model([328,132300,2])

model.fit(X_train,y_train,batch_size=512,epochs=30,validation_split=0.1,verbose=1)

However, this yields an error (at the model = ... step):

 ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=4

I can't figure out where Keras gets ndim=4 from. Also, I don't know how to ensure that I feed data into the model such that the model 'understands' that observations are distributed across the A-axis and the data itself is distributed on the B- and C-axes.

If anything is unclear, please leave a comment. I'll watch this diligently until Sept '17 or so and I'll be sure to update this question to reflect advice / comments left.

Thanks!

Williswillison answered 15/8, 2017 at 5:36 Comment(7)
If I understand correctly, 132300 is the total number of observations for X, and since you divide X into triples, the total number of data points is 132300/3=44100, right? What is a then? You write that a is the number of observations, but don't you have 44100 observations and not 411? - Smackdab
Could you also clarify if your question is theoretical or technical (or both)? Does the above code throw errors and you are asking for help on that front? Or are you asking theoretically how music should be represented as a matrix / fed into a network? - Morez
I believe you would want to output 2 instead of 1, considering you see the first three seconds and then predict the next two seconds. However, for this problem, I would suggest trying to predict just the next second instead of both. - Joeljoela
Nicole - great question: I clarified it above. The code doesn't work, I'd like to figure out how to modify the model to work. - Williswillison
Miriam - I'm not dividing X into 'triples', the X data is comprised of 3 seconds of observations. Each second of music produces 44.1k datapoints per channel. - Williswillison
Can you include the errors in the question? - Morez
The error is coming from the input for build_model because OP is passing in a 4d shape. - Joeljoela

The Keras convention is that the batch dimension is omitted from the input_shape argument. From the guide:

Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.

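So changing the call to model = build_model([132300, 2]) should solve the problem. With [328, 132300, 2], Keras prepends the batch axis itself, so the LSTM sees a 4-D input instead of the 3-D (batch, timesteps, features) input it expects.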
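To make the rule from the guide concrete, here is a minimal sketch in plain Python (no Keras calls). The helper names input_shape_for and with_batch_axis are made up for illustration; they only mimic how the batch axis is dropped from, and then prepended to, a shape.

```python
def input_shape_for(data_shape):
    """Per-sample shape to pass as input_shape: the data shape minus the batch axis."""
    if len(data_shape) < 2:
        raise ValueError("expected at least (batch, features)")
    return tuple(data_shape[1:])


def with_batch_axis(input_shape):
    """Shape the first layer effectively sees: a batch axis (None) is prepended."""
    return (None,) + tuple(input_shape)


x_train_shape = (328, 132300, 2)  # (observations, timesteps, channels) from the question

correct = input_shape_for(x_train_shape)
print(correct)                               # (132300, 2)
print(len(with_batch_axis(correct)))         # 3, what an LSTM expects

# Passing the full data shape adds a fourth axis, matching "found ndim=4".
print(len(with_batch_axis(x_train_shape)))   # 4
```

In practice the same value can be read off the array directly, for example X_train.shape[1:].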

Vector answered 16/8, 2017 at 20:31 Comment(2)
That certainly addresses the error that was given. However, the next line errors out. The "model.fit" statement throws this error: "Error when checking target: expected dense_2 to have 2 dimensions, but got array with shape (328, 88200, 2)" - Williswillison
The output of the model should match the shape of y_train. Use model.summary() to print all the layer output shapes and change the model accordingly. I can see that the 2nd LSTM is not returning a sequence. - Vector
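The shape mismatch in the last two comments can be traced by hand. The sketch below is plain Python rather than Keras: trace_shapes is a hypothetical helper that applies the usual output-shape rules, assuming an LSTM returns (batch, timesteps, units) with return_sequences=True and (batch, units) otherwise, Dense changes only the last axis, and Dropout keeps the shape. model.summary() remains the authoritative check.

```python
def trace_shapes(input_shape, layers):
    """Follow the output shape through a stack of simplified layer specs."""
    shape = (None,) + tuple(input_shape)  # None stands for the batch axis
    for layer in layers:
        kind = layer[0]
        if kind == "lstm":
            _, units, return_sequences = layer
            if len(shape) != 3:
                raise ValueError("LSTM expects 3-D input, got %d-D" % len(shape))
            if return_sequences:
                shape = (shape[0], shape[1], units)
            else:
                shape = (shape[0], units)
        elif kind == "dense":
            shape = shape[:-1] + (layer[1],)  # only the last axis changes
        elif kind == "dropout":
            pass  # shape is unchanged
        else:
            raise ValueError("unknown layer kind: %r" % (kind,))
    return shape


# The model from the question: the second LSTM drops the time axis.
question_model = [("lstm", 256, True), ("dropout",),
                  ("lstm", 256, False), ("dropout",),
                  ("dense", 32), ("dense", 1)]
print(trace_shapes((132300, 2), question_model))  # (None, 1)

# Variant that keeps the sequence and emits 2 channels per timestep.
sequence_model = [("lstm", 256, True), ("dropout",),
                  ("lstm", 256, True), ("dropout",),
                  ("dense", 32), ("dense", 2)]
print(trace_shapes((132300, 2), sequence_model))  # (None, 132300, 2)
```

The first result is 2-D, which is why fit rejects a 3-D target of shape (328, 88200, 2). The variant is 3-D with 2 channels, but it still has 132300 timesteps where y_train has 88200, so the time axis would need separate handling that this thread does not cover.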
