Keras: What if the size of data is not divisible by batch_size?

I am new to Keras and have just started working through some examples. I am dealing with the following problem: I have 4032 samples, use about 650 of them for the fit (i.e. the training stage), and use the rest for testing the model. The problem is that I keep getting the following error:

Exception: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size.

I understand why I am getting this error; my question is, what if the size of my data is not divisible by batch_size? I used to work with the Deeplearning4j LSTM and did not have to deal with this problem. Is there any way to get around it?

Thanks

Demitria answered 22/6, 2016 at 17:5 Comment(10)
As far as getting around it is concerned, change the batch size. If the number of samples is a prime number, drop 1 or 2 examples. As for why this error occurs in Keras and not in Deeplearning4j, I am not sure.Issuant
Thanks for the suggestion, but I was hoping to get results without having to drop any samples.Demitria
You don't have to drop samples. 650 is not a prime number. If your total number of samples is a prime number, then it won't matter what batch size you choose, it will not be divisible. In your case, you can choose a batch size of 5, 10, 65, etc. Is that a real issue for you? In my experience, changing the batch size within reasonable limits won't affect performance much.Issuant
Sometimes the input size may be a prime number, in which case I have to choose a different batch size.Demitria
Also, this is a requirement only in stateful networks in Keras. I worked with Keras extensively for implementing CNNs. I didn't have any such requirement then.Issuant
So does that mean that if I switch to stateful=False, this would no longer be an issue? By the way, if stateful is False, is the model still an LSTM? I am using the network from one of the examples (stateful_lstm.py). Sorry if my questions are simple, but I am a newbie :) ThanksDemitria
No. Don't make any changes to the network architecture. In my opinion, you are overthinking this issue. If you have 650 training samples, make the batch size 50, 65, etc. Otherwise, drop one or two samples to make the count divisible by the batch size (for example, with 743 samples, which is prime, no batch size will work, so drop one sample to make it 742, which is divisible). Neural network performance won't be affected by one or ten samples more or less. If you have a dataset where removing 10 samples means removing 10% of the data, maybe you should consider some method other than neural networks.Issuant
The thing is that I am dealing with 50 datasets, each with a different size, and I must use a certain number of samples for testing (due to some benchmark restrictions). For now, I'll stick to a batch size of 64 and try to make the number of samples divisible by that (see the sketch after these comments). Also, do you have any useful references so I can read more about stateful networks? Once again, thank you so much.Demitria
Curious: Did you stop using DL4J? If so, why?Branle
@tremstat No, I just wanted to compare the results of bothDemitria
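A minimal sketch of making the sample count divisible by a fixed batch size, the approach the asker settles on in the comments above. The array names, shapes, sample count, and batch size below are illustrative assumptions, not taken from the thread:

import numpy as np

batch_size = 64  # illustrative choice

# Stand-in arrays; shapes are (samples, timesteps, features) and (samples, 1).
x_train = np.random.rand(743, 10, 3)
y_train = np.random.rand(743, 1)

# Keep only as many samples as fill complete batches, so a stateful
# network never receives a partial batch.
usable = (len(x_train) // batch_size) * batch_size
x_train, y_train = x_train[:usable], y_train[:usable]

print(len(x_train))  # 704: the trailing 39 samples are dropped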

The simplest solution is to use fit_generator instead of fit. I wrote a simple data-loader class that can be inherited to do more complex things. It would look something like the following, with get_next_batch_data redefined to fetch whatever your data is, including things like augmentation, etc.:

class BatchedLoader:
    def __init__(self, num_samples, batch_size):
        # e.g. num_samples = 33
        self.possible_indices = list(range(num_samples))
        self.batch_size = batch_size
        self.cur_it = 0
        self.cur_epoch = 0

    def get_batch_indices(self):
        batch_indices = self.possible_indices[self.cur_it:self.cur_it + self.batch_size]
        self.cur_it += self.batch_size
        if len(batch_indices) < self.batch_size:
            # Reached the end: reset the iterator, increase cur_epoch,
            # and shuffle possible_indices here if wanted.
            self.cur_it = 0
            self.cur_epoch += 1
            # Add the remaining K = batch_size - len(batch_indices) indices
            # from the start so every batch has exactly batch_size elements.
            k = self.batch_size - len(batch_indices)
            batch_indices += self.possible_indices[:k]
        return batch_indices

    def get_next_batch_data(self):
        batch_indices = self.get_batch_indices()
        # The data points corresponding to these indices are your next batch;
        # override this method to load (and augment, etc.) the actual data.
        raise NotImplementedError
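A rough sketch of how such a loader could be plugged into fit_generator; the generator wrapper, array shapes, and the commented-out model call below are assumptions for illustration, not part of the original answer:

import numpy as np

def batch_generator(loader, x, y):
    # fit_generator expects a generator that yields (inputs, targets) forever.
    while True:
        batch_indices = loader.get_batch_indices()
        yield x[batch_indices], y[batch_indices]

# Hypothetical data; shapes are (samples, timesteps, features) and (samples, 1).
x_train = np.random.rand(650, 20, 3)
y_train = np.random.rand(650, 1)
loader = BatchedLoader(num_samples=len(x_train), batch_size=64)

# model.fit_generator(batch_generator(loader, x_train, y_train),
#                     steps_per_epoch=int(np.ceil(len(x_train) / 64)),
#                     epochs=10)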
Huntington answered 18/4, 2018 at 16:59 Comment(2)
Does it mean that you repeat the training with some of the samples from the beginning of the data?Stabile
Yeah, either that, or you can just have a smaller final batch. The vast majority of ops are agnostic to the first dimension (i.e., batch size) anyway.Huntington
