Keras copies data onto the GPU batch by batch (as noted by the author here).
For small datasets that fit entirely in GPU memory, this per-batch host-to-device transfer adds significant overhead. Is there a way to modify Keras, or to call Theano functions directly after defining the model in Keras, so that the whole dataset is moved to the GPU up front and training runs against batches already resident in GPU memory? A sketch of the plain-Theano pattern I have in mind is below.
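For reference, this is the approach used in the plain-Theano tutorials: put the whole dataset into `theano.shared` variables (which live on the GPU when the GPU backend is active) and slice out each mini-batch with a `givens` clause. The toy logistic-regression model, array shapes, and variable names here are just placeholders standing in for whatever Keras builds internally; Keras does not expose this directly, which is what my question is about.

```python
import numpy as np
import theano
import theano.tensor as T

# Toy data; in practice this would be the full (small) training set.
X_data = np.random.rand(1000, 20).astype(theano.config.floatX)
y_data = np.random.randint(0, 2, size=(1000, 1)).astype(theano.config.floatX)

# Shared variables are kept in GPU memory when running on the GPU backend,
# so the dataset is transferred to the device once, up front.
X_shared = theano.shared(X_data, borrow=True)
y_shared = theano.shared(y_data, borrow=True)

batch_size = 32
index = T.lscalar('index')  # mini-batch index

# A trivial logistic-regression "model" standing in for a Keras model.
x = T.matrix('x')
y = T.matrix('y')
W = theano.shared(np.zeros((20, 1), dtype=theano.config.floatX))
b = theano.shared(np.zeros((1,), dtype=theano.config.floatX))
p = T.nnet.sigmoid(T.dot(x, W) + b)
cost = T.nnet.binary_crossentropy(p, y).mean()
grads = T.grad(cost, [W, b])
updates = [(param, param - 0.1 * g) for param, g in zip([W, b], grads)]

# The 'givens' clause slices each batch out of the shared (GPU-resident)
# arrays, so no host-to-device copy happens per batch.
train_batch = theano.function(
    [index], cost, updates=updates,
    givens={
        x: X_shared[index * batch_size:(index + 1) * batch_size],
        y: y_shared[index * batch_size:(index + 1) * batch_size],
    })

n_batches = X_data.shape[0] // batch_size
for epoch in range(10):
    for i in range(n_batches):
        train_batch(i)
```

The question is essentially whether something equivalent can be wired up against the symbolic graph Keras constructs, instead of feeding numpy arrays batch by batch through its training loop.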
(Someone asked the same question on the Keras mailing list a few weeks ago, but it has received no replies so far.)