This is certainly a late reply to this post, but hopefully it helps anyone who stumbles upon it.
Here's an article I found online, "Image Data Pre-Processing for Neural Networks". I thought it gave good insight into how the network should be trained.
The main gist of the article:
1) Data (images) fed into the NN should be scaled to the image size the NN is designed to take, usually a square, e.g. 100x100 or 250x250.
2) Compute the MEAN (left image) and STANDARD DEVIATION (right image) over all the input images in your collection for a particular set of images.
3) Normalize the image inputs by subtracting the mean from each pixel and then dividing the result by the standard deviation, which makes convergence faster while training the network. The normalized values are centred at zero with a standard deviation of one; for roughly bell-shaped pixel distributions this resembles a Gaussian curve centred at zero.
4) Dimensionality reduction: convert RGB images to grayscale, either when the network's performance can safely be invariant to that dimension (colour isn't needed for the task) or to make the training problem more tractable.
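Step 1 above can be sketched as a nearest-neighbour resize to a fixed square size. This is a minimal NumPy illustration, assuming images are arrays shaped (H, W) or (H, W, C); `resize_nearest` is a hypothetical helper, and in practice you would more likely use a library resampler such as Pillow's `Image.resize` with bilinear or bicubic filtering.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an (H, W) or (H, W, C) array to (size, size) or (size, size, C).

    Uses nearest-neighbour sampling: each output pixel copies the source pixel
    its position maps back to. Non-square inputs are stretched, not cropped.
    """
    h, w = image.shape[:2]
    # Source row/column index for every output row/column (integer arithmetic).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    # Broadcast the index vectors to a (size, size) grid; trailing channels are kept.
    return image[rows[:, None], cols[None, :]]


# Usage: bring differently sized RGB images to a common 100x100 input size.
images = [np.zeros((300, 200, 3)), np.zeros((120, 160, 3))]
batch = np.stack([resize_nearest(img, 100) for img in images])
print(batch.shape)  # (2, 100, 100, 3)
```

Resizing a non-square image straight to a square distorts its aspect ratio; cropping or padding to a square first avoids that, at the cost of losing or adding border pixels.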
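Steps 2 and 3 can be sketched as follows. This is a minimal NumPy illustration, assuming (as the mean and standard deviation images suggest) that the statistics are computed per pixel position across the training images; `fit_mean_std` and `standardize` are hypothetical helper names, and computing a single mean/std per colour channel is a common alternative.

```python
import numpy as np


def fit_mean_std(train_images: np.ndarray, eps: float = 1e-8):
    """Per-pixel mean and standard deviation over a stack shaped (N, H, W[, C])."""
    x = train_images.astype(np.float64)
    mean = x.mean(axis=0)  # the "mean image"
    std = x.std(axis=0)    # the "standard deviation image"
    # Floor the std so pixels that never vary do not cause division by zero.
    return mean, np.maximum(std, eps)


def standardize(images: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Subtract the mean from each pixel, then divide the result by the std."""
    return (images.astype(np.float64) - mean) / std


# Usage with synthetic 8-bit data standing in for a real training set.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(50, 8, 8, 3))
mean, std = fit_mean_std(train)
train_norm = standardize(train, mean, std)
print(train_norm.mean(), train_norm.std())  # approximately 0.0 and 1.0
```

A common practice is to fit the mean and standard deviation on the training images only and then reuse them unchanged for validation and test images, so every input goes through the same transform.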
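Step 4 can be sketched as a weighted sum over the colour channels. This minimal NumPy version assumes RGB is the last array axis and uses the ITU-R BT.601 luma weights, which are one common convention and not necessarily what the article used; `rgb_to_grayscale` is a hypothetical helper.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels (they sum to 1).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_grayscale(images: np.ndarray) -> np.ndarray:
    """Collapse the trailing RGB axis of a (..., 3) array into one luminance value."""
    # matmul with a 1-D weight vector takes a weighted sum over the last axis.
    return images.astype(np.float64) @ LUMA_WEIGHTS


# Usage: a batch of 100x100 RGB images becomes a batch of 100x100 grayscale images,
# cutting the number of input values per image by a factor of three.
rgb_batch = np.zeros((4, 100, 100, 3))
gray_batch = rgb_to_grayscale(rgb_batch)
print(rgb_batch.shape, "->", gray_batch.shape)  # (4, 100, 100, 3) -> (4, 100, 100)
```

If the network expects an explicit channel axis, `gray_batch[..., None]` restores a shape of (4, 100, 100, 1).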