I trained a ResNet18 backbone on the CIFAR-100 dataset using a proposed technique for research purposes, and I ended up with some surprising results. I made two attempts: the first with a batch size of 640 and the second with a batch size of 320. All other hyperparameters were kept the same.
The accuracy was 76.45% with a batch size of 640 and 78.64% with a batch size of 320.
Can you tell me why this is happening?
In my opinion, this is caused by covariate shift. The distribution of the samples in each iteration, across the iterations needed to cover the whole dataset, can affect the accuracy. I think the per-batch distributions are more similar to each other with a batch size of 320 than with 640, which leads to the higher accuracy.
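To make "iterations needed to cover the whole dataset" concrete, here is a rough sketch of how many iterations one epoch takes at each batch size. It assumes the standard CIFAR-100 training split of 50,000 images; the helper `iterations_per_epoch` and its `drop_last` flag are names made up for this illustration, and whether my actual data loader keeps or drops the last partial batch is not something stated above.

```python
import math

# Assumption: standard CIFAR-100 training split (50,000 images).
TRAIN_SAMPLES = 50_000


def iterations_per_epoch(batch_size: int, drop_last: bool = False) -> int:
    """Number of iterations (weight updates) needed to see every sample once.

    drop_last=True discards the final partial batch; drop_last=False keeps it.
    """
    if drop_last:
        return TRAIN_SAMPLES // batch_size
    return math.ceil(TRAIN_SAMPLES / batch_size)


for batch_size in (640, 320):
    print(f"batch size {batch_size}: "
          f"{iterations_per_epoch(batch_size)} iterations per epoch")
```

Under these assumptions, halving the batch size roughly doubles the number of iterations per epoch, so with the same number of epochs the 320 run performs about twice as many weight updates as the 640 run.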
Can you explain this, and what would be a solution?