Batch normalization destroys validation performance

Following some tutorials, I'm adding batch normalization to my model in order to improve training time. This is my model:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout

model = Sequential()

model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(64,64,3)))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())

model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))

# NB: adding more parameters increases the probability of overfitting! Try cutting neurons instead of adding them.
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(units=20, activation='softmax'))

Without batch normalization, I get around 50% accuracy on my data. Adding batch normalization destroys my performance, with the validation accuracy reduced to 10%.

Why is this happening?

Sophronia answered 27/11, 2019 at 18:6 Comment(0)

I'm not sure if this is what you are asking, but batch normalization is still active during validation; the difference is that its parameters (the learned scale/offset and the moving mean and variance) are fixed from training and are not updated during validation.
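
To make that concrete, here is a minimal sketch (not from the question; it assumes tf.keras and TensorFlow 2 with eager execution) showing that the same BatchNormalization layer produces different outputs in training mode and in inference mode:

import numpy as np
import tensorflow as tf

# A fresh BN layer: the moving mean/variance start at 0/1 and are only
# updated when the layer is called in training mode.
bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(8, 4).astype("float32")

train_out = bn(x, training=True)   # normalizes with the current batch statistics
infer_out = bn(x, training=False)  # normalizes with the stored moving averages

# The two outputs differ, which is why training and validation behaviour can diverge.
print(np.allclose(train_out.numpy(), infer_out.numpy()))  # almost always False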

As for why batch normalization is not working well for your model/problem: like any hyperparameter choice, it works well in some scenarios and poorly in others. Do you know if this is the best placement for BN within your network? Beyond that, I would need to know more about your data and problem to offer further guesses.

Underlinen answered 27/11, 2019 at 18:10 Comment(4)
I followed a tutorial on Medium where the author placed a batch normalization layer after each conv layer, claiming that this would improve performance/training time. I didn't expect such a drop in my validation performance at all. - Sophronia
Anyway, I'm trying to classify 20 classes given a dataset of about 1500 images (so roughly 75 images per class; the distribution is not uniform, and some classes have a few extra images). - Sophronia
BN is usually used to help with overfitting. From my experience it's very sensitive to network architecture (I use it before the activation; not sure where it would fit in with a pooling layer - see the sketch after these comments). Any particular reason why you wanted to use BN in your network? - Underlinen
I had no particular reason. I'm trying to improve my accuracy (I'm stuck at around 50%-54%) and was trying many methods to get better results. 75 images per class is really not enough for proper training, but I wanted to achieve the best I could with my own network (before moving to transfer learning). - Sophronia
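
A minimal sketch of the Conv -> BN -> activation ordering mentioned in the comment above, applied to the first block of the question's model (illustrative only, not a tested recommendation):

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D

model = Sequential()
# The convolution has no activation of its own; BN normalizes its raw
# outputs and the ReLU is applied afterwards.
model.add(Conv2D(16, kernel_size=(3, 3), input_shape=(64, 64, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))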

Try using fewer batch normalization layers. A common practice is to place one after the last convolutional layer. Start with just one and add more only if it improves the validation accuracy.
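
A minimal sketch of that suggestion, reusing the layers from the question but keeping a single BatchNormalization layer after the last convolution (illustrative; tune from here):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())  # the single BN layer, after the last conv
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=20, activation='softmax'))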

Morpheme answered 13/10, 2020 at 1:7 Comment(0)
