Higher validation accuracy than training accuracy using TensorFlow and Keras [closed]

I'm trying to use deep learning to predict income from 15 self-reported attributes from a dating site.

We're getting rather odd results, where our validation data gets better accuracy and lower loss than our training data, and this is consistent across different hidden layer sizes. This is our model:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

# X, Y, seed and the custom LossHistory callback are defined earlier in our script.
for hl1 in [250, 200, 150, 100, 75, 50, 25, 15, 10, 7]:
    def baseline_model():
        model = Sequential()
        model.add(Dense(hl1, input_dim=299, kernel_initializer='normal', activation='relu', kernel_regularizer=regularizers.l1_l2(0.001)))
        model.add(Dropout(0.5, seed=seed))
        model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))

        model.compile(loss='categorical_crossentropy', optimizer='adamax', metrics=['accuracy'])
        return model

    history_logs = LossHistory()
    model = baseline_model()
    history = model.fit(X, Y, validation_split=0.3, shuffle=False, epochs=50, batch_size=10, verbose=2, callbacks=[history_logs])

And this is an example of the resulting curves (plots of accuracy and loss with a hidden layer of 250 neurons):

We've tried removing regularization and dropout, which, as expected, ended in overfitting (training acc: ~85%). We've even tried decreasing the learning rate drastically, with similar results.

Has anyone seen similar results?

Allianora answered 15/5, 2017 at 12:22 Comment(1)
I have encountered the same problem multiple times now: stats.stackexchange.com/questions/372146/… Any help is appreciated. – Piero

This happens when you use Dropout, since the behaviour when training differs from the behaviour when testing.

When training, a percentage of the features are set to zero (50% in your case, since you are using Dropout(0.5)). When testing, all features are used (and are scaled appropriately). So the model at test time is more robust, which can lead to higher testing accuracies.

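A minimal, self-contained sketch (toy random data and an arbitrary small model, not the OP's setup) that makes this visible with tf.keras: calling the model with training=True keeps dropout active, while training=False (what validation and evaluation use) disables it, so the same weights give a higher loss in training mode.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Toy data: 256 samples, 20 features, 3 classes (purely illustrative).
    x = np.random.rand(256, 20).astype("float32")
    y = keras.utils.to_categorical(np.random.randint(0, 3, 256), 3)

    model = keras.Sequential([
        layers.Dense(64, activation="relu", input_shape=(20,)),
        layers.Dropout(0.5),   # zeroes half of the activations, but only in training mode
        layers.Dense(3, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    model.fit(x, y, epochs=3, verbose=0)

    loss_fn = keras.losses.CategoricalCrossentropy()
    print("train-mode loss:", float(loss_fn(y, model(x, training=True))))   # dropout on
    print("test-mode loss: ", float(loss_fn(y, model(x, training=False))))  # dropout off, usually lower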
Phototherapy answered 15/5, 2017 at 14:56 Comment(7)
So you're saying that val_acc being a bit higher than trn_acc is OK? – Concepcion
Good explanation for testing error being lower than training error! It's now in the Keras FAQ keras.io/getting-started/faq/…, but the original question was about validation accuracy being higher than training accuracy, i.e. validation error being lower than training error. – Goddaughter
@Phototherapy I also observe this when I build my models, but I am wondering: is this guaranteed to happen when using dropout? Is there any theoretical rationale behind it? – Ballarat
@ClaudeCOULOMBE Which of the FAQs? – Vierno
@Vierno Small change to the Keras FAQ URL (underscore replacing hyphen): keras.io/getting_started/faq/… – Goddaughter
@ClaudeCOULOMBE But doesn't that FAQ address testing error < training error rather than (the OP's) testing error > training error? – Vierno
@Vierno - my understanding is that the question was about validation or testing accuracy > training accuracy. In other words, in terms of error or loss, training error > testing error, and the FAQ is exactly about training error > testing error (which seems strange, since usually training error < testing error, hence the explanation). – Goddaughter

You can check the Keras FAQ and especially the section "Why is the training loss much higher than the testing loss?".

I would also suggest you take some time to read this very good article about the "sanity checks" you should always take into consideration when building a NN.

In addition, whenever possible, check whether your results make sense. For example, in an n-class classification with categorical cross-entropy, the loss on the first epoch should be about -ln(1/n) = ln(n).

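For the OP's 3 classes that sanity-check value is -ln(1/3) ≈ 1.10. A small sketch of the check (model, X and Y as in the question; the evaluate call assumes the model has not been trained yet):

    import math

    n_classes = 3
    print(-math.log(1.0 / n_classes))   # expected first-epoch loss, ~1.0986 for 3 classes

    # Compare with the loss of the freshly compiled (untrained) model:
    # initial_loss, initial_acc = model.evaluate(X, Y, verbose=0)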
Apart from your specific case, I believe that, Dropout aside, the dataset split may sometimes lead to this situation. In particular, if the split is not random (e.g. when temporal or spatial patterns exist), the validation set may be fundamentally different from the training set, i.e. contain less noise or less variance, and thus be easier to predict, leading to higher accuracy on the validation set than on training.

Moreover, if the validation set is very small compared to the training set, then by chance the model may fit the validation set better than the training set.

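One thing worth noting about the OP's call: Keras' validation_split always takes the last fraction of the rows (before any shuffling), so with ordered data the validation set can be systematically different from the training set. A possible sketch of a randomised, stratified split instead (using scikit-learn; X, Y and baseline_model are taken from the question, and Y is assumed to be one-hot):

    from sklearn.model_selection import train_test_split

    # Shuffle and stratify on the class labels so both sides of the split look alike.
    X_train, X_val, Y_train, Y_val = train_test_split(
        X, Y, test_size=0.3, shuffle=True, stratify=Y.argmax(axis=1), random_state=42)

    model = baseline_model()
    history = model.fit(X_train, Y_train,
                        validation_data=(X_val, Y_val),
                        epochs=50, batch_size=10, verbose=2)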
Barrier answered 24/8, 2017 at 8:25 Comment(0)

This indicates high bias: the model is underfitting your data. Possible solutions are:

  1. The network is probably struggling to fit the training data, so try a somewhat bigger network (a sketch follows this list).

  2. Try a different deep neural network, i.e. change the architecture a bit.

  3. Train for a longer time.

  4. Try using more advanced optimization algorithms.

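A possible sketch of point 1: a slightly deeper network for the same 299-feature input. The layer sizes, dropout rate and softmax output here are illustrative guesses rather than tuned values (the question's code used a sigmoid output; softmax is used here since the three classes appear mutually exclusive).

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    def deeper_model():
        model = Sequential()
        model.add(Dense(256, input_dim=299, activation='relu'))
        model.add(Dropout(0.3))                     # lighter dropout than the original 0.5
        model.add(Dense(128, activation='relu'))
        model.add(Dropout(0.3))
        model.add(Dense(3, activation='softmax'))   # one probability per income class
        model.compile(loss='categorical_crossentropy', optimizer='adamax', metrics=['accuracy'])
        return model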
Concepcion answered 24/8, 2017 at 6:25 Comment(0)

This is actually a pretty common situation. When there is not much variance in your dataset, you can get behaviour like this. Here you can find an explanation of why this might happen.

Tristich answered 16/5, 2017 at 21:57 Comment(0)

There are a number of reasons this can happen. You have not shown any information on the sizes of the training, validation and test sets. If the validation set is too small, it does not adequately represent the probability distribution of the data. If your training set is small, there is not enough data to adequately train the model. Also, your model is very basic and may not be adequate to cover the complexity of the data; a dropout of 50% is high for such a limited model. Try using an established model like MobileNet version 1, which will be more than adequate for even very complex data relationships. Once that works, you can be confident in the data and build your own model if you wish. The fact is that validation loss and accuracy do not have much meaning until your training accuracy gets reasonably high, say 85%.

Filip answered 7/2, 2020 at 4:54 Comment(0)

I solved this by simply increasing the number of epochs.

Dairyman answered 26/5, 2021 at 17:34 Comment(0)

I don't think it is a dropout layer problem.

I think it is more related to the number of samples in your dataset.

The point here is that you are working with a large training set and a validation/test set that is too small, so the latter is way too easy to fit.

Try data augmentation and other techniques to make your dataset bigger!

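Note that the question's data is tabular rather than images, so image-style augmentation does not apply directly; one rough analogue, sketched below, is injecting small Gaussian noise into the inputs with Keras' GaussianNoise layer (which, like Dropout, is only active during training; the noise level and layer sizes are arbitrary starting points, not tuned values).

    from keras.models import Sequential
    from keras.layers import Dense, Dropout, GaussianNoise

    model = Sequential()
    model.add(GaussianNoise(0.05, input_shape=(299,)))   # perturb the inputs slightly at train time
    model.add(Dense(100, activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adamax', metrics=['accuracy'])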
Roundtheclock answered 21/9, 2022 at 10:44 Comment(0)

Adding dropout to your model gives it more generalization, but dropout doesn't have to be the cause here. It could be that your data is unbalanced (has class bias), and that's what I think is happening.

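Whether the classes really are unbalanced is easy to check, and if they are, Keras can compensate via the class_weight argument to fit(). A sketch (X, Y and baseline_model as in the question; Y is assumed to be one-hot, and inverse-frequency weights are just one common choice):

    import numpy as np

    labels = Y.argmax(axis=1)                       # recover integer labels from one-hot Y
    counts = np.bincount(labels, minlength=3)
    print(dict(enumerate(counts)))                  # per-class sample counts

    class_weight = {i: len(labels) / (3 * c) for i, c in enumerate(counts)}

    model = baseline_model()
    model.fit(X, Y, validation_split=0.3, epochs=50, batch_size=10,
              class_weight=class_weight, verbose=2)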
Stain answered 6/9, 2021 at 2:28 Comment(1)
Please add further details to expand on your answer, such as working code or documentation citations. – Weixel

I agree with @Anas' answer: the situation might be resolved after you increase the number of epochs. Everything is OK, but sometimes it is just a coincidence that the initialized model performs better on the validation/test dataset than on the training dataset.

Trichomoniasis answered 5/1, 2023 at 19:0 Comment(0)

Based on my own observation, first, the dataset split ratio is one of the reasons that evaluation accuracy can end up higher than training accuracy. For instance, in your case validation_split was set to 0.3 (30% of the whole dataset); if your dataset is not big enough, this setting can produce that result. Second, I agree with @yhenon that a high dropout value is also a cause when you have a small training dataset.

From my point of view, try setting validation_split = 0.2 (20% of the whole dataset) and lowering the Dropout value; the result should change (see the sketch below).

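As a concrete version of this suggestion, a sketch of the question's model with only those two values changed (Sequential, Dense, Dropout, regularizers, seed, X and Y are all as in the question's setup; everything else, including the sigmoid output, is left as in the original):

    model = Sequential()
    model.add(Dense(100, input_dim=299, kernel_initializer='normal', activation='relu',
                    kernel_regularizer=regularizers.l1_l2(0.001)))
    model.add(Dropout(0.2, seed=seed))              # dropout lowered from 0.5 to 0.2
    model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='categorical_crossentropy', optimizer='adamax', metrics=['accuracy'])

    history = model.fit(X, Y, validation_split=0.2, shuffle=False,   # split lowered from 0.3 to 0.2
                        epochs=50, batch_size=10, verbose=2)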
Kittrell answered 14/4, 2023 at 7:31 Comment(0)
