Sudden drop in accuracy while training a deep neural net

I am using MXNet to train an 11-class image classifier, and I am observing weird behavior: training accuracy was increasing slowly, reached 39%, and then in the next epoch dropped to 9%, where it stayed for the rest of the training. I restarted training from the saved model (the one with 39% training accuracy), keeping all other parameters the same, and now training accuracy is increasing again. What could be the reason? I cannot make sense of it, and it makes training difficult because I have to watch the training accuracy constantly.

The learning rate is constant at 0.01.

Amphidiploid answered 5/5, 2016 at 7:10 Comment(3)
Most likely your learning rate is too high and the model is jumping around. Hard to tell without knowing your hyperparameters. – Ahmad
The learning rate is 0.01. – Amphidiploid
Once I had a similar issue when, by accident, I set a linear activation and used categorical cross-entropy as the cost function. – Prandial

As you can see, your later accuracy is close to random guessing (about 9% for an 11-class problem). There are two common issues in cases like this:

  • Your learning rate is too high. Try lowering it.
  • The error (or entropy) you are trying to compute is producing NaN values. If you use a loss with log functions, you must guard against log(0); a numerically safe version is sketched after this list.
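
Purely as an illustration (not part of the original answer), here is a minimal NumPy sketch of a cross-entropy that clamps probabilities away from zero so log() never sees 0; the names probs and labels and the epsilon value are assumptions for the example:

import numpy as np

def safe_cross_entropy(probs, labels, eps=1e-10):
    # Clamp predicted probabilities into [eps, 1] so log() never receives 0,
    # which would produce -inf and turn the loss into NaN.
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(labels * np.log(probs))

# One-hot label where the true class got a predicted probability of exactly 0
labels = np.array([1.0, 0.0, 0.0])
probs = np.array([0.0, 0.7, 0.3])
print(safe_cross_entropy(probs, labels))  # finite loss instead of inf/NaN
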
Ford answered 5/5, 2016 at 8:53 Comment(2)
In my case (for a different model) it was NaN, caused by large parameters. I fixed it by bounding the parameters by some reasonable value. – Lepido
I confirm, in my case it was a learning rate that was way too high. Thanks! – Ovariectomy

It is common during training of neural networks for accuracy to improve for a while and then get worse -- in general this is caused by over-fitting. It's also fairly common for the network to "get unlucky" and get knocked into a bad part of parameter space corresponding to a sudden decrease in accuracy -- sometimes it can recover from this quickly, but sometimes not.

In general, lowering your learning rate is a good approach to this kind of problem. Also, setting a learning rate schedule like FactorScheduler can help you achieve more stable convergence by lowering the learning rate every few epochs. In fact, this can sometimes cover up mistakes in picking an initial learning rate that is too high.
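
A minimal sketch of what that might look like with MXNet's Module API (my illustration; the step/factor values, mod, and train_iter are placeholders, not from the answer):

import mxnet as mx

# Multiply the learning rate by 0.9 every 1000 updates (values are illustrative).
lr_sch = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.9)

# Assuming `mod` is an mx.mod.Module and `train_iter` is an mx.io data iterator.
mod.fit(train_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.01, 'lr_scheduler': lr_sch},
        num_epoch=20)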

Domination answered 13/11, 2016 at 4:38 Comment(0)

I faced the same problem, and I solved it by using a squared-error loss, (y - a)^2, instead of the cross-entropy function (because of log(0)). I hope there is a better solution to this problem.

Sigismundo answered 30/8, 2017 at 6:40 Comment(0)

These problems come up often. I have observed that this can happen for one of the following reasons:

  1. Something returns NaN (a quick check is sketched after this list)
  2. The inputs to the network are not what it expects - many modern frameworks do not raise errors in some of these cases
  3. The model's layers receive incompatible shapes at some point
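
For example (my own illustration using NumPy; the array names and shapes are placeholders), a quick check for reason 1 is to test a batch or a loss value for NaN/Inf before blaming the optimizer:

import numpy as np

def has_bad_values(name, arr):
    # Report whether any element of the array is NaN or infinite.
    bad = bool(np.isnan(arr).any() or np.isinf(arr).any())
    if bad:
        print(name + " contains NaN/Inf values")
    return bad

batch = np.random.rand(32, 3, 224, 224).astype(np.float32)  # placeholder input batch
loss = np.array([np.nan], dtype=np.float32)                 # placeholder loss value
has_bad_values("batch", batch)
has_bad_values("loss", loss)
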
Psychopathology answered 9/4, 2021 at 4:49 Comment(0)

This probably happened because 0 * log(0) returns NaN.

You might avoid it with:

import tensorflow as tf  # TF 1.x API

cross_entropy = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(logits, 1e-10, 1.0)))

Dibranchiate answered 11/5, 2016 at 21:59 Comment(1)
You shouldn't restrict the gradient by limiting the loss through clipping the logits; that actually creates a gradient of 0 in those intervals, and the network gets stuck. You should clip the gradient directly instead. – Poinsettia
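
As an illustration of that suggestion (my sketch, not from the comment; it assumes cross_entropy is the unclipped loss tensor from above, uses the TF 1.x API, and picks an arbitrary learning rate and clip range):

import tensorflow as tf

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
grads_and_vars = optimizer.compute_gradients(cross_entropy)
# Clip each gradient instead of the logits, so the loss surface itself is untouched.
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)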
