The following content comes from the Keras tutorial:
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
Why should we freeze these layers when fine-tuning a convolutional neural network? Is it because of some mechanism in TensorFlow Keras, or because of the batch normalization algorithm itself? I ran an experiment myself and found that if trainable is not set to False, the model tends to catastrophically forget what it has learned before and returns a very large loss during the first few epochs. What is the reason for that? A minimal sketch of the setup is shown below.
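For reference, here is a minimal sketch of the fine-tuning pattern the tutorial describes, where the pretrained base (including its BatchNormalization layers) is frozen before training a new head. The choice of MobileNetV2, the input shape, and the hyperparameters are my own illustration, not the tutorial's exact code:

```python
import tensorflow as tf
from tensorflow import keras

# Load a pretrained convnet base (illustrative choice of architecture/shape).
base_model = keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze the base. Per the TF 2.x behavior quoted above, setting
# trainable = False also puts BatchNormalization layers in inference mode:
# they use their stored moving mean/variance instead of batch statistics,
# and those moving statistics are not updated.
base_model.trainable = False

inputs = keras.Input(shape=(160, 160, 3))
# training=False keeps the BatchNorm layers in inference mode even if the
# base is later unfrozen for fine-tuning.
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # low LR, typical for fine-tuning
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Note that training=False matters separately from trainable: it forces the BatchNormalization layers to normalize with their stored moving statistics rather than per-batch statistics, independently of whether their weights are being updated.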