What is the difference between cuda.amp and model.half()?

According to https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/

We can use:

    with torch.cuda.amp.autocast():
        loss = model(data)

in order to cast operations to mixed precision.

Alternatively, we can call model.half() to convert all of the model's weights to half precision.
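A minimal sketch of the model.half() route (the model, shapes, and tensors here are made up for illustration, and it assumes a CUDA device is available):

    import torch
    import torch.nn as nn

    # Illustrative model; replace with your own network.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).cuda()

    # Convert all parameters and buffers to fp16.
    model.half()

    # Inputs must also be cast to fp16 to match the weights.
    data = torch.randn(32, 128, device="cuda").half()
    output = model(data)  # the forward pass runs entirely in fp16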

  1. What is the difference between these two commands?
  2. If I want to take advantage of FP16 (in order to fit larger models and get shorter training times), what do I need? Should I use model.half() or torch.cuda.amp (as in the link above)?
Roley answered 16/11, 2021 at 19:1

If you convert the entire model to fp16, there is a chance that some of the activation functions and batch-norm layers will cause the fp16 weights to underflow, i.e., become zero. So it is always recommended to use autocast, which internally keeps the problematic layers in fp32.

model.half() will, in the end, store the weights in fp16, whereas with autocast the weights stay in fp32. Training in pure fp16 will be faster than autocast, but there is a higher chance of instability if you are not careful. When using autocast you also need to scale the loss during backpropagation (PyTorch provides torch.cuda.amp.GradScaler for this), as in the sketch below.
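A minimal training-loop sketch with autocast and GradScaler (the model, data, loss, and optimizer are placeholders, and it assumes a CUDA device is available):

    import torch
    import torch.nn as nn

    # Illustrative model, optimizer, loss, and data.
    model = nn.Linear(128, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

    data = torch.randn(32, 128, device="cuda")
    target = torch.randint(0, 10, (32,), device="cuda")

    for _ in range(10):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():   # ops run in fp16 or fp32 as appropriate
            output = model(data)
            loss = criterion(output, target)
        scaler.scale(loss).backward()     # backpropagate on the scaled loss
        scaler.step(optimizer)            # unscales gradients, then calls optimizer.step()
        scaler.update()                   # adjusts the scale factor for the next iteration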

If the fp16 requirement is on the inference side, I recommend using autocast for training and then converting the model to fp16 with ONNX and TensorRT for deployment.
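If you only want mixed precision at inference time in PyTorch itself (before any ONNX/TensorRT conversion), a sketch along these lines works, reusing the model and data from the training sketch above:

    import torch

    model.eval()
    with torch.inference_mode(), torch.cuda.amp.autocast():
        output = model(data)  # mixed-precision inference; the stored weights stay in fp32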

Balaam answered 3/6, 2022 at 4:59
