I have a VMSS with an instance count of, say, 3. Let's say I have configured an autoscale rule so that if CPU utilization is below 20%, the instance count is reduced from 3 to 1. Assume all 3 instances are serving requests, and each request takes 60 seconds to complete.
Now suppose CPU utilization drops to 15%, so the instance count should be reduced by 2. What happens to the requests currently being served by the two instances that are removed? Are their in-flight requests handed over to the remaining instance, or is the scale-in delayed until those requests complete?
I have already attached the scale set to an Application Gateway and enabled connection draining so that in-flight requests are not dropped, but they are still being dropped. Since that is not working, I am now trying to work around it using API Management revisions and versions.
Expectation: When a scale-down/scale-in happens, ongoing requests should not be dropped.
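
To make the expectation concrete, here is a minimal sketch of the behavior I am hoping for. It is only a toy model in plain Python, not an Azure API: the names `InFlightRequest`, `dropped_requests`, `scale_in_at`, and `drain_timeout` are hypothetical and exist just for this illustration. It assumes an instance selected for scale-in is removed `drain_timeout` seconds after the scale-in decision, and that any request still running at that point is dropped.

```python
from dataclasses import dataclass


@dataclass
class InFlightRequest:
    started_at: float  # seconds on an arbitrary clock (hypothetical model)
    duration: float    # seconds the request needs to complete

    def finishes_at(self) -> float:
        return self.started_at + self.duration


def dropped_requests(requests, scale_in_at, drain_timeout):
    """Return the requests that would be cut off, assuming the instance is
    removed `drain_timeout` seconds after the scale-in decision."""
    removal_time = scale_in_at + drain_timeout
    return [r for r in requests if r.finishes_at() > removal_time]


# Two 60-second requests running on an instance selected for scale-in at t=40.
requests = [
    InFlightRequest(started_at=0, duration=60),
    InFlightRequest(started_at=30, duration=60),
]

# Immediate removal (no draining): both requests are still running.
print(len(dropped_requests(requests, scale_in_at=40, drain_timeout=0)))   # 2

# Removal delayed by at least the request duration: nothing is dropped.
print(len(dropped_requests(requests, scale_in_at=40, drain_timeout=60)))  # 0
```

In this model, a drain window at least as long as the longest request (60 seconds here) is what I would expect to prevent drops. What I observe looks like the `drain_timeout=0` case.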