How to prevent scale-down of newly scaled-up pods for a specific period after the HPA creates them in Kubernetes?

I have a Kubernetes cluster set up in DigitalOcean. The cluster is configured to auto-scale using the HPA (Horizontal Pod Autoscaler). I want to prevent termination of any pod that was scaled up within the last hour, to avoid thrashing and to reduce the bill. There are two reasons for this:

  1. Due to unpredictable traffic, new pods sometimes scale up and down multiple times in an hour. Because of the nature of the application, every 50-60 new users need a new pod to handle the traffic.
  2. DigitalOcean droplets are charged per hour. Even if a droplet was up for only 15 minutes, they charge for a full hour. So we sometimes pay for 5 droplets in an hour when we could have paid for just 1.

I could not find anything about this in the documentation. Any workaround would be helpful.

Evalyn answered 11/6, 2020 at 15:37

Yes, this can be done. I am currently running an experiment closely related to your question.

Measure the following while autoscaling:

  1. Time taken for the HPA to calculate the replicas needed.
  2. Time taken for a pod to spin up.
  3. Time taken for a droplet to spin up.
  4. Time taken for pods to spin down.
  5. Time taken for a droplet to spin down.

Case 1: Time taken for the HPA to calculate the replicas needed (HPA)

The HPA detects changes as soon as it gets metrics, which is at most one sync period later. The period is set by horizontal-pod-autoscaler-sync-period and defaults to 15 secs. As soon as the HPA gets the metrics, it calculates the replicas needed.
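
The calculation itself can be sketched as follows. This is a minimal illustration of the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue); it ignores the tolerance band and the min/max replica bounds that the real controller also applies, and the pod counts and CPU figures are made-up values.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count from the documented HPA formula (tolerance and bounds ignored)."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical numbers: 3 pods averaging 90% CPU against a 60% target.
# 3 * 90 / 60 = 4.5, which rounds up to 5 replicas.
print(desired_replicas(3, 90, 60))
```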

Case 2: Time taken for a pod to spin up (HPA)

As soon as the HPA calculates the desired replicas, pods start to spin up. How fast depends on the scaleUp policy, which you can set for your use case, and also on whether a droplet is available (otherwise the cluster autoscaler has to add one first).

For example, you can tell the HPA: "spin up 4 pods every 15 secs" OR "spin up 100% of the current pod count every 20 secs".

With selectPolicy: Max (the default), the HPA picks whichever policy allows the larger change in replica count. If 100% of the current pods is more than 4 pods, the second policy takes over; otherwise the first one does. The process repeats until the desired replica count is reached.

If you need the scaled-up pod count immediately, set the policy to spin up 100% of pods every 1 sec. The HPA may then add up to 100% of the current replica count every second until it matches the desired replica count.
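
To make the policy selection concrete, here is a rough simulation of the two example policies. It is a simplification, not the controller's actual code: it treats every step as one full period and ignores that the two policies have different periodSeconds. The function names and the starting and desired replica counts are invented for illustration.

```python
import math


def allowed_increase(current: int, policy: dict) -> int:
    """Pods that a single policy lets the HPA add in one of its periods."""
    if policy["type"] == "Pods":
        return policy["value"]
    if policy["type"] == "Percent":
        return math.ceil(current * policy["value"] / 100)
    raise ValueError(f"unknown policy type: {policy['type']}")


def next_replicas(current: int, desired: int, policies: list) -> int:
    """One scale-up step where the policy allowing the largest change wins."""
    step = max(allowed_increase(current, p) for p in policies)
    return min(desired, current + step)


policies = [
    {"type": "Pods", "value": 4, "periodSeconds": 15},
    {"type": "Percent", "value": 100, "periodSeconds": 20},
]

# Hypothetical scenario: 2 running pods, 30 desired.
replicas, desired = 2, 30
path = [replicas]
while replicas < desired:
    replicas = next_replicas(replicas, desired, policies)
    path.append(replicas)

print(path)  # [2, 6, 12, 24, 30]
```

The Pods policy wins only on the first step (4 > 2); from 6 replicas onward the Percent policy allows the larger change.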

Case 3: Time taken for a droplet to spin up (Cluster Autoscaler)

Time taken for:

  • The cluster autoscaler to detect pending pods and start spinning up a droplet: 1 min 05 secs (approx)
  • The droplet to spin up, still in NotReady state: 1 min 20 secs
  • The droplet to reach Ready state: 10 - 20 secs

Total time until the droplet is available: 2 min 40 secs (approx)
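
As a quick check of that total, the three measurements can be added up. The durations are the ones observed above; the variable names are only for illustration. The sum lands between 2 min 35 secs and 2 min 45 secs, which matches the approximate 2 min 40 secs.

```python
from datetime import timedelta

# Measured durations from the list above (specific to one environment).
detect_pending = timedelta(minutes=1, seconds=5)
droplet_boot = timedelta(minutes=1, seconds=20)
ready_fastest = timedelta(seconds=10)
ready_slowest = timedelta(seconds=20)

fastest = detect_pending + droplet_boot + ready_fastest
slowest = detect_pending + droplet_boot + ready_slowest

print(fastest.total_seconds(), slowest.total_seconds())
```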

Case 4: Time taken for pods to spin down (HPA)

It depends on the scaleDown policy, just as in Case 2.
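
Scale-down is also held back by the stabilization window: per the Kubernetes documentation, the HPA looks at the desired replica counts it computed during the window and uses the highest one. The sketch below illustrates that rule only; the function name, timestamps, and replica counts are invented, and the real controller does more bookkeeping.

```python
def stabilized_replicas(recommendations, now, window_seconds):
    """Highest recommendation inside the window.

    recommendations: list of (timestamp_seconds, desired_replicas) tuples.
    """
    recent = [r for t, r in recommendations if now - window_seconds <= t <= now]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)


# Hypothetical history: a spike to 5 replicas at t=0, then load drops.
history = [(0, 5), (600, 2), (1200, 1), (2500, 1)]

# At t=1200 the spike is still inside a 2400-sec window, so 5 pods stay.
print(stabilized_replicas(history, now=1200, window_seconds=2400))
# At t=2500 the spike has aged out, so the HPA may go down to 2.
print(stabilized_replicas(history, now=2500, window_seconds=2400))
```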

Case 5: Time taken for a droplet to spin down (Cluster Autoscaler)

This starts after all the target pods on the droplet have terminated (the time taken depends on Case 4).

DigitalOcean then sets a taint on the node, like DeletionCandidate...=<timestamp>:PreferNoSchedule

Ten mins after the taint is set, the droplet starts to spin down.

Conclusion:

If you need a node to stay alive for the full hour (to get the most out of the hourly charge) but not cross the hour (past 1 hr it is billed as 2 hrs), you can set:

stabilizationWindowSeconds = 1 hr - DigitalOcean's delay before deleting the droplet

Theoretically, stabilizationWindowSeconds = 1 hr - 10 mins = 50 mins (3000 secs)

In practice, the time taken for all pods to terminate varies with the scale-down policy, your application, etc.

So I set (approximately, for my case) stabilizationWindowSeconds = 1 hr - 20 mins = 40 mins (2400 secs)

Thus, the scaled-up pods now stay alive for 40 mins and start terminating after that (in my case all pods terminated within 5 mins at most). That leaves 15 mins for DigitalOcean to destroy the droplet.

CAUTION: These timings depend on my use case, environment, etc.
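
The window arithmetic can be written as a small helper. This is only a sketch of the reasoning in this answer: the function and constant names are made up, the 600-sec delete delay and the extra 600-sec margin are the values from my environment, and the 3600-sec ceiling is the maximum the HPA API accepts for stabilizationWindowSeconds.

```python
BILLING_PERIOD_SECONDS = 3600        # droplets are billed per hour
MAX_STABILIZATION_WINDOW = 3600      # upper bound accepted by the HPA API


def stabilization_window(node_delete_delay: int, pod_termination_margin: int = 0) -> int:
    """Seconds to keep scaled-up pods so the droplet is gone before the hour ends."""
    window = BILLING_PERIOD_SECONDS - node_delete_delay - pod_termination_margin
    if not 0 <= window <= MAX_STABILIZATION_WINDOW:
        raise ValueError(f"window of {window} secs is outside 0..{MAX_STABILIZATION_WINDOW}")
    return window


# Theoretical value: 1 hr minus the 10-min delete delay.
print(stabilization_window(600))        # 3000
# Practical value: another 10 mins reserved for pods to terminate.
print(stabilization_window(600, 600))   # 2400
```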

HPA behavior config, for reference:

behavior:
    scaleDown:
      stabilizationWindowSeconds: 2400  # keep scaled-up pods for 40 mins
      selectPolicy: Max
      policies:
      - type: Percent  # must be capitalized (Percent or Pods)
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0  # scale up without delay
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 1

Sargasso answered 10/6, 2021 at 14:12
