Kubernetes HPA is flapping replicas regardless of the stabilization window
According to the Kubernetes documentation, the stabilizationWindowSeconds property can be used to avoid flapping of replicas:

The stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling.

When the metrics indicate that the target should be scaled down the algorithm looks into previously computed desired states and uses the highest value from the specified interval.
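
As I read it, the algorithm effectively keeps a history of computed recommendations and, on scale-down, picks the maximum over the window. A rough sketch of that logic (my own illustration, not the actual HPA controller code):

// Illustrative sketch (not the real controller): applying a scale-down
// stabilization window to a series of computed desired-replica values.
package main

import (
	"fmt"
	"time"
)

type recommendation struct {
	at       time.Time
	replicas int
}

// stabilizedScaleDown returns the replica count to use when scaling down:
// the maximum of all recommendations computed within the window.
func stabilizedScaleDown(history []recommendation, now time.Time, window time.Duration) int {
	highest := 0
	for _, r := range history {
		if now.Sub(r.at) <= window && r.replicas > highest {
			highest = r.replicas
		}
	}
	return highest
}

func main() {
	now := time.Now()
	history := []recommendation{
		{now.Add(-25 * time.Minute), 7}, // still inside a 30-minute window
		{now.Add(-10 * time.Minute), 6},
		{now.Add(-1 * time.Minute), 6},
	}
	// With stabilizationWindowSeconds: 1800, the 7 computed 25 minutes ago
	// should win, so no scale-down to 6 happens yet.
	fmt.Println(stabilizedScaleDown(history, now, 30*time.Minute)) // prints 7
}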

From what I understand from the documentation, with the following HPA configuration:

  horizontalPodAutoscaler:
    enabled: true
    minReplicas: 2
    maxReplicas: 14
    targetCPUUtilizationPercentage: 70
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 1800
        policies:
          - type: Pods
            value: 1
            periodSeconds: 300
      scaleUp:
        stabilizationWindowSeconds: 60
        policies:
          - type: Pods
            value: 2
            periodSeconds: 60
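
For completeness: the snippet above is a fragment of Helm values; the full HPA object it should render to would look roughly like this (assuming the autoscaling/v2beta2 API, which is the version that supports the behavior field on 1.20; the metadata name here is illustrative):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 14
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 1800
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60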

Scaling down of my deployment (let's say from 7 pods to 6) shouldn't happen if, at any time during the last 1800 seconds (30 minutes), the HPA calculated a desired replica count of 7 pods. But I'm still observing flapping of replicas in the deployment.

[Screenshot: Deployment Replicas]

What did I misunderstand in the documentation, and how can I avoid the continuous scaling up/down by 1 pod?

Kubernetes v1.20

HPA description:

CreationTimestamp:                                     Thu, 14 Oct 2021 12:14:37 +0200
Reference:                                             Deployment/my-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  64% (1621m) / 70%
Min replicas:                                          2
Max replicas:                                          14
Behavior:
  Scale Up:
    Stabilization Window: 60 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 60 seconds
  Scale Down:
    Stabilization Window: 1800 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 1  Period: 300 seconds
Deployment pods:    3 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
Theseus answered 29/10, 2021 at 12:33 Comment(7)
Which version of Kubernetes did you use, and what exactly did you deploy inside your cluster? – Winnick
@MikołajGłodziak K8s version is 1.20; the deployment is a Spring Boot application that serves a REST API. – Theseus
Could you share your deployment and HPA configuration YAML files? Could you paste the output of the kubectl describe hpa command? Could you check that all pods from this deployment are in the Running state and that none are stuck somehow (for example in the Terminating state)? – Samuelson
Did you try to run your deployment on a newer Kubernetes version? v1.20 has only one more month of support. – Samuelson
@MikolajS. I've added the HPA description to the question. The flapping of replicas doesn't always happen, so it's hard to catch the state right before scaling. I don't see terminating pods and there are no errors in the logs, so I believe it is caused by autoscaling; there were no pod restarts before the HPA was enabled. I didn't try a newer version of K8s; the version might be the reason. – Theseus
If possible, could you please share sample YAML files used for the deployment and HPA (even with a different container image; I just want to try to replicate this issue in my local environment)? Which apiVersion are you using in the HPA definition? There are some differences between them; some fields are replaced with different ones. – Samuelson
There was a bug in that version; I added an answer. Upgrading K8s helped. – Theseus
There is a bug in the Kubernetes HPA in v1.20 (check the issue). Upgrading to v1.21 fixed the problem: the deployment has been scaling without flapping since the upgrade.

The picture below shows scaling of the deployment over 2 days:

[Screenshot: Deployment scale over 2 days]
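
For anyone hitting the same thing, these are standard kubectl commands one can use to confirm the control-plane version and watch the HPA behaviour after an upgrade (the only assumption is that the HPA is named after the deployment from the question):

kubectl version --short                # shows client and server (control-plane) versions
kubectl get hpa my-deployment --watch  # watch replica recommendations change over time
kubectl describe hpa my-deployment     # shows behavior settings, conditions and recent events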

Theseus answered 4/1, 2022 at 7:34 Comment(0)
