The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed metrics (like CPU or Memory).
There is an official walkthrough focusing on HPA and its scaling:
Kubernetes.io: Docs: Tasks: Run application: Horizontal Pod Autoscaler Walkthrough
The algorithm that scales the number of replicas is the following:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
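For example (hypothetical numbers): with 4 current replicas, a measured average CPU utilization of 100% and a target of 75%, the formula gives:
desiredReplicas = ceil[4 * (100/75)] = ceil[5.3(3)] = 6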
An example of such autoscaling (shown as an already rendered YAML manifest) could look like the one below:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: HPA-NAME
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: DEPLOYMENT-NAME
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
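Assuming the manifest above is saved to a file (hpa.yaml here is just an example name), it could be applied and then inspected with:
$ kubectl apply -f hpa.yaml
$ kubectl get hpa HPA-NAME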
A side note!
HPA will calculate both metrics and choose the one that yields the bigger desiredReplicas!
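For example (hypothetical values): if the CPU metric alone would result in 4 replicas and the memory metric alone in 6 replicas, HPA will scale the Deployment to 6 replicas.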
Addressing a comment I wrote under the question:
I think we misunderstood each other. It's perfectly okay to "scale up when CPU >= 90" but due to the logic behind the formula I don't think it will be possible to say "scale down when CPU <= 70". According to the formula it would be something more like: scale up when CPU >= 90 and scale down when CPU <= 45.
That example could be misleading and is not 100% true in all scenarios. Take a look at the following example:
HPA set to averageUtilization of 75%.
Quick calculations with some degree of approximation (the default tolerance for HPA is 0.1):
2 replicas:
  scale-up (by 1) should happen when currentMetricValue is >= 80%:
    x = ceil[2 * (80/75)], x = ceil[2.1(3)], x = 3
  scale-down (by 1) should happen when currentMetricValue is <= 33%:
    x = ceil[2 * (33/75)], x = ceil[0.88], x = 1
8 replicas:
  scale-up (by 1) should happen when currentMetricValue is >= 76%:
    x = ceil[8 * (76/75)], x = ceil[8.10(6)], x = 9
  scale-down (by 1) should happen when currentMetricValue is <= 64%:
    x = ceil[8 * (64/75)], x = ceil[6.82(6)], x = 7
Following this example, having 8 replicas with their currentMetricValue at 55 (desiredMetricValue set to 75) should scale-down to 6 replicas.
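Plugging those numbers into the formula confirms it:
x = ceil[8 * (55/75)], x = ceil[5.8(6)], x = 6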
More information describing the decision making of HPA (for example why it doesn't scale) can be found by running:
$ kubectl describe hpa HPA-NAME
Name:                     nginx-scaler
Namespace:                default
Labels:                   <none>
Annotations:              <none>
CreationTimestamp:        Sun, 07 Mar 2021 22:48:58 +0100
Reference:                Deployment/nginx-scaling
Metrics:                  ( current / target )
  resource memory on pods (as a percentage of request):  5% (61903667200m) / 75%
  resource cpu on pods (as a percentage of request):     79% (199m) / 75%
Min replicas:             1
Max replicas:             10
Deployment pods:          5 current / 5 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                   Age                   From                       Message
  ----     ------                   ---                   ----                       -------
  Warning  FailedGetResourceMetric  4m48s (x4 over 5m3s)  horizontal-pod-autoscaler  did not receive metrics for any ready pods
  Normal   SuccessfulRescale        103s                  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
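If you want to observe the scaling as it happens, one option is to watch the HPA object (the -w flag keeps streaming updates):
$ kubectl get hpa HPA-NAME -w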
HPA scaling procedures can be further modified with the changes introduced in Kubernetes version 1.18 and newer:
Support for configurable scaling behavior
Starting from v1.18 the v2beta2 API allows scaling behavior to be configured through the HPA behavior field. Behaviors are specified separately for scaling up and down in scaleUp or scaleDown section under the behavior field. A stabilization window can be specified for both directions which prevents the flapping of the number of the replicas in the scaling target. Similarly specifying scaling policies controls the rate of change of replicas while scaling.
Kubernetes.io: Docs: Tasks: Run application: Horizontal pod autoscale: Support for configurable scaling behavior
I'd reckon you could use the newly introduced fields like behavior and stabilizationWindowSeconds to tune your workload to your specific needs.
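A minimal sketch of what that could look like when added to the spec of the manifest above (assuming Kubernetes 1.18+ and the autoscaling/v2beta2 API; the window lengths and policy values below are only illustrative placeholders, not recommendations):
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of consistently lower recommendations before scaling down
      policies:
      - type: Percent
        value: 50          # remove at most 50% of the current replicas...
        periodSeconds: 60  # ...per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately
      policies:
      - type: Pods
        value: 2           # add at most 2 Pods...
        periodSeconds: 60  # ...per minute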
I also recommend checking the EKS documentation for more references, supported metrics and examples.