AWS/ECS CPUUtilization average vs maximum
After reading the AWS documentation, I am still not clear about the CloudWatch metric statistics Average and Maximum, specifically for the ECS CPUUtilization metric.

I have an AWS ECS cluster on Fargate, with a service that has a minimum count of 2 healthy tasks. I have enabled autoscaling using the AWS/ECS CPUUtilization metric for my ClusterName and ServiceName. A CloudWatch alarm is configured to trigger when the average CPU utilization is more than 75% over a one-minute period for 3 data points.
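As a rough sketch, the alarm described above corresponds to parameters like the following. The cluster name, service name, and alarm name are hypothetical placeholders, and the dict is only built here, not sent to AWS. Mapping "3 data points" to `EvaluationPeriods` is an assumption about the setup.

```python
# Placeholder names: the real cluster and service names are not given.
CLUSTER_NAME = "my-cluster"   # hypothetical
SERVICE_NAME = "my-service"   # hypothetical

# Keyword arguments in the shape accepted by CloudWatch's PutMetricAlarm.
alarm_params = {
    "AlarmName": "ecs-service-cpu-high",  # hypothetical alarm name
    "Namespace": "AWS/ECS",
    "MetricName": "CPUUtilization",
    "Dimensions": [
        {"Name": "ClusterName", "Value": CLUSTER_NAME},
        {"Name": "ServiceName", "Value": SERVICE_NAME},
    ],
    "Statistic": "Average",      # the statistic the alarm evaluates
    "Period": 60,                # one-minute period, in seconds
    "EvaluationPeriods": 3,      # assumed meaning of "3 data points"
    "Threshold": 75.0,           # percent
    "ComparisonOperator": "GreaterThanThreshold",
}

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
print(alarm_params["Statistic"], alarm_params["Threshold"])
```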

I also have a health check set up with a frequency of 30 seconds and a timeout of 5 minutes.

I ran a performance script to test the autoscaling behavior, but I am noticing that the service gets marked as unhealthy and new tasks get created. When I check the CPUUtilization metric, the Average statistic shows around 44% utilization, but the Maximum statistic shows more than 100% (screenshots attached).

[Screenshot: CPUUtilization, Average statistic]

[Screenshot: CPUUtilization, Maximum statistic]

So what do Average and Maximum mean here? Does Average mean the average CPU utilization of both my instances, and does Maximum show that the CPU utilization of one of my instances is more than 100%?

Aurangzeb answered 22/7, 2019 at 16:27 Comment(4)
It is the average or maximum of the samples collected during the Period selected (1 minute in the screenshots given). - Ezekielezell
@Ezekielezell thanks for replying. In my case, because there is so much difference between average and maximum, should I consider setting the autoscaler based on maximum rather than average? I see a recommendation from Amazon to set autoscaling based on average. - Aurangzeb
why the f. is this not a programming question? - Lavettelavigne
@Lavettelavigne Maybe there are still people who do not know that infrastructure as code is a thing. - Aurangzeb

Average and Maximum here measure the average CPU usage and the maximum CPU usage over each 1-minute period.
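To make the difference concrete, here is a minimal sketch that computes both statistics over one period. The sample values are made up for illustration and are not taken from the question's metrics.

```python
from statistics import mean

# Hypothetical CPU utilization samples (percent) collected within a single
# 1-minute period. One sample is a short burst above 100%.
samples = [30.0, 35.0, 28.0, 120.0, 32.0, 25.0]

average = mean(samples)  # what the Average statistic would report
maximum = max(samples)   # what the Maximum statistic would report

print(f"Average: {average:.1f}%")
print(f"Maximum: {maximum:.1f}%")
```

With these made-up numbers the average is 45.0% while the maximum is 120.0%, so a single burst can dominate Maximum while barely moving Average.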

In terms of configuring autoscaling rules, you want to use the average metric.

The Maximum statistic usually reflects short, random bursts that can be caused by things like garbage collection.

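The Average statistic, however, is the arithmetic mean of the samples in the period. It is often close to the median (p50), where half of the time the CPU usage is above that value and half of the time it is below, but the two are not the same thing. For this purpose the difference doesn't matter much.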

You most likely want to scale out on the Average statistic when your CPU reaches, say, 75-85% (keep in mind that you need to give new tasks time to warm up).
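A simplified model of such a rule, assuming "scale out when the last 3 per-minute averages all exceed the threshold", might look like this. The function name and the sample histories are hypothetical, and this is not the exact CloudWatch evaluation logic.

```python
def should_scale_out(averages, threshold=75.0, datapoints=3):
    """Return True if the last `datapoints` per-minute averages all exceed `threshold`."""
    if len(averages) < datapoints:
        return False  # not enough data yet to decide
    return all(value > threshold for value in averages[-datapoints:])


# Hypothetical per-minute Average values (percent).
sustained_load = [44.0, 60.0, 78.0, 81.0, 83.0]
single_spike = [44.0, 90.0, 50.0, 80.0]

print(should_scale_out(sustained_load))  # True: 78, 81, 83 are all above 75
print(should_scale_out(single_spike))    # False: 50 breaks the streak
```

Requiring several consecutive data points means an isolated spike does not trigger scaling, which is the same reason Average is preferred over Maximum here.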

The Maximum statistic can generally be ignored for autoscaling use cases.

Disarmament answered 16/12, 2021 at 22:3 Comment(0)
