What is the impact if my service exceeds 100% "Service CPU utilization"?

As per the AWS docs:

Service CPU utilization = 
(Total CPU units used by tasks in service) x 100
------------------------------------------------------
(Total CPU units specified in task definition) x (number of tasks in service)

...so is there any adverse impact of letting a service operate at 500% Service CPU Utilization, beyond the fact that my service is doing more CPU-intensive work than it is configured for?

Put another way: My overall cluster is running at 5% CPU but my service is at 500% CPU Util - does this affect underlying service performance in any manner?

Thanks

Southwester answered 30/1, 2018 at 19:48 Comment(0)

The Service CPU Utilization is a convenience metric specifying the CPU usage of your service, as in, the ECS Cluster Service. This shouldn't be confused with the actual CPU usage of the hosts that the service's tasks run on.

The CPU units that you set are defined in your task definition: you set the limit for what you want a healthy task to look like, and ECS and CloudWatch use that metric to help you keep your cluster in what you consider a "healthy" state.

AWS Service Utilization Documentation:

For example, the task definition for a service specifies a total of 512 CPU units and 1,024 MiB of memory (with the hard limit memory parameter) for all of its containers. The service has a desired count of 1 running task, the service is running on a cluster with 1 c4.large container instance (with 2,048 CPU units and 3,768 MiB of total memory), and there are no other tasks running on the cluster. Although the task specifies 512 CPU units, because it is the only running task on a container instance with 2,048 CPU units, it has the ability to use up to four times the specified amount (2,048 / 512); however, the specified memory of 1,024 MiB is a hard limit and it cannot be exceeded, so in this case, service memory utilization cannot exceed 100%.

[ ... ]

If this task is performing CPU-intensive work during a period and using all 2,048 of the available CPU units and 512 MiB of memory, then the service reports 400% CPU utilization and 50% memory utilization. If the task is idle and using 128 CPU units and 128 MiB of memory, then the service reports 25% CPU utilization and 12.5% memory utilization.
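As a rough check of the arithmetic in the quoted example, the formula from the question can be written as a small Python function. This is only a sketch: the function name and its arguments are made up for illustration and are not part of any AWS API.

```python
def service_utilization(used_by_tasks, reserved_per_task, task_count):
    """Percent utilization of a service (hypothetical helper, not an AWS API).

    used_by_tasks     -- units (CPU units or MiB) actually used by each task
    reserved_per_task -- units specified in the task definition
    task_count        -- number of tasks in the service
    """
    if reserved_per_task <= 0 or task_count <= 0:
        raise ValueError("reserved_per_task and task_count must be positive")
    return sum(used_by_tasks) * 100 / (reserved_per_task * task_count)


# One task reserving 512 CPU units and 1,024 MiB, as in the quoted example.
busy_cpu = service_utilization([2048], 512, 1)   # 400.0
busy_mem = service_utilization([512], 1024, 1)   # 50.0
idle_cpu = service_utilization([128], 512, 1)    # 25.0
idle_mem = service_utilization([128], 1024, 1)   # 12.5
```

The denominator is what the task definition reserves, not what the host can offer, which is why the CPU figure can climb past 100% whenever spare host capacity is available.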

So to directly answer your question of whether it impacts the service performance, the answer is... maybe. The service can be configured to only know about or consider some hosts in your cluster (more details). If your service reports usage of 500% based on the limits you've set, but the underlying hosts that the service has access to are healthy at the host level, then you can maybe consider your service to be "healthy".

I would, however, consider adjusting the CPU units in your task definition so that the limit better reflects what the task normally uses during off-peak periods.

Keep in mind though that while your cluster may be showing you 5% usage, it's entirely possible that your cluster has 20 hosts, 19 of which are idle, and 1 is entirely overloaded by your service (again, dependent on how you've configured your task placement constraints).
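To make that scenario concrete, here is a minimal sketch of how a cluster-wide average can hide one saturated host. It assumes 20 identical hosts with 2,048 CPU units each (the c4.large figure from the quoted docs) and that cluster utilization is total units used divided by total units registered; the helper name is hypothetical.

```python
def cluster_cpu_utilization(host_used_units, units_per_host):
    """Percent of registered CPU units in use across all hosts (illustrative helper)."""
    total_capacity = units_per_host * len(host_used_units)
    return sum(host_used_units) * 100 / total_capacity


UNITS_PER_HOST = 2048  # assumption: every host is a c4.large

# One host fully consumed by the service, 19 hosts idle.
hosts = [UNITS_PER_HOST] + [0] * 19

cluster_pct = cluster_cpu_utilization(hosts, UNITS_PER_HOST)  # 5.0
hottest_host_pct = max(hosts) * 100 / UNITS_PER_HOST          # 100.0
```

A 5% cluster figure is therefore compatible with a single host running flat out, so per-host metrics are worth checking alongside the cluster and service numbers.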

Knothole answered 30/1, 2018 at 20:22 Comment(3)
Thanks @MrDuk. Yeah, I was aware of the difference between CPU and Service Utilization, but I don't understand what the convenience factor is here. Why can't AWS simply show a usage metric that traverses Cluster >> Host >> Service? So my cluster is 10% utilized with 25 hosts, each running at 10-30%, and the services on these hosts running at some x%. - Southwester
Consider that a single ECS service can place multiple ECS tasks on a single host. The convenience factor here is being able to specify 25% limits on CPU for, say, 4 different tasks that are all co-hosted on a single box. Likewise, you can have multiple ECS services per cluster. If we consider multiple ECS services each coordinating multiple different tasks, it becomes clearer why ECS service usage is its own metric in the cluster. - Knothole
Worth adding, from the same AWS ECS docs: "The CPU utilization will only go above 100% when the CPU units are defined at the container level. If you define CPU units at the task level, the utilization will not go above the defined task-level limit." - Primogenial
