AWS ECS Task Memory Hard and Soft Limits

I'm confused about the purpose of having both hard and soft memory limits for ECS task definitions.

IIRC the soft limit is how much memory the scheduler reserves on an instance for the task to run, and the hard limit is how much memory a container can use before it is killed.

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

Is this correct?

Thanks

Twelfth answered 26/6, 2017 at 16:18 Comment(0)

If you expect to run a compute workload that is primarily memory bound instead of CPU bound then you should use only the hard limit, not the soft limit. From the docs:

You must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation. If you specify memoryReservation, then that value is subtracted from the available memory resources for the container instance on which the container is placed; otherwise, the value of memory is used.

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html

By specifying only a hard memory limit for your tasks you avoid running out of memory: ECS stops placing tasks on the instance once its memory is fully reserved, and Docker kills any container that tries to go over the hard limit.
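For illustration, a minimal sketch of a hard-limit-only container definition registered with boto3 might look like this (the family name, image, and 512 MiB figure are placeholders, not values from the question):

    import boto3

    ecs = boto3.client("ecs")

    # Hypothetical task definition: only the hard limit ("memory") is set,
    # so ECS uses that value both for placement and as the kill ceiling.
    ecs.register_task_definition(
        family="memory-bound-app",  # placeholder family name
        containerDefinitions=[
            {
                "name": "app",
                "image": "example/app:latest",  # placeholder image
                "essential": True,
                "memory": 512,  # hard limit in MiB; no memoryReservation set
            }
        ],
    )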

The soft memory limit feature is designed for CPU-bound applications where you want to reserve a small minimum of memory (the soft limit) but allow occasional bursts up to the hard limit. In this type of CPU-heavy workload you don't care much about the containers' exact memory usage, because they will run out of CPU long before they exhaust the memory of the instance, so you can place tasks based on the CPU reservation and the soft memory limit. In this setup the hard limit is just a failsafe in case something goes out of control or there is a memory leak.

So in summary you should evaluate your workload using load tests and see whether it tends to run out of CPU first or out of memory first. If you are CPU bound then you can use the soft memory limit with an optional hard limit just as a failsafe. If you are memory bound then you will need to use just the hard limit with no soft limit.
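For the CPU-bound case, the container definition would carry a small soft limit plus a larger hard limit as the failsafe. A rough sketch, reusing the register_task_definition call above (all numbers are illustrative):

    # Hypothetical CPU-bound container definition: placement is driven by the
    # CPU reservation and the small soft limit; the hard limit is a failsafe.
    cpu_bound_container = {
        "name": "worker",
        "image": "example/worker:latest",  # placeholder image
        "essential": True,
        "cpu": 512,                # CPU units reserved for placement
        "memoryReservation": 128,  # soft limit in MiB
        "memory": 1024,            # hard limit in MiB, failsafe only
    }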

Torruella answered 26/6, 2017 at 16:55 Comment(2)
I disagree that memory-bound tasks don't need a soft limit. A reasonable soft limit helps ECS avoid placing the task on instances with insufficient memory in the first place. Tutt
@Tutt Task placement takes both hard and soft limits into account. Torruella

@nathanpeck is the authority here, but I just wanted to address a specific scenario that you brought up:

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

This post from AWS explains what occurs in such a scenario:

If containers try to consume memory between these two values (or between the soft limit and the host capacity if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel’s OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it’s the Linux kernel reacting to memory pressure. If something is above its soft limit, it’s more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again the new memory feature we announced can come to rescue here. While the OOM behavior isn’t changing, now containers can be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).
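If you want to try the swap behaviour mentioned at the end of that quote, containers on the EC2 launch type can be granted swap through the linuxParameters block of the container definition. A sketch (the values are illustrative, and the underlying container instance must itself have swap space configured):

    # Hypothetical container-definition fragment enabling swap. Requires the
    # EC2 launch type and swap space on the container instance itself.
    swap_enabled_container = {
        "name": "app",
        "image": "example/app:latest",  # placeholder image
        "essential": True,
        "memoryReservation": 256,  # soft limit in MiB
        "memory": 512,             # hard limit in MiB
        "linuxParameters": {
            "maxSwap": 512,    # MiB of swap the container may use
            "swappiness": 60,  # 0-100: how aggressively pages are swapped
        },
    }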

Wightman answered 23/5, 2022 at 18:40 Comment(0)

These are all the options available to you and what happens when you pick each of them (a short sketch of all three follows the list):

1. If you only set a soft limit, it represents the reservation, and the ceiling is the container instance's total memory.

2. If you set both the soft limit and the hard limit, the soft limit represents the reservation, and the ceiling is the hard limit you set.

3. If you only set the hard limit, it represents both the reservation and the ceiling.
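As container-definition fragments, the three options look roughly like this (the MiB values are arbitrary examples):

    only_soft     = {"memoryReservation": 256}                 # ceiling = instance memory
    soft_and_hard = {"memoryReservation": 256, "memory": 512}  # ceiling = 512 MiB
    only_hard     = {"memory": 512}                            # reservation and ceiling = 512 MiB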

View the memory allocations of a container instance

  1. Open the Amazon ECS console.
  2. In the navigation pane, choose Clusters.
  3. Choose the cluster that you created.
  4. Choose the ECS Instances view, then choose the container instance included with the cluster from the Container Instance column.
Note that in the Details pane you can compare the memory in the Available column with that in the Registered column.
  5. For statistics on the resource usage of the instance, connect to the instance using SSH and run the docker stats command (for a programmatic equivalent, see the sketch after these steps).
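A rough programmatic equivalent of those console steps, using boto3 (the cluster name is a placeholder, and it assumes the cluster has at least one registered instance):

    import boto3

    ecs = boto3.client("ecs")
    cluster = "my-cluster"  # placeholder cluster name

    # Compare each instance's registered memory with what is still
    # available for task placement.
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    for ci in ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )["containerInstances"]:
        registered = next(r for r in ci["registeredResources"] if r["name"] == "MEMORY")
        remaining = next(r for r in ci["remainingResources"] if r["name"] == "MEMORY")
        print(ci["ec2InstanceId"], registered["integerValue"], remaining["integerValue"])
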
Inexpugnable answered 22/6, 2023 at 10:16 Comment(0)

Very Tricky Question :)

For example, say you have an ECS container instance (EC2) with 4 GB of memory and you assign each task a 1 GB soft limit and a 2 GB hard limit. The scheduler will then place a maximum of 4 such tasks on that instance.

However, if 3 of those tasks are each consuming 1 GB of memory and the 4th task uses 1.5 GB, you have a total of 4.5 GB of memory usage, right?

The problem is that you only have 4 GB, so how is that possible?

Obviously this is a problem scenario; even though it is possible, you do not want it to happen, because it can break the instance.

The reason AWS allows this setup is to let you make full use of your memory: if some tasks have very low usage, the remaining tasks are allowed to go beyond their soft limit and use the free memory, which is an efficient setup, right?

However, as a result, you create another problem: the potential to exceed the instance's maximum memory, as brought up by maambmb.

So what do you think is the solution?

Simple: don't let your instance exceed its maximum memory. Use Auto Scaling and CloudWatch alarm triggers to scale out when a memory threshold is breached.
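As one possible sketch of that, a CloudWatch alarm on the cluster-level MemoryReservation metric can trigger a scale-out action (the cluster name, threshold, and scaling-policy ARN below are placeholders):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Hypothetical alarm: scale out when more than 80% of the cluster's
    # registered memory has been reserved by running tasks.
    cloudwatch.put_metric_alarm(
        AlarmName="ecs-memory-reservation-high",  # placeholder alarm name
        Namespace="AWS/ECS",
        MetricName="MemoryReservation",           # percent of cluster memory reserved
        Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=3,
        Threshold=80,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:autoscaling:..."],  # placeholder scaling-policy ARN
    )

(ECS capacity providers with managed scaling can handle this kind of scale-out for you as well.)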

Hope this helps. :)

Dovev answered 26/9, 2023 at 20:12 Comment(0)
