My understanding is that, for two sibling cgroups at the same level (and only those two), e.g.:
foo
|
+- bar
|
+- baz
bar and baz first split the CPU time available to foo according to their cpu.shares.
Let's say the machine has 1 CPU core and cpu.cfs_period_us is set to 100ms (written as 100000, since the file takes microseconds). cpu.shares is set to 1024 for both bar and baz.
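To make the setup concrete, here is a minimal Python sketch that writes these values into a cgroup v1 cpu hierarchy. The helper name configure_group and the mount point in the usage comment are assumptions for illustration; the point is mainly the units: period and quota are in microseconds, cpu.shares is a unitless weight.

```python
import os

PERIOD_US = 100_000  # 100ms expressed in microseconds


def configure_group(cgroup_dir: str, shares: int, quota_us: int,
                    period_us: int = PERIOD_US) -> None:
    """Write shares, period and quota into one cgroup v1 cpu directory.

    configure_group is a hypothetical helper, not part of any library.
    """
    settings = [
        ("cpu.shares", shares),              # relative weight among siblings
        ("cpu.cfs_period_us", period_us),    # length of one enforcement period
        ("cpu.cfs_quota_us", quota_us),      # max runtime per period; -1 = no limit
    ]
    for name, value in settings:
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(str(value))


# Example (assumed cgroup v1 mount point; requires root):
#   configure_group("/sys/fs/cgroup/cpu/foo/bar", shares=1024, quota_us=75_000)
#   configure_group("/sys/fs/cgroup/cpu/foo/baz", shares=1024, quota_us=75_000)
```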
If both bar and baz set cpu.cfs_quota_us above 50ms, for example 75ms, the two cgroups share the CPU equally and each gets exactly 50ms per period.
If both of them set cpu.cfs_quota_us below 50ms, for example 25ms, they still share the CPU 1:1, but each gets exactly 25ms, and the other 50ms of the period is used by neither of them.
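The two symmetric cases above come down to simple arithmetic: split the period by cpu.shares, then clip each group at its quota. A minimal sketch, assuming times in ms and that every group always has runnable work (naive_split is a hypothetical helper):

```python
def naive_split(period_ms, shares, quotas_ms):
    """Split one period by cpu.shares, then clip each group at its quota.

    Deliberately naive: time clipped off one group is NOT handed to anyone
    else, which is fine only when all groups are clipped or none is.
    """
    total = sum(shares.values())
    return {
        name: min(period_ms * weight / total, quotas_ms[name])
        for name, weight in shares.items()
    }


shares = {"bar": 1024, "baz": 1024}
print(naive_split(100, shares, {"bar": 75, "baz": 75}))  # {'bar': 50.0, 'baz': 50.0}
print(naive_split(100, shares, {"bar": 25, "baz": 25}))  # {'bar': 25, 'baz': 25}
```

This rule says nothing about where clipped-off time goes, which is exactly what matters once the quotas differ.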
What if bar sets its quota to 25ms while baz sets it to 75ms? (This is exactly what I was wondering.) Since bar and baz have the same cpu.shares, and bar has an upper limit of 25ms, bar should never exceed 25ms, so in a 100ms period bar will consume 25ms. Will baz also consume only 25ms, like bar, because they have the same cpu.shares? If so, where does the remaining 50ms go?
According to the RHEL 6 documentation:
Because the CFS does not demand equal usage of CPU, it is hard to predict how much CPU time a cgroup will be allowed to utilize. When tasks in one cgroup are idle and are not using any CPU time, the leftover time is collected in a global pool of unused CPU cycles. Other cgroups are allowed to borrow CPU cycles from this pool.
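So it is legal for baz to consume the unused CPU cycles. Once bar has used its 25ms it is throttled for the rest of the period, leaving baz as the only runnable group, and baz keeps running until it reaches its own 75ms quota. In other words, baz consumes 75ms in each period.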
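One way to see why baz ends up with 75ms is to hand out the period one millisecond at a time. The sketch below is an idealized model, not kernel code: simulate_period is hypothetical, picking the group with the smallest used/shares ratio is only a rough stand-in for CFS's vruntime ordering, and both groups are assumed to be always busy.

```python
def simulate_period(period_ticks, shares, quotas):
    """Hand out one period tick by tick (1 tick = 1ms in this example).

    Each tick goes to the runnable group with the smallest share-weighted
    usage; a group that has used its whole quota is throttled for the rest
    of the period.
    """
    used = {name: 0 for name in shares}
    for _ in range(period_ticks):
        runnable = [n for n in shares if used[n] < quotas[n]]
        if not runnable:
            break  # every group is throttled; the CPU idles
        pick = min(runnable, key=lambda n: used[n] / shares[n])
        used[pick] += 1
    return used


print(simulate_period(100, {"bar": 1024, "baz": 1024}, {"bar": 25, "baz": 75}))
# {'bar': 25, 'baz': 75}
```

For the first 50 ticks bar and baz alternate; after that bar is throttled and baz runs alone until its own quota is used up.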
Conclusion
cpu.shares and cpu.cfs_quota_us work together.
Given the total CPU time available (and assuming every cgroup always has runnable tasks), first divide it among the cgroups according to their cpu.shares. Then find every cgroup whose allotted time exceeds its cpu.cfs_quota_us, pin each of them at its cpu.cfs_quota_us, and collect the excess into a pool of unused CPU time. Distribute that pool among the remaining cgroups, again by cpu.shares, and repeat until no cgroup exceeds its upper limit.
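The iteration above is a form of weighted max-min fairness (also called water-filling or progressive filling). A minimal Python sketch of it, assuming always-busy groups and using None for "no quota" (the equivalent of cpu.cfs_quota_us = -1); water_fill is a hypothetical name, and this models the intended outcome rather than how the kernel computes it.

```python
def water_fill(capacity, shares, quotas):
    """Weighted max-min fair split of `capacity` among always-busy groups.

    Split by shares, pin every group whose fair share exceeds its quota at
    that quota, and redistribute what is left among the other groups by
    shares, until no group is over its limit.
    """
    alloc = {}
    active = set(shares)
    remaining = capacity
    while active:
        total = sum(shares[n] for n in active)
        fair = {n: remaining * shares[n] / total for n in active}
        capped = {n for n in active
                  if quotas.get(n) is not None and fair[n] > quotas[n]}
        if not capped:
            alloc.update(fair)  # nobody is over the limit: done
            break
        for n in capped:
            alloc[n] = quotas[n]  # pin at the quota
            remaining -= quotas[n]  # excess goes back into the pool
        active -= capped
    return alloc  # any capacity not allocated is idle time


print(water_fill(100, {"bar": 1024, "baz": 1024}, {"bar": 25, "baz": 75}))
# {'bar': 25, 'baz': 75.0}
```

The real scheduler only approximates this: quota is enforced per period and per CPU, so measured usage will typically be close to, not exactly, these figures.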