What is the difference between “container_memory_working_set_bytes” and “container_memory_rss” metric on the container

I need to monitor the memory usage of my containers running on a Kubernetes cluster. After reading some articles, I found two recommended metrics: container_memory_rss and container_memory_working_set_bytes.

The definitions of both metrics (from the cAdvisor code) are:

  • container_memory_rss : The amount of anonymous and swap cache memory
  • container_memory_working_set_bytes: The amount of working set memory, this includes recently accessed memory, dirty memory, and kernel memory

I think both metrics represent the number of bytes of physical memory that a process uses, but the two values differ on my Grafana dashboard.

My questions are:

  • What is the difference between the two metrics?
  • Which metric is more appropriate for monitoring memory usage? Some posts say to monitor both, because when either of them reaches the limit, the container is OOM killed.
Millepore asked 23/12, 2020 at 17:36 Comment(1)
I believe this one addresses your question. github.com/google/cadvisor/issues/2582#issuecomment-644883028Brewton

You are right. I will try to address your questions in more detail.

What is the difference between the two metrics?

container_memory_rss equals the value of total_rss from the /sys/fs/cgroup/memory/memory.stat file:

// The amount of anonymous and swap cache memory (includes transparent
// hugepages).
// Units: Bytes.
RSS uint64 `json:"rss"`

The total amount of anonymous and swap cache memory (it includes transparent hugepages); it equals the value of total_rss from the memory.stat file. This should not be confused with the true resident set size or the amount of physical memory used by the cgroup. rss + file_mapped will give you the resident set size of the cgroup. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
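For illustration, here is a minimal Go sketch (not cAdvisor's actual code) that reads total_rss from a cgroup v1 memory.stat file; the /sys/fs/cgroup/memory mount point is the common default and an assumption here:

// Read total_rss (what cAdvisor exports as container_memory_rss) from a
// cgroup v1 memory.stat file. The path is an assumed default mount point.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

// readMemoryStat parses the flat "key value" lines of memory.stat into a map.
func readMemoryStat(path string) (map[string]uint64, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    stats := make(map[string]uint64)
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        if len(fields) != 2 {
            continue
        }
        if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
            stats[fields[0]] = v
        }
    }
    return stats, scanner.Err()
}

func main() {
    stats, err := readMemoryStat("/sys/fs/cgroup/memory/memory.stat")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("container_memory_rss =", stats["total_rss"])
}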


container_memory_working_set_bytes (as already mentioned by Olesya) is the total usage minus the inactive file cache (usage - total_inactive_file). It is an estimate of how much memory cannot be evicted:

// The amount of working set memory, this includes recently accessed memory,
// dirty memory, and kernel memory. Working set is <= "usage".
// Units: Bytes.
WorkingSet uint64 `json:"working_set"`

Working Set is the current size, in bytes, of the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process.
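A minimal sketch of that usage - inactive_file arithmetic under cgroup v1 follows; cAdvisor itself obtains these values through libcontainer, and the file paths below are assumed defaults:

// Compute container_memory_working_set_bytes as described above:
// usage - total_inactive_file, floored at zero. The cgroup v1 paths are
// assumed default mount points.
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// readUint reads a single integer from a file such as memory.usage_in_bytes.
func readUint(path string) (uint64, error) {
    b, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

// statValue extracts one named counter (e.g. total_inactive_file) from memory.stat.
func statValue(path, key string) (uint64, error) {
    b, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(b), "\n") {
        fields := strings.Fields(line)
        if len(fields) == 2 && fields[0] == key {
            return strconv.ParseUint(fields[1], 10, 64)
        }
    }
    return 0, fmt.Errorf("%s not found in %s", key, path)
}

func main() {
    usage, err := readUint("/sys/fs/cgroup/memory/memory.usage_in_bytes")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    inactiveFile, err := statValue("/sys/fs/cgroup/memory/memory.stat", "total_inactive_file")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }

    // working_set = usage - inactive_file (never below zero).
    var workingSet uint64
    if usage > inactiveFile {
        workingSet = usage - inactiveFile
    }
    fmt.Println("container_memory_working_set_bytes =", workingSet)
}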


Which metric is more appropriate for monitoring memory usage? Some posts say to monitor both, because when either of them reaches the limit, the container is OOM killed.

If you are limiting the resource usage of your pods, then you should monitor both, as they will cause an OOM kill if they reach the configured resource limit.

I also recommend this article, which shows an example explaining the assertion below:

You might think that memory utilization is easily tracked with container_memory_usage_bytes, however, this metric also includes cached (think filesystem cache) items that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes as this is what the OOM killer is watching for.
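As a rough, hedged illustration of acting on that advice, the sketch below queries a Prometheus server with the Go client for containers whose working set is near their limit; the server address, the 0.9 threshold, and the use of container_spec_memory_limit_bytes are assumptions of mine, not something taken from the article:

// Query Prometheus for containers whose working set is above 90% of their
// configured memory limit. Address, threshold, and label names are assumptions.
package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "github.com/prometheus/client_golang/api"
    v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
    // The address is an assumption; point it at your Prometheus server.
    client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    promAPI := v1.NewAPI(client)

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Containers without a limit report container_spec_memory_limit_bytes == 0
    // and are filtered out by the "> 0" selector.
    query := `container_memory_working_set_bytes{container!=""}
              / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9`

    result, warnings, err := promAPI.Query(ctx, query, time.Now())
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    if len(warnings) > 0 {
        fmt.Fprintln(os.Stderr, "warnings:", warnings)
    }
    fmt.Println(result)
}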


Pontifical answered 24/3, 2021 at 10:10 Comment(4)
I posted another question related to this. Could you also review it?Millepore
I just did. Feel free to take a look.Knotgrass
I have a similar question. I feel the answer does not clearly explain the relation between container_memory_rss and container_memory_working_set, since in my case working_set is 10 times larger than container_memory_rss. Also, the OP's question has a comment with a GitHub link stating that working_set does not include inactive_file (so no swap cache?), which again does not explain why rss is so low compared to working_set, since rss includes swap cache as well.Moue
#68290144Moue

Actually this is not an answer, but an enhancement to the accepted answer. The following comments apply to cgroup v1 and might not be applicable to cgroup v2.

container_memory_working_set_bytes (as already mentioned by Olesya) is the total usage minus the inactive file cache (usage - total_inactive_file). It is an estimate of how much memory cannot be evicted:

The first sentence is true, but the "cannot be evicted" part is not: container_memory_working_set_bytes includes the value reported by total_active_file, which can be evicted by any of the following (see the sketch after this list):

  1. the system automatically reclaiming memory when available memory is low
  2. echo 1/2/3 > /proc/sys/vm/drop_caches, as mentioned by this issue; see this link for the meaning of the values 1/2/3
  3. echo 0 > memory.force_empty, as mentioned by the cgroup documentation, section 5.1
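To see how much evictable page cache a container currently holds, the two file-cache counters can be read directly; a minimal cgroup v1 sketch, with the default mount path assumed:

// Print the active and inactive file-cache counters of the cgroup. The active
// part (total_active_file) is counted inside container_memory_working_set_bytes
// even though the kernel can reclaim it under memory pressure. The path is an
// assumed default.
package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    data, err := os.ReadFile("/sys/fs/cgroup/memory/memory.stat")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    for _, line := range strings.Split(string(data), "\n") {
        fields := strings.Fields(line)
        if len(fields) == 2 && (fields[0] == "total_active_file" || fields[0] == "total_inactive_file") {
            fmt.Printf("%s = %s bytes\n", fields[0], fields[1])
        }
    }
}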

As a result, the following conclusion might also not be true:

If you are limiting the resource usage of your pods, then you should monitor both, as they will cause an OOM kill if they reach the configured resource limit.

container_memory_working_set_bytes reaching the limit may not actually lead to an OOM kill; at least, it did not in our environment. We observed that total_active_file kept increasing, and thus container_memory_working_set_bytes kept increasing. After container_memory_working_set_bytes reached the limit, total_active_file dropped to a lower value due to memory reclaiming, and container_memory_working_set_bytes dropped with it; the pod kept running the whole time without being killed.

There are already two issues (this and this) about the container_memory_working_set_bytes metric; however, neither of them has been resolved. In our environment, we now monitor container_memory_rss instead of container_memory_working_set_bytes because of the false alerts described above.

Catchweight answered 25/6, 2023 at 12:23 Comment(0)
