Memory usage discrepancy: cgroup memory.usage_in_bytes vs. RSS inside docker container

"Kubernetes" (v1.10.2) says that my pod (which contains one container) is using about 5GB memory. Inside the container, RSS is saying more like 681MiB. Can anypony explain how to get from 681MiB to 5GB with the following data (or describe how to make up the difference with another command I've omitted, either from the container or from the docker host that is running this container in kubernetes)?

kubectl top pods says 5GB:

% kubectl top pods -l app=myapp
NAME                             CPU(cores)   MEMORY(bytes)
myapp-56b947bf6d-2lcr7           39m          5039Mi

cAdvisor reports a similar number (might have been from a slightly different time, so please ignore small differences):

container_memory_usage_bytes{pod_name=~".*myapp.*"}      5309456384

5309456384 / 1024.0 / 1024 ~= 5063 ~= 5039

Inside the container, this file appears to be where cadvisor is getting its data:

% kubectl exec -it myapp-56b947bf6d-2lcr7 bash
meme@myapp-56b947bf6d-2lcr7:/app# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
5309456384

The resident set size (RSS) inside the container does NOT match up (less than 1GB):

meme@myapp-56b947bf6d-2lcr7:/app# kb=$(ps aux | grep -v grep | grep -v 'ps aux' | grep -v bash | grep -v awk | grep -v RSS | awk '{print $6}' | awk '{s+=$1} END {printf "%.0f", s}'); mb=$(expr $kb / 1024); printf "Kb: $kb\nMb: $mb\n"
Kb: 698076
Mb: 681
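
For reference, a shorter way to get the same per-process total, sketched on the assumption that the container's ps is procps and supports -e -o; it also counts the ps and shell processes that the grep chain above filters out, which only shifts the total by a few MiB:

ps -e -o rss= | awk '{s+=$1} END {printf "Kb: %d\nMb: %d\n", s, s/1024}'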

Full ps aux in case that is helpful:

meme@myapp-56b947bf6d-2lcr7:/app# ps aux | grep -v grep | grep -v 'ps aux' | grep -v bash | grep -v awk
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
meme         1  0.0  0.0 151840 10984 ?        Ss   Jun04   0:29 /usr/sbin/apache2 -D FOREGROUND
www-data    10  0.0  0.0 147340  4652 ?        S    Jun04   0:00 /usr/sbin/apache2 -D FOREGROUND
www-data    11  0.0  0.0 148556  4392 ?        S    Jun04   0:16 /usr/sbin/apache2 -D FOREGROUND
www-data    12  0.2  0.0 2080632 11348 ?       Sl   Jun04  31:58 /usr/sbin/apache2 -D FOREGROUND
www-data    13  0.1  0.0 2080384 10980 ?       Sl   Jun04  18:12 /usr/sbin/apache2 -D FOREGROUND
www-data    68  0.3  0.0 349048 94272 ?        Sl   Jun04  47:09 hotapp
www-data   176  0.2  0.0 349624 92888 ?        Sl   Jun04  43:11 hotapp
www-data   179  0.2  0.0 349196 94456 ?        Sl   Jun04  42:20 hotapp
www-data   180  0.3  0.0 349828 95112 ?        Sl   Jun04  44:14 hotapp
www-data   185  0.3  0.0 346644 91948 ?        Sl   Jun04  43:49 hotapp
www-data   186  0.3  0.0 346208 91568 ?        Sl   Jun04  44:27 hotapp
www-data   189  0.2  0.0 350208 95476 ?        Sl   Jun04  41:47 hotapp

Memory section from docker's container stats API:

curl --unix-socket /var/run/docker.sock 'http:/v1.24/containers/a45fc651e7b12f527b677e6a46e2902786bee6620484922016a135e317a42b4e/stats?stream=false' | jq . # yields:

"memory_stats": {
  "usage": 5327712256,
  "max_usage": 5368344576,
  "stats": {
    "active_anon": 609095680,
    "active_file": 74457088,
    "cache": 109944832,
    "dirty": 28672,
    "hierarchical_memory_limit": 5368709120,
    "inactive_anon": 1687552,
    "inactive_file": 29974528,
    "mapped_file": 1675264,
    "pgfault": 295316278,
    "pgmajfault": 77,
    "pgpgin": 85138921,
    "pgpgout": 84964308,
    "rss": 605270016,
    "rss_huge": 0,
    "shmem": 5513216,
    "total_active_anon": 609095680,
    "total_active_file": 74457088,
    "total_cache": 109944832,
    "total_dirty": 28672,
    "total_inactive_anon": 1687552,
    "total_inactive_file": 29974528,
    "total_mapped_file": 1675264,
    "total_pgfault": 295316278,
    "total_pgmajfault": 77,
    "total_pgpgin": 85138921,
    "total_pgpgout": 84964308,
    "total_rss": 605270016,
    "total_rss_huge": 0,
    "total_shmem": 5513216,
    "total_unevictable": 0,
    "total_writeback": 0,
    "unevictable": 0,
    "writeback": 0
  },
  "limit": 5368709120
},

A comment on https://github.com/google/cadvisor/issues/638 asserts:

Total (memory.usage_in_bytes) = rss + cache
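
Plugging in the numbers captured in this question makes the gap explicit: rss + cache from memory.stat is nowhere near usage_in_bytes.

# rss + cache from the memory.stat dump below vs. memory.usage_in_bytes above
echo $(( (607436800 + 119492608) / 1024 / 1024 ))   # ~= 693 MiB
echo $(( 5309456384 / 1024 / 1024 ))                # ~= 5063 MiB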

https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt says:

usage_in_bytes: For efficiency, as other kernel components, memory cgroup uses some optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2).
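
A minimal sketch of that suggestion, run inside the container (cgroup v1 paths as above; the swap line only appears in memory.stat when swap accounting is enabled):

awk '$1=="rss" || $1=="cache" || $1=="swap" {s+=$2} END {printf "%d bytes ~= %d MiB\n", s, s/1048576}' /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # the "fuzz" counter, for comparison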

https://docs.docker.com/engine/reference/commandline/stats/#parent-command says:

Note: On Linux, the Docker CLI reports memory usage by subtracting page cache usage from the total memory usage. The API does not perform such a calculation but rather provides the total memory usage and the amount from the page cache so that clients can use the data as needed.
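
That CLI-style calculation can be reproduced from the stats API response above; this is a sketch assuming jq is available on the docker host (as far as I know, newer Docker releases subtract inactive file pages rather than the whole page cache, but the idea is the same):

curl -s --unix-socket /var/run/docker.sock 'http:/v1.24/containers/a45fc651e7b12f527b677e6a46e2902786bee6620484922016a135e317a42b4e/stats?stream=false' | jq '.memory_stats.usage - .memory_stats.stats.cache'
# 5327712256 - 109944832 = 5217767424, roughly 4976 MiB, still far above the in-container RSS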

And indeed, most of the stuff in /sys/fs/cgroup/memory/memory.stat in the container shows up in the above docker stats API response (slight differences are from taking the samples at a different time, sorry):

meme@myapp-56b947bf6d-2lcr7:/app# cat /sys/fs/cgroup/memory/memory.stat
cache 119492608
rss 607436800
rss_huge 0
shmem 5525504
mapped_file 1675264
dirty 69632
writeback 0
pgpgin 85573974
pgpgout 85396501
pgfault 296366011
pgmajfault 80
inactive_anon 1687552
active_anon 611213312
inactive_file 32800768
active_file 81166336
unevictable 0
hierarchical_memory_limit 5368709120
total_cache 119492608
total_rss 607436800
total_rss_huge 0
total_shmem 5525504
total_mapped_file 1675264
total_dirty 69632
total_writeback 0
total_pgpgin 85573974
total_pgpgout 85396501
total_pgfault 296366011
total_pgmajfault 80
total_inactive_anon 1687552
total_active_anon 611213312
total_inactive_file 32800768
total_active_file 81166336
total_unevictable 0

Memory info from kubectl describe pod <pod>:

Limits:
  memory:  5Gi
Requests:
  memory:   4Gi

Here's what pmap says inside the container. In this one-liner, I get all process IDs, run pmap -x on them, and pull the Kbytes column from the pmap results. The total comes to 256820 Kbytes, roughly 250 MiB (much less than ps's RSS total, partly, I think, because many of the processes return no output from pmap -x):

ps aux | awk '{print $2}' | grep -v PID | xargs sudo pmap -x | grep total | grep -v grep | awk '{print $3}' | awk '{s+=$1} END {printf "%.0f", s}'; echo
256820

ps_mem.py is mentioned at https://mcmap.net/q/53462/-how-can-i-measure-the-actual-memory-usage-of-an-application-or-process. It inspects /proc/$pid/statm and /proc/$pid/smaps. No illumination here (again, it seems to be ignoring some processes):

# python ps_mem.py
Private  +   Shared  =  RAM used    Program

  1.7 MiB +   1.0 MiB =   2.7 MiB   apache2
  2.0 MiB +   1.0 MiB =   3.0 MiB   bash (3)
---------------------------------
                          5.7 MiB
=================================
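
Along the same lines, a hedged cross-check is to sum the proportional set size (Pss) straight from each process's smaps, which avoids double-counting shared pages; it needs enough privileges to read other users' /proc/<pid>/smaps files:

grep -s '^Pss:' /proc/[0-9]*/smaps | awk '{s+=$2} END {printf "%d kB ~= %d MiB\n", s, s/1024}'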

There is another question similar to this one (but with less information) at "Incorrect reporting of container memory usage by cadvisor". Thanks!

Paroicous answered 14/6, 2018 at 20:52 Comment(9)
$ bytes=$(ps aux | grep -v grep | grep -v 'ps aux' | grep -v bash | grep -v awk | grep -v RSS | awk '{s+=$1} END {printf "%.0f", s}'); megabytes=$(expr $bytes / 1024); printf "bytes: $bytes\nmibibytes: $megabytes\n" - The command you've posted sums the first column, which is the username, but RSS is $6. Are you sure you've calculated the correct values?Roundworm
You're right, thanks, corrected! I typo'd the command when posting it into the question. The numbers in the output of the command were based on column 6 and are correct.Paroicous
Can you show us the output of kubectl describe pod <pod>? Especially any memory settings that might appear.Corvette
Added towards the end of the question!Paroicous
Maybe this one helps: #131803Auten
Which Kubernetes version are you using? I just tried to reproduce this but it works as expected for me with 1.10.Obscene
I'm using Kubernetes v1.10.2. It doesn't happen with all of our apps, but it's not obvious why certain apps demonstrate this symptom and others do not.Paroicous
Facing the same issue; have you figured it out?Gaffrigged
No. Let me know. :DParoicous

One thing I didn't see you check here is kernel memory. This is also accounted for in the memory.usage_in_bytes figure, but doesn't appear in memory.stat. You can find that by looking at /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes.
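
A quick way to check, as a sketch that assumes cgroup v1 and that kernel memory accounting is enabled (otherwise memory.kmem.usage_in_bytes may be missing or read as zero):

cd /sys/fs/cgroup/memory
usage=$(cat memory.usage_in_bytes)
kmem=$(cat memory.kmem.usage_in_bytes)
rss_cache=$(awk '$1=="rss" || $1=="cache" {s+=$2} END {print s}' memory.stat)
echo "kmem: $(( kmem / 1048576 )) MiB, still unaccounted: $(( (usage - rss_cache - kmem) / 1048576 )) MiB"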

I saw a similar thing happen once for one of our .NET Core applications, and I couldn't figure out exactly what was going on (perhaps a memory leak in .NET Core, since it's unmanaged memory our app doesn't control).

Perhaps it's another breadcrumb for you. It would depend on your application whether that use was normal or not, but in terms of cgroups I believe kernel memory use is unconstrained by default.

Skiascope answered 18/10, 2018 at 9:2 Comment(1)
To know more about this, I suggest reading this great article: medium.com/@bobzsj87/demist-the-memory-ghost-d6b7cf45dd2aYonyona

I don't know if you have already found your answer, but let me give you some information that may help.

  • cAdvisor extracts many memory-related metrics. We will focus on:

    1. container_memory_usage_bytes = the value in the /sys/fs/cgroup/memory/memory.usage_in_bytes file (total usage of the memory cgroup)

    2. container_memory_working_set_bytes = container_memory_usage_bytes - total_inactive_file (from /sys/fs/cgroup/memory/memory.stat); this is calculated by cAdvisor and is <= container_memory_usage_bytes

    3. container_memory_rss = total_rss value from /sys/fs/cgroup/memory/memory.stat

  • Now that you know how those metrics are gathered, note that when you use the kubectl top pods command, you get the value of the container_memory_working_set_bytes metric, not container_memory_usage_bytes.

    so from your values:

    5039Mi (working set, from the kubectl command) ~= 5064 (from the memory.usage_in_bytes file) - 28 (total_inactive_file, from the memory section of docker's container stats API); this is reproduced in the sketch after this list.

  • It is also worth mentioning that when the value of container_memory_usage_bytes reaches the limit, your pod will NOT get OOM-killed. BUT if container_memory_working_set_bytes or container_memory_rss reaches the limit, the pod will be killed.
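
A minimal sketch of that working-set calculation against the same cgroup v1 files used throughout this question (the result should land close to what kubectl top pods reports):

usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
inactive_file=$(awk '$1=="total_inactive_file" {print $2}' /sys/fs/cgroup/memory/memory.stat)
echo $(( (usage - inactive_file) / 1048576 )) MiB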

Incandesce answered 24/7, 2020 at 21:2 Comment(3)
Given working_set_bytes = memory_usage_in_bytes - total_inactive_file, total_rss should already be part of memory_usage_in_bytes anyway, as far as I understand it. So working_set_bytes should be the only metric to care about regarding OOM kills? Also described here: faun.pub/…Sining
What happens when container_memory_usage_bytes reaches the limit? Performance impact, or nothing?Michellemichels
The conclusion "if container_memory_working_set_bytes or container_memory_rss reaches the limit, the pod will be killed" is incorrect: container_memory_working_set_bytes includes total_active_file, and when it reaches the limit, total_active_file will be reclaimed and the pod will NOT be affected. See github.com/kubernetes/kubernetes/issues/43916 and github.com/kubernetes/kubernetes/issues/104533.Moray
