Why can a user-process invoke the Linux OOM-killer due to memory fragmentation, even though plenty of RAM is available?

I've got a headless ARM-based Linux (v3.10.53-1.1.1) system with no swap space enabled, and I occasionally see processes get killed by the OOM-killer even though there is plenty of RAM available.

Running echo 1 > /proc/sys/vm/compact_memory periodically seems to keep the OOM-killer at bay, which makes me think that memory fragmentation is the culprit. But I don't understand why a user process would ever need physically contiguous blocks in the first place: as I understand it, even in the worst case (complete fragmentation, with only individual 4K blocks available), the kernel can simply allocate the necessary number of individual 4K blocks and then use the Magic of Virtual Memory (tm) to make them look contiguous to the user process.
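
For concreteness, here is a minimal sketch of that periodic check-and-compact step (run as root): in /proc/buddyinfo, column N counts free blocks of 2^N contiguous pages, so plenty of low-order blocks but zeros at the higher orders indicates fragmentation, and /proc/sys/vm/compact_memory only exists on kernels built with CONFIG_COMPACTION.

# Show free blocks per allocation order; column N = free blocks of 2^N pages.
cat /proc/buddyinfo
# Ask the kernel to compact (defragment) free memory, then look again.
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo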

Can someone explain why the OOM-killer would be invoked in response to memory fragmentation? Is it just a buggy kernel or is there a genuine reason? (And even if the kernel did need to de-frag memory in order to satisfy a request, shouldn't it do that automatically rather than giving up and OOM'ing?)

I've pasted an example OOM-killer invocation below, in case it sheds any light on things. I can reproduce the fault at will; this particular invocation occurred while the computer still had ~120MB of RAM available (according to free), in response to my test program allocating memory in batches of 10,000 400-byte allocations.

May 28 18:51:34 g2 user.warn kernel: [ 4228.307769] cored invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
May 28 18:51:35 g2 user.warn kernel: [ 4228.315295] CPU: 2 PID: 19687 Comm: cored Tainted: G           O 3.10.53-1.1.1_ga+gf57416a #1
May 28 18:51:35 g2 user.warn kernel: [ 4228.323843] Backtrace:
May 28 18:51:35 g2 user.warn kernel: [ 4228.326340] [<c0011c54>] (dump_backtrace+0x0/0x10c) from [<c0011e68>] (show_stack+0x18/0x1c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.334804]  r6:00000000 r5:00000000 r4:c9784000 r3:00000000
May 28 18:51:35 g2 user.warn kernel: [ 4228.340566] [<c0011e50>] (show_stack+0x0/0x1c) from [<c04d0dd8>] (dump_stack+0x24/0x28)
May 28 18:51:35 g2 user.warn kernel: [ 4228.348684] [<c04d0db4>] (dump_stack+0x0/0x28) from [<c009b474>] (dump_header.isra.10+0x84/0x19c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.357616] [<c009b3f0>] (dump_header.isra.10+0x0/0x19c) from [<c009ba3c>] (oom_kill_process+0x288/0x3f4)
May 28 18:51:35 g2 user.warn kernel: [ 4228.367230] [<c009b7b4>] (oom_kill_process+0x0/0x3f4) from [<c009bf8c>] (out_of_memory+0x208/0x2cc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.376323] [<c009bd84>] (out_of_memory+0x0/0x2cc) from [<c00a0278>] (__alloc_pages_nodemask+0x8f8/0x910)
May 28 18:51:35 g2 user.warn kernel: [ 4228.385921] [<c009f980>] (__alloc_pages_nodemask+0x0/0x910) from [<c00b6c34>] (__pte_alloc+0x2c/0x158)
May 28 18:51:35 g2 user.warn kernel: [ 4228.395263] [<c00b6c08>] (__pte_alloc+0x0/0x158) from [<c00b9fe0>] (handle_mm_fault+0xd4/0xfc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.403914]  r6:c981a5d8 r5:cc421a40 r4:10400000 r3:10400000
May 28 18:51:35 g2 user.warn kernel: [ 4228.409689] [<c00b9f0c>] (handle_mm_fault+0x0/0xfc) from [<c0019a00>] (do_page_fault+0x174/0x3dc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.418575] [<c001988c>] (do_page_fault+0x0/0x3dc) from [<c0019dc0>] (do_translation_fault+0xb4/0xb8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.427824] [<c0019d0c>] (do_translation_fault+0x0/0xb8) from [<c00083ac>] (do_DataAbort+0x40/0xa0)
May 28 18:51:35 g2 user.warn kernel: [ 4228.436896]  r6:c0019d0c r5:00000805 r4:c06a33d0 r3:103ffea8
May 28 18:51:35 g2 user.warn kernel: [ 4228.442643] [<c000836c>] (do_DataAbort+0x0/0xa0) from [<c000e138>] (__dabt_usr+0x38/0x40)
May 28 18:51:35 g2 user.warn kernel: [ 4228.450850] Exception stack(0xc9785fb0 to 0xc9785ff8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.455918] 5fa0:                                     103ffea8 00000000 b6d56708 00000199
May 28 18:51:35 g2 user.warn kernel: [ 4228.464116] 5fc0: 00000001 b6d557c0 0001ffc8 b6d557f0 103ffea0 b6d55228 10400038 00000064
May 28 18:51:35 g2 user.warn kernel: [ 4228.472327] 5fe0: 0001ffc9 beb04990 00000199 b6c95d84 600f0010 ffffffff
May 28 18:51:35 g2 user.warn kernel: [ 4228.478952]  r8:103ffea0 r7:b6d557f0 r6:ffffffff r5:600f0010 r4:b6c95d84
May 28 18:51:35 g2 user.warn kernel: [ 4228.485759] Mem-info:
May 28 18:51:35 g2 user.warn kernel: [ 4228.488038] DMA per-cpu:
May 28 18:51:35 g2 user.warn kernel: [ 4228.490589] CPU    0: hi:   90, btch:  15 usd:   5
May 28 18:51:35 g2 user.warn kernel: [ 4228.495389] CPU    1: hi:   90, btch:  15 usd:  13
May 28 18:51:35 g2 user.warn kernel: [ 4228.500205] CPU    2: hi:   90, btch:  15 usd:  17
May 28 18:51:35 g2 user.warn kernel: [ 4228.505003] CPU    3: hi:   90, btch:  15 usd:  65
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] active_anon:92679 inactive_anon:47 isolated_anon:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823]  active_file:162 inactive_file:1436 isolated_file:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823]  unevictable:0 dirty:0 writeback:0 unstable:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823]  free:28999 slab_reclaimable:841 slab_unreclaimable:2103
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823]  mapped:343 shmem:89 pagetables:573 bounce:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823]  free_cma:29019
May 28 18:51:35 g2 user.warn kernel: [ 4228.541416] DMA free:115636kB min:1996kB low:2492kB high:2992kB active_anon:370716kB inactive_anon:188kB active_file:752kB inactive_file:6040kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:2
May 28 18:51:35 g2 user.warn kernel: [ 4228.583833] lowmem_reserve[]: 0 0 0 0
May 28 18:51:35 g2 user.warn kernel: [ 4228.587577] DMA: 2335*4kB (UMC) 1266*8kB (UMC) 1034*16kB (UMC) 835*32kB (UC) 444*64kB (C) 28*128kB (C) 103*256kB (C) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 121100kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.604979] 502 total pagecache pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.608649] 0 pages in swap cache
May 28 18:51:35 g2 user.warn kernel: [ 4228.611979] Swap cache stats: add 0, delete 0, find 0/0
May 28 18:51:35 g2 user.warn kernel: [ 4228.617210] Free swap  = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.620110] Total swap = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.635245] 131072 pages of RAM
May 28 18:51:35 g2 user.warn kernel: [ 4228.638394] 30575 free pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.641293] 3081 reserved pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.644437] 1708 slab pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.647239] 265328 pages shared
May 28 18:51:35 g2 user.warn kernel: [ 4228.650399] 0 pages swap cached
May 28 18:51:35 g2 user.info kernel: [ 4228.653546] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
May 28 18:51:35 g2 user.info kernel: [ 4228.661408] [  115]     0   115      761      128       5        0         -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.669347] [  237]     0   237      731       98       5        0         -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.677278] [  238]     0   238      731      100       5        0         -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.685224] [  581]     0   581     1134       78       5        0         -1000 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.693074] [  592]     0   592      662       15       4        0             0 syslogd
May 28 18:51:35 g2 user.info kernel: [ 4228.701184] [  595]     0   595      662       19       4        0             0 klogd
May 28 18:51:35 g2 user.info kernel: [ 4228.709113] [  633]     0   633     6413      212      12        0             0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.716877] [  641]     0   641      663       16       3        0             0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.724827] [  642]     0   642      663       16       5        0             0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.732770] [  646]     0   646     6413      215      12        0             0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.740540] [  650]     0   650    10791      572      10        0             0 avbd
May 28 18:51:35 g2 user.info kernel: [ 4228.748385] [  651]     0   651     9432     2365      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.756322] [  652]     0   652    52971     4547      42        0             0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.764104] [  712]     0   712    14135     2458      24        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.772053] [  746]     0   746     1380      248       6        0             0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.780251] [  779]     0   779     9419     2383      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.788187] [  780]     0   780     9350     2348      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.796127] [  781]     0   781     9349     2347      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.804074] [  782]     0   782     9353     2354      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.812012] [  783]     0   783    18807     2573      27        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.819955] [  784]     0   784    17103     3233      28        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.827882] [  785]     0   785    13990     2436      24        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.835819] [  786]     0   786     9349     2350      21        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.843764] [  807]     0   807    13255     4125      25        0             0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.851702] [ 1492]   999  1492      512       27       5        0             0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.860334] [ 1493]     0  1493      433       14       5        0             0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.868955] [ 1494]     0  1494     1380      246       7        0             0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.877163] [19170]     0 19170     1175      131       6        0             0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.885022] [19183]     0 19183      750       70       4        0             0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.892701] [19228]     0 19228      663       16       5        0             0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.900636] [19301]     0 19301     1175      131       5        0             0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.908475] [19315]     0 19315      751       69       5        0             0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.916154] [19365]     0 19365      663       16       5        0             0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.924099] [19443]     0 19443     1175      153       5        0             0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.931948] [19449]     0 19449      750       70       5        0             0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.939626] [19487]     0 19487     1175      132       5        0             0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.947467] [19500]     0 19500      750       70       3        0             0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.955148] [19540]     0 19540      662       17       5        0             0 tail
May 28 18:51:35 g2 user.info kernel: [ 4228.963002] [19687]     0 19687    63719    56396     127        0             0 cored
May 28 18:51:35 g2 user.err kernel: [ 4228.970936] Out of memory: Kill process 19687 (cored) score 428 or sacrifice child
May 28 18:51:35 g2 user.err kernel: [ 4228.978513] Killed process 19687 (cored) total-vm:254876kB, anon-rss:225560kB, file-rss:24kB

Also, here is the test program I use to stress the system and invoke the OOM-killer (a build-and-run sketch follows the listing). With the echo 1 > /proc/sys/vm/compact_memory command being run every so often, the OOM-killer appears only when free reports system RAM close to zero, as expected; without it, the OOM-killer appears well before that, when free still reports 130+MB of RAM available but after cat /proc/buddyinfo shows the RAM becoming fragmented:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // for memset()

int main(int argc, char ** argv)
{
  while(1)
  {
     printf("PRESS RETURN TO ALLOCATE BUFFERS\n");
     const int numBytes = 400;
     char buf[64]; fgets(buf, sizeof(buf), stdin);
     for (int i=0; i<10000; i++)
     {
        void * ptr = malloc(numBytes); // yes, a deliberate memory leak
        if (ptr)
        {
           memset(ptr, 'J', numBytes);  // force the virtual memory system to actually allocate the RAM, and not only the address space
        }
        else printf("malloc() failed!\n");
     }
     fprintf(stderr, "Deliberately leaked 10000*%i bytes!\n", numBytes);
  }
  return 0;
}
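
A possible way to build and drive it, while watching fragmentation from a second shell (the oomtest.c file name and compiler flags are just placeholders; -std=c99 is needed for the in-loop declaration of i):

# Build and run the deliberate leaker; each Return leaks another ~4MB (10000 * 400 bytes).
gcc -std=c99 -O2 -o oomtest oomtest.c
./oomtest
# In another shell, watch fragmentation and free memory as the leak grows.
watch -n 1 'cat /proc/buddyinfo; free'
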
Disini answered 29/5/2020 at 2:12

You are on the right track, Jeremy. The identical thing happened to me on my CentOS desktop system. I am a computer consultant, and I have worked with Linux since 1995. And I pound my Linux systems mercilessly with many file downloads and all sorts of other activities that stretch them to their limits. After my main desktop had been up for about 4 days, it got really slow (slower than 1/10 of normal speed), the OOM killer kicked in, and I was sitting there wondering why my system was acting that way. It had plenty of RAM, but the OOM killer was kicking in when it had no business doing so. So I rebooted it, and it ran fine... for about 4 days, then the problem returned. Bugged the snot out of me not knowing why.

So I put on my test engineer hat and ran all sorts of stress tests on the machine to see if I could reproduce the symptoms on purpose. After several months of this, I was able to recreate the problem at will and prove that my solution for it would work every time.

"Cache turnover" in this context is when a system has to tear down existing cache to create more cache space to support new file writes. Since the system is in a hurry to redeploy the RAM, it does not take the time to defragment the memory it is freeing. So over time, as more and more file writes occur, the cache turns over repeatedly. And the memory in which it resides keeps getting more and more fragmented. In my tests, I found that after the disk cache has turned over about 15 times, the memory becomes so fragmented that the system cannot tear down and then allocate the memory fast enough to keep the OOM killer from being triggered due to lack of free RAM in the system when a spike in memory demand occurs. Such a spike could be caused by executing something as simple as

find /dev /etc /home /opt /tmp /usr -xdev > /dev/null

On my system, that command creates a demand for about 50MB of new cache. That is what

free -mt 

shows, anyway.
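
If you want to see the spike yourself, a rough way to measure it is to compare the page cache before and after the command; this is only a sketch, and it reads the Cached figure (in kB) from /proc/meminfo:

# Rough measurement of how much page cache that find adds (values in kB).
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
find /dev /etc /home /opt /tmp /usr -xdev > /dev/null
after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "page cache grew by $(( (after - before) / 1024 )) MB"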

The solution for this problem involves expanding on what you already discovered.

/bin/echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory

(The second command requires a kernel built with CONFIG_COMPACTION=y; that is a build-time option, so there is nothing to set for it at runtime.)

And yes, I totally agree that dropping cache will force your system to re-read some data from disk. But at a rate of once per day or even once per hour, the negative effect of dropping cache is absolutely negligible compared to everything else your system is doing, no matter what that might be. The negative effect is so small that I cannot even measure it, and I made my living as a test engineer for 5+ years figuring out how to measure things like that.

If you set up a cron job to execute those once a day, that should eliminate your OOM killer problem. If you still see problems with the OOM killer after that, consider executing them more frequently. The right frequency will vary depending on how much file writing you do relative to the amount of system RAM your unit has.
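
For example, a root crontab entry along these lines would run both steps nightly (the 03:00 schedule is arbitrary; adjust it to your write load):

# Added via 'crontab -e' as root: drop caches, then compact memory, at 03:00 daily.
0 3 * * * /bin/sh -c 'echo 3 > /proc/sys/vm/drop_caches; echo 1 > /proc/sys/vm/compact_memory'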

Symbology answered 3/9/2020 at 21:24 (2 comments)
Thanks for the info. Your conclusions are the same as the conclusions I came to, although my solution (well, work-around) was based on monitoring the current fragmentation level to figure out when action is necessary, rather than running periodically from a cron job. See details here: https://mcmap.net/q/625755/-how-to-mmap-a-large-file-without-risking-the-oom-killer - Disini
We have an issue in the field where oom-reaper is killing processes. I can run sync; echo 3 > /proc/sys/vm/drop_caches and acquire ~200MB of memory when we're hovering around ~10MB free. I didn't think about fragmentation being the problem, but I was quite certain the kernel wasn't doing a good enough job before unleashing the oom-reaper. I made a before/after video (youtu.be/GBtalyt10hg) of various memory info that I can't make sense of. I might try the above cmd when I get a system back down to ~10MB to see if I can get oom-reaper to reap. - Semimonthly
