Return code when OOM killer kills a process

I am running a multiprogrammed workload (based on SPEC CPU2006 benchmarks) on a POWER7 system using SUSE SLES 11.

Sometimes, each application in the workload consumes a significant amount of memory and the total memory footprint exceeds the available memory installed in the system (32 GB).

I disabled swap because otherwise the measurements could be heavily affected for the processes that use it. I know that, as a result, the kernel may kill some of the processes through the OOM killer. That is totally fine. The problem is that I would expect a process killed by the kernel to exit with an error condition (e.g., the process was terminated by a signal).

I have a framework that launches all the processes and then waits for them using

waitpid(pid, &status, 0);

Even if a process is killed by the OOM killer (I know this because I get a message on the screen and in /var/log/messages), the call

WIFEXITED(status);

returns one, and the call

WEXITSTATUS(status);

returns zero. Therefore, I am not able to distinguish when a process finishes correctly and when it is killed by the OOM killer.

Am I doing anything wrong? Do you know of any way to detect when a process has been killed by the OOM killer?

I found this post asking pretty much the same question. However, since it is an old post and answers were not satisfactory, I decided to post a new question.

Jesusa answered 24/8, 2011 at 19:13 Comment(1)
What about WIFSIGNALED()? Is it false? - Coffer

The Linux OOM killer works by sending SIGKILL. If your process is killed by the OOM killer, it is suspicious that WIFEXITED returns 1.

From TLPI (The Linux Programming Interface):

To kill the selected process, the OOM killer delivers a SIGKILL signal.

So you should be able to test this using:

if (WIFSIGNALED(status)) {
    if (WTERMSIG(status) == SIGKILL)
        printf("Killed by SIGKILL\n");
}
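
To check how waitpid reports a SIGKILL on a given system, the signal can be sent by hand instead of waiting for the OOM killer. The following is a minimal sketch under that assumption; the helper names status_after_signal and classify_status are hypothetical and are not part of the asker's framework.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that only sleeps, send it `sig`,
 * and return the raw status reported by waitpid() (-1 on error). */
int status_after_signal(int sig)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: block until a signal arrives */
    }
    if (kill(pid, sig) != 0)    /* parent: stands in for the OOM killer */
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Hypothetical helper: turn a waitpid() status into a short label. */
const char *classify_status(int status)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGKILL ? "killed by SIGKILL"
                                           : "killed by another signal";
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? "exited normally"
                                        : "exited with an error";
    return "stopped or continued";
}
```

Calling classify_status(status_after_signal(SIGKILL)) from main should take the WIFSIGNALED branch. If it does, but the real workload still shows WIFEXITED, then the pid being waited on is probably not the process that received the signal.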
Arroba answered 24/8, 2011 at 19:30 Comment(2)
That is what I expected, but when I tried it, it did not work. WIFSIGNALED(status) returns zero for all the processes (including the ones that were killed). Any idea? - Jesusa
@Victor Just as a test, try killing them with kill(..., SIGKILL) and see what happens. - Arroba