SEH Equivalent in Linux or How do I handle OS Signals (like SIGSEGV) and yet keep continuing

I am currently working on a Unit Testing framework where users can create Test Cases and register with the framework.

I would also like to ensure that if any of the user test code crashes, it does not crash the entire framework but is flagged as failed instead. To make this work, I wrote the following code so that I can run the user's code within the SandBox function:

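bool SandBox(void *(*fn)(void *), void *arg, void **rc)
{
#ifdef _WIN32
    __try
    {
        /* Store the result through rc; assigning to the parameter
           itself would never reach the caller. */
        if (rc)
            *rc = fn(arg);
        else
            fn(arg);
        return true;
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        /* Any structured exception (e.g. an access violation)
           marks the test as failed. */
        return false;
    }
#else
    /* POSIX equivalent: not implemented yet, see the question below. */
    return false;
#endif
}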

This works perfectly on Windows, but I would like my framework to be portable, so I need similar functionality for POSIX environments.

I know C signal handlers can intercept an OS signal, but translating the signal handling mechanism into an SEH-like framework poses challenges that I am unable to solve:

  1. How to continue execution even when my program receives a signal?
  2. How to Jump the control of execution from the failed location to a block (similar to except) that I can use for error handling?
  3. How can I clean-up the resources?

Another possibility I was considering is running the user test code on a separate thread with its own signal handler and terminating the thread from the signal handler, but again I am not sure whether this can work.

So before I go any further, I would like to ask the community whether there is a better solution to this problem.

Leonie answered 14/8, 2014 at 9:57 Comment(0)

As you said, you could catch SIGSEGV via signal() or sigaction().

Continuing is not really advisable, as this would be undefined behaviour, i.e. your memory might be corrupted, which might let other test cases fail as well (or even terminate your whole process prematurely).

Would it be possible to run the test cases one by one as a sub process? This way, you could check the exit status and will detect if it terminated cleanly, with an error or due to a signal.

Running the test cases in a separate thread will have the same problem: you do not have memory protection between your test cases and the code driving the test cases.

The suggested approach would be:

fork() to create a child process.

In the child process, you execve() your test case. This could be the same binary with different arguments (to select a certain test case).

In the parent process, you call waitpid() to wait for the termination of the test case. The pid to wait for is the value fork() returned in the parent process.

Evaluate the sub-process status with the WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG macros.
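
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `run_in_child`, `child_outcome` and the `sample_*` functions are hypothetical names, the test function is called directly in the child instead of going through execve(), and a test is assumed to report success by returning 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome codes; not part of any existing framework. */
typedef enum { CHILD_PASSED, CHILD_FAILED, CHILD_CRASHED, CHILD_ERROR } child_outcome;

/* Assumed convention: a test returns 0 on success. */
typedef int (*test_fn)(void *);

/* Runs fn(arg) in a child process and classifies how the child ended.
 * If term_sig is non-NULL it receives the terminating signal number,
 * or 0 when the child was not killed by a signal. */
child_outcome run_in_child(test_fn fn, void *arg, int *term_sig)
{
    if (term_sig)
        *term_sig = 0;

    fflush(NULL);               /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return CHILD_ERROR;     /* fork() itself failed */

    if (pid == 0) {
        /* Child: run the test and report through the exit status. */
        int rc = fn(arg);
        fflush(NULL);
        _exit(rc == 0 ? 0 : 1); /* _exit() skips atexit handlers inherited from the parent */
    }

    /* Parent: wait for the child, retrying if a signal interrupts the wait. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return CHILD_ERROR;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? CHILD_PASSED : CHILD_FAILED;
    if (WIFSIGNALED(status)) {
        if (term_sig)
            *term_sig = WTERMSIG(status);
        return CHILD_CRASHED;
    }
    return CHILD_ERROR;
}

/* Hypothetical sample tests; raise() stands in for a real crash. */
int sample_pass(void *arg)  { (void)arg; return 0; }
int sample_fail(void *arg)  { (void)arg; return 1; }
int sample_crash(void *arg) { (void)arg; raise(SIGSEGV); return 0; }
```

Because the child has its own address space, whatever the test corrupts before it dies cannot affect the framework or later test cases. The price is that only the exit status comes back; richer results would need a pipe or a file.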

If you need timeouts for your test cases, you can also install a handler for SIGCHLD. If the timeout elapses first, kill() the child process. Be aware that only async-signal-safe functions (see signal(7)) may be called from signal handlers.
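
For the timeout case, one variant that avoids signal handlers altogether is to poll waitpid() with WNOHANG and kill() the child once a deadline has passed. Note that this is a different technique from the SIGCHLD handler mentioned above. The sketch below is only an illustration: `run_with_timeout` and the `sample_*` functions are hypothetical names, and the 10 ms polling interval is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Runs fn(arg) in a child process.
 * Returns 0 if the child ended on its own, 1 if it was killed after
 * roughly timeout_ms milliseconds, -1 on error.
 * *status receives the raw waitpid() status of the child. */
int run_with_timeout(int (*fn)(void *), void *arg, int timeout_ms, int *status)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) == 0 ? 0 : 1);    /* child: run the test */

    for (int waited_ms = 0; ; waited_ms += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;                   /* child has terminated */
        if (r < 0 && errno != EINTR)
            return -1;

        if (waited_ms >= timeout_ms) {
            kill(pid, SIGKILL);         /* deadline passed: stop the test */
            while (waitpid(pid, status, 0) < 0 && errno == EINTR) {
                /* retry until the child has been reaped */
            }
            return 1;
        }
        nanosleep(&tick, NULL);
    }
}

/* Hypothetical sample tests. */
int sample_quick(void *arg) { (void)arg; return 0; }
int sample_hang(void *arg)  { (void)arg; while (1) pause(); }
```

Polling wastes a little time per test but keeps all the logic in ordinary code, so the restrictions on what may be called from a signal handler never come into play.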

Just a further note: execve() is not really required. You can just proceed and call your specified testcase directly.

Lucas answered 14/8, 2014 at 10:9 Comment(7)
Your requirement of portability will be difficult to fulfill - or needs more encapsulation at least.Lucas
Thank you for your answer. I like your suggested approach. It will take some time for me to test this, so I will wait for the community to comment, after which I would possibly accept this solution. On a separate note, why do you think using fork might not fulfill the portability requirement?Leonie
I was just considering that you might want to stick with SEH and need to use a different approach (fork/exec/wait) on Linux/Unix. But this might be solvable by encapsulating the differences, i.e. how the test cases are executed.Lucas
Yes, your consideration is correct, as that is what I am contemplating (SEH on Windows, fork on POSIX). Do you still foresee a portability issue?Leonie
+1 for suggesting a multi-process approach... / "which might let other test cases fail as well" is less of a concern than having them pass improperly....Sextuple
@Leonie I don't see any further problems. But the problem with memory corruption remains with SEH on Windows. So maybe you could work with a subprocess there as well (and catch the SE in the subprocess).Lucas
@TonyD That's certainly true. I just assumed failing (with a segfault) after memory corruption is much more likely than passing, but this could indeed happen.Lucas

To complement sstn's answer, on Linux, you could have processor and system specific C code which:

  • installs a signal handler using sigaction(2) with SA_SIGINFO
  • uses the third argument to that signal handler, which is a (machine-specific) ucontext_t* pointer
  • analyzes the machine-specific context state (i.e. the machine registers in the mcontext_t inside that ucontext_t) - see getcontext(3) for details; by "disassembling" the instruction at the code pointer you will be able to know which operation failed, and you can get the faulting address
  • modifies and repairs that machine state, which means changing the process address space by calling mmap(2) and/or modifying some machine registers through that mcontext_t
  • returns from the signal handler into a "repaired" state, perhaps at a different instruction address

This of course is non portable and painful to code and debug. You may need to disable some compiler optimizations, use asm instructions or volatile pointers, etc...
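
To give a feel for the "repair and return" idea, here is a deliberately small sketch, assuming Linux. It does not touch the registers in the ucontext_t at all: it maps one page with PROT_NONE, lets a store to it fault, and has the handler unprotect the page with mprotect(2) so that the faulting instruction succeeds when it is restarted. All names (`repair_handler`, `demo_repair_fault`, `guarded_page`) are hypothetical, and calling mprotect() from a signal handler is not covered by the async-signal-safe list in signal(7), so treat it as an experiment rather than a recommended design.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS with glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                  /* page that starts out inaccessible */
static size_t page_size;
static volatile sig_atomic_t faults_seen;   /* number of faults repaired so far */

/* SA_SIGINFO handler: info->si_addr is the faulting address; context is
 * the machine-specific ucontext_t* (unused in this sketch). */
static void repair_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;

    if (addr >= base && addr < base + page_size) {
        faults_seen++;
        /* "Repair" the address space; returning restarts the faulting instruction. */
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    /* Unknown fault: restore the default action so the retry kills the process. */
    signal(SIGSEGV, SIG_DFL);
}

/* Returns the number of faults that were repaired (expected: 1), or -1 on error. */
int demo_repair_fault(void)
{
    struct sigaction sa, old;

    faults_seen = 0;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = repair_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(guarded_page, page_size);
        return -1;
    }

    volatile char *p = guarded_page;
    p[10] = 'x';                    /* faults once; the handler unprotects the page */
    int ok = (p[10] == 'x');        /* the restarted store has gone through */

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap(guarded_page, page_size);
    return ok ? (int)faults_seen : -1;
}
```

This only works because the cause of the fault is known in advance. For arbitrary crashes in user test code the handler has no sensible repair to apply.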

On Debian or Ubuntu see the /usr/include/x86_64-linux-gnu/sys/ucontext.h header file.

IIRC some old version of SML/NJ played such tricks.

Read very carefully signal(7) and study the ABI specification for your processor, e.g. the x86-64 ABI specification


In practice, you might also use (more easily) siglongjmp(3) from the signal handler. You might also deliberately violate the signal(7) rules. You could use Ian Taylor's libbacktrace library (he works on GCC at Google); it works better if your application and its libraries have debug info (e.g. are compiled with g++ -O1 -g2). See also GNU libc backtrace(3) and dladdr(3).
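
The siglongjmp(3) route mentioned above is the closest thing to an `__try`/`__except` shape. A minimal sketch, assuming a single-threaded test runner: `run_guarded`, `on_fault` and the `sample_*` functions are hypothetical names, and the set of guarded signals is my own choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t guard_active;  /* set while a guarded call is running */
static volatile sig_atomic_t last_signal;   /* signal that ended the guarded call */

static void on_fault(int signo)
{
    if (guard_active) {
        last_signal = signo;
        siglongjmp(recovery_point, 1);      /* jump back into run_guarded() */
    }
    /* Fault outside a guarded call: fall back to the default action. */
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Calls fn(arg). Returns true and stores the result in *rc if fn returned
 * normally; returns false and stores the signal number in *caught if a
 * guarded signal interrupted it. Not thread-safe (global jump buffer). */
bool run_guarded(void *(*fn)(void *), void *arg, void **rc, int *caught)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL };
    enum { GUARDED_COUNT = sizeof signals / sizeof signals[0] };
    struct sigaction sa, old[GUARDED_COUNT];
    volatile bool ok = false;   /* volatile: must survive the siglongjmp */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    for (int i = 0; i < GUARDED_COUNT; i++)
        sigaction(signals[i], &sa, &old[i]);

    /* Second argument 1: save the signal mask so siglongjmp restores it. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        guard_active = 1;
        void *result = fn(arg);
        if (rc)
            *rc = result;
        ok = true;
    } else if (caught) {
        *caught = (int)last_signal;
    }
    guard_active = 0;

    for (int i = 0; i < GUARDED_COUNT; i++)
        sigaction(signals[i], &old[i], NULL);
    return ok;
}

/* Hypothetical sample functions; raise() stands in for a real crash. */
void *sample_ok(void *arg)    { return arg; }
void *sample_crash(void *arg) { (void)arg; raise(SIGSEGV); return NULL; }
```

Nothing is cleaned up when the jump happens: memory the test allocated stays allocated, locks it held stay held, and any corruption it caused before the fault is still there. That is why this is at best a way to report the failure before exiting, not a way to keep running safely.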


Handling SIGSEGV is rumored to be not very efficient on Linux. On GNU/Hurd you would use its external pager mechanism.


Another possibility is to run the tested program under the gdb debugger. Recent versions of gdb can be scripted in Python, so you could automate a lot of things. This might be practically the most portable approach (since recent gdb has been ported to many systems).

addenda

Recent (June 2016) 4.6 or future or patched kernels might be able to handle page faults in user space, notably via userfaultfd, but I don't know much about the details. See also this question.

Dinin answered 14/8, 2014 at 10:21 Comment(3)
The problem I see is that you only get a SIGSEGV for a memory access to an unmapped region (or read-only region), e.g. a null-pointer dereference. But there might have been other invalid memory accesses to mapped regions, which might have invalidated your whole program state.Lucas
Yes, you need to make some hypothesis on the possible fault causes.Dinin
Portability might be an issue here as I am targeting a larger subset of platforms including but not limited to AIX, HP, Solaris and zOS.Leonie
