"terminate called without an active exception" after pthread_cancel

In probing the conditions of this question, a problem arose, exemplified by the code below.

#include <iostream>
#include <thread>
#include <chrono>
#include <stdexcept>
#include <cxxabi.h>
#include <pthread.h>

using namespace std;

// mocking external library call stuck in a strictly user-land infinite loop
int buggy_function_simulation()
{
    // cout << "In buggy function" << endl; // (1)
    int counter = 0;
    while (true)
    {
        if ( ++counter == 1000000 ) { counter = 0; }
    }
    return 0;
}

int main(int argc, char **argv) {
    cout << "Hello, world!" << endl;

    auto lambda = []() {
        pthread_setcanceltype( PTHREAD_CANCEL_ASYNCHRONOUS, nullptr );
        // cout << "ID: "<<pthread_self() <<endl; // (2)
        try
        {
            cout << "ID: "<<pthread_self() <<endl; // (3)
            buggy_function_simulation();
        }
        catch ( abi::__forced_unwind& )
        {
            cout << "thread cancelled!" << endl; // (4)
            throw;
        }
    };

    std::thread th(lambda);

    pthread_t id = th.native_handle();
    cout << id << endl;

    this_thread::sleep_for(chrono::seconds(1));
    cout << "cancelling ID: "<< id << endl;

    pthread_cancel(id);
    th.join();

    cout << "cancelled: "<< id << endl;

    return 0;
}

Compiling and running results in an abort:

$ g++ -g -Og -std=c++11 -pthread -o test test.cpp -lpthread
$ ./test
Hello, world!
139841296869120
ID: 139841296869120
cancelling ID: 139841296869120
terminate called without an active exception
Aborted (core dumped)
$

Note that the diagnostic output (4) does not appear.

If I comment out (3) and uncomment (2), the result is:

$ ./test
Hello, world!
139933357348608
ID: 139933357348608
cancelling ID: 139933357348608
cancelled: 139933357348608
$

Again, the output at (4) does not appear (why?), but the abort has been obviated.

If, alternatively, I retain (3), leave (2) commented out, and uncomment (1), the result is finally as expected:

$ ./test
Hello, world!
139998901511936
ID: 139998901511936
In buggy function
cancelling ID: 139998901511936
thread cancelled!
cancelled: 139998901511936
$

So, the questions are:

  • What is the reason for the "terminate called without an active exception" abort in the first case?
  • Why is the catch block not activated in the second case?
  • Why did uncommenting (1) in the third case make such a difference?

For completeness, here is the stack trace from gdb for the first case:

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f5d9b49a700 (LWP 12130))]
(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f5d9b879801 in __GI_abort () at abort.c:79
#2  0x00007f5d9bece957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f5d9bed4ab6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f5d9bed4af1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f5d9bed44ba in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f5d9bc3a708 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007f5d9bc3acfc in _Unwind_ForcedUnwind () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007f5d9c1dbf10 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#9  0x00007f5d9c1d0d42 in __do_cancel () at ./pthreadP.h:297
#10 sigcancel_handler (sig=<optimized out>, si=0x7f5d9b499bb0, ctx=<optimized out>) at nptl-init.c:215
#11 <signal handler called>
#12 buggy_function_simulation () at test.cpp:15
#13 0x0000558865838227 in <lambda()>::operator() (__closure=<optimized out>) at test.cpp:29
#14 std::__invoke_impl<void, main(int, char**)::<lambda()> > (__f=...) at /usr/include/c++/7/bits/invoke.h:60
#15 std::__invoke<main(int, char**)::<lambda()> > (__fn=...) at /usr/include/c++/7/bits/invoke.h:95
#16 std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >::_M_invoke<0> (this=<optimized out>)
    at /usr/include/c++/7/thread:234
#17 std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >::operator() (this=<optimized out>)
    at /usr/include/c++/7/thread:243
#18 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > > >::_M_run(void) (
    this=<optimized out>) at /usr/include/c++/7/thread:186
#19 0x00007f5d9beff66f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007f5d9c1d26db in start_thread (arg=0x7f5d9b49a700) at pthread_create.c:463
#21 0x00007f5d9b95a88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Canton answered 28/11, 2019 at 6:33 Comment(11)
Maybe the reason is that the function invoked by std::thread must not terminate with an (uncaught) exception? – Cacophonous
The reported error apparently says the opposite: that throw was invoked without an (active) exception to throw, which caused std::terminate() to be invoked instead. The implementation of pthread_cancel() in Linux involves throwing an exception of type abi::__forced_unwind, which must not be swallowed in a catch block. – Canton
It looks like even with PTHREAD_CANCEL_ASYNCHRONOUS your thread needs a cancellation point. – Pattypatulous
The point is what to do when a cancellation point is not available. The use case is a thread stuck in a third-party computational library (an infinite loop due to a terminating threshold condition not being met, with no external calls) that needs to be killed off. – Canton
@Canton Can't you file a bug report with the library supplier? Getting stuck in an infinite loop doesn't sound correct, assuming you've only given the library valid data to work with, that is. – Ptolemaist
@TedLyngmo: Numerical methods may have boundary cases that cause the algorithm to diverge instead of converging. It is a matter of opinion whether the numerical method itself should expend cycles detecting divergent conditions at each iteration (slowing down the common convergent cases) or if the caller should infer if the input is such a boundary case. Detecting a long execution time can be a good heuristic to detect divergence. – Hetaera
@Hetaera True. For methods risking that behaviour I've seen ways to abort in a controlled manner by using a function in the API itself. It's usually enough that the internal algorithm checks the abort condition in one of the outer loops, so that it doesn't become noticeable when it comes to performance. The library supplier may even have workarounds for this problem already in place, so asking them could be a good start. – Ptolemaist
@Canton What is the name of the library you are using? – Ptolemaist
@TedLyngmo: Diverge was too strong a term for me to use. I really meant non-converging, implying an infinite execution. I would expect any decent numerical method library to detect overflow or underflow. – Hetaera
The library is the proprietary product of another group within a fairly large (financial) organization; more significantly, on the other side of a classic quant-vs-tech divide. Which accounts for (a) the lack of useful documentation, and (b) the absence of a formal bug report mechanism (especially from the other side). The bug report will be noticed at some point in the future, but probably not nearly quick enough for our project timelines. I was hoping to use pthread_cancel as a workaround (while mindful of the costs), but the behavior in the example code is inscrutable. – Canton
In fact, I'm not convinced there isn't some subtle bug in the implementation of pthread_cancel in the innards of libc and/or libstdc++. I wish I knew more about how that stuff works (or is supposed to work). – Canton

That message can be triggered if you throw from inside a function marked noexcept. Destructors are implicitly noexcept unless declared noexcept(false), so if the thread is running a destructor when the exception triggered by pthread_cancel is thrown, your program will terminate and you will get that message.
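
As a small compile-time illustration of the implicit noexcept on destructors, the sketch below uses two hypothetical types (PlainDtor and ThrowingDtor are names invented for this example) and checks them with std::is_nothrow_destructible. It only demonstrates the language rule; it does not reproduce the cancellation itself.

```cpp
#include <type_traits>

// Hypothetical types, used only for illustration.
struct PlainDtor {
    ~PlainDtor() {}  // no exception specification: implicitly noexcept since C++11
};

struct ThrowingDtor {
    ~ThrowingDtor() noexcept(false) {}  // explicitly allowed to let exceptions propagate
};

// An exception (including a forced unwind) that tries to leave ~PlainDtor()
// would hit the noexcept boundary and end in std::terminate().
static_assert(std::is_nothrow_destructible<PlainDtor>::value,
              "destructor without a specification is noexcept");
static_assert(!std::is_nothrow_destructible<ThrowingDtor>::value,
              "noexcept(false) destructor is not noexcept");
```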

operator<< for std::cout is a formatted output operation, which constructs a sentry object, which is destructed on exit (see https://en.cppreference.com/w/cpp/named_req/FormattedOutputFunction). If the cancel comes while the destructor of the sentry object is being processed, this will thus terminate your application.

Do not use PTHREAD_CANCEL_ASYNCHRONOUS in C++. Even using pthread_cancel at all can be problematic due to the automatic rethrow from catch clauses.

UPDATE:

pthread_cancel is a POSIX C function, intended to work with C code. It has two modes of operation: synchronous (which POSIX calls deferred, PTHREAD_CANCEL_DEFERRED, and which is the default) and asynchronous.

Synchronous use of pthread_cancel sets an internal flag on the target thread, which is then checked in certain functions marked as cancellation points in the POSIX documentation. If any of those functions are called by the target thread, then cancellation is triggered. On Linux this is done by raising a special exception using the C++ exception mechanism that cannot be caught and discarded. This triggers stack unwinding, calls C++ destructors, and runs code registered with pthread_cleanup_push. This is compatible with normal C++ code, assuming nothing tries to catch and discard the exception. If all catch blocks rethrow, then everything works as expected. If the cancellation starts inside a function marked noexcept (such as a destructor, which is noexcept by default), then the program will terminate.
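
A minimal sketch of the synchronous mode described above, assuming GCC with glibc on Linux (abi::__forced_unwind is specific to that toolchain, as in the question's code). The names polling_worker, cancel_polling_worker and g_saw_forced_unwind are hypothetical. The worker calls pthread_testcancel as its cancellation point and rethrows from its catch block so that the unwind can complete.

```cpp
#include <atomic>
#include <cxxabi.h>
#include <pthread.h>

// Set by the worker when its catch block observes the forced unwind.
static std::atomic<bool> g_saw_forced_unwind{false};

// Worker running with the default (deferred) cancellation type.
extern "C" void* polling_worker(void*) {
    try {
        for (;;) {
            pthread_testcancel();  // explicit cancellation point
        }
    } catch (abi::__forced_unwind&) {
        g_saw_forced_unwind.store(true);
        throw;  // the forced unwind must be rethrown, never swallowed
    }
    return nullptr;  // not reached
}

// Starts the worker, cancels it, and joins it.
// Returns true if the thread ended by cancellation and its catch block ran.
bool cancel_polling_worker() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, polling_worker, nullptr) != 0) {
        return false;
    }
    pthread_cancel(tid);  // only sets the pending flag in deferred mode
    void* result = nullptr;
    if (pthread_join(tid, &result) != 0) {
        return false;
    }
    return result == PTHREAD_CANCELED && g_saw_forced_unwind.load();
}
```

Calling cancel_polling_worker() from main is enough to exercise it. This only works because the loop contains a cancellation point, which is exactly what the loop in the question lacks.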

Asynchronous use of pthread_cancel is different. This sends a special signal to the target thread which interrupts it at any arbitrary point and starts the stack unwinding process described above. This is much more dangerous, as the code may be in the middle of evaluating any arbitrary expression, so the state of the application's data is much less well defined.

If you use asynchronous cancellation with code that has been designed to support it, then this can be OK. It may be possible to make code async-cancel-safe through careful use of pthread_setcancelstate to disable cancellation in specific regions, and use of pthread_cleanup_push to register cancellation cleanup handlers, but this cannot be done in all cases.

With synchronous cancellation, if a function declared noexcept does not call any cancellation point functions, then all is well. With asynchronous cancellation, all code is a potential cancellation point, so before entering any code that is marked noexcept, you must call pthread_setcancelstate to temporarily disable cancellation. Otherwise, if the cancellation signal is received while that function is running, terminate will be called due to the cancellation exception. As noted above, this includes all destructors which are not explicitly marked noexcept(false).

Consequently, any call to arbitrary C++ library code (which may therefore construct C++ objects with destructors) is a potential hazard when using asynchronous cancellation, and you must call pthread_setcancelstate to disable cancellation around any block of code which creates C++ objects with destructors, and/or calls into C++ library code out of your control (such as standard library functions).
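
One way to disable cancellation around such a block is sketched below. The helper name run_with_cancel_disabled is hypothetical; the only real API used is pthread_setcancelstate. The previous state is restored in ordinary (non-noexcept) code rather than in a guard object's destructor, on the assumption that re-enabling cancellation inside an implicitly noexcept destructor could let a pending asynchronous cancel start there and hit the same terminate problem.

```cpp
#include <pthread.h>
#include <utility>

// Runs f() with cancellation disabled for the calling thread, then restores
// the previous cancel state. A cancel request that arrives in the meantime
// stays pending until cancellation is enabled again.
template <typename F>
void run_with_cancel_disabled(F&& f) {
    int old_state = 0;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    try {
        std::forward<F>(f)();
    } catch (...) {
        // Only ordinary C++ exceptions can arrive here, because
        // cancellation is disabled while f() runs.
        pthread_setcancelstate(old_state, nullptr);
        throw;
    }
    pthread_setcancelstate(old_state, nullptr);
}
```

For example, the cout line marked (3) in the question could be wrapped as run_with_cancel_disabled([] { std::cout << "ID: " << pthread_self() << std::endl; }); so that the stream's sentry destructor never runs with asynchronous cancellation enabled.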

Salverform answered 28/11, 2019 at 8:52 Comment(14)
Are you saying there is no way to kill off a worker thread stuck in an infinite loop (e.g. an iterative "solver" routine, such as a bond yield calculation, unable for whatever reason to reach a terminating threshold condition without ever going through a cancellation point) without killing the entire program? – Canton
No, I am saying that you need to design your code to support cooperative cancellation rather than using pthread_cancel. C++20 adds std::stop_source and std::stop_token which can be used for this purpose, but you can do it yourself. The simplest method is to use std::atomic<bool> stop_requested, which is then checked in your loop. – Salverform
The thread is stuck in an external library which we have no control over. Cancelling/killing threads running wayward third-party code is a very common real-life occurrence for which the standard bromide seems to be "design your code better!". If only we could. – Canton
If the code is designed to work with pthread_cancel, feel free to use it. Otherwise things won't work as expected anyway: globals will be left in an unspecified state, memory might be leaked, mutex locks unreleased, etc. – Salverform
That's just it. I'm thinking of using pthread_cancel because the only cost might be a small memory leak (which would be "fixed" by the periodic orderly restart of the service program during "off hours"). But there seem to be other issues at play that are not being explained. – Canton
@Canton How could you possibly stop a third party function in a sane way? How do you write "exception safe" code in that case? – Piefer
In general, one can't. But computational routines (e.g., bond yield from price) are often pure functions or pretty close, no external calls or dependencies, no global state. The only cost of interrupting them might be a small memory leak, if they aren't exception safe. Our own wrapper code would be exception safe, of course. Isn't that sane enough? – Canton
You could create a new process using a system call and then communicate with it using message passing instead of shared memory. That way you can kill it (almost) whenever you like and there is very little risk that doing so will affect your main process. – Fachini
Spawning one or more entirely new processes for each service request would be quite impractical (and very poorly performant) for a high-volume server. – Canton
One idea that might be useful is to pre-allocate your memory and then use placement new to circumvent automatic cleanup to prevent destructors from automatically running, and then use a queue for both object destruction requests and async thread signals to prevent both from colliding. Or, alternatively, just make destructors never run and re-model your code to take care of this in some other fashion. – Skewer
It isn't clear at all that destructors are responsible for the aberrant behavior in my example code, and so need to be worked around. A clear explanation of how pthread_cancel is supposed to work, and therefore what could get in the way, and thus an explanation of the observed behavior, would be helpful. Otherwise, we're just floundering in the dark :-( – Canton
Comment on the update: what happens if the cancellation signal is received outside of any noexcept zone? Won't destructors then run normally as part of the stack unwinding? Or do they still need to be of the noexcept(false) kind to prevent problems? In my example code, the signal is received in the middle of a trivial infinite loop. There are no destructors to run in the stack unwind, other than perhaps for std::thread, and no noexcept zones that I can see, and yet bizarre things happen. I'd really like to understand why and how. – Canton
Once propagation has started outside of a noexcept zone, destructors will run normally, since cancellation is implemented as a special type of exception. – Salverform
It looks like the key is in the logic of the __gxx_personality_v0() function (in libstdc++.so, going by the stack trace), since the call to std::terminate() in the aborting case emerged from there. It was looking for something and didn't find it in that case, but found something it was looking for in the other two cases. But what? A mystery. :-( – Canton
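
The cooperative approach suggested in the comments (a std::atomic<bool> flag checked in the loop) can be sketched as follows. The function names spin_until_stopped and run_and_stop are hypothetical, and the loop is only a stand-in for an iterative solver. This applies only when the looping code can be modified to poll the flag, which is not the case for the third-party library discussed here.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Stand-in for an iterative routine that polls a stop flag once per iteration.
// Returns the number of iterations completed before the stop was observed.
unsigned long long spin_until_stopped(const std::atomic<bool>& stop_requested) {
    unsigned long long iterations = 0;
    while (!stop_requested.load(std::memory_order_acquire)) {
        ++iterations;
    }
    return iterations;
}

// Starts a worker, lets it run for run_time, then requests a stop and joins.
// Returns true if the worker left its loop and finished normally.
bool run_and_stop(std::chrono::milliseconds run_time) {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> finished{false};
    std::thread worker([&] {
        spin_until_stopped(stop_requested);
        finished.store(true);
    });
    std::this_thread::sleep_for(run_time);
    stop_requested.store(true);  // ask the worker to stop
    worker.join();               // returns once the worker has seen the flag
    return finished.load();
}
```

No forced unwind is involved: the worker returns through its normal path, so destructors and noexcept functions behave as usual.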
