IBM example code for non-reentrant functions doesn't work on my system

Asked 16/1, 2020 at 17:4 Answered 17/1, 2020 at 3:32

I was studying re-entrancy in programming. On this IBM site (a really good one) I found the code copied below. It's the first code example on the page.

The code tries to show the problems with shared access to a variable when a program does not run linearly (asynchronicity), by printing two values that constantly change in a "dangerous context".

#include <signal.h>
#include <stdio.h>

struct two_int { int a, b; } data;

void signal_handler(int signum){
   printf ("%d, %d\n", data.a, data.b);
   alarm (1);
}

int main (void){
   static struct two_int zeros = { 0, 0 }, ones = { 1, 1 };

   signal (SIGALRM, signal_handler); 
   data = zeros;
   alarm (1);
   while (1){
       data = zeros;
       data = ones;
   }
}

The problem appeared when I tried to run the code (or rather, didn't appear). I was using gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) in its default configuration. The inconsistent output never occurs. The frequency of "wrong" value pairs is 0!

What is going on? Why is there no re-entrancy problem when using static global variables?

Harlequinade answered 16/1, 2020 at 17:4 Comment(7)

Ensure that all compiler optimisation is disabled, and try again – Scandent 16/1, 2020 at 17:32

I supposed so... but which options should I change? I have no idea. :-( – Harlequinade 16/1, 2020 at 17:43

This looks like a programming question (Stack Overflow). It does not seem well placed here. (Sorry, I wish there were fewer sub-sites; it is so cut up. But that is the way it is.) – Proclaim 16/1, 2020 at 18:53

The most simple re-entrant code is immutable. – Proclaim 16/1, 2020 at 18:58

At first, I thought the question would be related to gcc and the Linux environment, involving, for example, OS scheduling (executing more program text after the interrupt signal before calling the handler routine). – Harlequinade 17/1, 2020 at 12:41

Questions that are more about how the language, compiler, and hardware work, and about concurrency in C, are better suited for Stack Overflow. Apparently this site does have C and GCC tags, but I suspect that most questions in those tags here are about running GCC, not about the behaviour of code compiled by it. (The opposite of Stack Overflow, which is 100% about programming, whereas this site is about the Unix / Linux systems that programming tools run on.) Agreed with @ctrl-alt-delor; this belongs on SO with tags like c, linux, signals, gcc, x86-64, and maybe concurrency or race-condition – Yestreen 18/1, 2020 at 7:11

You might want to change your definition of data to volatile struct two_int { int a, b; } data; and try it again. FWIW your code works as intended both with and without the volatile for me at Online GDB – Whimwham 18/1, 2020 at 10:48

Looking at the godbolt compiler explorer (after adding in the missing #include <unistd.h>), one sees that for almost any x86_64 compiler the code generated uses QWORD moves to load the ones and zeros in a single instruction.

        mov     rax, QWORD PTR main::ones[rip]
        mov     QWORD PTR data[rip], rax

The IBM site says "On most machines, it takes several instructions to store a new value in data, and the value is stored one word at a time", which might have been true for typical CPUs in 2005 but, as the code shows, is not true now. Changing the struct to have two longs rather than two ints would show the issue.

I previously wrote that this was "atomic", which was lazy. The program is only running on a single CPU. Each instruction will complete from the point of view of this CPU (assuming there is nothing else altering the memory, such as DMA).

So at the C level it is not defined that the compiler will choose a single instruction to write the struct, and so the corruption mentioned in the IBM paper can happen. Modern compilers targeting current CPUs do use a single instruction. A single instruction is good enough to avoid corruption for a single-threaded program.

Upgrowth answered 16/1, 2020 at 17:57 Comment(4)

Try changing the data type from int to long long, and compile to 32bit. The lesson is that you never know if / when it will break. – Proclaim 16/1, 2020 at 18:57

That means that, on my machine, the assignment of these two values is an atomic operation? (considering the compilation for x86_64 architecture) – Harlequinade 16/1, 2020 at 18:57

long long still compiles to one instruction for x86-64: 16-byte movdqa. Unless you disable optimization, like in your Godbolt link. (GCC's default is -O0 debug mode, which is full of store/reload noise and usually not interesting to look at.) – Yestreen 17/1, 2020 at 3:47

I changed the type to "long long" after reading all the comments. The result was interesting: the expected results appeared and, by setting up some counters, I got a better idea of how the rate of mismatched data is influenced by the rest of the code. Thank you for all the help! – Harlequinade 17/1, 2020 at 12:54

That's not really re-entrancy; you're not running a function twice in the same thread (or in different threads). You can get that via recursion or passing the address of the current function as a callback function-pointer arg to another function. (And it wouldn't be unsafe because it would be synchronous).

This is just plain vanilla data-race UB (Undefined Behaviour) between a signal handler and the main thread: only sig_atomic_t is guaranteed safe for this. Others may happen to work, like in your case where an 8-byte object can be loaded or stored with one instruction on x86-64, and the compiler happens to choose that asm. (As @icarus's answer shows).

See "MCU programming - C++ O2 optimization breaks while loop": an interrupt handler on a single-core microcontroller is basically the same thing as a signal handler in a single-threaded program. In that case the result of the UB is that a load got hoisted out of a loop.

Your test-case of tearing actually happening because of data-race UB was probably developed / tested in 32-bit mode, or with an older dumber compiler that loaded the struct members separately.

In your case, the compiler can optimize the stores out of the infinite loop because no UB-free program could ever observe them. data is not _Atomic or volatile, and there are no other side-effects in the loop. So there's no way any reader could synchronize with this writer. This in fact happens if you compile with optimization enabled (Godbolt shows an empty loop at the bottom of main). I also changed the struct to two long long, and gcc uses a single movdqa 16-byte store before the loop. (This is not guaranteed atomic, but it is in practice on almost all CPUs, assuming it's aligned, or on Intel merely doesn't cross a cache-line boundary. See "Why is integer assignment on a naturally aligned variable atomic on x86?")

So compiling with optimization enabled would also break your test, and show you the same value every time. C is not a portable assembly language.

volatile struct two_int would also force the compiler not to optimize the stores away, but would not force it to load/store the whole struct atomically. (It wouldn't stop it from doing so either, though.) Note that volatile does not avoid data-race UB, but in practice it's sufficient for inter-thread communication and was how people built hand-rolled atomics (along with inline asm) before C11 / C++11, for normal CPU architectures. They're cache-coherent, so volatile is in practice mostly similar to _Atomic with memory_order_relaxed for pure-load and pure-store, if used for types narrow enough that the compiler will use a single instruction so you don't get tearing. And of course volatile doesn't have any guarantees from the ISO C standard, unlike code that compiles to the same asm using _Atomic and mo_relaxed.

If you had a function that did global_var++; on an int or long long that you run from main and asynchronously from a signal handler, that would be a way to use re-entrancy to create data-race UB.

Depending on how it compiled (to a memory-destination inc or add, or to separate load/inc/store) it would be atomic or not with respect to signal handlers in the same thread. See "Can num++ be atomic for 'int num'?" for more about atomicity on x86 and in C++. (C11's stdatomic.h and _Atomic qualifier provide equivalent functionality to C++11's std::atomic<T> template.)

An interrupt or other exception can't happen in the middle of an instruction, so a memory-destination add is atomic wrt. context switches on a single-core CPU. Only a (cache-coherent) DMA writer could "step on" an increment from an add [mem], 1 without a lock prefix on a single-core CPU. There aren't any other cores that another thread could be running on.

So it's similar to the case of signals: a signal handler runs instead of the normal execution of the thread handling the signal, so it can't be handled in the middle of one instruction.

Yestreen answered 17/1, 2020 at 3:32 Comment(1)

I felt compelled to accept yours as the best answer, despite Icarus's answer being sufficient for me. The clear concepts you explained give me a bucket of topics to study all day (and beyond). In fact, I hardly understood what you wrote in the first two paragraphs at first glance. Thank you! If you publish articles on the internet about computers and programming, give us the link! – Harlequinade 17/1, 2020 at 13:58
