Efficient way to toggle an atomic_bool

Asked 17/8, 2020 at 16:4 Answered 10/6, 2022 at 11:58

Solved c x86 atomic compiler-optimization stdatomic

If I have atomic_bool flag;, how can I write C code to toggle it that's atomic, portable, and efficient? Regarding "efficient", I'd like it to assemble on x86_64 to lock xorb $1, flag(%rip). The "obvious" flag = !flag; is out because it isn't actually atomic. My next guess would be flag ^= true;, which assembled to this mess on GCC:

        movzbl  flag(%rip), %eax
0:
        movb    %al, -1(%rsp)
        xorl    $1, %eax
        movl    %eax, %edx
        movzbl  -1(%rsp), %eax
        lock cmpxchgb   %dl, flag(%rip)
        jne     0b

And this mess on Clang:

        movb    flag(%rip), %al
0:
        andb    $1, %al
        movl    %eax, %ecx
        xorb    $1, %cl
        lock            cmpxchgb        %cl, flag(%rip)
        jne     0b

Then I tried specifying a weaker memory order by doing atomic_fetch_xor_explicit(&flag, true, memory_order_acq_rel); instead. This does what I want on Clang, but GCC now completely fails to compile it with error: operand type '_Atomic atomic_bool *' {aka '_Atomic _Bool *'} is incompatible with argument 1 of '__atomic_fetch_xor'. Interestingly, if my type is an atomic_char instead of an atomic_bool, then both GCC and Clang emit the assembly that I want. Is there a way to do what I want with atomic_bool?

Preraphaelite answered 17/8, 2020 at 16:4 Comment(18)

Note from standard: 7.17.7.5 xor is not applicable to atomic_bool. (I wonder if the "None of these operations is applicable to atomic_bool" part from the standard suggest that compilers should refuse to compile such code, making clang non-conformant here.) – Magenta 17/8, 2020 at 16:10

This is a deliberate choice in gcc. See bugzilla 68966 and bugzilla 68908. – Vanhorn 17/8, 2020 at 16:22

Is there a reason you can't use atomic_char flag;? – Ebsen 17/8, 2020 at 16:27

@Ebsen I can, but it feels like an awful hack, and then it'd be much easier for someone to mistakenly put something other than 0 or 1 in it, which would end badly. – Preraphaelite 17/8, 2020 at 16:30

Efficiency just doesn't have anything to do with the code generation. Count about 150 cycles for the lock and you'll rarely be disappointed. – Hairbrush 17/8, 2020 at 16:33

@JosephSible-ReinstateMonica gcc considers it not allowed per standard (even if there's a question about what "not applicable" C11, 7.17.7.5 means). You could instead use: typedef unsigned char mybool_t; and then you could use it with the gcc intrinsics. It's not ideal - having to create another "bool" type - but it could be a good enough workaround (At least, tThat's what I did when I had this problem in the past :) – Vanhorn 17/8, 2020 at 16:37

@JosephSible-ReinstateMonica Using _Bool isn't any safer than using char when it comes to "someone setting it to a value other than 0 or 1", just use atomic_char, and if it is a global, then provide set/get functions that accept a bool and write to a char. If you really think that it is a hack, remember that C bool type is just a hacked in typedef for _Bool, whatever that is, which is a hack that is here simply to keep backward compatibility with code that may have used "bool" with assumption that it isn't a keyword, and there's no point in worrying about such things anyway. – Alphonsoalphonsus 17/8, 2020 at 17:23

@Alphonsoalphonsus _Bool flag = 2; won't actually store a 2, but char flag = 2; will, so I disagree that _Bool isn't any safer. – Preraphaelite 17/8, 2020 at 17:31

@JosephSible-ReinstateMonica both are fundamentally wrong to do and it is a problem with the programmer doing it. Worrying about someone using 2 when there's no valid boolean value that can be mapped from 2 is like worrying about someone passing a NULL to api that clearly states it an undefined behaviour to do so. This is just how it is in C and it is too late to change that now. – Alphonsoalphonsus 17/8, 2020 at 20:17

@Yamirui: So you're arguing the bool foo = return_zero_or_nonzero(); is an error, and you should have written bool foo = (f() != 0); to explicitly booleanize, rather than rely on implicit conversion to bool? That's one style choice, but with unsigned char the compiler isn't going to warn you if you get it wrong. – Kratzer 2/6, 2022 at 17:13

If compilers suck at flag ^= 1;, that's a missed optimization on their part, and should get reported (github.com/llvm/llvm-project/issues and gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc). If the return value is unused, yes, lock xorb is optimal. And if it is used, lock btc $0, flag(%rip). – Kratzer 2/6, 2022 at 17:24

@PeterCordes Do the compilers actually suck at that, or is the mess they make necessary for that to be memory_order_seq_cst (which is what the standard requires for any operation where you don't specify an explicit one, even though I don't need it)? – Preraphaelite 2/6, 2022 at 17:28

lock xorb is a full barrier and more than sufficient for a seq_cst RMW, just like lock addl is safe for atomic_fetch_add. (Or lock xaddl if the return value is used.) x86 can't do atomic RMWs with anything less than a full memory barrier, so atomic_fetch_add_explicit for a relaxed integer add only allows compile-time reordering, still lock add or lock xadd in the asm, same as seq_cst. See The strong-ness of x86 store instruction wrt. SC-DRF? re: xchg or other locked insn being as strong as a full SC fence. – Kratzer 2/6, 2022 at 17:36

It's peculiar that the C standard does not require supporting atomic_fetch_xor and friends on atomic_bool, whereas AFAICT it does require supporting ^=. So it really doesn't save anything for the implementation, except that it deprives the programmer of their choice of memory ordering. – Bender 6/6, 2022 at 14:19

Anyway it seems clear from the comments that the answer is that you can't. From "atomic, portable, and efficient" you can pick two. There evidently isn't any portable way except for flag ^= 1 (or equivalents like flag -= 1), and gcc optimizes them poorly. You can get the better-optimized version, at cost of portability, with atomic_fetch_xor((atomic_uchar *)&flag, 1). Wrap in ifdef as needed. Do you want that in an answer, or are you holding out for a new idea out of left field? – Bender 6/6, 2022 at 14:46

@NateEldredge Yeah, at this point I'd accept a summary of the comments as an answer. Hopefully it will be useful to help convince the compiler authors to add such a way. – Preraphaelite 7/6, 2022 at 14:29

(Probably just changing what ^= compiles to, as @PeterCordes mentioned.) – Preraphaelite 7/6, 2022 at 14:30

Out of curiosity: what's the use of such an operation? I'm sure it's obvious but I can't figure it out :) It makes sense to lock xor a bitfield when different threads xor different bits. But if you let N threads atomically xor the same boolean flag won't the result just be N % 2? If that's the case, wouldn't be easier to use atomic_add and extract the LSb when needed? If the threads compete to set/reset the flag, wouldn't you need a form of synchronization to make sure that two threads trying to set the flag won't end up setting and resetting it? In that case you wouldnt need an atomic – Brachiate 7/6, 2022 at 20:41

Mainly summarizing comments:

It looks like the only portable way to atomically toggle your atomic_bool is flag ^= 1. But as you noted, gcc and clang don't know how to optimize it, and fall back to the cmpxchg loop. If you want full portability and compliance I think you just have to put up with that, until such time as they fix their missed optimization, which you might want to report.

In principle, another option should be flag -= 1 or flag += -1, which have the same truth table when nonzero values are treated as true. However, gcc compiles it to the same inefficient xor/cmpxchg code as flag ^= 1, and clang actually miscompiles it: when flag == 0, then flag -= 1 will set flag to 0xff which is invalid. It looks like this was reported several years ago but is still unfixed.

If you want a workaround, at least on x86 you should be able to do

atomic_fetch_xor((atomic_uchar *)&flag, 1);

I think it's okay for strict aliasing because atomic_uchar is a character type. In practice it is most likely fine anyway, because an atomic access shouldn't be optimized away. To be safe, check the assembly that is generated, or just go ahead and replace the whole thing with the appropriate one-liner of inline asm.

It's a nice touch that clang extends the atomic_fetch_* functions to work on atomic_bool, even though the C standard doesn't support it (7.17.7.5p1: "None of these operations is applicable to atomic_bool.") I don't really understand why the standards committee included that exception. All those operations still have to be available on atomic_bool via the compound assignment operators, so omitting them from atomic_fetch_* serves only to deprive the programmer of being able to use any weak memory ordering, without making life any easier for the implementation.

For similar reasons, I also don't understand why they didn't provide atomic_fetch_* for the remaining compound assignment operators. atomic_fetch_mul might not be that useful, but since *= has to work, it shouldn't cost the implementation anything to speak of, and the consistency would be nice.

Bender answered 10/6, 2022 at 11:58 Comment(1)

Maragaret Bloom commented that for most use-cases, atomic<uint8_t> with fetch_add should wok just as well: increment or decrement flips the low bit, and readers can test that cheaply. This even allows x86 lock xadd to do the fetch part of fetch_add. (Although with lock xor on a bool you also find out the old value via ZF, so atomic_bool for bool tmp = (flag ^= 1); could still compile efficiently without a CAS loop, just lock xor.) – Kratzer 10/6, 2022 at 20:28

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags