Is it worse in any aspect to use the CMPXCHG instruction on an 8-bit field than on a 32-bit field?

No, there's no penalty for lock cmpxchg [mem], reg 8 vs. 32-bit. Modern x86 CPUs can load and store to their L1d cache with no penalty for a single byte vs. an aligned dword or qword. Can modern x86 hardware not store a single byte to memory? answer: it can with zero penalty¹ because they spend the transistors to make even unaligned loads/stores fast.

The surrounding asm instructions dealing with a narrow integer in a register should also have negligible if any extra cost vs. [u]int32_t. See Why doesn't GCC use partial registers? - most compilers know how to be careful with partial registers, and modern CPUs (Haswell and later, and all non-Intel) don't rename the low 8 separately from the rest of the register so the only danger is false dependencies. Depending on exactly what you're doing, it might be best to use unsigned local temporaries with an _Atomic uint8_t, or it might be best to make your locals also uint8_t.

Footnote 1: Unlike on some non-x86 CPUs where a byte store actually is implemented with a cache RMW cycle (Are there any modern CPUs where a cached byte store is actually slower than a word store?). On those CPUs you'd hope that atomic xchg would be just as cheap for word vs. byte, but that's too much to hope for with cmpxchg. But almost all non-x86 ISAs have LL/SC instead of xchg / cmpxchg anyway, so even an atomic exchange is separate LL and SC instructions, and the SC would be take an RMW cycle to commit to cache.

Recommended topics

Hot tags