I have a problem with the MS C compiler reordering certain statements, critical in a multithreading context, at high levels of optimization. I want to know how to force ordering in specific places while still using high levels of optimization. (At low levels of optimization, this compiler does not reorder statements)
The following code:
ChunkT* plog2sizeChunk=...
SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit on this chunk of storage
x = plog2sizeChunk->pNext;
produces this:
0040130F 8B 5A 08 mov ebx,dword ptr [edx+8]
00401312 83 22 FE and dword ptr [edx],0FFFFFFFEh
in which the write to pPoolAndBusyFlag is reordered by the compiler to occur after the pNext fetch.
SET_BUSY is essentially
plog2sizeChunk->pPoolAndBusyFlag&=0xFFFFFFFeh;
I think the compiler has rightfully decided it was OK to reorder these accesses because they are to two separate members of the same struct, and such reordering has no affect on the results of single-threaded execution:
typedef struct chunk_tag{
unsigned pPoolAndBusyFlag; // Contains pointer to owning pool and a busy flag
natural log2size; // holds log2size of the chunk if Busy==false
struct chunk_tag* pNext; // holds pointer to next block of same size
struct chunk_tag* pPrev; // holds pointer to previous block of same size
} ChunkT, *pChunkT;
For my purposes, the pPoolAndBusyFlag has to be set before other accesses to this structure are valid in a multithreaded/multicore context. I don't think this particular access is problematic for me, but the fact the compiler can reorder this means that other parts of my code may have the same kind of reordering but it may be critical in those places. (Imagine the two statements are updates to the two members rather than one write/one read). I want to be able force the order of the actions.
Ideally, I'd write something like:
plog2sizeChunk->pPoolAndBusyFlag&=0xFFFFFFFeh;
#pragma no-reordering // no such directive appears to exist
pNext = plog2sizeChunk->pNext;
I have experimentally verified I can get this effect in this ugly way:
plog2sizeChunk->pPoolAndBusyFlag&=0xFFFFFFFeh;
asm { xor eax, eax } // compiler won't optimize past asm block
pNext = plog2sizeChunk->pNext;
gives
0040130F 83 22 FE and dword ptr [edx],0FFFFFFFEh
00401312 33 C0 xor eax,eax
00401314 8B 5A 08 mov ebx,dword ptr [edx+8]
I note that the x86 hardware may reorder these particular instructions anyway since they don't refer to the same memory location, and reads may pass writes; to really fix this example, I'd need some type of memory barrier. Back to my earlier remark, if they were both writes, the x86 will not reorder them, and the write order will be seen in that order by other threads. So in that case I don't think I need a memory barrier, just a forced ordering.
I have not seen the compiler re-order two writes (yet) but I haven't been looking very hard (yet); I just tripped over this. And of course with optimizations just because you don't see it in this compilation doesn't mean it won't appear in the next.
So, how do I force the compiler to order these?
I understand I can declare the memory slots in the struct to be volatile. They are still independent storage locations, so I don't see how this prevents an optimization. Maybe I'm mis-interpreting what volatile means?
EDIT (Oct 20): Thanks to all the responders. My current implementation uses volatile (used as the initial solution), _ReadWriteBarrier (to mark the code where reordering shouldn't occur by the compiler), and a few MemoryBarriers (where reads and writes occur), and that seems to have solved the problem.
EDIT: (Nov 2): To be clean, I ended up defining sets of macros for ReadBarrier, WriteBarrier, and ReadWriteBarrier. There are sets for pre and post locking, pre and post unlocking, and general usage. Some of these are empty, some contain _ReadWriteBarrier and MemoryBarrier, as appropriate for the x86 and typical spin locks based on XCHG [XCHG includes an implicit MemoryBarrier thus obviating that need in lock pre-/post- sets). I then parked these in the code at appropriate places documenting the essential (non)reordering requirements.
&=
or is this code protected by a mutex or otherwise made thread-safe? Because if you are trying to synchronize threads using what basically amounts toif (!busy) { busy = true; DoStuff(); busy = false; }
then that is not going to be thread-safe no matter the order. – Korenblatvolatile:ms
flag (which happens to be the default for everything but ARM), hence my question. With that flag set volatiles behave just like they do on say the JMM or in .NET. I.e. writes have release and reads acquire semantics. – Snitchstd::atomic
is the way for c++) – Snitch