TTAS coherence issue?
Asked Answered
P

1

0

I am attending an OS course as part of my undergrad and I have encountered a frustrating bug that is only present when compiling with -O2/3 flags set.

System: x86
Compiler: GCC
Emulator: Bochs/Qemu

I'm maintaining critical sections using spin locks, a TTAS implementation.

static int
xchange(int*s)
{
    int val = LOCKED;
    /* Exchanging value at lock address with 1, returns the old value */
    asm volatile("xchg (%%eax), %%ebx" : "=b"(val) : "0"(val), "a"(s));
    return val;
}

void
TTAS(int *s)
{
    /* While lock is locked, do nothing */
    while(TRUE){
        while(*s == LOCKED){}
        
        /* If lock acquired  */
        if( xchange(s)  == UNLOCKED){
        return;
        }
    }
}

Now the bug occurs when the two threads work on a shared variable with a mix of conditional waits and locks. The threads believe they have acquired the lock, but a subsequent read return with the wrong(old) value. I tried wrapping the locks for printing out the last 'owner', but this added time causes the synchronization to hold. The last and current owner of the lock:Thread 2 racing itself

If I print the value of the lock after acquiring it.

TTAS(lock <int*>);
print(lock::val);
print(lock::val);

The first print '0', the second '1'.
If I swap the TTAS with a TAS, it seemingly works.

void
TAS(int *s)
{
    /* While lock is locked, do nothing */
        While( xchange(s)  != UNLOCKED){}
}

I am unable to determine what is causing this behaviour, and hope some of you could help me with the reasoning.\

EDIT: Corrected a wrong void return on xchange

Pinna answered 29/3, 2021 at 13:30 Comment(1)
xchange can't be void - you need it to return a value. godbolt.org/z/fY3cq43cK has a version that compiles, and uses volatile int *s and a fixed asm statement to make it work. lwn.net/Articles/793253 / Multithreading program stuck in optimized mode but runs normally in -O0 / How can I indicate that the memory *pointed* to by an inline ASM argument may be used?Mistymisunderstand
P
2

Ref comments below and guidance from Peter Cordes a correct solution would be:

#define UNLOCKED 0
#define LOCKED 1
#define TRUE 1


static int
xchg(int volatile *s) {
    int val = LOCKED;

    asm("xchg %0, %1" : "+m"(*s), "+r"(val)::"memory");
    return val;
}

void
TTAS_acquire(int volatile *s) {

    while(TRUE){
        while(*s == LOCKED){}
        if(xchg(s) == UNLOCKED){
            return;
        }
    }
}


void
TTAS_release(int volatile *s) {
    asm("":::"memory");
    *s = UNLOCKED;
}

EDIT: Guess I jumped the gun on this one! The solution presented seemed to fix the issues, but it was just the symptom. I take this back! Also changed the wrong return value, it was never void.

EDIT2: Rewrote the answer with Peter Cordes guidance, also including the release function see comments.

Pinna answered 29/3, 2021 at 14:21 Comment(22)
I think the actual problem is that you're missing a "memory" clobber in your asm statement to tell the compiler the pointed-to memory is modified, as well as producing the "=b" output. (BTW, there's no need to hard-code the registers like that! One normal way would be asm("xchg %0, %1" : "+m"(*s), "+r"(val) ); which would avoid the need for a memory clobber by making the memory operand itself visible to the compiler, and getting it to pick the addressing mode.)Mistymisunderstand
I also don't think "weak consistency* is a great description for non-volatile. More like no consistency, i.e. assume non-shared. Which is totally fine for a local variable! Forcing the compiler to store and reload val doesn't tell it that *s is modified. You should also use an ACCESS_ONCE on *s in the spin loop, i.e. do that through a volatile int *. lwn.net/Articles/793253Mistymisunderstand
Note the .L11: jmp .L11 infinite loop in GCC -O3 output with the code in this answer which you claim fixes the bug. godbolt.org/z/YfTfqW6Wh (After fixing the multiple errors like return in a void function.)Mistymisunderstand
@PeterCordes You are absolutely right, and thanks for pointing out the errors I put in the code there, I changed the return values to their proper return values. I looked at the objdump files and I'm still baffled this works, it seems that the only difference when having int volatile *s and val is the operations: 17: 87 02 xchg %eax,(%edx) 19: 89 44 24 0c mov %eax,0xc(%esp) 1d: 8b 44 24 0c mov 0xc(%esp),%eax VS e: 87 18 xchg %ebx,(%eax)\ 10: 89 d8 mov %ebx,%eax\ In xchgPinna
@PeterCordes I failed completely on the last comment, putting that code indented. Sry..Pinna
Comments are useless for posting code. Best to share a Godbolt link like I did (I guess with -m32 -O3 for 32-bit code.) But yes, int volatile *s in TTAS would fix the infinite-loop problem there; but if int volatile val makes any difference it's just luck / chance.Mistymisunderstand
Could you help me understand how this: godbolt.org/z/M5qexdEee is ensuring coherence, while: godbolt.org/z/WEdcYhdTP fails? Pardon all the edits, fresh to this site^^Pinna
I meant post your C source with GCC options that make it produce the asm you're actually getting, and comment one what changed in the source. (I can use the Godbolt "diff" pane to compare the asm, but since I understand how C compiles to asm it's probably easiest to see the C as well as asm.) Also, objdump -d on an un-linked .o is pretty much the least useful output, you didn't even use -drwC to make it show where call targets are actually going. call 42 <TTAS+0xa> is obviously not the real call target.Mistymisunderstand
@PeterCordes Working: godbolt.org/z/d7WWh7Pz3. Not working:godbolt.org/z/z6KMbqeWEPinna
Your "working" version still has an infinite loop in TTAS that execution will get into if the lock is initially locked. .L6: / jmp .L6. Exactly because while(*s == LOCKED) is a non-volatile access to *s.Mistymisunderstand
Also, you don't need -fno-unit-at-a-time - if there's any specific function you need to not inline, use __attribute__((noinline)). Or if you want to see the asm for a stand-alone version of a function, make it non-staticMistymisunderstand
@PeterCordes I hope this then:godbolt.org/z/x7e9P8exK?Pinna
Yeah, that happens to work, but is pointlessly over-complicated by storing/reloading var before/after the asm statement. You still didn't actually tell the compiler that the asm statement can modify memory, see my initial comments and Godbolt link on the question. It still happens to work fine with plain int var, because the only access to the secretly-modified memory is through volatile int *s.Mistymisunderstand
@PeterCordes Did I get it right? godbolt.org/z/Ma6oTEjTn . I see I have been extremely lucky with the timing of the intial code, but since I have, could you help me see what makes it run? Apparently every time I hit the first while, it has been false, but after that the print at on the value of s has changed, just not immediately.Pinna
Yes, that's exactly what my godbolt link was doing, so yes that's right. IDK why your earlier version was avoiding an infinite loop. Could it have been inlining into a larger function resulting in different code-gen?Mistymisunderstand
I think the key thing here is to understand exactly why your initial version was broken (in terms of not telling the compiler what's going on and what you need to happen), and you can already understand that from looking at the -O3 asm for it in isolation. Oh, but if you were trying to use this for synchronization, you still haven't used a "memory" clobber to get a compiler barrier and stop it reordering other memory accesses, if TTAS inlines. So it's safely taking the lock, but not safely keeping the critical section contained between lock and unlock; volatile doesn't do that.Mistymisunderstand
right right, so that in order to actually guard this, now ordered execution, I'd have to include something like: godbolt.org/z/E74W7cabs? Or do I need something like mfence?Pinna
That compiler barrier is sufficient for taking the lock, you don't need mfence. xchg is already a full barrier. A lock only needs acquire/release semantics, so this is more than strong enough for taking the lock. But unlocking will need asm("" ::: "memory"); *s = UNLOCKED; (again with a volatile pointer). Like this: x86 spinlock using cmpxchgMistymisunderstand
@PeterCordes Would you agree that the edited answer is a potential solution? As I understand, the reason both the acquire and release need a barrier is the possibility for the compiler to schedule so that we could end up with an execution order such as: acquire(s) -> release(s) ->write_memory.?Pinna
Yup, that's correct. The asm("" ::: "memory") between the critical section's loads/stores and the unlock store makes it a release-store. (The x86 asm memory model already guarantees that only StoreLoad reordering is possible (minor oversimplification), so we only have to worry about compile-time ordering.)Mistymisunderstand
Thanks alot for keeping up with me this evening. You don't happen to have any good reads on the topic? Perhaps something to look into this storeLoad simplification on x86 memory handling? Thanks again, great help, learned alot today!Pinna
preshing.com/20120930/weak-vs-strong-memory-models - preshing's articles are great introductions. For more detail on the x86 memory model, have a look at Globally Invisible load instructions - it's actually program order + a store buffer with store forwarding. Also cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html - how C++11 std::atomic stuff maps to asm for different ISAs.Mistymisunderstand

© 2022 - 2024 — McMap. All rights reserved.