(C/C++) Why is it in/valid to synchronize a single reader and a single writer with a global variable?

Let's assume there is a data structure like a std::vector and a global variable int syncToken initialized to zero. Given exactly two threads, one reader and one writer, why is the following (pseudo) code (in)valid?

void reader_thread(){
    while(1){
        if(syncToken!=0){
            while(the_vector.length()>0){
                 // ... process the std::vector 
            }
            syncToken = 0;  // let the writer do its work
        }
        sleep(1);
    }
}

void writer_thread(){
    while(1){
        std::string data = waitAndReadDataFromSomeResource(the_resource);
        if(syncToken==0){
            the_vector.push(data);
            syncToken = 1;  // would syncToken++; be a difference here?
        }
        // drop data in case we couldn't write to the vector
    }
}

Although this code is not (time-)efficient, as far as I can see it is valid, because the two threads only synchronize on the global variable's value in a way such that no undefined behaviour should result. The only problem could be concurrent use of the vector, but that shouldn't happen because the synchronization value only switches between zero and one, right?

UPDATE Since I made the mistake of asking just a yes/no question, I have updated it to a why question in the hope of getting a very specific case as an answer. Judging by the answers, the question also seems to draw the wrong picture, so I'll elaborate on what my problem with the above code is.

Beforehand, I want to point out that I'm asking for a specific use case/example/proof/detailed explanation which demonstrates exactly what goes out of sync. Even C example code which makes an example counter behave non-monotonically would just answer the yes/no question, but not the why! I'm interested in the why. So if you provide an example which demonstrates the problem, I'm interested in why it happens.

By (my) definition, the above code shall be called synchronized if and only if the code within the if statement, excluding the syncToken assignment at the bottom of the if block, can only be executed by exactly one of the two given threads at any given time.

Based on this, I'm searching for an example, perhaps assembler-based, where both threads execute the if block at the same time - meaning they are out of sync, or rather not synchronized.

As a reference, let's look at the relevant part of assembler code produced by gcc:

; just the declaration of an integer global variable on a 64bit cpu initialized to zero
syncToken:
.zero   4
.text
.globl  main
.type   main, @function

; writer (Cpu/Thread B): if syncToken == 0, jump not equal to label .L1
movl    syncToken(%rip), %eax
testl   %eax, %eax
jne .L1

; reader (Cpu/Thread A): if syncToken != 0, jump to Label L2
movl    syncToken(%rip), %eax
testl   %eax, %eax
je  .L2

; set syncToken to be zero
movl    $0, syncToken(%rip)

Now my problem is that I don't see why those instructions could get out of sync.

Assume both threads run on their own CPU core: Thread A runs on core A, Thread B on core B. The initialization is global and done before both threads begin execution, so we can ignore it and assume both threads start with syncToken = 0.

Example:

  1. Cpu A: movl syncToken(%rip), %eax
  2. Cpu A: context switch (saving all registers)
  3. Cpu B: movl syncToken(%rip), %eax
  4. Cpu B: testl %eax, %eax
  5. Cpu B: jne .L1 ; this one is false => execute writer if block
  6. Cpu B: context switch
  7. Cpu A: context switch to thread (restoring all registers)
  8. Cpu A: testl %eax, %eax
  9. Cpu A: je .L2 ; this is false => not executing if block

Honestly, I've constructed an example which works correctly, which only demonstrates that I don't see how the variable could go out of sync such that both threads execute the if block concurrently. My point is: although the context switch will result in an inconsistency between %eax and the actual value of syncToken in RAM, the code should still do the right thing and simply not execute the if block when it is not the one thread allowed to run it.

UPDATE 2 It can be assumed that syncToken will only be used as in the code shown. No other function (like waitAndReadDataFromSomeResource) is allowed to use it in any way.

UPDATE 3 Let's go one step further by asking a slightly different question: Is it possible to synchronize two threads, one reader and one writer, using an int syncToken such that the threads never go out of sync by executing the if block concurrently? If yes - that's very interesting ^^ If no - why not?

Transcendental answered 30/11, 2015 at 11:40 Comment(12)
What is the type of syncToken? std::atomic?Muumuu
The word "valid" is too broad. Using such a design is not very object-oriented. A more "valid" approach would be to create a blocking queue object like so: #12805541Bach
@Muumuu it's int, as noted in the first sentence of my (unedited) question - I understand that if I do it right (using std::atomic) I can be sure it will be right. My question is: if the above pseudo code is wrong, why? I need a proof/exact example/detailed explanation of what goes wrong.Transcendental
@Shloim right, it seems my question was not specific enough - I've added an update to my question.Transcendental
@JohnDoe: Two threads accessing a variable where one is a write and another is a read results in a race-condition which is UB per standard.Muumuu
Related to Is it dangerous to read global variables from separate threads at potentially the same time?Muumuu
@Muumuu if I understand you correctly, you just refer to a definition of undefined behaviour and say that's the case here. It's okay for me if my example matches that definition, so let's move away from the definition and try to answer whether it's possible that both threads execute the if block at the same time - even if the definition of UB holds, I want to know exactly whether this can really happen. I know I'm playing with fire here and I won't use this in production code. But I'm curious whether this can really result in actual problems.Transcendental
@Muumuu I've already found your link before, but it too uses the definition of UB and states "please don't do it". So it just answers the yes/no part of my question but not exactly why it may cause problems. I think the definition of UB goes a bit too far in order to be on the safe side, which is perfectly valid, and since my code is highly inefficient no one should do it this way - but why?Transcendental
Compilers may do some optimizations, such as never actually modifying syncToken in memory...Muumuu
What does waitAndReadDataFromSomeResource do? You need to specify in particular the wait part. Does that thread block until data can be read from the source?Nighttime
Please someone tell me where I am wrong: this entire discussion seems to me completely pointless given that the int we are talking about is not even volatile to begin with. If it is not volatile, nothing will work. Does this really need an explanation?Overdose
@MikeNakis Yes, this needs explanation because there are many details involved as Anders and TomTanner pointed out.Transcendental

Short answer: No, this example is not properly synchronized and will not (always) work.

For software it is generally understood that working sometimes but not always is the same thing as broken. Now, you could ask something like "would this work for synchronizing an interrupt controller with the foreground task on an ACME brand 32-bit micro-controller with XYZ compiler at optimization level -O0" and the answer might certainly be yes. But in the general case, the answer is no. In fact, the likelihood of this working in any real situation is low, because the intersection of the sets "uses the STL" and "hardware and compiler simple enough to just work" is probably empty.

As other comments/answers have stated, it is also technically Undefined Behavior (UB). Real implementations are free to make UB work properly too. So just because it is not "standard" it may still work, but it will not be strictly conforming or portable. Whether it works depends on the exact situation, based heavily on the processor and the compiler, and perhaps also the OS.

What works

As your (code) comment implies, it is very possible that data will be dropped so this is presumed to be intentional. This example will have poor performance because the only time the vector needs to be "locked" is just when data is being added, removed, or length tested. However reader_thread() owns the vector until it is done testing, removing and processing all of the items. This is longer than desired, so it is more likely to drop data than it otherwise would need to be.

However, as long as variable accesses are synchronous and the statements occur in "naive" program order, the logic appears to be correct. The writer_thread() does not access the vector until it "owns" it (syncToken == 0). Similarly, reader_thread() does not access the vector until it owns it (syncToken == 1). Even without atomic writes/reads (say this were a 16-bit machine and syncToken were 32 bits), this would still "work".

Note 1: the pattern if(flag) { ... flag = x } is a non-atomic test-and-set. Ordinarily this would be a race condition. But in this very specific case, that race is side-stepped. In general (e.g. more than one reader or writer) that would be a problem too.

Note 2: syncToken++ is less likely to be atomic than syncToken = 1. Normally this would be another bellwether of misbehavior because it involves a read-modify-write. In this specific case, it should make no difference.
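
To illustrate Note 2, here is a minimal sketch (my addition, not part of the original answer) of the read-modify-write hiding inside syncToken++, next to its indivisible atomic counterpart:

#include <atomic>

int syncToken = 0;
std::atomic<int> atomicToken{0};

void increment_plain(){
    // what syncToken++ really does, spelled out:
    int tmp = syncToken; // 1. load
    tmp = tmp + 1;       // 2. modify
    syncToken = tmp;     // 3. store - another thread can interleave between any step
}

void increment_atomic(){
    atomicToken.fetch_add(1); // one indivisible step (e.g. lock xadd on x86)
}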

What goes wrong

  1. What if the writes to syncToken are not synchronous with the other threads? What if writes to syncToken go to a register and not to memory? In that case the likelihood is that reader_thread() will never execute at all, because it will never see syncToken set. Even though syncToken is a normal global variable, it might only be written back to memory when waitAndReadDataFromSomeResource() is called, or just randomly when register pressure happens to be high enough. But since the writer_thread() function is an infinite while loop and never exits, it is also entirely possible that this never happens. To work around this, syncToken would have to be declared volatile, forcing every write and read to go to memory.

    As other comments/answers mentioned, the possibility of caching may be a problem. But for most architectures, in normal system memory, it would not be. The hardware will, via cache coherency protocols like MESI, ensure that all caches on all processors maintain coherency. If syncToken is written to the L1 cache on processor P1, when P2 tries to access the same location the hardware ensures the dirty cache line from P1 is flushed before P2 loads it. So for normal cache-coherent system memory this is probably "OK".

    However, this scenario is not entirely far-fetched if the writes were to device or I/O memory, where caches and buffers are not automatically synchronized. For example, the PowerPC EIEIO instruction is required to synchronize external bus memory, and PCI posted writes may be buffered by bridges and must be flushed programmatically. If either the vector or syncToken were not stored in normal cache-coherent system memory, this could also cause a synchronization problem.

  2. More realistically, if synchronization isn't the problem, then re-ordering by the compiler's optimizer will be. The optimizer can decide that since the_vector.push(data) and syncToken = 1 have no dependency, it is free to move the syncToken = 1 first. Obviously that breaks things by allowing reader_thread() to be messing with the vector at the same time as writer_thread().

    Simply declaring syncToken as volatile would not be enough either. Volatile accesses are only guaranteed to be ordered against other volatile accesses, but not between volatile and non-volatile accesses. So unless the vector was also volatile, this will still be a problem. Since vector is probably an STL class, it is not obvious that declaring it volatile would even work.

  3. Presume now that the synchronization issues and the compiler optimizer have been beaten into submission. You review the assembler code and see clearly that everything now appears in the proper order. The final problem is that modern CPUs have a habit of executing instructions out of order. Since there is no dependency between the last instruction in whatever the_vector.push(data) compiles into and syncToken = 1, the processor can decide to do the movl $0x1, syncToken(%rip) before other instructions that are part of the_vector.push(data) have finished - for example, saving the new length field. This is regardless of the order in which the assembly language opcodes appear.

    Normally the CPU knows that instruction #3 depends on the result of instruction #1 so it knows that #3 must be done after #1. Perhaps instruction #2 has no dependency on either and could be before or after either of them. This scheduling occurs dynamically at runtime based on whatever CPU resources are available at the moment.

    What goes wrong is that there is no explicit dependency between the instructions that access the_vector and those that access syncToken. Yet the program still implicitly requires them to be ordered for correct operation. There is no way for the CPU to know this.

    The only way to prevent the reordering is to use a memory fence, barrier, or other synchronizing instruction specific to the particular CPU - see the sketch below. For example, the Intel mfence instruction or the PowerPC sync could be inserted between touching the_vector and syncToken. Just which instruction or sequence of instructions, and where they must be placed, is very specific to the CPU model and situation.
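
To make the fence placement concrete, here is a hedged sketch in portable C++11 (my addition, not part of the original answer): it assumes syncToken is promoted to std::atomic<int> so the flag accesses themselves are data-race free, and it reuses the question's names. std::atomic_thread_fence emits whatever barrier the target needs (a compiler-only barrier on x86 for release/acquire, lwsync/sync on PowerPC):

#include <atomic>
#include <string>
#include <vector>

std::vector<std::string> the_vector;   // as in the question
std::atomic<int> syncToken{0};         // the int flag, promoted to an atomic

void writer_publish(const std::string& data){
    the_vector.push_back(data);
    // release fence: no earlier write may be reordered past the store below
    std::atomic_thread_fence(std::memory_order_release);
    syncToken.store(1, std::memory_order_relaxed);
}

void reader_consume(){
    if(syncToken.load(std::memory_order_relaxed) != 0){
        // acquire fence: no later read may be reordered before the load above
        std::atomic_thread_fence(std::memory_order_acquire);
        // ... safe to process and empty the_vector here
        syncToken.store(0, std::memory_order_release); // hand ownership back
    }
}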

At the end of the day, it would be much easier to use "proper" synchronization primitives. Synchronization library calls also handle placing compiler and CPU barriers in the right places. Furthermore, if you did something like the following, it would perform better and not need to drop data (although the sleep(1) is still dodgy - better to use a condition variable or semaphore):

void reader_thread(){
    while(1){
        MUTEX_LOCK();
        if(the_vector.length()>0){
            std::string data = the_vector.pop();
            MUTEX_UNLOCK();

            // ... process the data
        } else {
            MUTEX_UNLOCK();
        }
        sleep(1);
    }
}

void writer_thread(){
    while(1){
        std::string data = waitAndReadDataFromSomeResource(the_resource);
        MUTEX_LOCK();
        the_vector.push(data);
        MUTEX_UNLOCK();
    }
}
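
For reference, here is a hedged sketch of that same idea in real C++11, with the sleep(1) polling replaced by the condition variable suggested above. waitAndReadDataFromSomeResource and the_resource are the question's hypothetical placeholders, and a std::queue stands in for the vector:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

std::mutex m;
std::condition_variable cv;
std::queue<std::string> the_queue;

void reader_thread(){
    while(1){
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []{ return !the_queue.empty(); }); // blocks instead of polling
        std::string data = std::move(the_queue.front());
        the_queue.pop();
        lock.unlock();
        // ... process the data outside the lock
    }
}

void writer_thread(){
    while(1){
        std::string data = waitAndReadDataFromSomeResource(the_resource);
        {
            std::lock_guard<std::mutex> lock(m);
            the_queue.push(std::move(data));
        } // unlock before notifying so the reader doesn't wake only to block again
        cv.notify_one();
    }
}
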
Horologium answered 5/12, 2015 at 4:7 Comment(0)

The basic problem is you are assuming updates to syncToken are atomic with updates to the vector, which they aren't.

There's no guarantee that on a multi core CPU these two threads won't be running on different cores. And there's no guarantee of the sequence in which memory updates get written to main memory or that cache gets refreshed from main memory.

So when the reader thread sets syncToken to zero, the writer thread may see that change before it sees the changes to the vector's memory. So it could start pushing data onto an out-of-date end of the vector.

Similarly, when you set the token in the writer thread, the reader may start accessing an old version of the contents of the vector. Even more fun: depending on how the vector is implemented, the reader might see the vector header containing an old pointer to the vector's contents.

void reader_thread(){
    while(1){
        if(syncToken!=0){
            while(the_vector.length()>0){
                 // ... process the std::vector 
            }
            syncToken = 0;  // let the writer do its work
        }
        sleep(1);

This sleep will cause a memory flush as it goes to the OS, but there's no guarantee of the order of the memory flush or in which order the writer thread will see it.

    }
}

void writer_thread(){
    while(1){
        std::string data = waitAndReadDataFromSomeResource(the_resource);

This might cause a memory flush. On the other hand it might not.

        if(syncToken==0){
            the_vector.push(data);
            syncToken = 1;  // would syncToken++; be a difference here?
        }
        // drop data in case we couldn't write to the vector
    }
}

Using syncToken++ would (in general) not help, as that performs a read/modify/write, so if the other end happens to be doing a modification at the same time, you could get any sort of result out of it.

To be safe you need to use memory synchronisation or locks to ensure memory gets read/written in the correct order.

In this code, you would need to use a read synchronisation barrier before you read syncToken and a write synchronisation barrier before you write it.

Using the write synchronisation ensures that all memory updates up to that point are visible to main memory before any updates afterwards are - so that the_vector is appropriately updated before syncToken is set to one.

Using the read synchronisation before you read syncToken will ensure that what is in your cache is consistent with main memory.

Generally this can be rather tricky to get right, and you'd be better off using mutexes or semaphores to ensure the synchronisation, unless performance is very critical.

As noted by Anders, the compiler is still free to re-order access to syncToken with accesses to the_vector (if it can determine what these functions do, which with std::vector it probably can) - adding memory barriers will stop this re-ordering. Making syncToken volatile will also stop the reordering, but it won't address the issues with memory coherency on a multicore system, and it won't allow you to safely do read/modify/writes to the same variable from 2 threads.
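
In portable C++11 terms, those read and write barriers come built into atomic loads and stores. A hedged sketch I've added (not part of the original answer), keeping the question's flag protocol but making syncToken an atomic:

#include <atomic>
#include <string>
#include <vector>

std::vector<std::string> the_vector;
std::atomic<int> syncToken{0};

// writer: every vector write happens-before the release store of the flag
void writer_side(const std::string& data){
    if(syncToken.load(std::memory_order_acquire) == 0){
        the_vector.push_back(data);
        syncToken.store(1, std::memory_order_release); // write barrier built in
    }
}

// reader: the acquire load makes the writer's vector updates visible
void reader_side(){
    if(syncToken.load(std::memory_order_acquire) != 0){
        // ... process and empty the_vector
        syncToken.store(0, std::memory_order_release); // hand the vector back
    }
}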

Sunup answered 3/12, 2015 at 10:58 Comment(0)

That program could have worked correctly about 20 years ago. Those days are over and done with and are not likely to come back any time soon. People buy processors that are fast and consume little power. They don't buy the ones that give programmers an easier time writing code like this.

Modern processor design is an exercise in dealing with latency. The most severe latency problem by a long shot is the speed of memory. Typical RAM access time (the affordable kind) hovers around 100 nanoseconds. A modern core can easily execute a thousand instructions in that time. Processors are filled to the brim with tricks to deal with that huge difference.

Power is a problem too: processors cannot simply be made faster anymore. Practical clock speeds topped out at ~3.5 gigahertz. Going faster requires more power and, beyond draining a battery too fast, there's an upper limit to how much heat you can effectively deal with. Having a thumbnail-sized sliver of silicon generate a hundred watts is where it stops being practical. The only other thing processor designers could do to make processors more powerful is add more execution cores, on the theory that you would know how to write code to use them effectively. That requires using threads.

The memory latency problem is addressed by giving the processor caches: local copies of the data in memory, sitting physically close to the execution unit and thus having less latency. Modern cores have 64 KB of L1 cache, the smallest and therefore the closest and therefore the fastest. A bigger and slower L2 cache, typically 256 KB. And a yet bigger and slower L3 cache, typically 4 MB, shared between all the cores on the chip.

The caches still do squat if they don't have a copy of the data stored in the memory location that your program needs. So processors have a prefetcher, a logic circuit that looks ahead in the instruction stream and guesses which locations will be required. In other words, it reads memory before your program uses it.

Another circuit deals with writes, the store buffer. It accepts a write instruction from the execution core so it doesn't have to wait for the physical write to be completed. In other words, it writes memory after your program updates it.

Perhaps you start seeing the bear-trap: when your program reads the syncToken variable, it can get a stale value, one that easily mismatches the logical value. Another core could have updated it a handful of nanoseconds earlier, but your program will not be aware of that, producing a logic error in your code. Very hard to debug, since it so critically depends on timing measured in nanoseconds.
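
To see such stale reads with your own eyes, here is a classic "store buffering" litmus test (my addition, not from the original answer). With relaxed ordering - plain loads and stores, as in the question's code - both threads can read the stale 0, an outcome no interleaving of the source lines can produce:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0}, y{0};

int main(){
    int observed = 0;
    for(int trial = 0; trial < 100000; ++trial){
        x = 0; y = 0;
        int r1 = -1, r2 = -1;
        std::thread a([&]{ x.store(1, std::memory_order_relaxed);
                           r1 = y.load(std::memory_order_relaxed); });
        std::thread b([&]{ y.store(1, std::memory_order_relaxed);
                           r2 = x.load(std::memory_order_relaxed); });
        a.join(); b.join();
        if(r1 == 0 && r2 == 0) ++observed; // both cores read a stale value
    }
    std::cout << "both sides saw stale values in " << observed << " of 100000 runs\n";
}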

Avoiding such undebuggable nasty bugs requires using fences, special instructions that ensure that the memory access is synchronized. They are expensive, they cause the processor to stall. They are wrapped in C++ by std::atomic.

They can however only solve part of the problem; note another undesirable trait of your code. As long as you can't obtain the syncToken, your code spins in the while loop, burning 100% of a core and not getting the job done. That's okay if another thread isn't holding on to the token for too long. It is not okay when it starts to take microseconds. Then you need to get the operating system involved: it needs to put the thread on hold so another thread of another program can get some useful work done. That is wrapped by std::mutex and friends.

Tetanus answered 8/12, 2015 at 13:3 Comment(3)
No need to discuss clock speed, it does not correlate to "speed". Plenty of 2 GHz quad-core chips are faster than 4 GHz octo-core chips. If you must discuss clock speed you must also discuss instructions per clock cycle, and neither of these seems related to multi-threading. All the stuff about latency seems like a novel way to think about it though.Galingale
Meh, why processor vendors made their clock speed problem our programming problem is pretty relevant to this question.Tetanus
Clock speed was relevant to Intel's marketing and little else. Back when the Pentium 4 had high clock speeds they advertised those numbers even though AMD chips were faster with lower clock speeds. Now that my i7 with a 2 GHz clock is faster than my AMD 4.7 GHz chip I see other tactics (core count and nm) from Intel marketing. "Clock speed" is just that, the speed of the clock and little else.Galingale

They say that the reasons such C++ code is not thread-safe are:

  1. The compiler may reorder instructions. (This was not the case here, as you've demonstrated in assembler, but with different compiler settings the reordering might happen. To prevent the reordering, make syncToken volatile.)
  2. The processors' caches may be out of sync: the reader thread's CPU sees the new syncToken, but the old vector.
  3. The processor hardware might reorder the instructions. Also, the assembly instructions might not be atomic; internally they could be a bunch of microcode that in turn could be reordered. That is, the assembly you saw could be different from the actual microcode the CPU executes. So syncToken updaaaatiiing and vector updaaaatiiing could be mixed.

One can prevent all of these by following thread-safe patterns.

On a particular CPU from a particular vendor, with a particular compiler, your code may work fine. It may even work on all platforms that you target. But it is not portable.

Flatter answered 8/12, 2015 at 13:30 Comment(0)

Given

  • that syncToken is of type int and
  • you use syncToken!=0 and syncToken==0 as sync conditions (to say it in your terms) and
  • copy assignments syncToken = 1 and syncToken = 0 to update the sync conditions

the conclusion is

  • no, it is not valid

because

  • syncToken!=0, syncToken==0, syncToken = 1 and syncToken = 0 are not atomic

If you run enough tests you might encounter desynchronized effects in some of them.
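
A small self-contained test in that spirit (my addition), and incidentally the kind of counter demonstration the question asks for: two threads incrementing a plain int lose updates, while a std::atomic never does:

#include <atomic>
#include <iostream>
#include <thread>

int plain = 0;
std::atomic<int> atomic_counter{0};

void hammer(){
    for(int i = 0; i < 1000000; ++i){
        ++plain;          // non-atomic read-modify-write: increments can be lost
        ++atomic_counter; // atomic read-modify-write: never loses one
    }
}

int main(){
    std::thread a(hammer), b(hammer);
    a.join();
    b.join();
    std::cout << "plain:  " << plain << "\n"           // usually well below 2000000
              << "atomic: " << atomic_counter << "\n"; // always exactly 2000000
}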

C++ provides facilities in the standard library to deal with threads, mutexes, tasks, etc. I recommend reading up on those. You are likely to find simple examples on the internet.


In your case (I think fairly similar) you could refer to this answer: https://mcmap.net/q/903619/-two-threads-using-a-same-variable

Nighttime answered 30/11, 2015 at 12:24 Comment(4)
I've made the mistake of just asking a yes/no question, which maybe you answered correctly, but I don't understand why, and I already know everything regarding synchronization that you noted. So that's not what I intended to ask - it was my fault for not asking well. Maybe you can read the update to my question and provide a deeper answer? Thanks for your time!Transcendental
I see. Can you tell us what waitAndReadDataFromSomeResource exactly does? Does it wait for IO to be ready and just reads the data? Or does it take into account the global variable sync to wait?Nighttime
I've added another update. syncToken shall only be used by the threads in the given way. For the waitAndRead... function I would answer: it just calls e.g. read() on some socket descriptor or such. BUT if you can/need to distinguish cases where the function does or does not cause problems although not using syncToken, I would be interested in those cases too.Transcendental
I assume in the reader thread's while loop you also break after having processed the vector. Otherwise you loop infinitely I think.Nighttime

This type of synchronization is not the correct way. For example, to test the condition syncToken==0 the CPU might execute more than one assembly language instruction in series:

MOV DX, @syncToken
CMP DX, 00   ; Compare the DX value with zero
JE  L7       ; If yes, then jump to label L7

Similarly, to change the value of the syncToken variable the CPU might execute more than one assembly language instruction in series.

In the case of multithreading, the operating system may pre-empt (context switch) threads during execution.

Now let's consider that Thread A is evaluating the condition syncToken==0 and the OS switches the context as indicated below:

assembly lang instr 1
assembly lang instr 2
    Context switch to Thread B
assembly lang instr 3
assembly lang instr 4

And Thread B is executing the assignment syncToken=1 and the OS switches the context as indicated below:

assembly lang instr 1
assembly lang instr 2
assembly lang instr 3
    Context switch to Thread A
assembly lang instr 4

In this case the accesses to the syncToken variable may overlap, which will cause a problem.

Even if you make the syncToken variable atomic and continue with this approach, it is not good for performance.

Hence, I would suggest using a mutex for synchronization. Or, depending on the use case, you can go for a reader-writer lock, as sketched below.
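
For illustration, a hedged sketch of the reader-writer lock idea using std::shared_mutex (my addition; std::shared_mutex is C++17 - Boost's shared_mutex provided the same at the time of this thread):

#include <shared_mutex>
#include <string>
#include <vector>

std::shared_mutex rw;
std::vector<std::string> the_vector;

// readers take a shared lock: they do not block each other
std::size_t current_length(){
    std::shared_lock<std::shared_mutex> lock(rw);
    return the_vector.size();
}

// the writer takes an exclusive lock: it blocks all readers
void append(const std::string& data){
    std::unique_lock<std::shared_mutex> lock(rw);
    the_vector.push_back(data);
}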

Subbase answered 30/11, 2015 at 12:48 Comment(1)
Thanks for your example, but I don't understand the very end of your argument, where you state "which will cause a problem". So my further question to you is exactly this point: how can it cause a problem in my given example? I've also updated my question to explain why I don't see the code behaving wrongly, in the hope you can see my lack of understanding better. Many thanks for your time!Transcendental

You assume that the value of syncToken is written to and read from memory every time you change it or read it. It is not. It is cached in the CPU and may not be written to memory.

If you consider this, the writer thread would think that syncToken is 1 (since it set it that way) and the reader thread would think that syncToken is 0 (since it set it that way), and neither will make progress until the CPU cache is flushed (which could take forever, who knows).

Defining it as volatile/atomic/interlocked would prevent this caching effect and cause your code to run the way you intended it to.

Edit:

Another thing you should consider is what happens to your code with out-of-order-execution. I could write about it myself but this answer covers it: Handling out of order execution

So, pitfall 1 is that the threads might stop working at some point, and pitfall 2 is that out-of-order execution might cause syncToken to be updated prematurely.

I would recommend using a Boost lock-free queue for such tasks, as sketched below.
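
A hedged sketch of that suggestion (my addition), assuming Boost.Lockfree's single-producer/single-consumer queue, which matches the one-reader/one-writer setup of the question:

#include <boost/lockfree/spsc_queue.hpp>
#include <string>

// fixed-capacity wait-free ring buffer for exactly one producer and one consumer
boost::lockfree::spsc_queue<std::string, boost::lockfree::capacity<1024>> queue;

// writer thread: push returns false when the buffer is full
// (the data is dropped, just like in the question's original design)
bool produce(const std::string& data){
    return queue.push(data);
}

// reader thread: pop returns false when the buffer is empty
void consume_all(){
    std::string data;
    while(queue.pop(data)){
        // ... process data
    }
}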

Bach answered 3/12, 2015 at 10:21 Comment(0)
