Can two sequential assignment statements in C be executed on hardware out of order?

Given the following C program:

static char vals[ 2 ] = {0, 0};

int main() {

char *a = &vals[0];
char *b = &vals[1];

while( 1 ) {

    SOME_STUFF();

    // non-atomic operations in critical section
    if( SOME_CONDITION() )
        {
        *a = 1;
        *b = 2;
        }
    else
        {
        *a = 0;
        *b = 0;
        }


    SOME_OTHER_STUFF();

    }

return 0;
}

int async_interrupt( void ) {

PRINT( a );
PRINT( b );
}

Is it possible for the hardware to actually load the value 2 into the memory location &vals[1] first, such that an interrupt routine could execute and see vals[1] == 2 and vals[0] == 0?

If this is possible, any description of the load/store operations that would result in this scenario would be much appreciated.

EDIT 1: Added a little more context to the code section. Unfortunately, I don't have the machine code from the compiled source.

Futile answered 19/11, 2018 at 22:50 Comment(3)
Yes; it's also possible for the program to be optimized to int main() { return 0; } since it has no observable behaviour. - Kempf
It would improve the question to post an example of the "interrupt routine" you ask about. It would be undefined behaviour if an interrupt routine tried to access a or b, generally speaking, so this question might be moot. Also, some platforms provide stronger guarantees for interrupt routines than Standard C does. - Kempf
Also UB if the interrupt routine accessed vals[0] or vals[1]. (a and b are locals with automatic storage, so there's no good way for an interrupt to get them. Not sure what the point of them is.) - Underpainting

Yes, it is possible because the compiler might re-order those statements as described in Peter's answer.

However, you might still be wondering about the other half: what hardware can do. Under the assumption that your stores end up in the assembly in the order you show in your source [1], if an interrupt occurs on the same CPU that is running this code, from within the interrupt you'll see everything in a consistent order. That is, from within the interrupt handler, you'll never see the second store having completed but the first not. The only scenarios you'll see are: neither store completed, both completed, or the first completed and the second not.

If multiple cores are involved, and the interrupt may run on a different core, then you simply have the classic cross-thread sharing scenario, whether it is an interrupt or not, and what the other core can observe depends on the hardware memory model. For example, on the relatively strongly ordered x86, you would always observe the stores in order, whereas on the more weakly ordered ARM or POWER memory models you could see the stores out of order.

In general, however, the CPU may be doing all sorts of reordering: the ordering you see within an interrupt handler is a special case where the CPU will restore the appearance of sequential execution at the point of handling the interrupt. The same is true of any case where a thread observes its own stores. However, when stores are observed by a different thread - what happens then depends on the hardware memory model, which varies a lot between architectures.


[1] Assuming also that they show up separately: there is nothing stopping a smart compiler from noticing you are assigning to adjacent values in memory and hence transforming the two stores into a single wider one. Most compilers can do this in at least some scenarios.
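
A minimal sketch of the same-core case, assuming a hosted C environment where a signal stands in for the interrupt. The names `store_pair`, `on_signal`, `seen`, and `install_handler` are hypothetical and do not come from the question. Declaring the shared array `volatile sig_atomic_t` makes each access a side effect the compiler has to keep, in source order relative to other volatile accesses, so the two stores can't be swapped or merged at compile time. It says nothing about what a different core would observe.

```c
#include <signal.h>

/* Shared state. volatile sig_atomic_t is the type the C standard allows a
   signal handler to access; volatile also keeps the compiler from
   reordering or merging the two stores in store_pair(). */
static volatile sig_atomic_t vals[2] = {0, 0};

/* Snapshot taken by the handler (hypothetical helper for this sketch). */
static volatile sig_atomic_t seen[2] = {-1, -1};

/* Stand-in for the question's interrupt routine: copy out both values. */
static void on_signal(int signo)
{
    (void)signo;
    seen[0] = vals[0];
    seen[1] = vals[1];
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_handler(void)
{
    return signal(SIGINT, on_signal) == SIG_ERR ? -1 : 0;
}

/* The two stores from the question, issued in source order. */
void store_pair(int first, int second)
{
    vals[0] = first;
    vals[1] = second;
}
```

Calling `install_handler()`, then `store_pair(1, 2)`, then `raise(SIGINT)` runs the handler synchronously on the same thread, so it only demonstrates the plumbing. A real asynchronous signal landing between the two stores could still see the first store without the second.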

Pegram answered 20/11, 2018 at 3:3 Comment(7)
I believe this is what I'm looking for: "from within the interrupt handler, you'll never see the second store having completed, but the first not." However, with my limited understanding of architectures, I was thinking that the second store could possibly happen first due to some hardware nuances. Maybe that isn't the case. - Futile
@leo - an interrupt that actually interrupts the code in question (running on the same CPU that the code in question was running on) will always see a consistent view of the stores, just like code running on the CPU will see its stores in source order. If another CPU is involved, then interrupt code (or any code really) running on that second CPU concurrently with the code doing the stores may see them out of order, depending on the hardware memory model. - Pegram
@Rescissory - I'm not quite sure what you are referring to. If multiple threads are used and there is more than one CPU, and interrupts are not involved, then sure you can also see an inconsistent store order, depending on the hardware. Maybe you could clarify your question or create a separate one. - Pegram
@Rescissory - I'm considering the case where an interrupt occurs as specified by the OP and shown in their main() method. So I assume there are two execution contexts: the normal user context (single thread) running the stores, and the interrupt context, which may occur on the same CPU or a different CPU as the other context. When you ask about multiple threads, do you mean multiple threads running the store method? Multiple overlapping interrupts? It is not clear to me, since the original question doesn't obviously involve threads. - Pegram
@Pegram So you are saying that without any thread creation, the interrupt can cause the signal handler to run in a different execution context, on a different CPU/core, as if by another thread? That's new to me and strange. - Rescissory
@Rescissory - no, I'm not trying to say that (systems that I'm aware of will deliver the signal to the only thread if there is only one). In fact, I'm trying to more or less sidestep the whole issue of signal handling semantics, and answer the question as asked, with any necessary caveats. Note that the OP didn't even talk about signals, but simply "interrupts". I don't know what type of system they are using, or how these interrupts are delivered. Originally I assumed the interrupt would actually interrupt the code in question, and I wrote my answer that way (as in "you'll be fine"). - Pegram
However, it occurred to me that this was incomplete: in the scenario where the interrupt runs concurrently with the code doing the stores, it could certainly see them out of order. Multiple threads is an easy way to get interrupts (e.g., signals) concurrent with other code, but I doubt it is the only way. If you are interested in digging further into interrupt or signal handler semantics, I recommend another question where you present your specific query and include details of the OS and hardware that interest you. - Pegram

C doesn't run on hardware directly. It has to be compiled first.

The specifics of undefined behaviour (like unsynchronized reads of non-atomic variables) totally depend on the implementation (including compile-time reordering in the compiler and, depending on the target CPU architecture, the runtime reordering rules of that ISA).

Reads/writes of non-atomic variables are not considered an observable side-effect in C or C++, so they can be optimized away and reordered up to the limit of preserving the behaviour of the program as a whole (except when the program has undefined behaviour: optimizations can do anything in that case, even if the compiler can't "see" that there will be UB when it's compiling).

See also https://preshing.com/20120625/memory-ordering-at-compile-time/
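
For the case where the reader may run on another core, here is a hedged sketch using C11 `<stdatomic.h>`, assuming the compiler supports it and that `_Atomic char` is lock-free on the target. `set_pair` and `read_pair` are hypothetical names, not from the question. The release store to `vals[1]`, paired with the acquire load, means a reader that sees the new `vals[1]` also sees the `vals[0]` written just before it. It does not make the pair update as one unit: a reader can still pick up a `vals[0]` from a later update.

```c
#include <stdatomic.h>

/* Shared between the writer and an asynchronous reader. */
static _Atomic char vals[2] = {0, 0};

/* Writer: store vals[0] first, then publish vals[1] with release ordering
   so the earlier store is visible to anyone who observes the later one. */
void set_pair(char first, char second)
{
    atomic_store_explicit(&vals[0], first, memory_order_relaxed);
    atomic_store_explicit(&vals[1], second, memory_order_release);
}

/* Reader: acquire-load vals[1] first, then read vals[0]. The load order is
   the reverse of the store order on purpose. */
void read_pair(char *first, char *second)
{
    *second = atomic_load_explicit(&vals[1], memory_order_acquire);
    *first  = atomic_load_explicit(&vals[0], memory_order_relaxed);
}
```

With this pairing, the outcome the question asks about (the second store visible, the first one from the same update not) is ruled out by the C11 memory model instead of depending on what a particular compiler or CPU happens to do.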

Underpainting answered 19/11, 2018 at 23:7 Comment(8)
As I commented in the thread under your answer: C only guarantees that causality applies within a single thread. The OP wants to read vals[0] and vals[1] from an interrupt handler, which runs asynchronously from the main thread so C doesn't guarantee anything about what it will find if it reads vals[0..1] without synchronization, and without that array being _Atomic. The whole point of _Atomic is to guarantee causality in cases like this that aren't synchronous single-threaded execution. Your answer arguing based on causality for non-atomic C variables is misleading at best. - Underpainting
@EdwinBuck: Was about to reply in the thread under your answer. I think you missed that the OP really did say "such that an interrupt routine could ...". So yes, we are talking about a case that goes outside the bounds of what C's as-if rule requires any code-transformations to preserve. Since you deleted your answer, I'm guessing you noticed that in the question now :) - Underpainting
Peter, if we assume that the C is compiled and linked such that the machine code is not optimized, do you know of any hardware nuances that could have somehow reordered the loading/storing of those values, such that the interrupt sees the second assigned, but not the first? - Futile
@leo1: assuming a compiler like GCC where un-optimized means all variables are treated similarly to volatile, then for an interrupt on the same core that was running the main thread, no, not on a normal mainstream CPU architecture. That's kind of pointless and un-interesting, though. You'd never want to use un-optimized code in production. The Mill CPU architecture has stores that don't become visible (even to itself) for multiple cycles, allowing explicit parallelism, but I can't think of a reason why a compiler would use a longer delay for the first store in fully un-optimized code. - Underpainting
@PeterCordes, there are many reasons for using non-optimized code in production (e.g. safety critical flight software, automotive software, any environment that doesn't desire to introduce compiler bugs). But, that's beside the point. - Futile
@EdwinBuck "thus they are required to be translated into code segments which preserve the ordering" What is a "code segment"? When does it start? - Rescissory
@curiousguy: I assume he meant "basic blocks", or just "blocks"/chunks of asm, like the definition for a whole function. Note that Edwin's deleted his misleading answer after I replied (but not the comment), so I don't think we need to pick at it any further. - Underpainting
Comment deleted to assist in clarity. - Scratchboard
