How to demonstrate Java instruction reordering problems?

Asked 4/10, 2018 at 14:6 Answered 13/10, 2018 at 15:23

Solved java multithreading compiler-optimization instruction-reordering

With Java instruction reordering the execution order of the code is changed by the JVM at compile time or run time, possibly causing unrelated statements to be executed out-of-order.

Edit: [Instruction reordering can produce counter-intuitive results. Many CPU architectures can reorder the memory interactions of machine instructions which leads to similar unexpected results even if the compiler did not change the instruction order. Thus, the term memory reordering may be a better fit than instruction reordering.]

So my question is:

Can someone provide an example Java program/snippet, that reliably shows an instruction reordering problem, that is not caused also by other synchronization issues ( such as caching/visibility or non-atomic r/w, as in my failed attempt at such a demo in my previous question )

To emphasize, I am not looking for examples of theoretical reordering issues. What I am looking for is a way to actually demonstrate them by seeing incorrect or unexpected results of a running program.

Barring a faulty behavior example, just showing actual reordering happening in the assembly of a simple program could also be nice.

Pea answered 4/10, 2018 at 14:6 Comment(19)

But that would be a bug, and probably fixed in the next release when discovered. Are you looking for historical issue reports, or open tickets or such? Or do you mean to say that this is a situation that can just happen and is not going to be resolved? – Sullivan 4/10, 2018 at 14:12

It wouldn't be a bug. Instruction reordering can be visible from other threads if proper synchronization/memory barriers aren't in place. – Niblick 4/10, 2018 at 14:13

Well, OP explicitly excluded undefined behaviour due to insufficient thread synchronization. Modulo that, it's a bug, no? – Sullivan 4/10, 2018 at 14:14

And you are saying you can get undefined behaviour due to re-ordering even when thread synchronization has been applied properly? Do you have a reference for that? – Sullivan 4/10, 2018 at 14:17

that would be tremendously hard to show IMO on x86, but a very nice question – Nelly 4/10, 2018 at 14:22

@Sullivan I take it that OP wants an example where improper synchronization triggers a problem caused specifically by instruction reordering, not one caused by non-atomicity of reads or some other synchronization issue. There are many specific reasons improper synchronization can be a problem; they're interested in this particular one. – Niblick 4/10, 2018 at 14:22

Do you asking for an example which shows this? – Bijouterie 4/10, 2018 at 14:27

A good answer might show bytecode with reordered instructions that is buggy, yet would in fact work if the instructions were in their original order. A more difficult demonstration could inspect the JIT's native code output and point out a reordering optimization it performs that causes failure. – Niblick 4/10, 2018 at 14:28

How do you expect to see instruction reordering happening in the bytecode? It's something done by the JIT compiler. javac does not perform any instruction reordering. – Thrall 10/10, 2018 at 17:38

I found an example but for C++. Maybe you can translate it to Java: preshing.com/20120515/memory-reordering-caught-in-the-act – Landscape 10/10, 2018 at 17:55

Possible duplicate - #35883854 – Footway 12/10, 2018 at 9:0

It sounds like you're actually worried about memory reordering. On non-x86 ISAs, memory ordering != asm program order, so instruction order is not really relevant to correctness without also having memory barriers or e.g. ARM64 stlr / ldar release-store / acquire-load instructions. And even on x86, StoreLoad reordering happens, so you don't get sequentially-consistent execution. Anyway, if lack of instruction reordering was all that was providing correctness on x86 without having told the JVM about the desired memory ordering, your code is broken on ARM. – Ungley 13/10, 2021 at 23:0

Even an in-order pipeline can create LoadStore and LoadLoad reordering with scoreboarding of loads and hit-under-miss cache: How is load->store reordering possible with in-order commit?. And basically all CPUs have StoreLoad reordering, although that's not a problem for acq/rel sync. (This is memory reordering relative to asm instruction program order and/or execution order, not compile-time reordering of source vs. asm, but my point is that compile-time ordering is just one of many things that are necessary for correct / desired memory ordering) – Ungley 13/10, 2021 at 23:2

@PeterCordes Much of what you wrote is a bit too technical for me. Please take a look at my edit, and see if I've captured the gist of what you are saying correctly. – Pea 14/10, 2021 at 8:21

Yeah, that's basically the point. In some ways that significantly changes the question, but if you want a memory-reordering demo that works even on x86, compile-time reordering (including JIT) is only way to break most uses of lock-free inter-thread communication (because usually acquire/release synchronization is all that's needed, and x86 does that for free, just based on asm program order). C++ How is release-and-acquire achieved on x86 only using MOV? – Ungley 14/10, 2021 at 8:49

The other major way to see memory reordering on x86 is with something that would demo StoreLoad reordering, like preshing.com/20120515/memory-reordering-caught-in-the-act, which x86's asm memory model allows (i.e. at run-time for a fixed asm instruction order). In that case asm instruction order is not sufficient. (Most other ISAs allow that and more: preshing.com/20120930/weak-vs-strong-memory-models). Your previous wording ruled that out, and gave the impression that asm itself would be sequentially consistent, making compile-time reodering sound like the only problem. – Ungley 14/10, 2021 at 8:56

Note that store forwarding means a thread always sees it's own stores in program order, i.e. CPUs (and compilers) must not break single-threaded code. So StoreLoad reordering isn't the only effect of the store-buffer + store forwarding that's part of CPU memory models, including x86's. Java may not let you type-pun like this, but if you do for example an int load that overlaps with a recent char store, you can potentially load a value that no other core can see. (Or will ever see if another thread's int or char store also overlaps that and becomes globally visible first.) – Ungley 14/10, 2021 at 9:4

Globally Invisible load instructions. So it's not as simple as just "x86 allows StoreLoad reordering". Not that all this x86 detail is really relevant, mostly just bringing this up because you accepted an x86-specific answer. That answer would show reordering on most ARM or AArch64 CPU even if the asm loads/stores matched Java source order. But that's fine, running on x86 and seeing acq/rel sync broken (and checking the asm) does demo what you asked for: instruction reordering by whichever JIT was used. – Ungley 14/10, 2021 at 9:7

TL:DR: as a "how does it break" question, this is fine. But just be aware that x86 is pretty strongly ordered, and can make things happen to work (depending on lack of compile-time reordering) when they're not safe in the Java memory model, and will break on other ISAs. So understanding when instruction reordering happens or not isn't relevant for reasoning about correctness of portable programs, only trying to hack cheap x86-only equivalents for C++ memory_order_release / memory_order_acquire in Java which only provides expensive seq_cst via volatile. (Unless that's changed) – Ungley 14/10, 2021 at 9:9

This demonstrates reordering of certain assignments, out of 1M iterations there is usually couple of printed lines.

public class App {

    public static void main(String[] args) {

        for (int i = 0; i < 1000_000; i++) {
            final State state = new State();

            // a = 0, b = 0, c = 0

            // Write values
            new Thread(() -> {
                state.a = 1;
                // a = 1, b = 0, c = 0
                state.b = 1;
                // a = 1, b = 1, c = 0
                state.c = state.a + 1;
                // a = 1, b = 1, c = 2
            }).start();

            // Read values - this should never happen, right?
            new Thread(() -> {
                // copy in reverse order so if we see some invalid state we know this is caused by reordering and not by a race condition in reads/writes
                // we don't know if the reordered statements are the writes or reads (we will se it is writes later)
                int tmpC = state.c;
                int tmpB = state.b;
                int tmpA = state.a;

                if (tmpB == 1 && tmpA == 0) {
                    System.out.println("Hey wtf!! b == 1 && a == 0");
                }
                if (tmpC == 2 && tmpB == 0) {
                    System.out.println("Hey wtf!! c == 2 && b == 0");
                }
                if (tmpC == 2 && tmpA == 0) {
                    System.out.println("Hey wtf!! c == 2 && a == 0");
                }
            }).start();

        }
        System.out.println("done");
    }

    static class State {
        int a = 0;
        int b = 0;
        int c = 0;
    }

}

Printing the assembly for the write lambda gets this output (among other..)

                                                ; {metadata('com/example/App$$Lambda$1')}
  0x00007f73b51a0100: 752b                jne       7f73b51a012dh
                                                ;*invokeinterface run
                                                ; - java.lang.Thread::run@11 (line 748)

  0x00007f73b51a0102: 458b530c            mov       r10d,dword ptr [r11+0ch]
                                                ;*getfield arg$1
                                                ; - com.example.App$$Lambda$1/1831932724::run@1
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0106: 43c744d41402000000  mov       dword ptr [r12+r10*8+14h],2h
                                                ;*putfield c
                                                ; - com.example.App::lambda$main$0@17 (line 18)
                                                ; - com.example.App$$Lambda$1/1831932724::run@4
                                                ; - java.lang.Thread::run@-1 (line 747)
                                                ; implicit exception: dispatches to 0x00007f73b51a01b5
  0x00007f73b51a010f: 43c744d40c01000000  mov       dword ptr [r12+r10*8+0ch],1h
                                                ;*putfield a
                                                ; - com.example.App::lambda$main$0@2 (line 14)
                                                ; - com.example.App$$Lambda$1/1831932724::run@4
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0118: 43c744d41001000000  mov       dword ptr [r12+r10*8+10h],1h
                                                ;*synchronization entry
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0121: 4883c420            add       rsp,20h
  0x00007f73b51a0125: 5d                  pop       rbp
  0x00007f73b51a0126: 8505d41eb016        test      dword ptr [7f73cbca2000h],eax
                                                ;   {poll_return}
  0x00007f73b51a012c: c3                  ret
  0x00007f73b51a012d: 4181f885f900f8      cmp       r8d,0f800f985h

I am not sure why the last mov dword ptr [r12+r10*8+10h],1h is not marked with putfield b and line 16, but you can see the swapped assignment of b and c (c right after a).

EDIT: Because writes happen in order a,b,c and reads happen in reverse order c,b,a you should never see an invalid state unless the writes (or reads) are reordered.

Writes performed by single cpu (or core) are visible in same order by all processors, see e.g. this answer, which points to Intel System Programming Guide Volume 3 section 8.2.2.

Writes by a single processor are observed in the same order by all processors.

Palpable answered 10/10, 2018 at 20:31 Comment(11)

I don't mean to be difficult, since it really seems from the assembly that the writes are reordered, but how do you rule out, for example, that c == 2 and b == 0 is not caused by b not being updated from thread cache to main memory? – Pea 11/10, 2018 at 6:36

State properties are neither volatile nor Atomic. And there is no happens-before relation between writes in one thread and reads in another. So the fact that sometimes results is different could be cause by improper synchronization, no? – Weider 11/10, 2018 at 14:38

@Weider because there is no happens-before relation between reads and writes the jit compiler is allowed to reorder the instructions. You would never get the weird result without instruction reordering and it is not a case of normal race condition as with OP's attempt in linked question. – Leveille 11/10, 2018 at 14:49

@frant.hartm 10x. Very nice. – Pea 12/10, 2018 at 19:18

@GonenI Main memory? What do you mean? – Adlar 14/5, 2019 at 17:24

@GonenI: x86, like all other ISAs where we run threads across multiple cores, guarantees coherent data caches. Other cores don't read directly from main memory, they read through caches which are part of the same coherency domain as the core running the code that stored. You're mixing up compilers choosing to keep values in private registers vs. actual stores committing to shared cache. – Ungley 13/10, 2021 at 23:10

(x86 guarantees stores and loads become visible to other cores in asm program order, with the stores delayed by a store buffer w/ store-forwarding. Actual CPUs maintain that illusion even with OoO exec. Other ISAs don't; only on x86 can you get acquire/release synchronization just by ordering normal load and store instructions, without using special loads/stores or barriers. If there hadn't been instruction reordering here, this demo would only happen to work on x86, not ARM, PowerPC or other ISAs.) – Ungley 13/10, 2021 at 23:15

@PeterCordes You've written quite a lot in the comments here, and I thought it might be more appropriate for a new question so I posted one. If you think it is not, feel free to close my new question. – Pea 14/10, 2021 at 10:18

tmpB == 1 && tmpA == 0 doesn't necessarily happen due to reordering. It could be possible for the write thread to only flush the b variable value to the main memory while not doing the same for the a variable value. The same applies for other checks. – Labialized 28/10, 2022 at 13:7

@PiotrMichalczyk no, this is not possible on Intel architecture. All writes are observed in order. – Leveille 29/10, 2022 at 11:49

@FrantišekHartman right but the question doesn't ask about the specific architecture nor the Java Memory Model is coupled only to the Intel architecture – Labialized 31/10, 2022 at 10:2

Test

I wrote a JUnit 5 test that checks whether instruction reordering took place after two threads terminate.

The test must pass if no instruction reordering happened.
The test must fail if instruction reordering occurred.

‌

public class InstructionReorderingTest {

    static int x, y, a, b;

    @org.junit.jupiter.api.BeforeEach
    public void init() {
        x = y = a = b = 0;
    }

    @org.junit.jupiter.api.Test
    public void test() throws InterruptedException {
        Thread threadA = new Thread(() -> {
            a = 1;
            x = b;
        });
        Thread threadB = new Thread(() -> {
            b = 1;
            y = a;
        });

        threadA.start();
        threadB.start();

        threadA.join();
        threadB.join();

        org.junit.jupiter.api.Assertions.assertFalse(x == 0 && y == 0);
    }

}

Results

I ran the test until it fails several times. The results are as follows:

InstructionReorderingTest.test [*] (12s 222ms): 29144 total, 1 failed, 29143 passed.
InstructionReorderingTest.test [*] (26s 678ms): 69513 total, 1 failed, 69512 passed.
InstructionReorderingTest.test [*] (12s 161ms): 27878 total, 1 failed, 27877 passed.

Explanation

The results we expect are

x = 0, y = 1: threadA runs to completion before threadB starts.
x = 1, y = 0: threadB runs to completion before threadA starts.
x = 1, y = 1: their instructions are interleaved.

No one can expect x = 0, y = 0, which may happen as the test results showed.

The actions in each thread have no dataflow dependence on each other, and accordingly can be executed out of order. (Even if they are executed in order, the timing by which caches are flushed to main memory can make it appear, from the perspective of threadB, that the assignments in threadA occurred in the opposite order.)

_{Java Concurrency in Practice, Brian Goetz}

Oster answered 12/10, 2018 at 8:44 Comment(8)

I think the assert inspires more confidence this way: Assert.assertFalse(x == 0 && y == 0 && a == 1 && b == 1); // Just to make sure some other weirdness is not happening – Pea 12/10, 2018 at 19:23

Thanks for your answer. I modified it to use a loop and dropped JUnit, and it eventually it fails on my machine as well ( though at times it passes 10 mil iteratrions ). I did wonder how to rule out an unflushed cache as the reason, but this gee.cs.oswego.edu/dl/cpj/jmm.html states "if one thread synchronizes on the termination of another thread using Thread.join, then it is guaranteed to see the effects made by that thread" – Pea 12/10, 2018 at 19:48

Althogh it looks great, it is an example of concurrency and threading, NOT instruction reordering... Those are completely different things. Reorder takes place in a single code block and therefore in a single thread not across multiple ones. – Beverleybeverlie 13/10, 2018 at 6:3

@Mark if you look at the figure, you might notice the reordering of a = 1 and x = b in threadA which is "a single code block" – Oster 13/10, 2018 at 8:3

@Mark and we need a multithreaded context to have different perspectives on the code where reordering may occur potentially – Oster 13/10, 2018 at 8:8

@Andrew hm right.. I was not getting your point as the x=1 and y=1 possbility is missing from the explanation part... Have my upvote, but I do not know the necessity of the multithreadning part. In case of the usage of the Unsafe class it might not has to have a multithread environment. – Beverleybeverlie 13/10, 2018 at 8:17

@Andrew correct me if I am wrong but this is a bug. If the reordering must not have any consequences in the output of the running program, this reproduction shows a problem in the specific jvm you are running your code on. – Beverleybeverlie 13/10, 2018 at 8:37

@Mark this reproduction show what might happen due to improper synchronisation. – Oster 14/10, 2018 at 7:39

-1

For single thread executions reordering is not a problem at all, because of Java Memory Model (JMM) (guarantee that any reads actions related to writes are total ordered) and cannot lead to unexpected results.

For concurrent execution, rules are completely different and things gets more complicated to understand (even by providing simple example which will raise even more questions). But even this are totally described by JMM with all corner cases, so, unexpected results also forbidden. Generally, forbidden if all barriers are placed right.

For better understanding reordering I strongly recommend this subject with plenty examples insides.

Reichsmark answered 13/10, 2018 at 15:23 Comment(1)

I think you mean instruction reordering isn't a problem if your code isn't already broken. Yeah, I think that's true; anything inter-thread memory ordering you get from lack of instruction reordering can only help you on a strongly-ordered asm memory model like x86, so such a program that happened to work on x86 (with one JVM JIT version / options) wouldn't work on ARM, MIPS, PowerPC, etc., if you didn't tell the compiler about the necessary memory ordering semantics so it could use special instructions. preshing.com/20120930/weak-vs-strong-memory-models – Ungley 13/10, 2021 at 23:22

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Test

Results

Explanation

Recommended topics

Hot tags