Is there any instruction reordering done by the Hotspot JIT compiler that can be reproduced?
As we know, some JIT compilers are allowed to reorder the steps of object initialization. For example,

someRef = new SomeObject();

can be decomposed into the following steps:

objRef = allocate space for SomeObject; //step1
call constructor of SomeObject;         //step2
someRef = objRef;                    //step3

The JIT compiler may reorder them as follows:

objRef = allocate space for SomeObject; //step1
someRef = objRef;                    //step3
call constructor of SomeObject;         //step2

That is, step2 and step3 can be swapped by the JIT compiler. Even though this is a theoretically valid reordering, I was unable to reproduce it with HotSpot (JDK 1.7) on the x86 platform.
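For concreteness, here is a minimal sketch of the kind of racy program in which this reordering could, in principle, become visible. The names (`UnsafePublicationSketch`, `Holder`, `runTrials`) are made up for illustration and are not taken from any existing test suite. A hand-rolled loop like this is unlikely to hit the race window often, and it cannot tell compiler reordering apart from other causes, so any histogram it prints should be treated as anecdotal.

```java
public class UnsafePublicationSketch {

    // Object whose constructor performs several plain field stores (step2).
    static class MyObject {
        int x00, x01, x02, x03;

        MyObject(int x) {
            x00 = x;
            x01 = x;
            x02 = x;
            x03 = x;
        }
    }

    static class Holder {
        int x = 1;
        MyObject o; // deliberately NOT volatile: this is the unsafe publication

        // Writer side: allocate, construct, publish (step1..step3).
        void publish() {
            o = new MyObject(x);
        }

        // Reader side: -1 if nothing is published yet, otherwise the field sum.
        // A sum below 4 means a partially constructed object was observed.
        int consume() {
            MyObject lo = o;
            return lo == null ? -1 : lo.x00 + lo.x01 + lo.x02 + lo.x03;
        }
    }

    // Races one writer against one reader per trial and counts the outcomes.
    // histogram[0] counts -1, histogram[k + 1] counts a field sum of k.
    static int[] runTrials(int trials) throws InterruptedException {
        int[] histogram = new int[6];
        for (int i = 0; i < trials; i++) {
            Holder h = new Holder();
            int[] seen = new int[1];
            Thread writer = new Thread(h::publish);
            Thread reader = new Thread(() -> seen[0] = h.consume());
            writer.start();
            reader.start();
            writer.join();
            reader.join(); // join() makes seen[0] safely readable here
            histogram[seen[0] + 1]++;
        }
        return histogram;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] histogram = runTrials(2_000);
        for (int i = 0; i < histogram.length; i++) {
            System.out.println("observed " + (i - 1) + ": " + histogram[i]);
        }
    }
}
```

Only the outcomes 0 through 3 would indicate that the reference became visible before all constructor stores did; -1 and 4 are the uninteresting cases.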

So, is there any instruction reordering done by the HotSpot JIT compiler that can be reproduced?


Update: I ran the test on my machine (Linux x86_64, JDK 1.8.0_40, i5-3210M) using the command below:

java -XX:-UseCompressedOops -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand="print org.openjdk.jcstress.tests.unsafe.UnsafePublication::publish" -XX:CompileCommand="inline, org.openjdk.jcstress.tests.unsafe.UnsafePublication::publish" -XX:PrintAssemblyOptions=intel -jar tests-custom/target/jcstress.jar -f -1 -t .*UnsafePublication.* -v > log.txt 

and I can see the tool reported something like:

[1] 5 ACCEPTABLE The object is published, at least 1 field is visible.

This means that an observer thread saw a partially initialized instance of MyObject.

However, I did NOT see the generated assembly code look like @Ivan's:

0x00007f71d4a15e34: mov r11d,DWORD PTR [rbp+0x10] ;getfield x 
0x00007f71d4a15e38: mov DWORD PTR [rax+0x10],r11d ;putfield x00 
0x00007f71d4a15e3c: mov DWORD PTR [rax+0x14],r11d ;putfield x01 
0x00007f71d4a15e40: mov DWORD PTR [rax+0x18],r11d ;putfield x02 
0x00007f71d4a15e44: mov DWORD PTR [rax+0x1c],r11d ;putfield x03 
0x00007f71d4a15e48: mov QWORD PTR [rbp+0x18],rax ;putfield o

There seems to be no compiler reordering here.


Update 2: @Ivan corrected me. I had used the wrong JIT command to capture the assembly code. After fixing this error, I can grab the assembly code below:

0x00007f76012b18d5: mov    DWORD PTR [rax+0x10],ebp  ;*putfield x00
0x00007f76012b18d8: mov    QWORD PTR [r8+0x18],rax  ;*putfield o
                                                ; - org.openjdk.jcstress.tests.unsafe.generated.UnsafePublication_jcstress$Runner_publish::call@94 (line 156)
0x00007f76012b18dc: mov    DWORD PTR [rax+0x1c],ebp  ;*putfield x03

Evidently, the compiler did reorder the stores: the reference is published (putfield o) before the store to x03, which is what makes the unsafe publication observable.
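For contrast, here is a small sketch of two standard ways to rule this reordering out under the Java memory model: declaring the fields `final`, or publishing through a `volatile` reference. The class and method names (`SafePublicationSketch`, `FinalFieldsObject`, `PlainObject`, `publish`, `consumeFinal`, `consumeVolatile`) are invented for illustration; this is not how the jcstress test is written.

```java
public class SafePublicationSketch {

    // Option 1: final fields. A reader that sees a non-null reference is
    // guaranteed to see the values assigned in the constructor, provided
    // "this" does not escape during construction.
    static final class FinalFieldsObject {
        final int x00, x01, x02, x03;

        FinalFieldsObject(int x) {
            x00 = x;
            x01 = x;
            x02 = x;
            x03 = x;
        }

        int sum() {
            return x00 + x01 + x02 + x03;
        }
    }

    // Option 2: plain fields, published through a volatile reference.
    static final class PlainObject {
        int x00, x01, x02, x03;

        PlainObject(int x) {
            x00 = x;
            x01 = x;
            x02 = x;
            x03 = x;
        }

        int sum() {
            return x00 + x01 + x02 + x03;
        }
    }

    static FinalFieldsObject viaFinal;       // plain reference, final fields
    static volatile PlainObject viaVolatile; // volatile reference, plain fields

    static void publish(int x) {
        viaFinal = new FinalFieldsObject(x);
        // The volatile store may not be moved ahead of the constructor's stores.
        viaVolatile = new PlainObject(x);
    }

    // Returns -1 if nothing is published yet, otherwise the field sum.
    static int consumeFinal() {
        FinalFieldsObject lo = viaFinal;
        return lo == null ? -1 : lo.sum();
    }

    static int consumeVolatile() {
        PlainObject lo = viaVolatile;
        return lo == null ? -1 : lo.sum();
    }

    public static void main(String[] args) {
        publish(1);
        System.out.println(consumeFinal() + " " + consumeVolatile());
    }
}
```

With either option, a concurrent reader should only ever observe -1 or the full sum, never a partial one.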

Bal answered 9/3, 2016 at 5:8 Comment(11)
How exactly are you planning to detect that this happens? Any such optimization must surely not be allowed to leak out into "user land".Pasquil
@Pasquil A) by looking at the HotSpot source code? B) the java memory model allows for many reorderings so it's possible that this could be observed from a different thread, if it were happeningIncoming
Generally, method A) is practical, we can observe the assembly code generated by the JIT compiler.Bal
@ErwinBolwidt: Are you saying that Hotspot would allow a different thread to get access to an Object that has not been fully constructed yet?Pasquil
When people mention that possibility, it's often in the context of multithreading. So, then you have to consider not just the order in which things were done on one core (i.e., the instructions) but also the order in which they were published to some other core.Alcot
@Pasquil Yes - unless the two threads are correctly synchronized wrt to the constructed object, it is possible that another thread sees the object in a partially constructed state. Whether that is because of memory visibility (CPU caches etc) or because of re-ordering by the CPU or JIT compiler (or anything else) is not specified (in the JMM section of the JLS). The JVM is not required to protect the program against such apparent reordering unless the threads are correctly synchronized. (Whether you would actually see such reordering in practice is a different matter)Incoming
"Even though this is theoretically invalid reordering, I was unable to reproduce it" -- Did you expect to be able to produce invalid behaviour of the JIT?!Br
I'm confused about why you would care. Is this even a problem? If this question were to be answered, would it be of use to anyone in the future? If this is a problem for you, perhaps you should be using a lower-level language. Java isn't meant to be torn apart and put under a microscope. It gives users a higher-level medium to write software. Unless you were writing your own implementation, I don't see how this information could be useful.Heroin
Of course, the reordering I mentioned IS VALID, sorry for the typo saying it is invalid. I just want to reproduce a reordering done by JIT compiler that CAN affect another thread given a program that is NOT properly synchronized.Bal
@Bal method was inlined and compiled into another method. So use -XX:+PrintAssembly and grep or change compiler hint to something like -XX:CompileCommand="print org.openjdk.jcstress.tests.unsafe.generated.UnsafePublication_jcstress*::call"Volatile
@Ivan is absolutely right. Thank you very much! I will accept your answer tomorrow so that I can award you some reputation.Bal

You can reproduce any compiler reordering. The right question is which tool to use for it. To see a compiler reordering, you have to go down to the assembly level, either with JITWatch (which uses HotSpot's assembly log output) or with JMH and its LinuxPerfAsmProfiler.

Let's consider the following benchmark based on JMH:

public class ReorderingBench {

    public int[] array = new int[] {1 , -1,  1, -1};
    public int sum = 0;

    @Benchmark
    public void reorderGlobal() {
        int[] a = array;
        sum += a[1];
        sum += a[0];
        sum += a[3];
        sum += a[2];
    }

    @Benchmark
    public int reorderLocal() {
        int[] a = array;
        int sum = 0;
        sum += a[1];
        sum += a[0];
        sum += a[3];
        sum += a[2];
        return sum;
    }
}

Note that the array elements are deliberately accessed out of index order. On my machine, the assembly output for the method with the global variable sum is:

mov    0xc(%rcx),%r8d         ;*getfield sum
...
add    0x14(%r12,%r10,8),%r8d ;add a[1]
add    0x10(%r12,%r10,8),%r8d ;add a[0]
add    0x1c(%r12,%r10,8),%r8d ;add a[3]
add    0x18(%r12,%r10,8),%r8d ;add a[2]

but for the method with the local variable sum, the access pattern has changed:

mov    0x10(%r12,%r10,8),%edx ;add a[0] <-- 0(0x10) first
add    0x14(%r12,%r10,8),%edx ;add a[1] <-- 1(0x14) second
add    0x1c(%r12,%r10,8),%edx ;add a[3]
add    0x18(%r12,%r10,8),%edx ;add a[2]

You can also experiment with C1 compiler optimizations, for example c1_RangeCheckElimination.

Update:

It is extremely hard to observe compiler reorderings in isolation from the user's point of view, because you may have to run billions of samples to catch the racy behavior. It is also important to separate compiler effects from hardware effects: weakly ordered hardware such as POWER can change the observed behavior on its own.

Let's start with the right tool: jcstress, an experimental harness and a suite of tests to aid research into the correctness of concurrency support in the JVM, class libraries, and hardware. Here is a reproducer where the instruction scheduler may decide to emit a few field stores, then publish the reference, then emit the rest of the field stores (you can also read about safe publication and instruction scheduling here). In some cases on my machine (Linux x86_64, JDK 1.8.0_60, i5-4300M), the compiler generates the following code:

mov    %edx,0x10(%rax)    ;*putfield x00                    
mov    %edx,0x14(%rax)    ;*putfield x01
mov    %edx,0x18(%rax)    ;*putfield x02
mov    %edx,0x1c(%rax)    ;*putfield x03
...
movb   $0x0,0x0(%r13,%rdx,1)  ;*putfield o

but sometimes:

mov    %ebp,0x10(%rax)    ;*putfield x00
...
mov    %rax,0x18(%r10)    ;*putfield o  <--- publish here
mov    %ebp,0x1c(%rax)    ;*putfield x03
mov    %ebp,0x18(%rax)    ;*putfield x02
mov    %ebp,0x14(%rax)    ;*putfield x01
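To make the effect of the two schedules above concrete, here is a small deterministic sketch. It is a simplification, not a model of the real race: it assumes every field is stored with the value 1 and that the observer reads all four fields at a single instant at some point after the reference `o` has been stored. The class `StoreScheduleSketch` and the method `visibleSums` are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class StoreScheduleSketch {

    /**
     * Walks the stores in emission order and collects every field sum an
     * observer could see once the reference "o" has been stored.
     * Assumption: each field store writes the value 1.
     */
    static TreeSet<Integer> visibleSums(List<String> schedule) {
        TreeSet<Integer> sums = new TreeSet<>();
        int fieldsWritten = 0;
        boolean published = false;
        for (String store : schedule) {
            if (store.equals("o")) {
                published = true;   // the reference becomes visible here
            } else {
                fieldsWritten++;    // one more field holds its final value
            }
            if (published) {
                sums.add(fieldsWritten);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Stores emitted in program order: fields first, then the reference.
        List<String> inOrder = Arrays.asList("x00", "x01", "x02", "x03", "o");
        // Shape of the second listing: the reference is stored after x00.
        List<String> reordered = Arrays.asList("x00", "o", "x03", "x02", "x01");

        System.out.println(visibleSums(inOrder));   // prints [4]
        System.out.println(visibleSums(reordered)); // prints [1, 2, 3, 4]
    }
}
```

Under these assumptions, the in-order schedule can only ever expose a fully initialized object, while the reordered one can expose an object with as few as one field set.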

Update 2:

Regarding the question about performance benefits: in our case this optimization (reordering) does not bring a meaningful performance benefit; it is just a side effect of the compiler's implementation. HotSpot uses a sea-of-nodes graph to model data and control flow (you can read about graph-based intermediate representation here). The following picture shows the IR graph for our example (produced with the -XX:+PrintIdeal -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=graph.xml options plus the Ideal Graph Visualizer):

[image: IR graph for the example]

In the graph, the inputs to a node are the inputs to the node's operation. Each node defines a value based on its inputs and operation, and that value is available on all output edges. The compiler evidently does not distinguish between pointer and integer store nodes, so the only thing that constrains it is a memory barrier. As a result, in order to reduce register pressure, target code size, or something else, the compiler decides to schedule the instructions within the basic block in this order, which looks strange from the user's point of view. You can experiment with instruction scheduling in HotSpot using the following options (available in fastdebug builds): -XX:+StressLCM and -XX:+StressGCM.

Volatile answered 9/3, 2016 at 22:1 Comment(5)
Thanks @Ivan, for reorderLocal, the reordering will NOT affect any other thread since sum is a local variable and the content of array does not change. I think it is more valuable to reproduce a reordering that can affect another thread(of course, here we assume that the program itself is not properly synchronized).Bal
I also ran the test, but the generated assembly code appeared to contain no compiler reordering. For details, see my update to the question.Bal
@Bal this is because you printed the wrong method; use -XX:CompileCommand="print org.openjdk.jcstress.tests.unsafe.generated.UnsafePublication_jcstress*::call" . The constructor and the publish method were inlined and compiled into the call method.Volatile
Could you please explain what performance benefit we can gain from such a reordering (I mean the specific reordering discussed in the question)?Bal
Amazing answer!Oakland
