While writing an answer to another question, I noticed a strange border case for JIT optimization.
The following program is not a "Microbenchmark" and not intended to reliably measure an execution time (as pointed out in the answers to the other question). It is solely intended as an MCVE to reproduce the issue:
class MissedLoopOptimization
{
public static void main(String args[])
{
for (int j=0; j<3; j++)
{
for (int i=0; i<5; i++)
{
long before = System.nanoTime();
runWithMaxValue();
long after = System.nanoTime();
System.out.println("With MAX_VALUE : "+(after-before)/1e6);
}
for (int i=0; i<5; i++)
{
long before = System.nanoTime();
runWithMaxValueMinusOne();
long after = System.nanoTime();
System.out.println("With MAX_VALUE-1 : "+(after-before)/1e6);
}
}
}
private static void runWithMaxValue()
{
final int n = Integer.MAX_VALUE;
int i = 0;
while (i++ < n) {}
}
private static void runWithMaxValueMinusOne()
{
final int n = Integer.MAX_VALUE-1;
int i = 0;
while (i++ < n) {}
}
}
It basically runs the same loop, while (i++ < n){}
, where the limit n
is once set to Integer.MAX_VALUE
, and once to Integer.MAX_VALUE-1
.
When executing this on Win7/64 with JDK 1.7.0_21 and
java -server MissedLoopOptimization
the timing results are as follows:
...
With MAX_VALUE : 1285.227081
With MAX_VALUE : 1274.36311
With MAX_VALUE : 1282.992203
With MAX_VALUE : 1292.88246
With MAX_VALUE : 1280.788994
With MAX_VALUE-1 : 6.96E-4
With MAX_VALUE-1 : 3.48E-4
With MAX_VALUE-1 : 0.0
With MAX_VALUE-1 : 0.0
With MAX_VALUE-1 : 3.48E-4
Obviously, for the case of MAX_VALUE-1
, the JIT does what one could expect: It detects that the loop is useless, and completely eliminates it. However, it does not remove the loop when it is running up to MAX_VALUE
.
This observation is confirmed by a look at the JIT assembly output when starting with
java -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly MissedLoopOptimization
The log contains the following assembly for the method that runs up to MAX_VALUE
:
Decoding compiled method 0x000000000254fa10:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'runWithMaxValue' '()V' in 'MissedLoopOptimization'
# [sp+0x20] (sp of caller)
0x000000000254fb40: sub $0x18,%rsp
0x000000000254fb47: mov %rbp,0x10(%rsp) ;*synchronization entry
; - MissedLoopOptimization::runWithMaxValue@-1 (line 29)
0x000000000254fb4c: mov $0x1,%r11d
0x000000000254fb52: jmp 0x000000000254fb63
0x000000000254fb54: nopl 0x0(%rax,%rax,1)
0x000000000254fb5c: data32 data32 xchg %ax,%ax
0x000000000254fb60: inc %r11d ; OopMap{off=35}
;*goto
; - MissedLoopOptimization::runWithMaxValue@11 (line 30)
0x000000000254fb63: test %eax,-0x241fb69(%rip) # 0x0000000000130000
;*goto
; - MissedLoopOptimization::runWithMaxValue@11 (line 30)
; {poll}
0x000000000254fb69: cmp $0x7fffffff,%r11d
0x000000000254fb70: jl 0x000000000254fb60 ;*if_icmpge
; - MissedLoopOptimization::runWithMaxValue@8 (line 30)
0x000000000254fb72: add $0x10,%rsp
0x000000000254fb76: pop %rbp
0x000000000254fb77: test %eax,-0x241fb7d(%rip) # 0x0000000000130000
; {poll_return}
0x000000000254fb7d: retq
0x000000000254fb7e: hlt
0x000000000254fb7f: hlt
[Exception Handler]
[Stub Code]
0x000000000254fb80: jmpq 0x000000000254e820 ; {no_reloc}
[Deopt Handler Code]
0x000000000254fb85: callq 0x000000000254fb8a
0x000000000254fb8a: subq $0x5,(%rsp)
0x000000000254fb8f: jmpq 0x0000000002528d00 ; {runtime_call}
0x000000000254fb94: hlt
0x000000000254fb95: hlt
0x000000000254fb96: hlt
0x000000000254fb97: hlt
One can clearly see the loop, with the comparison to 0x7fffffff
and the jump back to inc
. In contrast to that, the assembly for the case where it is running up to MAX_VALUE-1
:
Decoding compiled method 0x000000000254f650:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'runWithMaxValueMinusOne' '()V' in 'MissedLoopOptimization'
# [sp+0x20] (sp of caller)
0x000000000254f780: sub $0x18,%rsp
0x000000000254f787: mov %rbp,0x10(%rsp) ;*synchronization entry
; - MissedLoopOptimization::runWithMaxValueMinusOne@-1 (line 36)
0x000000000254f78c: add $0x10,%rsp
0x000000000254f790: pop %rbp
0x000000000254f791: test %eax,-0x241f797(%rip) # 0x0000000000130000
; {poll_return}
0x000000000254f797: retq
0x000000000254f798: hlt
0x000000000254f799: hlt
0x000000000254f79a: hlt
0x000000000254f79b: hlt
0x000000000254f79c: hlt
0x000000000254f79d: hlt
0x000000000254f79e: hlt
0x000000000254f79f: hlt
[Exception Handler]
[Stub Code]
0x000000000254f7a0: jmpq 0x000000000254e820 ; {no_reloc}
[Deopt Handler Code]
0x000000000254f7a5: callq 0x000000000254f7aa
0x000000000254f7aa: subq $0x5,(%rsp)
0x000000000254f7af: jmpq 0x0000000002528d00 ; {runtime_call}
0x000000000254f7b4: hlt
0x000000000254f7b5: hlt
0x000000000254f7b6: hlt
0x000000000254f7b7: hlt
So my question is: What is so special about the Integer.MAX_VALUE
that prevents the JIT from optimizing it in the same way as it does for Integer.MAX_VALUE-1
? My guess would be that has to do with the cmp
instruction, which is intended for signed arithmetic, but that alone is not really a convincing reason. Can anybody explain this, and maybe even give a pointer to the OpenJDK HotSpot code where this case is treated?
(An aside: I hope that the answer will also explain the different behavior between i++
and ++i
that was asked for in the other question, assuming that the reason for the missing optimization is (obviously) actually caused by the Integer.MAX_VALUE
loop limit)
while (++i <= n) {}
– Manifoldi++<n
,++i<n
,i++<=n
, andn<=++i
etc, but did not systematically compare and document all possible configurations in order to keep the actual question clear. – Giesser