As others have pointed out, the test is flawed in many ways.
You did not tell us exactly how you did this test. However, I tried to implement a "naive" test (no offense) like this:
class PrePostIncrement
{
    public static void main(String args[])
    {
        // Repeat the measurements a few times so that the JIT
        // has a chance to compile and optimize the methods
        for (int j=0; j<3; j++)
        {
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPreIncrement();
                long after = System.nanoTime();
                // Elapsed time, printed in milliseconds
                System.out.println("pre : "+(after-before)/1e6);
            }
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPostIncrement();
                long after = System.nanoTime();
                System.out.println("post : "+(after-before)/1e6);
            }
        }
    }

    private static void runPreIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (++i < n) {}
    }

    private static void runPostIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (i++ < n) {}
    }
}
When running this with default settings, there seems to be a small difference. But the real flaw of the benchmark becomes obvious when you run it with the -server
flag. The results in my case then look something like this:
...
pre : 6.96E-4
pre : 6.96E-4
pre : 0.001044
pre : 3.48E-4
pre : 3.48E-4
post : 1279.734543
post : 1295.989086
post : 1284.654267
post : 1282.349093
post : 1275.204583
Obviously, the pre-increment version has been completely optimized away. The reason is rather simple: The result is not used. It does not matter at all whether the loop is executed or not, so the JIT simply removes it.
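One way to make such a micro-benchmark a little less prone to this kind of dead-code elimination is to make sure that the result of the loop is actually used. The following is only a sketch of that idea (the "...Used" helper methods and the accumulating sum are not part of the original test): each method returns its counter, and main consumes and prints the values so that the JIT cannot simply discard the loops. For serious measurements, a harness like JMH (which provides a Blackhole for exactly this purpose) would still be the better choice.

    private static int runPreIncrementUsed()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (++i < n) {}
        // Returning the counter creates a use of the loop result
        return i;
    }

    private static int runPostIncrementUsed()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (i++ < n) {}
        return i;
    }

    // In main, consume the returned values, e.g.:
    //     sum += runPreIncrementUsed();
    //     ...
    //     System.out.println("sum: " + sum); // prevents the results from being discarded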
This is confirmed by a look at the HotSpot disassembly (obtained with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler library): The pre-increment version results in this code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000055060500} 'runPreIncrement' '()V' in 'PrePostIncrement'
# [sp+0x20] (sp of caller)
0x000000000286fd80: sub $0x18,%rsp
0x000000000286fd87: mov %rbp,0x10(%rsp) ;*synchronization entry
; - PrePostIncrement::runPreIncrement@-1 (line 28)
0x000000000286fd8c: add $0x10,%rsp
0x000000000286fd90: pop %rbp
0x000000000286fd91: test %eax,-0x243fd97(%rip) # 0x0000000000430000
; {poll_return}
0x000000000286fd97: retq
0x000000000286fd98: hlt
0x000000000286fd99: hlt
0x000000000286fd9a: hlt
0x000000000286fd9b: hlt
0x000000000286fd9c: hlt
0x000000000286fd9d: hlt
0x000000000286fd9e: hlt
0x000000000286fd9f: hlt
The post-increment version results in this code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x00000000550605b8} 'runPostIncrement' '()V' in 'PrePostIncrement'
# [sp+0x20] (sp of caller)
0x000000000286d0c0: sub $0x18,%rsp
0x000000000286d0c7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - PrePostIncrement::runPostIncrement@-1 (line 35)
0x000000000286d0cc: mov $0x1,%r11d
0x000000000286d0d2: jmp 0x000000000286d0e3
0x000000000286d0d4: nopl 0x0(%rax,%rax,1)
0x000000000286d0dc: data32 data32 xchg %ax,%ax
0x000000000286d0e0: inc %r11d ; OopMap{off=35}
;*goto
; - PrePostIncrement::runPostIncrement@11 (line 36)
0x000000000286d0e3: test %eax,-0x243d0e9(%rip) # 0x0000000000430000
;*goto
; - PrePostIncrement::runPostIncrement@11 (line 36)
; {poll}
0x000000000286d0e9: cmp $0x7fffffff,%r11d
0x000000000286d0f0: jl 0x000000000286d0e0 ;*if_icmpge
; - PrePostIncrement::runPostIncrement@8 (line 36)
0x000000000286d0f2: add $0x10,%rsp
0x000000000286d0f6: pop %rbp
0x000000000286d0f7: test %eax,-0x243d0fd(%rip) # 0x0000000000430000
; {poll_return}
0x000000000286d0fd: retq
0x000000000286d0fe: hlt
0x000000000286d0ff: hlt
It's not entirely clear to me why it seemingly does not remove the post-increment version as well. (In fact, I'm considering asking this as a separate question.) But at least this explains why you might see differences of an "order of magnitude"...
EDIT: Interestingly, when the upper limit of the loop is changed from Integer.MAX_VALUE to Integer.MAX_VALUE-1, both versions are optimized away and require "zero" time. Somehow this limit (which still appears as 0x7fffffff in the assembly) prevents the optimization. Presumably, this has something to do with the comparison being mapped to a (signed!) cmp instruction, but I cannot give a profound reason beyond that. The JIT works in mysterious ways...
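For reference, the variant described in this EDIT differs only in the loop bound; a sketch of the modified post-increment method:

    private static void runPostIncrement()
    {
        // Only the bound changed: with MAX_VALUE-1, the JIT eliminated both loops
        final int n = Integer.MAX_VALUE - 1;
        int i = 0;
        while (i++ < n) {}
    }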