EDIT: maaartinus gave the answer I was looking for and tmyklebu's data on the problem helped a lot, so thanks both! :)
I've read a bit about how HotSpot has some "intrinsics" that injects in the code, specially for Java standard Math libs (from here)
So I decided to give it a try, to see how much difference HotSpot could make against doing the comparison directly (specially since I've heard min/max can compile to branchless asm).
public class OpsMath {
public static final int max(final int a, final int b) {
if (a > b) {
return a;
}
return b;
}
}
That's my implementation. From another SO question I've read that using the ternary operator uses an extra register, I haven't found significant differences between doing an if block and using a ternary operator (ie, return ( a > b ) ? a : b ).
Allocating a 8Mb int array (ie, 2 million values), and randomizing it, I do the following test:
try ( final Benchmark bench = new Benchmark( "millis to max" ) )
{
int max = Integer.MIN_VALUE;
for ( int i = 0; i < array.length; ++i )
{
max = OpsMath.max( max, array[i] );
// max = Math.max( max, array[i] );
}
}
I'm using a Benchmark object in a try-with-resources block. When it finishes, it calls close() on the object and prints the time the block took to complete. The tests are done separately by commenting in/out the max calls in the code above.
'max' is added to a list outside the benchmark block and printed later, so to avoid the JVM optimizing the whole block away.
The array is randomized each time the test runs.
Running the test 6 times, it gives these results:
Java standard Math:
millis to max 9.242167
millis to max 2.1566199999999998
millis to max 2.046396
millis to max 2.048616
millis to max 2.035761
millis to max 2.001044
So fairly stable after the first run, and running the tests again gives similar results.
OpsMath:
millis to max 8.65418
millis to max 1.161559
millis to max 0.955851
millis to max 0.946642
millis to max 0.994543
millis to max 0.9469069999999999
Again, very stable results after the first run.
The question is: Why? Thats quite a big difference there. And I have no idea why. Even if I implement my max() method exactly like Math.max() (ie, return (a >= b) ? a : b ) I still get better results! It makes no sense.
Specs:
CPU: Intel i5 2500, 3,3Ghz. Java Version: JDK 8 (public march 18 release), x64. Debian Jessie (testing release) x64.
I have yet to try with 32 bit JVM.
EDIT: Self contained test as requested. Added a line to force the JVM to preload Math and OpsMath classes. That eliminates the 18ms cost of the first iteration for OpsMath test.
// Constant nano to millis.
final double TO_MILLIS = 1.0d / 1000000.0d;
// 8Mb alloc.
final int[] array = new int[(8*1024*1024)/4];
// Result and time array.
final ArrayList<Integer> results = new ArrayList<>();
final ArrayList<Double> times = new ArrayList<>();
// Number of tests.
final int itcount = 6;
// Call both Math and OpsMath method so JVM initializes the classes.
System.out.println("initialize classes " +
OpsMath.max( Math.max( 20.0f, array.length ), array.length / 2.0f ));
final Random r = new Random();
for ( int it = 0; it < itcount; ++it )
{
int max = Integer.MIN_VALUE;
// Randomize the array.
for ( int i = 0; i < array.length; ++i )
{
array[i] = r.nextInt();
}
final long start = System.nanoTime();
for ( int i = 0; i < array.length; ++i )
{
max = Math.max( array[i], max );
// OpsMath.max() method implemented as described.
// max = OpsMath.max( array[i], max );
}
// Calc time.
final double end = (System.nanoTime() - start);
// Store results.
times.add( Double.valueOf( end ) );
results.add( Integer.valueOf( max ) );
}
// Print everything.
for ( int i = 0; i < itcount; ++i )
{
System.out.println( "IT" + i + " result: " + results.get( i ) );
System.out.println( "IT" + i + " millis: " + times.get( i ) * TO_MILLIS );
}
Java Math.max result:
IT0 result: 2147477409
IT0 millis: 9.636998
IT1 result: 2147483098
IT1 millis: 1.901314
IT2 result: 2147482877
IT2 millis: 2.095551
IT3 result: 2147483286
IT3 millis: 1.9232859999999998
IT4 result: 2147482828
IT4 millis: 1.9455179999999999
IT5 result: 2147482475
IT5 millis: 1.882047
OpsMath.max result:
IT0 result: 2147482689
IT0 millis: 9.003616
IT1 result: 2147483480
IT1 millis: 0.882421
IT2 result: 2147483186
IT2 millis: 1.079143
IT3 result: 2147478560
IT3 millis: 0.8861169999999999
IT4 result: 2147477851
IT4 millis: 0.916383
IT5 result: 2147481983
IT5 millis: 0.873984
Still the same overall results. I've tried with randomizing the array only once, and repeating the tests over the same array, I get faster results overall, but the same 2x difference between Java Math.max and OpsMath.max.
-XX:+PrintAssembly
. It will save your ass when you get confused by things like this. – ElectromechanicalBenchmark
fromCaliper
(code.google.com/p/caliper)? If not, read this: code.google.com/p/caliper/wiki/JavaMicrobenchmarks – Remonstrantfinal double end = (System.nanoTime() - start;
won't compile, so this isn't what you ran. – PointMath.max
is far from the only thing that needs to be inlined and compiled properly. – Electromechanicalfinal
s has any performance impact – Standifer