I have been micro-benchmarking the following code and I noticed something interesting that I am hoping somebody can shed more light on. It leads to a situation where it looks like a for loop can carry on running fast while blocking other threads within the JVM. If that is true then I would like to understand why; if it is not true then any insight into what I may be missing would be appreciated.
To build up the situation, let me walk you through the benchmark that I am running and its results.
The code is pretty simple, iterate over every element in an array, summing its contents. Repeat 'targetCount' times.
public class UncontendedByteArrayReadBM extends Benchmark {
    private int    arraySize;
    private byte[] array;

    public UncontendedByteArrayReadBM( int arraySize ) {
        super( "array reads" );

        this.arraySize = arraySize;
    }

    @Override
    public void setUp() {
        super.setUp();

        array = new byte[arraySize];
    }

    @Override
    public void tearDown() {
        array = null;
    }

    @Override
    public BenchmarkResult invoke( int targetCount ) {
        long sum = 0;

        for ( int i=0; i<targetCount; i++ ) {
            for ( int j=0; j<arraySize; j++ ) {
                sum += array[j];
            }
        }

        return new BenchmarkResult( ((long)targetCount)*arraySize, "uncontended byte array reads", sum );
    }
}
On my 4-core 2GHz Intel Sandy Bridge i7 laptop, running Java 6 (Oracle JVM) on OS X, this code repeatedly produces results like:
2.626852686364034 uncontended byte array reads/ns [totalTestRun=3806.837ms]
(I have culled the repeated runs used to warm the JVM)
This result appears reasonable to me.
Where this got interesting is when I started measuring JVM jitter. To do that, I start a background daemon thread that sleeps for 1ms and then works out how much longer than 1ms it was actually asleep. I also changed the report to print out the max jitter for each repeated test run.
2.6109858273078306 uncontended byte array reads/ns [maxJitter=0.411ms totalTestRun=3829.971ms]
To get an idea of 'normal' jitter for my environment, I monitor the jitter with no work going on before starting the actual test runs; readings such as the following are typical (all in ms). Thus a jitter of 0.411ms is normal and not so interesting.
getMaxJitterMillis() = 0.599
getMaxJitterMillis() = 0.37
getMaxJitterMillis() = 0.352
I have included the code for how I measure jitter at the end of this question.
The interesting part, however, is the following result. Yes, it does happen during the 'JVM warmup' period and is thus not 'normal', but it is what I would like to understand in more detail:
2.4519521584902644 uncontended byte array reads/ns [maxJitter=2561.222ms totalTestRun=4078.383ms]
Notice that the jitter is over 2.5 seconds. Usually I would put this down to GC. However, I triggered a System.gc() before the test runs, AND -XX:+PrintGCDetails shows no GC at this time. In fact there is no GC during any of the test runs, as next to no object allocation happens in this test of summing up pre-allocated bytes. It also happens every time I run a new test, so I do not suspect it to be random interference from some other process.
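As a cross-check on the GC logs, collection counts can also be read programmatically via the standard management beans. A minimal sketch (GcCountCheck is my own illustrative name, not part of the benchmark harness):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCountCheck {
    // Sums the collection counts across all garbage collectors.
    // getCollectionCount() may return -1 if undefined, so skip those.
    static long totalGcCount() {
        long count = 0;
        for ( GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans() ) {
            long c = gc.getCollectionCount();
            if ( c > 0 ) {
                count += c;
            }
        }
        return count;
    }

    public static void main( String[] args ) {
        long before = totalGcCount();
        // ... run the benchmark here ...
        long after = totalGcCount();
        System.out.println( "GCs during run: " + (after - before) );
    }
}
```

Comparing the count before and after a test run confirms (or refutes) the "no GC" reading independently of the log output.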
My curiosity rocketed when I noticed that while the jitter is very high, the total runtime, and indeed the number of array element reads per nanosecond, remains more or less unchanged. So here is a situation where a thread is getting heavily lagged on a 4-core machine while the work thread itself is not getting lagged and there is no GC going on.
Investigating further, I looked at what the HotSpot compiler was doing and found the following via -XX:+PrintCompilation:
2632 2% com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ 14 (65 bytes)
6709 2% made not entrant com.mosaic.benchmark.datastructures.array.UncontendedByteArrayReadBM::invoke @ -2 (65 bytes)
The gap between these two lines being printed was about 2.5 seconds, right when the method containing the big for loops had its optimised code marked as no longer entrant.
My understanding was that HotSpot runs on a background thread, and when it is ready to swap in a new version of the code it waits for the code that is already running to reach a safepoint, then swaps it in. In the case of a big for loop, that safepoint would be at the end of each loop body (which may have been unrolled somewhat). I would not expect a 2.5s delay unless the swap had to perform a stop-the-world event across the JVM. Does it do that when de-optimising previously compiled code?
So my first question for the JVM internals experts is: am I on the right track here? Could the 2.5s delay be due to the method being 'made not entrant', and if so, why does it have such an extreme impact on other threads? If that is not likely to be the cause, then any ideas on what else to investigate would be great.
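One experiment I intend to try, assuming the time-to-safepoint theory holds: HotSpot is reported to omit safepoint polls from int-counted loops, so rewriting the inner loop with a long induction variable should keep a poll in the loop and let the thread reach a safepoint quickly. A minimal sketch (SafepointFriendlySum is an illustrative name, and the poll behaviour is an assumption to be verified, e.g. with -XX:+PrintGCApplicationStoppedTime):

```java
public class SafepointFriendlySum {
    // Same summing loop as invoke(), but with a long induction variable.
    // HotSpot is said to drop safepoint polls only from int-counted loops,
    // so this variant should retain a poll per iteration (assumption).
    static long sum( byte[] array ) {
        long sum = 0;
        for ( long j = 0; j < array.length; j++ ) {
            sum += array[(int) j];
        }
        return sum;
    }

    public static void main( String[] args ) {
        byte[] data = new byte[1024];
        java.util.Arrays.fill( data, (byte) 1 );
        System.out.println( sum( data ) );  // 1024 ones -> 1024
    }
}
```

If the pause disappears with the long counter, that would point strongly at time-to-safepoint in the int-counted loop rather than at the deoptimisation itself.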
(For completeness, here is the code that I use to measure jitter.)
private static class MeasureJitter extends Thread {
    private AtomicLong maxJitterWitnessedNS = new AtomicLong(0);

    public MeasureJitter() {
        setDaemon( true );
    }

    public void reset() {
        maxJitterWitnessedNS.set( 0 );
    }

    public double getMaxJitterMillis() {
        return maxJitterWitnessedNS.get()/1000000.0;
    }

    public void printMaxJitterMillis() {
        System.out.println( "getMaxJitterMillis() = " + getMaxJitterMillis() );
    }

    @Override
    public void run() {
        long preSleepNS = System.nanoTime();

        while ( true ) {
            try {
                Thread.sleep( 1 );
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

            long wakeupNS = System.nanoTime();
            long jitterNS = Math.max( 0, wakeupNS - (preSleepNS + 1000000) );

            long max = Math.max( maxJitterWitnessedNS.get(), jitterNS );
            maxJitterWitnessedNS.lazySet( max );

            preSleepNS = wakeupNS;
        }
    }
}