Benchmark in a multithreaded environment

I was learning multithreading and noticed that Object.hashCode appears to slow down in a multithreaded environment: computing the default hash code takes more than twice as long with 4 threads as with 1 thread, for the same number of objects per thread.

As I understand it, doing this in parallel should take about the same amount of time.

You can change the number of threads. Each thread has the same amount of work to do, so on my quad-core machine I would expect 4 threads to take about as long as a single thread.

I'm seeing ~2.3 seconds for 4 threads but ~0.9 seconds for 1 thread.

Is there a gap in my understanding? Please help me understand this behaviour.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed pool of daemon threads, so the JVM can exit once main() returns.
    private final ExecutorService _service = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        // Each task allocates ITERATIONS objects and calls hashCode() on each one.
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        // Submit the same task once per thread and wait for all of them.
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _service.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }

}

For a thread count of 4, the output is

~2.3 seconds

For a thread count of 1, the output is

~0.9 seconds
Ouabain answered 16/12, 2015 at 13:47 Comment(5)
Please share the changes you made between 1 and 4 threads. – Childbearing
The time measurement does not necessarily tell you much here. See #504603 – Chip
You're probably not measuring the right thing: GC, creation of the executor and its threads, thread coordination, object instantiation, memory allocation, etc. Anyway, the benchmark is pretty useless, since you can't change anything about Object's hashCode() implementation. – Record
You're not measuring hashCode(), you're measuring the instantiation of 20 million Objects when single threaded, and 80 million Objects when running 4 threads. Move the new Object() logic out of the for loop in your Callable, then you will be measuring hashCode(). – Oringas
Besides, hashCode for Object is actually implemented with a native platform-specific call, so you likely won't find any performance issues there. – Gelding
C
7

I've created a simple JMH benchmark to test the various cases:

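import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
public class HashCodeBenchmark {
    // One object shared by all benchmark threads; no allocation is measured.
    private final Object object = new Object();

    @Benchmark
    @Threads(1)
    public void singleThread(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(2)
    public void twoThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(4)
    public void fourThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(8)
    public void eightThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }
}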

And the results are as follows:

Benchmark                       Mode  Cnt  Score   Error  Units
HashCodeBenchmark.eightThreads  avgt   10  5.710 ± 0.087  ns/op
HashCodeBenchmark.fourThreads   avgt   10  3.603 ± 0.169  ns/op
HashCodeBenchmark.singleThread  avgt   10  3.063 ± 0.011  ns/op
HashCodeBenchmark.twoThreads    avgt   10  3.067 ± 0.034  ns/op

So we can see that as long as there are no more threads than cores, the time per hashCode() call stays roughly the same.

PS: As @Tom Cools commented, your test measures allocation speed, not hashCode() speed.
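
If JMH is not at hand, a rough way to see that distinction is to time the two phases separately: allocate the objects first, then call hashCode() on the already-allocated array. The sketch below is only an illustration. The class name AllocationVsHash and its helper methods are made up for this example, and System.nanoTime() timings taken this way are noisy (no warm-up, no forked JVM), so treat the printed numbers as indicative rather than as a benchmark.

```java
public class AllocationVsHash {

    // Hypothetical helper: allocate `count` plain Objects up front.
    public static Object[] allocate(int count) {
        Object[] objects = new Object[count];
        for (int i = 0; i < count; i++) {
            objects[i] = new Object();
        }
        return objects;
    }

    // Hypothetical helper: call hashCode() on every object and fold the results
    // into one value, so the calls cannot be discarded as dead code.
    public static int hashAll(Object[] objects) {
        int sink = 0;
        for (Object o : objects) {
            sink ^= o.hashCode();
        }
        return sink;
    }

    public static void main(String[] args) {
        int count = 1_000_000; // assumption: small enough for the default heap

        long t0 = System.nanoTime();
        Object[] objects = allocate(count);   // phase 1: allocation only
        long t1 = System.nanoTime();
        int sink = hashAll(objects);          // phase 2: hashCode() only
        long t2 = System.nanoTime();

        System.out.println("allocation: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("hashCode:   " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("sink (ignore): " + sink);
    }
}
```

Folding the hash codes into sink plays the same role as the Blackhole in the JMH version: it gives the result a use so the loop is not optimized away.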

Candi answered 16/12, 2015 at 14:39 Comment(1)
Can you please tell me which tool you used for benchmarking? – Ouabain
B
1

See Palamino's comment:

You're not measuring hashCode(), you're measuring the instantiation of 20 million Objects when single threaded, and 80 million Objects when running 4 threads. Move the new Object() logic out of the for loop in your Callable, then you will be measuring hashCode() – Palamino

Barnet answered 16/12, 2015 at 13:53 Comment(2)
He said that you can change the thread count to observe the problem he described. – Chip
I moved it out; same result. :( – Ouabain
A
0

Two issues I see with the code:

  1. The size of the allWork[] array should equal ITERATIONS.
  2. While iterating in the call() method, make sure that each thread gets its share of the load: ITERATIONS/THREAD_COUNT.

Below is a modified version you can try:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 1;
    private static final int ITERATIONS = 20000;
    private final Object object = new Object();

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    private final ExecutorService _service = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS / THREAD_COUNT; i++) {
                    object.hashCode();
                }
                return null;
            }
        };
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[ITERATIONS];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _service.invokeAll(Arrays.asList(allWork));
        System.out.println("Futures size : " + futures.size());
        for (Future<Void> future : futures) {
            future.get();
        }
    }

}
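
Point 2 above, giving each thread its share of a fixed total, can also be sketched on its own. This is a minimal illustration rather than the code from this answer: the class WorkSplit and its methods share and run are hypothetical names, it submits one task per thread, and the remainder handling is an addition, since a plain ITERATIONS/THREAD_COUNT drops work whenever the total is not divisible by the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkSplit {

    // Iterations assigned to worker `index`, chosen so all shares sum to `total`.
    public static int share(int total, int threads, int index) {
        int base = total / threads;
        int remainder = total % threads;
        return base + (index < remainder ? 1 : 0);
    }

    // Runs one task per thread and returns how many hashCode() calls were made.
    public static long run(int total, int threads) throws Exception {
        final Object object = new Object();
        ExecutorService service = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int iterations = share(total, threads, t);
                tasks.add(() -> {
                    for (int i = 0; i < iterations; i++) {
                        // Result discarded: this sketch shows the split, not a measurement.
                        object.hashCode();
                    }
                    return (long) iterations;
                });
            }
            long done = 0;
            for (Future<Long> future : service.invokeAll(tasks)) {
                done += future.get();
            }
            return done;
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(20_000, 4)); // prints 20000
    }
}
```

With a total of 10 and 4 threads, share hands out 3, 3, 2 and 2 iterations, so the total stays fixed no matter how many threads are used.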
Adventitia answered 16/12, 2015 at 14:41 Comment(1)
In the run()/call() method you are still allocating objects, so you are measuring hashCode() plus the allocation speed. Your answer is flawed. – Candi
