How does the incrementAndGet method of AtomicLong work internally?

I am using the incrementAndGet method of AtomicLong in my multithreaded code to measure the performance of some of our client side code.

@Override
public void run() {
    long start = System.nanoTime();

    attributes = client.getAttributes(columnsList);

    long end = System.nanoTime() - start;

    final AtomicLong before = select.putIfAbsent(end / 1000000L, new AtomicLong(1L));
    if (before != null) {
        before.incrementAndGet();
    }
}

In the above code, I am trying to measure how much time-

client.getAttributes(columnsList);

is taking.

As far as I know, incrementAndGet increments the current value by one atomically. Does that mean each thread might have to wait for other threads before it can increment the value? In other words, is it a blocking call?

And does this affect the way I am measuring the performance of a method, i.e. does it add extra time to the measurement itself?

I am asking because I am trying to benchmark most of our client side and server side code, and whenever I need to measure how long a method takes, I do it like this:

I put the line below just above whatever code I want to measure:

long start = System.nanoTime();

And these lines right after it, using a different ConcurrentHashMap for each method being measured:

long end = System.nanoTime() - start;

final AtomicLong before = select.putIfAbsent(end / 1000000L, new AtomicLong(1L));
if (before != null) {
    before.incrementAndGet();
}
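
Put together, the pattern I am using looks roughly like the sketch below. The LatencyHistogram wrapper and its method names are only illustrative, not actual code from our project; select is a ConcurrentHashMap<Long, AtomicLong> keyed by elapsed milliseconds.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the measurement pattern described above:
// 'select' maps elapsed milliseconds to how many calls took that long.
public class LatencyHistogram {
    private final ConcurrentHashMap<Long, AtomicLong> select =
            new ConcurrentHashMap<Long, AtomicLong>();

    public void record(long elapsedNanos) {
        long millis = elapsedNanos / 1000000L;
        AtomicLong before = select.putIfAbsent(millis, new AtomicLong(1L));
        if (before != null) {
            before.incrementAndGet(); // bucket already existed, so bump its count
        }
    }

    public ConcurrentHashMap<Long, AtomicLong> buckets() {
        return select;
    }
}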

So if incrementAndGet is adding extra time to my performance measurement, is it possible that I am not getting accurate results?

Update:

This is the method I see when I hit F3 on incrementAndGet in Eclipse.

So the method is synchronized, which means each thread will wait here for the others; it is a blocking call.

/**
 * Atomically increments by one the current value.
 *
 * @return the updated value
 */
public final synchronized long incrementAndGet() {                          //IBM-perf_AtomicLong
   ++value;                                                                 //IBM-perf_AtomicLong
   return value;                                                            //IBM-perf_AtomicLong
}

Aaah. I just checked and I am running the IBM JVM rather than the Sun JVM. I am working in a company where I cannot change that.

So is there any way to avoid this lock-based implementation when measuring/benchmarking a method, keeping in mind that I am running the IBM JVM?

Thanks for the help.

Shank answered 13/4, 2013 at 20:46

Just don't worry. Unless you are writing something like a forex trading platform, that tiny latency will not matter. To give you an idea, we are talking on the order of nanoseconds, whereas application code is usually measured in milliseconds. Further, JVM locking has improved significantly. If you have low contention (which is the norm), the performance difference between a lock-based and a non-blocking solution will be tiny.

It is surprising, though, that the IBM JVM uses a lock for AtomicLong. I thought all major implementations used a non-blocking CAS. Here is the common implementation, which uses CAS:

/**
 * Atomically increments by one the current value.
 *
 * @return the updated value
 */
public final long incrementAndGet() {
    for (;;) {
        long current = get();
        long next = current + 1;
        if (compareAndSet(current, next))
            return next;
    }
}
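
If it helps to see what that retry loop means in practice, here is the same idea written against the public AtomicLong API. The class below is only an illustration of the CAS retry behaviour, not real JDK code:

import java.util.concurrent.atomic.AtomicLong;

// Illustration only: the same retry loop expressed with the public API.
public class CasIncrementDemo {
    private final AtomicLong value = new AtomicLong();

    public long incrementAndGet() {
        for (;;) {
            long current = value.get();   // read the current value
            long next = current + 1;      // the value we want to install
            // compareAndSet succeeds only if no other thread changed 'value'
            // in between; otherwise we loop and retry with the fresh value.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}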

Response to follow up comment:
As a very rough rule of thumb, if your current end-to-end response time (or your response time requirement) is over 30 ms, I really would not worry about this, as the time spent incrementing the long is in the realm of nanoseconds. I am almost certain you will find other places to optimise that give you more improvement (e.g. milliseconds).

However, you could copy the Sun JVM implementation of AtomicLong and use the non-blocking version instead, as the IBM VM should have CAS operations too. This is only likely to result in a significant improvement if you expect moderate to high contention (lots of threads). If you don't, I think the locking solution will perform nearly identically with the improved lock implementation (available since JDK 6, if I remember correctly).

In fact, if you have very high contention, a lock can perform BETTER than a non-blocking solution. So ideally you would have to try both implementations and compare the results... which is partly why I think you shouldn't bother: in the same time, you could make a few performance improvements elsewhere that give you literally a million times the improvement you could attain here.
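
If you do want to compare, a rough sketch of such a comparison could look like the one below. The thread and iteration counts are arbitrary, and for trustworthy numbers you would need warm-up runs and a proper micro-benchmark harness, so treat the output as indicative only:

import java.util.concurrent.atomic.AtomicLong;

// Rough comparison of a synchronized counter vs. AtomicLong under contention.
public class CounterComparison {

    static class LockedCounter {
        private long value;
        synchronized long incrementAndGet() { return ++value; }
    }

    // Runs the same task on 'threads' threads and returns the elapsed nanoseconds.
    static long time(final Runnable task, int threads) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        final int iterations = 1000000;
        final LockedCounter locked = new LockedCounter();
        final AtomicLong atomic = new AtomicLong();

        long lockedNanos = time(new Runnable() {
            public void run() {
                for (int i = 0; i < iterations; i++) locked.incrementAndGet();
            }
        }, 4);

        long atomicNanos = time(new Runnable() {
            public void run() {
                for (int i = 0; i < iterations; i++) atomic.incrementAndGet();
            }
        }, 4);

        System.out.println("synchronized: " + lockedNanos / 1000000 + " ms");
        System.out.println("AtomicLong:   " + atomicNanos / 1000000 + " ms");
    }
}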

Boorish answered 14/4, 2013 at 0:29 Comment(4)
I am also surprised that the IBM JDK uses a lock. I suspect that it actually doesn't, and that the code @TechGeeky sees is not the code that actually executes. It may be placeholder code in the library that is replaced with some special-case code at runtime, in the manner of an intrinsic. – Heist
Thanks Emno for the suggestion. But in the company where I am working, even a single millisecond counts a lot, as we have very tight end-to-end latency requirements on the client side. That is why I care about this so much: I am doing load and performance testing of both the client and the service side code, and if my measurements are wrong, they will affect the end results. It would be great if you could suggest a solution for that. – Shank
@TechGeeky I have worked on a number of systems where the end-to-end latency was well under 1 millisecond. AtomicLong can take on the order of 100 nanoseconds (when contended), which isn't that long, but the simplest solution is to work only on thread-local data as much as possible, so you don't have this issue in the first place (see the sketch after these comments). – Whippoorwill
A simple way to test whether something is an intrinsic (apart from reading the JVM source code) is to copy the method into another class and compare micro-benchmarks. The intrinsic method can be 2-5x faster than the same code in another class. – Whippoorwill
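
Following the comment above about working on thread-local data, one way to avoid the shared counter entirely is to give each worker its own unsynchronized histogram and merge them after the threads have finished. This is only a sketch under that assumption; the class and method names are illustrative:

import java.util.HashMap;
import java.util.Map;

// Sketch: each worker keeps its own unsynchronized histogram, and the main
// thread merges them after join(), so no atomic or locked counter is needed
// on the hot path.
public class PerThreadTiming implements Runnable {
    private final Map<Long, Long> buckets = new HashMap<Long, Long>();

    public void run() {
        for (int i = 0; i < 1000; i++) {
            long start = System.nanoTime();
            // ... the call being measured, e.g. client.getAttributes(columnsList) ...
            long elapsedMillis = (System.nanoTime() - start) / 1000000L;
            Long count = buckets.get(elapsedMillis);
            buckets.put(elapsedMillis, count == null ? 1L : count + 1L);
        }
    }

    Map<Long, Long> buckets() { return buckets; }

    public static void main(String[] args) throws InterruptedException {
        PerThreadTiming[] tasks = { new PerThreadTiming(), new PerThreadTiming() };
        Thread[] threads = new Thread[tasks.length];
        for (int i = 0; i < tasks.length; i++) {
            threads[i] = new Thread(tasks[i]);
            threads[i].start();
        }
        Map<Long, Long> total = new HashMap<Long, Long>();
        for (int i = 0; i < threads.length; i++) {
            threads[i].join(); // safe to read the worker's map after it has finished
            for (Map.Entry<Long, Long> e : tasks[i].buckets().entrySet()) {
                Long existing = total.get(e.getKey());
                total.put(e.getKey(), existing == null ? e.getValue() : existing + e.getValue());
            }
        }
        System.out.println(total);
    }
}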
