Cost of locking in .NET vs Java

Asked 27/8, 2011 at 17:0 Answered 19/6, 2015 at 7:14

Solved. Tags: c#, java, performance, locking, synchronized

I was playing with the Disruptor framework and its port for the .NET platform and found an interesting case. Maybe I'm completely missing something, so I'm looking for help from the almighty Community.

        long iterations = 500*1000*1000;
        long testValue = 1;

        //.NET 4.0. Release build. Mean time - 26 secs;
        object lockObject = new object();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObject)
            {
                testValue++;    
            }
        }
        sw.Stop();

        //Java 6.25. Default JVM params. Mean time - 17 secs.
        Object lock = new Object();
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++)
        {
                synchronized (lock)
                {
                    testValue++;
                }
        }
        long stop = System.currentTimeMillis();

It seems that acquiring the lock in a single-threaded scenario costs about 50% more in .NET than in Java. At first I suspected the timers, but I've run the same test several times with results close to the mean values mentioned above. Then I suspected the synchronized block, but it does no more than the monitorenter / monitorexit bytecode instructions, the same thing as the lock keyword in .NET. Any other ideas why taking a lock is so expensive in .NET vs Java?

Urbanity answered 27/8, 2011 at 17:0 Comment(13)

Did you measure without the lock to be sure you have a fair baseline? Also, JITing might be a factor here: you could put the code in a separate method, run it once, and only then start measuring. – Trews 27/8, 2011 at 17:5

For what it's worth; while it isn't a direct answer, I would use Interlocked.Increment(ref testValue) instead of locking, or lock it once just for the duration of the entire loop. The performance differences you are seeing shouldn't be evident if you use a proper locking pattern. – Chapell 27/8, 2011 at 17:6

@Trews You're right about a baseline, but I don't think JITing should take an appreciable amount of time for a single, simple method. – Christmann 27/8, 2011 at 17:7

Did you run the C# code without debugger attached? – Frederigo 27/8, 2011 at 17:9

@vcsjones: Not that I disagree, but Interlocked.Increment would take longer, which just defies an answer to the question. – Fullbodied 27/8, 2011 at 17:20

For me even Monitor.Enter + Monitor.Exit inside this loop works better than lock keyword - 22 sec vs 25 sec. – Genvieve 27/8, 2011 at 17:22

On my computer, both cases run for about 13 seconds. – Frederigo 27/8, 2011 at 17:24

@Mr. Disappointment using Interlocked.Increment instead of locking (assuming the lock is inside the loop) is about 60% faster for me. My point being that all this code is doing is atomically incrementing a long. There are better ways to do that instead of using a full-blown lock, like Interlocked. – Chapell 27/8, 2011 at 17:32

@vcsjones: I get an average of 60 seconds using Interlocked.Increment on this machine; lock results are similar to the OP's, while ReaderWriterLockSlim.EnterWriteLock comes in just ahead of the Interlocked results. – Fullbodied 27/8, 2011 at 17:37

@vcjones: I started with CAS because that is what the Disruptor framework uses, and I didn't see much difference between the JVM and the CLR. The thing that caught my eye was the time to take an uncontended lock. – Urbanity 27/8, 2011 at 17:54

@svick. Yes, without debugger. Just launched from console. – Urbanity 27/8, 2011 at 17:56

@Andrei, if you launch it from VS using F5, it starts with the debugger attached. If you use Ctrl+F5, it starts without it. – Frederigo 27/8, 2011 at 18:42

What are the results like if you use a SpinLock? – Monochasium 28/8, 2011 at 3:1
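For Java readers following the Interlocked.Increment suggestion in the comments above: the closest counterpart is probably java.util.concurrent.atomic.AtomicLong. The sketch below puts an atomic increment next to the synchronized version so the two can be timed on the same machine. The class and method names are made up for illustration, and the iteration count in main is an arbitrary, reduced value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; none of these names come from the original post.
final class AtomicIncrementSketch {

    // One atomic read-modify-write per step instead of a monitor enter/exit pair.
    static long countWithAtomic(long iterations) {
        AtomicLong counter = new AtomicLong(1); // same start value as testValue
        for (long i = 0; i < iterations; i++) {
            counter.incrementAndGet();
        }
        return counter.get();
    }

    // The same loop using a synchronized block, for comparison.
    static long countWithLock(long iterations) {
        Object lock = new Object();
        long value = 1;
        for (long i = 0; i < iterations; i++) {
            synchronized (lock) {
                value++;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // assumption: reduced from the 500M in the question
        System.out.println("atomic: " + countWithAtomic(n));
        System.out.println("lock  : " + countWithLock(n));
    }
}
```

Both methods return the final counter so that the work has an observable result; timing code can be wrapped around either call.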

Yes, it looks like taking an uncontended lock is more expensive in .NET than in Java. (The results on my netbook are slightly more dramatic still.)

There are various aspects to performance which will be faster on one platform than another, sometimes to this extent. The HotSpot JIT and the .NET JIT are pretty radically different in various ways - not least because the .NET JIT only runs once on IL, whereas HotSpot is able to optimize more and more as a particular piece of code is run more and more often.

The important question is whether this is really significant. If your real-life application really acquires an uncontended lock 500 million times every minute, it probably is significant - and you should probably redesign your app somewhat. If your real-life application actually does real work within the lock (or between acquisitions of the lock) then it's unlikely to be a real bottleneck.

I recently found two .NET gotchas (part one; part two) which I'm having to work round as I'm writing a "system level library" and they would have made a significant difference when an application did a lot of date/time parsing - but this sort of micro-optimization is rarely worth doing.

Secondhand answered 27/8, 2011 at 17:21 Comment(1)

Thanks, so it could be that one thing is just a bit slower than another and it's not me who completely missed some obvious point. – Urbanity 28/8, 2011 at 7:48

The first thing to remember about micro-benchmarks is that Java is particularly good at identifying and eliminating code which doesn't do anything. I have found that again and again, Java does pointless code faster than any other language. ;)

If Java is surprisingly fast compared to another language, the first question should be: does the code do anything remotely useful (or even look like it could be useful)?

Java tends to unroll loops more than it used to. It can also combine locks. As your test is uncontended and doesn't do anything, your code is likely to end up looking something like this:

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
    synchronized (lock) {
        testValue++;
    }
}

which becomes

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
        testValue++;
    }
}

and then, since testValue is not used, becomes

for (int i = 0; i < iterations; i+=8) {
    synchronized (lock) {
    }
}

and finally

{ }

Lambertson answered 27/8, 2011 at 20:40 Comment(1)

@RoundTower... it takes time to decide whether a loop is worth unrolling, to determine whether the locks can be amalgamated, and to perform escape analysis. Sometimes the loop may already be partly calculated WHILE these decisions are being made... In any case, "the JVM really sucks" is neither a helpful nor informed statement. – Gaselier 29/8, 2011 at 10:11

Is the variable 'testValue' local to a method? If so, it is possible that the JRE has detected that locking is unnecessary as the variable is local to one thread and is therefore not locking at all.

This is explained here.

To show just how hard it is to tell what optimisations the JVM decides to do - and when it decides to do it - examine these results from running your code three consecutive times:

public static void main(String[] args) {
  System.out.println("Java version: " + System.getProperty("java.version"));
  System.out.println("First call : " + doIt(500 * 1000 * 1000, 1)); // 14 secs
  System.out.println("Second call: " + doIt(500 * 1000 * 1000, 1)); // 1 sec
  System.out.println("Third call : " + doIt(500 * 1000 * 1000, 1)); // 0.4 secs
}

private static String doIt(final long iterations, long testValue) {
    Object lock = new Object();
    long start = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
        synchronized (lock) {
            testValue++;
        }
    }
    long stop = System.currentTimeMillis();
    return (stop - start) + " ms, result = " + testValue;
}

These results are so hard to explain, I think only a JVM engineer could help shed light.
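One possible reading of the drop from 14 seconds to under a second is JIT warm-up: the first call may run partly in less optimized code before the compiled version takes over. A common way to reduce that effect by hand is to run a few untimed rounds first and to keep the result alive so the loop cannot be discarded as dead code. The following is only a sketch of that idea, not a rigorous benchmark; the class name, round count, and iteration count are assumptions, and a harness such as JMH would be the more reliable tool.

```java
// Hypothetical illustration of a manual warm-up phase before timing.
final class WarmupBenchmarkSketch {

    // Same loop as doIt above, but returning the counter so the caller can use it.
    static long lockedIncrement(long iterations, long testValue) {
        Object lock = new Object();
        for (long i = 0; i < iterations; i++) {
            synchronized (lock) {
                testValue++;
            }
        }
        return testValue;
    }

    public static void main(String[] args) {
        final long iterations = 50_000_000L; // assumption: smaller than the original 500M
        final int warmupRounds = 5;          // assumption: arbitrary choice

        long sink = 0; // accumulates results so the work has a visible effect

        // Untimed rounds give the JIT a chance to compile the hot loop.
        for (int r = 0; r < warmupRounds; r++) {
            sink += lockedIncrement(iterations, 1);
        }

        // Timed round, measured with the monotonic nanosecond clock.
        long start = System.nanoTime();
        sink += lockedIncrement(iterations, 1);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("timed round: " + elapsedMs + " ms (sink = " + sink + ")");
    }
}
```

Even with a warm-up phase, the JIT is still free to coarsen or elide the lock, so the number measures whatever the optimizer left in place rather than the raw cost of a monitor.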

Gaselier answered 27/8, 2011 at 18:23 Comment(3)

Sorry, I wasn't clear enough with my examples, but both variables iterations and testValue are passed as parameters to the method. Anyway, thanks for the link to a nice article. – Urbanity 27/8, 2011 at 18:36

Hi Andrei, I see... But in Java longs are passed by value, and therefore are still local to the method and therefore the thread. They are almost certainly still candidates for locking to be ignored. – Gaselier 28/8, 2011 at 15:6

Steve, I understand your point, but if locking is ignored or all locks are collapsed into a single one, then this code should not take 17 secs to execute, and there should not be any difference from the same code without any locking (~1 sec on my box). – Urbanity 28/8, 2011 at 15:38

Remember, both are extremely fast; we are talking about 50 CPU cycles for lock-read-write-unlock here.

In Java, I compared it with a simulated implementation of the uncontended case:

volatile int waitingList=0;

    AtomicInteger x = new AtomicInteger(0);
    for (int i = 0; i < iterations; i++)
    {
        while( ! x.compareAndSet(0, 1) )
            ;

        testValue++;

        if(waitingList!=0)
            ;
        x.set(0);
    }

This bare-bones simulation is a little faster than the synchronized version; the times taken are 15 vs 17.

That shows that in your test case, Java didn't do any crazy optimizations; it honestly did lock-read-update-unlock on each iteration. However, Java's implementation is about as fast as the bare-bones one; it can't get much faster.

Although C#'s implementation is also close to the minimum, it apparently does one or two things more than Java does. I'm not familiar with C#, but this probably indicates some difference in semantics that requires C# to do something extra.
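For anyone who wants to rerun the comparison, the fragment above can be wrapped into a compilable class roughly as follows. This is a sketch under assumptions: the class name, the run method, and the reduced iteration count are not from the answer, and the empty branch only stands in for the wake-up work a real monitor would do on exit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around the simulated lock from the answer.
final class CasLockSimulation {

    // Stand-in for the list of blocked threads a real monitor checks on exit.
    static volatile int waitingList = 0;

    static long run(long iterations) {
        AtomicInteger x = new AtomicInteger(0);
        long testValue = 1;
        for (long i = 0; i < iterations; i++) {
            // "Lock": spin until the flag flips from 0 to 1.
            while (!x.compareAndSet(0, 1)) {
                // busy-wait; never taken in the uncontended case
            }

            testValue++;

            // "Unlock": a volatile read of the waiter list, then release the flag.
            if (waitingList != 0) {
                // a real implementation would wake a waiting thread here
            }
            x.set(0);
        }
        return testValue;
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // assumption: reduced from the 500M in the question
        long start = System.nanoTime();
        long result = run(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elapsedMs + " ms, result = " + result);
    }
}
```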

Aboveboard answered 30/8, 2011 at 6:0 Comment(0)

When I investigated lock/sync costs in Java a few years ago, I ended up with a big open question: how does locking affect overall performance, including for other threads accessing any kind of memory? What may be affected is the CPU cache, especially on a multi-processor computer, and the effect depends on how the specific CPU architecture handles cache synchronization. I believe overall performance is not affected on a modern single-CPU architecture, but I am not sure.

Anyway, when in doubt, especially when multi-processor computers may be used to host the software, it may be worth taking the lock at a higher level, around several operations.
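As a rough illustration of taking the lock at a higher level, the sketch below contrasts one acquisition per operation with one acquisition per batch. The class and its methods are invented for this example, and whether the coarser version helps depends on how long other threads can tolerate waiting for the lock.

```java
// Hypothetical illustration of fine-grained vs. coarse-grained locking.
final class CoarseLockSketch {
    private final Object lock = new Object();
    private long total;

    // Fine-grained: the monitor is entered and exited once per element.
    void addEach(long[] values) {
        for (long v : values) {
            synchronized (lock) {
                total += v;
            }
        }
    }

    // Coarse-grained: the monitor is entered once for the whole batch.
    void addBatch(long[] values) {
        synchronized (lock) {
            for (long v : values) {
                total += v;
            }
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    public static void main(String[] args) {
        CoarseLockSketch sketch = new CoarseLockSketch();
        sketch.addEach(new long[] {1, 2, 3});
        sketch.addBatch(new long[] {4, 5});
        System.out.println("total = " + sketch.total());
    }
}
```

The trade-off is latency for other threads: the batch version holds the lock for the whole loop, so it suits short batches better than long-running ones.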

Talmud answered 12/9, 2013 at 19:32 Comment(0)

The Java JIT will optimize the synchronization away as the lock object is thread local (i.e. it is confined to the thread's stack and never shared) and thus can never be synchronized on from another thread. I'm not sure if the .NET JIT will do this.

See this very informative article, especially the part on lock elision.
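One way to probe whether elision plays a role is to compare a lock object that never leaves the method with one reachable from a static field, since escape analysis cannot treat the latter as confined to one thread. The sketch below is only a starting point: the class and method names are invented, and the timings will also be affected by warm-up and other optimizations. As far as I know, HotSpot also exposes the -XX:-EliminateLocks and -XX:-DoEscapeAnalysis options, which can be used to switch these optimizations off for a comparison run.

```java
// Hypothetical illustration: a non-escaping lock vs. a globally reachable one.
final class LockElisionSketch {

    // Reachable from a static field, so other threads could synchronize on it.
    static final Object SHARED_LOCK = new Object();

    // The lock object never escapes this method, which makes it a
    // candidate for lock elision after escape analysis.
    static long withLocalLock(long iterations) {
        Object lock = new Object();
        long value = 1;
        for (long i = 0; i < iterations; i++) {
            synchronized (lock) {
                value++;
            }
        }
        return value;
    }

    // Same loop, but on the shared lock object.
    static long withSharedLock(long iterations) {
        long value = 1;
        for (long i = 0; i < iterations; i++) {
            synchronized (SHARED_LOCK) {
                value++;
            }
        }
        return value;
    }

    static long timeMillis(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final long n = 50_000_000L; // assumption: reduced iteration count
        final long[] sink = new long[1]; // keeps the results observable

        for (int round = 0; round < 3; round++) {
            long local = timeMillis(() -> sink[0] += withLocalLock(n));
            long shared = timeMillis(() -> sink[0] += withSharedLock(n));
            System.out.println("round " + round + ": local " + local
                    + " ms, shared " + shared + " ms");
        }
        System.out.println("sink = " + sink[0]);
    }
}
```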

Huffish answered 19/6, 2015 at 7:14 Comment(0)
