What is the synchronization cost of calling a synchronized method from a synchronized method?

Is there any difference in performance between this

synchronized void x() {
    y();
}

synchronized void y() {
}

and this

synchronized void x() {
    y();
}

void y() {
}
Venessavenetia answered 25/9, 2013 at 5:18 Comment(1)
I would be surprised if there were a difference. See also oracle.com/technetwork/java/6-performance-137236.html (2.1.1 and 2.1.2)Mg
17

Yes, there is an additional performance cost, unless and until the JVM inlines the call to y(), which a modern JIT compiler will do in fairly short order. First, consider the case you've presented in which y() is visible outside the class. In this case, the JVM must check on entering y() to ensure that it can enter the monitor on the object; this check will always succeed when the call is coming from x(), but it can't be skipped, because the call could be coming from a client outside the class. This additional check incurs a small cost.
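To make the reentrancy concrete, here is a minimal sketch (my own illustration, not part of the original answer) showing that the call from x() to y() simply re-enters the monitor the thread already holds:

```java
public class ReentrancyDemo {
    boolean heldInY;

    synchronized void x() {
        // The monitor on 'this' is already held here...
        y(); // ...so this call just re-enters it (entry count 1 -> 2).
    }

    synchronized void y() {
        heldInY = Thread.holdsLock(this); // true: reentrant acquisition
    }

    public static void main(String[] args) {
        ReentrancyDemo d = new ReentrancyDemo();
        d.x();
        System.out.println("monitor held inside y(): " + d.heldInY); // prints: true
    }
}
```

The entry check always succeeds on this path, but as the answer notes, it still has to happen because y() is also callable from outside x().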

Additionally, consider the case in which y() is private. In this case, the compiler still does not optimize away the synchronization; see the following disassembly of an empty y():

private synchronized void y();
  flags: ACC_PRIVATE, ACC_SYNCHRONIZED
  Code:
    stack=0, locals=1, args_size=1
       0: return

According to the spec's definition of synchronized, each entrance into a synchronized block or method performs a lock action on the object, and each exit performs an unlock action. No other thread can acquire that object's monitor until the lock counter goes down to zero. Presumably some sort of static analysis could demonstrate that a private synchronized method is only ever called from within other synchronized methods, but Java's multi-source-file support would make that fragile at best, even ignoring reflection. This means that the JVM must still increment the counter on entering y():

Monitor entry on invocation of a synchronized method, and monitor exit on its return, are handled implicitly by the Java Virtual Machine's method invocation and return instructions, as if monitorenter and monitorexit were used.

@AmolSonawane correctly notes that the JVM may optimize this code at runtime by performing lock coarsening, essentially inlining the y() method. In this case, after the JVM has decided to perform a JIT optimization, calls from x() to y() will not incur any additional performance overhead, but of course calls directly to y() from any other location will still need to acquire the monitor separately.
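A small sketch of that last point (my own illustration, not from the answer): even after coarsening, a direct call to y() from another thread still has to acquire the monitor, so it blocks while a different thread is inside x():

```java
import java.util.concurrent.CountDownLatch;

public class DirectCallDemo {
    final CountDownLatch entered = new CountDownLatch(1);

    synchronized void x() throws InterruptedException {
        entered.countDown(); // signal that we now hold the monitor
        Thread.sleep(100);   // keep holding it for a while
        y();                 // reentrant: enters without blocking
    }

    synchronized void y() { }

    public static void main(String[] args) throws InterruptedException {
        DirectCallDemo d = new DirectCallDemo();
        Thread t = new Thread(() -> {
            try { d.x(); } catch (InterruptedException ignored) { }
        });
        t.start();
        d.entered.await();       // wait until t is inside x()
        long start = System.nanoTime();
        d.y();                   // blocks until t releases the monitor
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("direct call to y() waited ~" + waitedMs + " ms");
        t.join();
    }
}
```

The reentrant call inside x() is free of contention; the external call is not.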

Baize answered 25/9, 2013 at 5:28 Comment(8)
Why not just post the detailed explanation with your answer?Interposition
@Interposition Several incorrect armchair answers were getting upvoted.Baize
what you are showing is not the assembly but the bytecode - that is not very relevant for performance purposes as the JIT will run something else... As for the specs, not trying to reacquire the lock when y is called from x is compliant (since the lock is already held).Mg
@Mg That's what javap calls "disassembly". Additionally, the JVM spec isn't written in terms of "locks", it's written in terms of the monitor entry count. The JVM spec (2.13) permits the sort of transforms that the C spec permits optimizing compilers, but both by strict semantics and before the JVM inlines the method call, there's additional overhead. (This entire question is, of course, academic, since whether to make y() synchronized is a matter of correctness and not performance.)Baize
@chrylis I agree with all aspects of your final answer.Mg
@Mg Thanks; I've tried to keep up with the other input, since this looks like the sort of useful question you'd expect from a mod-level user. ;-)Baize
Depending on what side effects happen inside y() and after the call to it in x(), it may not be possible to optimize the lock away, because of memory happens-before effects.Pirog
@chrylis I added a new test. Not sure if it's flawed but only difference is in a one thread environment. https://mcmap.net/q/639173/-what-is-the-synchronization-cost-of-calling-a-synchronized-method-from-a-synchronized-methodPimp
9

Results of a micro-benchmark run with JMH:

Benchmark                      Mean     Mean error    Units
c.a.p.SO18996783.syncOnce      21.003        0.091  nsec/op
c.a.p.SO18996783.syncTwice     20.937        0.108  nsec/op

=> no statistical difference.

Looking at the generated assembly shows that lock coarsening has been performed: y_sync has been inlined into x_sync even though it is synchronized.

Full results:

Benchmarks: 
# Running: com.assylias.performance.SO18996783.syncOnce
Iteration   1 (5000ms in 1 thread): 21.049 nsec/op
Iteration   2 (5000ms in 1 thread): 21.052 nsec/op
Iteration   3 (5000ms in 1 thread): 20.959 nsec/op
Iteration   4 (5000ms in 1 thread): 20.977 nsec/op
Iteration   5 (5000ms in 1 thread): 20.977 nsec/op

Run result "syncOnce": 21.003 ±(95%) 0.055 ±(99%) 0.091 nsec/op
Run statistics "syncOnce": min = 20.959, avg = 21.003, max = 21.052, stdev = 0.044
Run confidence intervals "syncOnce": 95% [20.948, 21.058], 99% [20.912, 21.094]

Benchmarks: 
com.assylias.performance.SO18996783.syncTwice
Iteration   1 (5000ms in 1 thread): 21.006 nsec/op
Iteration   2 (5000ms in 1 thread): 20.954 nsec/op
Iteration   3 (5000ms in 1 thread): 20.953 nsec/op
Iteration   4 (5000ms in 1 thread): 20.869 nsec/op
Iteration   5 (5000ms in 1 thread): 20.903 nsec/op

Run result "syncTwice": 20.937 ±(95%) 0.065 ±(99%) 0.108 nsec/op
Run statistics "syncTwice": min = 20.869, avg = 20.937, max = 21.006, stdev = 0.052
Run confidence intervals "syncTwice": 95% [20.872, 21.002], 99% [20.829, 21.045]
Mg answered 25/9, 2013 at 5:51 Comment(4)
Keep in mind that this is presumably taking advantage of thread biasing. Is there any practical way to run this test with a contended monitor?Baize
@chrylis Not sure it would make a difference: there is never contention when acquiring the lock in y() since it is already held.Mg
A better test would be to compare syncing once vs. syncing 1000 times.Pimp
@Mg I added a new test. Not sure if it's flawed but only difference is in a one thread environment. https://mcmap.net/q/639173/-what-is-the-synchronization-cost-of-calling-a-synchronized-method-from-a-synchronized-methodPimp
2

Why not test it!? I ran a quick benchmark. The benchMark() method is called in a loop for warm-up. This may not be super accurate, but it does show a consistent, interesting pattern.

public class Test {
    public static void main(String[] args) {

        for (int i = 0; i < 100; i++) {
            System.out.println("+++++++++");
            benchMark();
        }
    }

    static void benchMark() {
        Test t = new Test();
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x();
        }
        System.out.println("Double sync:" + (System.nanoTime() - start) / 1e6);

        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x1();
        }
        System.out.println("Single sync:" + (System.nanoTime() - start) / 1e6);
    }
    synchronized void x() {
        y();
    }
    synchronized void y() {
    }
    synchronized void x1() {
        y1();
    }
    void y1() {
    }
}

Results (last 10)

+++++++++
Double sync:0.021686
Single sync:0.017861
+++++++++
Double sync:0.021447
Single sync:0.017929
+++++++++
Double sync:0.021608
Single sync:0.016563
+++++++++
Double sync:0.022007
Single sync:0.017681
+++++++++
Double sync:0.021454
Single sync:0.017684
+++++++++
Double sync:0.020821
Single sync:0.017776
+++++++++
Double sync:0.021107
Single sync:0.017662
+++++++++
Double sync:0.020832
Single sync:0.017982
+++++++++
Double sync:0.021001
Single sync:0.017615
+++++++++
Double sync:0.042347
Single sync:0.023859

Looks like the second variation is indeed slightly faster.

Uninstructed answered 25/9, 2013 at 5:31 Comment(4)
@Mg It's a microbenchmark with a decent warmup that produces a fairly consistent pattern that matches the ballpark overhead you'd expect for an extra monitor check with otherwise empty methods. What's your criticism of it?Baize
@chrylis 10,000 loops is just enough for the JIT to start kicking in - I would not call it a decent warmup...Mg
Added new answer.Pimp
I added a new test. Not sure if it's flawed but only difference is in a one thread environment. https://mcmap.net/q/639173/-what-is-the-synchronization-cost-of-calling-a-synchronized-method-from-a-synchronized-methodPimp
1

The test can be found below (you have to guess what some methods do, but nothing complicated):

It tests them with 100 threads each and starts counting the averages after 70% of them have completed (as warm-up).

It prints the results once at the end.

public static final class Test {
        final int                      iterations     =     100;
        final int                      jiterations    = 1000000;
        final int                      count          = (int) (0.7 * iterations);
        final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
        final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
        final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
        final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
        final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
        final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);

        // -----------------------------------------------------------
        // -----------------------------------------------------------
        public static void main(String[] args) {
                final Test test = new Test();

                for (int i = 0; i < test.iterations; i++) {
                        test.benchmark(i);
                }

                Threads.sleep(1000000);
        }
        // -----------------------------------------------------------
        // -----------------------------------------------------------

        void benchmark(int i) {

                Threads.async(()->{
                        long start = System.nanoTime();

                        for (int j = 0; j < jiterations; j++) {
                                a();
                        }

                        long elapsed = System.nanoTime() - start;
                        int v = this.finishedSingle.decrementAndGet();
                        if ( v <= count ) {
                                singleCum.add (elapsed);
                                singleConv.add(elapsed);
                        }

                        if ( v == 0 ) {
                                System.out.println(elapsed);
                                System.out.println("Single Cum:\t\t" + singleCum.val());
                                System.out.println("Single Conv:\t" + singleConv.val());
                                System.out.println();

                        }
                });

                Threads.async(()->{

                        long start = System.nanoTime();
                        for (int j = 0; j < jiterations; j++) {
                                az();
                        }

                        long elapsed = System.nanoTime() - start;

                        int v = this.finishedZynced.decrementAndGet();
                        if ( v <= count ) {
                                zyncedCum.add(elapsed);
                                zyncedConv.add(elapsed);
                        }

                        if ( v == 0 ) {
                                // Just to avoid the output not overlapping with the one above 
                                Threads.sleep(500);
                                System.out.println();
                                System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                System.out.println();
                        }
                });

        }                       

        synchronized void a() { b();  }
                     void b() { c();  }
                     void c() { d();  }
                     void d() { e();  }
                     void e() { f();  }
                     void f() { g();  }
                     void g() { h();  }
                     void h() { i();  }
                     void i() { }

        synchronized void az() { bz(); }
        synchronized void bz() { cz(); }
        synchronized void cz() { dz(); }
        synchronized void dz() { ez(); }
        synchronized void ez() { fz(); }
        synchronized void fz() { gz(); }
        synchronized void gz() { hz(); }
        synchronized void hz() { iz(); }
        synchronized void iz() {}
}

MovingAverage.Cumulative's add is basically (performed atomically): average = (average * n + number) / (++n);
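The MovingAverage.Cumulative class itself is not shown in the answer; a hypothetical reconstruction matching that formula could look like this:

```java
// Hypothetical reconstruction of MovingAverage.Cumulative based only on
// the formula described above; the original class is not shown.
final class CumulativeAverage {
    private double average;
    private long n;

    synchronized void add(double number) {
        // Incremental mean: old n is read before ++n increments it.
        average = (average * n + number) / ++n;
    }

    synchronized double val() {
        return average;
    }
}
```

For example, adding 2, 4, and 6 yields an average of 4.0.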

MovingAverage.Converging you can look up but uses another formula.

The results after a 50 second warmup:

With: jiterations -> 1000000

Zynced Cum:     3.2017985649516254E11
Zynced Conv:    8.11945143126507E10

Single Cum:     4.747368153507841E11
Single Conv:    8.277793176290959E10

Those are nanosecond averages. The difference is negligible, and here the zynced one even takes less time.

With: jiterations -> original * 10 (takes much longer time)

Zynced Cum:     7.462005651190714E11
Zynced Conv:    9.03751742946726E11

Single Cum:     9.088230941676143E11
Single Conv:    9.09877020004914E11

As you can see, the results show there is really not a big difference. The zynced one actually has a lower average time over the last 30% of completions.

With one thread each (iterations = 1) and jiterations = original * 100;

Zynced Cum:     6.9167088486E10
Zynced Conv:    6.9167088486E10

Single Cum:     6.9814404337E10
Single Conv:    6.9814404337E10

In a single-thread environment (removing the Threads.async calls):

With: jiterations -> original * 10

Single Cum:     2.940499529542545E8
Single Conv:    5.0342450600964054E7


Zynced Cum:     1.1930525617915475E9
Zynced Conv:    6.672312498662484E8

The zynced one here seems to be slower, by roughly an order of magnitude. One possible reason is that the zynced loop always runs after the single one; who knows. No energy to try the reverse order.

Last test run with:

public static final class Test {
        final int                      iterations     =     100;
        final int                      jiterations    = 10000000;
        final int                      count          = (int) (0.7 * iterations);
        final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
        final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
        final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
        final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
        final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
        final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);

        // -----------------------------------------------------------
        // -----------------------------------------------------------
        public static void main(String[] args) {
                final Test test = new Test();

                for (int i = 0; i < test.iterations; i++) {
                        test.benchmark(i);
                }

                Threads.sleep(1000000);
        }
        // -----------------------------------------------------------
        // -----------------------------------------------------------

        void benchmark(int i) {

                        long start = System.nanoTime();

                        for (int j = 0; j < jiterations; j++) {
                                a();
                        }

                        long elapsed = System.nanoTime() - start;
                        int s = this.finishedSingle.decrementAndGet();
                        if ( s <= count ) {
                                singleCum.add (elapsed);
                                singleConv.add(elapsed);
                        }

                        if ( s == 0 ) {
                                System.out.println(elapsed);
                                System.out.println("Single Cum:\t\t" + singleCum.val());
                                System.out.println("Single Conv:\t" + singleConv.val());
                                System.out.println();

                        }


                        long zstart = System.nanoTime();
                        for (int j = 0; j < jiterations; j++) {
                                az();
                        }

                        long elapzed = System.nanoTime() - zstart;

                        int z = this.finishedZynced.decrementAndGet();
                        if ( z <= count ) {
                                zyncedCum.add(elapzed);
                                zyncedConv.add(elapzed);
                        }

                        if ( z == 0 ) {
                                // Just to avoid the output not overlapping with the one above 
                                Threads.sleep(500);
                                System.out.println();
                                System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                System.out.println();
                        }

        }                       

        synchronized void a() { b();  }
                     void b() { c();  }
                     void c() { d();  }
                     void d() { e();  }
                     void e() { f();  }
                     void f() { g();  }
                     void g() { h();  }
                     void h() { i();  }
                     void i() { }

        synchronized void az() { bz(); }
        synchronized void bz() { cz(); }
        synchronized void cz() { dz(); }
        synchronized void dz() { ez(); }
        synchronized void ez() { fz(); }
        synchronized void fz() { gz(); }
        synchronized void gz() { hz(); }
        synchronized void hz() { iz(); }
        synchronized void iz() {}
}

Conclusion: there really is no difference.

Pimp answered 20/8, 2017 at 10:54 Comment(0)
0

In the case where both methods are synchronized, you would be locking the monitor twice, so the first approach would have the additional overhead of acquiring the lock again. But the JVM can reduce the cost of locking by lock coarsening and may inline the call to y().

Resistance answered 25/9, 2013 at 5:29 Comment(2)
This is not true: if both methods are synchronized and non-static, no additional lock is required.Ardent
"A thread t may lock a particular monitor multiple times; each unlock reverses the effect of one lock operation." - Java Language Specification 17.1Resistance
0

There will be no difference, since threads contend only to acquire the lock at x(). The thread that acquired the lock at x() can acquire the lock at y() without any contention (because it is the only thread that can reach that point at any particular time). So placing synchronized there has no effect.

Begley answered 25/9, 2013 at 6:13 Comment(0)
