Why does parallel stream with lambda in static initializer cause a deadlock?

3

93

I came across a strange situation where using a parallel stream with a lambda in a static initializer hangs, seemingly forever, with no CPU utilization. Here's the code:

import java.util.stream.IntStream;

class Deadlock {
    static {
        IntStream.range(0, 10000).parallel().map(i -> i).count();
        System.out.println("done");
    }
    public static void main(final String[] args) {}
}

This appears to be a minimal test case that reproduces the behavior. If I:

  • put the block in the main method instead of a static initializer,
  • remove parallelization, or
  • remove the lambda,

the code instantly completes. Can anyone explain this behavior? Is it a bug or is this intended?

I am using OpenJDK version 1.8.0_66-internal.

Siward answered 15/1, 2016 at 21:22 Comment(11)
Reproducible with Oracle 1.8.0_66.Danish
With range (0, 1) the program terminates normally. With (0, 2) or higher hangs.Illjudged
similar question: #34223169Shew
Actually it is exactly the same question/issue, just with a different API.Edgeworth
You are trying to use a class, in a background thread, when you haven't finished initialising the class so it can't be used in a background thread.Monomer
@PeterLawrey Put that way it sounds obvious, but it isn't at all obvious that the lambda needs to use the outer class.Siward
@Solomonoff'sSecret Because i -> i is not a method reference, it is compiled to a static method in the Deadlock class. If you replace i -> i with IntUnaryOperator.identity() (the IntStream counterpart of Function.identity()), this code should be fine.Monomer
@PeterLawrey i -> i becomes an implementation of a functional interface that behaves as a real object. Doesn't that mean it needs to be invoked polymorphically and if so, how can it simply be a static method? Doesn't it at least need to have a method that implements the method of the functional interface which in the current implementation delegates to the static method? Or is there some magic happening behind the scenes?Siward
@Solomonoff'sSecret There is a class generated at runtime with one method which calls the static method in the class you have defined.Monomer
Related to Are Java static initializers thread safe?;Believe
I can't understand why it sometimes hangs and sometimes does not. I always run the app with the same arguments.Pullulate
74

I found a bug report of a very similar case (JDK-8143380) which was closed as "Not an Issue" by Stuart Marks:

This is a class initialization deadlock. The test program's main thread executes the class static initializer, which sets the initialization in-progress flag for the class; this flag remains set until the static initializer completes. The static initializer executes a parallel stream, which causes lambda expressions to be evaluated in other threads. Those threads block waiting for the class to complete initialization. However, the main thread is blocked waiting for the parallel tasks to complete, resulting in deadlock.

The test program should be changed to move the parallel stream logic outside of the class static initializer. Closing as Not an Issue.
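
A minimal sketch of the change recommended above, assuming the work can simply be moved into an ordinary static method. The class name NoDeadlock and the method countInParallel are hypothetical and do not come from the bug report. By the time worker threads run the lambda, the class has already finished initializing, so they have nothing to wait for:

```java
import java.util.stream.IntStream;

class NoDeadlock {
    // Hypothetical helper: the parallel work lives in a regular static method,
    // so it only runs after NoDeadlock has been fully initialized.
    static long countInParallel() {
        return IntStream.range(0, 10000).parallel().map(i -> i).count();
    }

    public static void main(String[] args) {
        // Class initialization is already complete when main starts running.
        System.out.println(countInParallel());
        System.out.println("done");
    }
}
```

Calling countInParallel() from a static initializer of NoDeadlock would bring back the situation described in the bug report, so the method should only be invoked from code that runs after initialization.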


I also found another bug report of the same problem (JDK-8136753), likewise closed as "Not an Issue" by Stuart Marks:

This is a deadlock that is occurring because the Fruit enum's static initializer is interacting badly with class initialization.

See the Java Language Specification, section 12.4.2 for details on class initialization.

http://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2

Briefly, what's happening is as follows.

  1. The main thread references the Fruit class and starts the initialization process. This sets the initialization in-progress flag and runs the static initializer on the main thread.
  2. The static initializer runs some code in another thread and waits for it to finish. This example uses parallel streams, but this has nothing to do with streams per se. Executing code in another thread by any means, and waiting for that code to finish, will have the same effect.
  3. The code in the other thread references the Fruit class, which checks the initialization in-progress flag. This causes the other thread to block until the flag is cleared. (See step 2 of JLS 12.4.2.)
  4. The main thread is blocked waiting for the other thread to terminate, so the static initializer never completes. Since the initialization in-progress flag isn't cleared until after the static initializer completes, the threads are deadlocked.

To avoid this problem, make sure that a class's static initialization completes quickly, without causing other threads to execute code that requires this class to have completed initialization.

Closing as Not an Issue.
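
Step 2 of the quoted explanation says that streams are incidental. The sketch below illustrates that with a plain Thread and neither streams nor lambdas. To avoid hanging forever it waits with a timeout, which is an assumption of this example and not part of the bug report; the names InitBlocked, touch and WORKER_FINISHED_DURING_INIT are hypothetical. The worker needs InitBlocked to be initialized before it can call touch(), so it should still be blocked when the timeout expires:

```java
class InitBlocked {
    // Records whether the worker managed to finish while the
    // static initializer was still running.
    static final boolean WORKER_FINISHED_DURING_INIT;

    static int touch() {
        return 42;
    }

    static {
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                // Calling a static method of InitBlocked requires the class
                // to be initialized, so this thread waits for the initializer.
                touch();
            }
        });
        worker.setDaemon(true);
        worker.start();

        boolean finished;
        try {
            // Timed wait (assumption of this sketch) instead of a plain join().
            worker.join(500);
            finished = !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        WORKER_FINISHED_DURING_INIT = finished;
    }

    public static void main(String[] args) {
        System.out.println("worker finished during init: " + WORKER_FINISHED_DURING_INIT);
    }
}
```

With an untimed join() in place of join(500), the initializer would never return, which is the deadlock described in steps 3 and 4.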


Note that FindBugs has an open issue for adding a warning for this situation.

Melitamelitopol answered 15/1, 2016 at 21:36 Comment(9)
"This was considered when we designed the feature" and "We know what causes this bug but not how to fix it" do not mean "this is not a bug". This is absolutely a bug.Perceptible
so, no lambda in static initializers? This smells like a bug and stings like a bug.Cornered
@bayou.io The main issue is using threads within static initializers, not lambdas.Necrotomy
BTW Tunaki thanks for digging up my bug reports. :-)Necrotomy
@StuartMarks - How can you blame people for using parallel stream? That's the biggest selling point of Stream, at least from Goetz. The thing is, once you hand off a functional object to another API, you can't be sure how it is gonna be used. If the function is created by an anonymous class that is coupled to the class being initialized, an alarm instantly sounds off. However, in this case of lambda, the coupling is implicit and incidental; and that is the main pitfall here.Cornered
Secondly, such deadlock can exist without "using threads". Imagine the static initializer of class A registers a lambda to a registry; the lambda seems self-contained and immediately usable. Now anyone else (on other threads) trying to use the lambda could trigger deadlocks.Cornered
There is absolutely no problem that we have to live with some limitations of implementations; everybody would be understanding of that. But I don't think it is right, in this particular case, to simply chalk it up as a non-issue, and blame all use cases that reveal the problem.Cornered
@bayou.io: it’s the same thing on class level as it would be in a constructor, letting this escape during object construction. The basic rule is, don’t use multi-threaded operations in initializers. I don’t think that this is hard to understand. Your example of registering a lambda-implemented function in a registry is a different thing; it doesn’t create deadlocks unless you are going to wait for one of these blocked background threads. Nevertheless, I strongly discourage doing such operations in a class initializer. It’s not what they are meant for.Gatt
I guess the programming style lesson is: keep static initializers simple.Believe
20

For those who are wondering where the other threads reference the Deadlock class itself: a Java lambda behaves as if you had written this:

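import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class Deadlock {
    // Stands in for the static method the compiler generates for i -> i
    public static int lambda1(int i) {
        return i;
    }
    static {
        IntStream.range(0, 10000).parallel().map(new IntUnaryOperator() {
            @Override
            public int applyAsInt(int operand) {
                // Calling a static method of Deadlock requires Deadlock to be initialized
                return lambda1(operand);
            }
        }).count();
        System.out.println("done");
    }
    public static void main(final String[] args) {}
}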

With a regular anonymous class whose body does not touch Deadlock's static members, there is no deadlock:

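import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class Deadlock {
    static {
        IntStream.range(0, 10000).parallel().map(new IntUnaryOperator() {
            @Override
            public int applyAsInt(int operand) {
                // No reference to Deadlock here, so worker threads never wait for it
                return operand;
            }
        }).count();
        System.out.println("done");
    }
    public static void main(final String[] args) {}
}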
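
To see where a lambda body actually ends up, a sketch like the following uses reflection to look for compiler-generated methods in the enclosing class. It assumes the compiler's current naming scheme (synthetic methods whose names start with lambda$), which is an implementation detail and not something the language specification promises. The class name LambdaBodyLocation and its members are hypothetical:

```java
import java.lang.reflect.Method;
import java.util.function.IntUnaryOperator;

class LambdaBodyLocation {
    // A lambda declared in a static context, like the one in the question.
    static final IntUnaryOperator IDENTITY = i -> i;

    // Returns true if this class declares a synthetic method whose name
    // starts with "lambda$" (assumed compiler naming, not guaranteed by the spec).
    static boolean hasSyntheticLambdaMethod() {
        for (Method m : LambdaBodyLocation.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // List every synthetic method declared directly on this class.
        for (Method m : LambdaBodyLocation.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                System.out.println(m);
            }
        }
    }
}
```

If such a method is listed, it belongs to the enclosing class, which is why a worker thread that runs the lambda has to wait for that class to finish initializing.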
Excitor answered 15/1, 2016 at 21:56 Comment(15)
This is so very strange. Could you provide a citation or an explanation as to why lambdas behave that way? I always thought they were equivalent to an anonymous class but you are right that an anonymous class doesn't deadlock.Siward
@Solomonoff'sSecret I buried myself in the spec, but didn't find anything relevant.Excitor
@Solomonoff'sSecret It's an implementation choice. The code in the lambda has to go somewhere. Javac compiles it into a static method in the containing class (analogous to lambda1 in this example). Putting each lambda into its own class would have been considerably more expensive.Necrotomy
@StuartMarks Given that the lambda creates a class implementing the functional interface, wouldn't it be just as efficient to put the implementation of the lambda in the implementation of the functional interface's lambda as in the second example of this post? That's certainly the obvious way to do things but I'm sure there's a reason why they're done the way they are.Siward
@Solomonoff'sSecret The lambda might create a class at runtime (via java.lang.invoke.LambdaMetafactory), but the lambda body must be placed somewhere at compile time. The lambda classes can thus take advantage of some VM magic to be less expensive than normal classes loaded from .class files.Scratch
@Solomonoff'sSecret Yes, Jeffrey Bosboom's reply is correct. If in a future JVM it becomes possible to add a method to an existing class, the metafactory might do that instead of spinning a new class. (Pure speculation.)Necrotomy
@StuartMarks Perhaps a better implementation would be to create a class Deadlock$Lambdas with static methods for all the lambdas. Then the lambdas wouldn't depend on Deadlock. Granted the benefit would be extremely slim and this implementation might have a speed/memory penalty due to increasing the number of classes.Siward
@Solomonoff'sSecret -- That is only needed for lambdas created during static initialization. Unfortunately, that is a runtime property; javac can't find all such lambdas. At this point, it falls on Java programmers to be aware of this issue, and manually add a separate class to work around the problem.Cornered
@bayou.io The idea was not to use static analysis to determine which lambdas are used in static initializers. The idea was that all lambdas (say, defined in a static context) would have their implementations in a separate class. But I suppose that isn't viable in general because the lambda could use a field in the original static class, so the lambda may need to be able to see the static class in which it's written.Siward
@Solomonoff'sSecret -- non-static code can be invoked during static initialization too. It is a tricky phase that requires careful reasoning.Cornered
@bayou.io Right, just like a constructor can view an uninitialized field by invoking a method. However, the compiler does take partial measures to protect you against that. I'm fine with the current behavior but it is unintuitive so if there were some easy way to prevent most errors with no collateral damage, it would be nice.Siward
@Solomonoff's Secret: don’t judge by looking at such trivial lambda expressions like your i -> i; they won’t be the norm. Lambda expressions may use all members of their surrounding class, including private ones, and that makes the defining class itself their natural place. Letting all these use cases suffer from an implementation optimized for the special case of class initializers with multi-threaded use of trivial lambda expressions, not using members of their defining class, is not a viable option.Gatt
@Gatt You are right, my suggestion isn't workable in general.Siward
On my PC, the first snippet sometimes leads to a deadlock and sometimes does not. Is this expected behaviour?Pullulate
@Pullulate As far as I remember, the first snippet caused a deadlock for me every time, but that is not guaranteed; it depends on the implementation of parallel streams. If you need more specifics, I might look into it a bit more for you.Excitor
18

There is an excellent explanation of this problem by Andrei Pangin, dated 07 Apr 2015. It is available here, but it is written in Russian (I suggest reviewing the code samples anyway; they are international). The general problem is a lock held during class initialization.

Here are some quotes from the article:


According to the JLS, every class has a unique initialization lock that is acquired during initialization. When another thread tries to access the class during initialization, it blocks on the lock until initialization completes. When classes are initialized concurrently, a deadlock is possible.

I wrote a simple program that calculates the sum of integers. What should it print?

import java.util.stream.IntStream;

public class StreamSum {
    static final int SUM = IntStream.range(0, 100).parallel().reduce((n, m) -> n + m).getAsInt();

    public static void main(String[] args) {
        System.out.println(SUM);
    }
}

Now remove parallel() or replace the lambda with an Integer::sum method reference. What changes?

Here we see a deadlock again [the article showed other examples of deadlocks in class initializers earlier]. Because of parallel(), stream operations run in a separate thread pool. These threads try to execute the lambda body, which is compiled to bytecode as a private static method inside the StreamSum class. But this method cannot be executed before the class's static initializer completes, and the initializer is waiting for the stream to finish.

What is more mind-blowing: this code works differently in different environments. It works correctly on a single-CPU machine and will most likely hang on a multi-CPU machine. The difference comes from the Fork-Join pool implementation. You can verify it yourself by changing the parameter -Djava.util.concurrent.ForkJoinPool.common.parallelism=N
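
A small sketch for checking what the quoted paragraph depends on: how many processors the JVM sees and what parallelism the common Fork-Join pool is targeting. Both calls are standard JDK APIs; the class name PoolInfo and the method commonPoolParallelism are hypothetical. Running it with and without the -D flag mentioned above shows whether the setting took effect:

```java
import java.util.concurrent.ForkJoinPool;

class PoolInfo {
    // Targeted parallelism of the common pool used by parallel streams.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("available processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
    }
}
```

This only reports the environment. It does not by itself predict whether a given static initializer will hang, since that also depends on how the stream implementation schedules its tasks.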

Keyway answered 18/1, 2016 at 11:13 Comment(0)
