I just found out about a pretty weird behaviour of Scala scoping when bytecode generated from Scala code is used from Java code. Consider the following snippet (Spark 1.4, Hadoop 2.6):
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
public class Test {
    public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf()
                .setMaster("local[*]")
                .setAppName("test"));

        Broadcast<List<Integer>> broadcast = sc.broadcast(Arrays.asList(1, 2, 3));
        broadcast.destroy(true);

        // fails with java.io.IOException: org.apache.spark.SparkException:
        // Attempted to use Broadcast(0) after it was destroyed
        sc.parallelize(Arrays.asList("task1", "task2"), 2)
            .foreach(x -> System.out.println(broadcast.getValue()));
    }
}
This code fails, which is expected since I deliberately destroy the Broadcast before using it. The thing is, in my mental model, it should not even compile, let alone run fine.
Indeed, Broadcast.destroy(Boolean) is declared as private[spark], so it should not be visible from my code at all. I'll try looking at the bytecode of Broadcast, but bytecode is not my specialty, which is why I prefer posting this question. Also, sorry I was too lazy to create an example that does not depend on Spark, but at least you get the idea. Note that I can call various other package-private methods of Spark; it's not just about Broadcast.
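Here is roughly what I plan to check with javap, boiled down to a toy, Spark-free class (the class, method, and package names below are made up for illustration):

// Scoped.scala
package com.example.scoping

class Scoped {
  // scalac rejects calls to this from outside package com.example.scoping...
  private[scoping] def restricted(): String =
    "...but is that restriction actually recorded in the class file?"
}

My understanding, to be confirmed, is that scalac records this kind of qualified access in its own metadata (the @ScalaSignature annotation) rather than in the JVM access flags, so javap -p would list restricted() as a plain public method. Since javac only sees the access flags, that would explain why the call to destroy(true) both compiles and executes from Java.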
Any idea what's going on?