Why does Apache Spark not work with Java 10? We get an illegal reflective access warning, then java.lang.IllegalArgumentException

Is there any technical reason why Spark 2.3 does not work with Java 10 (as of July 2018)?

Here is the output when I run the SparkPi example using spark-submit.

$ ./bin/spark-submit ./examples/src/main/python/pi.py 
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-07-13 14:31:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-13 14:31:31 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-07-13 14:31:31 INFO  SparkContext:54 - Submitted application: PythonPi
2018-07-13 14:31:31 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 58681.
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-07-13 14:31:31 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-07-13 14:31:31 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-07-13 14:31:31 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/mp/9hp4l4md4dqgmgyv7g58gbq0ks62rk/T/blockmgr-d24fab4c-c858-4cd8-9b6a-97b02aa630a5
2018-07-13 14:31:31 INFO  MemoryStore:54 - MemoryStore started with capacity 434.4 MB
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
...
2018-07-13 14:31:32 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
Traceback (most recent call last):
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/./examples/src/main/python/pi.py", line 44, in <module>
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 862, in reduce
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in collect
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
    at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:844)

2018-07-13 14:31:33 INFO  SparkContext:54 - Invoking stop() from shutdown hook
...

I resolved the issue by switching to Java 8 instead of Java 10, as mentioned here.
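
A minimal sketch of that switch, assuming a Java 8 JDK is installed locally. Spark's launcher scripts use $JAVA_HOME/bin/java when JAVA_HOME is set; the java_home lookup below is macOS-specific, so on Linux point JAVA_HOME at the JDK 8 install directory instead:

# Pin spark-submit to a Java 8 runtime (macOS lookup shown; path varies by machine).
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
./bin/spark-submit ./examples/src/main/python/pi.py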

Err answered 13/7, 2018 at 18:17 Comment(7)
"does not work" Please be specific and include more details about how you found that it "does not work".Lyndell
You can start JVM with some flags (like --add-opens java.base/jdk.unsupported=ALL-UNNAMED) which allow JVM to bypass access restrictions imposed by Java 10 modular systemFye
@Fye - Are either Apache Spark project or Databricks planning to document this as the approved method to use Java 10 and beyond? Will it work in Java 11?Marcellmarcella
@RichMeister, have no idea since I have no connections with Apache Spark and Databricks. I suppose at some point they will release versions that are compatible with Java module systemFye
@Fye I'm not sure how Databricks could do that. The Spark app or any app using the JVM reflective access would require cooperation from Oracle Java to set up a legal reflective access operation in the JVM (say 11). Hopefully other than "unsupported=ALL-UNNAMED"Marcellmarcella
@Marcellmarcella Databricks or whatever else company would need to change their code to use classes that are available for real "public" access without accessing JVM internals. That might be challenging but on the other hand you can still use Java 8Fye
@RichMeister, BTW there is a library to make everything accessible to everything: github.com/nqzero/permit-reflectFye
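
To make the flag-passing idea from the comments concrete, here is a rough sketch of forwarding such JVM options through spark-submit. The particular --add-opens target below is only a placeholder (which packages actually need opening depends on the workload), and this approach silences module-access restrictions rather than being a supported way to run Spark 2.3 on Java 10:

# Forward JVM options to the driver (client mode) and to the executors.
# The --add-opens target is illustrative, not a known-good set for Spark 2.3.
./bin/spark-submit \
  --driver-java-options "--add-opens=java.base/java.lang=ALL-UNNAMED" \
  --conf "spark.executor.extraJavaOptions=--add-opens=java.base/java.lang=ALL-UNNAMED" \
  ./examples/src/main/python/pi.py

Note that the IllegalArgumentException in the stack trace above comes from the shaded ASM 5 ClassReader rather than from a blocked reflective call, so flags like these may not be enough on their own.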

Committer here. It's actually a fair bit of work to support Java 9+: SPARK-24417

It's also almost done and should be ready for Spark 3.0, which should run on Java 8 through 11 and beyond.

The goal (well, mine) is to make it work without opening up module access. The key issues include:

  • sun.misc.Unsafe usage has to be removed or worked around
  • Changes to the structure of the boot classloader
  • Scala support for Java 9+
  • A bunch of dependency updates to work with Java 9+
  • JAXB is no longer automatically available (see the sketch after this list)
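
As a side note on the JAXB point: on JDK 9 and 10 the java.xml.bind module still ships with the JDK but is no longer resolved by default, so as a stopgap it can be added back at launch (on JDK 11+ the module is gone entirely and a standalone JAXB dependency is needed instead). A hedged sketch:

# JDK 9/10 only: resolve the deprecated java.xml.bind module at launch.
# On JDK 11+ this module no longer exists; add a JAXB artifact to the classpath instead.
./bin/spark-submit \
  --driver-java-options "--add-modules=java.xml.bind" \
  --conf "spark.executor.extraJavaOptions=--add-modules=java.xml.bind" \
  ./examples/src/main/python/pi.py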
Fart answered 28/2, 2019 at 5:21 Comment(6)
Any idea when the Spark 3.0 release date might be? Even a ballpark figure would be great! – Corrugation
Middle of 2019 is roughly what people seem to be saying on the dev@ list. – Fart
Update: Java 11 support is done in Spark 3, but Spark 3 probably won't be out until nearer the end of the year. – Fart
@SeanOwen Any idea or thoughts on why Spark doesn't provide an Automatic-Module-Name entry in its MANIFEST.MF, even with spark-core_2.12-3.0.0-preview2.jar? – Particiaparticipant
I think it might be hard or impossible, as other modules have classes in the same package. There isn't much value in it anyway, IMHO. – Fart
Oh, just to close the thread: Spark 3 was released in June 2020 and supports Java 11+. It's not tested on Java 14+, but I would be surprised if it didn't work. – Fart

The primary technical reason is that Spark depends heavily on direct access to native memory via sun.misc.Unsafe, which has been restricted since Java 9.

Seema answered 13/7, 2018 at 18:24 Comment(2)
It would be good to get an actual, dependable solution for this from the Java community. Do Oracle or IBM consider this an issue to deal with in Java 11 that they plan to address in a formal way? I would think at least IBM would, since it has software built on Apache Spark. – Marcellmarcella
This explains why you cannot compile Spark on Java 11, but not why you cannot run Spark on Java 11. If Spark uses sun.misc.Unsafe and was compiled on Java 8, you should still be able to run it on Java 11, no? – Hypothermal
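
As a rough illustration of why "compiled on Java 8" does not help at runtime here: ClosureCleaner feeds class bytes from the running JVM, including classes from the JDK 10 runtime itself, into the ASM 5 ClassReader shaded into Spark 2.3 (the org.apache.xbean.asm5 frames in the stack trace above), and that reader throws IllegalArgumentException for class-file versions newer than it recognizes. Class files produced by JDK 10 carry major version 54, which is beyond what that ASM version accepts:

# Check the class-file version the Java 10 compiler emits
# (Probe is just a throwaway class for illustration).
echo 'public class Probe {}' > Probe.java
javac Probe.java
javap -verbose Probe | grep "major version"   # prints: major version: 54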

Spark depends on memory APIs that were changed in JDK 9, so they are no longer available in the same form starting with JDK 9. That is the reason for this failure.

Please check this issue:

https://issues.apache.org/jira/browse/SPARK-24421

Nazi answered 6/3, 2019 at 16:55 Comment(0)
