Using Scala 2.12 with Spark 2.x
The Spark 2.1 docs mention that:

Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).

The Scala 2.12 release notes also mention that:

Although Scala 2.11 and 2.12 are mostly source compatible to facilitate cross-building, they are not binary compatible. This allows us to keep improving the Scala compiler and standard library.

But when I build an uber jar (using Scala 2.12) and run it on Spark 2.1, everything works just fine.
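
For reference, the build is roughly the following (simplified, with illustrative version numbers; since Spark 2.1 artifacts are only published for Scala 2.11, the Spark dependency suffix is pinned by hand):

    // build.sbt (simplified; version numbers are illustrative)
    scalaVersion := "2.12.1"

    libraryDependencies ++= Seq(
      // Spark 2.1 is only published for Scala 2.11, so the artifact suffix
      // is written out with % instead of %% -- this is the version mix
      // described above
      "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided",
      "org.apache.spark" % "spark-sql_2.11"  % "2.1.0" % "provided"
    )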

I know it's not an official source, but the 47 Degrees blog mentions that Spark 2.1 does support Scala 2.12.

How can one explain these (conflicting?) pieces of information?

Couplet answered 19/3, 2017 at 13:50 Comment(4)
There is a formal difference, i.e. "we support that version, we have tested it, and if you have issues then it's a bug on our side" vs. "do it your way, experiment if you wish, but if you have issues then don't come back whining".Breakfront
Yeah, but how can it work if Scala 2.11 is not binary compatible with 2.12?Couplet
Not compatible means that there is at least 1 issue. Could be OK for 99.99% of the API calls. How much did you test with your custom Uber-JAR? Maybe 15%?Breakfront
#75947949Odeliaodelinda
Spark does not support Scala 2.12. You can follow SPARK-14220 (Build and test Spark against Scala 2.12) for up-to-date status.

Update: Spark 2.4 added experimental Scala 2.12 support.

Lafferty answered 19/3, 2017 at 14:22 Comment(4)
Could have been added as a comment.Bernardobernarr
Spark 2.4 now supports Scala 2.12, experimentally.Synchrocyclotron
2.12 support is no longer experimental - it's now GA - see Spark 2.4.1 release notes.Noddy
Scala 2.12 may be supported, but as of Spark 2.4.x the pre-built binaries are compiled with Scala 2.11 (except version 2.4.2).Mattingly
Scala 2.12 is officially supported (and required) as of Spark 3. Summary:

  • Spark 2.0 - 2.3: Required Scala 2.11
  • Spark 2.4: Supported both Scala 2.11 and Scala 2.12, but in practice almost all runtimes only supported Scala 2.11.
  • Spark 3: Only Scala 2.12 is supported

Using a Spark runtime that's compiled with one Scala version and a JAR file that's compiled with another Scala version is dangerous and causes strange bugs. For example, as noted here, using a Scala 2.11 compiled JAR on a Spark 3 cluster will cause this error: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps.
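
One cheap sanity check is to compare the Scala version of the running Spark cluster with the one your JAR was built against, for example from spark-shell (a minimal sketch; the printed value is illustrative):

    // Run inside spark-shell or any job on the cluster: prints the Scala
    // version actually loaded from scala-library on the runtime classpath.
    println(scala.util.Properties.versionString)  // e.g. "version 2.11.8" on a Spark 2.x cluster

    // If this does not match the _2.1x suffix your JAR was compiled with,
    // expect binary-compatibility errors like the NoSuchMethodError above.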

Look at all the poor Spark users running into this very error.

Make sure to look into Scala cross compilation and understand the %% operator in SBT to limit your suffering. Maintaining Scala projects is hard and minimizing your dependencies is recommended.
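
For illustration, a minimal build.sbt sketch of cross compilation and the %% operator (version numbers are examples only, not recommendations):

    // build.sbt -- version numbers are examples
    scalaVersion := "2.12.15"

    // Cross-build the project if you have to support both a Scala 2.11
    // based Spark 2.4 runtime and a Scala 2.12 based one.
    crossScalaVersions := Seq("2.11.12", "2.12.15")

    libraryDependencies ++= Seq(
      // %% appends the _2.11 / _2.12 suffix automatically, so the Spark
      // artifact always matches the Scala version being built
      "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"
    )

Running a task prefixed with + (e.g. sbt +assembly) then builds one artifact per entry in crossScalaVersions, each against the matching Spark artifact.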

Ultimogeniture answered 1/12, 2020 at 20:11 Comment(0)
To add to the answer, I believe it is a typo: https://spark.apache.org/releases/spark-release-2-0-0.html has no mention of Scala 2.12.

Also, if we look at the timing, Scala 2.12 was not released until November 2016, while Spark 2.0.0 was released in July 2016.

References: https://spark.apache.org/news/index.html

www.scala-lang.org/news/2.12.0/

Crownpiece answered 22/9, 2017 at 18:8 Comment(0)
