Scala/Spark version compatibility

I am building my first Spark application.

http://spark.apache.org/downloads.html tells me that Spark 2.x is built against Scala 2.11.

On the Scala site https://www.scala-lang.org/download/all.html I see versions ranging from 2.11.0 to 2.11.11.

So here is my question: what exactly does the 2.11 on the Spark site mean? Is it any Scala version in the 2.11.0 - 2.11.11 range?

Another question: can I build my Spark apps using the latest Scala, 2.12.2? I assume that Scala is backward compatible, so Spark libraries built with, say, Scala 2.11.x can be used/called from Scala 2.12.1 applications. Am I correct?

Turtledove answered 10/5, 2017 at 3:53

Scala is not backwards compatible in the way you assume. You must use Scala 2.11 with Spark unless you rebuild Spark under Scala 2.12 (which is an option if you want to use the latest Scala version, but requires more work to get everything working).

When considering compatibility, you need to consider both source compatibility and binary compatibility. Scala does tend to be source backwards compatible, so you can rebuild your jar under a newer version, but it is not binary backwards compatible, so you can't use a jar built with an old version together with code compiled against a new version.

This applies to major versions: Scala 2.10, 2.11, 2.12, etc. are all separate major versions and are not binary compatible with each other (even if they are source compatible). Within a major version, though, compatibility is maintained, so Scala 2.11 is compatible with all versions 2.11.0 - 2.11.11 (and any future 2.11 revisions will also be compatible).

It is for this reason that you will see most Scala libraries have separate releases for each major Scala version. You have to make sure that any library you use provides a jar for the version you are using, and that you use that jar and not one for a different version. If you use SBT, %% will handle selecting the correct version for you, but with Maven you need to make sure to use the correct artifact name. The artifact names are typically suffixed with _2.10, _2.11, or _2.12, referring to the Scala version the jar is built for, as sketched below.
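
For illustration, here is a minimal build.sbt sketch (using Scala 2.11.11 and Spark 2.1.1 purely as example version numbers) of what %% does:

// %% appends the Scala binary version (here _2.11) to the artifact name for you.
scalaVersion := "2.11.11"

// The two lines below are equivalent ways of declaring the same artifact,
// spark-core_2.11; in practice you would use only one of them.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"

With Maven, by contrast, you have to write the suffix into the artifactId yourself (e.g. spark-core_2.11).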

Effector answered 10/5, 2017 at 4:22
I want to use at least Spark 2.0, since that is the first version that has the model saving and loading capability I need. So what version of Scala do I need, and where can this be looked up? - Sorrows
@PaulReiners The latest version, 2.1.1, is distributed for Scala 2.11. If you want Scala 2.12, you can build Spark from source for that Scala version. The Spark homepage mentions the Scala version for the latest release in a couple of places, but I haven't seen any official compatibility table. You can of course just look at the package names, though, which follow the standard convention of appending the Scala version they are compatible with to the artifact id. - Effector

For anyone who wants a quick start, this is the version pair I've used:

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2",
  "org.apache.spark" %% "spark-sql" % "2.3.2"
)
Prosper answered 5/1, 2020 at 8:43

I used these versions of Scala and Spark, and they worked fine for my needs:

scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"

Some libraries need the 2.11 version of Scala; in that case, one should use the versions mentioned by @the775.

NOTE: This is an old answer; these versions are no longer the latest, as newer versions of Scala and Spark now exist.

Passed answered 8/3, 2020 at 13:21
