java.lang.NoSuchMethodError: scala.Predef$.refArrayOps in Spark job with Scala

Full error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at org.spark_module.SparkModule$.main(SparkModule.scala:62)
    at org.spark_module.SparkModule.main(SparkModule.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When I compile and run the code in IntelliJ, it executes fine all the way through. The error only shows up when I submit the .jar as a Spark job (at runtime).

Line 62 contains: for ((elem, i) <- args.zipWithIndex). I commented out the rest of the code to be sure, and the error kept showing on that line.

At first I thought it was zipWithIndex's fault. Then I changed it to for (elem <- args) and, guess what, the error still showed. Is the for itself causing this?
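
For context, a for over an Array[String] is not a plain loop: the compiler routes it through the implicit conversion scala.Predef.refArrayOps (the exact method named in the error), and the compiled signature of that conversion is not the same across Scala versions, which fits the version-mismatch explanation below. A minimal sketch of the desugaring (illustrative, not the actual SparkModule code):

// Illustrative sketch: both loop forms end up calling scala.Predef.refArrayOps,
// because Array[String] has no zipWithIndex/foreach of its own.
object RefArrayOpsSketch {
  def main(args: Array[String]): Unit = {
    // for ((elem, i) <- args.zipWithIndex) println(...) is roughly:
    scala.Predef.refArrayOps(args).zipWithIndex.foreach { case (elem, i) => println(s"$i: $elem") }

    // and for (elem <- args) println(elem) is roughly:
    scala.Predef.refArrayOps(args).foreach(elem => println(elem))
  }
}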

Google searches always point to a Scala version incompatibility between the version used to compile and the version used at runtime, but I can't figure out a solution.

I tried this to check the Scala version used by IntelliJ, and here is everything Scala-related under Modules > Scala:

[screenshot: IntelliJ project settings, Modules > Scala]

Then I did this to check the run-time version of Scala and the output is:

(file:/C:/Users/me/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.12.11/1a0634714a956c1aae9abefc83acaf6d4eabfa7d/scala-library-2.12.11.jar )

Versions seem to match...
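
For reference, a minimal sketch of one way to print both pieces of information from inside the job itself (illustrative; not necessarily the exact check linked above):

object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Version string of the scala-library actually on the runtime classpath
    println(scala.util.Properties.versionString)
    // Location of the jar that scala.Predef was loaded from (may be null for bootstrap-loaded classes)
    println(scala.Predef.getClass.getProtectionDomain.getCodeSource.getLocation)
  }
}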

This is my build.gradle (it includes the fatJar task):

group 'org.spark_module'
version '1.0-SNAPSHOT'

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    implementation group: 'org.scala-lang', name: 'scala-library', version: '2.12.11'
    implementation group: 'org.apache.spark', name: 'spark-core_2.12'//, version: '2.4.5'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.12'//, version: '2.4.5'
    implementation group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.12', version: '2.5.0'
    implementation group: 'org.apache.spark', name: 'spark-mllib_2.12', version: '2.4.5'
    implementation group: 'log4j', name: 'log4j', version: '1.2.17'
    implementation group: 'org.scalaj', name: 'scalaj-http_2.12', version: '2.4.2'
}

task fatJar(type: Jar) {
    zip64 true
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    } {
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }

    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }

    with jar
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    getArchiveFileName()
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }

    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'

}

To build the (fat) jar:

gradlew fatJar

in IntelliJ's terminal.

To run the job:

spark-submit.cmd .\SparkModule-1.0-SNAPSHOT.jar

in Windows PowerShell.

Thank you

EDIT:

spark-submit.cmd and spark-shell.cmd both show Scala version 2.11.12, so yes, they differ from the one I am using in IntelliJ (2.12.11). The problem is that on Spark's download page there is only one Spark distribution built for Scala 2.12, and it comes without Hadoop; does that mean I have to downgrade from 2.12 to 2.11 in my build.gradle?
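
If downgrading turns out to be the way to go, here is a sketch of what the dependencies block might look like with the Scala 2.11 counterparts of the same artifacts (untested; the Spark version is pinned to 2.4.5 to match a 2.11-built runtime):

dependencies {
    // Sketch only: same dependencies as above, but built for Scala 2.11
    implementation group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    implementation group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    implementation group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    implementation group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.4.5'
    implementation group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    implementation group: 'org.scalaj', name: 'scalaj-http_2.11', version: '2.4.2'
    implementation group: 'log4j', name: 'log4j', version: '1.2.17'
}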

Atiana answered 8/5, 2020 at 8:32 Comment(7)
You can try specifying the Scala version in the Gradle build file using the compile keyword, look here: #44374972 – Determinable
Yes, exactly, try that; try to change the Scala version in your build.gradle file – Cannibalize
@Cannibalize but isn't there a way to keep Scala 2.12? A Spark (with Hadoop) dist for Scala 2.12? – Atiana
Indeed, for Spark 2.4.5 it is recommended to use Scala 2.12, and 2.11 is deprecated. Have you tried to run your code with Scala 2.11? I would like to know. – Cannibalize
@Chema, it worked with 2.11; I'm trying to make it work the other way around (with 2.12) – Atiana
Your code runs great! I updated my answer with some notes! – Cannibalize
#75947949 – Commit

I would try spark-submit --version to find out which Scala version Spark is using.

With spark-submit --version I get this information:

[cloudera@quickstart scala-programming-for-data-science]$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera4
      /_/
                        
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_202
Branch HEAD
Compiled by user jenkins on 2018-09-27T02:42:51Z
Revision 0ef0912caaab3f2636b98371eb29adb42978c595
Url git://github.mtv.cloudera.com/CDH/spark.git
Type --help for more information.

From the spark-shell you could try this to find out the Scala version:

scala> util.Properties.versionString
res3: String = version 2.11.8

The OS could be using a different Scala version; in my case, as you can see, Spark's Scala version and the OS's Scala version are different:

[cloudera@quickstart scala-programming-for-data-science]$ scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

Note from O'Reilly's Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia:

Dependency Conflicts

One occasionally disruptive issue is dealing with dependency conflicts in cases where a user application and Spark itself both depend on the same library. This comes up relatively rarely, but when it does, it can be vexing for users. Typically, this will manifest itself when a NoSuchMethodError, a ClassNotFoundException, or some other JVM exception related to class loading is thrown during the execution of a Spark job. There are two solutions to this problem. The first is to modify your application to depend on the same version of the third-party library that Spark does. The second is to modify the packaging of your application using a procedure that is often called “shading.” The Maven build tool supports shading through advanced configuration of the plug-in shown in Example 7-5 (in fact, the shading capability is why the plugin is named maven-shade-plugin). Shading allows you to make a second copy of the conflicting package under a different namespace and rewrites your application’s code to use the renamed version. This somewhat brute-force technique is quite effective at resolving runtime dependency conflicts. For specific instructions on how to shade dependencies, see the documentation for your build tool.
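
Since the build in the question is Gradle rather than Maven, a common Gradle analogue of maven-shade-plugin's relocation is the Gradle Shadow plugin. A sketch under that assumption (the plugin version and the relocation prefix are illustrative); Guava is used as the example only because the build above already has to force a specific Guava version:

plugins {
    id 'scala'
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

shadowJar {
    zip64 true
    // Rename the bundled Guava packages so they cannot clash with the Guava that ships with Spark
    relocate 'com.google.common', 'org.spark_module.shaded.com.google.common'
    manifest {
        attributes 'Main-Class': 'org.spark_module.SparkModule'
    }
}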

Cannibalize answered 8/5, 2020 at 11:19 Comment(0)
