Apache Spark -- using spark-submit throws a NoSuchMethodError
Asked Answered
S

3

7

To submit a Spark application to a cluster, their documentation notes:

To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. -- http://spark.apache.org/docs/latest/submitting-applications.html

So, I added the Apache Maven Shade Plugin to my pom.xml file. (version 3.0.0)
And I turned my Spark dependency's scope into provided. (version 2.1.0)

(I also added the Apache Maven Assembly Plugin to ensure I was wrapping all of my dependencies in the jar when I run mvn clean package. I'm unsure if it's truly necessary.)


Thus is how spark-submit fails. It throws a NoSuchMethodError for a dependency I have (note that the code works from a local instance when compiling inside IntelliJ, assuming that provided is removed).

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;

The line of code that throws the error is irrelevant--it's simply the first line in my main method that creates a Stopwatch, part of the Google Guava utilities. (version 21.0)

Other solutions online suggest that it has to do with version conflicts of Guava, but I haven't had any luck yet with those suggestions. Any help would be appreciated, thank you.

Seaside answered 22/2, 2017 at 17:45 Comment(0)
N
6

If you take a look at the /jars subdirectory of the Spark 2.1.0 installation, you will likely see guava-14.0.1.jar. Per the API for the Guava Stopwatch#createStarted method you are using, createStarted did not exist until Guava 15.0. What is most likely happening is that the Spark process Classloader is finding the Spark-provided Guava 14.0.1 library before it finds the Guava 21.0 library packaged in your uberjar.

One possible resolution is to use the class-relocation feature provided by the Maven Shade plugin (which you're already using to construct your uberjar). Via "class relocation", Maven-Shade moves the Guava 21.0 classes (needed by your code) during the packaging of the uberjar from a pattern location reflecting their existing package name (e.g. com.google.common.base) to an arbitrary shadedPattern location, which you specify in the Shade configuration (e.g. myguava123.com.google.common.base).

The result is that the older and newer Guava libraries no longer share a package name, avoiding the runtime conflict.

Nautical answered 22/2, 2017 at 19:2 Comment(1)
Thanks, adding the relocation section in the shade plugin fixed the error. Also, I was able to remove the assembly plugin and it still worked fine.Seaside
P
2

Most likely you're having a dependency conflict, yes.

First you can look if you have a dependency conflict when you build your jar. A quick way is to look in your jar directly to see if the Stopwatch.class file is there, and if, by looking at the bytecode, it appears that the method createStarted is there. Otherwise you can also list the dependency tree and work from there : https://maven.apache.org/plugins/maven-dependency-plugin/examples/resolving-conflicts-using-the-dependency-tree.html

If it's not an issue with your jar, you might have a dependency issue due to a conflict between your spark installation and your jar. Look in the lib and jars folder of your spark installation. There you can see if you have jars that include an alternate version of guava that wouldnt support the method createStarted() from Stopwatch

Pillage answered 22/2, 2017 at 18:39 Comment(2)
The Stopwatch.createStarted() method exists in the jar, I'm checking now to see if Spark has any conflicting version of Guava already.Seaside
It looks like Spark 2.1.0 has guava-14.0.1.jar in the jars directory.Seaside
T
1

Apply above answers to solve the problem by following config:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.1.0</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
            <relocations>
                <relocation>
                    <pattern>com.google.common</pattern>
                    <shadedPattern>shade.com.google.common</shadedPattern>
                </relocation>
                <relocation>
                    <pattern>com.google.thirdparty.publicsuffix</pattern>
                    <shadedPattern>shade.com.google.thirdparty.publicsuffix</shadedPattern>
                </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>
Tintoretto answered 28/12, 2017 at 6:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.