spark-submit error: ClassNotFoundException

name := "counter" mainClass := Some("Counter") scalaVersion := "2.11.0" val sparkVersion = "2.1.1"; libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"; libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"; libraryDependencies += "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"; libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.2"; libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion; libraryDependencies += "com.github.scopt" %% "scopt" % "3.5.0"; libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.1"; libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % "test"; mergeStrategy in assembly := { case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first case x => (mergeStrategy in assembly).value(x) }

17/06/21 19:00:25 INFO Utils: Successfully started service 'Driver' on port 50140. 17/06/21 19:00:25 INFO WorkerWatcher: Connecting to worker spark://[email protected]:52476 Exception in thread "main" java.lang.ClassNotFoundException: Counter at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:229) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)

It appears that you've just started developing Spark applications with Scala so for the only purpose to help you and the other future Spark developers, I hope to give you enough steps to get going with the environment.

Project Build Configuration - build.sbt

It appears that you use multi-project sbt build and that's why you have two build.sbts. For the purpose of fixing your issue I'd pretend you don't use this advanced sbt setup.

It appears that you use Spark Streaming so define it as a dependency (as libraryDependencies). You don't have to define the other Spark dependencies (like spark-core or spark-sql).

You should have build.sbt as follows:

organization := "com.me"
version := "0.1.0"
scalaVersion := "2.11.0"
val sparkVersion = "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"

Building Deployable Package

With build.sbt above, you execute sbt package to build a deployable Spark application package that you eventually spark-submit to a Spark cluster.

You don't have to use sbt assembly for that...yet. I can see that you use Spark Cassandra Connector and other dependencies that could also be defined using --packages or --jars instead (which by themselves have their pros and cons).

sbt package

The size of the final target/scala-2.11/counter_2.11-0.1.0.jar is going to be much smaller than counter-assembly-0.1.0.jar you have built using sbt assembly because sbt package does not include the dependencies in a single jar file. That's expected and fine.

Submitting Spark Application - spark-submit

After sbt package you should have the deployable package in target/scala-2.11 as counter-assembly-0.1.0.jar.

You should just spark-submit with required options which in your case would be:

spark-submit \
  --master spark://10.1.204.67:6066
 target/scala-2.11/counter-assembly-0.1.0.jar

That's it.

Please note that:

--deploy-mode cluster is too advanced for the exercise (let's keep it simple and bring it back when needed)
file:// makes things broken (or at least is superfluous)
--class "Counter" is taken care of by sbt package when you have a single Scala application in a project where you execute it. You can safely skip it.

Project Build Configuration - build.sbt

Building Deployable Package

Submitting Spark Application - spark-submit

Recommended topics

Hot tags