spark-submit error: ClassNotFoundException
build.sbt

lazy val commonSettings = Seq(
    organization := "com.me",
    version := "0.1.0",
    scalaVersion := "2.11.0"
)

lazy val counter = (project in file("counter")).
    settings(commonSettings:_*)

counter/build.sbt

name := "counter"
mainClass := Some("Counter")
scalaVersion := "2.11.0"

val sparkVersion = "2.1.1";

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided";
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided";
libraryDependencies += "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided";

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.2";
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion;

libraryDependencies += "com.github.scopt" %% "scopt" % "3.5.0";

libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.1";
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % "test";

mergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first
  case x => (mergeStrategy in assembly).value(x)
}

counter.scala:

object Counter extends SignalHandler
{
    var ssc : Option[StreamingContext] = None;
    def main( args: Array[String]) {
        // ...
    }
}

Run

./spark-submit --class "Counter" --master spark://10.1.204.67:6066 --deploy-mode cluster file://counter-assembly-0.1.0.jar

Error:

17/06/21 19:00:25 INFO Utils: Successfully started service 'Driver' on port 50140.
17/06/21 19:00:25 INFO WorkerWatcher: Connecting to worker spark://[email protected]:52476
Exception in thread "main" java.lang.ClassNotFoundException: Counter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)

Any idea? Thanks

UPDATE

I previously had the problem described in Failed to submit local jar to spark cluster: java.nio.file.NoSuchFileException. Now I have copied the jar into spark-2.1.0-bin-hadoop2.7/bin and run ./spark-submit --class "Counter" --master spark://10.1.204.67:6066 --deploy-mode cluster file://Counter-assembly-0.1.0.jar

The Spark cluster is version 2.1.0, but the jar was assembled against Spark 2.1.1 and Scala 2.11.0.

Lobbyism asked 21/6, 2017 at 19:07
Have you tried renaming counter.scala to Counter.scala? – Kozak
@TomLous No, I will try. – Lobbyism
@TomLous I just tried; it's not working. – Lobbyism
Hmm, hard to tell without the entire project available. Some small pointers / unsolicited advice, though (it probably won't help, but here it is anyway): 1. Don't use Scala 2.11.0, rather 2.11.11. 2. Don't use semicolons in Scala (very Java). 3. Why have two build.sbt files? One should be enough for such a small project. 4. Reorganise your code into a src/main/scala/ folder. 5. Don't use vars (not very FP). Sorry I couldn't help you, but if you could share the project code via GitHub or so, I or someone else could check it out quickly. – Kozak
@TomLous Thanks. Just curious why it complains that the class is not found. – Lobbyism
I bet it's because the jar is not referenced correctly: you use file://counter-assembly-0.1.0.jar, not target/scala-2.11/counter-assembly-0.1.0.jar. In other words, where do you start spark-submit from? – Scrivner
@JacekLaskowski I had the problem here: #44663251. Now I have copied the jar into spark-2.1.0-bin-hadoop2.7/bin and run ./spark-submit --class "Counter" --master spark://10.1.204.67:6066 --deploy-mode cluster file://Counter-assembly-0.1.0.jar – Lobbyism
I have read the documentation and similar posts again and again. It seems that what I did is right, but it does not work. – Lobbyism
@JacekLaskowski Where should I put the jar, and how should I submit it? – Lobbyism
@JacekLaskowski Are you the author of Mastering Apache Spark 2? – Lobbyism
@Lobbyism Yes, I'm the author of the gitbook. – Scrivner

It appears that you've just started developing Spark applications with Scala, so to help you and other future Spark developers, I hope to give you enough steps to get going with the environment.

Project Build Configuration - build.sbt

It appears that you use a multi-project sbt build, and that's why you have two build.sbt files. For the purpose of fixing your issue, I'll pretend you don't use this advanced sbt setup.

It appears that you use Spark Streaming, so define it as a dependency (in libraryDependencies). You don't have to define the other Spark dependencies (like spark-core or spark-sql).

You should have build.sbt as follows:

organization := "com.me"
version := "0.1.0"
scalaVersion := "2.11.0"
val sparkVersion = "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"

Building Deployable Package

With the build.sbt above, you execute sbt package to build a deployable Spark application package that you eventually spark-submit to a Spark cluster.

You don't have to use sbt assembly for that...yet. I can see that you use the Spark Cassandra Connector and other dependencies, which could also be supplied using --packages or --jars instead (each approach has its pros and cons).
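
For illustration, here is a sketch of supplying those dependencies at submit time with --packages, using the Maven coordinates from your build.sbt (it assumes the default standalone master RPC port 7077 and the jar name that sbt package produces):

spark-submit \
  --master spark://10.1.204.67:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2,org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 \
  target/scala-2.11/counter_2.11-0.1.0.jar

--packages resolves the artifacts and their transitive dependencies from Maven Central at submit time and puts them on the driver and executor classpaths, so they don't have to be baked into your jar.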

sbt package

The final target/scala-2.11/counter_2.11-0.1.0.jar is going to be much smaller than the counter-assembly-0.1.0.jar you built using sbt assembly, because sbt package does not bundle the dependencies into a single jar file. That's expected and fine.
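
Since the exception complains about the Counter class specifically, it is also worth verifying that the class actually made it into the package. A quick check, assuming the jar name produced by sbt package:

jar tf target/scala-2.11/counter_2.11-0.1.0.jar | grep -i counter

You should see Counter.class and Counter$.class (the compiled object) in the listing; if they are missing, the problem is in the build, not in spark-submit.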

Submitting Spark Application - spark-submit

After sbt package you should have the deployable package in target/scala-2.11 as counter_2.11-0.1.0.jar.

You should then just spark-submit with the required options, which in your case would be as follows (note the master port: 7077 is the standalone master's default RPC port for this kind of submission, while the 6066 you used is the REST port meant for cluster deploy mode):

spark-submit \
  --master spark://10.1.204.67:7077 \
  target/scala-2.11/counter_2.11-0.1.0.jar

That's it.


Please note that:

  1. --deploy-mode cluster is too advanced for the exercise (let's keep it simple and bring it back when needed)

  2. file:// makes things broken (or is at least superfluous): in file://counter-assembly-0.1.0.jar the jar name is parsed as the URI's authority (host), leaving an empty path, so Spark cannot find the jar. A file: URI needs an absolute path (see the sketch after this list).

  3. --class "Counter" is taken care of by sbt package when you have a single Scala application in the project (sbt records it as Main-Class in the jar's manifest), so you can safely skip it.
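
Regarding point 2: if you do end up needing a file: URI later (for example, when you bring back --deploy-mode cluster), it must carry an absolute path, and the jar must be readable at that path on the machine where the driver is launched. A sketch with a placeholder path:

./spark-submit \
  --class Counter \
  --master spark://10.1.204.67:6066 \
  --deploy-mode cluster \
  file:///full/path/to/counter-assembly-0.1.0.jar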

Scrivner answered 21/6, 2017 at 23:17
Thanks, but (1) after running sbt package, the jar is counter_2.11-0.1.0.jar in target/scala-2.11; (2) counter_2.11-0.1.0.jar is much smaller than counter-assembly-0.1.0.jar; (3) 6066 is the port for the REST API; (4) I really would like to deploy the jar to a remote Spark cluster in cluster mode. I have read spark.apache.org/docs/latest/submitting-applications.html again and again and tried, but I cannot make it work. – Lobbyism
I improved my answer to address your question from the comment above. "I really would like to deploy the jar to a remote Spark cluster in cluster mode" Why? Can you explain why you insist on cluster mode when you haven't even spark-submitted the application using client mode yet? That's another issue, and we should discuss it after we have this case closed. Don't you think? – Scrivner
I already tried client mode and it works fine, but I prefer cluster mode and submitting through the REST API. Please see my update here: #44663251 – Lobbyism
So your question is about spark-submit --deploy-mode cluster then, isn't it? That was not clear from the question. – Scrivner
Can you go to 10.1.204.67:8080 and take a screenshot of the first page of the standalone Master's web UI? Include the screenshot in your question, please. Thanks! – Scrivner
