Apache Livy cURL not working for spark-submit command
I recently started working with Spark, Scala, HDFS, sbt, and Livy. I am currently trying to create a Livy batch.

Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This is the error shown in the Livy batch log.

My spark-submit command works perfectly fine with a local .jar file:

spark-submit --class "SimpleApp" --master local target/scala-2.11/simple-project_2.11-1.0.jar

But the same submission through Livy (via cURL) throws an error:

"requirement failed: Local path /target/scala-2.11/simple-project_2.11-1.0.jar cannot be added to user sessions."

So I moved the .jar file to HDFS. My new cURL call for Livy is:

curl -X POST --data '{
    "file": "/jar/project.jar",
    "className": "SimpleApp",
    "args": ["ddd"]
}' \
-H "Content-Type: application/json" \
http://server:8998/batches

This throws the error mentioned above.

Please let me know where I am going wrong.

Thanks in advance!
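For reference, the same batch submission can be sketched in Python using only the standard library. The host, port, and HDFS path are placeholders taken from the cURL call above, and `build_batch_payload`/`submit_batch` are hypothetical helper names, not part of any Livy client:

```python
import json
from urllib import request

LIVY_URL = "http://server:8998"  # placeholder host from the cURL call above


def build_batch_payload(file_uri, class_name, args=()):
    """Build the JSON body that Livy's POST /batches endpoint expects."""
    return {"file": file_uri, "className": class_name, "args": list(args)}


def submit_batch(payload):
    """POST the batch to Livy and return the parsed JSON response."""
    req = request.Request(
        LIVY_URL + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Use an explicit hdfs:// URI so Livy does not treat the path as local.
payload = build_batch_payload(
    "hdfs://localhost:9001/jar/project.jar", "SimpleApp", ["ddd"]
)
# submit_batch(payload)  # would POST to the (hypothetical) Livy server
```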

Mica answered 21/6, 2018 at 13:5
hdfs://localhost:9001/jar/project.jar.

It's expecting your jar file to be located on HDFS.

If it's local, you should specify the protocol in the path, or just upload the jar to HDFS:

 "file": "file:///absolute_path/jar/project.jar",
Spermic answered 21/6, 2018 at 13:14
Okay, and what is the solution for the ClassNotFoundException? – Mica
The jar that contains that class can't be found; as soon as you provide the correct path to the class, the issue should be resolved. – Spermic
In the jar file the class is in the spark/wordcount folder. I tried spark.wordcount.SimpleApp as the class name, but it still throws ClassNotFoundException. – Mica
Did you solve the issue with the path to the jar file? Is the error message the same? – Spermic
I have uploaded the jar file to HDFS, but the error is still the same. – Mica

You have to build a fat jar containing your codebase plus the necessary dependencies (with sbt assembly or a Maven plugin), upload that jar to HDFS, and run spark-submit with the jar placed on HDFS; you can use cURL as well.

Steps with Scala/Java:

  1. Build a fat jar with sbt, Maven, or whatever you prefer.
  2. Upload the fat jar to HDFS.
  3. Use cURL to submit the job:

curl -X POST --data '{ ...your data should be here... }' -H "Content-Type: application/json" your_ip:8998/batches

If you don't want to build a fat jar and upload it to HDFS, you can consider Python scripts: the code can be submitted as plain text, without any jar file.

An example with plain Python code:

curl your_ip:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"asdf\")"}'

In the data body, you have to send valid Python code. This is how tools like Jupyter Notebook work.

Also, here is one more example with Livy and Python, for checking results:

curl your_ip:8998/sessions/0/statements/1
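That GET returns a JSON statement object whose output field carries the result. A small parser sketch follows; the sample response shape is assumed from the Livy REST API (an in-flight statement would have state "running"), and `statement_text` is a hypothetical helper name:

```python
import json


def statement_text(statement_json):
    """Extract the plain-text result from a Livy statement response,
    or return None if the statement has not finished yet."""
    stmt = json.loads(statement_json)
    if stmt.get("state") != "available":
        return None  # still queued or running
    output = stmt.get("output") or {}
    if output.get("status") != "ok":
        raise RuntimeError("statement failed: %s" % output.get("evalue"))
    return output.get("data", {}).get("text/plain")


# Assumed sample response for the print("asdf") statement above.
sample = (
    '{"id": 1, "state": "available", '
    '"output": {"status": "ok", "execution_count": 1, '
    '"data": {"text/plain": "asdf"}}}'
)
```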

As I mentioned above, for Scala/Java a fat jar uploaded to HDFS is required.

Foliose answered 26/6, 2018 at 10:1
I have created a fat jar as per your instructions and uploaded it to HDFS, but the problem is still the same. The jar works with a local path, i.e. spark-submit --class "SimpleApp" --master local myProject/target/scala-2.11/SimpleProject-assembly-1.0.jar, but doesn't work with the HDFS path, i.e. spark-submit --class "SimpleApp" --master local hdfs://localhost:9001/jar/SimpleProject-assembly-1.0.jar – Mica
@Divine You specified --master local for a jar on HDFS; that's wrong. – Foliose

To use local files for Livy batch jobs, you need to add the local folder to the livy.file.local-dir-whitelist property in livy.conf.

Description from livy.conf.template:

List of local directories from where files are allowed to be added to user sessions. By default it's empty, meaning users can only reference remote URIs when starting their sessions.
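A minimal sketch of the corresponding livy.conf entry, assuming the jars live under /opt/jars (a hypothetical path; Livy must be restarted for the change to take effect):

```
# livy.conf
livy.file.local-dir-whitelist = /opt/jars
```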

Curly answered 13/8, 2018 at 20:46

© 2022 - 2024 — McMap. All rights reserved.