spark-submit: add multiple jars to the classpath
Asked Answered
D

9

52

I am trying to run a Spark program that needs multiple jar files; with only one of them on the classpath it does not run. I want to add both jar files, which are in the same location. I have tried the command below, but it fails with a dependency error:

spark-submit \
  --class "max" maxjar.jar Book1.csv test \
  --driver-class-path /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar

How can I add another jar file that is in the same directory?

I want to add /usr/lib/spark/assembly/lib/hive-serde.jar.

Dishrag answered 17/3, 2015 at 12:29 Comment(4)
Welcome @avinash, for your next post I recommend you have a look at stackoverflow.com/editing-helpJavier
spark-submit [restofyouroptions] --conf "spark.driver.extraClassPath=myjarfile.jar"Unconscionable
multiple jar files: "spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar"Unconscionable
@zahra It didn't work for me; I get a 'No suitable driver found' error. The problem is that the JVM has already started before the 'extraClassPath' conf is set. Is there any way to set it before the JVM starts?Elin
A
6

I was trying to connect to MySQL from Python code that was executed using spark-submit.

I was using the HDP sandbox with Ambari. I tried a lot of options such as --jars, --driver-class-path, etc., but none worked.

Solution

Copy the jar into /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/
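
For example, something along these lines (the connector jar name below is just a placeholder; use your actual driver jar):

cp /path/to/mysql-connector-java.jar /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/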

As of now I'm not sure whether this is a proper solution or a quick hack, but since I'm working on a POC it works for me.

Amaranth answered 11/9, 2017 at 9:34 Comment(2)
Just for reference, since this was one of the first questions I found when searching for this on Google: in AWS EMR with Spark 2.x the jars folder is /usr/lib/spark/jars/. There's an official tutorial from AWS on how to do that.Nat
T
54

Just use the --jars parameter. Spark will share those jars (comma-separated) with the executors.
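
For example, using the jars from the question (note that spark-submit options such as --jars must come before the application jar; everything after the application jar is treated as application arguments):

spark-submit --class "max" \
  --jars /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/lib/spark/assembly/lib/hive-serde.jar \
  maxjar.jar Book1.csv test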

Twostep answered 17/3, 2015 at 13:22 Comment(4)
I tried comma-separated: spark-submit --class "max" maxjar.jar Book1.csv test /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar,hive-serde.jar, but it doesn't read either of the jars. I get this error: org/apache/hadoop/hive/conf/HiveConfDishrag
I meant, use it like this: spark-submit --master master_url --jars jar1,jar2 --class classname application_jarTwostep
Actually I want to add multiple jars to my classpath. I don't have access to copy the jars to my local file system, so I am just accessing the jars through the classpath.Dishrag
I tried it too, but it doesn't work: Spark took into account just the 1st jar and considered the second one the job jar, so it throws an exception saying that the class specified with --class is not found.Florez
S
43

Specifying the full path for all additional jars works.

./bin/spark-submit --class "SparkTest" --master local[*] --jars /fullpath/first.jar,/fullpath/second.jar /fullpath/your-program.jar

Or add jars in conf/spark-defaults.conf by adding lines like:

spark.driver.extraClassPath /fullpath/first.jar:/fullpath/second.jar
spark.executor.extraClassPath /fullpath/first.jar:/fullpath/second.jar
Sheelagh answered 27/4, 2016 at 2:52 Comment(2)
How do I do it on Windows? Because on Windows a path includes a colon, e.g. D:\pathIntervale
A comma-separated list of packages helped me. Create a spark-defaults.conf file within the bin folder of the Spark folder. In spark-defaults.conf put "spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.12:3.0.2,org.apache.spark:spark-avro_2.12:3.0.2". As you see, I am pulling in the 1st package, "streaming kafka", and the 2nd package, "spark avro". All you have to do is add as many packages as needed, separating them with commas.Grumble
J
24

You can use * to include all the jars in a folder when adding entries to conf/spark-defaults.conf:

spark.driver.extraClassPath /fullpath/*
spark.executor.extraClassPath /fullpath/*
Jolly answered 14/9, 2016 at 17:19 Comment(3)
Are you sure? I got "16/10/20 19:56:43 ERROR SparkContext: Jar not found at file:/root/.ivy2/jars/*.jar"Transmute
A relative path works too! My setting is "spark.driver.extraClassPath lib/*", where lib is a directory under the Spark home and all 3rd-party jars are there.Thatcher
This solution works! I had a similar issue where I needed two different JDBC drivers for a multiple-DB-connection scenario, and this approach works like a charm! Thank you.Caudell
T
6

In Spark 2.3 you just need to set the --jars option. The file path should be prefixed with the scheme though, i.e. file:///<absolute path to the jars>, e.g. file:///home/hadoop/spark/externaljars/* or file:///home/hadoop/spark/externaljars/abc.jar,file:///home/hadoop/spark/externaljars/def.jar
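
A full invocation might then look like the following (the class name and application jar are placeholders):

spark-submit --class com.example.Main \
  --jars file:///home/hadoop/spark/externaljars/abc.jar,file:///home/hadoop/spark/externaljars/def.jar \
  /home/hadoop/spark/my-app.jar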

Tow answered 26/4, 2018 at 10:9 Comment(0)
M
5

Pass --jars to spark-submit with the paths of the jar files separated by commas.

For reference:

--driver-class-path is used to mention "extra" jars to add to the classpath of the "driver" of the Spark job.

--driver-library-path is used to "change" the default library path for the jars needed by the Spark driver.

--driver-class-path will only push the jars to the driver machine. If you want to send the jars to the "executors", you need to use --jars (see the example below).
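
For instance, a combined command might look like this (master, paths and class name are illustrative placeholders):

spark-submit --master yarn \
  --jars /path/to/test1.jar,/path/to/test2.jar \
  --driver-class-path /path/to/test1.jar:/path/to/test2.jar \
  --class com.example.Main my-app.jar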

And to set the jars programmatically, set the following config: spark.yarn.dist.jars, with a comma-separated list of jars.

Eg:

from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .appName("Spark config example") \
        .config("spark.yarn.dist.jars", "<path-to-jar/test1.jar>,<path-to-jar/test2.jar>") \
        .getOrCreate()
Mineralogist answered 18/2, 2020 at 13:14 Comment(0)
D
4

You can use --jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') to include an entire folder of jars. So:

spark-submit --class com.yourClass \
  --jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') \
  ...

Damselfish answered 7/8, 2019 at 7:47 Comment(0)
T
0

For the --driver-class-path option you can use : as a delimiter to pass multiple jars. Below is an example with the spark-shell command, but I guess the same should work with spark-submit as well.

    spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar

Spark version: 2.2.0

Tass answered 23/11, 2018 at 8:7 Comment(0)
S
0

If you are using a properties file, you can add the following line there:

spark.jars=jars/your_jar1.jar,...

assuming the following layout:

<your root from where you run spark-submit>
  |
  |-jars
      |-your_jar1.jar
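
You would then point spark-submit at that properties file, for example (file name, class and application jar are placeholders):

spark-submit --properties-file my.properties --class com.example.Main my-app.jar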
Skyros answered 6/11, 2019 at 15:7 Comment(0)
