Do we still have to make a fat jar for submitting jobs in Spark 2.0.0?

In the Spark 2.0.0 release notes, it says:

Spark 2.0 no longer requires a fat assembly jar for production deployment.

  • Does this mean that we do not need to make a fat jar anymore for submitting jobs?

  • If so, how? In that case, the documentation here isn't up to date.

Disloyal answered 10/8, 2016 at 9:01

Does this mean that we do not need to make a fat jar anymore for submitting jobs?

Sadly, no. You still have to create an uber JAR for Spark deployment.

The title from the release notes is very misleading. What it actually means is that Spark itself is no longer compiled into an uber JAR; instead, it is built like a normal application, as a set of JARs with their dependencies alongside. You can see this in more detail in SPARK-11157 ("Allow Spark to be built without assemblies") and in the design document "Replacing the Spark Assembly with good old jars", which describes the pros and cons of deploying Spark not as several huge JARs (Core, Streaming, SQL, etc.) but as several regular-sized JARs containing the code, with a lib/ directory holding all the related dependencies.

If you really want the details, this pull request touches several key parts.
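
To make that concrete, here is a minimal sketch of the usual setup with sbt and the sbt-assembly plugin (the project name, versions, and the example config dependency are illustrative, not from this answer): the Spark modules get the provided scope, so they are on the compile classpath but left out of the assembly, while everything else your code needs gets bundled.

    // build.sbt — illustrative sketch; names and versions are assumptions
    name := "my-spark-job"
    version := "1.0"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      // Spark modules are "provided": available at compile time, but excluded
      // from the assembly, because spark-submit supplies them on the cluster
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided",

      // third-party libraries your code actually uses must be bundled
      "com.typesafe" % "config" % "1.3.0"
    )

With sbt-assembly enabled in project/plugins.sbt (addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")), running sbt assembly then produces a single JAR under target/ that you hand to spark-submit.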

Brio answered 10/8, 2016 at 13:37
I read your answer like 10 times, and I also read the links you provided. Can you clearly specify a reason why I must create a fat jar? If I don't, what are the potential problems with just specifying a provided scope for all Spark dependencies? – Teasley
@Teasley If you don't, and ship only your compiled code, who will provide the third-party dependencies? At runtime, your app will blow up with a ClassNotFoundException. – Brio
I asked the wrong question. What if I make a fat jar but specify a provided scope for all Spark dependencies, since they are provided by the environment anyway? Then the third-party dependencies that my app uses, but Spark doesn't, would be packed into the fat jar. – Teasley
Or maybe I'm misunderstanding the whole situation. Is Spark itself required to be packaged as a dependency in the fat jar or not? That's the question I'm really trying to find an answer to. – Teasley
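
On the question the comments end with: no, Spark itself does not have to be (and normally should not be) packaged into the fat jar; it is exactly the non-Spark, third-party dependencies that need bundling. With a layout like the sbt sketch above, submitting looks something like this (the class name and paths are hypothetical):

    spark-submit \
      --class com.example.MyJob \
      --master yarn \
      target/scala-2.11/my-spark-job-assembly-1.0.jar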
