Why does Maven assembly work when SBT assembly finds conflicts?

The title could also be:
What are the differences between the Maven and SBT assembly plugins?

I ran into this issue while migrating a project from Maven to SBT.

To describe the problem I have created an example project with dependencies that I found to behave differently, depending on the build tool.

https://github.com/atais/mvn-sbt-assembly


The only dependencies are (sbt style):

"com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
"org.apache.cassandra" % "cassandra-all" % "3.4",

What I do not understand is why mvn package creates the fat jar successfully, while sbt assembly reports conflicts:

[error] 39 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] /home/siatkowskim/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.7.7.jar:org/apache/commons/logging/<some classes>
[error] /home/siatkowskim/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/<some classes>
...
[error] /home/siatkowskim/.ivy2/cache/com.github.stephenc.high-scale-lib/high-scale-lib/jars/high-scale-lib-1.1.2.jar:org/cliffc/high_scale_lib/<some classes>
[error] /home/siatkowskim/.ivy2/cache/com.boundary/high-scale-lib/jars/high-scale-lib-1.0.6.jar:org/cliffc/high_scale_lib/<some classes>
...
Luke answered 9/5, 2018 at 9:22 Comment(5)
You need to have a merge strategy: see #32497780, #39850868, #14792455. For Maven, see #44613286; also see bryantsai.com/… - Syne
@TarunLalwani your last link (the article) describes the case quite well, but in case of conflicts it recommends maven-shade-plugin. And this is where things get interesting: in my example project there ARE conflicts, but somehow maven-assembly-plugin resolves them and sbt-assembly does not. - Luke
I tried to find a reference for this, but I did not find anything that describes how the Maven Shade plugin does it. - Syne
sbt-assembly can do shading as well: github.com/sbt/sbt-assembly#shading - Goer
What a coincidence, I have been running into this too and spent over a week trying to figure it out. Glad you asked this question. - Drunkard

An extension to Alexey Romanov's answer.

I have also updated my project with a detailed explanation, so you might want to check it out.

Following the advice:

You can verify it for this case by unpacking the jar Maven produces and the dependency jars in SBT error message, then checking which .class file Maven used.

I compared the fat jar produced by Maven with the fat jars produced by sbt using

  • MergeStrategy.first, which showed some extra files
  • MergeStrategy.last, which showed binary differences as well as extra files
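
The comparison described above can be sketched in plain Scala. This is only an illustration of the checking procedure, not part of either plugin: the object name JarDiff and its helper names are hypothetical, and the jar paths are whatever you pass on the command line. It reads the jars into memory, finds paths that exist in both dependencies with different bytes, and reports which dependency's copy ended up in the fat jar.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Hypothetical helper for inspecting which copy of a conflicting file a fat jar contains.
object JarDiff {
  type Entries = Map[String, Array[Byte]]

  // Drain a stream into a byte array (kept manual so it also works on Java 8).
  private def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    out.toByteArray
  }

  // Read every non-directory entry of a jar, keyed by its path inside the archive.
  def readEntries(jarPath: String): Entries = {
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .filterNot(_.isDirectory)
        .map { e =>
          val in = jar.getInputStream(e)
          try e.getName -> readAll(in) finally in.close()
        }
        .toMap
    } finally jar.close()
  }

  // Paths present in both jars whose contents differ.
  def conflicts(a: Entries, b: Entries): Set[String] =
    a.keySet.intersect(b.keySet).filter(p => !java.util.Arrays.equals(a(p), b(p)))

  // Name of the first dependency whose bytes for `path` match the fat jar's copy, if any.
  def chosen(fat: Entries, deps: Seq[(String, Entries)], path: String): Option[String] =
    fat.get(path).flatMap { bytes =>
      deps.collectFirst {
        case (name, entries) if entries.get(path).exists(e => java.util.Arrays.equals(e, bytes)) => name
      }
    }

  // Usage (paths are placeholders): JarDiff <fat.jar> <depA.jar> <depB.jar>
  def main(args: Array[String]): Unit = {
    require(args.length == 3, "expected: <fat.jar> <depA.jar> <depB.jar>")
    val fat = readEntries(args(0))
    val a   = readEntries(args(1))
    val b   = readEntries(args(2))
    val deps = Seq(args(1) -> a, args(2) -> b)
    conflicts(a, b).toSeq.sorted.foreach { p =>
      println(s"$p -> ${chosen(fat, deps, p).getOrElse("missing or matches neither")}")
    }
  }
}
```

Running it once against the Maven jar and once against an sbt jar, with the two dependency jars named in the sbt error, shows for each conflicting path which side won.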

I have taken the next step and checked the fat-jars against the dependencies sbt found conflicts at, specifically:

Conclusion

maven-assembly-plugin resolves conflicts at the jar level. When it finds a conflict, it picks the first jar and simply ignores all the content of the other.

sbt-assembly, by contrast, mixes all the class files and resolves conflicts locally, file by file.
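
To make the difference concrete, here is a toy model of the two behaviours as I describe them above. It is a sketch of my conclusion, not of either plugin's real implementation: MergeSketch, Jar, mergeFileLevel and mergeJarLevel are made-up names, and a jar is reduced to a map from entry path to content.

```scala
// Toy model only: names and data structures are illustrative assumptions.
object MergeSketch {
  // A "jar" is a name plus a map from entry path to (stand-in) content.
  final case class Jar(name: String, entries: Map[String, String])

  // File-level merge, in the spirit of MergeStrategy.first:
  // for every path, keep the copy from the first jar that contains it.
  def mergeFileLevel(jars: Seq[Jar]): Map[String, String] =
    jars.foldLeft(Map.empty[String, String]) { (acc, jar) =>
      acc ++ jar.entries.filter { case (path, _) => !acc.contains(path) }
    }

  // Jar-level merge, as observed for the Maven fat jar in this answer:
  // a jar that collides with anything already merged is skipped entirely.
  def mergeJarLevel(jars: Seq[Jar]): Map[String, String] =
    jars.foldLeft(Map.empty[String, String]) { (acc, jar) =>
      if (jar.entries.keys.exists(acc.contains)) acc else acc ++ jar.entries
    }

  def main(args: Array[String]): Unit = {
    val a = Jar("a.jar", Map("org/x/A.class" -> "A-v1", "org/x/B.class" -> "B-v1"))
    val b = Jar("b.jar", Map("org/x/A.class" -> "A-v2", "org/x/C.class" -> "C-v2"))
    println(mergeFileLevel(Seq(a, b)).keys.toSeq.sorted)
    println(mergeJarLevel(Seq(a, b)).keys.toSeq.sorted)
  }
}
```

With these two sample jars, the file-level merge keeps A and B from a.jar plus C from b.jar, while the jar-level merge keeps only the contents of a.jar. That matches the "extra files" I saw in the sbt fat jar.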

My theory is that if your fat jar made with maven-assembly-plugin works, you can specify MergeStrategy.first for all the conflicts in sbt. The only difference would be that the jar produced with sbt will be even bigger, containing extra classes that were ignored by Maven.
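
A narrower variant of that theory, sketched as a build.sbt fragment: apply MergeStrategy.first only to the package prefixes that appear in the error output of the question, and defer to sbt-assembly's default strategy for everything else. The two PathList patterns are taken from the two conflicts shown in the log; the remaining conflicts elided there would need their own cases, and which jar counts as "first" depends on the classpath order, so treat this as a starting point rather than a verified fix.

```scala
// build.sbt fragment, assuming the sbt-assembly plugin is already enabled
assemblyMergeStrategy in assembly := {
  // commons-logging vs. jcl-over-slf4j (from the error output)
  case PathList("org", "apache", "commons", "logging", _*) => MergeStrategy.first
  // the two high-scale-lib artifacts (from the error output)
  case PathList("org", "cliffc", "high_scale_lib", _*)     => MergeStrategy.first
  case other =>
    // Fall back to the plugin's default strategy for all other paths.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(other)
}
```

Keeping the default for unmatched paths means that any new conflict still fails the build instead of being resolved silently.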

Luke answered 23/5, 2018 at 11:51 Comment(1)
Wow, its resolution strategy is even worse than I thought :) - Olag

It seems maven-assembly-plugin resolves conflicts equivalently to MergeStrategy.first (not sure if it's completely equivalent) by just picking one of the files in an unspecified way when jar-with-dependencies is used (since it only has one phase):

If two or more elements (e.g., file, fileSet) select different sources for the same file for archiving, only one of the source files will be archived.

As per version 2.5.2 of the assembly plugin, the first phase to add the file to the archive "wins". The filtering is done solely based on name inside the archive, so the same source file can be added under different output names. The order of the phases is as follows: 1) FileItem 2) FileSets 3) ModuleSet 4) DependencySet and 5) Repository elements.

Elements of the same type will be processed in the order they appear in the descriptors. If you need to "overwrite" a file included by a previous set, the only way to do this is to exclude that file from the earlier set.

Note that this behaviour was slightly different in earlier versions of the assembly plugin.

Even if one of the conflicting files would work for all of your dependencies (which isn't necessarily so), Maven doesn't know which one, so you can just silently get the wrong result. Silently at build-time, I mean; at runtime you can get e.g. AbstractMethodError, or again just a wrong result.

You can influence which file gets picked by writing your own descriptor, but it's horribly verbose: there's no equivalent to just writing MergeStrategy.first/last (and concat/discard are not allowed).

The SBT plugin could do the same: default to a strategy when you don't specify one, but then, well, you could silently get the wrong result.

Olag answered 22/5, 2018 at 9:27 Comment(5)
It seems possible that it behaves like MergeStrategy.first, but different sources state different things. E.g. gist.github.com/simonwoo/04b133cb0745e1a0f1d6 says "it will cause Java class name conflict issue". If you could find some way to confirm how it works, I would be 110% satisfied. - Luke
Well, the documentation is more likely to be right than a random gist. But they don't actually disagree: the issue it causes can be exactly "you silently get the wrong result". - Olag
You can verify it for this case by unpacking the jar Maven produces and the dependency jars in the SBT error message, then checking which .class file Maven used. For the general case, you have to either rely on documentation or check the sources of maven-assembly-plugin. - Olag
@Luke Actually, after rereading what it says more carefully, and looking at the definition of jar-with-dependencies in maven.apache.org/plugins/maven-assembly-plugin/…, it's allowed to pick an arbitrary .class file, not necessarily the first: the first phase wins, but in jar-with-dependencies there is only one phase including all dependencies. So you do need to look at the sources. - Olag
@Luke But the point remains the same: it just picks a single class, and if it picked the wrong one you won't know until you run the program (if you are lucky). It's just less predictable which one it picks. - Olag

From the build.sbt I can see that there is no merge strategy in your build. There is also a rogue "," in your libraryDependencies key, after the "org.apache.cassandra" % "cassandra-all" % "3.4" dependency, in the build.sbt of the project you linked above.

A merge strategy is required to handle the duplicate files in the jar, including different versions of the same file. The following is an example of how to put one in place in your build.

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _                                                => MergeStrategy.first
}

You could try writing a simple build file if you do not have sub-projects in your project. You can try the following build.sbt.

name := "assembly-test"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
  "com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
  "org.apache.cassandra" % "cassandra-all" % "3.4"
)

mainClass in assembly := Some("com.atais.cassandra.MainClass")

// Drop manifests and signature files, concatenate reference.conf,
// and keep the first copy of everything else.
assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _                                                => MergeStrategy.first
}
Questionary answered 22/5, 2018 at 8:39 Comment(1)
I know about assemblyMergeStrategy, and the extra commas do not matter. You did not answer the question. I know how to "make it work"; I want to understand why it does not. Also, your merge strategies look really random, which is not a good idea, imo. - Luke