Can the Java 8 compiler be forced into creating reproducible class files?

My employer has a business need to make Java builds byte-for-byte reproducible. I am aware of the difficulties in making JAR files reproducible (due to archiving order and time stamps), but at this point I’m talking about class files.

I have builds of the same code using Java 8u65, on both Mac and Linux. The class files differ at the byte level. Both decompile back to the same source; the difference is only visible in the output of the javap disassembler.

The source code seems to be:

final TrustStrategy acceptingTrustStrategy =
        (X509Certificate[] chain, String authType) -> true;

On one build, the result is:

private static boolean lambda$restTemplate$38(java.security.cert.X509Certificate[], java.lang.String) throws java.security.cert.CertificateException;
        Code:
           0: iconst_1
           1: ireturn
     

On the other, it is:

private static boolean lambda$restTemplate$15(java.security.cert.X509Certificate[], java.lang.String) throws java.security.cert.CertificateException;
        Code:
           0: iconst_1
           1: ireturn

Anonymous lambdas are getting names with different numbers in them (lambda$restTemplate$15 versus lambda$restTemplate$38).

It appears that, when I rebuild on the same host, I get the same bytes. When the host differs, the numbers change; two Linux hosts produced different bytes.

What determines these numbers? Is there a way to force every compilation to use the same numbers in this place, and thus produce the same class files? Or is Java 8 class file compilation nondeterministic?

Agreement answered 13/3, 2019 at 20:36 Comment(3)
A different counter value at the same place indicates that the lambdas are compiled in a different order. The sequence is kept here: hg.openjdk.java.net/jdk8/jdk8/langtools/file/1ff9d5118aae/src/… - Vernellvernen
The "Different counter value" comment above is almost an answer itself. Is the counter value controllable in any way? Does it get reset per source file, or per javac invocation? - Agreement
@RobertMandeville: This is a bit hard to diagnose without knowing exactly how you build your code, but here is one thing to check. Most filesystems return directory listings in arbitrary order, which usually depends on how the directory entries are physically organised on disk. POSIX does not define the order in which directory entries are returned, so you will have to sort them or request a specific ordering (e.g. alphanumeric). - Lockhart
D
3

I haven't looked into it too much, but this article talks about reproducible builds in Java, and reproducible-builds has some tools to help make builds (and classes) reproducible.

The link you're probably looking for is the Reproducible Build Maven Plugin, made specifically for Java to try to "strip non-reproducible data from the generated artifacts".

Decompound answered 15/3, 2019 at 2:27 Comment(1)
I saw this site and this tool. Unfortunately, I don't see anything about reproducible classes. Making reproducible JARs/WARs out of class files is a known and solvable problem, and explicitly what that plugin solves. The internet is strangely quiet about reproducible class builds since lambdas were introduced in Java 8; they used to be a given. - Agreement
E
3

The compiler numbers lambda expressions with a counter that it increments each time it encounters another lambda expression.
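To see which numbers a given build assigned, the synthetic methods can be listed by reflection instead of reading javap output. This is only a minimal sketch: the class LambdaNames and its two lambdas are made up for illustration, and the exact names printed depend on the compiler and on what else was compiled in the same run, so no expected output is given.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical demo class, not taken from the question's code base.
final class LambdaNames {
    // Two non-capturing lambdas; javac desugars each one into a private
    // static synthetic method whose name starts with "lambda$".
    static final Supplier<String> FIRST = () -> "first";
    static final Supplier<String> SECOND = () -> "second";

    /** Returns the names of the synthetic lambda methods declared by the given class, sorted. */
    static List<String> lambdaMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isSynthetic() && method.getName().startsWith("lambda$")) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Print one generated method name per line.
        for (String name : lambdaMethodNames(LambdaNames.class)) {
            System.out.println(name);
        }
    }
}
```

Running this against class files from two hosts and comparing the printed names is a quick way to confirm that only the counter suffix differs.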

If the files are read by the compiler in the same order, it should give the same compiled classes.
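One way to pin the read order is to sort the source paths yourself before handing them to the compiler. The sketch below uses the standard javax.tools API; the class name SortedCompile and its methods are hypothetical, and whether a fixed input order is enough to stabilise the lambda numbers in your build is an assumption to verify, since classes pulled in implicitly from the source path may still be processed in between.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical helper, not part of any build tool.
final class SortedCompile {
    /** Returns a sorted copy of the paths (natural String order, locale-independent). */
    static List<String> sortedPaths(List<String> paths) {
        List<String> copy = new ArrayList<>(paths);
        Collections.sort(copy);
        return copy;
    }

    /** Compiles the given sources in sorted order into outputDir; returns true on success. */
    static boolean compile(List<String> sourcePaths, String outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; run on a JDK");
        }
        List<File> files = new ArrayList<>();
        for (String path : sortedPaths(sourcePaths)) {
            files.add(new File(path));
        }
        try (StandardJavaFileManager fileManager =
                 compiler.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> units =
                fileManager.getJavaFileObjectsFromFiles(files);
            List<String> options = Arrays.asList("-d", outputDir);
            // The compilation units are passed in the sorted order built above.
            return compiler.getTask(null, fileManager, null, options, null, units).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a build tool, the equivalent is making sure the list of source files passed to javac is sorted rather than taken straight from a directory listing.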

In any case, since you are building the code yourself, you could simply change the lambda expressions to anonymous class declarations.
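For the lambda in the question, the conversion might look like the sketch below. TrustCheck is a hypothetical stand-in for the real TrustStrategy interface so that the example compiles on its own (it also omits the throws clause seen in the javap output). As far as I know, javac numbers anonymous classes within their enclosing top-level class, so the generated name should not depend on what else is compiled in the same run.

```java
import java.security.cert.X509Certificate;

// Hypothetical example class; TrustCheck only mimics the shape of TrustStrategy.
final class AnonymousInsteadOfLambda {
    interface TrustCheck {
        boolean isTrusted(X509Certificate[] chain, String authType);
    }

    // Anonymous class equivalent of:
    //   (X509Certificate[] chain, String authType) -> true
    static final TrustCheck ACCEPT_ALL = new TrustCheck() {
        @Override
        public boolean isTrusted(X509Certificate[] chain, String authType) {
            return true;
        }
    };

    public static void main(String[] args) {
        // Prints the generated class name so it can be compared across builds.
        System.out.println(ACCEPT_ALL.getClass().getName());
    }
}
```

The cost is an extra class file per converted lambda and more verbose source, which is why this is more of a workaround than a fix.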

EDIT: I just noticed you indicated that the classes are built on two different operating systems. This can introduce differences in the compilation phase. For a reproducible build, the build must be performed on the same architecture. Is there a reason you cannot deploy the artefacts as built on one platform (either macOS or Linux)?

Empirical answered 15/3, 2019 at 2:45 Comment(1)
I may be able to force the order that the compiler reads files in. I doubt that I can get my developers to convert all their lambdas into class definitions. As far as architecture goes, I'm way ahead of you. Unfortunately, my problem isn't platform-dependent; I am experiencing this on two hosts with the same Ubuntu version and same JDK version. - Agreement
O
2

As mentioned in the DZone article linked in Major's answer, for Gradle this is all you need:

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

After adding this to build.gradle, the md5sum of the .jar file was stable between builds on the same system. I could not test with other systems because everyone I asked had different compiler versions, and that makes the build different.
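For readers not using Gradle, the idea behind those two settings (sorted entry order, fixed timestamps) can be sketched with the plain java.util.zip API. This is an illustration only, not what Gradle does internally; DeterministicArchive and the chosen timestamp are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper that writes an archive with stable entry order and timestamps.
final class DeterministicArchive {
    // Arbitrary fixed timestamp (an assumption): 2000-01-01T00:00:00Z in epoch millis.
    // Note: ZipEntry.setTime converts using the JVM's default time zone, so hosts
    // should agree on the time zone for the stored bytes to match.
    static final long FIXED_TIME_MILLIS = 946684800000L;

    /** Writes the given name-to-content map as a ZIP with sorted entries and fixed times. */
    static byte[] write(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // TreeMap iterates in sorted key order, regardless of the input order.
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTime(FIXED_TIME_MILLIS);
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

This only stabilises the archive layer; if the class files inside differ, as in the question, the archive bytes will still differ.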

Orgell answered 21/3, 2019 at 10:20 Comment(0)
