Why does a Java class compile differently with a blank line?

212

I have the following Java class

public class HelloWorld {
  public static void main(String []args) {
  }
}

When I compile this file and run a sha256 on the resulting class file I get

9c8d09e27ea78319ddb85fcf4f8085aa7762b0ab36dc5ba5fd000dccb63960ff  HelloWorld.class

Next I modified the class and added a blank line like this:

public class HelloWorld {

  public static void main(String []args) {
  }
}

Again I ran a sha256 on the output expecting to get the same result but instead I got

11f7ad3ad03eb9e0bb7bfa3b97bbe0f17d31194d8d92cc683cfbd7852e2d189f  HelloWorld.class

I have read on this TutorialsPoint article that:

A line containing only white space, possibly with a comment, is known as a blank line, and Java totally ignores it.

So my question is, since Java ignores blank lines why is the compiled bytecode different for both programs?

Specifically, the only difference is that a 0x03 byte in HelloWorld.class is replaced by a 0x04 byte.

Mcmillen answered 3/10, 2018 at 10:35 Comment(7)
Note that the compiler is not obliged to be deterministic in producing class files, even though normally they are. See this question. Jar files by default are not reproducible, i.e. even compiling the same code will result in two different JARs. That is because the order of the files and the timestamps will not match. Reproducible builds are possible with specific configuration.Dissuasive
TutorialsPoint claims that "Java totally ignores" blank lines. Section 3.4 of the Java Language Specification says otherwise. Which one to believe?...Kalliekallista
@Kalliekallista The specification.Outlook
Lesson learnt: when in doubt, always refer to the SDK.Apophyllite
@GiacomoAlzetta there’s not even a specified bytecode form for a single bytecode file. E.g., the order of members is unspecified, so if the compiler uses the new immutable Sets with randomization internally, it could produce a different order on each run. It also could add a custom attribute containing the compile-time. And so on…Aminopyrine
@DioPhung another lesson learned: tutorialspoint is not a reliable source for good tutorialsOster
It's quite usual to make simplifications in tutorials, though. TutorialsPoint is not incorrect in saying that "Java totally ignores blank lines", they were probably referring only to the semantics of the language.Tidewater
339

Basically, line numbers are kept for debugging, so if you change your source code the way you did, your method starts at a different line and the compiled class reflects the difference.
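One way to check this explanation is to compile both versions of the source programmatically and compare the resulting bytes, once with default options and once with javac's -g:none flag, which omits debugging information. The following is only a minimal sketch: the class name BlankLineCompileCheck and its helper methods are made up for illustration, and it assumes it runs on a JDK (not a bare JRE) so that the system compiler is available.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BlankLineCompileCheck {
    // The two sources from the question; they differ only by one blank line.
    static final String WITHOUT_BLANK =
        "public class HelloWorld {\n"
        + "  public static void main(String []args) {\n"
        + "  }\n"
        + "}\n";
    static final String WITH_BLANK =
        "public class HelloWorld {\n"
        + "\n"
        + "  public static void main(String []args) {\n"
        + "  }\n"
        + "}\n";

    // Compiles the source in a fresh temporary directory and returns the class file bytes.
    static byte[] compile(String source, String... options) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler found; a JDK is required");
        }
        Path dir = Files.createTempDirectory("blankline");
        Path src = dir.resolve("HelloWorld.java");
        Files.write(src, source.getBytes(StandardCharsets.UTF_8));

        // Build the argument list: caller options, then "-d <dir> <source file>".
        String[] args = Arrays.copyOf(options, options.length + 3);
        args[options.length] = "-d";
        args[options.length + 1] = dir.toString();
        args[options.length + 2] = src.toString();

        if (javac.run(null, null, null, args) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return Files.readAllBytes(dir.resolve("HelloWorld.class"));
    }

    // True if both variants compile to byte-identical class files under the given options.
    static boolean sameClassFile(String... options) throws IOException {
        return Arrays.equals(compile(WITHOUT_BLANK, options), compile(WITH_BLANK, options));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("identical with default options: " + sameClassFile());
        System.out.println("identical with -g:none:         " + sameClassFile("-g:none"));
    }
}
```

If line numbers are the only thing that changes, the first comparison should report a difference and the second should not.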

Reichard answered 3/10, 2018 at 10:43 Comment(4)
That also explains why it differs in the bytes reported by the OP: end-of-transmission is ASCII code 4 and end-of-text is ASCII code 3.Littleton
To experimentally prove this I compared the hashes of the class files of OP's source using the -g:none flag when compiling (which removes all debugging information, see here) and got the same hash in both scenarios.Nabokov
In formal support of your answer, from section 3.4 ("Line Terminators") of the Java Language Specification for Java SE 11: "A Java compiler next divides the sequence of Unicode input characters into lines by recognizing line terminators...The lines defined by line terminators may determine the line numbers produced by a Java compiler".Kalliekallista
One important use of these line numbers is if an exception is thrown; it can tell you the line number of the exception in the stack trace.Battiste
116

You can see the change by using javap -v, which outputs verbose information. As others have already mentioned, the difference is in the line numbers:

$ javap -v HelloWorld.class > with-line.txt
$ javap -v HelloWorld.class > no-line.txt
$ diff -C 1 no-line.txt with-line.txt
*** no-line.txt 2018-10-03 11:43:32.719400000 +0100
--- with-line.txt       2018-10-03 11:43:04.378500000 +0100
***************
*** 2,4 ****
    Last modified 03-Oct-2018; size 373 bytes
!   MD5 checksum 058baea07fb787bdd81c3fb3f9c586bc
    Compiled from "HelloWorld.java"
--- 2,4 ----
    Last modified 03-Oct-2018; size 373 bytes
!   MD5 checksum 435dbce605c21f84dda48de1a76e961f
    Compiled from "HelloWorld.java"
***************
*** 50,52 ****
        LineNumberTable:
!         line 3: 0
        LocalVariableTable:
--- 50,52 ----
        LineNumberTable:
!         line 4: 0
        LocalVariableTable:

More precisely, the class file differs in the LineNumberTable attribute, which the JVM specification describes as follows:

The LineNumberTable attribute is an optional variable-length attribute in the attributes table of a Code attribute (§4.7.3). It may be used by debuggers to determine which part of the code array corresponds to a given line number in the original source file.

If multiple LineNumberTable attributes are present in the attributes table of a Code attribute, then they may appear in any order.

There may be more than one LineNumberTable attribute per line of a source file in the attributes table of a Code attribute. That is, LineNumberTable attributes may together represent a given line of a source file, and need not be one-to-one with source lines.
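To make the structure concrete, here is a rough sketch of reading the LineNumberTable entries straight out of a class file's bytes, without javap. The class name LineNumberTableDump and its methods are invented for this illustration; the parser follows the class file layout as I understand it from the JVM specification and skips everything except the Code attributes of methods.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LineNumberTableDump {

    // Returns entries of the form "<method>: line <n>: <start_pc>" for every method.
    static List<String> lineNumbers(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.skipBytes(4); // minor_version, major_version
        int cpCount = in.readUnsignedShort();
        String[] utf8 = new String[cpCount]; // only Utf8 constants are kept
        for (int i = 1; i < cpCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break; // Utf8: u2 length + modified UTF-8
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.skipBytes(4); break;
                case 5: case 6: in.skipBytes(8); i++; break; // Long/Double use two slots
                case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                case 15: in.skipBytes(3); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        in.skipBytes(6); // access_flags, this_class, super_class
        in.skipBytes(2 * in.readUnsignedShort()); // interfaces
        List<String> result = new ArrayList<>();
        readMembers(in, utf8, null);   // fields: attributes are skipped
        readMembers(in, utf8, result); // methods: Code attributes are inspected
        return result;
    }

    // Reads a field or method table; collects line numbers only when out is non-null.
    static void readMembers(DataInputStream in, String[] utf8, List<String> out) throws IOException {
        int count = in.readUnsignedShort();
        for (int m = 0; m < count; m++) {
            in.skipBytes(2); // access_flags
            String name = utf8[in.readUnsignedShort()];
            in.skipBytes(2); // descriptor_index
            int attrs = in.readUnsignedShort();
            for (int a = 0; a < attrs; a++) {
                String attrName = utf8[in.readUnsignedShort()];
                int length = in.readInt();
                if (out != null && "Code".equals(attrName)) readCode(in, utf8, name, out);
                else in.skipBytes(length);
            }
        }
    }

    // Walks one Code attribute and records the entries of each nested LineNumberTable.
    static void readCode(DataInputStream in, String[] utf8, String method, List<String> out) throws IOException {
        in.skipBytes(4);                          // max_stack, max_locals
        in.skipBytes(in.readInt());               // bytecode
        in.skipBytes(8 * in.readUnsignedShort()); // exception table
        int attrs = in.readUnsignedShort();
        for (int a = 0; a < attrs; a++) {
            String attrName = utf8[in.readUnsignedShort()];
            int length = in.readInt();
            if ("LineNumberTable".equals(attrName)) {
                int entries = in.readUnsignedShort();
                for (int e = 0; e < entries; e++) {
                    int startPc = in.readUnsignedShort();
                    int line = in.readUnsignedShort();
                    out.add(method + ": line " + line + ": " + startPc);
                }
            } else {
                in.skipBytes(length);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineNumberTableDump HelloWorld.class
        for (String entry : lineNumbers(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(entry);
        }
    }
}
```

Running it against the two HelloWorld.class variants should show the same single changed entry for main that the diff above reports.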

Underglaze answered 3/10, 2018 at 10:45 Comment(0)
61

The assumption that "Java ignores blank lines" is wrong. Here is a code snippet that behaves differently depending on the number of empty lines before the method main:

class NewlineDependent {

  public static void main(String[] args) {
    int i = Thread.currentThread().getStackTrace()[1].getLineNumber();
    System.out.println((new String[]{"foo", "bar"})[((i % 2) + 2) % 2]);
  }
}

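Assuming the class declaration is on line 1 of the file, the getStackTrace() call sits on line 4 when there is one empty line before main (as shown), so i is even and the program prints "foo". With no empty lines before main the call sits on line 3, i is odd, and it prints "bar".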

Since the runtime behavior is different, the .class files must be different, regardless of any timestamps or other metadata.

This holds for every language that has access to the stack frames with line numbers, not only for Java.

Note: if it's compiled with -g:none (without any debugging information), then the line numbers will not be included, getLineNumber() always returns -1, and the program always prints "bar", regardless of the number of line breaks.
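The extra "+ 2" in the index expression matters precisely because of that -1: in Java the % operator keeps the sign of the dividend, so -1 % 2 is -1 and would be an invalid array index. A small sketch (the class name LineParity is made up for this illustration) comparing the plain remainder, the adjusted expression from the snippet, and Math.floorMod as an alternative:

```java
public class LineParity {
    // Same index expression as in the snippet above: maps any int to 0 or 1.
    static int index(int line) {
        return ((line % 2) + 2) % 2;
    }

    public static void main(String[] args) {
        String[] words = {"foo", "bar"};
        // An odd line, an even line, and -1 (reported when line numbers are missing).
        for (int line : new int[]{3, 4, -1}) {
            System.out.println(line
                + " -> plain %: " + (line % 2)
                + ", adjusted: " + index(line)
                + ", floorMod: " + Math.floorMod(line, 2)
                + ", word: " + words[index(line)]);
        }
    }
}
```

Math.floorMod(i, 2) gives the same non-negative result as the adjusted expression and could be used in its place.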

Richellericher answered 4/10, 2018 at 11:48 Comment(3)
It can also print Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1.Picot
@Picot The only way I could get a -1 was to use the -g:none flag. Is there any other way to get this exception using ordinary javac?Richellericher
I guess only with the -g option. There are also -g:vars and -g:source, which prevent the generation of the LineNumberTable.Picot
14

As well as any line number details for debugging, the manifest of a JAR you build may also store the build time and date, which will naturally be different every time you build.

Zina answered 3/10, 2018 at 21:42 Comment(6)
C# has this issue as well; until recently the compiler always embedded a fresh GUID in the generated assembly so that you would be guaranteed that two builds would not be binary identical, so that you could tell them apart!Catullus
@EricLippert if two builds differ only by their generation time (i.e. identical code base), shouldn't we treat them as the same? With modern CI/CD build pipelines (Jenkins, TeamCity, CircleCI), we will have a way to differentiate between builds, but from the application's perspective, deploying newer binaries with an identical code base doesn't seem to be useful.Apophyllite
@DioPhung It's the other way around. You don't want two different builds to have the same GUID, because that's how the system can decide which one to use. So it's easiest to generate a new GUID each time; and then you get the side effect that Eric describes as an unintended consequence.Zina
I can't see that it's helpful for what is essentially the same build to result in 2 different binaries. Meta information should stay... meta, imo.Replay
@Replay Like I said, it would be even less helpful for two different builds to be reported with the same GUID, which would then be reported to the system as being the same software. This would cause total failure of any kind of provisioning scheme, so it's mission-critical that GUIDs are never duplicated (within reasonable probability!). Having different GUIDs for two separate builds of the same source code is a trivial annoyance at most. So in the face of a mission-critical failure scenario, what you think is slightly unhelpful really doesn't figure.Zina
@Replay The code part of the binary is still the same (if I'm understanding, I'm not a C# dev), it's just some metadata that is attached to the binary.Nabokov
