Is caching of boxed Byte objects not required by Java 13 SE spec?

Reading the Java SE 13 specification, I found the following guarantee in chapter 5, section 5.1.7, Boxing Conversion:

If the value p being boxed is the result of evaluating a constant expression (§15.28) of type boolean, char, short, int, or long, and the result is true, false, a character in the range '\u0000' to '\u007f' inclusive, or an integer in the range -128 to 127 inclusive, then let a and b be the results of any two boxing conversions of p. It is always the case that a == b

I find it odd that values of type byte are left out from that wording.

For example, in code such as:

Byte b1=(byte)4;
Byte b2=(byte)4;
System.out.println(b1==b2);

We have a constant expression of type byte, and after the boxing, the values of b1 and b2 may or may not be the same object.

It actually works the same way without the cast:

Byte b1=4;

Here, we have a constant expression of type int in an assignment context. According to the spec:

A narrowing primitive conversion followed by a boxing conversion may be used if the variable is of type Byte, Short, or Character, and the value of the constant expression is representable in the type byte, short, or char respectively.

So the expression will be converted to byte, and that byte type value will be boxed, so there is no guarantee that the value is interned.
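A minimal sketch of the two forms above, for experimentation. The class and method names are made up for illustration. With javac, boxing is typically compiled to a call to Byte.valueOf, so both comparisons usually come out true in practice; the JLS 13 wording quoted above does not by itself promise that for byte.

```java
// Sketch only; names are illustrative and not taken from the question.
class ByteBoxingIdentity {

    // Boxes the constant expression (byte) 4 twice and compares the references.
    static boolean sameObjectWithCast() {
        Byte b1 = (byte) 4;
        Byte b2 = (byte) 4;
        return b1 == b2; // reference comparison, not value comparison
    }

    // Boxes the int constant 4 twice; the assignment context applies a
    // narrowing primitive conversion to byte, followed by boxing.
    static boolean sameObjectWithoutCast() {
        Byte b1 = 4;
        Byte b2 = 4;
        return b1 == b2;
    }

    public static void main(String[] args) {
        System.out.println("with cast:    " + sameObjectWithCast());
        System.out.println("without cast: " + sameObjectWithoutCast());
    }
}
```

Whatever this prints on a given compiler and runtime shows implementation behavior only; it cannot settle what the specification requires.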

My question: am I interpreting the spec correctly, or am I missing something? I checked whether the spec requires boxing to use the method Byte.valueOf() (for which the caching would be guaranteed), but it does not.

Candace answered 1/1, 2020 at 19:11 Comment(2)
It is not required to use a cached value in the language spec. - Endomorph
Possibly related: "Does autoboxing call valueOf()?", stating that valueOf() is not mandated. - Candace

TL;DR: this has been fixed with JDK 14, whose specification now includes byte.

I consider this a specification bug, the result of multiple rewrites.

Note the text of the JLS 6 counterpart:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

Here, byte is explicitly mentioned as being boxed to an object with canonical identity, unconditionally. Since all byte values are in the -128..127 range, there was no need to add such a restriction.

But note that long has not been mentioned.

Then, meet JDK-7190924, 5.1.7: JLS does not mention caching of autoboxed longs

In the comments, you can see how it happened.

In his first comment, Alex Buckley objects that "byte is a type, not a value", not considering that "byte" could mean "all values in the byte range". Since he also assumes that "number" originally meant "literal" (instead of, e.g., "numeric value"), he focuses on the point that all integer literals are either int or long.

His first draft uses the term "integer literal" and removes the types completely. A slightly modified version of it made it into the Java 8 JLS:

If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.

So in Java 8, the type doesn't matter at all, but the guarantee is limited to literals.

So this would imply that

Byte b1 = 4;

does evaluate to a canonical object due to the integer literal, whereas

Byte b1 = (byte)4;

may not, as (byte)4 is a constant expression but not a literal.

In his next comment, years later, he considers "constant expressions", which can indeed be typed, and reformulates the phrase, bringing back the types, "boolean, char, short, int, or long", having added long, but forgotten about "byte".

This resulting phrase is what you've cited; it has been in the specification since Java 9.

The omission of byte surely isn't intentional, as there is no plausible reason to omit it, especially since it was there before. Taken literally, this would be a breaking change.

Though, restricting the caching to compile-time constants, when JLS 6 specified it for all values in the range without such a restriction, is already a breaking change (which doesn't matter in practice, as long as it is implemented via valueOf, which has no way of knowing whether the value originated from a compile-time constant or not).
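To illustrate the parenthetical remark, here is a sketch that assumes boxing goes through Integer.valueOf (which is what javac emits in practice, not something the JLS mandates). The value is parsed at run time, so it is not a compile-time constant, yet valueOf hands back the same cached object because it only ever sees the int value. Class and method names are hypothetical.

```java
// Sketch only; names are illustrative.
class RuntimeValueBoxing {

    // Explicit valueOf calls: the API documentation of Integer.valueOf(int)
    // promises caching for values in the range -128 to 127.
    static boolean sameViaValueOf(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    // Autoboxing of a method parameter, which is never a constant expression.
    static boolean sameViaAutoboxing(int value) {
        Integer a = value;
        Integer b = value;
        return a == b; // reference comparison
    }

    public static void main(String[] args) {
        int runtimeValue = Integer.parseInt("127"); // known only at run time
        System.out.println("valueOf:    " + sameViaValueOf(runtimeValue));
        System.out.println("autoboxing: " + sameViaAutoboxing(runtimeValue));
    }
}
```

For values outside -128..127 the result of either comparison is unspecified, since valueOf may or may not cache them.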

As a side note, the documentation of Byte.valueOf(byte) explicitly says:

...all byte values are cached

and has said so since at least Java 7.
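The documented behavior can be checked directly. The sketch below, with made-up class and method names, calls Byte.valueOf(byte) twice for each of the 256 byte values and compares the returned references. It exercises the API contract only and says nothing about what the JLS requires of boxing conversions.

```java
// Sketch only; names are illustrative.
class ByteValueOfCache {

    // Returns true if Byte.valueOf yields the same object on repeated
    // calls for every possible byte value.
    static boolean allByteValuesCached() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte v = (byte) i;
            if (Byte.valueOf(v) != Byte.valueOf(v)) {
                return false; // two distinct objects for the same value
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all byte values cached: " + allByteValuesCached());
    }
}
```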

Pareto answered 8/1, 2020 at 0:32 Comment(6)
There is a difference between a documentation (of a particular implementation), and the JLS. The question asks about the JLS. - Stainless
@Stainless the API documentation hosted by Oracle is the specification of the API. There is no other authoritative API specification. Things that are implementation details are explicitly marked as such. Besides that, I don’t see the point of focusing on a small side note of my answer, when the entire answer is about the JLS and not that API mentioned in the side note. That’s especially hilarious as your answer is citing code comments of internal implementation classes. - Pareto
Very interesting history. - Timbering
"I don’t see the point of focusing on a small side note of my answer" - so you can maliciously and also wrongly nitpick a part of my answer, but I may not point out when you are talking about something unrelated to the question. - Stainless
@Stainless there is a fundamental difference between a wrong statement and an unrelated statement. - Pareto
Updated. JLS 14 includes byte, which indicates that it was an unintentional omission in JLS 13. - Pareto

You understand it correctly. The end of the same 5.1.7 section (from https://docs.oracle.com/javase/specs/jls/se13/html/jls-5.html) says:

A boxing conversion may result in an OutOfMemoryError if a new instance of one of the wrapper classes (Boolean, Byte, Character, Short, Integer, Long, Float, or Double) needs to be allocated and insufficient storage is available.

Byte would not be there if it was expected to be pre-generated.

Another thing, still from the same paragraph:

Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references.
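In practical terms, the quoted discussion means code should compare boxed values by value rather than by reference whenever identity is not guaranteed. A minimal sketch, with hypothetical names:

```java
// Sketch only; names are illustrative.
class BoxedValueComparison {

    // Compares the wrapped values; independent of any caching.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    // Alternative: unbox explicitly and compare the primitives.
    static boolean sameValueUnboxed(Integer a, Integer b) {
        return a.intValue() == b.intValue();
    }

    public static void main(String[] args) {
        Integer a = 1000; // outside the -128..127 range
        Integer b = 1000;
        System.out.println("equals:  " + sameValue(a, b));
        System.out.println("unboxed: " + sameValueUnboxed(a, b));
        // a == b is deliberately not printed: its result is unspecified here.
    }
}
```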


Not a "proof", but perhaps worth mentioning: the source of Integer describes the boxing promise, in JDK 13 and even in JDK 7:
 * Cache to support the object identity semantics of autoboxing for values between
 * -128 and 127 (inclusive) as required by JLS.

The text is the same, even though the implementation has changed over time.

Byte has no such statement, though it is cached too (JDK 7, JDK 13). The cache is there in both, but there is not a single word about it (nor about boxing).

Stainless answered 1/1, 2020 at 19:22 Comment(9)
The fact that OutOfMemoryError can happen is something else, I think. Boolean is mentioned also, even though caching of Boolean values is required. The value might be created lazily, and only then cached. The OutOfMemoryError can happen on creating the first value when adding it to the cache. - Candace
@Candace added another part. Being necessary/unnecessary and even happening "normally" is different from being guaranteed. - Stainless
"there is not a single word about it", except for its documentation, which says "all byte values are cached" - Pareto
@Holger, there is not a single word about it in Byte.java, about this being a JLS requirement, while I found such indication in Integer.java. That is what is written there, given the context, which is JLS (from question) and implementation (what I linked). I did not say anything about the documentation. And in my opinion the documentation says "this one works this way", while the JLS says "it has to work this way". - Stainless
Your sentence is “The cache is there in both, but there is not a single word about it…”, so “it” refers to the cache, not the JLS. The term JLS does not even appear in the preceding two sentences, so I doubt that any reader would conclude that “it” refers to JLS. - Pareto
@Holger ... so it does not even matter what that "it" is, what is not mentioned, because nothing is mentioned. Byte.java simply does not describe its cache, nor whether it is a requirement or related to boxing at all. While Integer.java does. - Stainless
@Pareto documentation comment of the API method valueOf, a few lines under the part you’ve linked, where it says “all byte values are cached” - which is a documentation comment, does not refer to JLS and thus irrelevant to the question. On a side note: I certainly will not touch this answer as due to the pleasant tone applied I prefer avoiding even the look of it that I ever felt your comments relevant. - Stainless
Your entire last section is “irrelevant to the question”. You even start this saying “Not a proof” and, as said, this JLS referring comment is only present in Integer, it doesn’t relate to Byte at all. Speaking of relevance, your answer’s first conclusion is wrong too, as the statement that boxing may result in an OutOfMemoryError is about boxing in general, not contradicting the possibility that boxing certain compile-time constants leads to pre-allocated objects or lazily allocated objects. - Pareto
As said, neither Boolean, Short, Character, nor Long contain such a comment referring to the JLS, only Integer does, so “by comparison” the conclusion would be that only Integer has such a guarantee, even when the current JLS says explicitly that boolean, short, char, and long have it? The question is whether “byte” is special, but a comment only appearing in Integer would make int special, so this is irrelevant. Anyway, there is no need to comment again, I accept that you have a different opinion and keep ignoring things, just because it was me who said them. - Pareto
