Why aren't Integers cached in Java?
H

15

27

I know there are similar posts on the topic, but they don't quite address my question. When you do:

Integer a = 10;
Integer b = 10;
System.out.println("a == b: " + (a == b));

This will (apparently) print true most of the time because integers in the range [-128, 127] are somehow cached. But:

Integer a = new Integer(10);
Integer b = new Integer(10);
System.out.println("a == b: " + (a == b));

Will return false. I understand that I am asking for new instances of an Integer, but since boxed primitives are immutable in Java, and the machinery is already there to do the "right thing" (as seen in the first case), why does this happen?

Wouldn't it make more sense if all instances of an Integer with a 10 be the same object in memory? In other words, why don't we have "Integer interning" which would be similar to "String interning"?

Better yet, wouldn't it make more sense if instances of a boxed primitive representing the same thing, regardless of value (and type), be the same object ? Or at least respond correctly to ==?

Hydrocellulose answered 11/3, 2011 at 20:20 Comment(5)
I disagree. I think behaving in this manner would misrepresent what is actually happening. I actually think Integer caching and the implementation of String '==' shouldn't be part of the core for the same reason, though admittedly the issue identified in this post does seem inconsistent.Velate
While not a duplicate by any means, I illustrate much of what's concerned here in my answer here: #5199859Coquet
the current behavior is consistent with String, where constants will be interned, but if you do new String("foo") you will always get a new instance.Draper
@Draper Only partially consistent, because larger integers are not "interned" at all.Hydrocellulose
i was referring to the "new Foo()", not the constant version. yes, i realize not all the constants are interned, but the original question was about the explicit use of the constructor.Draper
B
22

It should be very clear that caching has an unacceptable performance hit -- an extra if statement and memory lookup every time you create an Integer. That alone overshadows any other reason and the rest of the agonizing on this thread.

As far as responding "correctly" to ==, the OP is mistaken in his assumption of correctness. Integers DO respond correctly to == by the general Java community's expectation of correctness and of course by the specification's definition of correctness. That is, if two references point to the same object, they are ==. If two references point to different objects, they are not == even if they have the same contents. Thus, it should be no surprise that new Integer(5) == new Integer(5) evaluates to false.
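To make the distinction concrete, here is a minimal sketch of the three kinds of comparison. The class name BoxedEquality and the helper fresh are made up for illustration; new Integer(int) is deprecated in recent JDKs and is used only to force distinct objects, as in the question.

```java
public class BoxedEquality {
    // new Integer(int) is deprecated in recent JDKs; it is used here only to
    // force the creation of distinct objects, as in the question.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer fresh(int n) {
        return new Integer(n);
    }

    public static void main(String[] args) {
        Integer a = fresh(5);
        Integer b = fresh(5);
        System.out.println(a == b);                       // reference comparison
        System.out.println(a.equals(b));                  // value comparison
        System.out.println(a.intValue() == b.intValue()); // primitive comparison
    }
}
```

This should print false, true, true: the references differ, while the wrapped values are equal.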

The more interesting question is why new Object(); should be required to create a unique instance every time? i. e. why is new Object(); not allowed to cache? The answer is the wait(...) and notify(...) calls. Caching new Object()s would incorrectly cause threads to synchronize with each other when they shouldn't.

If it were not for that, then Java implementations could totally cache new Object()s with a singleton.

And that should explain why new Integer(5) done 7 times must be required to create 7 unique Integer objects each containing the value 5 (because Integer extends Object).
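A small sketch of why distinct instances matter for locking: every object carries its own monitor, so holding one does not mean holding another. The class and method names are hypothetical; Thread.holdsLock is the standard API for checking monitor ownership.

```java
public class DistinctMonitors {
    // Returns true when holding lockA's monitor does not imply holding lockB's.
    static boolean monitorsAreIndependent() {
        Object lockA = new Object();
        Object lockB = new Object();
        synchronized (lockA) {
            // The current thread owns lockA's monitor only.
            return Thread.holdsLock(lockA) && !Thread.holdsLock(lockB);
        }
    }

    public static void main(String[] args) {
        System.out.println(monitorsAreIndependent());
    }
}
```

If new Object() were allowed to hand back a shared instance, lockA and lockB could be the same monitor, and unrelated code would start contending on it.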


Secondary, Less Important Stuff: One problem in this otherwise nice scheme results from the autoboxing and autounboxing feature. Without the feature you could not do comparisons such as new Integer(5) == 5. To enable these, Java unboxes the object (and does not box the primitive). Therefore new Integer(5) == 5 is converted to new Integer(5).intValue() == 5 (and not to new Integer(5) == new Integer(5)).

One last thing to understand is that autoboxing of n is not done by new Integer(n). It is done internally by a call to Integer.valueOf(n).
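The two points above can be checked with a short sketch. BoxingPaths and its methods are illustrative names only, and the first check relies on 100 being inside the always-cached range.

```java
public class BoxingPaths {
    // Deprecated constructor, used only to obtain an object that is not from the cache.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer fresh(int n) {
        return new Integer(n);
    }

    // Autoboxing a small value yields the same object that Integer.valueOf returns.
    static boolean boxedMatchesValueOf() {
        Integer boxed = 100;                  // autoboxing
        return boxed == Integer.valueOf(100); // reference comparison
    }

    // With one Integer operand and one int operand, the Integer is unboxed.
    static boolean mixedComparisonUnboxes() {
        return fresh(5000) == 5000;           // compares int values, not references
    }

    public static void main(String[] args) {
        System.out.println(boxedMatchesValueOf());
        System.out.println(mixedComparisonUnboxes());
    }
}
```

Both lines should print true: the first because of the small-value cache, the second because the comparison is done on primitives.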

If you think you understand and want to test yourself, predict the output of the following program:

public class Foo {
  public static void main (String[] args) {
    System.out.println(Integer.valueOf(5000) == Integer.valueOf(5000));
    System.out.println(Integer.valueOf(5000) == new Integer(5000));
    System.out.println(Integer.valueOf(5000) == 5000);
    System.out.println(new Integer(5000) == Integer.valueOf(5000));
    System.out.println(new Integer(5000) == new Integer(5000));
    System.out.println(new Integer(5000) == 5000);
    System.out.println(5000 == Integer.valueOf(5000));
    System.out.println(5000 == new Integer(5000));
    System.out.println(5000 == 5000);
    System.out.println("=====");
    System.out.println(Integer.valueOf(5) == Integer.valueOf(5));
    System.out.println(Integer.valueOf(5) == new Integer(5));
    System.out.println(Integer.valueOf(5) == 5);
    System.out.println(new Integer(5) == Integer.valueOf(5));
    System.out.println(new Integer(5) == new Integer(5));
    System.out.println(new Integer(5) == 5);
    System.out.println(5 == Integer.valueOf(5));
    System.out.println(5 == new Integer(5));
    System.out.println(5 == 5);
    System.out.println("=====");
    test(5000, 5000);
    test(5, 5);
  }
  public static void test (Integer a, Integer b) {
    System.out.println(a == b);
  }
}

For extra credit, also predict the output if all the == are changed to .equals(...)

Update: Thanks to a comment from user @sactiw: "default range of cache is -128 to 127 and java 1.6 onward you can reset the upper value >=127 by passing -XX:AutoBoxCacheMax=<new size> from command line"
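If you want to see where the cache ends on your own JVM, a rough probe like the following may help. CacheProbe and isCached are names invented for this sketch; only the results for -128 and 127 are guaranteed, the rest depends on the JVM and on the flag mentioned above.

```java
public class CacheProbe {
    // True when two valueOf calls for n return the very same object.
    static boolean isCached(int n) {
        return Integer.valueOf(n) == Integer.valueOf(n);
    }

    public static void main(String[] args) {
        // Values just inside and just outside the default range, plus a larger one.
        for (int n : new int[] {-129, -128, 127, 128, 1000}) {
            System.out.println(n + " cached: " + isCached(n));
        }
    }
}
```

Running it once as java CacheProbe and once as java -XX:AutoBoxCacheMax=1000 CacheProbe on a HotSpot JVM should show the upper bound moving.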

Brenan answered 11/3, 2011 at 23:30 Comment(10)
The performance hit is already there because smaller ints are cached. And yes, the correctness of == depends on the definition. I am arguing here that there's no reason why two Integers with the same value should return false on a == comparison.Hydrocellulose
BTW some of the "agony" here is because I spent some time recently coding in C++, where you can overload operators (eg: ==). Ah, if only that was possible in Java.Hydrocellulose
we are cross commenting :-) i happen to be a decent ex-c++ programmer too. on one hand it should make it easier for you to understand that in java == is always a pointer comparison. and yes, it is distressful to not be able to overload operators but on the whole i find it a plus because i can read an isolated fragment of java code and be verrrry sure what the operators are doing. good luck!Brenan
@no_answer_not_upvoted: Java overloads == for value comparisons of primitives and reference comparisons of everything else, a design which might be okay were comparisons between reference types and primitives forbidden, but which becomes dubious if mixed comparisons are allowed [personally I think == should forbid all mixed comparisons other than those which involve only integer primitives, or specifically involve a double and a non-long integer primitive]. Given int i=2; Integer I1=new Integer(i); Integer I2=new Integer(i);, == now implements a broken equivalence relation.Astrahan
@Astrahan I've updated the answer to address your point. The == operator is not overloaded as far as I understand. What happens is that Java unboxes Integer before comparing with the primitive. Thus the equivalence relation is not truly broken; the domains are different.Brenan
@necromancer: The == operator is overloaded (at the language level) for what used to be two disjoint categories of operands; the expression x==y would have one meaning if x and y were both primitives and a different meaning if they were both reference types. The operator would be forbidden in contexts involving one of each. When auto-unboxing was added, no new overloads were added, but the addition of auto-unboxing meant that nonsensical mixed-operand comparisons which used to be forbidden instead get evaluated with dubious semantics.Astrahan
@Astrahan that's correct. I meant not overloaded for mixed-operands.Brenan
@necromancer: Perhaps my statement was unclear? I meant my statement about the overloads to describe two disjoint cases. Personally, I think that if Java was going to allow mixed comparisons it should have defined the rules such that someInt==someInteger would test "if someInteger has a value which is equal to someInt"; I would consider that a potentially useful behavior, even if unboxing otherwise required a cast (as it should). Java's type conversion rules are generally a mess, though; 16777216f==16777217, and 16777217==16777217.0, but 16777217.0!=16777216f.Astrahan
+1 for pointing out that "java unboxes Integer object instead of autoboxing int primitive during == comparison" and for also mentioning that during autoboxing process java internally calls Integer.valueOf(n) method and not new Integer(n) constructor. However, it would also be worth mentioning that default range of cache is -128 to 127 and java 1.6 onward you can reset the upper value >=127 by passing -XX:AutoBoxCacheMax=<new size> from command lineCapet
Thank you @Capet That's cool to know! I have added an update to my answer :-)Brenan
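The floating-point comparisons quoted in the comments above can be reproduced with a short sketch. The class and method names are made up; the behaviour follows from binary numeric promotion, where an int compared with a float is first converted to float.

```java
public class MixedNumericComparisons {
    static boolean intVsFloat(int i, float f) {
        return f == i;   // i is converted to float before comparing
    }

    static boolean intVsDouble(int i, double d) {
        return i == d;   // i is converted to double, which is exact for any int
    }

    static boolean doubleVsFloat(double d, float f) {
        return d == f;   // f is converted to double before comparing
    }

    public static void main(String[] args) {
        int i = 16777217;      // 2^24 + 1, not representable as a float
        float f = 16777216f;   // 2^24
        double d = 16777217.0;
        System.out.println(intVsFloat(i, f));    // true
        System.out.println(intVsDouble(i, d));   // true
        System.out.println(doubleVsFloat(d, f)); // false
    }
}
```

So f equals i and i equals d, yet d does not equal f, which is the non-transitivity the comment complains about.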
C
7

This would potentially break code written before this design change, when everybody rightfully assumed that two newly created instances were different instances. It could be done for autoboxing, because autoboxing didn't exist before, but changing the meaning of new is too dangerous, and probably doesn't bring much gain. The cost of short-lived objects is not big in Java, and could even be lower than the cost of maintaining a cache of long-lived objects.

Ceroplastics answered 11/3, 2011 at 20:26 Comment(4)
+1 It is really as simple as that. Plain old backward compatibility.Pouliot
True, but I can't think of a situation where it would make sense for a comparison of two boxed primitives to be based on the reference. In other words, when would it make sense to have a == b to be false if they are both Integer(10)?Hydrocellulose
@NullUserException, your argument there is essentially that == on Integers should return whether the integers are equal. I agree. But that's an argument for operator overloading not for the caching of integer objects.Sheritasherj
@NullUserException: Code which needs to hold a bunch of identity tokens, each of which is assigned a numeric value, could use an Integer[] (or Long[], or whatever) for that purpose. It would probably be better to define a SequencedLockingToken class which contained an appropriate numeric primitive field, and then use a SequencedLockingToken class, but provided they are constructed with new, it is legitimate to use boxed primitives as identity tokens.Astrahan
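As a rough illustration of the identity-token idea from the last comment, here is a sketch using IdentityHashMap, which compares keys with == rather than equals. The class name, the token helper, and the owner strings are all hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityTokens {
    // Deprecated constructor, used deliberately: each call must yield a new identity.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer token(int n) {
        return new Integer(n);
    }

    static int countDistinctTokens() {
        Map<Integer, String> owners = new IdentityHashMap<>();
        owners.put(token(7), "first");             // fresh object
        owners.put(token(7), "second");            // another fresh object, separate key
        owners.put(Integer.valueOf(7), "third");   // cached instance
        owners.put(Integer.valueOf(7), "fourth");  // same cached instance, overwrites "third"
        return owners.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctTokens());
    }
}
```

The map ends up with 3 entries: two for the tokens created with new and one for the shared cached instance. If new returned cached objects, the first two keys would collapse into one.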
A
5

If you check the source you see:

/**
 * Returns an Integer instance representing the specified int value. If a new
 * Integer instance is not required, this method should generally be used in
 * preference to the constructor Integer(int), as this method is likely to
 * yield significantly better space and time performance by caching frequently
 * requested values.
 *
 * @param i an int value.
 * @return an Integer instance representing i.
 * @since 1.5
 */
public static Integer valueOf(int i) {
    final int offset = 128;
    if (i >= -128 && i <= 127) { // must cache
        return IntegerCache.cache[i + offset];
    }
    return new Integer(i);
}

Source: link

It is for performance reasons that == returns true for some Integers; it is totally a hack. If you want to compare values, use the compareTo or equals method.

In other languages, for example, you can use == to compare strings as well; it is basically the same reason, and it is often called one of the biggest mishaps of the Java language.

int is a primitive type, predefined by the language and named by a reserved keyword. As a primitive it does not contain a class or any class-associated information. Integer is an immutable wrapper class that is loaded through a package-private, native mechanism and cast to Class; this provides autoboxing, which was introduced in JDK 1.5. Prior to JDK 1.5, int and Integer were two very different things.

Alonzoaloof answered 11/3, 2011 at 20:46 Comment(0)
V
2

In Java, every time you call the new operator, you allocate new memory and you create a new object. That's standard language behavior, and to my knowledge there is no way to bypass this behavior. Even standard classes have to abide by this rule.

Varve answered 11/3, 2011 at 20:24 Comment(3)
IDK, Java does have some special machinery for some of the standard classes, eg: autoboxing for primitive wrappers, String does interning and responds to the + operator. So this could be built into the language.Hydrocellulose
Yes, this could have been, but it's not the case. The semantics of new is always consistent: create a new object.Varve
@NullUserException: yes, but those examples are not using the new keyword.Pulchritudinous
R
2

It is my understanding that new will create a new object, no matter what. The order of operations here is that you first call new, which instantiates a new object, then the constructor gets called. There is no place for the JVM to intervene and turn the new into a "grab a cached Integer object based on the value passed into the constructor".

Btw, have you considered Integer.valueOf? That works.

Ripieno answered 11/3, 2011 at 20:25 Comment(2)
I know how to make it work; I am just wondering why a more efficient solution isn't built into the language since these objects are immutable.Hydrocellulose
It could be by design - the idea being that new implies you want to create a new object, maybe because you want two Integer objects with the same integer that will not return true if you compare them via ==. Just to give the programmer the option to do that.Ripieno
I
2

Wouldn't it make more sense if all instances of an Integer with a 10 be the same object in memory? In other words, why don't we have "Integer interning" which is similar to "String interning"?

Because it would be awful!

First, this code would throw an OutOfMemoryError:

for (int i = 0; i <= Integer.MAX_VALUE; i++) {
    // printf takes Object... arguments, so each i is autoboxed to an Integer
    System.out.printf("%d\n", i);
}

Most Integer objects are probably short-lived.

Second, how would you maintain such a set of canonical Integer objects? With some kind of table or map. And how would you arbitrate access to that map? With some kind of locking. So suddenly autoboxing would become a performance-killing synchronization nightmare for threaded code.

Injun answered 11/3, 2011 at 21:23 Comment(3)
It wouldn't throw an OutOfMemoryError; he's only proposing the caching of small values. In that case you would hold the Integer objects in an array, which wouldn't need any synchronization.Sheritasherj
@Winston Ewert, plenty of others responded with the answer about the semantics of Java's new keyword. I was responding to the idea of interning Integers in general (as I quoted). Small values already are cached, you just have to use the right API (i.e. Integer.valueOf(int)). So I gave my opinion on why I think interning large values would be dumb.Injun
Your answer makes the false assumption, that interning means that all objects have to stay in memory forever. Since the question already said “similar to ‘String interning’”, you may simply compare with for(int i = 0; i <= Integer.MAX_VALUE; i++) System.out.println(String.valueOf(i).intern());, which runs without ever throwing an OutOfMemoryError.Bui
M
1

A new instance is a new instance, so they are equal in value, but they are not equal as objects.

So a == b can't return true.

If they were 1 object, as you ask for: a+=2; would add 2 to all int = 10 - that would be awful.

Minter answered 11/3, 2011 at 20:26 Comment(2)
No. a+= 2 is similar to a = Integer.valueOf(a.intValue() + 2). You get another Integer instance. Integer is immutable. Its value never changes.Ceroplastics
I guess both of you are right, if you use 'new' you will always get new instance but Integer being immutable class you can't modify it and thus if you try to modify it like a = a + 2; you get another instance with updated value. This also holds true for integers which are present in cache (for example from initialization like Integer x = 5)Capet
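A quick sketch of what the comments above describe: a compound assignment on an Integer rebinds the variable to another object and leaves any other reference to the old object untouched. CompoundAssignment and addTwo are illustrative names.

```java
public class CompoundAssignment {
    // Returns {value of a, value of b} after a += 2, where b aliased a beforehand.
    static int[] addTwo() {
        Integer a = 10;
        Integer b = a;   // b refers to the same object as a
        a += 2;          // unbox, add, box again; a now refers to a different object
        return new int[] {a, b};
    }

    public static void main(String[] args) {
        int[] result = addTwo();
        System.out.println(result[0] + " " + result[1]); // 12 10
    }
}
```

b still holds 10, so sharing one immutable object between many references is harmless.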
R
1

Your first example is a byproduct of the spec requiring that flyweights be created in a certain range around 0. It should never, ever, be relied on.

As for why Integer doesn't work like String ? I would imagine avoiding overhead to an already slow process. The reason you use primitives where you can is because they are significantly faster and take up way less memory.

Changing it now could break existing code because you're changing the functionality of the == operator.

Rainarainah answered 11/3, 2011 at 20:33 Comment(0)
E
1

new means new.

new Object() isn't frivolous.

Eldon answered 11/3, 2011 at 21:23 Comment(0)
W
1

BTW, If you do

Integer a = 234345;
Integer b = 234345;

if (a == b) {}

it is possible that this will be true.

This is because since you didn't use new Integer(), the JVM (not the class code) is allowed to cache its own copies of Integers if it sees fit. Now you shouldn't write code based on this, but when you say new Integer(234345) you are guaranteed by the spec that you will definitely have different objects.

Worlock answered 11/3, 2011 at 22:30 Comment(3)
And that's one more reason why this bugs me, because it's an implementation dependent thing that adds to the inconsistency of all this.Hydrocellulose
@Worlock That would be possible in java 1.6 and onward where you can reset the upper limit to >=127 by passing -XX:AutoBoxCacheMax=<new size> but not possible in java 1.5 because in java 1.5 the cache range was fix i.e. -128 to 127 only -OR- am I missing something here?Capet
My answer has nothing to do with the Integer cache. The JVM is allowed to optimize integer boxing if it sees fit to do so regardless of the actual value. So if you use the value 165234234 a gajillion times in your code, the JVM is allowed to cache that boxed primitive. Now you will never know if this actually happens for you, but it can. This just adds to the 'apparent flakiness' of comparing boxed primitives. So DON'T DO IT.Worlock
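For completeness, one way to compare boxed values without depending on any caching is sketched below. SafeBoxedCompare and sameValue are invented names; Objects.equals and Integer.compare are standard library methods.

```java
import java.util.Objects;

public class SafeBoxedCompare {
    // Value comparison that also tolerates null on either side.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer a = 234345;
        Integer b = 234345;
        System.out.println(sameValue(a, b));       // true, whether or not a and b are the same object
        System.out.println(sameValue(a, null));    // false, no NullPointerException
        System.out.println(Integer.compare(a, b)); // 0, compares the unboxed int values
    }
}
```

The result no longer depends on whether the JVM happened to reuse an instance.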
B
1

Let me just expand slightly on ChrisJ's and EboMike's answers by giving links to the relevant sections of the JLS.

new is a keyword in Java, allowed in class instance creation expressions (Section 15.9 of the JLS). This is different from C++, where new is an operator and can be overloaded.

The expression always tries to allocate memory, and yields a fresh object each time it is evaluated (Section 15.9.4). So at that point it's already too late for cache lookup.

Bairn answered 8/12, 2015 at 11:40 Comment(0)
R
0

For Integer objects use the a.equals(b) condition to compare.

The compiler will not do the unboxing for you while you compare, unless you assign the value to a basic type.

Ridge answered 11/3, 2011 at 20:25 Comment(2)
I know that; that's not my question.Hydrocellulose
I guess your title should be "why intern() is not defined for Integers?"Ridge
N
0

Assuming you're describing the behavior of your code accurately, it sounds like autoboxing isn't working on the 'gets' (=) operator; instead it sounds like Integer x = 10; gives the object x a memory pointer of '10' instead of a value of 10. Therefore ((a == b) == true) will evaluate to true because == on objects operates on the memory addresses, which you assigned both to 10.

So when should you use autoboxing and unboxing? Use them only when there is an “impedance mismatch” between reference types and primitives, for example, when you have to put numerical values into a collection. It is not appropriate to use autoboxing and unboxing for scientific computing, or other performance-sensitive numerical code. An Integer is not a substitute for an int; autoboxing and unboxing blur the distinction between primitive types and reference types, but they do not eliminate it.

What oracle has to say on the subject.

Notice that the documentation doesn't supply any examples with the '=' operator.

Nerval answered 11/3, 2011 at 20:38 Comment(2)
That is not true. This is not C, there is no notion of pointers in Java. Autoboxing is working correctly in the first case.Hydrocellulose
I've spent too much time digging around in the kernel lately; are you sure it's not passing the address of the int '10'? I guess the fact that it doesn't throw a type exception would indicate functional autoboxing.Nerval
C
0

Please also note that the cache range was fixed at -128 to 127 in Java 1.5, but from Java 1.6 onward that is only the default range, i.e. you can set the upper value >= 127 by passing -XX:AutoBoxCacheMax=new_limit on the command line

Capet answered 13/5, 2015 at 16:51 Comment(0)
S
-1

It's because you're using the new statement to construct the objects.

Integer a = Integer.valueOf(10);
Integer b = Integer.valueOf(10);
System.out.println("a == b: " + (a == b));

That will print out true. Weird, but Java.

Spirogyra answered 11/3, 2011 at 20:27 Comment(4)
The spec requires VMs to create flyweights in a certain range around 0. This is why that works, but it should never be used.Rainarainah
And that's where that cache range of [-128, 127] is used, not for the OP's first example. So (500 == 500) -> true, but (Integer.valueOf(500) == Integer.valueOf(500)) -> false.Muzzle
Actually, the spec allows JVMs to cache more than that. It only requires [-128,127]. Which means on one JVM, Integer.valueOf(500) == Integer.valueOf(500) may return true, but on most it will return false. This could introduce a bug that would almost never get tracked down.Coquet
@glowcoder - exactly. It's actually even worse than if it were specified to be [-128,127]Rainarainah
