Underlying mechanism of String pooling in Java?
O

7

24

I was curious why Strings can be created without a call to new String(), given that the API says a String is an object of class java.lang.String.

So how are we able to use String s="hi" rather than String s=new String("hi")?

This post clarified the use of the == operator and the absence of new: it says String literals are interned, i.e. taken from a literal pool by the JVM, which works because Strings are immutable.

On seeing a statement such as

String s="hi"

for the first time, what really takes place?

  1. Does the JVM replace it with String s=new String("hi"), whereby an object is created and "hi" is added to the String literal pool, so that subsequent statements such as String s1="hi" are served from the pool?

  2. Is this how the underlying mechanism operates? If so, then is

    String s=new String("Test");
    String s1="Test";
    

    the same as

    String s="Test";
    String s1="Test";
    

    in terms of memory utilization and efficiency?

  3. Also, is there any way by which we can access the String Pool to check how many String literals are present in it, how much space is occupied, etc.?

Opportuna answered 25/11, 2014 at 9:36 Comment(6)
"Does the JVM replace it like this" - I think the compiler replaces that, not the JVM.Lackadaisical
Yes, but does an equivalent type of replacement or optimization take place, like the one I mentioned?Opportuna
btw, have you seen the comments to String's intern() method? docs.oracle.com/javase/7/docs/api:Erlin
Your third question's answered here: stackoverflow.com/questions/19049812Galenical
What do you mean by changing s = "hi" to s = new String("hi")? I don't see how this solved anything except adding a new layer, now you'll need s = new String(new String("hi")) and in the end you need an infinite term new String(new String(.... If by the rhs "hi" you meant something that isn't a string you should use a different syntax.Glaab
Check the JVM class file spec, it's all there. Obviously, the thing that produces a .class file in the first place is the compiler. ;)Dulcet
H
11
  1. String s="hi" for the first time what really takes place ?

Does the JVM replace it like this String s=new String("hi") , wherein an Object is created and "hi" is added to the String literal pool and so subsequent calls such as String s1="hi" are taken from the pool ?.

No. String literals are resolved at compile time and interned (added to the String constant pool) either as soon as the class is loaded/initialized or lazily. They are then available to all classes within the JVM. Note that even if a String with the value "hi" is already in the String constant pool, new String("hi") will create another String on the heap and return a reference to it.

  2. Is
 String s=new String("Test"); 
 String s1="Test"; 

the same as

 String s="Test"; 
 String s1="Test"; 

in terms of memory utilization and efficiency?

No. In the first case, two "Test" Strings are created: one is added to the String constant pool (assuming it is not already present there) and the other is created on the heap. The second one can be garbage-collected. In the second case, only one String literal is present in the String constant pool, and there are two references to it (s and s1).

  3. Also, is there any way to access the String pool, i.e. to check how many String literals are present in it, how much space they occupy, etc., from the program or from a monitoring tool?

I don't think we can see the contents of the String constant pool. We can only make assumptions and confirm them by observing the behavior.
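The difference between the two cases in question 2 can be observed with reference comparison. This is only a minimal sketch; the class name PoolDemo and its method names are made up for illustration.

```java
// Hypothetical demo class; the names are illustrative, not from any library.
class PoolDemo {
    // Two identical literals refer to the same pooled String object.
    static boolean literalsShareReference() {
        String s = "Test";
        String s1 = "Test";
        return s == s1;
    }

    // new String(...) creates a distinct object, even though the literal
    // passed to the constructor is itself pooled.
    static boolean constructorSharesReference() {
        String s = new String("Test");
        String s1 = "Test";
        return s == s1;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareReference());          // true
        System.out.println(constructorSharesReference());      // false
        System.out.println(new String("Test").equals("Test")); // true: same content
    }
}
```

Note that == only tells you whether two references point to the same object; it says nothing about how much memory the pool itself uses.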

Helsa answered 25/11, 2014 at 9:52 Comment(7)
So, to make it clear: a call such as String s=new String("Test"); will add "Test" to the literal pool (assuming it is not already there) and also create a String object on the heap whose value is "Test"?Opportuna
@DroidIcs - Yep.. That's right. So the next time you do String someVar="Test", the value from the Strings constants pool will be returned.Helsa
However, note that adding to the literal pool is a compile-time operation done by the compiler (javac), while calling the String constructor happens at runtime in the JVM. So it's actually the same mechanism; when calling new String("test") you just pass the reference from the literal pool as a constructor argument instead of assigning it directly. There is also the #intern method on docs.oracle.com/javase/8/docs/api/java/lang/… which will remove duplicated instances and replace them with the reference from the poolGermaine
@saberduck I slightly disagree with the first sentence. The compiler adds the String to the constant pool of the class, and later the JVM adds all the class Strings to its String pool (probably all at once at class loading time).Thalamencephalon
@Thalamencephalon - and I completely agree with you.. :)Helsa
So new String(new char[]{ 'h', 'i' }) creates a new object that is not reference-equal to any "hi" String Literal, no question. Also, all JVM versions seem to return new objects for subsequent calls of new String(new char[]{ 'h', 'i' }). But is this implementation behavior or by specification? So, would a JVM implementation theoretically be allowed to return an existing interned String object from a call of new String(byte[])?Bathyscaphe
@Bathyscaphe - The thing is that the JVM doesn't specify anything about how this should behave, so it could be implementation dependent. Before Java 7, literals were in PermGen and were not GCed, but since Java 7 the String constant pool has been moved into the main heap, so they can be GCed. That being said, I have to say: yes, there might be a chance that a particular vendor decides to return the same String present in the constant pool (which almost never happens because it would actually be less efficient)Helsa
L
15

The Java compiler has special support for string literals. If it did not, creating strings in your source code would be really cumbersome; you'd have to write something like:

// Suppose that we would not have string literals like "hi"
String s = new String(new char[]{ 'h', 'i' });

To answer your questions:

  1. More or less, and if you really want to know the details, you'd have to study the source code of the JVM, which you can find at OpenJDK, but be warned that it's huge and complicated.

  2. No, those two are not equivalent. In the first case you are explicitly creating a new String object:

    String s=new String("Test");
    

    which will contain a copy of the String object represented by the literal "Test". Note that it is never a good idea to write new String("some literal") in Java - strings are immutable, and it is never necessary to make a copy of a string literal.

  3. There's no way I know of to check what's in the string pool.
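To illustrate point 2, here is a small sketch of how a copy made with new String relates to the pooled literal, and how String.intern() gets you back to the pooled object. The class InternDemo and the helper isCanonical are hypothetical names used only for this example.

```java
// Hypothetical demo class; InternDemo and isCanonical are illustrative names.
class InternDemo {
    // True if s is the pooled (canonical) instance for its content.
    // Side effect: if no equal string is pooled yet, s itself gets pooled.
    static boolean isCanonical(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        String literal = "Test";
        String copy = new String(literal);   // explicit, unnecessary copy

        System.out.println(copy == literal);          // false: two objects
        System.out.println(copy.equals(literal));     // true: same content
        System.out.println(copy.intern() == literal); // true: intern() returns the pooled object
        System.out.println(isCanonical(literal));     // true
        System.out.println(isCanonical(copy));        // false: "Test" is already pooled
    }
}
```

This only checks individual strings; it does not enumerate what is in the pool.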

Loats answered 25/11, 2014 at 9:45 Comment(3)
I understand that new String("Hi"); results in a copy, but taking the set of two statements together, aren't they more or less the same, since one copy needs to be created anyway? Or does String s=new.... result in a copy in the literal pool plus a local copy, rather than pointing to the pooled one? In that case I understand the difference: one extra copy of "hi".Opportuna
No, in the first case you have two String objects with the same content "Test", in the second case you have only one String object (and s and s1 both refer to the same object).Loats
return string.intern() == string should check whether a string is interned. It's a dirty hack, but should return the right answer. Most of the time. It also has the side effect of interning the String.Fab
L
7

The following is a slight simplification, so don't try to cite exact details from it, but the general principles apply.

Each compiled Java class contains a data blob that indicates how many strings were declared in that class file, how long each one is, and the characters that belong in all of them. When the class is loaded, the class loader will create a String[] of suitable size to hold all of the strings defined in that class; for each string, it will then generate a char[] of suitable size, read the appropriate number of characters from the class file into the char[], create a String encapsulating those characters, and store the reference into the class's String[].

When compiling some class (e.g. Foo), the compiler knows which string literal it encounters first, second, third, and so on. If the code says myString = "George"; and "George" was the sixth string literal, that will appear in the compiled code as a "load string literal #6" instruction; the just-in-time compiler, when it is generating code for that instruction, will generate an instruction to fetch the sixth string reference associated with that class.
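One observable consequence: although each class file carries its own table of string constants, equal literals in different classes end up referring to the same String object once they are resolved. A minimal sketch, with Foo and Bar as hypothetical nested classes (each is compiled to its own class file):

```java
// Hypothetical demo; Foo and Bar each compile to a separate .class file
// with its own constant pool entry for "George".
class SharedLiteralDemo {
    static class Foo {
        static String name() { return "George"; }
    }

    static class Bar {
        static String name() { return "George"; }
    }

    // Both literals resolve to the same interned String object.
    static boolean sameObject() {
        return Foo.name() == Bar.name();
    }

    public static void main(String[] args) {
        System.out.println(sameObject()); // true
    }
}
```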

Lanham answered 25/11, 2014 at 20:41 Comment(0)
E
6

That's not tightly related to the subject, but whenever you have doubts about what the Java compiler will do, you can use

javap -c CompiledClassName

to print what is actually going on (run it from the directory that contains CompiledClassName.class).

To add to Jesper's answer: there are more mechanisms at work. For example, when you concatenate a String from literals or final variables, the result still comes from the intern pool:

String s0 = "te" + "st";
String s1 = "test";
final String s2 = "te";
String s3 = s2 + "st";
System.out.println(s0==s1); //true
System.out.println(s3==s1); //true

But when you concatenate using non-final variables it will not use the pool:

String s0 = "te";
String s1 = s0 + "st";
String s2 = "test";
System.out.println(s1 == s2); //false
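
If you do need the pooled instance after a runtime concatenation, String.intern() returns it. A small sketch along the lines of the snippets above; ConcatDemo and its method names are made up for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class ConcatDemo {
    // Concatenation with a non-final variable happens at runtime and
    // produces a new String object that is not the pooled literal.
    static boolean runtimeConcatIsPooled() {
        String s0 = "te";
        String s1 = s0 + "st";
        return s1 == "test";
    }

    // Calling intern() on the result returns the pooled "test" instance.
    static boolean internedConcatIsPooled() {
        String s0 = "te";
        String s1 = (s0 + "st").intern();
        return s1 == "test";
    }

    public static void main(String[] args) {
        System.out.println(runtimeConcatIsPooled());  // false
        System.out.println(internedConcatIsPooled()); // true
    }
}
```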
Erlin answered 25/11, 2014 at 9:55 Comment(2)
Actually, this doesn't add anything to Jesper's answer.Helsa
Just trying to add to the topic. No point in repeating what Jesper said.Erlin
M
5
  1. Kind of, but not exactly.
    String constants are created and interned during constant pool resolution. This happens upon the first execution of the LDC bytecode that loads a string literal. After the first execution, the JVM replaces the JVM_CONSTANT_UnresolvedString constant pool tag with the JVM_CONSTANT_String tag, so that the next time LDC will take the existing string instead of creating a new one.

  2. No. The first use of "Test" will create a new string object. Then new String("Test") will create the second object.

  3. Yes, using HotSpot Serviceability Agent. Here is an example.

Muriah answered 25/11, 2014 at 21:9 Comment(0)
D
0

I believe that the underlying mechanism for creating a String is a StringBuilder, which assembles the String object at the end. At least, I know for sure that this is what happens when you have a string that you want to change, for example:

String str = "my String";
// and then do
System.out.println(str + "new content");

What this does is create a StringBuilder from the old object and produce a new String constructed from the builder; the old String itself is not modified. This is why it is more memory efficient to use a StringBuilder than a regular String to which you would just keep appending.

There is a way to access the already created pool of Strings: the String.intern() method. It tells Java to use the same memory for Strings that are equal and gives you a reference to that place in memory. This also allows you to use the == operator to compare interned strings, and it is more memory efficient.
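
As a rough sketch of the StringBuilder point: repeated += creates a new String on every step, while a single StringBuilder accumulates the characters and produces one String at the end. How + is compiled is an implementation detail (older javac versions emit StringBuilder calls, while Java 9 and later use invokedynamic by default), so treat this as an illustration only. BuilderDemo and its methods are hypothetical names.

```java
// Hypothetical demo class; names are illustrative only.
class BuilderDemo {
    // Each += produces a brand-new String; intermediate results become garbage.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One StringBuilder is reused; only the final toString() creates a String.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "my String", "new content" };
        System.out.println(joinWithConcat(parts));   // my Stringnew content
        System.out.println(joinWithBuilder(parts));  // my Stringnew content
    }
}
```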

Dolmen answered 25/11, 2014 at 9:58 Comment(0)
E
-2

The String pool is a pool of strings stored in the heap, for example:

String s="Test";
String s1="Test";    

both are stored in the heap and refer to a single "Test", thus s1 == s, while

String s=new String("Test");

is an object that is also stored in the heap, but it is different from the one s1 and s refer to. Refer here

Expeller answered 25/11, 2014 at 10:2 Comment(2)
Strings in Java must use double quotes. Your code is not valid in Java because you use single quotes.Escorial
Next time I will remember it.Expeller

© 2022 - 2024 — McMap. All rights reserved.