Search cost of string interning and declaration of literal strings
Asked Answered
E

2

6

Two Questions.

  1. When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning (method intern of class String)?

  2. In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool. And if there are many strings in the pool, it may be high cost. (maybe tradeoff of searching cost and memory?) On this point of view, it might be dangerous to declare mant literal strings. How significant is this searching cost and why java is designed in this way(searching pool when literal strings are declared).

Following is what I referred to understand background.


The JavaDoc for the java.lang.String class states:

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.

http://www.janeg.ca/scjp/lang/strLiteral.html comments:

In other words, because the compiler knows the strings original value cannot be changed once it's created it can safely use existing data and avoid cluttering up memory with duplicates.

Ethanethane answered 21/2, 2011 at 1:59 Comment(4)
I modified your reference to the "JSK 1.3" to the official JavaDoc.Transpolar
@joachim Sauer Thanks, but last sentence is from (janeg.ca/scjp/lang/strLiteral.html) you deleted. Could you reflect that? Or I will.Ethanethane
I removed it because the JavaDoc I linked above is the authorative, original source of the quote and that page is of questionable quality (there's no such thing as the "JSK 1.3" and it doesn't actually link to any of its sources).Transpolar
@Joachim Sauer Oh.. I know that but I mean the very last phase starting from In oher words. That sentence might be writen by owner of the site(janeg.ca..) so I think it is better to notice that.Ethanethane
D
4

You're confusing compile time complexity with runtime complexity.

When the class is loaded, yes it does a search to see if each literal already exists (although I imagine it would use a hashmap for O(1) lookup instead of your proposal).

When the code runs, it has the reference to the string in memory so there is no additional cost than a non-literal.

So yes, literals are interned. According to the Javadoc for String,

A pool of strings, initially empty, is maintained privately by the class String.

You can invoke intern() on a String to add it to this pool. It logically follows that if a.equals(b) then a.intern() == b.intern(), since .intern() guarantees to return from the a unique pool.

Example:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

Results when run:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

I should point out that the assumption will never be true. The Java language itself has many Strings that would be interned before our Strings ever got the chance to see the light of day. However, assuming everything is loaded sequentially, if you consider only the delta of Strings being interned, and assume no collisions with existing interns (we all know interns can be fussy and full of drama, right? snicker) then the numbers do indeed indicate the delta of the size of the string pool.

Divisibility answered 21/2, 2011 at 2:7 Comment(5)
Actually, String interning does happen at runtime (when the class is loaded). But it happens just once per literal String and the complexity is O(1), so it is not a performance concern.Soldiery
When I think about it, that makes sense - without a JVM to load into, how would we have anything to hold the HashMap with internment? Furthermore, because intern() is a native method, it couldn't be done at compile time. I'll update my answer accordingly. Thanks!Divisibility
Thanks for quick answer! I got some further questions.. In your answer, a unique pool,which one is the unique means: 1) each element of the pool is unique 2) the pool is unique. And if there are 2 element in compile time and then declare third string without interning in runtime, the third one is also going the same pool?Ethanethane
First question, there is only ever one pool, and each element in the pool is unique. Second, I'll show a code example in the post.Divisibility
What a great example! Thanks a lot!Ethanethane
S
4

1 - When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning(method intern of class String)?

Yes. This process is called interning. However, it happens just once ... when the class containing the literal is loaded.

2 - In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool.

No it doesn't. The pool is a hash table.

... And if there are many strings in the pool, it may be high cost.

No it won't. The cost of a lookup in the string pool hash table is O(1).

... On this point of view, it might be dangerous to declare many literal strings.

The cost is not significant compared with the other costs of loading and then JIT compiling a class file. There is no performance related "danger" in declaring lots of literal strings.

Obviously, the String objects corresponding to string literals occupy memory "permanently", and you generally don't want to waste memory unnecessarily. But if you need to use those constant strings, they have to be represented somehow. And other ways of representing them either use the memory in other ways, or involve other runtime costs; e.g. the costs of reading them from a file or retrieving them from database.

The benefit of interning string literals is that the heap does not get cluttered up with multiple copies of the same literal string. This is probably not significant for typical SE / EE applications, but for the ME platforms heap memory is at a premium, and it would be a bad thing to waste it.


@RENO asks about the number of times Strings get interned. There are two cases:

  • Explicit calls to String.intern() happen as many (or as few) times as the application chooses to make.

  • For String literals, the javac compiler will ensure that a given .class file does not contain multiple copies of any String literal in its constant pool. This means that a class that has a given literal in many places will only result in the literal being interned once when the class is loaded. However, if you have two classes with the same literal string in their respective source code, they will both have the string value in their respective constant pools, and both intern the string when the respective classes are loaded.

Soldiery answered 21/2, 2011 at 3:21 Comment(7)
Thanks for nice answer! Could you explain more about your explanation: Yes. This process is called interning. However, it happens just once ... when the class containing the literal is loaded. I think that (the number of interning) is the same as (the number of literal strings + the number of explicit intern() method).Ethanethane
Thanks again! While I read your explanation, I found some contrast part with Java Language Specification, 3.10.5 example. In the link page, the result of (Other.hello == hello) is true. I can't get it because you explained: if you have two classes with..., but the result of them are the same. Is there any point I missed?Ethanethane
@Ethanethane - I don't understand your confusion. As the JLS explains, every String literal is interned. Period. I just answered your question about the number of times that intern() needs to be called to achieve this.Soldiery
Um.. I understand from your explanation that classes each have respective constant pools. However, according to the example, Other.hello and hello reference the same thing(and I think they are in the same pool). This is the point that confuse me.Ethanethane
Perhaps you are confusing the "constant pool" section of a classfile with the JVM's "string pool". They are not the same thing at all.Soldiery
Is it specified that a runtime must consolidate identical character sequences that appear in different classes? In some cases, it would seem more efficient not to try to do so [since such consolidation would require building a hash table, whereas doing without it would simply require each class to hold a reference to each literal.Hemicycle
No it is not. The only thing that is specified is that (entire) literal strings will be interned / shared.Soldiery

© 2022 - 2024 — McMap. All rights reserved.