Garbage collection behaviour for String.intern()

4

27

If I use String.intern() to improve performance, since I can then use "==" to compare interned strings, will I run into garbage collection issues? How does garbage collection of interned strings differ from that of normal strings?

Unabridged answered 12/3, 2010 at 9:14 Comment(2)
This question may be relevant: stackoverflow.com/questions/372547 - Kodok
See also #18153060 - Cudlip
10

In fact, this is not a garbage collection optimisation, but rather a string pool optimisation. When you call String.intern(), you replace the reference to your initial String with its base reference (the reference to the first instance of this string that was encountered, or this reference if the string is not yet known).

Prior to Java 7, interned strings were allocated in PermGen space. This would become a garbage collector issue once your string was no longer of use to the application, since the interned string pool is a static member of the String class and will never be garbage collected. From Java 7 onward, interned strings are allocated on the heap and are subject to garbage collection.

As a rule of thumb, I consider it preferable never to use this intern method and to let the compiler use it only for constant Strings, those declared like this:

String myString = "a constant that will be interned";

This is better, in the sense that it won't lead you into the false assumption that == will work when it won't.

Besides, String.equals itself checks == first as an optimisation, so comparisons of interned strings still take the fast path under the hood. This is one more piece of evidence that == should never be used on Strings.
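To make the difference concrete, here is a minimal sketch. The class name InternIdentityDemo and the helper sameReference are made up for illustration. It compares a literal with an equal string built at run time, first by reference, then with equals(), and finally by reference again after interning the run-time string.

```java
public class InternIdentityDemo {

    // Hypothetical helper: true only when both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // A literal is interned automatically.
        String literal = "hello";
        // Built from a char[]: a distinct object with the same contents.
        String built = new String(new char[] {'h', 'e', 'l', 'l', 'o'});

        // false: two different objects
        System.out.println(sameReference(literal, built));
        // true: same contents
        System.out.println(literal.equals(built));
        // true: intern() returns the pooled instance, which is the literal
        System.out.println(sameReference(literal, built.intern()));
    }
}
```

The == comparison is only reliable when both sides are known to be interned, which is exactly the assumption that tends to break silently.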

Seleucid answered 12/3, 2010 at 9:25 Comment(16)
I don't get it, why would myString in your example be interned? Maybe if we marked it as final then it would be interned. - Unabridged
@Ravi: the JavaDoc of intern (java.sun.com/javase/6/docs/api) says this: "All literal strings [...] are interned." - Preadamite
+1 for "don't ever use == for Strings". That might develop into a bad habit. - Foliar
'... the interned string pool is a static member of the String class' No it isn't. String.intern() is a native method. All this is very out of date. Intern'd strings have been GC-able for quite some years now. - Finical
Agree with EJP, and I'll use intern only if I'm sure it will increase performance significantly. - Kailakaile
Wow, thanks for the rectification. When I began programming in 1999, Java 1.2 was quite new, and documentation about intern was really sparse. Ten years later, a mental error is fixed! - Seleucid
The answer is incorrect, since interned strings are garbage collected. - Parry
@Parry since which version of the JDK does that happen? - Seleucid
@Seleucid I am confused: are interned Strings GC'd or not? - Lupitalupo
You must update your question and not mislead people; I almost dropped the idea of using intern because of this answer. It's completely wrong: java-performance.info/string-intern-in-java-6-7-8 - Dyer
@EJP, I believe you are right that interned strings are collected. But what about string literals referenced by interned strings? Are string literals collected by the GC, or do they live until the class is unloaded? - Debose
"== should never be used on Strings": nah, it can be used, and it is even used in some places in JDK code, as there are many strings that are guaranteed to be interned, like field names. - Bahamas
@EugeneMaysyuk that’s implementation specific, but in case of the commonly used JVMs, the code containing a literal gets permanently linked with the string instance after the first execution, so it will prevent the string from being garbage collected at least until the class gets unloaded, which may only happen when the entire class loader gets unloaded, so in case of classes loaded by the bootstrap loader or application class loader, literals will never get collected. - Mainspring
@Holger, would it be possible for you to provide a link to the OpenJDK source code where the String Pool is defined? - Debose
@Holger, Alexander in the next answer says that String literals will be garbage collected if the class that defines them is unloaded. Can you comment on that? - Debose
@EugeneMaysyuk that’s what I said too in my comment “at least until the class gets unloaded”, but it depends on the implementation and configuration whether class unloading is supported at all. Even when class unloading is supported, only classes whose class loader became unreachable can be unloaded, which implies that all classes of that loader have to be unreachable. That works for modular software which can unload modules loaded by an explicitly created loader, but, as said, software loaded by the bootstrap loader or application class loader never gets unloaded. - Mainspring
21

String.intern() manages an internal, native-implemented pool, which has some special GC-related features. This is old code, but if it were implemented anew, it would use a java.util.WeakHashMap. Weak references are a way to keep a pointer to an object without preventing it from being collected. Just the right thing for a unifying pool such as interned strings.
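As a rough illustration of that idea (not the JVM's actual native implementation), an application-level pool built on java.util.WeakHashMap could look like the sketch below. The class name WeakInterner and its intern method are hypothetical. The value is wrapped in a WeakReference because a strongly held value that refers to its own key would keep the entry alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical application-level pool; this is NOT how String.intern() is implemented.
public class WeakInterner {

    // Keys are held weakly by WeakHashMap; values are wrapped so they are weak too.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // WeakHashMap is not thread-safe, so access is serialized here.
    public synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;           // an equal string is already pooled
        }
        pool.put(s, new WeakReference<>(s));
        return s;                       // s becomes the canonical instance
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String a = new String(new char[] {'a', 'b', 'c'});
        String b = new String(new char[] {'a', 'b', 'c'});
        // true: the second call resolves to the instance pooled by the first
        System.out.println(interner.intern(a) == interner.intern(b));
    }
}
```

Once no strong reference to the canonical string remains, the entry becomes eligible for collection, which is the behaviour one wants from a unifying pool.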

That interned strings are garbage collected can be demonstrated with the following Java code:

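public class InternedStringsAreCollected {

    public static void main(String[] args) {
        for (int i = 0; i < 30; i++) {
            foo();
            // Request a collection; System.gc() is only a hint to the JVM.
            System.gc();
        }
    }

    private static void foo() {
        // Build the string from a char[] rather than a literal, so that no
        // class constant keeps the interned instance alive.
        char[] tc = new char[10];
        for (int i = 0; i < tc.length; i++) {
            tc[i] = (char) (i * 136757);
        }
        String s = new String(tc).intern();
        // A different identity hash code means a different String instance.
        System.out.println(System.identityHashCode(s));
    }
}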

This code creates the same string 30 times, interning it each time. It uses System.identityHashCode() to show the hash code that Object.hashCode() would have returned for that interned string. When run, this code prints distinct integer values, which means that you do not get the same instance each time.

Anyway, usage of String.intern() is somewhat discouraged. It is a shared static pool, which means that it easily turns into a bottleneck on multi-core systems. Use String.equals() to compare strings, and you will live longer and happier.

Armandinaarmando answered 12/3, 2010 at 13:45 Comment(8)
Could you please elaborate on why this turns into a bottleneck on multi-core systems, or provide a pointer? - Strigose
If two threads call String.intern() on two strings which happen to have the same contents, then they must both obtain the same reference. This necessarily implies some sort of communication between the two cores. In practice, String.intern() is implemented with a sort-of hashtable protected by a mutex, and each access (read or write) locks the mutex. There can be contention on that mutex, but most of the slowdown will be due to the necessity for the cores to synchronize their L1 caches (such synchronization is implied by the mutex locking, and is the expensive part). - Armandinaarmando
Why can't the interning table just be a ConcurrentHashMap? - Plastid
@ThomasPornin, how can you explain the following code snippet then? public static void main(String[] args) { for (int i = 0; i < 30; i ++) { foo(); System.gc(); } } private static void foo() { String str = new String("a").intern(); System.out.println(System.identityHashCode(str)); } - Debose
@EugeneMaysyuk two steps: 1. new String("a") creates a new instance each time. 2. .intern() searches the string pool, finds an instance with an identical value (which was put into the string pool the first time you called .intern()), and returns the reference to that old instance. - Aeneus
@EugeneMaysyuk: The "a" string literal is interned, and is kept alive by its use as a string literal, so it never gets GC'ed. All the intern calls return the same string object as the original "a". That's why this answer goes out of its way to construct a string from a char[] instead of a string literal. - Exmoor
@Exmoor you mean to say that an interned string is only collected by the garbage collector if the class where it was defined is collected and there are no more references to that interned string, right? - Debose
@EugeneMaysyuk: Maybe if the class was loaded by a non-bootstrap classloader that gets GC'ed, and the class gets unloaded (which isn't guaranteed to happen). The details of this kind of case are beyond my knowledge, though. - Exmoor
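The contrast discussed in these comments can be tried side by side. The sketch below is only an illustration: the class name LiteralVersusBuiltIntern and both helper methods are made up. It counts how many distinct identity hash codes each variant produces over 30 rounds. On commonly used JVMs the literal-backed variant should report a single value, because the "a" literal keeps the pooled instance alive. The count for the char[] variant depends on whether System.gc() actually triggers a collection, so no particular number is promised.

```java
import java.util.HashSet;
import java.util.Set;

public class LiteralVersusBuiltIntern {

    // Interns a string whose contents also exist as a literal in this class.
    static int literalBacked() {
        return System.identityHashCode(new String("a").intern());
    }

    // Interns a string built from a char[]; no literal refers to these contents.
    static int builtAtRuntime() {
        char[] tc = new char[10];
        for (int i = 0; i < tc.length; i++) {
            tc[i] = (char) (i * 136757);
        }
        return System.identityHashCode(new String(tc).intern());
    }

    public static void main(String[] args) {
        Set<Integer> literalHashes = new HashSet<>();
        Set<Integer> builtHashes = new HashSet<>();
        for (int i = 0; i < 30; i++) {
            literalHashes.add(literalBacked());
            builtHashes.add(builtAtRuntime());
            System.gc(); // only a hint; the JVM may ignore it
        }
        System.out.println("distinct hashes, literal-backed: " + literalHashes.size());
        System.out.println("distinct hashes, built at run time: " + builtHashes.size());
    }
}
```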
8

This article provides the full answer.

In Java 6 the string pool resides in PermGen; since Java 7 it resides in heap memory.

Manually interned strings will be garbage collected.
String literals will only be garbage collected if the class that defines them is unloaded.

The string pool is a fixed-size hash table. Its default size was small in Java 6 and early versions of Java 7, but was increased to 60013 in Java 7u40.
The size can be changed with the -XX:StringTableSize=<new size> option and inspected with the -XX:+PrintFlagsFinal Java option.
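The effective table size can also be read from inside a running program. The sketch below assumes a HotSpot-based JVM, since it relies on com.sun.management.HotSpotDiagnosticMXBean, which other JVMs may not provide. The class name StringTableSizeProbe and its helper method are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class StringTableSizeProbe {

    // Returns the current value of the StringTableSize flag as a string.
    // Assumption: the JVM is HotSpot-based and exposes this diagnostic bean.
    static String stringTableSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("StringTableSize").getValue();
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```

Running it as java -XX:StringTableSize=<new size> StringTableSizeProbe should print the configured value rather than the default.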

Parry answered 12/8, 2016 at 12:57 Comment(0)
0

Please read: http://satukubik.com/2009/01/06/java-tips-memory-optimization-for-string/

The conclusion I can draw from your information is that you interned too many Strings. If you really need to intern that many Strings for performance, increase the PermGen memory; but if I were you, I would first check whether I really need so many interned Strings.

Kailakaile answered 12/3, 2010 at 13:50 Comment(1)
The correct link to @nanda's blog entry seems to be: blog.firdau.si/2009/01/06/… - Merely
