Is it good practice to use java.lang.String.intern()?
U

20

196

The Javadoc about String.intern() doesn't give much detail. (In a nutshell: It returns a canonical representation of the string, allowing interned strings to be compared using ==)

  • When would I use this function in favor of String.equals()?
  • Are there side effects not mentioned in the Javadoc, i.e. more or less optimization by the JIT compiler?
  • Are there further uses of String.intern()?
Upmost answered 7/7, 2009 at 8:35 Comment(2)
Calling intern() has its own performance impact; using intern() to improve performance needs to be tested to ensure it really speeds up your program significantly enough to be worth the extra complexity. You can also use it to reduce memory consumption for large tables with relatively repetitive values. However, in both cases there are other options which might be better.Pyne
Yes, intern() has its own performance impact, especially because the cost of intern() increases linearly as you intern strings and keep a reference to them. At least on a Sun/Oracle 1.6.0_30 VM.Luu
B
127

When would I use this function in favor of String.equals()?

When you need speed, since you can compare strings by reference (== is faster than equals()).

Are there side effects not mentioned in the Javadoc?

The primary disadvantage is that you have to remember to make sure that you actually do intern() all of the strings that you're going to compare. It's easy to forget to intern() all strings and then you can get confusingly incorrect results. Also, for everyone's sake, please be sure to very clearly document that you're relying on the strings being internalized.
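The pitfall described above can be sketched as follows. This is a minimal illustration, not code from the original answer; the class name InternPitfallDemo and the helper runtimeCopy are hypothetical, and runtimeCopy merely stands in for a string that arrives from I/O at run time.

```java
class InternPitfallDemo {
    /** Builds a new String object at run time, so it is not the pooled literal. */
    static String runtimeCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String literal = "user-42";              // literals are interned automatically
        String fromInput = runtimeCopy(literal); // stands in for a value read from I/O

        System.out.println(fromInput.equals(literal));     // true: same content
        System.out.println(fromInput == literal);          // false: someone forgot to intern
        System.out.println(fromInput.intern() == literal); // true: both are the pooled instance
    }
}
```

The middle comparison is the "confusingly incorrect result": the contents match, but the reference comparison fails because one side was never interned.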

The second disadvantage if you decide to internalize strings is that the intern() method is relatively expensive. It has to manage the pool of unique strings so it does a fair bit of work (even if the string has already been internalized). So, be careful in your code design so that you e.g., intern() all appropriate strings on input so you don't have to worry about it anymore.

(from JGuru)

Third disadvantage (Java 6 and earlier only): interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError with plenty of free heap space.

(from Michael Borgwardt)

Baskin answered 7/7, 2009 at 8:41 Comment(12)
A third disadvantage: interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError with plenty of free heap space.Brink
AFAIK newer VMs also garbage collect the PermGen space.Upmost
Yes, but as long as those Strings are not yet eligible for GC, they're taking up scant PermGen space.Brink
Hey, please explain that "intern" has to calculate the hash code of the string the first time, so it has to iterate over the whole string, which is more expensive than simply comparing (a comparison can end before the whole string is iterated). So intern only makes sense for comparison if you compare the same strings many times. Additionally, it doesn't seem to be good programming practice, because there are probably nicer ways to improve performance than degrading the code this way (for me, abandoning an abstract solution like equals is degrading).Carrizales
Intern is about memory management, not comparison speed. The difference between if (s1.equals(s2)) and if (i1 == i2) is minimal unless you have a lot of long strings with the same leading characters. In most real-world uses (other than URLs) the strings will differ within the first few characters. And long if-else chains are a code smell anyway: use enums and functor maps.Incardination
You can still use the s1.equals syntax throughout your program; DON'T use ==. .equals uses == internally to short-circuit evaluation.Meatus
Michael Borgwardt did NOT say that interned strings can't be garbage collected. And that is a FALSE assertion. What Michael's comments (correctly) say is more subtle than that.Scabrous
In Java 1.7, interned Strings live in the heap: bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931Centeno
Wouldn't string interning have a synchronization cost, to look up the string in the cache of previously interned strings and add it if necessary?Baynebridge
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. Refer: oracle.com/technetwork/java/javase/jdk7-relnotes-418459.htmlOligopoly
when you need speed since you can compare strings by reference (== is faster than equals). I might be wrong, but the first thing equals does is checking if reference is the same. Is there a speed difference being expected from not having the method call?Shogun
@Shogun I thought the same. Take a look here for comparison on bytecode level. stackoverflow.com/a/43613977Cecilececiley
S
195

This has (almost) nothing to do with string comparison. String interning is intended for saving memory if you have many strings with the same content in your application. By using String.intern(), the application will only have one instance in the long run, and a side effect is that you can perform fast reference equality comparison instead of ordinary string comparison (but this is usually not advisable because it is really easy to break by forgetting to intern even a single instance).
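To make the memory argument concrete, here is a small sketch that counts distinct String objects by identity rather than by content. The class InternMemoryDemo and its methods are assumptions made up for illustration; load only simulates reading the same field value many times.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternMemoryDemo {
    /** Counts distinct String objects (by identity, not by content) in the list. */
    static int distinctInstances(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    /** Simulates loading the given number of records that all carry the same field value. */
    static List<String> load(int count, boolean intern) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String value = new String("Hello".toCharArray()); // a fresh object each time
            result.add(intern ? value.intern() : value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(load(1000, false))); // 1000
        System.out.println(distinctInstances(load(1000, true)));  // 1
    }
}
```

With interning, the 1000 temporary copies become garbage as soon as the loop moves on, and only the single pooled instance stays reachable from the list.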

Seabrooke answered 7/7, 2009 at 8:45 Comment(5)
That's not right. Interning of Strings occurs always, automatically, when each string expression is evaluated. There's always one copy for each unique string of characters used & it's "internally shared" if multiple usages occur. Calling String.intern() does not make this all happen - it just returns the internal canonical representation. See javadoc.Genovese
Need to clarify - interning always happens automatically for compile-time constant Strings (literals & fixed expressions). Additionally it occurs when String.intern() is called on runtime dynamically evaluated Strings.Genovese
So you mean, if there are 1000 "Hello" objects on the heap and I perform intern() on one of them, then the remaining 999 objects will be destroyed automatically?Columbus
@ArunRaaj no, you will have your 1000 still on the heap, and an extra one in the intern pool, that can be ready for re-use by later str.intern() when str is "Hello".Walden
Since when are all Strings automatically interned? AFAIK this only happens for string constants (and maybe constant expressions) in source code.Agueweed
C
40

Strings interned with String.intern() are definitely garbage collected in modern JVMs.
The following NEVER runs out of memory, because of GC activity:

// java -cp . -Xmx128m UserOfIntern

import java.util.Random;

public class UserOfIntern {
    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(random.nextLong());
        // Runs forever on purpose: each iteration interns a fresh string
        // that becomes unreachable as soon as s is reassigned.
        while (true) {
            String s = String.valueOf(random.nextLong());
            s = s.intern();
        }
    }
}

See more (from me) on the myth of non GCed String.intern().

Cimon answered 6/1, 2010 at 11:48 Comment(5)
OutOfMemoryException - no, not the code above, in my brain: link to the javaturning article, which is pointing to this article, which is pointing to the javaturning article, which... :-)Bello
Although you can see that the post was edited to add that link ;)Yt
You may want to mention that you are the author too of the external reference you link to.Acclimate
@Carlos linking an external reference that links back to stackoverflow should cause a.. Stackoverflow :)Airy
@Airy Circular references are easily detected these days :pAlchemize
F
16

I have recently written an article about String.intern() implementation in Java 6, 7 and 8: String.intern in Java 6, 7 and 8 - string pooling.

I hope it contains enough information about the current situation with string pooling in Java.

In a nutshell:

  • Avoid String.intern() in Java 6, because it goes into PermGen
  • Prefer String.intern() in Java 7 & Java 8: it uses 4-5x less memory than rolling your own object pool
  • Be sure to tune -XX:StringTableSize (the default is probably too small; set a prime number)
Footboard answered 25/8, 2013 at 8:7 Comment(4)
Please don't just post links to your blog, this is considered by some as SPAM. Plus blog links have a notable tendency to die a 404 death. Please either summarize your article inline here, or leave that link in a comment to the question.Sannyasi
Thanks for writing that @mik1! Very informative, clear and up-to-date article. (I came back here intending to post a link to it myself.)Savick
Thanks for mentioning the -XX arg. You can also use this to see the table stats: -XX:+PrintStringTableStatisticsCollenecollet
@Sannyasi as mentioned, the blog link seems to have died a 404 deathYaw
R
14

Comparing strings with == is much faster than with equals()

It is 5 times faster, but since string comparison usually represents only a small percentage of an application's total execution time, the overall gain is much smaller than that: the final gain is diluted to a few percent.

String.intern() pulls the string away from the heap and puts it in PermGen

Interned strings are put in a different storage area: the Permanent Generation, an area of the JVM that is reserved for non-user objects such as classes, methods and other internal JVM objects. The size of this area is limited, and it is much more precious than the heap. Because this area is smaller than the heap, you are more likely to use up all of its space and get an OutOfMemoryError.

Strings interned with String.intern() are garbage collected

In newer JVM versions, interned strings are also garbage collected when they are no longer referenced by any object.

Keeping the above 3 points in mind, you could deduce that String.intern() is useful only in the few situations where you do a lot of string comparison; however, it is better not to use interned strings if you don't know exactly what you are doing ...

Rockyrococo answered 24/9, 2011 at 8:35 Comment(2)
From Java 7, interned strings are in the heap.Nolin
Just to add, Heap memory exceptions can sometimes be recovered from, especially in threaded models such as web applications. When permgen is exhausted, an application will typically be permanently non-functional and often will resource thrash until killed.Bumbailiff
L
7

When would I use this function in favor to String.equals()

Given they do different things, probably never.

Interning strings for performance reasons so that you can compare them for reference equality is only going to be of benefit if you are holding references to the strings for a while - strings coming from user input or IO won't be interned.

That means in your application you receive input from an external source and process it into an object which has a semantic value - an identifier say - but that object has a type indistinguishable from the raw data, and has different rules as to how the programmer should use it.

It's almost always better to create a UserId type which is interned (it's easy to create a thread-safe generic interning mechanism) and acts like an open enum, than to overload the java.lang.String type with reference semantics if it happens to be a User ID.

That way you don't get confusion between whether or not a particular String has been interned, and you can encapsulate any additional behaviour you require in the open enum.
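One way such an interned UserId type could look is sketched below. This is an assumption about what the answer has in mind, not its actual code: the pool is a plain ConcurrentHashMap that never evicts entries, and the names UserId.of, value and UserIdDemo are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** A hypothetical "open enum": one canonical UserId instance per distinct raw value. */
final class UserId {
    // Assumption: ids are never evicted, so this pool only suits a bounded set of ids.
    private static final ConcurrentMap<String, UserId> POOL = new ConcurrentHashMap<>();

    private final String value;

    private UserId(String value) {
        this.value = value;
    }

    /** Returns the canonical instance for the raw value, creating it on first use. */
    static UserId of(String raw) {
        return POOL.computeIfAbsent(raw, UserId::new);
    }

    String value() {
        return value;
    }

    @Override
    public String toString() {
        return "UserId(" + value + ")";
    }
}

class UserIdDemo {
    public static void main(String[] args) {
        UserId a = UserId.of("alice");
        UserId b = UserId.of(new String("alice".toCharArray()));
        System.out.println(a == b);                // true: same canonical instance
        System.out.println(a == UserId.of("bob")); // false: a different id
    }
}
```

Because the constructor is private, every UserId in the program comes from the pool, so reference comparison is safe by construction and nobody has to remember whether a given String was interned.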

Linville answered 7/7, 2009 at 8:58 Comment(0)
P
6

I am not aware of any advantages, and if there were any, one would think that equals() would itself use intern() internally (which it doesn't).

Busting intern() myths

Pasquale answered 7/7, 2009 at 8:41 Comment(7)
Despite you saying that you're not aware of any advantages, your posted link identifies comparison via == as being 5x faster and thus important for text-centric performant code.Chloroprene
When you have lots of text-comparing to do you’ll eventually run out of PermGen space. When there is not so much text-comparing to do the speed difference doesn’t matter. Either way, just don’t intern() your strings. It’s not worth it.Epiphenomenalism
It also goes on to say that the overall relative gain is typically going to be small.Pasquale
I don't think that kind of logic is valid. Good link though!Upmost
@DR: what logic? That's one big fallacy. @objects: sorry but your arguments fall short of reasons. There are very good reasons to use intern, and very good reasons that equals doesn't do so by default. The link you posted is complete bollocks. The last paragraph even admits that intern has a valid usage scenario: heavy text processing (e.g. a parser). Concluding that “[XYZ] is dangerous if you don't know what you are doing” is so banal that it physically hurts.Dupion
@Bombe: often in text processing you've got a fixed list of strings that you need to intern (e.g. a list of keywords) and there'll be no danger of running out of PermGen space.Dupion
It would be catastrophically dangerous for String.equals() to call intern() - for one, it's a very expensive call which would needlessly slow down string equality checks, and for another, in Java 6 and earlier would very quickly flood PermGen.Strage
B
5

Are there side effects not mentioned in the Javadoc, i.e. more or less optimization by the JIT compiler?

I don't know about the JIT level, but there is direct bytecode support for the string pool, which is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects which have more generic representations).

JVMS

JVMS 7 5.1 says:

A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.

The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

("a" + "b" + "c").intern() == "abc"

To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.

  • If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

  • Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
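The rules quoted above can be observed from plain Java. The following is a small sketch with a made-up class name (LiteralInternDemo); it relies on the JLS rule that a concatenation which is not a constant expression creates a new String at run time.

```java
class LiteralInternDemo {
    static boolean constantExpressionIsPooled() {
        // "a" + "b" + "c" is a compile-time constant expression, folded to "abc" by the compiler.
        return ("a" + "b" + "c") == "abc";
    }

    static boolean runtimeConcatIsPooled(String prefix) {
        // prefix is not a constant, so the concatenation creates a new String at run time.
        return (prefix + "c") == "abc";
    }

    static boolean internedConcatIsPooled(String prefix) {
        // intern() maps the run-time result back to the instance used by the literal.
        return (prefix + "c").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(constantExpressionIsPooled()); // true
        System.out.println(runtimeConcatIsPooled("ab"));  // false
        System.out.println(internedConcatIsPooled("ab")); // true
    }
}
```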

Bytecode

It is also instructive to look at the bytecode implementation on OpenJDK 7.

If we decompile:

public class StringPool {
    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";
        String c = new String("abc");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == c);
    }
}

we have on the constant pool:

#2 = String             #32   // abc
[...]
#32 = Utf8               abc

and main:

 0: ldc           #2          // String abc
 2: astore_1
 3: ldc           #2          // String abc
 5: astore_2
 6: new           #3          // class java/lang/String
 9: dup
10: ldc           #2          // String abc
12: invokespecial #4          // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne     42
38: iconst_1
39: goto          43
42: iconst_0
43: invokevirtual #7          // Method java/io/PrintStream.println:(Z)V

Note how:

  • 0 and 3: the same ldc #2 constant is loaded (the literals)
  • 12: a new string instance is created (with #2 as argument)
  • 35: a and c are compared as regular objects with if_acmpne

The representation of constant strings is quite magic at the bytecode level: they get a dedicated CONSTANT_String_info structure, and the JVMS quote above seems to say that whenever the Utf8 pointed to is the same, then identical instances are loaded by ldc.

I have done similar tests for fields, and:

  • static final String s = "abc" points to the constant table through the ConstantValue Attribute
  • non-final fields don't have that attribute, but can still be initialized with ldc

Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
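For comparison, the Integer pool is implemented in library code (Integer.valueOf, which autoboxing uses) rather than in the class file format. A minimal sketch, with a hypothetical class name; only the range -128 to 127 is guaranteed to be cached, so nothing is claimed about values outside it.

```java
class IntegerCacheDemo {
    /** Reports whether two boxings of the same int yield the same Integer object. */
    static boolean sameBoxedInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(127));  // true: inside the guaranteed cache range
        // Outside -128..127 the result depends on the JVM and its settings,
        // so compare boxed values with equals() instead of ==.
        System.out.println(sameBoxedInstance(1000));
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```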

Bangweulu answered 14/2, 2016 at 22:51 Comment(0)
F
4

Daniel Brückner is absolutely right. String interning is meant to save memory (heap). Our system currently has a giant hashmap for holding certain data. As the system scales, the hashmap will become big enough to make the heap run out of memory (as we've tested). By interning all the duplicated strings in all the objects in the hashmap, we save a significant amount of heap space.

Also, in Java 7, interned strings no longer live in PermGen but in the heap instead. So you don't need to worry about its size, and yes, it gets garbage collected:

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

Facility answered 25/4, 2013 at 17:2 Comment(1)
I have to second that: on my software, a heap dump showed that most heap space was used by String instances. When looking at their content, I saw many duplicates and decided to switch to intern(), which saved hundreds of MB.Walden
O
2

I would consider intern() and ==-comparison instead of equals() only when equals-comparison is the bottleneck across many comparisons of the same strings. It is highly unlikely to help with a small number of comparisons, because intern() is not free. After aggressively interning strings you will find calls to intern() getting slower and slower.

Original answered 7/7, 2009 at 8:46 Comment(0)
D
2

A kind of memory leak can come from the use of substring() when the result is small compared to the source string and the object has a long life.

The normal solution is to use new String(s.substring(...)), but when you have a class that stores the result of a potential/likely substring(...) and have no control over the caller, you might consider storing the intern() of the String arguments passed to the constructor. This releases the potentially large buffer.

Dismantle answered 16/7, 2012 at 9:20 Comment(3)
Interesting, but perhaps this is implementation dependent.Japonica
The above-mentioned potential memory leak does not happen in Java 1.8 and 1.7.0_06 (and newer); see Changes to String internal representation made in Java 1.7.0_06.Dismantle
that confirms micro-optimizations are to be applied only when necessary after a performance and/or memory profiling. Thank you.Japonica
B
2

String interning is useful in the case where the equals() method is being invoked often because the equals() method does a quick check to see if the objects are the same at the beginning of the method.

if (this == anObject) {
    return true;
}

This usually occurs when searching through a Collection, though other code may also do string equality checks.

There is a cost involved in interning, though: I performed a microbenchmark of some code and found that the interning process increases the runtime by a factor of 10.

The best place to do the interning is usually when you are reading keys that are stored outside of the code, since string literals in the code are automatically interned. This would normally happen at the initialization stage of your application in order to prevent the first-user penalty.

Another place where it can be done is when processing user input that could be used to do key lookups. This normally happens in your request processor; note that the interned strings should be passed down.

Aside from that there isn't much point doing interning in the rest of the code as it generally won't give any benefit.

Barnstorm answered 3/8, 2014 at 16:30 Comment(0)
P
1

I would vote for it not being worth the maintenance hassle.

Most of the time, there will be no need, and no performance benefit, unless your code does a lot of work with substrings. In which case the String class will use the original string plus an offset to save memory. If your code uses substrings a lot, then I suspect that it'll just cause your memory requirements to explode.

Pole answered 7/7, 2009 at 9:27 Comment(0)
A
1

http://kohlerm.blogspot.co.uk/2009/01/is-javalangstringintern-really-evil.html

asserts that String.equals() uses "==" to compare String objects before, according to

http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html

it compares the lengths of Strings, and then the contents.

(By the way, product code strings in a sales catalogue are liable to be all the same length - BIC0417 is a bicyclist's safety helmet, TIG0003 is a live adult male tiger - you probably need all sorts of licences to order one of those. And maybe you better order a safety helmet at the same time.)

So it sounds as though you get a benefit from replacing your Strings with their intern() version, but you get safety - and readability and standard compliance - -without- using "==" for equals() in your programming. And most of what I'm going to say depends on that being true, if it is true.

But does String.equals() test that you passed it a String and not some other object, before using "==" ? I'm not qualified to say, but I would guess not, because overwhelmingly most such equals() operations will be String to String, so that test is almost always passed. Indeed, prioritising "==" inside String.equals() implies a confidence that you frequently are comparing the String to the same actual object.

I hope no one is surprised that the following lines produce a result of "false":

    Integer i = 1;
    System.out.println("1".equals(i));

But if you change i to i.toString() in the second line, of course it's true.

Venues where you might hope for a benefit from interning include Set and Map, obviously. I hope that interned strings have their hashcodes cached... I think that would be a requirement. And I hope I haven't just given away an idea that could earn me a million dollars. :-)

As for memory, it's also obvious that that is an important limit if your volume of Strings is large, or if you want the memory used by your program code to be very small. If your volume of -distinct- Strings is very large, then it may be time to consider using dedicated database program code to manage them, and a separate database server. Likewise, if you can improve a small program (that needs to run in 10000 instances simultaneously) by having it not store its Strings itself at all.

It feels wasteful to create a new String and then discard it straight away for its intern() substitute, but there isn't a clear alternative, except for keeping the duplicate String. So really the execution cost is of searching for your string in the intern pool and then allowing the garbage collector to dispose of the original. And if it's a string literal then it comes intern-ed already anyway.

I am wondering whether intern() can be abused by malicious program code to detect whether some String and their object references already exist in the intern() pool, and therefore exist elsewhere in the Java session, when that shouldn't be known. But that would only be possible when the program code is already being used in a trusting way, I guess. Still, it is something to consider about the third-party libraries that you include in your program to store and remember your ATM PIN numbers!

Asocial answered 26/4, 2012 at 12:16 Comment(0)
C
0

I am using intern() to save memory: I hold a large amount of String data in memory, and moving to intern() saved a massive amount of memory. Unfortunately, although it uses a lot less memory, the memory it does use is stored in PermGen, not the heap, and it is difficult to explain to customers how to increase the allocation of this type of memory.

So is there an alternative to intern() for reducing memory consumption? (The == versus equals performance benefit is not an issue for me.)

Crum answered 7/7, 2009 at 8:35 Comment(0)
B
0

The real reason to use intern is not the above. You get to use it after you get an out-of-memory error. Lots of the strings in a typical program are String.substring() results taken from other big strings (think of taking a user name out of a 100K XML file). The Java implementation is that the substring holds a reference to the original string plus the start and end positions within that huge string. (The thought behind it is reuse of the same big string.)

After 1000 big files, from which you only save 1000 short names, you will keep the whole 1000 files in memory! Solution: in this scenario just use smallsubstring.intern().

Bomarc answered 22/7, 2010 at 20:25 Comment(1)
Why not just create a new string from the substring if you need it?Acclimate
T
0

Let's face it: the main use-case scenario is when you read a stream of data (either through an input stream, or from a JDBC ResultSet) and there is a myriad of little Strings that are repeated all throughout.

Here is a little trick that gives you some control over what kind of mechanism you'd like to use to internalize Strings and other immutables, and an example implementation:

/**
 * Extends the notion of String.intern() to different mechanisms and
 * different types. For example, an implementation can use an
 * LRUCache<T,?>, or a WeakHashMap.
 */
public interface Internalizer<T> {
    public T get(T obj);
}
public static class LRUInternalizer<T> implements Internalizer<T> {
    private final LRUCache<T, T> cache;
    public LRUInternalizer(int size) {
        cache = new LRUCache<T, T>(size) {
            private static final long serialVersionUID = 1L;
            @Override
            protected T retrieve(T key) {
                return key;
            }
        };
    }
    @Override
    public T get(T obj) {
        return cache.get(obj);
    }
}
public class PermGenInternalizer implements Internalizer<String> {
    @Override
    public String get(String obj) {
        return obj.intern();
    }
}

I use that often when I read fields from streams or from ResultSets. Note: LRUCache is a simple cache based on LinkedHashMap<K,V>. It automatically calls the user-supplied retrieve() method for all cache misses.
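The LRUCache class itself is not shown in this answer. A minimal sketch of what such a class might look like, based only on the description above (built on LinkedHashMap, calling a user-supplied retrieve() on cache misses), is given here; it is a hypothetical reconstruction, not the author's actual class, and LRUCacheDemo.identityCache is an invented helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reconstruction of an LRUCache; not the answer author's actual class. */
abstract class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    protected LRUCache(int maxSize) {
        super(16, 0.75f, true); // true = access order, which gives LRU eviction
        this.maxSize = maxSize;
    }

    /** Called on a cache miss to produce the value for the key. */
    protected abstract V retrieve(K key);

    @Override
    @SuppressWarnings("unchecked")
    public V get(Object key) {
        V value = super.get(key);
        if (value == null) {
            value = retrieve((K) key);
            put((K) key, value);
        }
        return value;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry when over capacity
    }
}

class LRUCacheDemo {
    /** Builds a cache whose miss handler returns the key itself, as LRUInternalizer does. */
    static <T> LRUCache<T, T> identityCache(int size) {
        return new LRUCache<T, T>(size) {
            private static final long serialVersionUID = 1L;

            @Override
            protected T retrieve(T key) {
                return key;
            }
        };
    }

    public static void main(String[] args) {
        LRUCache<String, String> cache = identityCache(2);
        String first = new String("alpha".toCharArray());
        String second = new String("alpha".toCharArray());
        System.out.println(cache.get(first) == first);  // true: miss, first becomes canonical
        System.out.println(cache.get(second) == first); // true: hit, earlier instance returned
        cache.get("beta");
        cache.get("gamma");                             // capacity 2: "alpha" is evicted
        System.out.println(cache.containsKey("alpha")); // false
    }
}
```

Unlike String.intern(), a bounded cache like this can hand out two different instances for equal strings once the first has been evicted, so it saves memory but does not make == comparison safe.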

The way to use this is to create one LRUInternalizer before your read (or reads), use it to internalize Strings and other small immutable objects, then free it. For example:

Internalizer<String> internalizer = new LRUInternalizer<String>(2048);
// ... get some object "input" that streams fields
for (String s : input.nextField()) {
    s = internalizer.get(s);
    // store s...
}
Thong answered 14/8, 2012 at 21:53 Comment(0)
Q
0

I am using it in order to cache the contents of approximately 36000 codes which link to associated names. I intern the strings in the cache because many of the codes point to the same string.

By interning the strings in my cache, I am ensuring that codes that point to the same string actually point to the same memory, thereby saving me RAM space.

If the interned strings were actually garbage collected, it would not work for me at all. This would basically negate the purpose of interning. Mine won't be garbage collected because I am holding a reference to each and every string in the cache.

Quintic answered 16/8, 2013 at 0:20 Comment(1)
No, all interned equal strings that are in memory at a certain time will still be the same one object. It will be a different object than the equal string that was in memory before it was garbage collected. But this is no problem because the old string is no longer there.Nonproductive
D
0

The cost of interning a string is much more than the time saved in a single stringA.equals(B) comparison. Only use it (for performance reasons) when you are repeatedly using the same unchanged string variables. For example if you regularly iterate over a stable list of strings to update some maps keyed on the same string field you can get a nice saving.

I would suggest using string interning to tweak performance when you are optimising specific parts of your code.

Also remember that Strings are immutable, so don't make the silly mistake of

String a = SOME_RANDOM_VALUE;
a.intern(); // the returned canonical instance is discarded

remember to do

String a = SOME_RANDOM_VALUE.intern();
Deed answered 16/9, 2013 at 4:16 Comment(0)
N
0

If you are looking for an unlimited replacement for String.intern, also garbage collected, the following is working well for me.

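import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Keys are held weakly by the map and values are weak references to the same string,
// so an entry can be collected once nothing else references the canonical string.
private static WeakHashMap<String, WeakReference<String>> internStrings = new WeakHashMap<>();
public static String internalize(String k) {
    synchronized (internStrings) {
        WeakReference<String> weakReference = internStrings.get(k);
        String v = weakReference != null ? weakReference.get() : null;
        if (v == null) {
            // First time this content is seen (or the old instance was collected):
            // k becomes the canonical instance.
            v = k;
            internStrings.put(v, new WeakReference<String>(v));
        }
        return v;
    }
}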

Of course, if you can roughly estimate how many different strings there will be, then simply use String.intern() with -XX:StringTableSize=highEnoughValue.

Nonproductive answered 4/11, 2016 at 14:22 Comment(2)
SoftRef would make more sense.Ackerley
@Ackerley By using WeakReference (instead of SoftReference) memory is freed earlier so other allocations might go faster. It depends on what else the application is doing, either one could make sense.Nonproductive
