How does Java implement the flyweight pattern for String under the hood?
I

7

21

If you have two instances of a String, and they are equal, in Java they will share the same memory. How is this implemented under the hood?

EDIT: My application uses a large number of String objects, many of which are identical. What is the best way to make use of the Java String constant pool, so as to avoid creating a custom flyweight implementation?

Infectious answered 26/5, 2010 at 2:51 Comment(0)
R
7

Look at the source code of java.lang.String (the source for the entire Java API is part of the JDK).

To summarize: a String wraps a subsequence of a char[]. That backing char[] is never modified, which is guaranteed by never leaking it outside the String class and never capturing an externally supplied array without copying it. However, several Strings can share the same char[] (see the implementation of String.substring).
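The sharing idea described above can be illustrated with a small stand-alone class. This is only a simplified sketch of the concept, not the JDK source: the class name CharSlice and its methods subSlice and sharesStorageWith are made up for this example.

```java
// Hypothetical illustration of a string-like view over a shared char[].
// Not the real java.lang.String; names are invented for this sketch.
final class CharSlice {
    private final char[] value; // shared backing array, never modified or exposed
    private final int offset;   // index of the first character of this view
    private final int count;    // number of characters in this view

    CharSlice(String s) {
        // toCharArray() returns a fresh copy, so no outside code holds this array.
        this(s.toCharArray(), 0, s.length());
    }

    private CharSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns a view over the same array; no characters are copied.
    CharSlice subSlice(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new CharSlice(value, offset + begin, end - begin);
    }

    // True if both views are backed by the very same array object.
    boolean sharesStorageWith(CharSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        CharSlice whole = new CharSlice("flyweight");
        CharSlice part = whole.subSlice(3, 9);
        System.out.println(part);                          // weight
        System.out.println(part.sharesStorageWith(whole)); // true
    }
}
```

The trade-off of this design is that a small slice keeps the whole backing array reachable for as long as the slice lives.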

There is also the mechanism of interning, as explained in the other answers.

Ralph answered 26/5, 2010 at 20:13 Comment(2)
The claim that String.substring does not allocate a new char[] is no longer true. See this answer.Fenella
That is correct. String no longer implements the flyweight pattern, because sharing the reference is now deemed more expensive than reducing the "weight" of Strings. This is in part because JVMs have been improved to allocate objects on the stack if escape analysis proves that the object cannot outlive the current stack frame, an optimization which relies on the char[] object not being shared.Ralph
D
14

If you have two instances of a String, and they are equal, in Java they will share the same memory

This is actually not 100% true.

This blog post is a decent explanation of why this is so, and what the String constant pool is.

Deodar answered 26/5, 2010 at 3:0 Comment(1)
+1: This answer and Bill the Lizard's answer are actually the ones really addressing the question.Kiarakibble
A
6

String literals are interned in Java, so there's really only one String object with multiple references (when they are equal, which is not always the case). See the java.net article All about intern() for more details.

There's also a good example/explanation in section 3.10.5 String Literals of the JLS that talks about when Strings are interned and when they'll be distinct.
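A short sketch along the lines of the JLS example may help show when two equal Strings are the same object. The class name LiteralInterning and the helper runtimeConcat are hypothetical names chosen for this illustration.

```java
// Sketch: which equal Strings are the same object, following the rules in JLS 3.10.5.
final class LiteralInterning {
    // Concatenation with a method parameter is not a compile-time constant,
    // so this creates a new String object at run time.
    static String runtimeConcat(String suffix) {
        return "Hel" + suffix;
    }

    public static void main(String[] args) {
        String hello = "Hello";
        System.out.println(hello == "Hello");                      // true: same literal
        System.out.println(hello == ("Hel" + "lo"));               // true: compile-time constant
        System.out.println(hello == runtimeConcat("lo"));          // false: built at run time
        System.out.println(hello == runtimeConcat("lo").intern()); // true: explicitly interned
        System.out.println(hello.equals(runtimeConcat("lo")));     // true: same contents
    }
}
```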

Annul answered 26/5, 2010 at 2:59 Comment(0)
P
5

That's not necessarily true. Example:

String s1 = "hello";
String s2 = "hello";
System.out.println(s1 == s2); // true

but:

String s1 = new String("hello");
String s2 = new String("hello");
System.out.println(s1 == s2); // false

Now the second form is discouraged. Some (including me) think that String shouldn't even have a public constructor. A better version of the above would be:

String s1 = new String("hello").intern();
String s2 = new String("hello").intern();
System.out.println(s1 == s2); // true

Obviously you don't need to do this for a constant String. It's illustrative.

The important point about this is that if you're passed a String or get one from a function you can't rely on the String being canonical. A canonical Object satisfies this equality:

a.equals(b) == b.equals(a) == (a == b)

for non-null instances a and b of a given class; in other words, two instances are equal exactly when they are the same object.

Promising answered 26/5, 2010 at 2:59 Comment(2)
A word of warning regarding interning is that it uses PermGen memory, which can result in a very nasty OutOfMemoryError. If string pooling is necessary, a custom pool is often a better choice: hype-free.blogspot.com/2010/03/…Magnetics
From Java 7 on, interned strings are no longer in the PermGen. See this answer. @MagneticsFenella
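For reference, a custom pool of the kind mentioned in the comments could look roughly like the following. This is a minimal sketch, not the implementation from the linked post: the class name StringPool and the method canonical are hypothetical, and the pool holds strong references, so entries are never evicted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical application-level string pool (an assumption for illustration).
// Entries are strongly referenced and stay in the map until the pool itself is discarded.
final class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the pooled instance equal to s, adding s to the pool if it is new.
    String canonical(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String a = pool.canonical(new String("hello"));
        String b = pool.canonical(new String("hello"));
        System.out.println(a == b);      // true: both calls return the pooled instance
        System.out.println(pool.size()); // 1
    }
}
```

Unlike intern(), such a pool can be dropped as a whole when it is no longer needed, which releases every String it was keeping alive.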
T
4

To answer your edited question, Sun JVMs have a -XX:+StringCache option, which in my observation can significantly reduce the memory footprint of a String-heavy application.

Otherwise, you have the option of interning your Strings, but I would be careful about that: depending on the JVM, interned Strings that are very large and no longer referenced may still use memory for the life of the JVM.

Edit (in response to comment): I first found out about the StringCache option from here:

-XX:+StringCache Enables caching of commonly allocated strings.

Tom Hawtin describes it as some type of caching intended to improve some benchmarks. My observation when I enabled it for IDEA was that the memory footprint (after a full garbage collection) was much lower than without it. It is not a documented parameter, and may indeed just be about optimizing for some benchmarks. It helped in my case, but I wouldn't build an important system based on it.

Trentontrepan answered 26/5, 2010 at 3:31 Comment(1)
I tried finding more info on -XX:+StringCache, but to no avail. Where can I read more about this option and how it can reduce memory footprint? Do you have more information on what this option does to the VM?Infectious
M
2

Two things to be careful about:

  1. Do not use new String("abc") constructor, just use the literal "abc".
  2. Learn to use the intern() method of the String class, especially when concatenating strings or when converting a char array, byte array, etc. to a String.

intern() always returns a String from the pool.
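As a rough illustration of the second point, the sketch below interns a String built from a char array. The class name RuntimeInternDemo and the helper fromChars are made-up names for this example.

```java
// Sketch: Strings created from a char[] are distinct objects until interned.
final class RuntimeInternDemo {
    // Builds a String from the array and returns the pooled instance for its contents.
    static String fromChars(char[] chars) {
        return new String(chars).intern();
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String plain1 = new String(chars);
        String plain2 = new String(chars);
        System.out.println(plain1 == plain2);                     // false: two separate objects
        System.out.println(fromChars(chars) == fromChars(chars)); // true: same pooled object
        System.out.println(fromChars(chars) == "abc");            // true: literals are pooled too
    }
}
```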

Monometallic answered 26/5, 2010 at 5:4 Comment(0)
I
0

If your identical Strings come from a fixed set of possible values, then a Type-Safe Enumeration is what you want here. Not only will it reduce your String count, but it will make for a more solid application. Your whole app will know this String has semantics attached to it, maybe even some convenience methods.

My favorite optimizations are always the ones that can be defended as making the code better, not just faster. And 9 times out of 10, replacing a String with a concrete type leads to more correct and self-documenting code.
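As a sketch of what this could look like, suppose the repeated Strings were priority labels. The enum Priority, its constants, and the methods parse and isAtLeast are hypothetical names invented for this example.

```java
import java.util.Locale;

// Hypothetical enum replacing free-form priority Strings (names are assumptions).
enum Priority {
    LOW, MEDIUM, HIGH;

    // Converts external text into the single shared constant.
    // Throws IllegalArgumentException for unknown values.
    static Priority parse(String text) {
        return valueOf(text.trim().toUpperCase(Locale.ROOT));
    }

    // Convenience method that would be awkward to express on a plain String.
    boolean isAtLeast(Priority other) {
        return compareTo(other) >= 0;
    }

    public static void main(String[] args) {
        Priority a = Priority.parse("high");
        Priority b = Priority.parse(" HIGH ");
        System.out.println(a == b);                        // true: one instance per constant
        System.out.println(a.isAtLeast(Priority.MEDIUM));  // true
    }
}
```

Each constant exists exactly once, so comparing with == is safe and no pooling logic is needed.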

Infiltration answered 27/5, 2010 at 17:29 Comment(0)
