Is Java's String Intern a flyweight?
A

6

14

Does the implementation of Java's String memory pool follow the flyweight pattern?

I have this doubt because I see no extrinsic state involved in interning. In GoF I read that there should be the right balance between intrinsic and extrinsic state, but in intern() everything is intrinsic.

Or shall we say that there is no strict rule with respect to attributes, and that just sharing objects to reduce memory is sufficient to call it a flyweight?

Please help me understand.

Ahmad answered 25/6, 2012 at 12:12 Comment(1)
I would say if there is no extrinsic context for your objects, then you are essentially just caching. The whole reason the Flyweight pattern is even useful to define is that people often forget they can at least cache a part of the object that is independent of context and share it. - Cassaundracassava
P
5

Irrespective of interning, Java String utilizes the flyweight pattern by sharing the char[] between a string and those derived from it via substring and similar method calls. This has a flipside, though: if you take a small substring of a huge string, the huge char[] will not be eligible for garbage collection.

Note: as of OpenJDK version 1.7.0_06 the above has become obsolete: the code was changed so that the char[] is no longer shared between instances. substring() creates a new array.
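
To make the trade-off concrete, here is a minimal sketch of the two substring strategies described above. It is not the real java.lang.String source: the class SharedString and its methods substring and copyingSubstring are hypothetical names invented for illustration, assuming only that the old design kept a shared char[] plus offset and count fields.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the actual java.lang.String implementation. */
public class SubstringSharingSketch {

    /** A string-like view over a char[] that may be shared with other instances. */
    static final class SharedString {
        final char[] value;  // backing array, possibly shared
        final int offset;    // index of the first character used
        final int count;     // number of characters used

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Sharing style: O(1), but keeps the whole parent array reachable. */
        SharedString substring(int begin, int end) {
            return new SharedString(value, offset + begin, end - begin);
        }

        /** Copying style: allocates only the needed chars, so the parent array can be collected. */
        SharedString copyingSubstring(int begin, int end) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new SharedString(copy, 0, copy.length);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString huge = new SharedString(new char[1_000_000], 0, 1_000_000);
        SharedString shared = huge.substring(10, 15);
        SharedString copied = huge.copyingSubstring(10, 15);
        System.out.println(shared.value.length);  // 1000000: the small view pins the huge array
        System.out.println(copied.value.length);  // 5: only what it needs
    }
}
```

In the sharing variant the five-character view holds a reference to the full million-element array, which is the garbage-collection flipside mentioned above.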

Perales answered 25/6, 2012 at 12:19 Comment(7)
Holding the intrinsic state in the flyweight object and passing in extrinsic state: do we need to worry about this? Because in the GoF book, I see more importance attached to the separation of intrinsic/extrinsic state. Here in the char[] flyweight, what is intrinsic and what is extrinsic? - Ahmad
That's simple -- char[] is entirely intrinsic, and the string the object represents is entirely extrinsic. Using a String you don't even know that the char[] exists. - Perales
The HotSpot implementation is finally changing to use a precise-length char[] (or presumably byte[]) without offset and length fields. Really, having the char[] as a separate allocation should be eliminated as well. - Praseodymium
@TomHawtin-tackline This is very interesting. Can you please point me to a write-up on that? I'm interested in the gory details :) - Perales
@MarkoTopolnik I don't have a link. It'll be in one of the many OpenJDK mailing lists somewhere... - Praseodymium
This answer was true for less than 2 months; as of Java 1.7.0_06 (August 2012), substrings no longer share memory. - Maida
@Maida the answer is still valid; it just deserves a bit of elaboration. With the change to no longer share an array with substrings, the String(String) constructor was changed to no longer copy the array. Before the change, using this constructor was an undocumented trick to construct a string not sharing the array, to solve the problem of small (sub-)strings referencing a large array. Since this trick is not needed anymore, you can again construct strings using the same array, though these would all be equal strings, not substrings. Then, there's String Deduplication in recent JVMs. - Erythema
K
4

Yes, the String.intern() implementation follows the flyweight pattern.

As the Javadoc says:

Returns a canonical representation for the string object. A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.

In HotSpot JVMs before Java 7, the interned strings reside in the "Perm Gen" space; Java 7 moved them to the main heap. On string objects returned by .intern() you can use the == operator, because .intern() always returns the same object for equal values.

Also remember that the .intern() method does not produce leaks, because modern JVMs are able to garbage-collect the pool.
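
As a small illustration of the contract quoted above, the following sketch compares a literal with an explicitly constructed string. The class name InternDemo and the helper sameCanonical are made up for this example; only String.intern(), equals and == come from the platform.

```java
public class InternDemo {

    /** True when both strings map to the same canonical instance in the pool. */
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String literal = "flyweight";             // literals are interned automatically
        String built = new String("flyweight");   // a distinct object with equal contents

        System.out.println(built.equals(literal));       // true: same characters
        System.out.println(built == literal);            // false: different objects
        System.out.println(built.intern() == literal);   // true: canonical instance from the pool
        System.out.println(sameCanonical(built, literal)); // true, because built.equals(literal)
    }
}
```

Note that the whole String object is shared here; the caller passes in no per-use context.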

Try to read this article too.

Knotweed answered 25/6, 2012 at 12:48 Comment(9)
But flyweight is about sharing the object internals. Interning is just caching the whole objects. I don't see a fit here. - Perales
My question is: does sharing to save memory alone qualify as flyweight, irrespective of implementation details like extrinsic/intrinsic state? - Ahmad
Maybe what I read is wrong, but on Wikipedia too the sample returns and caches the whole object (Flyweight pattern). Maybe someone else can clarify the pattern. - Knotweed
To explain better: as far as I know, with the flyweight pattern we share parts/state of the object class between different instances, and it is useful when I use many instances (the wiki page sample is about a glyph). The greater the part I share, the better. Sharing the whole object is the extreme case and is coherent with the pattern. Naturally, if I'm wrong, please provide me with documentation or web pages about this; I do not want to be stubborn, and I'm happy to "fix" my knowledge. - Knotweed
Sharing the whole object is indeed the best, but then your design is simply not covered by the term "flyweight" where the whole point is to have distinct, complex entities that behind the scenes share their complex data. The Wikipedia page you reference makes that perfectly clear. - Perales
I've studied the code sample on that wp page and indeed it's almost useless. It has a cache of coffee flavors and two arrays. The flyweight aspect only enters as the pair of array elements at the same index: flavors[i].serveCoffee(tables[i]). Here we see that many orders are served by sharing their internal representation of coffee flavors. This is a very bad example of flyweight, especially in terms of educational value. - Perales
However, the web is full of bad samples of the Flyweight pattern if you are right :( - Knotweed
The intent of Flyweight is exactly that of String interning, which is to "use sharing to support large numbers of fine-grained objects efficiently". In regard to sharing internals and such, when you use a String you are typically unaware of the internals; of three strings, two may be interned and one not. Flyweights enable sharing, they don't enforce it. - Chessa
@Chessa the intent may be the same, but two patterns can have the same intent (but different trade-offs). According to GoF, almost word for word, the Flyweight pattern utilizes an intrinsic state separate from an instance's extrinsic state. - Cassaundracassava
M
3

You have correctly identified that both Interning and Flyweight are based on the same idea: caching and sharing common state.

With a Flyweight, in the extreme case when there is no extrinsic state to store, only the pointer to the intrinsic state remains. Then there is no need for the extrinsic state to even be an object; the pointer itself can be the extrinsic state. That's when Flyweight has become Interning.

Whether Interning "really" is or is not a kind of Flyweight is just a debate over definitions. What matters most is the understanding of how one can be viewed as a specialized instance of the other, so you're good.
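
For contrast with interning, here is a rough sketch of a flyweight that does have extrinsic state, loosely modeled on the glyph example usually used to teach the pattern. All names (Glyph, GlyphFactory, draw) are hypothetical and chosen for illustration; this is not code from the GoF book.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphFlyweightSketch {

    /** Flyweight: holds only intrinsic (context-independent) state. */
    static final class Glyph {
        private final char symbol;

        Glyph(char symbol) {
            this.symbol = symbol;
        }

        /** Extrinsic state (the position) is supplied by the caller on every use. */
        String draw(int row, int column) {
            return symbol + "@" + row + ":" + column;
        }
    }

    /** Factory that hands out one shared Glyph per character. */
    static final class GlyphFactory {
        private final Map<Character, Glyph> pool = new HashMap<>();

        Glyph get(char symbol) {
            return pool.computeIfAbsent(symbol, c -> new Glyph(c));
        }

        int size() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        GlyphFactory factory = new GlyphFactory();
        String text = "banana";
        for (int i = 0; i < text.length(); i++) {
            // The shared glyph is combined with a position that is never stored in it.
            System.out.println(factory.get(text.charAt(i)).draw(0, i));
        }
        System.out.println(factory.size()); // 3 shared glyphs serve 6 positions
    }
}
```

If draw took no arguments, nothing extrinsic would remain and the factory would behave like an intern pool: callers would simply hold references to the shared objects.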

Mythify answered 7/12, 2015 at 14:37 Comment(2)
I think the terms intrinsic and extrinsic are reversed in this answer. Intrinsic data is the commonality that is shared. Extrinsic data is the unique context that is not shared. - Biskra
@Biskra Oh! You are right, my answer has been wrong for three years. I corrected it. - Mythify
B
0

Just like others have stated, String.intern() is all about caching. It returns the reference to the string already stored in the pool. In this way it is somewhat similar to the flyweight pattern, as it reuses existing objects, resulting in lower memory consumption and increased performance (though intern has its own performance overhead from the lookup in the string pool). Hence the two can appear similar, but they actually are not.

Barayon answered 26/12, 2015 at 17:54 Comment(0)
B
0

No, sharing objects to reduce memory is insufficient to call it a flyweight. In other words, caching is not automatically the flyweight pattern.

I think it would be fair to say that flyweight is a special form of caching, i.e. partial caching; but do note the GoF book does not use the words "cache" or "caching" anywhere in the flyweight chapter (though the terms are used in both the previous and subsequent chapters, facade and proxy, respectively).

A couple of comments in this thread are worth repeating, because they succinctly answer the overall question.

  • If there is no extrinsic context for your objects, then you are just caching. The whole reason the Flyweight pattern is even useful to define, is that people often forget they can at least cache a part of the object that is independent of context and share it.

    --C S

  • Flyweight is about sharing the object internals. Interning is just caching the whole objects.

    --Marko Topolnik

But let's compare String interning to the criteria that the GoF have defined (on page 197).

Apply the Flyweight pattern when all of the following are true:

  • An application uses a large number of objects.
  • Storage costs are high because of the sheer quantity of objects.
  • Most object state can be made extrinsic.
  • Many groups of objects may be replaced by relatively few shared objects once extrinsic state is removed.
  • The application doesn't depend on object identity. Since flyweight objects may be shared, identity tests will return true for conceptually distinct objects.

  1. Clearly, many applications use a large number of Strings, so this criterion passes.
  2. Storing Strings is expensive, at least compared to primitive types, so let's give this criterion a pass.
  3. Here's where we get tripped up: none of a String's state is made extrinsic. This criterion fails.
  4. If we're generous and ignore the part about extrinsic state, we could give this criterion a pass as well, since Strings do tend to be reused.
  5. Anyone who's ever used == to compare Strings in Java knows not to depend on object identity, so this criterion passes.

Well, 4/5 passing criteria is pretty good, right? Shouldn't that be enough to say that interning/caching and flyweight are the same? No: similar != same. The emphasis on the word all in the GoF quote is theirs, not mine. There is naturally a strong desire to label as many implementations as possible with GoF pattern names, because doing so lends legitimacy to those implementations. (The most egregious cases are the factory patterns, which you can easily find labeling every kind of creational code imaginable; but I digress.) If the patterns are not held to their published definitions, they overlap and lose meaning, defeating a large part of their purpose (common vocabulary).

Lastly, let's analyze the first sentence of the flyweight chapter: what the GoF defines as the Intent of the flyweight pattern.

Use sharing to support large numbers of fine-grained objects efficiently.

I submit that an object with no extrinsic state is not fine-grained, but rather the opposite; so here is a suggested Intent for caching: Use caching to support large numbers of coarse-grained objects efficiently.

Clearly there is similarity between String interning/caching and the Flyweight Pattern; but they are not the same.

Biskra answered 19/10, 2018 at 22:19 Comment(0)
F
-1

Flyweight is about sharing the object's immutable internals. Interning is just caching whole objects.

Frag answered 19/11, 2012 at 23:47 Comment(0)
