Use PermGen space or roll-my-own intern method?

I am writing a Codec to process messages sent over TCP using a bespoke wire protocol. During the decode process I create a number of Strings, BigDecimals and dates. The client-server access patterns mean that it is common for the client to issue a request and then decode thousands of response messages, which results in a large number of duplicate Strings, BigDecimals, etc.

Therefore I have created an InternPool<T> class allowing me to intern each class of object. Internally, the pool uses a WeakHashMap<T, WeakReference<T>>. For example:

InternPool<BigDecimal> pool = new InternPool<BigDecimal>();

...

// Read BigDecimal from in buffer and then intern.
BigDecimal quantity = pool.intern(readBigDecimal(in));
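The question doesn't show InternPool itself. Here is a minimal sketch of a pool built the way it is described, assuming a WeakHashMap<T, WeakReference<T>> plus synchronized access. The class body is a hypothetical reconstruction, not the asker's code:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical reconstruction of the InternPool<T> described in the question.
final class InternPool<T> {
    // Key: the canonical instance, held weakly by the WeakHashMap.
    // Value: a WeakReference to that same instance. Storing the instance
    // directly as the value would keep the key strongly reachable, so the
    // entry could never be cleared.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    // WeakHashMap is not thread-safe, so every access is synchronized.
    synchronized T intern(T value) {
        WeakReference<T> ref = pool.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First time this value (by equals()) is seen: it becomes canonical.
            pool.put(value, new WeakReference<>(value));
            canonical = value;
        }
        return canonical;
    }
}
```

Because the pool relies on equals(), it works for String as well as BigDecimal. Note that BigDecimal.equals() is scale-sensitive, so 1.5 and 1.50 end up as two separate canonical instances.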

My question: I am using InternPool for BigDecimal but should I consider also using it for String instead of String's intern() method, which I believe uses PermGen space? What is the advantage of using PermGen space?

Fluky answered 19/5, 2010 at 11:29 Comment(5)
@kts: If I were to map byte[] to BigDecimal, the problem is that the byte[] would no longer be referenced by anything once the intern pool has created/returned the BigDecimal. Assuming the byte[] is the key in the underlying WeakHashMap, this would cause the entry to be removed even though the corresponding BigDecimal is still in use. - Fluky
Is WeakReference appropriate for this, or should you be using a SoftReference instead? The GC treats the two differently, and this sounds like you are trying to build a kind of cache; weak references are not a good fit for that purpose. See my answer here for some reasons why: #2861910 - Shank
@Fluky I would use a SoftReference on the BigDecimal only, and a ReferenceQueue to remove the byte[]s from the map once a BigDecimal's reference is enqueued (you probably need a BiMap). This avoids constructing redundant BigDecimal objects, saving memory, GC time and execution time (each value only has to be constructed once). - Heliotherapy
Thinking it over, it may be better to intern the byte[] and only convert it to a BigDecimal when you actually need to use it. That conversion can be cached too. This gives the simplicity of byte[] b = pool.intern(getBytes()); with the benefits of lazy construction. In both cases you still have to read the bytes (or whatever you construct your BigDecimals from), but this way you construct only one BigDecimal per unique value. - Heliotherapy
@Adamski: you cannot decide to "use PermGen" [sic]. String's intern mechanism relates to String "pooling", but not necessarily to the PermGen memory pool. PermGen is a Sun VM-specific feature that doesn't exist at the language/VM specification level. - Paralysis
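The SoftReference/ReferenceQueue idea from the comments above could look roughly like the sketch below. SoftDecimalCache is a hypothetical name, and the wire format (ASCII decimal text such as "12.50") is an assumption. Rather than a BiMap, each soft reference carries its own key, so a cleared reference can be traced back to its map entry. A bare byte[] is wrapped because arrays use identity equals()/hashCode() and would never match as map keys.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache keyed by the raw wire bytes of a BigDecimal.
final class SoftDecimalCache {
    // byte[] uses identity equals()/hashCode(), so wrap it to compare by content.
    private static final class Key {
        private final byte[] bytes;
        private final int hash;

        Key(byte[] raw) {
            this.bytes = raw.clone(); // defensive copy: the read buffer may be reused
            this.hash = Arrays.hashCode(this.bytes);
        }

        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
        }

        @Override public int hashCode() { return hash; }
    }

    // Soft reference to the BigDecimal that remembers its key, so the map
    // entry can be removed once the GC clears the reference.
    private static final class Entry extends SoftReference<BigDecimal> {
        final Key key;

        Entry(Key key, BigDecimal value, ReferenceQueue<BigDecimal> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<Key, Entry> map = new HashMap<>();
    private final ReferenceQueue<BigDecimal> queue = new ReferenceQueue<>();

    // Returns the cached BigDecimal for these bytes, constructing it only on a miss.
    synchronized BigDecimal get(byte[] raw) {
        expungeClearedEntries();
        Key key = new Key(raw);
        Entry entry = map.get(key);
        BigDecimal value = (entry == null) ? null : entry.get();
        if (value == null) {
            // Assumption: the wire format is ASCII decimal text such as "12.50".
            value = new BigDecimal(new String(raw, StandardCharsets.US_ASCII));
            map.put(key, new Entry(key, value, queue));
        }
        return value;
    }

    synchronized int size() {
        expungeClearedEntries();
        return map.size();
    }

    // Removes entries whose BigDecimal has been collected by the GC.
    private void expungeClearedEntries() {
        for (Object ref; (ref = queue.poll()) != null; ) {
            Entry stale = (Entry) ref;
            // Two-argument remove leaves a newer entry for the same key alone.
            map.remove(stale.key, stale);
        }
    }
}
```

Whether soft references suit this workload depends on the JVM's soft-reference clearing policy, so the cache's behaviour under memory pressure is worth checking on the target VM.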

It is likely that the JVM's String.intern() pool will be faster. AFAIK, it is implemented in native code, so it should be faster and use less space than a pool implemented using WeakHashMap and WeakReference. You would need to do some careful benchmarking to confirm this.
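For comparison, using the built-in pool from the decoder is a one-liner. This sketch assumes Strings arrive as UTF-8 bytes; DecodedStrings and readInterned are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

final class DecodedStrings {
    // Hypothetical decoder helper: build the String from the wire bytes, then
    // let the JVM's built-in pool return the canonical instance.
    // Assumption: the protocol encodes Strings as UTF-8.
    static String readInterned(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8).intern();
    }
}
```

Two equal decoded Strings then share one instance. That instance is also the one used for an equal string literal, since literals are interned too.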

However, unless you have huge numbers of long-lived duplicate objects, I doubt that interning (either in PermGen or with your own pools) will make much difference. And if the ratio of duplicate to unique objects is too low, interning will just increase the number of live objects (making GC take longer) and reduce performance due to the overheads of interning, and so on. So I would also advocate benchmarking the "intern" versus "no intern" approaches.
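Before benchmarking, one cheap input is to estimate how many decoded values are actually duplicates, for example by sampling one large response. The helper below is a hypothetical sketch. It only gives a rough indicator: it says nothing about object lifetimes or pool overhead, so it does not replace the "intern" versus "no intern" benchmark.

```java
import java.util.HashSet;
import java.util.List;

final class DuplicateStats {
    // Returns the fraction of values in the sample that duplicate an earlier value:
    // 0.0 means every value is unique (interning would only add overhead),
    // values near 1.0 mean most objects could share a canonical instance.
    static double duplicateFraction(List<?> sample) {
        if (sample.isEmpty()) {
            return 0.0;
        }
        int distinct = new HashSet<Object>(sample).size();
        return 1.0 - (double) distinct / sample.size();
    }
}
```

For example, a sample of ["GBP", "USD", "GBP", "GBP"] has 2 distinct values out of 4, giving a duplicate fraction of 0.5.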

Dredge answered 19/5, 2010 at 12:14 Comment(2)
Adamski does indeed have huge numbers of long-lived, duplicate objects :-) - Sister
@Sister - very clever. The point is that you need to quantify these things to figure out whether interning (by whatever mechanism) improves performance ... or makes it worse. And there are a lot of factors that affect the outcome. - Dredge

If you already have such an InternPool class, I think it is better to use that than to choose a different interning method for Strings, especially since String.intern() seems to give a much stronger guarantee than you actually need. Your goal is to reduce memory usage, so perfect interning for the lifetime of the JVM is not actually necessary.

Also, I'd use the Google Collections MapMaker to create an InternPool rather than reinventing the wheel:

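import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Function;
import com.google.common.collect.MapMaker;

// Google Collections 1.0 API. Caveat: with weakKeys(), MapMaker compares
// keys by identity (==) rather than equals().
Map<BigDecimal,BigDecimal> bigDecimalPool = new MapMaker()
    .weakKeys()
    .weakValues()
    .expiration(1, TimeUnit.MINUTES)
    .makeComputingMap(
      new Function<BigDecimal, BigDecimal>() {
        public BigDecimal apply(BigDecimal value) {
          return value;
        }
      });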

This would give you (correctly implemented) weak keys and values, thread safety, automatic purging of old entries and a very simple interface (a simple, well-known Map). To be safe, you could also wrap it with Collections.unmodifiableMap() to stop careless code from modifying it.

Barbur answered 19/5, 2010 at 11:39 Comment(6)
OK, thanks. Does String.intern() intern for the lifetime of the JVM? I'm not sure that's true, as I thought modern VMs garbage-collected PermGen. - Fluky
@Joachim - you seem to be implying that an interned String will live for the life of the JVM. This is not guaranteed by the javadocs, and in fact I don't think it is true for recent JVMs. - Dredge
@Stephen: I tried not to imply that, as the JavaDoc indeed doesn't state that. - Barbur
@Joachim - in that case, I don't understand what you mean by "... seems to give a much stronger guarantee than you actually need". - Dredge
BTW, MapMaker#expiration() is now deprecated and should be replaced with MapMaker#expireAfterWrite(). - Boom
...also, Guava has Interners now, so I'm not sure the MapMaker approach is necessary any more. - Boom

© 2022 - 2024 — McMap. All rights reserved.