As a comment mentions, do see: http://java-performance.info/string-intern-in-java-6-7-8/. It is a very insightful reference and I learned a lot; however, I'm not sure its conclusions are necessarily "one size fits all". Each aspect depends on the needs of your own application, so taking measurements with realistic input data is highly recommended!
The main factor is probably what you have control over:
Do you have full control over the choice of GC? In a GUI application, for example, there is still a strong case to be made for using Serial GC: it has a far lower total memory footprint for the process (think 400 MB vs ~1 GB for a moderately complex app) and is much more willing to release memory, e.g. after a transient spike in usage. So you might pick that, or give your users the option. (If the heap remains small, the pauses should not be a big deal.)
Do you have full control over the code? The G1GC option (-XX:+UseStringDeduplication) is great for third-party libraries (and applications!) that you can't edit.
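Since both questions come down to how the JVM is launched, it can help to confirm which collector you are really running on. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the standard HotSpot flags -XX:+UseSerialGC, -XX:+UseG1GC and -XX:+UseStringDeduplication. It only inspects the arguments the JVM reports as its input arguments, so treat the result as a hint rather than proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class GcSettingsCheck {

    /** Names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True if -XX:+UseStringDeduplication appears in the JVM's reported input arguments. */
    static boolean dedupFlagPassed() {
        return ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .contains("-XX:+UseStringDeduplication");
    }

    public static void main(String[] args) {
        // Try launching with, for example:
        //   java -XX:+UseSerialGC GcSettingsCheck
        //   java -XX:+UseG1GC -XX:+UseStringDeduplication GcSettingsCheck
        System.out.println("Collectors: " + collectorNames());
        System.out.println("String de-duplication flag passed: " + dedupFlagPassed());
    }
}
```

Comparing the process footprint under each launch line, with realistic input, is the measurement that actually matters.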
The second consideration (as per @ZhongYu's answer) is that String.intern can de-duplicate the String objects themselves, whereas G1GC necessarily can only de-duplicate their private char[] field.
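To make the object-level difference concrete, here is a small sketch (the class name and helper are invented for illustration). Two equal strings start out as separate objects; after interning, both references point at one canonical String, which is something GC-level de-duplication of the backing array does not give you.

```java
// Hypothetical demo class.
public class InternIdentityDemo {

    /** Returns two distinct String objects with equal contents. */
    static String[] twoCopies(String text) {
        // new String(...) always allocates a fresh object
        return new String[] { new String(text), new String(text) };
    }

    public static void main(String[] args) {
        String[] copies = twoCopies("duplicate-value");

        System.out.println(copies[0] == copies[1]);                   // false: two objects
        System.out.println(copies[0].equals(copies[1]));              // true: same contents
        System.out.println(copies[0].intern() == copies[1].intern()); // true: one canonical object
    }
}
```

With G1GC de-duplication the first comparison would still print false: the two String objects remain, they would merely come to share their backing array.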
A third consideration may be CPU usage, say if the impact on laptop battery life might be of concern to your users. G1GC will run an extra thread dedicated to de-duplicating the heap. For example, I tried this with Eclipse and found it caused an initial period of increased CPU activity after starting up (think 1 to 2 minutes), but it then settled on a smaller "in-use" heap with no obvious CPU overhead or slow-down thereafter (just eye-balling the task manager). So I imagine a certain % of a CPU core will be taken up by de-duplication (during? after?) periods of high memory churn. (Of course there may be a comparable overhead if you call String.intern everywhere, which would also run serially, but then...)
You probably don't need string de-duplication everywhere. There are probably only certain areas of code that:
- really impact long-term heap usage, and
- create a high proportion of duplicate strings
By using String.intern selectively, other parts of the code (which may create temporary or semi-temporary strings) don't pay the price.
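As a rough illustration of what "selectively" can look like, the sketch below interns only one field of a long-lived record. Everything here is hypothetical: the Customer type, the "name,country" line format, and the assumption that country values repeat heavily while names do not. Input is assumed to be well formed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example; the record type and input format are invented.
public class SelectiveInternDemo {

    /** Long-lived record: 'country' repeats a lot, 'name' rarely does. */
    static final class Customer {
        final String name;
        final String country;

        Customer(String name, String country) {
            this.name = name;
            this.country = country;
        }
    }

    /** Parses "name,country" lines, interning only the high-duplication field. */
    static List<Customer> parse(List<String> lines) {
        List<Customer> result = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            // name: mostly unique, so interning it would be wasted work
            // country: few distinct values held for a long time, so intern it
            result.add(new Customer(parts[0], parts[1].intern()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = parse(Arrays.asList("alice,NZ", "bob,NZ", "carol,FR"));
        // Both records now share one "NZ" String object.
        System.out.println(customers.get(0).country == customers.get(1).country); // true
    }
}
```

The temporary strings created while parsing (the line itself, the name) never touch the intern table, so they cost nothing extra.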
And finally, a quick plug for the Guava utility: Interner, which:
"Provides equivalent behavior to String.intern() for other immutable types"
You can also use that for Strings. Memory probably is (and should be) your top performance concern, so this probably doesn't apply often. However, when you need to squeeze every drop of speed out of some hot-spot area, my experience is that Java-based weak-reference HashMap solutions do run slightly but consistently faster than the JVM's C++ implementation of String.intern(), even after tuning the JVM options. (And a bonus: you don't need to tune the JVM options to scale to different input.)
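For a feel of what a Java-based weak-reference interner involves, here is a minimal sketch built only on the standard library. It is not Guava's implementation (with Guava you would normally just call Interners.newWeakInterner()), the class name is invented, and it makes no claim about speed: it uses one coarse lock, so measure before relying on it in a hot spot.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stdlib-only interner; a sketch, not a tuned implementation.
public class WeakInterner<T> {

    // WeakHashMap holds keys weakly. The value must be weak too: a strong
    // value pointing at the key would keep the entry alive forever.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    /** Returns the canonical instance equal to 'value', registering it if absent. */
    public synchronized T intern(T value) {
        WeakReference<T> ref = pool.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(value, new WeakReference<>(value));
            canonical = value;
        }
        return canonical;
    }

    /** Number of entries currently in the pool (may shrink after a GC). */
    public synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakInterner<String> interner = new WeakInterner<>();
        String a = new String("key");
        String b = new String("key");
        System.out.println(interner.intern(a) == interner.intern(b)); // true
    }
}
```

Because the pool is an ordinary heap object, it grows with the input and its entries become collectable once nothing else references them, which is why no JVM option tuning is involved.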