Since Java's default string interning has received a lot of bad press, I am looking for an alternative.
Can you suggest an API which is a good alternative to Java string interning? My application uses Java 6. My requirement is mainly to avoid duplicate strings via interning.
Regarding the bad press:
- String.intern() is implemented via a native method, and the native implementation uses a fixed-size table of roughly 1k entries, so it scales very poorly for a large number of strings.
- Java 6 stores interned strings in PermGen, so they are not GC'd and can lead to PermGen out-of-memory errors. I know this is fixed in Java 7, but I can't upgrade to Java 7.
Why do I need to use interning?
- My application is a server app with a heap size of 10-20 GB, depending on the deployment.
- During profiling we found that hundreds of thousands of strings are duplicates, and we can significantly reduce memory usage by not storing the duplicates.
- Memory has been a bottleneck for us, so we are targeting it deliberately rather than doing premature optimization.
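To make the requirement concrete, here is a minimal sketch of the behaviour I am after: a heap-based pool that returns one canonical instance per distinct string value. The class name `HeapStringInterner` and its methods are hypothetical, made up for illustration only, and the sketch assumes it is acceptable for the pool to hold strong references (so pooled strings are never released until the pool itself is dropped). It uses only `java.util.concurrent` classes that are available in Java 6.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of the JDK: a heap-based string pool.
// Assumption: strong references are acceptable, so entries are never evicted.
final class HeapStringInterner {
    // Each distinct string value maps to its own canonical instance.
    private final ConcurrentMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    // Returns the canonical instance equal to s, registering s if it is new.
    public String intern(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap does not accept null keys
        }
        // putIfAbsent returns the previously stored value, or null if s was added.
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct strings currently held by the pool.
    public int size() {
        return pool.size();
    }
}
```

Usage would be along the lines of `name = interner.intern(name);` at the points where the duplicate strings are created. Because this pool lives on the normal heap it avoids PermGen, but in this form it grows without bound, so I would also welcome suggestions that let unused entries be collected.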