I have a large dataset from an analytics provider.
It arrives as JSON and I parse it into a hash, but because the dataset is so large, memory usage balloons to over a gigabyte. Almost everything starts out as a string (a few values are numeric). The keys are of course duplicated many times, but many of the values are repeated as well.
So I was thinking: why not symbolize all the (non-numeric) values as well?
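To make the idea concrete, here is a rough sketch of what I have in mind. The helper name `symbolize_values` and the sample payload are made up for illustration; the real data comes from the provider and is far larger. Keys are symbolized by `JSON.parse` itself, and the helper converts the remaining string values.

```ruby
require 'json'

# Hypothetical helper: walk the parsed structure and replace every
# String value with a Symbol, leaving numbers and other values alone.
def symbolize_values(obj)
  case obj
  when Hash   then obj.transform_values { |v| symbolize_values(v) }
  when Array  then obj.map { |v| symbolize_values(v) }
  when String then obj.to_sym
  else obj
  end
end

# Made-up sample standing in for the provider's JSON.
raw = '[{"browser":"Chrome","visits":3},{"browser":"Chrome","visits":5}]'

# symbolize_names: true turns the hash keys into symbols during parsing.
rows = symbolize_values(JSON.parse(raw, symbolize_names: true))
```

With this, the two `:Chrome` values are the same object instead of two separate strings, which is where I would hope to save memory.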
I've found some discussion of potential problems, but I figure it would be nice to have a comprehensive description for Ruby, since the problems seem to depend on how the interning process is implemented (that is, what happens when you symbolize a string).
I found this question about Java, "Is it good practice to use java.lang.String.intern()?", which raises two concerns:
- The interning process can be expensive
- Interned strings are never de-allocated, resulting in a memory leak
(Except there's some contention on that last point.)
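For Ruby, one crude way I could imagine checking the second point on my own interpreter is to watch the symbol table grow. This is only a probe under my own assumptions, not a benchmark: the `leak_probe_` names are invented, and whether the count drops again after the references are released presumably depends on the Ruby version and its garbage collector.

```ruby
# Count the symbols the interpreter currently knows about.
before = Symbol.all_symbols.size

# Create 1000 symbols from strings that did not exist before.
# Keeping them in an array ensures they stay referenced for now.
kept = (1..1000).map { |i| "leak_probe_#{i}".to_sym }

after = Symbol.all_symbols.size
growth = after - before

puts "symbol table grew by #{growth} entries"

# Dropping the references and forcing a GC run would show whether
# this Ruby ever reclaims symbols created with to_sym:
#   kept = nil
#   GC.start
#   puts Symbol.all_symbols.size
```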
So, can anyone give a detailed explanation of when not to intern strings in Ruby?