When not to use to_sym in Ruby?

2

2

I have a large dataset from an analytics provider.

It arrives in JSON and I parse it into a hash, but due to the size of the set I'm ballooning to over a gig in memory usage. Almost everything starts as strings (a few values are numerical), and while of course the keys are duplicated many times, many of the values are repeated as well.

So I was thinking, why not symbolize all the (non-numerical) values, as well?

I've found some discussion of potential problems, but I figure it would be nice to have a comprehensive description for Ruby, since the problems seem to depend on how the interning process (what happens when you symbolize a string) is implemented.
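For readers unfamiliar with the term, here is a small sketch of what interning means in practice. Converting equal strings with `to_sym` yields the very same Symbol object, while equal strings built separately are distinct objects. The variable names are only for illustration.

```ruby
# Two symbols created from equal strings are one and the same object.
a = "country".to_sym
b = "country".to_sym
same_symbol = a.equal?(b)    # equal? compares object identity

# Two strings with equal content are still separate objects on the heap.
s1 = String.new("country")
s2 = String.new("country")
same_string = s1.equal?(s2)

puts same_symbol  # true
puts same_string  # false
```

This is why symbolizing a heavily repeated value can save memory: every repetition points at a single object instead of holding its own copy.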

I found this question about Java: "Is it good practice to use java.lang.String.intern()?"

  • The interning process can be expensive
  • Interned strings are never de-allocated, resulting in a memory leak

(Except there's some contention on that last point.)

So, can anyone give a detailed explanation of when not to intern strings in Ruby?

Ottava answered 29/4, 2013 at 22:42 Comment(0)
6
  • When the list of things in question is an open set (i.e., dynamic, with no fixed inventory), you should not convert them into symbols. Each symbol created will not be garbage collected (at least in Ruby versions before 2.2, which added garbage collection for dynamically created symbols), and will cause a memory leak.
  • When the list of things in question is a closed set (i.e., static, with a fixed inventory), you had better convert them into symbols. Each symbol will be created only once and then reused, which saves memory.
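As a rough illustration of the distinction, the sketch below symbolizes a value only when it belongs to a known, fixed inventory and leaves everything else as a string. The `KNOWN_EVENT_TYPES` list and the `intern_if_known` helper are hypothetical names invented for this example, not part of any library.

```ruby
require "set"

# Closed set: a fixed inventory of values known ahead of time (hypothetical).
KNOWN_EVENT_TYPES = Set["click", "view", "purchase"].freeze

# Symbolize only members of the closed set; open-set values such as
# user IDs or free-form text stay as ordinary strings.
def intern_if_known(value)
  KNOWN_EVENT_TYPES.include?(value) ? value.to_sym : value
end

p intern_if_known("click")       # :click
p intern_if_known("user-81723")  # "user-81723"
```

With a guard like this, the number of symbols created is bounded by the size of the inventory, no matter how large the dataset is.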
Dryer answered 29/4, 2013 at 22:57 Comment(2)
Can you provide some examples of an open set and a closed set? - Acclaim
I used the terminology in the linguistic sense. Added an explanation. - Dryer
2

The interning process can be expensive

There is always a tradeoff between memory and computing power, so try some of the best practices out there and benchmark to figure out what's right for you. A few suggestions I'd like to mention:

  • Symbols are an excellent choice for hash keys

    {name: "my name"}
    
  • Freeze strings to save memory, and try to keep a small string pool

    person[:country] = "USA".freeze
    
  • Have fun with Ruby GC tuning.
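The first two suggestions can be combined when parsing JSON. The following is a minimal sketch, assuming Ruby 2.5 or later: `JSON.parse` with `symbolize_names: true` turns the keys into symbols, and `String#-@` (the unary minus) returns a frozen, deduplicated copy of each string value. The payload in `raw` is made up for illustration.

```ruby
require "json"

# Hypothetical payload with repeated keys and repeated string values.
raw = '[{"country":"USA","visits":3},{"country":"USA","visits":5}]'

# Keys become symbols, so each distinct key is stored only once.
rows = JSON.parse(raw, symbolize_names: true)

# Replace each string value with its frozen, deduplicated version.
# Numeric values are left untouched.
rows.each do |row|
  row.transform_values! { |value| value.is_a?(String) ? -value : value }
end

# Both rows now share a single "USA" string object.
same_object = rows[0][:country].equal?(rows[1][:country])
puts same_object  # true
```

Unlike symbolizing the values, this keeps them as strings, so code that calls string methods on them keeps working, and deduplicated strings can be garbage collected once nothing references them. Whether it helps enough for a given dataset is something to measure rather than assume.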

Interned strings are never de-allocated, resulting in a memory leak

Rotogravure answered 22/12, 2018 at 20:3 Comment(0)
