Why is Symbol#to_proc slower in Ruby 1.8.7?
Relative Performance of Symbol#to_proc in Popular Ruby Implementations states that in MRI Ruby 1.8.7, Symbol#to_proc is slower than the alternative in their benchmark by 30% to 130%, but that this isn't the case in YARV Ruby 1.9.2.

Why is this the case? The creators of 1.8.7 didn't write Symbol#to_proc in pure Ruby.

Also, are there any gems that provide faster Symbol#to_proc performance for 1.8?

(Symbol#to_proc is starting to show up when I profile with ruby-prof, so I don't think I'm guilty of premature optimization.)

Mahican answered 28/6, 2011 at 2:46

The to_proc implementation in 1.8.7 looks like this (see object.c):

static VALUE
sym_to_proc(VALUE sym)
{
    return rb_proc_new(sym_call, (VALUE)SYM2ID(sym));
}

Whereas the 1.9.2 implementation (see string.c) looks like this:

static VALUE
sym_to_proc(VALUE sym)
{
    static VALUE sym_proc_cache = Qfalse;
    enum {SYM_PROC_CACHE_SIZE = 67};
    VALUE proc;
    long id, index;
    VALUE *aryp;

    if (!sym_proc_cache) {
        sym_proc_cache = rb_ary_tmp_new(SYM_PROC_CACHE_SIZE * 2);
        rb_gc_register_mark_object(sym_proc_cache);
        rb_ary_store(sym_proc_cache, SYM_PROC_CACHE_SIZE*2 - 1, Qnil);
    }

    id = SYM2ID(sym);
    index = (id % SYM_PROC_CACHE_SIZE) << 1;

    aryp = RARRAY_PTR(sym_proc_cache);
    if (aryp[index] == sym) {
        return aryp[index + 1];
    }
    else {
        proc = rb_proc_new(sym_call, (VALUE)id);
        aryp[index] = sym;
        aryp[index + 1] = proc;
        return proc;
    }
}

If you strip away all the busy work of initializing sym_proc_cache, then you're left with (more or less) this:

aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
    return aryp[index + 1];
}
else {
    proc = rb_proc_new(sym_call, (VALUE)id);
    aryp[index] = sym;
    aryp[index + 1] = proc;
    return proc;
}

So the real difference is that 1.9.2's to_proc caches the generated Procs, while 1.8.7 generates a brand-new one every single time you call to_proc. Any benchmark will magnify the performance difference between the two unless each iteration runs in a separate process; however, one iteration per process would mask what you're trying to measure behind interpreter start-up cost.

The guts of rb_proc_new look pretty much the same (see eval.c for 1.8.7 or proc.c for 1.9.2) but 1.9.2 might benefit slightly from any performance improvements in rb_iterate. The caching is probably the big performance difference.
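To see the gap directly, here is a rough benchmark sketch (the exact ratios depend on the interpreter; on 1.8.7 the &:to_i form pays for a fresh Proc on every map call, while on 1.9.2 the cached Proc narrows the gap):

```ruby
require 'benchmark'

# Rough sketch: Symbol#to_proc versus an explicit block.
# Both forms produce identical results; only the dispatch cost differs.
strings = ('1'..'1000').to_a

Benchmark.bm(10) do |bm|
  bm.report('&:to_i') { 100.times { strings.map(&:to_i) } }
  bm.report('block')  { 100.times { strings.map { |s| s.to_i } } }
end
```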

It is worth noting that the symbol-to-proc cache is a fixed size (67 entries; I'm not sure where 67 comes from, though it is prime and probably related to the number of operators and method names commonly used in symbol-to-proc conversions):

id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
/* ... */
if (aryp[index] == sym) {

If you use more than 67 symbols as procs or if your symbol IDs overlap (mod 67) then you won't get the full benefit of the caching.
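The bucket scheme can be sketched in plain Ruby. This is illustrative only: object_id stands in for SYM2ID, and cached_to_proc is a made-up helper name, not part of any Ruby API.

```ruby
# Illustrative pure-Ruby version of the 1.9.2 cache: 67 buckets,
# two slots per bucket (symbol, proc); colliding symbols evict
# each other, just as in the C version.
SYM_PROC_CACHE_SIZE = 67
SYM_PROC_CACHE = Array.new(SYM_PROC_CACHE_SIZE * 2)

def cached_to_proc(sym)
  index = (sym.object_id % SYM_PROC_CACHE_SIZE) * 2
  if SYM_PROC_CACHE[index] == sym
    SYM_PROC_CACHE[index + 1]          # cache hit: reuse the stored Proc
  else
    pr = sym.to_proc                   # cache miss: build and store
    SYM_PROC_CACHE[index] = sym
    SYM_PROC_CACHE[index + 1] = pr
    pr
  end
end

p cached_to_proc(:upcase).call("abc")                      # prints "ABC"
p cached_to_proc(:upcase).equal?(cached_to_proc(:upcase))  # prints true
```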

The Rails and 1.9 programming style involves a lot of shorthands like:

ints = strings.collect(&:to_i)
sum  = ints.inject(0, &:+)

rather than the longer explicit block forms:

ints = strings.collect { |s| s.to_i }
sum  = ints.inject(0) { |s,i| s += i }

Given that (popular) programming style, it makes sense to trade memory for speed by caching the lookup.

You're not likely to get a faster implementation from a gem as the gem would have to replace a chunk of the core Ruby functionality. You could patch the 1.9.2 caching into your 1.8.7 source though.

Brelje answered 28/6, 2011 at 20:49
And the caching effectively works by having 67 buckets: you turn the symbol into a hash to see which bucket it'd go in, and if sym is in that bucket, it uses the proc in that bucket; otherwise it changes that bucket to store that symbol and the appropriate proc? (Mahican)
@Andrew: The cache has 67 buckets with two array entries per bucket (the first entry is the symbol and the second is the proc). And yes, the cache re-uses slots if the symbols don't match up; I'll update the answer to make that clear. (Brelje)
@Andrew: BTW, thanks for the tour of some of the Ruby internals over the past few days. (Brelje)
You're welcome. I'm glad that someone is able to answer such questions! (Mahican)
Caching played a part (even adding Ruby-based memoization improves things), but there are also other significant differences between blocks and Symbol#to_proc in 1.8.7. (Mahican)
@AndrewGrimm This related question might be of interest. (Zulazulch)

The following ordinary Ruby code:

if defined?(RUBY_ENGINE).nil? # No RUBY_ENGINE means it's MRI 1.8.7
  class Symbol
    alias_method :old_to_proc, :to_proc

    # Class variables are considered harmful, but I don't think
    # anyone will subclass Symbol
    @@proc_cache = {}
    def to_proc
      @@proc_cache[self] ||= old_to_proc
    end
  end
end

will make Symbol#to_proc on MRI 1.8.7 slightly less slow than before, but still not as fast as an ordinary block or a pre-existing proc.
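The win from memoization is easy to see in isolation. Here is the same idea with a standalone Hash-backed cache (to_cached_proc is a made-up name for illustration, not part of the patch):

```ruby
# Same memoization idea as the @@proc_cache patch above, as a
# standalone helper: one Proc per symbol, reused on every lookup.
proc_cache = {}
to_cached_proc = lambda { |sym| proc_cache[sym] ||= sym.to_proc }

first  = to_cached_proc[:to_i]
second = to_cached_proc[:to_i]
p first.equal?(second)     # prints true (the Proc is reused, not rebuilt)
p first.call("42")         # prints 42
```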

However, it'll make YARV, Rubinius and JRuby slower, hence the if around the monkeypatch.

The slowness of using Symbol#to_proc isn't solely due to MRI 1.8.7 creating a proc each time: even if you re-use an existing one, it's still slower than using a block.

Using Ruby 1.8 head

Size    Block   Pre-existing proc   New Symbol#to_proc  Old Symbol#to_proc
0       0.36    0.39                0.62                1.49
1       0.50    0.60                0.87                1.73
10      1.65    2.47                2.76                3.52
100     13.28   21.12               21.53               22.29

For the full benchmark and code, see https://gist.github.com/1053502

Mahican answered 29/6, 2011 at 11:26
So you're saying that 1.9.2's to_proc is also ~2x faster than 1.8's? Nice. (Brelje)
@mu is too short: In 1.9.2, to_proc is even faster than an ordinary block if it's executed enough times. How did you compare 1.9.2 versus 1.8, and how did you account for confounding factors, such as parts of the benchmark unrelated to what's being measured? (Mahican)
I figured you had isolated the source of the speed difference, so I was just offering an explanation of where it probably came from. I assumed wrong, it seems. The speed difference between Procs and blocks is interesting and unexpected. (Brelje)
FYI, the RUBY_ENGINE check doesn't work on REE. You'll have to do something like ([1,8,7] ... [1,8,8]).include? RUBY_VERSION.split('.').map{|e|e.to_i} instead. (Zulazulch)
Ugh, the version range check raises an error in 1.9. This should work: numeric_version = RUBY_VERSION.split('.').map{|e|e.to_i}; (numeric_version <=> [1,8,7]) >= 0 and (numeric_version <=> [1,8,8]) < 0 (Zulazulch)

In addition to not caching procs, 1.8.7 also creates (approximately) one array each time a proc is called. I suspect it's because the generated proc creates an array to accept the arguments; this happens even with an empty proc that takes no arguments.

Here's a script to demonstrate the 1.8.7 behavior. Only the :diff value is significant here, which shows the increase in array count.

# this should really be called count_arrays
def count_objects(&block)
  GC.disable
  ct1 = ct2 = 0
  ObjectSpace.each_object(Array) { ct1 += 1 }
  yield
  ObjectSpace.each_object(Array) { ct2 += 1 }
  {:count1 => ct1, :count2 => ct2, :diff => ct2-ct1}
ensure
  GC.enable
end

to_i = :to_i.to_proc
range = 1..1000

puts "map(&to_i)"
p count_objects {
  range.map(&to_i)
}
puts "map {|e| to_i[e] }"
p count_objects {
  range.map {|e| to_i[e] }
}
puts "map {|e| e.to_i }"
p count_objects {
  range.map {|e| e.to_i }
}

Sample output:

map(&to_i)
{:count1=>6, :count2=>1007, :diff=>1001}
map {|e| to_i[e] }
{:count1=>1008, :count2=>2009, :diff=>1001}
map {|e| e.to_i }
{:count1=>2009, :count2=>2010, :diff=>1}

It seems that merely calling a proc creates an array on every iteration, whereas a literal block creates one only once.

But multi-arg blocks may still suffer from the problem:

plus = :+.to_proc
puts "inject(&plus)"
p count_objects {
  range.inject(&plus)
}
puts "inject{|sum, e| plus.call(sum, e) }"
p count_objects {
  range.inject{|sum, e| plus.call(sum, e) }
}
puts "inject{|sum, e| sum + e }"
p count_objects {
  range.inject{|sum, e| sum + e }
}

Sample output. Note how we incur a double penalty in case #2, because we use a multi-arg block, and also call the proc.

inject(&plus)
{:count1=>2010, :count2=>3009, :diff=>999}
inject{|sum, e| plus.call(sum, e) }
{:count1=>3009, :count2=>5007, :diff=>1998}
inject{|sum, e| sum + e }
{:count1=>5007, :count2=>6006, :diff=>999}
Zulazulch answered 10/10, 2013 at 18:58
