How does Ruby's Enumerator object iterate externally over an internal iterator?
S

1

7

According to Ruby's documentation, an Enumerator uses the each method to enumerate if no target method is passed to to_enum or enum_for. Take the following object, with an each method defined directly on it, and its enumerator as an example:

o = Object.new
def o.each
  yield 1
  yield 2
  yield 3
end
e = o.to_enum

loop do
  puts e.next
end

Given that the Enumerator object uses the each method to answer when next is called, what do the calls to each look like every time next is called? Does the Enumerator class pre-load all the contents of o.each and create a local copy for enumeration? Or is there some sort of Ruby magic that suspends execution at each yield statement until next is called on the enumerator?

If an internal copy is made, is it a deep copy? What about I/O objects that could be used for external enumeration?

I'm using Ruby 1.9.2.

Samba answered 15/6, 2012 at 19:45 Comment(2)
Just so you know, you use backticks ( ` ) around text to do inline code formatting :)Kiarakibble
muchas gracias! Will keep that in mind for next time.Samba
A
13

It's not exactly magic, but it is beautiful nonetheless. Instead of making a copy of some sort, a Fiber is used to execute each on the target enumerable object. Every time each produces an object, the Fiber yields it and thereby returns control to the point where the Fiber was resumed, which is the call to next. The following call to next resumes the Fiber right after that yield, so each simply continues where it left off.
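To see the suspension in action, here is a small sketch using the built-in Enumerator. The Tracing class and its log are made up for illustration; they record how far each has run whenever next returns.

```ruby
# Hypothetical helper: an each method that records its own progress.
class Tracing
  attr_reader :log

  def initialize
    @log = []
  end

  def each
    @log << "before 1"
    yield 1
    @log << "before 2"
    yield 2
    @log << "done"
  end
end

t = Tracing.new
e = t.to_enum

p t.log   # => []                        (each has not started yet)
p e.next  # => 1
p t.log   # => ["before 1"]              (each is paused at its first yield)
p e.next  # => 2
p t.log   # => ["before 1", "before 2"]  (each is paused at its second yield)
```

A third call to next lets each run to its end, which appends "done" to the log and raises StopIteration.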

It's beautiful because this approach doesn't require a copy or any other form of "backup" of the enumerable object, such as the array you would get by calling #to_a on the enumerable. The cooperative scheduling with fibers makes it possible to switch contexts exactly when needed, without keeping any form of lookahead.
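As for I/O objects, the same reasoning suggests that nothing is read ahead or copied. A minimal sketch, using a StringIO as a stand-in for a real I/O object (the sample data is made up):

```ruby
require "stringio"

io = StringIO.new("a\nb\nc\n")
e = io.each_line   # no block given, so an Enumerator is returned

first  = e.next    # runs each_line only up to its first yield
second = io.gets   # a direct read continues where the enumerator stopped
third  = e.next    # the enumerator resumes after the line consumed by gets

p [first, second, third]  # => ["a\n", "b\n", "c\n"]
```

The enumerator and the direct gets call share the one read position of the underlying object, which would not be the case if the enumerator worked on a pre-loaded copy.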

It all happens in the C code for Enumerator. A pure Ruby version that would show roughly the same behavior could look like this:

class MyEnumerator
  def initialize(enumerable)
    @fiber = Fiber.new do
      enumerable.each { |item| Fiber.yield item }
    end
  end

  def next
    @fiber.resume || raise(StopIteration.new("iteration reached an end"))
  end
end

class MyEnumerable
  def each
    yield 1
    yield 2
    yield 3
  end
end

e = MyEnumerator.new(MyEnumerable.new)
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # => StopIteration is raised
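One caveat with the sketch above: because it tests the result of resume with ||, it treats a yielded nil or false as the end of the iteration, and calling next again after the end resumes a dead fiber. A slightly more defensive variant could track completion explicitly. This is still only an illustration, not what the C implementation does, and the name SketchEnumerator is made up:

```ruby
class SketchEnumerator
  def initialize(enumerable)
    @done = false
    @fiber = Fiber.new do
      enumerable.each { |item| Fiber.yield item }
      @done = true # each has returned, so there is nothing left to hand out
    end
  end

  def next
    # Never resume a fiber that has already finished.
    raise StopIteration, "iteration reached an end" if @done
    item = @fiber.resume
    # If each returned during this resume, item is not a real element.
    raise StopIteration, "iteration reached an end" if @done
    item
  end
end

e = SketchEnumerator.new([nil, false, 3])
p e.next # => nil
p e.next # => false
p e.next # => 3
```

Any further call to e.next raises StopIteration, no matter how often it is repeated.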
Adman answered 15/6, 2012 at 21:14 Comment(6)
Nice! I'll read up on fibers in more detail, but are these green threads created by the language to return control to the caller? In other words how is control returned?Samba
@SalmanParacha Wikipedia does a better job at explaining the difference than I ever could. If you want the gory details, the implementation is in cont.c.Adman
@SalmanParacha: Fibers are Ruby's name for coroutines. A coroutine is a generalization of a subroutine: a subroutine always starts to run from the beginning, and it always returns back to the caller. A coroutine runs from the point where it was last stopped and it can "return" (or more precisely transfer control) to any other coroutine, not just the one where it came from.Erkan
@Casper: Think I know what you mean :) The implementation has an effective "lookahead" of one item.Adman
@Adman Yes..right. Sorry I deleted the comment though. I realized I tested it only on the ruby 1.8 native implementation, and have not tested it actually with ruby 1.9 and Fiber. But I would assume it works the same there.Hodgkin
to_a, map, and all the other methods for enumerators are all also defined in terms of each right? From this answer, what I got is that next uses each internally just like the other iterator methods but uses a Fiber to maintain state. Is this correct?Guth
