Ruby equivalent of C#'s 'yield' keyword, or, creating sequences without preallocating memory

Asked 17/2, 2010 at 16:51 Answered 17/2, 2010 at 17:4

In C#, you could do something like this:

public IEnumerable<int> GetItems()
{
    for (int i = 0; i < 10000000; i++) {
        yield return i;
    }
}

This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.

Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.

Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.

EDIT

Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.

I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).

Here's a simple example of how I might use get_items:

things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }

In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.

It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to use an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.

If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without any kinds of memory errors.

So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.

I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)

Hezekiah answered 17/2, 2010 at 16:51 Comment(0)

It's supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.

Cliche example:

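fib = Enumerator.new do |y|
  i, j = 0, 1
  y.yield i            # first two Fibonacci numbers
  y.yield j
  loop do
    i, j = j, i + j    # advance the pair
    y.yield j          # suspends here until the next value is requested
  end
end

# Pull the first 100 values one at a time; the sequence itself is infinite.
100.times { puts fib.next }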
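
Applied to the rectangular-array case from the question, the same Enumerator.new technique might look like the sketch below. The method name each_cell and the sample grid are made up for illustration, and y << value is used as the push form of y.yield(value).

```ruby
# Hypothetical helper (not from the question): walks a rectangular
# array row by row and hands out one cell at a time.
def each_cell(grid)
  Enumerator.new do |y|
    grid.each do |row|
      row.each { |cell| y << cell }   # push one cell to the consumer
    end
  end
end

grid  = [[1, 2, 3], [4, 5, 6]]        # sample data, assumed for the demo
cells = each_cell(grid)               # nothing has been iterated yet

first_two = cells.first(2)            # stops after two cells
evens     = cells.select { |c| c.even? }

puts first_two.inspect
puts evens.inspect
```

Note that select and map called on a plain Enumerator still build an Array of their results. What this avoids is the intermediate flattened copy of the whole grid.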

Fro answered 17/2, 2010 at 16:56 Comment(2)

@Matthew, this looks like exactly what I want. Too bad it's Ruby 1.9 as I'm on 1.8.7 at the moment. Will look to see if I can upgrade. If you know of a pre-1.9 approach, I'd like to hear it. – Hezekiah 18/2, 2010 at 1:56

According to this article rubyinside.com/ruby-187-released-912.html the Enumerator sequence support has been back-ported to 1.8.7. Happy days. – Hezekiah 19/2, 2010 at 1:38

Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself. It would look like this:

class Integer
  def my_times
    return enum_for(:my_times) unless block_given?
    i=0
    while i<self
      yield i
      i += 1
    end
  end
end

10000.my_times # Returns an Enumerator (which is Enumerable) that lets
               # you iterate over the numbers from 0 to 10000 (exclusive)

Edit: To clarify my answer a bit:

In the above example my_times can be (and is) used without a block and it will return an Enumerable object, which will let you iterate over the numbers from 0 to n. So it is exactly equivalent to your example in C#.

This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
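
The two properties described above (the method runs only as far as needed, and it runs again on each new iteration) can be observed with a small sketch. The Counter class, its each_number method, and the produced log are hypothetical names invented for this demonstration.

```ruby
# Hypothetical class used only to observe how far the method body runs.
class Counter
  attr_reader :produced

  def initialize(limit)
    @limit    = limit
    @produced = []            # log of every value the method body generated
  end

  def each_number
    # Without a block, hand back an Enumerator over this same method.
    return enum_for(:each_number) unless block_given?
    i = 0
    while i < @limit
      @produced << i          # record that the body reached this value
      yield i
      i += 1
    end
  end
end

counter = Counter.new(1_000_000)
numbers = counter.each_number   # returns an Enumerator; the loop has not started

numbers.first(3)                # runs the body only until three items are yielded
puts counter.produced.inspect   # [0, 1, 2]

numbers.first(2)                # a second iteration restarts the method from the top
puts counter.produced.inspect   # [0, 1, 2, 0, 1]
```

Even though the limit is one million, only five values are ever generated here.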

In 1.8.7+ it has become common to define methods that yield items so that, when called without a block, they return an Enumerator which lets the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method, as I did in my example.

Amalgamation answered 17/2, 2010 at 16:59 Comment(3)

This answer requires a block. There's no concept of blocks in C#, and the yield statement in C# does something very different. Is there a means to create an arbitrary sequence as a return value from a method? The benefit of having it as an instance is that it can be manipulated, filtered, concatenated, mapped, etc... – Hezekiah 18/2, 2010 at 0:40

I've updated my question to be more explicit. I think the difference in meaning of the yield keyword between languages caused some confusion. – Hezekiah 18/2, 2010 at 1:30

@Drew: "This answer requires a block." No, it doesn't. Look at my example usage - there is no block. I can do 10000.my_times.first to get 0 (the first element of the enumerator) or 10000.my_times.to_a to get an array of the enumerator's contents. Or I could call any other Enumerable method on it. my_times (without a block) returns an Enumerable which "contains" the yielded items. This is exactly what you asked for. – Amalgamation 18/2, 2010 at 12:7

I don't have much Ruby experience, but what C# does with yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory, it's about deferring computation until it is actually needed, expressed in a way that reads like simple linear execution (rather than as the underlying iterator-with-state-saving).

A quick Google search turned up a Ruby library in beta. See if it's what you want.
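
For readers on Ruby 2.0 or later (which postdates this thread), the core library's Enumerable#lazy gives this kind of deferred, one-item-at-a-time chaining directly. The following is only a sketch under that version assumption; the naturals enumerator and the particular select/map steps are invented for illustration.

```ruby
# Infinite source: never finishes on its own, so eager select/map would hang.
naturals = Enumerator.new do |y|
  n = 0
  loop do
    y << n
    n += 1
  end
end

# With .lazy, select and map each process one item at a time, on demand.
squares_of_multiples = naturals.lazy
                               .select { |n| n % 3 == 0 }
                               .map    { |n| n * n }

# first(4) pulls just enough items through the chain and returns an Array.
puts squares_of_multiples.first(4).inspect   # [0, 9, 36, 81]
```

Nothing is computed until first asks for values, so the infinite source is safe to chain on.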

Anchusin answered 17/2, 2010 at 16:56 Comment(1)

Someone please correct me if I'm wrong, but I believe that Enumerator provides lazy execution, anyway? – Grassland 18/2, 2010 at 0:16


C# ripped the 'yield' keyword right out of Ruby; see Implementing Iterators here for more.

As for your actual problem, you presumably have an array of arrays and want to create a one-way iteration over the complete length of the list? It is perhaps worth looking at array.flatten as a starting point. If the performance is acceptable, you probably don't need to go much further.

Sympathize answered 17/2, 2010 at 17:4 Comment(6)

Not likely. C# 2.0 spec was finished December 2002. Ruby 1.9.0 was released December 2007. Moreover, if C# "ripped" it from somewhere, it was CLU, which goes back to 1975. – Fro 17/2, 2010 at 17:15

@Matthew Flaschen: Ruby has had yield since the '90s. It wasn't introduced in 1.9. However, it is quite different from the C# one, even though both are related to iteration. Ruby's yield is just sugar for calling a passed block, while the C# keyword returns an iterator itself. So, for example, the first example in that "iterators" document (threeTimes) would not be implemented using yield in C#. The C# version appears to have come from Python. – Radiology 17/2, 2010 at 17:43

Ah, okay. I misunderstood the Wikipedia article. But my main point was that it's implausible to say C# took the yield keyword from Ruby when CLU's yield predates both of them by decades. – Fro 17/2, 2010 at 17:59

Matz has explicitly stated multiple times that Ruby's iterators are straight from CLU. – Wileen 17/2, 2010 at 18:22

I just gave array flattening as an example. At any rate, I suspect that the flattened array results in a memory spike of X*Y items, which is what I'm trying to avoid. And yes, the C# yield command is completely different -- this was the first thing I checked. – Hezekiah 18/2, 2010 at 0:34

Looking at @Matthew's comment above and his answer, it may be that there's confusion over the yield method and keyword. It seems that the yield method was added in 1.9 along with the Enumerator class. – Hezekiah 18/2, 2010 at 3:38
