How does Ruby Enumerator chaining work exactly?

Asked 3/6, 2021 at 15:22 Answered 3/6, 2021 at 17:54

Consider the following code:

[1,2,3].map.with_index { |x, i| x * i }
# => [0,2,6]

How does this work exactly?

My mental model of map is that it iterates and applies a function to each element. Is with_index somehow passing a function to the enumerator [1,2,3].map, and if so, what would that function be?

This SO thread shows how enumerators pass data through, but doesn't answer the question. Indeed, if you replace map with each then the behaviour is different:

[1,2,3].each.with_index { |x, i| x * i }
# => [1,2,3]

map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?

Reclinate answered 3/6, 2021 at 15:22 Comment(1)

I associate the words "Enumerator chaining" with code like c = [1,2,3].chain([4,5,6]). "Method chaining" is the more usual term. – Vansickle 3/6, 2021 at 18:59

Todd's answer is excellent, but I feel like seeing some more Ruby code might be beneficial. Specifically, let's try to write each and map on Array ourselves.

I won't use any Enumerable or Enumerator methods directly, so we can see how it's all working under the hood. (I'll still use for loops, and those technically call #each under the hood, but that's only cheating a little.)

First, there's each. each is easy. It iterates over the array and applies a function to each element, before returning the original array.

def my_each(arr, &block)
  for i in 0..arr.length-1
    block[arr[i]]
  end
  arr
end

Simple enough. Now what if we don't pass a block? Let's change it up a bit to support that. We effectively want to delay the act of doing the each to allow the Enumerator to do its thing.

def my_each(arr, &block)
  if block
    for i in 0..arr.length-1
      block[arr[i]]
    end
    arr
  else
    Enumerator.new do |y|
      my_each(arr) { |*x| y.yield(*x) }
    end
  end
end

So if we don't pass a block, we make an Enumerator that, when consumed, calls my_each, using the enumerator's yielder object as the block. The y object is a funny thing, but you can just think of it as basically being the block you'll eventually pass in. So, in

my_each([1, 2, 3]).with_index { |x, i| x * i }

think of y as being like the { |x, i| x * i } bit. It's a bit more complicated than that, but that's the idea.
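To make the "y is basically the block" idea a bit more concrete, here is a small sketch (the names seen, enum and result are just for illustration and are not part of the code above). It assumes nothing beyond plain Enumerator.new, and it records what y.yield hands back, which appears to be whatever the block you eventually pass in returns.

```ruby
# Illustrative names only: `seen`, `enum`, and `result` are not from the answer.
seen = []

enum = Enumerator.new do |y|
  # y.yield passes a value to the eventual block and returns that block's result.
  seen << y.yield(10)
  seen << y.yield(20)
  :done
end

# Consuming the enumerator with a plain block.
result = enum.each { |x| x + 1 }
p seen    # => [11, 21]
p result  # => :done (the value of the Enumerator.new block itself)

# with_index supplies the index as a second block argument.
seen.clear
enum.with_index { |x, i| x * i }
p seen    # => [0, 20]
```

So the enumerator body gets to see the result of the block for each element, and the value of the body as a whole is what the consuming call returns.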

Incidentally, on Ruby 2.7 and later, the Enumerator::Yielder object got its own #to_proc, so if you're on a recent Ruby version, you can just do

Enumerator.new do |y|
  my_each(arr, &y)
end

rather than

Enumerator.new do |y|
  my_each(arr) { |*x| y.yield(*x) }
end
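As a quick sanity check that the two spellings behave the same, here is a sketch with two hypothetical helpers, my_each_explicit and my_each_to_proc (names invented for this comparison). The second one assumes Ruby 2.7 or later, since it relies on Enumerator::Yielder#to_proc.

```ruby
# Hypothetical helper: forwards to the yielder with an explicit block.
def my_each_explicit(arr, &block)
  if block
    for i in 0..arr.length-1
      block[arr[i]]
    end
    arr
  else
    Enumerator.new do |y|
      my_each_explicit(arr) { |*x| y.yield(*x) }
    end
  end
end

# Hypothetical helper: passes the yielder itself as the block (Ruby 2.7+ assumed).
def my_each_to_proc(arr, &block)
  if block
    for i in 0..arr.length-1
      block[arr[i]]
    end
    arr
  else
    Enumerator.new do |y|
      my_each_to_proc(arr, &y)
    end
  end
end

a = my_each_explicit([1, 2, 3]).with_index { |x, i| x * i }
b = my_each_to_proc([1, 2, 3]).with_index { |x, i| x * i }
p a # => [1, 2, 3]
p b # => [1, 2, 3]
```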

Now let's extend this approach to map. Writing map with a block is easy. It's just like each but we accumulate the results.

def my_map(arr, &block)
  result = []
  for i in 0..arr.length-1
    result << block[arr[i]]
  end
  result
end

Simple enough. Now what if we don't pass a block? Let's do the exact same thing we did for my_each. That is, we're just going to make an Enumerator and, inside that Enumerator, we call my_map.

def my_map(arr, &block)
  if block
    result = []
    for i in 0..arr.length-1
      result << block[arr[i]]
    end
    result
  else
    Enumerator.new do |y|
      my_map(arr) { |*x| y.yield(*x) }
    end
  end
end

Now, the Enumerator knows that, whenever it eventually gets a block, it's going to use my_map on that block at the end. We can see that these two functions actually behave, on arrays, like map and each do:

my_each([1, 2, 3]).with_index { |x, i| x * i } # => [1, 2, 3]
my_map([1, 2, 3]).with_index { |x, i| x * i }  # => [0, 2, 6]

So your intuition was spot on:

map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?

That's exactly what it does. map creates an Enumerator whose block knows to call map at the end, whereas each does the same but with each. Of course, in reality, all of this is implemented in C for efficiency and bootstrapping reasons, but the fundamental idea is still there.
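The same pattern seems to carry over to other block-taking methods. As a sketch only, here is a hypothetical my_filter written in the same style (it is not how Ruby implements filter, just the same trick applied once more). The block's result now decides whether an element is kept.

```ruby
# Hypothetical my_filter, following the same shape as my_each and my_map.
def my_filter(arr, &block)
  if block
    result = []
    for i in 0..arr.length-1
      # Keep the element only if the block returns a truthy value.
      result << arr[i] if block[arr[i]]
    end
    result
  else
    # No block yet: defer the filtering until the Enumerator is consumed.
    Enumerator.new do |y|
      my_filter(arr) { |*x| y.yield(*x) }
    end
  end
end

p my_filter([1, 2, 3]).with_index { |x, i| i == 2 } # => [3]
```

Here with_index only adds the index. The filtering itself still happens inside my_filter, which sees the block's true/false result through y.yield.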

Riddle answered 3/6, 2021 at 17:54 Comment(0)

Using Array#map without a block simply returns an Enumerator. Each element is then fed through Enumerator#with_index, and the results of the block are returned as a collection. It's not complicated, and is similar to (but perhaps cleaner than) the following code. Using Ruby 3.0.1:

results = []
[1, 2, 3].each_with_index { results << _1 * _2 }
results
#=> [0, 2, 6]

Using Array#each doesn't return a collection from the block. It just returns self or another enumerator, so the expected behavior is different by design.
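A short sketch of that difference, using only the built-in Array methods (the variable names are just for illustration):

```ruby
arr = [1, 2, 3]

# Without a block, both calls hand back an Enumerator.
map_enum  = arr.map
each_enum = arr.each
p map_enum.class  # => Enumerator
p each_enum.class # => Enumerator

# The final return value comes from the method the Enumerator wraps.
mapped = map_enum.with_index { |x, i| x * i }
eached = each_enum.with_index { |x, i| x * i }
p mapped # => [0, 2, 6]
p eached # => [1, 2, 3]
```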

Gellman answered 3/6, 2021 at 15:55 Comment(5)

Array#each can't return anything anyway as it's not in the last position in the chain, right? Sorry I still don't understand how map and each can influence the overall expression since the last item is with_index. Another way to put it is to consider [1,2,3].filter.with_index {|x, i| i == 2} Somehow, the block now makes filtering. How is the block connected to filter in the expression? – Reclinate 3/6, 2021 at 16:15

@JonathanBoccara You're thinking about this wrong. Almost everything in Ruby forms or returns an expression, and that expression returns a value. It just so happens that some values are collections, literals, or enumerators. You need to look at the definitions of each method, and see what they return (which is sometimes dependent on form, e.g. block vs. argument) to understand how they differ. The differences are documented, though. – Gellman 3/6, 2021 at 16:27

Yeah I definitely have a wrong mental model 😅 . So IIUC with_index sort of calls the iteration behaviour of map with a function that yields the block with_index receives? And if you consider [1,2,3].filter.map { |x| x * 2 }, the behaviour of the filter enumerator is never invoked? – Reclinate 3/6, 2021 at 16:51

My guess in that case is that map is an Enumerable method, so you're effectively ignoring the Enumerator aspect of the object and just performing a calculation on the underlying data. with_index is an Enumerator method, which understands how to interact with the original enumerator block. If you do [1, 2, 3].filter.with_index { ... } then the filter still gets run. Although I've frankly never seen that pattern of .filter.map { ... } used before, so I'm not completely sure what it means. There may be some subtlety I'm missing. – Riddle 3/6, 2021 at 18:2

@JonathanBoccara Everyone's brain works a little differently. :) I think your last sentence is spot on: [1,2,3].filter.class #=> Enumerator, so essentially the filter method in [1,2,3].filter.map { |x| x * 2 } just yields the original Array as an Enumerator. Since there's no block argument to #filter, all it does is pass an Enumerator object on to Enumerable#map, which then applies its own block. – Gellman 3/6, 2021 at 18:13
