Ruby - Compare two Enumerators elegantly
Asked Answered
F

5

9

I've got two long streams of numbers coming from two different sources (binary data) in Ruby (1.9.2).

The two sources are encapsulated in the form of two Enumerators.

I want to check that the two streams are exactly equal.

I've come with a couple solutions, but both seem quite inelegant.

The first one simply transforms both into an array:

def equal_streams?(s1, s2)
  s1.to_a == s2.to_a
end

This works, but it is not very performant, memory-wise, specially if the streams have lots of information.

The other option is... ugh.

def equal_streams?(s1, s2)
  s1.each do |e1|
    begin
      e2 = s2.next
      return false unless e1 == e2 # Different element found
    rescue StopIteration
      return false # s2 has run out of items before s1
    end
  end

  begin
    s2.next
  rescue StopIteration
    # s1 and s2 have run out of elements at the same time; they are equal
    return true
  end

  return false

end

So, is there a simpler, more elegant way of doing this?

Fasciate answered 26/6, 2011 at 19:10 Comment(8)
Enumerable#zip would be more elegant, but will most likely not have any improvement in terms of performance.Squelch
It looks like the high-performance way to handle this in Ruby 1.9 would be to use fibers/coroutines. See infoq.com/news/2007/08/ruby-1-9-fibers.Squelch
@Steve: The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with length of input params.Viniculture
@Mladen: I know -- that's why I said it would most likely not have any improvement in performance.Squelch
@Mladen: I had a look at the C code, and I'm not sure. I've asked about it at #6488247Podium
@Andrew: Perhaps I phrased it bad, zip actually always returns an array of arrays (can be seen in the docs), which means that it will internally "uncoil" any passed Enumerable, even a lazy one (such as generator), into an array. I hadn't actually dived into YARV source. I have implemented something similar which works lazily and returns an Enumerator here, but it's not perfect fit for this question either.Viniculture
@Mladen: 1.times.zip(1.times) {|x,y| puts x, y; 42} prints out 0, 0 and returns nil.Podium
Thanks, saw the other thread, I see I was wrong about zip. Not sure how it can be used though, as it looks to me that block-form of zip can be used only in each-like context, not passed around as a new Enumerable (so it could be for example all?-ed)?Viniculture
M
10

Just a slight refactoring to your code, assuming that your streams do not include an element :eof.

def equal_streams?(s1, s2)
  loop do
    e1 = s1.next rescue :eof
    e2 = s2.next rescue :eof
    return false unless e1 == e2
    return true if e1 == :eof
  end
end

Using a keyword like loop should be faster than using a method like each.

Mcclellan answered 26/6, 2011 at 20:11 Comment(2)
I like this one very much. It didn't occur to me that I could use a "guard" value, such as the :eof symbol in your example. Thanks!Fasciate
Oh -- I had it in my head that we needed a solution enumerables that repeatedly invoke the block passed to #each. So long as we can pull, this is a great solution though. If it had to work with enumerables, then coroutines would be needed.Squelch
P
7

Comparing them one element at a time is probably the best you're going to be able to do but you can do it nicer than your "ugh" solution:

def grab_next(h, k, s)
  h[k] = s.next
rescue StopIteration
end

def equal_streams?(s1, s2)
  loop do
    vals = { }
    grab_next(vals, :s1, s1)
    grab_next(vals, :s2, s2)
    return true  if(vals.keys.length == 0)  # Both of them ran out.
    return false if(vals.keys.length == 1)  # One of them ran out early.
    return false if(vals[:s1] != vals[:s2]) # Found a mismatch.
  end
end

The tricky part is differentiating between just one stream running out and both running out. Pushing the StopIteration exception into a separate function and using the absence of a key in a hash is a fairly convenient way of doing that. Just checking vals[:s1] will cause problems if your stream contains false or nil but checking for the presence of a key solves that problem.

Percentile answered 26/6, 2011 at 20:36 Comment(4)
Thanks for your answer. I liked sawa's a bit more, but +1 for your time!Fasciate
@kikito: That's cool, I upvoted sawa's answer too. Sawa and I are using pretty much the same logic, I'm just using the absence of a hash key to signal the end rather than a sentinel value. If you know that you're never going to have the :eof symbol in your streams then there's no need for the extra machinery to avoid using a sentinel.Percentile
You're right. Mine had a slight possibility of conflict with :eof. +1 for a more general solution than mine.Mcclellan
@sawa: But :eof is probably the best choice for a sentinel and the specific case involves "streams of numbers", if the :eof symbol appears in a stream of numbers then kikito has more problems on his hands than a reserved sentinel value.Percentile
P
2

Here's a shot of doing it by creating an alternative for Enumerable#zip, which works lazily and doesn't create an entire array. It's combining my implementation of Closure's interleave and other two answers here (using sentinel value to indicate end of the Enumerable has been reached - the fact causing the problem is that next rewinds the Enumerable once it reached the end).

This solution supports multiple parameters, so you can compare n structures at once.

module Enumerable
  # this should be just a unique sentinel value (any ideas for more elegant solution?)
  END_REACHED = Object.new

  def lazy_zip *others
    sources = ([self] + others).map(&:to_enum)
    Enumerator.new do |yielder|
      loop do
        sources, values = sources.map{|s|
          [s, s.next] rescue [nil, END_REACHED]
        }.transpose
        raise StopIteration if values.all?{|v| v == END_REACHED}
        yielder.yield values.map{|v| v == END_REACHED ? nil : v}
      end
    end
  end
end

So, when you have variant of zip which works lazily and doesn't stop iteration when the first enumerable reaches the end, you can use all? or any? to actually check corresponding elements for equality.

# zip would fail here, as it would return just [[1,1],[2,2],[3,3]]:
p [1,2,3].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
#=> false

# this is ok
p [1,2,3,4].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
#=> true

# comparing more than two input streams:
p [1,2,3,4].lazy_zip([1,2,3,4],[1,2,3]).all?{|vals|
  # check for equality by checking length of the uniqued array
  vals.uniq.length == 1
}
#=> false
Paramo answered 27/6, 2011 at 7:27 Comment(2)
"which works lazily and doesn't create an entire array" - zip doesn't usually create an entire array either: see #6486607 and #6488247Podium
My comment was linking to this question. Sorry, total fail on my part!Podium
P
1

Following the discussion in the comments, here is zip-based solution, first wrapping block version of zip within an Enumerator, then using it to compare corresponding elements.

It works, but there is already mentioned edge case: if the first stream is shorter than the other, remaining elements from the other will be discarded (see the example below).

I have marked this answer as community wiki as other members could improve on it.

def zip_lazy *enums
  Enumerator.new do |yielder|
    head, *tail = enums
    head.zip(*tail) do |values|
      yielder.yield values
    end
  end
end

p zip_lazy(1..3, 1..4).all?{|l,r| l == r}
#=> true
p zip_lazy(1..3, 1..3).all?{|l,r| l == r}
#=> true
p zip_lazy(1..4, 1..3).all?{|l,r| l == r}
#=> false
Paramo answered 26/6, 2011 at 19:11 Comment(0)
S
0

Here is a 2-source example using a fiber/co-routine. It's a bit long-winded, but it's very explicit about its behavior, which is nice.

def zip_verbose(enum1, enum2)
  e2_fiber = Fiber.new do
    enum2.each{|e2| Fiber.yield true, e2 }
    Fiber.yield false, nil
  end
  e2_has_value, e2_val = true, nil
  enum1.each do |e1_val|
    e2_has_value, e2_val = e2_fiber.resume if e2_has_value
    yield [true, e1_val], [e2_has_value, e2_val]
  end
  return unless e2_has_value
  loop do
    e2_has_value, e2_val = e2_fiber.resume
    break unless e2_has_value
    yield [false, nil], [e2_has_value, e2_val]
  end
end

def zip(enum1, enum2)
  zip_verbose(enum1, enum2) {|e1, e2| yield e1[1], e2[1] }
end

def self.equal?(enum1, enum2)
  zip_verbose(enum1, enum2) do |e1,e2|
    return false unless e1 == e2
  end
  return true
end
Squelch answered 6/7, 2011 at 2:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.