Why [20, ..., 13, 14].min(2) => [13, 20]?
Asked Answered
A

2

26
[20, 32, 32, 21, 30, 25, 29, 13, 14].min(2)
# => [13, 20]

Why isn't it [13, 14]? And how do I get what I want, the two smallest elements (in linear time)?

The doc's sentence "If the n argument is given, minimum n elements are returned as an array" isn't quite clear to me, but I think it says min(2) should give me the smallest two elements. I couldn't find much about it, but this thread, which might be the origin, seems to agree with me and say that it should return the same as sort.first(n), which it doesn't:

[20, 32, 32, 21, 30, 25, 29, 13, 14].sort.first(2)
# => [13, 14]

Sorry if dumb question and sorry for the "large" example, but that's already reduced - removing one more number (other than 13 or 14) does give me [13, 14].

Alodi answered 20/8, 2015 at 15:0 Comment(13)
It's a new method in 2.2, I suspect a bug in the implementation that will need to be fixed. In fact many variations of your problematic Array do get the correct answer, including just swapping the order of the 13 and 14.Microampere
Here is the source code for min, in case someone wants to debug it: github.com/ruby/ruby/blob/…Campaign
I have the same problem.I think you are right, and I find min_by(2){|i| i} also returns the wrong result.Christogram
same problem here in the issue tracker i cant fin anything about it bugs.ruby-lang.org/issuesDunois
I don't have the time at the moment to dig in myself, but my money is on this function: github.com/ruby/ruby/blob/…Campaign
@NeilSlater Yeah, I tried quite a few variations as well, with mixed results. I'm new to Ruby... if I report this, should I post it in that feature thread or open a new issue?Alodi
I don't think it is a coincidence that 20 is the third smallest number in this instance. All the variations I can get to exhibit the bug are 8 elements long, with structure [third, x,x,x,x,x, first, second] and return [first,third]Microampere
@NeilSlater Moving the 20 to second or third place (but no further) also gives me [13, 20].Alodi
Same bug for max: [0,0,0,0,0,0,1,3,2].max(2) #=> [3, 1]Keheley
Oh, i doesn't seek the last element! Really, it's interesting bug, we all should fix it :)Considered
I just reported the bug here.Trulatrull
@Trulatrull Great, thanks. I was hoping someone would so I wouldn't have to get into that :-)Alodi
I seem to have found the cause of the bug and a fix for it. Seems to be an issue with this magic number 4 github.com/ruby/ruby/blob/v2_2_3/enum.c#L1272 causes the algorithm to skip the last number when populating dataHerb
L
8

I just posted an explanation for the bug in the Ruby Issue Tracking System:

I suppose I found the problem. Taking the first example:

[20, 32, 32, 21, 30, 25, 29, 13, 14].min(2)

This will call the function "nmin_run" in the file "enum.c", which sets "bufmax" to 4 times the number of minimums (n) we want (for the example, bufmax is 8), and then in the line 1327 will call the function "nmin_i" for each element of the original array.

In the function "nmin_i", when the buffer is full ("data->curlen == data->bufmax"), the function "nmin_filter" is called. In the example, that happens when curlen is 8, and so the buffer is [20, 32, 32, 21, 30, 25, 29, 13]. The "nmin_filter" will do a quicksort until the n smallest elements so far are on the leftmost part of the buffer, and will discard the rest of the elements, which leaves us with [20, 13] in the buffer.

And now starts the problem. At the end of "nmin_filter" the limit (apparently with the intention of storing the greatest value in the buffer) is set to the last value in the buffer (in the example, 13), which is not true. And then based on that value "nmin_i" will discard all remaining elements greater than that (in the example, discarding the 14). The buffer is then sorted and it returns:

[13, 20]

So the solution is either remove all the limit-related part, or take the last pivot as the limit.

Lapel answered 20/8, 2015 at 21:37 Comment(4)
Thanks. I had a look at the code, but it's too much for me. I'll trust you and the upvotes :-). But do I understand correctly that (for n=2) it goes through the array with a buffer of length<=8 and runs the quicksort only on that buffer, so that even if the quicksort degenerated to quadratic all the time, the overall runtime for an array with N elements would still be O(N)? And can you tell the worstcase/average runtime complexity for the general .min(n)?Alodi
Technically, it doesn't do the full quicksort of the buffer, but just the quickselect. A quickselect will be done every time the buffer (which size is 4n) is full, and will leave the n smallest elements. So, there will be a quickselect (with avg. time O(4n) and worst-case O(16n²)) every 3n elements (buffer had already n elements from last quickselect). At the end, min sorts the remaining n elements. Thus, assuming that the length of the original array is L, the avg. time will be O(4/3 L + n log n) and the worst-case O(16/3 n L + n²). Hope I didn't miss anything. :)Lapel
Thanks again, now the algorithm is quite clear to me, and I agree with your runtime analysis. Too bad about the worst case being worse than ideal, but for my fixed n=2 special case even that doesn't matter. And I did some tests with large increasing/decreasing/random arrays and medium to large n and couldn't actually make it as slow as nL would be, so I guess the algorithm somehow avoids it at least for these cases (I didn't see randomization or monotonicity checks or so, but then again I still didn't look too closely).Alodi
@StefanPochmann If performance can be an issue in your case, see my update in the other answer.Lapel
L
2

By the way, answering your question...

And how do I get what I want, the two smallest elements (in linear time)?

If this method did not exist or meanwhile it is broken, you can select the two smallest elements in linear time using Quickselect, which is basically what Ruby does in min under the hood.

Here is my direct translation from Wikipedia:

class Array
  def mymin(n)
    return self.sort if self.size <= n

    a = self.dup
    left = 0
    right = a.size - 1
    loop do
      pivot_index = left + (right - left) / 2;
      pivot_value = a[pivot_index]
      a[pivot_index], a[right] = a[right], a[pivot_index]
      store_index = left
      left.upto(right - 1).each do |i|
         if a[i] < pivot_value
           a[store_index], a[i] = a[i], a[store_index]
           store_index += 1
         end
      end
      a[right], a[store_index] = a[store_index], a[right]
      if n - 1 == store_index
        break
      elsif n - 1 < store_index
        right = store_index - 1
      else
        left = store_index + 1
      end
    end
    a.take(n).sort
  end
end

And then we try your example:

[20, 32, 32, 21, 30, 25, 29, 13, 14].mymin(2)
# => [13, 14]

Yay! We just fixed the min. Beware though, that this implementation has space complexity linear to the size of the original array, while the Ruby implementation is linear to the value n. Also, if your original array has too many duplicates, this will have a bad performance and you should look for
3-way partitioning.


If you only want the min for n = 2 and are really worried about the performance, it is possible to make an optimized version for that case with O(L) guaranteed (assuming L is the length of the array).

class Array
  def min2
    m1 = nil
    m2 = nil
    self.each do |x|
      if m1.nil? || x < m1
        m2 = m1
        m1 = x
      elsif m2.nil? || x < m2
        m2 = x
      end
    end
    [m1, m2].compact
  end
end

And use it in analogous way:

[20, 32, 32, 21, 30, 25, 29, 13, 14].min2
# => [13, 14]
Lapel answered 21/8, 2015 at 8:26 Comment(2)
Thanks, though I really just need the smallest two now, not the smallest n, so writing my own quickselect (which I've done before) is overkill. Especially a version that guarantees linear time (instead of quadratic). I was hoping I was just doing it wrong and there's a right way (hope there will be once the bug is fixed). For now maybe I'll use [...].reduce([]) { |a, n| (a << n).sort.first(2) }. Sorry, I should have made clear that I do know ways to do it myself.Alodi
@StefanPochmann Agreed! That's a good solution for a small n, as in your case. For a big n, that would become nearly quadratic time. Thumbs up for your simple solution.Lapel

© 2022 - 2024 — McMap. All rights reserved.