How does Ruby's max function order duplicates?
Asked Answered
B

1

8

I've been looking at the max method in Ruby's Enumerable mixin (v2.4.1).

It's a fairly straightforward method, but how it orders items when duplicates are present is a bit confusing.

For example:

x = [1,2,3,4,5,6,7,8,9]
x.max {|a, b| a%2 <=> b%2}
=> 1
10.times{|y| p x.max(y) {|a, b| a%2 <=> b%2}}
[]
[1]
[1, 7] # why is 7 the next element after 1?
[3, 1, 5] # why no more 7?
[7, 3, 1, 5] # 7 is now first
[9, 7, 3, 1, 5]
[9, 7, 3, 1, 5, 6]
[9, 7, 3, 1, 5, 4, 6]
[9, 7, 3, 1, 5, 2, 4, 6]
[9, 7, 5, 3, 1, 8, 6, 4, 2] # order has changed again (now seems more "natural")

How is 7 chosen as the second item? Why is it not chosen at all when three values are taken?

If you take even more numbers, the ordering is not consistent (though the items in the set are).

I have taken a glance at the source code, but it seems to be doing normal comparison; the ordering seen here isn't evident from that code.

Can anyone explain how this ordering is achieved? I know that the orderings above are all "valid", but how are they generated?

Bodice answered 18/9, 2017 at 19:21 Comment(9)
The block that max takes in uses 2 variables which can be used to compare the objects; you should be checking |a, b| (a%2) <=> (b%2) or something like that. Also please don't use screenshots of output - copy/paste the actual output into a code block insteadExtrabold
My guess is that the order in which it compares items is not guaranteed?Extrabold
@Extrabold probably, but that doesn't matter much here. The order show is deterministic. I want to know how the source code gives rise to this peculiar ordering.Bodice
Let us continue this discussion in chat.Bodice
Your ordering relation is really strange. Basically, all odd numbers are greater than all even numbers, but both all even and all odd numbers are equal. Since 1, 3, 5, 7, and 9 are all equal, they are all the maximum value, and therefore any arbitrary permutation is "the maximum".Decrial
@JörgWMittag yes, this example has many "duplicate maxes" that are distinguishable from one another to show Ruby's weird ordering.Bodice
What "weird" ordering? There are 5 "maximum" values in your array that are all equal. There is no ordering between them. 1, 3, 5, 7, and 9 are all equal, and they are all the maximum. So, it is legal to return any one of them as the maximum … which is precisely what happens. There is no weirdness, except your ordering relation.Decrial
@JörgWMittag ... You are missing the point. Any permutation is valid, but why is the specific one chosen? What implementation gave rise to it? The permutation is "weird" because it is hard to see how it came to be.Bodice
Maybe this should be tagged with C, since it is essentially a question about an algorithm coded in that language.Bedard
R
5

Your example can be simplified by using max_by to yield the similar result:

10.times{|y| p x.max_by(y) {|t| t%2}}

I have spent some time with the source but can't find any hole.

After I remember seeing a publication called Switch: A Deep Embedding of Queries into Ruby (dissertation by Manuel Mayr) I found the answer.

Where on page 104 you can find the answer for max_by:

... Here, the value in the input list that assumes the maximum when evaluated by the function is returned. If more than one value yields the maximum, the choice for a result among these values is arbitrary. ...


Similarly for:
sort & sort_by from the comments @ emu.c

The result is not guaranteed to be stable. When two keys are equal, the order of the corresponding elements is unpredictable.

First, Second Edit - "we need to go deeper" => I hope you will enjoy the "ride".


The short answer:
The reason for the sorting looking like it does is combination of max_by block(causes to start sorting with max values from %2 which is 1 then it continues with 0) and qsort_r (BSD quick-sort) implemented @ruby.



The long answer: All is based on the source code of ruby 2.4.2 or currently 2.5.0 (which is being developed right now).

The quick-sort algorithm can differ based on the compiler you are using. You can be using qsort_r: GNU version, BSD version (you can check the configure.ac) for more. The visual studio is using BSD version from 2012 or later.

+Tue Sep 15 12:44:32 2015  Nobuyoshi Nakada  <[email protected]>
+
+   * util.c (ruby_qsort): use BSD-style qsort_r if available.


Thu May 12 00:18:19 2016  NAKAMURA Usaku  <[email protected]>

    * win32/Makefile.sub (HAVE_QSORT_S): use qsort_s only for Visual Studio
      2012 or later, because VS2010 seems to causes a SEGV in
test/ruby/test_enum.rb.
  1. If you have GNU qsort_r but not BSD: Only internal ruby_qsort implementation is used. Check util.c for internal implementation of the quick-sort (ruby_qsort(void* base, const size_t nel, const size_t size, cmpfunc_t *cmp, void *d)) function by Tomoyuki Kawamura. @util.h

    If HAVE_GNU_QSORT_R=1 then #define ruby_qsort qsort_r:

    #ifdef HAVE_GNU_QSORT_R
    #define ruby_qsort qsort_r
    #else    void ruby_qsort(void *, const size_t, const size_t,
        int (*)(const void *, const void *, void *), void *);
    #endif
    
  2. If the BSD-style is detected: Then the below code is used (can be found at util.c). Notice how the cmp_bsd_qsort is called before ruby_qsort. The reason? Probably standardization, stack space and maybe speed (did not test it myself - would have to create benchmark, and that is quite time consuming).

The saving stack space is indicated in the BSD qsort.c source code:

    /*
    * To save stack space we sort the smaller side of the partition first
    * using recursion and eliminate tail recursion for the larger side.
    */

The BSD branch at the ruby source code:

     #if defined HAVE_BSD_QSORT_R
    typedef int (cmpfunc_t)(const void*, const void*, void*);

    struct bsd_qsort_r_args {
        cmpfunc_t *cmp;
        void *arg;
    };

    static int
    cmp_bsd_qsort(void *d, const void *a, const void *b)
    {
        const struct bsd_qsort_r_args *args = d;
        return (*args->cmp)(a, b, args->arg);
    }

    void
    ruby_qsort(void* base, const size_t nel, const size_t size, cmpfunc_t *cmp, void *d)
    {
        struct bsd_qsort_r_args args;
        args.cmp = cmp;
        args.arg = d;
        qsort_r(base, nel, size, &args, cmp_bsd_qsort);
    }

If you are using MSYS2 to compile your ruby on Windows (no more DevKit but MSYS2 for windows installer, which I'm using most of the time ) NetBSD version of qsort_r (which is there from 02-07-2012). The latest NetBSD qsort.c (revision:1.23).

Now for the real-life examples - "we need to go even deeper"

The tests will be performed on two (windows) rubies:

  • First ruby: will be based on DevKit version 2.2.2p95 (which was released on 13 Apr 2015) and does not contain the BSD qsort implementation.

  • Second ruby: will be based on MSYS2 tool-chain version ruby 2.4.2-p198 (which was released on 15th Sep 2017) and does contain patch (see above) for BSD qsort implementation.

The code:

x=[1,2,3,4,5,6,7,8,9]
10.times{|y| p x.max_by(y) {|t| t%2}}

Ruby 2.2.2p95:

The result:
[]
[5]
[7, 1]
[3, 1, 5]
[7, 3, 1, 5]
[9, 7, 3, 1, 5]
[5, 9, 1, 3, 7, 6]
[5, 1, 9, 3, 7, 6, 4]
[5, 1, 3, 7, 9, 6, 4, 2]
[9, 1, 7, 3, 5, 4, 6, 8, 2]

Ruby 2.4.2-p198:

The result:
[]
[1]
[7, 1]
[5, 3, 1]
[5, 7, 3, 1]
[5, 9, 7, 3, 1]
[5, 1, 9, 7, 3, 6]
[5, 1, 3, 9, 7, 4, 6]
[5, 1, 3, 7, 9, 2, 6, 4]
[9, 1, 3, 7, 5, 8, 4, 6, 2]

Now for different x: x=[7,9,3,4,2,6,1,8,5]

Ruby 2.2.2p95:

The result:
[]
[1]
[9, 7]
[1, 7, 3]
[5, 1, 7, 3]
[5, 1, 3, 9, 7]
[7, 5, 9, 3, 1, 2]
[7, 9, 5, 3, 1, 2, 4]
[7, 9, 3, 1, 5, 2, 4, 8]
[5, 9, 1, 3, 7, 4, 6, 8, 2]

Ruby 2.4.2-p198:

The result:
[]
[9]
[9, 7]
[3, 1, 7]
[3, 5, 1, 7]
[7, 5, 1, 3, 9]
[7, 9, 5, 1, 3, 2]
[7, 9, 3, 5, 1, 4, 2]
[7, 9, 3, 1, 5, 8, 2, 4]
[5, 9, 3, 1, 7, 2, 4, 6, 8]

Now for the same items in the source array (qsort is unstable see below): x=[1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Process it with the following code: 12.times{|y| p x.max_by(y) {|t| t%2}}

Ruby 2.2.2p95:

The result:
[]
[3]
[1, 1]
[9, 1, 7]
[3, 9, 1, 7]
[5, 3, 9, 1, 7]
[1, 5, 3, 9, 1, 7]
[5, 9, 3, 7, 1, 1, 1]
[1, 5, 9, 1, 7, 1, 3, 4]
[1, 1, 5, 9, 1, 7, 3, 4, 2]
[1, 1, 1, 5, 7, 3, 9, 4, 2, 8]
[9, 1, 7, 1, 5, 3, 1, 2, 6, 8, 4]

Ruby 2.4.2-p198:

The Result:
[]
[1]
[1, 1]
[7, 9, 1]
[7, 3, 9, 1]
[7, 5, 3, 9, 1]
[7, 1, 5, 3, 9, 1]
[1, 5, 9, 3, 7, 1, 1]
[1, 1, 5, 9, 3, 7, 1, 4]
[1, 1, 1, 5, 9, 3, 7, 2, 4]
[1, 7, 3, 1, 5, 9, 1, 2, 4, 8]
[9, 3, 1, 7, 1, 5, 1, 2, 8, 6, 4]

Now for the big question --> Now why are the results different?

The first obvious answer would be that when using GNU or BSD implementations the result would be different? Right? Well the implementations are different, but yielding (check the linked implementations for details) same results. The core of the issue is elsewhere.

The real issue here the algorithm itself. When quick-sort is used then what you get is unstable sorting (when you are comparing the two equal values their order does not remain the same). When you have your [1,2,3,4,5,6,7,8,9] which you then transform in the block to [1,0,1,0,1,0,1,0,1] with max(_by) you are sorting the array into [1,1,1,1,1,0,0,0,0]. You start with 1, but which one? Well then you get unpredictable result. (max(_by) is the reason why are getting the odd numbers first and then the even ones).

See GNU qsort comment:

Warning: If two objects compare as equal, their order after sorting is unpredictable. That is to say, the sorting is not stable. This can make a difference when the comparison considers only part of the elements. Two elements with the same sort key may differ in other respects.

Now to sort it like the engine does:

[1,2,3,4,5,6,7,8,9] -> The first that are taken in account are the odd numbers [1,3,5,7,9] and those are considered equal with max_by{|t| t%2} produces [1,1,1,1,1].

Conclusion:

Now which one to take? Well it is unpredictable in your case it was the ones you got. I'll get different ones even for the same ruby version as the underlying quick-sort algorithm is unstable by its nature.

Roselinerosella answered 3/10, 2017 at 13:57 Comment(3)
@River: I agree so I improved a little bit the answer.Roselinerosella
This is great, I was planning on doing something similar myself, but never got around to it.Bodice
@river: I'm happy that you like it. It was harder then I have expected. There are lots of macros along the way. I have encountered even some buggy comments and hacks.Roselinerosella

© 2022 - 2024 — McMap. All rights reserved.