When does TimSort complain about broken comparator?
Asked Answered
K

2

5

Java 7 changed the sorting algorithm such that it throws an

java.lang.IllegalArgumentException: "Comparison method violates its general contract!"

in some cases when the used comparator is buggy. Is it possible to tell what kind of bug in the comparator causes this? In my experiments it did not matter if x != x , it also did not matter if x < y and y < z but z < x , but it did matter if x = y and y = z but x < z for some values x, y, z. Is this generally so?

(If there were a general rule to this, it might be easier to look for the bug in the comparator. But of course it is better to fix all bugs. :-) )

In particular, the following two comparators did not make TimSort complain:

    final Random rnd = new Random(52);

    Comparator<Integer> brokenButNoProblem1 = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (o1 < o2) {
                return Compare.LESSER;
            } else if (o1 > o2) {
                return Compare.GREATER;
            }
            return rnd.nextBoolean() ? Compare.LESSER : Compare.GREATER;
        }
    };

    Comparator<Integer> brokenButNoProblem2 = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (o1 == o2) {
                return Compare.EQUAL;
            }
            return rnd.nextBoolean() ? Compare.LESSER : Compare.GREATER;
        }
    };

but the following comparator did make it throw up:

    Comparator<Integer> brokenAndThrowsUp = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (Math.abs(o1 - o2) < 10) {
                return Compare.EQUAL; // WRONG and does matter
            }
            return Ordering.natural().compare(o1, o2);
        }
    };

UPDATE: in some real life data we had a failure where there were no x,y,z with x = y and y = z but x < z . So It seems my guess was wrong, and it doesn't seem this specific kind failure only. Any better ideas?

Kleon answered 25/7, 2014 at 8:25 Comment(4)
This related question is about a Comparator that caused the exception because the comparator was not transitive: https://mcmap.net/q/49264/-quot-comparison-method-violates-its-general-contract-quotCowbird
And another question where the bug was that the comparator was not transitive: https://mcmap.net/q/1634140/-comparison-method-violates-its-general-contract-exceptionCowbird
And yet another: https://mcmap.net/q/1634140/-comparison-method-violates-its-general-contract-exceptionCowbird
In general, TimSort will only throw if it happens to discover that the Comparator doesn't work in the course of its normal sorting algorithm. If it doesn't happen to do the right set of comparisons to make something break, it won't throw. You shouldn't depend on the sorting algorithm as a test.Schoonover
B
7

After looking at the code of ComparableTimSort I am not quite sure. Let's analyze it. Here is the only method that throws it (there is a similar method that does the same only with exchanged roles, so analyzing one of them is enough).

private void mergeLo(int base1, int len1, int base2, int len2) {
        assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

        // Copy first run into temp array
        Object[] a = this.a; // For performance
        Object[] tmp = ensureCapacity(len1);

        int cursor1 = tmpBase; // Indexes into tmp array
        int cursor2 = base2;   // Indexes int a
        int dest = base1;      // Indexes int a
        System.arraycopy(a, base1, tmp, cursor1, len1);

        // Move first element of second run and deal with degenerate cases
        a[dest++] = a[cursor2++];
        if (--len2 == 0) {
            System.arraycopy(tmp, cursor1, a, dest, len1);
            return;
        }
        if (len1 == 1) {
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; // Last elt of run 1 to end of merge
            return;
        }

        int minGallop = this.minGallop;  // Use local variable for performance
    outer:
        while (true) {
            int count1 = 0; // Number of times in a row that first run won
            int count2 = 0; // Number of times in a row that second run won

            /*
             * Do the straightforward thing until (if ever) one run starts
             * winning consistently.
             */
// ------------------ USUAL MERGE
            do {
                assert len1 > 1 && len2 > 0;
                if (((Comparable) a[cursor2]).compareTo(tmp[cursor1]) < 0) {
                    a[dest++] = a[cursor2++];
                    count2++;
                    count1 = 0;
                    if (--len2 == 0)
                        break outer;
                } else {
                    a[dest++] = tmp[cursor1++];
                    count1++;
                    count2 = 0;
                    if (--len1 == 1)
                        break outer;
                }
            } while ((count1 | count2) < minGallop);

// ------------------ GALLOP
            /*
             * One run is winning so consistently that galloping may be a
             * huge win. So try that, and continue galloping until (if ever)
             * neither run appears to be winning consistently anymore.
             */
            do {
                assert len1 > 1 && len2 > 0;
                count1 = gallopRight((Comparable) a[cursor2], tmp, cursor1, len1, 0);
                if (count1 != 0) {
                    System.arraycopy(tmp, cursor1, a, dest, count1);
                    dest += count1;
                    cursor1 += count1;
                    len1 -= count1;
// -->>>>>>>> HERE IS WHERE GALLOPPING TOO FAR WILL TRIGGER THE EXCEPTION
                    if (len1 <= 1)  // len1 == 1 || len1 == 0
                        break outer;
                }
                a[dest++] = a[cursor2++];
                if (--len2 == 0)
                    break outer;

                count2 = gallopLeft((Comparable) tmp[cursor1], a, cursor2, len2, 0);
                if (count2 != 0) {
                    System.arraycopy(a, cursor2, a, dest, count2);
                    dest += count2;
                    cursor2 += count2;
                    len2 -= count2;
                    if (len2 == 0)
                        break outer;
                }
                a[dest++] = tmp[cursor1++];
                if (--len1 == 1)
                    break outer;
                minGallop--;
            } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
            if (minGallop < 0)
                minGallop = 0;
            minGallop += 2;  // Penalize for leaving gallop mode
        }  // End of "outer" loop
        this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field

        if (len1 == 1) {
            assert len2 > 0;
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; //  Last elt of run 1 to end of merge
        } else if (len1 == 0) {
            throw new IllegalArgumentException(
                "Comparison method violates its general contract!");
        } else {
            assert len2 == 0;
            assert len1 > 1;
            System.arraycopy(tmp, cursor1, a, dest, len1);
        }
    }

The method performs a merging of two sorted runs. It does a usual merge but starts "gallopping" once it encounters that one side starts "winning" (I.e., being always less than the other) all the time. Gallopping tries to make things faster by looking ahead more elements instead of comparing one element at a time. Since the runs should be sorted, looking ahead is fine.

You see that the exception is only throw when len1 is 0 at the end. The first observation is the following: During the usual merge, the exception can never be thrown since the loop aborts directly once len this 1. Thus, the exception can only be thrown as result of a gallop.

This already gives a strong hint that the exception behaviour is unreliable: As long as you have small data sets (so small that a generated run may never gallop, as MIN_GALLOP is 7) or the generated runs always coincidentally generate a merge that never gallops, you will never receive the exception. Thus, without further reviewing the gallopRight method, we can come to the conclusion that you cannot rely on the exception: It may never be thrown no matter how wrong your comparator is.

Blowtube answered 25/7, 2014 at 8:48 Comment(1)
+1 in my experience the exception is unreliable and whether it is thrown depends both on the comparator and on the data that it's used to sort.Francie
P
-1

From the documentation:

IllegalArgumentException - (optional) if the natural ordering of the array elements is found to violate the Comparable contract

I didn't find much on the mentioned contract, but IMHO it should represent a total order (ie the relation defined by the compareTo method has to be transitive, antisymmetric, and total). If that requirement isn't met, sort might throw an IllegalArgumentException. (I say might because failure to meet this requirement could go unnoticed.)

EDIT: added links to the properties that make a relation a total order.

Parasol answered 25/7, 2014 at 8:53 Comment(5)
Contract means that if you implement an interface (Comparable in this case) than you have to do what the documentation says. You are 'signing' a contract by implementing an interface. Comparable states very well what has to be fulfilled for total/natural ordering.Internalcombustion
Yes, I know that, but while the documentation explicitly mentions a contract for compareTo in conjunction withSet, there is no information about requirements for sort.Parasol
The OP clearly knows what a Comparator is meant to do. He is seeking specific information about what kind of bug is present when the sort fails in a specific manner.Cowbird
@Raedwald: I thought I made it clear that IMHO the requirement is to define a total order (ie. transitive, antisymmetric, and total).Parasol
@Absurdmind can you please post a specific example of a simplified code that throws the exception then provide a sample code that shows the fix ? This forums need a teacher mindset not a lawyer one. We all know what contract means and interface, most people having a challenge reproducing it, hence this thread.Dashed

© 2022 - 2024 — McMap. All rights reserved.