How does Timsort perform on data in descending order?
Asked Answered
F

1

7

From the:

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

and:

http://en.wikipedia.org/wiki/Timsort

I see, that Timsort has some optimizations when a0 > a1 > a2 > ..., but what about next array:

10000,10000,9999,9999,9998,9998,....,9,9,8,8,7,7,6,6,5,5,4,4,3,3,2,2,1,1,0,0

What is a time efficiency for such array?

(integers were used to simplify an example, stable sorting is required) I have made some measurements and, seems, such arrays are not "good" cases for Timsort.

actually, TimSort in JDK http://cr.openjdk.java.net/~martin/webrevs/openjdk7/timsort/raw_files/new/src/share/classes/java/util/TimSort.java has a method "countRunAndMakeAscending"

@SuppressWarnings("unchecked")
private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find end of run, and reverse range if descending
    if (((Comparable) a[runHi++]).compareTo(a[lo]) < 0) { // Descending
        while(runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) < 0)
            runHi++;
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) >= 0)
            runHi++;
    }

    return runHi - lo;
}

why not to implement it in another way:

private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
    int runHi = lo;
    int lastEqual = lo;
    int ascending = 0;
    while (++runHi < hi) {
      int c = ((Comparable) a[runHi+1]).compareTo(a[runHi]);
      if (ascending == 0) {
        if (c != 0) {
          if (c > 0) {
            ascending = 1;
          } else {
            ascending = -1;
            reverseRange(a, lastEqual, runHi);
            lastEqual = runHi;
          }
        }
      } else if (ascending == 1) {
        if (c < 0) {
          return runHi - lo;
        }
      } else {
        if (c > 0) {
          reverseRange(a, lastEqual, runHi);
          reverseRange(a, lo, runHi);
          return runHi - lo;
        } else if (c < 0) {
          reverseRange(a, lastEqual, runHi);
          lastEqual = runHi;
        }
      }
    }
    if (ascending == -1) {
      reverseRange(a, lastEqual, runHi);
      reverseRange(a, lo, runHi);
    }
    return runHi - lo;
}

so it will work fine with non ascending order?

Franklynfrankness answered 1/12, 2013 at 11:39 Comment(0)
A
2

Yes.

Basically it decided that "ascending" really meant "not descending", without any loss of generality -- in case you have eg [5,5,4 3] it will just break it into [5,5] (ascending) and then [4,3] (descending) on the next call.

As to why, I guess it's for simplicity: simply try counting the number of invocations of reverseRange() in your code and in the original and you'll get the idea (I got it by noticing how long it took me to understand one version compared to the other :)

edit: WRONG WRONG WRONG! As Oscar Smith pointed out, the reason is to make timsort a stable sorting algorithm. If anyone knows how to transfer the undeserved bounty...

Alterant answered 9/12, 2013 at 16:0 Comment(2)
I'm pretty sure that the reason you gave is actually wrong. The actual reason is to make sure that the sort is stable (that equal elements are returned in the order that they are given. This is useful if you want to do things like sort a tuple by the first element. if you have [(1,2),(2,0),(1,3)], it would be bad if the sorted result was [(1,3),(1,2),(2,0)]Kavanagh
One of the things I plan to do when I have copious free time (ha ha) is to see how big a performance increase you can get by implementing a non-stable timsort. I'd expect a few % in some cases (and O(n) vs O(nlog(n) in some really bad ones)Kavanagh

© 2022 - 2024 — McMap. All rights reserved.