As @jpa's answer explained, `a`, `b` and `c` have a single "run" while `d` has a million. We can also see the effect of that by counting the comparisons:
```none
a: 1999999 comparisons, 1.000 per element
b: 1999999 comparisons, 1.000 per element
c: 1999999 comparisons, 1.000 per element
d: 10710650 comparisons, 5.355 per element
```
A single long run means we only compare each pair of neighbors once, that's all. For n elements, that's n-1 comparisons.
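As a quick sanity check (a small sketch, not part of the script below; the `count_comparisons` helper is hypothetical), we can confirm that sorting a list that is one long run costs exactly n-1 comparisons:

```python
import functools

def count_comparisons(lst):
    """Sort lst with sorted() and return how many comparisons Timsort made."""
    count = 0
    def cmp(x, y):
        nonlocal count
        count += 1                      # every < on the key objects lands here
        return (x > y) - (x < y)
    sorted(lst, key=functools.cmp_to_key(cmp))
    return count

print(count_comparisons(list(range(1000))))        # ascending run: 999
print(count_comparisons(list(range(1000))[::-1]))  # descending run: 999
```

Both the fully ascending and the fully (strictly) descending list cost n-1 = 999 comparisons: one pass to detect the run, then nothing further.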
Since `a` and `b` are ascending, they're already sorted, and nothing further is done. Since `c` is descending, it simply gets reversed, and then nothing further is done.

For `d`, the two-element runs first get combined into runs of 32 to 64 elements using binary insertion sort. Then these longer runs get merged again and again into larger and larger runs until the whole list is sorted.
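The run detection itself can be sketched as follows (a simplified stand-in for CPython's `count_run`, ignoring the minrun extension: ascending runs allow equal neighbors, descending runs must be strictly decreasing):

```python
def count_runs(lst):
    """Count the maximal runs Timsort's run detector would find."""
    runs = 0
    i = 0
    n = len(lst)
    while i < n:
        runs += 1
        i += 1
        if i == n:
            break
        if lst[i - 1] <= lst[i]:
            # ascending run: extend while non-decreasing
            while i < n and lst[i - 1] <= lst[i]:
                i += 1
        else:
            # descending run: extend while strictly decreasing
            while i < n and lst[i - 1] > lst[i]:
                i += 1
    return runs

n = 1_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]
print(count_runs(a), count_runs(b), count_runs(c), count_runs(d))
# a, b, c each form one run; d forms n/2 runs of length 2
```

In `d`, each equal pair like `10, 10` starts a two-element ascending run, and the following `10, 9` boundary ends it, so the detector finds half a million runs.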
I tried various values of n. The number of comparisons for `d` hovered around 5 per element, rising to ~5.3 as n approached a power of 2 and falling back to ~4.7 just after it. In general, `d` needs only O(n) comparisons. It doesn't need n log n comparisons, because the merging uses "galloping" (an exponential search), which in this case needs only logarithmically many comparisons per merge. So unlike ordinary merge sort in the general case, the recurrence relation for the number of comparisons here isn't T(n) = 2T(n/2) + n but T(n) = 2T(n/2) + log(n), which is O(n).
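We can check that claim numerically (a sketch of the recurrence only, not of actual Timsort): evaluating T(n) = 2T(n/2) + log2(n) for powers of two shows T(n)/n converging to a constant (2, with log base 2), i.e. T(n) = O(n):

```python
from math import log2

def T(n):
    """Recurrence T(n) = 2*T(n/2) + log2(n), with T(1) = 0."""
    if n <= 1:
        return 0
    return 2 * T(n // 2) + log2(n)

for k in (5, 10, 15, 20):
    n = 2 ** k
    print(f'T({n}) / {n} = {T(n) / n:.5f}')  # approaches 2 as n grows
```

Unrolling the recurrence gives T(2^k)/2^k = sum of j/2^j for j = 1..k, which converges to 2, so the comparison count is linear.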
The runtime, however, isn't O(n), as there are still O(n log n) moves of elements. But moves are low-level and very fast compared with the much slower O(n) comparisons. If you measure times for various n, you might very well conclude it's O(n) time.
The script for counting (Attempt This Online!):
```python
n = 2_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]

class Int:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for name in 'abcd':
    lst = globals()[name]
    comparisons = 0
    sorted(map(Int, lst))
    print(f'{name}: {comparisons} comparisons, {comparisons / len(lst):5.3f} per element')
```