Why is builtin sorted() slower for a list containing descending numbers if each number appears twice consecutively?

I sorted four similar lists. List d consistently takes much longer than the others, which all take about the same time:

a:  33.5 ms
b:  33.4 ms
c:  36.4 ms
d: 110.9 ms

Why is that?

Test script (Attempt This Online!):

from timeit import repeat

n = 2_000_000
a = [i // 1 for i in range(n)]  # [0, 1, 2, 3, ..., 1_999_999]
b = [i // 2 for i in range(n)]  # [0, 0, 1, 1, 2, 2, ..., 999_999]
c = a[::-1]                     # [1_999_999, ..., 3, 2, 1, 0]
d = b[::-1]                     # [999_999, ..., 2, 2, 1, 1, 0, 0]

for name in 'abcd':
    lst = globals()[name]
    time = min(repeat(lambda: sorted(lst), number=1))
    print(f'{name}: {time*1e3 :5.1f} ms')
Greggrega answered 5/3, 2024 at 15:21 Comment(18)
Sorting algorithms have certain ideal and certain worst cases. It's entirely possible that this list is simply hitting the worst case.Accede
@Accede In that case, it's weird that c is fast while d is slow since both are just in reverse order. You'd think those would hit the same case category. Perhaps the real question is "Why is sorting list c fast?"Brodsky
Tinkered around a bit: the difference is quite sensitive to e.g. number. I don't take timing tests seriously when they take less than 5 seconds.Canoe
@Michael True, but to really understand that you'd need to dig deep into the sorting implementation. Which you can certainly do as an exercise and for academic interest. It just has little practical use.Accede
@Accede Ah, giving it a better look. D is actually different from C as there are repeated elements in D, so they are not the same case.Brodsky
Anyone who wants to dig in might want to start with bugs.python.org/file4451/timsort.txt, which is a detailed explanation of Python's sorting algorithm.Fungosity
@Michael Good catch. Yeah, it probably takes an extra pass or so, because it wants to check the dupes again or something like that.Accede
Early hypothesis: might have to do with count_runs. a and b detect the whole list as a single ascending run (lo[0] <= lo[1] <= lo[2] <= ...); c detects the whole list as a single descending run (lo[0] > lo[1] > lo[2] > ...); but d detects a million ascending (e.g. 10 <= 10) or descending (e.g. 10 > 9) runs of size 2. (I have not looked further yet, but this is a rather large difference in the inputs)Reconnaissance
@Canoe What do you mean by quite sensitive? If you for example use number=10 instead of number=1, the times become 10 times as large, of course. And that's no issue. It still shows the same thing, d taking much longer than the others.Greggrega
The ratio dropped to almost half when I increased number to 7. Mileages vary.Canoe
@Canoe Consistently? That sounds like you have a very unstable system.Greggrega
(Just the usual caveat, not intending much variance on consecutive runs on one particular/my system.)Canoe
@Canoe What times do you get with the original code and what times with number=7?Greggrega
Averages "2000000*7" 0.522, 0.534, 0.545, and 2.92, respectively.Canoe
@Canoe And with number=1?Greggrega
Let us continue this discussion in chat.Canoe
I thought this would be related to stackoverflow.com/questions/75677825, but it doesn't seem to be. Anyway, any chance we could use an example that doesn't invoke eval (encouraging insecure practice) just to get nice output?Lockard
@KarlKnechtel Yeah, that's unrelated. Replaced eval.Greggrega

As alluded to in the comments by btilly and Amadan, this is due to how the Timsort sorting algorithm works. A detailed description of the algorithm is here.

Timsort speeds up sorting of partially sorted arrays by identifying runs of already-sorted elements.

A run is either "ascending", which means non-decreasing:

a0 <= a1 <= a2 <= ...

or "descending", which means strictly decreasing:

a0 > a1 > a2 > ...

Note that a run is always at least 2 elements long, unless it starts at the array's last element.

Your arrays a, b and c each consist of just one run. The array d has 1 million runs.
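
To make that concrete, here is a rough sketch of run detection following the definitions above. This is not CPython's actual count_run (which, among other things, also extends short runs to a minimum length with binary insertion sort); it only counts the natural runs:

def count_runs(lst):
    # Count natural runs: ascending = non-decreasing, descending = strictly decreasing.
    n = len(lst)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n:
            if lst[j] < lst[i]:          # strictly descending run
                while j + 1 < n and lst[j + 1] < lst[j]:
                    j += 1
            else:                        # non-decreasing (ascending) run
                while j + 1 < n and not (lst[j + 1] < lst[j]):
                    j += 1
        i = j + 1
    return runs

n = 2_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]
for name, lst in [('a', a), ('b', b), ('c', c), ('d', d)]:
    print(name, count_runs(lst))   # a, b, c: 1 run; d: 1_000_000 runs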

The reason the definition of descending cannot be >= is to make the sort stable, i.e. to keep the order of equal elements:

The definition of descending is strict, because the main routine reverses a descending run in-place, transforming a descending run into an ascending run. Reversal is done via the obvious fast "swap elements starting at each end, and converge at the middle" method, and that can violate stability if the slice contains any equal elements. Using a strict definition of descending ensures that a descending run contains distinct elements.
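
Here is a tiny illustration (a hypothetical example, not CPython code) of why a descending run containing equal elements would be a problem: reversing such a run would swap the equal elements, which a stable sort must never do.

pairs = [(2, 'first'), (2, 'second'), (1, 'x')]   # sort keys: 2, 2, 1

# If the whole list were treated as one "descending" run (using >=),
# the in-place reversal would swap the two equal keys:
print(pairs[::-1])   # [(1, 'x'), (2, 'second'), (2, 'first')] - not stable

# With the strict > definition, equal elements never end up in a
# descending run, and sorted() keeps their original order:
print(sorted(pairs, key=lambda p: p[0]))
# [(1, 'x'), (2, 'first'), (2, 'second')]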

Python 3.11 has a slightly improved version of timsort (it adopted powersort's merge strategy), but it uses the same run detection and thus has the same performance here.

Hyacinthus answered 5/3, 2024 at 15:42 Comment(20)
Is there any documentation that explains why stability was deemed important for the default sort? Why did the Python devs not supply a separate stable_sort for those who need it? (In any case, stability is only meaningful if the sort key is not the entire value; for a list of (no-locale) strings or a list of numbers, an unstable sort is indistinguishable from a stable one.)Eskimoaleut
@MartinKealey In general, Python places ease of use ahead of speed - an interpreted language will never be terribly fast. At the point when Timsort was introduced, Python had already used a stable sort for a decade; changing it would have broken compatibility.Hyacinthus
@MartinKealey "for a list of (no-locale) strings or a list of numbers, an unstable sort is indistinguishable from a stable one." - Not true. Generally you can still distinguish objects by id.Greggrega
@Hyacinthus Python's previous sort wasn't stable, where did you get that it was? And Python still uses timsort, it was just changed a little in a way irrelevant to your answer (the run detection is still the same). The current description still calls it timsort, and in the history you can see Tim updated it to use powersort's merge strategy. Clearly Tim still considers it timsort. Btw I used Python 3.12 for the question's benchmark results.Greggrega
@nocomment Hmm yeah, looking at it closer, powersort will have the same performance. Regarding sort stability, I tested with Python 2.2, but now I realize it's not very easy to construct a test case that would definitely prove stable vs. not stable. Especially as the old implementation is rather complicated. Interestingly this doc claims it is stable since 2.2, but there doesn't seem to be any relevant change at that version.Hyacinthus
Hmm. The 2.2 doc says: "Whether the sort() method is stable is not defined by the language [...]. In the C implementation of Python, sorts were stable only by accident through Python 2.2. Code that intends to be portable across implementations and versions must not rely on stability". Not sure how you get stable "by accident". And here Tim really made it sound like it wasn't stable. Not sure what to think now.Greggrega
Oh, I guess I misinterpreted that. To me, stability is a feature of the algorithm. So I interpreted that as "the algorithm happened to be stable, without that having been intentional". Like if you invent mergesort without thinking about stability and later discover that it's stable. But I now think the doc really meant that "sorts" sometimes were stable, in the sense of when you sorted some data, sometimes you got lucky and equal values didn't change their order.Greggrega
@nocomment The earliest versions of Python seem to use plain qsort() which is stable on some platforms and unstable on others. But yeah, looks like it wasn't guaranteed stable until 2.3, though I wouldn't be surprised if some code relied on "stable on most platforms" by then.Hyacinthus
I'm the guy who wrote almost all of CPython's sorts. No sort implementation in CPython was guaranteed stable before "timsort". No platform offers a stable qsort() by default (all ways of making quicksort stable require more memory and/or time than basic quicksort). For any non-stable sorting algorithm, there are probably specific inputs it leaves stable - "by accident".Prink
@nocomment yes in general objects can be distinguished; that's precisely why I said "if the sort key is not the entire value". Perhaps I should have clarified that, for the purposes of that statement, I intended the "object identity" to be considered part of the "value", but I would have thought that was blindingly obvious from the context: if there's no way to distinguish two elements in an array, then clearly it doesn't matter what order you put them, so an unstable sort will produce a result that's indistinguishable from that produced by a stable sort.Eskimoaleut
On a related point, all the tests I've done seem to indicate that identical string values seem to produce the same result from id(). Is that because Python is smart about merging duplicate strings, or because id on strings does some kind of hash, rather than simply using the memory address?Eskimoaleut
@MartinKealey Yeah but you said "for a list of (no-locale) strings or a list of numbers, an unstable sort is indistinguishable from a stable one", and that's just not right. Whether two equal strings or numbers are the same object is an implementation detail and depends on how you create them. For example s = 'a'; print(s is s[::-1]) and i = 300; print(i is --i) create equal separate objects for me (demo).Greggrega
@MartinKealey: No. Equal strings can easily have different ID values.Nordgren
What's interesting: If you have a "guaranteed stable" sort and you sort, then reverse an array, then another sort is not allowed to just reverse it. I was thinking: With these rules, TimSort can just copy a range that is in ascending order ("<=") and can just reverse a range that is in descending order (">"). Could you use ">=" for descending order and just be a bit careful during the reversing, so that equal values don't get reversed?Comanchean
@TimPeters Why doesn't timsort look for >= like gnasher729 said? I think count_run could delay the ascending-vs-descending decision until the first inequality, and in case of descending, it could reverse each run of equals and at the end reverse the whole >= run as it does now with the > run. Could make sorting this special case (and similar cases) much faster, maybe sorting d would become almost as fast as sorting c. Is this just too rare or costly, and on average it would be harmful?Greggrega
That would be slower than you're guessing ;-) The sort only uses __lt__() comparisons, so equality cannot be detected directly. Establishing that x == y (which the algorithm never needs to do now) would have to follow as a deduction from establishing not x < y and not y < x. So how is establishing that x <= y for ascending runs done? As a deduction from establishing not y < x, a single __lt__() comparison (and if it's not the case that not y < x, then it's established that the run is (strictly) descending).Prink
@TimPeters Ah, right. Although it might still be worth it. (1) Each run's initial phase of equals might be short and thus not require many double comparisons. (2) The ascending-mode is not affected. (3) The descending-mode only needs an extra comparison for each equal pair and one for a run-ending < pair. So for data of mostly > but occasional == this could cost little and lead to longer runs, saving comparisons. My list d has many > and ==, and I think this would drop the number of comparisons from 5.355 per element (see my answer) to 1.5 per element.Greggrega
I won't pursue it - aimed at a contrived case, and would complicate the code at the cost of slowing other cases. For example, [2, 1] * 1000000. As is, each time we hit a 1, 2 < 1? says "no" and we get out of "descending mode" at once. But with the change, every time we'd also need to ask "OK, 1 < 2?" to determine whether or not they were equal. Leaving "descending mode" would always require 2 compares (but needs only 1 now). Add more complications to keep track of the last time an all-equal run started, etc - gets less attractive with each new detail.Prink
On third thought ;-) , I'm warming to this, and opened an issue on CPython's tracker: github.com/python/cpython/issues/116554Prink
@TimPeters Nice :-). And I think your [2, 1] * 1000000 case wouldn't be affected much. It quickly goes to MINRUN mode anyway, so it's not like an extra comparison at every second element but at every minrun. Relatively insignificant. Great insight of the circular a[0] <= ... <= a[-2] <= a[0] as a cheap "all equal". Although even if a million equal values did take 2 million comparisons, would that be so terrible? It's rare and it would still be a near-best case! Even that might still be worth it if it improved average/real cases enough. Oh well, irrelevant thoughts now, given your insight.Greggrega

As @jpa's answer explained, a, b and c have a single "run" while d has a million. We can also see the effect of that by counting the comparisons:

a:  1999999 comparisons, 1.000 per element
b:  1999999 comparisons, 1.000 per element
c:  1999999 comparisons, 1.000 per element
d: 10710650 comparisons, 5.355 per element

A single long run means we only compare each pair of neighbors once; that's all. For n elements, that's n-1 comparisons.

Since a and b are ascending, they're already sorted, and nothing further is done. Since c is descending, it simply gets reversed, and then nothing further is done.

For d, the two-element runs first get combined into runs of 32 to 64 elements using binary insertion sort. These longer runs then get merged again and again into larger and larger runs until the whole list is sorted.
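
For intuition, here is a minimal sketch of binary insertion sort built on the bisect module. The real routine works in place on a slice of the array, but the idea is the same: binary-search the insertion point, then insert.

from bisect import insort

def binary_insertion_sort(lst):
    result = []
    for x in lst:
        insort(result, x)   # binary search for the position, then insert (keeps equal elements in order)
    return result

print(binary_insertion_sort([3, 1, 2, 2, 0]))   # [0, 1, 2, 2, 3]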

I tried various values of n. The number of comparisons for d hovered around 5 per element, rising to ~5.3 as n approached a power of 2 and falling back to ~4.7 once n passed it. In general, d requires O(n) comparisons. It doesn't need n log n comparisons, because the merging uses "galloping" (an exponential search), which in this case needs only logarithmically many comparisons. So unlike ordinary merge sort on general inputs, the recurrence relation for the number of comparisons here isn't T(n) = 2T(n/2) + n but T(n) = 2T(n/2) + log(n), which is O(n).
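
A quick numerical check of that recurrence (a sketch, not a proof), taking T(1) = 0:

from math import log2

def T(n):
    # T(n) = 2*T(n/2) + log2(n), T(1) = 0
    if n <= 1:
        return 0
    return 2 * T(n // 2) + log2(n)

for n in (2**10, 2**15, 2**20):
    print(n, T(n) / n)   # the ratio settles near a small constant (~2), i.e. T(n) is O(n)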

The runtime, however, isn't O(n), as there are still n log n moves of elements. But those moves are low-level and very fast, cheap compared to the O(n) comparisons, each of which is much slower. If you measure times for various n, you might very well think it's O(n) time.

The script for counting (Attempt This Online!):

n = 2_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]

# Wrapper whose __lt__ counts comparisons; sorted() only ever calls __lt__.
class Int:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for name in 'abcd':
    lst = globals()[name]
    comparisons = 0
    sorted(map(Int, lst))
    print(f'{name}: {comparisons} comparisons, {comparisons/len(lst) :5.3f} per element')
Greggrega answered 5/3, 2024 at 16:34 Comment(0)
