How is Array.sort performance affected by the initial ordering of the input?

Is it possible for the ordering of the input to affect the performance of Array.sort()? If so, how?

Outfit answered 14/9, 2017 at 21:31 Comment(2)
Of course it is. An already sorted array takes almost no time at all, while a worst case needs an excessive number of reorderings. – Limitation
Actually, in my benchmark the ordering didn't seem to matter much when using the default comparator for strings - check the results below. – Outfit

This depends on a few things:

  • the runtime (different browsers/runtimes use different sorting algorithms)
  • how your input is arranged relative to the desired order
  • whether or not you are using a custom comparator (also related to the previous point)

An application I am working on experienced severe performance degradation in a module that sorts a list of 35K+ strings, after the API endpoint it hits started returning the data already sorted. The time spent in the front-end sort went from around 30 ms to 6 seconds (a 200x slowdown).

The sorting is done with a custom comparator which prioritises strings ending with a certain suffix. If neither or both of the strings end in the suffix, natural ordering is used. I profiled the module with the browser developer tools and found that most of the time was spent in this comparison. The profile also showed that QuickSort is the underlying algorithm used by Array.sort() (at least in Chrome).
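
For reference, a comparator along these lines could look as follows. This is only a sketch reconstructed from the description: the suffix is the /Prod mentioned below, list stands in for the real array, and whether suffixed strings sort first or last is an assumption.

    const SUFFIX = '/Prod';

    // Strings ending in the suffix are grouped ahead of the rest (assumed);
    // ties fall back to natural (lexicographic) ordering.
    function compare(a, b) {
      const aHasSuffix = a.endsWith(SUFFIX);
      const bHasSuffix = b.endsWith(SUFFIX);
      if (aHasSuffix !== bHasSuffix) {
        return aHasSuffix ? -1 : 1;
      }
      return a < b ? -1 : a > b ? 1 : 0;
    }

    list.sort(compare);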

[profiler screenshot: QuickSort call tree under Array.sort()]

This was strange, as a QuickSort with a reasonable pivot-selection strategy should not be particularly sensitive to the ordering of the input; only unlucky pivot choices trigger its worst case. According to Wikipedia:

[the worst case] may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.

I became curious and benchmarked multiple variants of the sort using benchmark.js running on Node.js from the command line (the suite setup is sketched at the end of this answer). Both the benchmark and the browser run on top of V8, so they should use the same sorting algorithm. The results were surprising:

6 tests completed.

  Ordered array, sorted with a default comparator           x 34.27 ops/sec ±1.07% (59 runs sampled)
  Ordered array, sorted with a custom comparator            x  0.18 ops/sec ±2.81% (5 runs sampled)
  Ordered array, shuffled, sorted with a custom comparator  x 38.37 ops/sec ±3.67% (51 runs sampled)
  Ordered array, shuffled, sorted with a default comparator x 29.20 ops/sec ±1.28% (51 runs sampled)
  Unordered array, sorted with a default comparator         x 28.38 ops/sec ±1.28% (50 runs sampled)
  Unordered array, sorted with a custom comparator          x 42.10 ops/sec ±1.32% (55 runs sampled)

These results indicate that the performance degradation is because of the distribution of the data in relation to the comparator. Here are some characteristics of the input:

  • the suffix the comparator prioritises (/Prod) matches about 20% of the strings
  • when the strings are sorted alphabetically, the ones ending in /Prod are very likely to be spread relatively uniformly throughout
  • sequences like ABC/Alpha, ABC/Beta, ABC/Prod are common.

This probably makes the algorithm more likely to select a pivot that sits at an extreme of its subrange, which in turn forces a very large number of comparisons between elements.
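
For anyone trying to reproduce this, input with roughly those characteristics can be generated along these lines (purely illustrative; the names are made up and only the shape of the data matters):

    // 7,000 prefixes x 5 entries = 35,000 strings, 20% of them ending in /Prod.
    // Sorted alphabetically, the /Prod entries are spread uniformly throughout.
    const input = [];
    for (let i = 0; i < 7000; i++) {
      const base = 'Group' + String(i).padStart(5, '0');
      input.push(base + '/Alpha', base + '/Beta', base + '/Delta',
                 base + '/Gamma', base + '/Prod');
    }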

The degradation only happens in Chrome 61. I have tested both Firefox 52.3 and Safari 10.1 and the issue does not reproduce there, presumably because they use a different sorting algorithm.
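
For reference, the benchmark suite was set up roughly like this. It is a sketch rather than the exact harness: ordered and compare stand in for the real data and comparator, the shuffled and unordered variants are added the same way, and each test sorts a copy so the input keeps its ordering between samples.

    const Benchmark = require('benchmark');

    new Benchmark.Suite()
      .add('Ordered array, sorted with a default comparator', () => {
        ordered.slice().sort();
      })
      .add('Ordered array, sorted with a custom comparator', () => {
        ordered.slice().sort(compare);
      })
      // ...remaining variants added the same way
      .on('cycle', (event) => console.log(String(event.target)))
      .run();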

Outfit answered 14/9, 2017 at 21:31 Comment(0)
