The first thing is that calling the algorithm with a comparator that does not comply with the requirements is undefined behavior and anything goes...
But other than that, I assume that you are interested in knowing what type of implementation might end up accessing out of bounds if the comparator is bad. Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator
The answer is performance, and this is just one of the possible things that could lead to this type of issues. There are different implementations of sorting algorithms, but more often than not, std::sort
is built on top of a variant of quicksort that will degenerate on a different sorting algorithm like mergesort to avoid the quicksort worst case performance.
The implementation of quicksort selects a pivot and then partitions the input around the pivot, then independently sorts both sides. There are different strategies for selection of the pivot, but a common one is the median of three: the algorithm gets the values of the first, last and middle element, selects the median of the three and uses that as the pivot value.
Conceptually partition walks from the left until it finds an element that is not smaller than the pivot, it then walks from the right trying to find an element that is smaller than the pivot. If the two cursors meet, partition completed. If the out of place elements are found, the values are swapped and the process continues in the range determined by both cursors. The loop walking from the left to find the element to swap would look like:
while (pos < end && value(pos) < pivot) { ++pos; }
While in general partition cannot assume that the value of pivot will be in the range, quicksort knows that it is, after all it selected the pivot out of the elements in the range. A common optimization in this case is to swap the value of the median to be in the last element of the loop. This guarantees that value(pos) < pivot
will be true before pos == end
(worst case: pos == end - 1
). The implication here is that we can drop the check for the end of the range and we can use a unchecked_partition
(pick your choice of name) with a simpler faster condition:
while (/*pos < end &&*/ value(pos) < pivot) ++pos;
All perfectly good, except that <
is spelled comparator(value(pos), pivot)
. Now if the comparator
is incorrectly implemented you might end up with comparator(pivot,pivot) == true
and the cursor will run out of bounds.
Note that this is just one example of optimization of the algorithm that can remove bounds check for performance: assuming a valid order, it is impossible to walk out of the array in the above loop if quicksort set the pivot to the last element before calling this modified partition.
Back to the question:
Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator
No, not if it removed bounds checking by proving that it won't walk out of the array, but that prove is built on the premise that the comparator is valid.
a <= b
. It violates the rule that ifa
is less-thanb
, thenb
less-thana
must be false. – Morrissey<unistd.h>
is also undefined behavior; that doesn't mean it's okay to crash. The blanket of ISO C or C++ undefined behavior is not an excuse. It is not reasonable for a sorting routine to crash just because the comparison function is not well behaved. A reasonable consequence is that a sequence emerges which is not in sorted order. I do not have confidence in the quality of sort code that crashes for this reason. The library implementors should define this behavior for their implementation and write more solid, secure code to ensure it. – Oculomotora < b
returnstrue
andb < a
returnstrue
which one should be first then? It would be quite hard to give a reasonable guarantee other than maybe ordered randomly which doesn't make much sense for a sorting algorithm either. And as far as the standard is concerned everything can happen on undefined behaviour. And in which ways is including<unistd.h>
undefined behavior? I can't find anything about that. – Loupgarou#include <unistd.h>
. Well, what is the definition? In the real world, "as far as the standard is concerned" is just an opinion, (usually a weighty one, but never absolutely so). – Oculomotor16.2
that this header doesn't get shipped with the compiler, the code in it isn't standard doesn't make it undefined. As much as writing own code to print "Hello world" isn't undefined either. It's just 3rd party code. Breaking something stated by the standard as requirement on the other hand is undefined behaviour if the standard doesn't make an exception. – Loupgarou