What causes std::sort() to access address out of range
Asked Answered
W

2

9

I understand that to use std::sort(), the compare function must be strict weak order, otherwise it will crash due to accessing address out of bound. (https://gcc.gnu.org/ml/gcc-bugs/2013-12/msg00333.html)

However, why would std::sort() access out-of-bound address when the compare function is not strict weak order? What is it trying to compare?

Also I wonder if there are other pitfalls in STL that I should be aware of.

Witherspoon answered 4/6, 2014 at 21:37 Comment(13)
Please define what do you mean by "not strictly weak order", kindly give an example.Chapeau
@40two the linked bug gives such an example, the most direct method is simply returning a <= b. It violates the rule that if a is less-than b, then b less-than a must be false.Morrissey
@Morrissey Thanks I got it now.Chapeau
I wonder why this is a bug report though. As far as I can see it is undefined what happens and accessing out of range is therefore a valid thing to do.Loupgarou
@RaphaelMiedl follow the link to bugzilla and you'll see it was rejected. You're right, it's undefined, and hence a crash is a perfectly valid result. Anyone can submit a bug for anything. gcc.gnu.org/bugzilla/show_bug.cgi?id=59391Frenulum
"other pitfalls in STL that I should be aware of." Yes. Anything that's undefined behavior. To avoid: read the documentation for the functions you're using, and do what it says.Trackman
@RaphaelMiedl Including <unistd.h> is also undefined behavior; that doesn't mean it's okay to crash. The blanket of ISO C or C++ undefined behavior is not an excuse. It is not reasonable for a sorting routine to crash just because the comparison function is not well behaved. A reasonable consequence is that a sequence emerges which is not in sorted order. I do not have confidence in the quality of sort code that crashes for this reason. The library implementors should define this behavior for their implementation and write more solid, secure code to ensure it.Oculomotor
@Frenulum Note that bug 59391 was not rejected because of undefined behavior. It was closed because the behavior was not reproducible in latest code more due to another fix. The nonproductive comment about undefined behavior came in a subsequent comment, with the PR being closed already.Oculomotor
@Oculomotor if a < b returns true and b < a returns true which one should be first then? It would be quite hard to give a reasonable guarantee other than maybe ordered randomly which doesn't make much sense for a sorting algorithm either. And as far as the standard is concerned everything can happen on undefined behaviour. And in which ways is including <unistd.h> undefined behavior? I can't find anything about that.Loupgarou
@RaphaelMiedl "Can't find anything about that" is why it's undefined behavior. Assume the converse: suppose C++ defines #include <unistd.h>. Well, what is the definition? In the real world, "as far as the standard is concerned" is just an opinion, (usually a weighty one, but never absolutely so).Oculomotor
@RaphaelMiedl One thing I would tolerate, though is a sort which fails to terminate if the function is badly behaved. For instance, suppose you have a bubble short which checks, after each pass, whether order has been achieved. This will always be false for certain sorting functions and so the loop never bails. Crashing, out of the question though.Oculomotor
@Oculomotor I'm more into why that would be undefined behaviour, the standard defines what happens if you include something in section 16.2 that this header doesn't get shipped with the compiler, the code in it isn't standard doesn't make it undefined. As much as writing own code to print "Hello world" isn't undefined either. It's just 3rd party code. Breaking something stated by the standard as requirement on the other hand is undefined behaviour if the standard doesn't make an exception.Loupgarou
@Oculomotor guess we just got different opinions for it crashing. In my opinion that isn't bad at all. I actually think it could help me find a problem easier.Loupgarou
I
17

The first thing is that calling the algorithm with a comparator that does not comply with the requirements is undefined behavior and anything goes...

But other than that, I assume that you are interested in knowing what type of implementation might end up accessing out of bounds if the comparator is bad. Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator

The answer is performance, and this is just one of the possible things that could lead to this type of issues. There are different implementations of sorting algorithms, but more often than not, std::sort is built on top of a variant of quicksort that will degenerate on a different sorting algorithm like mergesort to avoid the quicksort worst case performance.

The implementation of quicksort selects a pivot and then partitions the input around the pivot, then independently sorts both sides. There are different strategies for selection of the pivot, but a common one is the median of three: the algorithm gets the values of the first, last and middle element, selects the median of the three and uses that as the pivot value.

Conceptually partition walks from the left until it finds an element that is not smaller than the pivot, it then walks from the right trying to find an element that is smaller than the pivot. If the two cursors meet, partition completed. If the out of place elements are found, the values are swapped and the process continues in the range determined by both cursors. The loop walking from the left to find the element to swap would look like:

while (pos < end && value(pos) < pivot) { ++pos; }

While in general partition cannot assume that the value of pivot will be in the range, quicksort knows that it is, after all it selected the pivot out of the elements in the range. A common optimization in this case is to swap the value of the median to be in the last element of the loop. This guarantees that value(pos) < pivot will be true before pos == end (worst case: pos == end - 1). The implication here is that we can drop the check for the end of the range and we can use a unchecked_partition (pick your choice of name) with a simpler faster condition:

while (/*pos < end &&*/ value(pos) < pivot) ++pos;

All perfectly good, except that < is spelled comparator(value(pos), pivot). Now if the comparator is incorrectly implemented you might end up with comparator(pivot,pivot) == true and the cursor will run out of bounds.

Note that this is just one example of optimization of the algorithm that can remove bounds check for performance: assuming a valid order, it is impossible to walk out of the array in the above loop if quicksort set the pivot to the last element before calling this modified partition.

Back to the question:

Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator

No, not if it removed bounds checking by proving that it won't walk out of the array, but that prove is built on the premise that the comparator is valid.

Imperfective answered 4/6, 2014 at 22:28 Comment(1)
Hello David - sorry to hijack a comment, but I was wondering if you could work your magic on a similar question? #45600009 Same principle, but irreflexibility exists for the comparator. This time it's transitivity that is lacking. Again, sorry to hijack!Unpaid
L
1

std::sort does indeed require that the given comparator establishes a strict weak ordering, otherwise the sorting doesn't really make much sense.

As for it accessing out of range, the link you posted is to a bug report, i.e. it isn't supposed to actually do this. Compilers like any other software can and will have bugs. As noted by Adam this particular bug report got rejected since it isn't really a bug.

What exactly happens when you don't have strict weak ordering isn't defined by the standard, it doesn't make sense to do that and is therefore left out by the standard. Therefore it is undefined by omission. Undefined means that anything can happen, even accessing out of range.

As for avoiding "pitfalls" just be aware of the requirements of the algorithms and functions you use. For C++ there is a nice reference site that I usually use: cppreference

Which on the page of std::sort says:

comp - comparison function object (i.e. an object that satisfies the requirements of Compare) which returns ​true if the first argument is less than (i.e. is ordered before) the second.

With a link to the description of Compare

Loupgarou answered 4/6, 2014 at 21:47 Comment(3)
I understand all the argument, and I'm not debating whether a bug should be opened. My question is can someone familiar with STL tell me how/why std::sort() crashes? It's acceptable that sort() results are incorrect if the compare function is not strict weak order, but I don't think it acceptable for it to segment fault. For example, if I provide qsort() with a random compare function, it would never crash even though the results are incorrect. The caller should always pass two valid elements to compare, right?Witherspoon
Well, that's where we have differing opinions then I guess. In my opinion crashing isn't really bad. I'd rather have it crash than get a random order of items from a sort function. I immediately know that something is wrong and finding a crash is easier than looking through thousands of lines to check all the places where the values from some array got changed.Loupgarou
The problem is it's not always crashing, so still I had to spend a lot of time trying to figure out why. When I ran unit tests, they all passed, but with some combination of data array it would crash. I suspect that there is some real bug in std::sort(), and that's why I want to understand why std::sort() would pass illegal addresses to compare function.Witherspoon

© 2022 - 2024 — McMap. All rights reserved.