qsort with size 0 undefined? [closed]
Asked Answered
G

4

8

I have a report, unconfirmed by me but from a reliable source, that the code

qsort(a, n, sizeof *a, cmpfunc);

is compiled by a modern version of gcc as if it had been written

if(n == 0)
    __builtin_trap();
qsort(a, n, sizeof *a, cmpfunc);

Evidently it is believed that calling qsort with n == 0 is undefined behavior.

[Edit: The whole premise here was found to be false; see "Update 2" below.]

It has been pointed out that Posix explicitly blesses the n == 0 case, but evidently no extant version of the C Standard does.

So the obvious questions are:

  1. Is calling qsort with n = 0 actually undefined behavior in C?
  2. Is every program which ever calls qsort with arbitrary n truly obliged to check for n == 0 and not call qsort in that case?
  3. Why would gcc perform this "optimization"? Even if you believe that calling qsort with n == 0 is undefined, this would seem to marginally slow down every non undefined program.

Textbook implementations of quicksort (which, I know, qsort is not required to be) pretty much can't not handle n = 0 correctly. I wonder if gcc's behavior here is trying to guard against a qsort implementation that somehow does something much worse than a __builtin_trap if the initial call has n == 0?


Update: Thanks for the responses so far. It sounds like gcc is in the wrong here. As I said, I haven't confirmed this result myself, but I'm trying to find out which version of gcc and with which optimization flags the issue was observed.


Update 2: The original report I referred to was in error. Two key clarifications:

  1. gcc was in fact checking for a == 0, not n == 0. This is obviously a completely different kettle of fish: As this thread (and others) has confirmed, calling qsort on a null pointer is considerably more problematic, and almost certainly formally undefined.
  2. The compilation in question included the -fsanitize=undefined and -fsanitize-undefined-trap-on-error flags, so of course gcc was being stringent about checking for inadvertent null pointers (and even at a cost to efficiency).

Sorry for the misinformation and runaround. I'm afraid this question is now in the realm of "not reproducible or was caused by a typo", and I have put one close vote in the hopper on that basis.

For what it's worth, the gcc version was 12.2.1.

Gandhi answered 26/10, 2022 at 12:29 Comment(15)
Could it be an instrumentation because of a constrained range for n? Recently I used a compiler that silently checked all usages of enumerated values. Anyway, a confirmation would be great, for example as an example at godbolt's Compiler Explorer.Eared
Posix explicitly blesses the n == 0 case, but evidently no extant version of the C Standard does. C11 7.22.5 Searching and sorting utilities p1: These utilities make use of a comparison function to search or sort arrays of unspecified type. Where an argument declared as size_t nmemb specifies the length of the array for a function, nmemb can have the value zero on a call to that function; the comparison function is not called, a search finds no matching element, and sorting performs no rearrangement.Emergency
MSVC is benign. With 0 or 1 elements the comparator function isn't called, although their man page states "qsort calls the compare routine one or more times". However it also says "if width is less than zero, the invalid parameter handler is invoked"Tutelage
The quoted text by @LanguageLawyer is unambiguous and largely settles the discussion. It might be dodgy passing a null pointer; it depends on the interpretation of "Pointer arguments on such a call shall still have valid values, as described in 7.1.4." (where §7.1.4 is a set of general requirements on arguments to library functions). Is NULL a valid pointer value?Gambado
I think you are referring to this post on the TZ mailing list? mm.icann.org/pipermail/tz/2022-October/032096.htmlBanking
@JonathanLeffler In the context of section 7.1.4, a null pointer is not valid argument to a library function call unless explicitly stated otherwise in the detailed descriptions that follow the section.Banking
@IanAbbott Well spotted. :-) And I am posting to that thread now.Gandhi
@SteveSummit Me too (again)!Banking
@IanAbbott — yes, I'm sure the discussion is in relation to the TZ mailing list. The timing and content is too similar for it to be accidental. I agree that passing a null pointer is probably UB — but passing zero elements and a non-null pointer to valid memory (so not a completely random non-null pointer) should be OK. In the context of TZ, the obvious test is if (nlinks > 1) qsort(…); (instead of the proposed if (nlinks != 0) qsort(…);) — there is no need to sort a 1-element array (or a 0-element array, especially not one identified by a null pointer).Gambado
Note that C99 ¶7.10.5 has the preamble remarks also in C11 and C17/C18.Gambado
This question is now moot because the reliable source misread the assembly language produced by GCC. Replace if(n == 0) with if(a == NULL) for the corrected reading. See mm.icann.org/pipermail/tz/2022-October/032107.htmlBanking
@IanAbbott Indeed; see "Update 2" in edited question.Gandhi
@SteveSummit I have also added a close vote for the same reason.Banking
@SteveSummit, I concur that the question was inspired by a situation that would warrant closure for reason of typo / unreproducibility. I'm not sure I would otherwise have said the same about the resulting question itself, but since you're suggesting such a closure of your own question, I have added the third vote needed.Corium
@JohnBollinger It hurts to close my own question, but it also hurts to have wasted people's time on this. And questions contrary to fact, or based on false premises, like this one, are endlessly and meta confusing, so they're really best closed.Gandhi
D
6
  1. Is calling qsort with n = 0 actually undefined behavior in C?

It is well-defined behavior in every version of the language.

  1. Is every program which ever calls qsort with arbitrary n truly obliged to check for n == 0 and not call qsort in that case?

The application programmer's source need not perform any such check. As for the behavior of the generated program, the qsort library function shouldn't call the comparison function internally, so it's essentially the same thing as not calling qsort at all, equivalent to a no-op.

Why would gcc perform this "optimization"? Even if you believe that calling qsort with n == 0 is undefined, this would seem to marginally slow down every non undefined program.

Because n == 0 is a special, well-defined case which allows compiler optimizations (not to call the function). Though of course an extra branch is not necessarily an optimization.


Sources:

C17 7.22.5.2

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

C17 7.22.5 emphasis mine:

These utilities make use of a comparison function to search or sort arrays of unspecified type. Where an argument declared as size_t nmemb specifies the length of the array for a function, nmemb can have the value zero on a call to that function; the comparison function is not called, a search finds no matching element, and sorting performs no rearrangement. Pointer arguments on such a call shall still have valid values, as described in 7.1.4.

Dissatisfied answered 26/10, 2022 at 13:2 Comment(6)
Aren't "Yes" and "Or rather, qsort shouldn't call the comparison function" effectively contradictory answers to question 2? Based on everything else stated here, I would expect the answer to be "No".Bullough
@WarrenWeckesser Not really. There's no requirement that qsort isn't called, just that the call should boil down to a no-op. It's all clarified by the quoted 7.22.5.Dissatisfied
But question 2 asks if callers of qsort are obligated to check for n == 0 and not call qsort. Your argument then implies that the answer is "No", because callers do not have to explicitly check for n == 0--qsort should handle that case.Bullough
@WarrenWeckesser Ah, well I guess that depends on by "program" you mean the application source or the compiler-generated binary. I'll clarify.Dissatisfied
Thanks for the legwork, @Lundin. I (blush) hadn't checked the Standard myself, but was relying on the research of other respondents on the original mailing list thread who hadn't been able to find anything.Gandhi
Re “It is well-defined behavior in every version of the language.”: As dbush’s answer indicates, the 1990 C standard did not contain the wording stating nmemb can be zero. But, like other versions, it did define an array type as “a contiguously allocated nonempty set of objects…,” so, for qsort to “sort an array of nmemb objects”, it would have to be given a nonempty array.Cam
C
6

As others have mentioned, the most recent version of the C standard as well as POSIX explicitly allows for the nmemb argument to be 0. However, this language is missing from the C89 standard.

Section 4.10.5 of C89 (equivalent to §7.10.5 of C90) doesn't contain the additional paragraphs before the specifications for bsearch or qsort that allows for this. So it's possible that compiling in strict C89 mode could generate the effective code in the question.

The most recent gcc in C89 mode doesn't show the offending behavior:

https://godbolt.org/z/YhKoGEre7

But other versions conceivably could. I haven't checked them all.

UPDATE:

According to this posting, which triggered the initial question:

https://mm.icann.org/pipermail/tz/2022-October/032096.html

And this one in response to it:

https://mm.icann.org/pipermail/tz/2022-October/032107.html

The behavior in question was observed on gcc 12.2 with the -fsanitize=undefined option, and there was an error on the part of the person reporting the behavior in reading the assembly. The above godbolt link shows the following disassembly with the given compiler and options:

        mov     eax, DWORD PTR [rbp-20]
        movsx   rbx, eax
        mov     edi, OFFSET FLAT:.Lubsan_data0
        call    __ubsan_handle_nonnull_arg
        mov     ecx, OFFSET FLAT:cmpfunc
        mov     edx, 4
        mov     rsi, rbx
        mov     edi, 0
        call    qsort
        mov     eax, 0

The check is actually looking to see if base is NULL, not if nmemb is 0. And in that case, it is undefined behavior.

Chanell answered 26/10, 2022 at 13:40 Comment(5)
Hmmm, is there really any "issue" in the question?Cheesecloth
@Cheesecloth That was a bit vague. Updated.Chanell
I haven't managed to reproduce the issue with godbolt either. Thanks for trying.Gandhi
The C99 rationale only mentions that C99 has clarified the requirements and use of the functions. There's no mentioning of the size zero case being explicit UB nor was it listed in the C90 UB list. But maybe this was a quality of implementation thing in C90 indeed.Dissatisfied
nmemb is explicitly allowed to be zero since C99 (see N852 and N872). But I guess the change is considered editorial.Gametophyte
B
4

From POSIX standard (emphasize is mine):

[CX] The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of IEEE Std 1003.1-2001 defers to the ISO C standard.

The qsort() function shall sort an array of nel objects, the initial element of which is pointed to by base. The size of each object, in bytes, is specified by the width argument. If the nel argument has the value zero, the comparison function pointed to by compar shall not be called and no rearrangement shall take place.

Bushtit answered 26/10, 2022 at 13:1 Comment(0)
C
3

As others have mentioned, the C standard library function for qsort is required to handle a size of zero properly.

But that is from a programmers point of view. The C standard does not dictate anything about the produced machine code except that it should behave the way it should.

It's perfectly valid for a C compiler to produce a binary file that calls a sorting function that does NOT handle sizes of 0 properly, as long as it adds a zero check before it. But I cannot find anything in the C89 standard that allows UB if the size is zero.

In practice, the additional text in the specification does not add much. The relevant part is this:

nmemb can have the value zero on a call to that function; the comparison function is not called

That means that this snippet:

#include <stdio.h>
#include <stdlib.h>

int cmpfunc (const void * a, const void * b) {

   puts("foobar"); // To see if this function is executed

   return ( *(int*)a - *(int*)b );
}

int main (void) {

   int values[1] = {42};

   qsort(values, 0, sizeof *values, cmpfunc);
}

is guaranteed to NOT print "foobar" if you compile with C99 or later. But if you compile with C89, it may happen. Or not. But this code does not invoke undefined behavior in either C89 or a later version.

John Bollinger made an interesting point in the comment section

Absent the explicit specification that the second argument may be 0, I could make an argument for that being UB. It would revolve around the fact that the second argument is required to be the length of the array pointed to be the first argument, and C not having zero-length arrays. But I would nevertheless expect every C implementation to handle the situation in the natural way described by later versions of the spec.

There is a little bit of wiggle room to make the interpretation that it's UB without the explicit requirement that size zero is allowed. However, the C standard explicitly states many things as UB, but not this.

My personal opinion (and I would love to know if there is any official consensus on this matter) is that if that the specs are vague, but not explicitly stated as UB, then the compiler should NOT use the ambiguity for optimizations. Doing so would be malicious compliance

Cheesecloth answered 26/10, 2022 at 14:19 Comment(2)
Absent the explicit specification that the second argument may be 0, I could make an argument for that being UB. It would revolve around the fact that the second argument is required to be the length of the array pointed to be the first argument, and C not having zero-length arrays. But I would nevertheless expect every C implementation to handle the situation in the natural way described by later versions of the spec.Corium
@JohnBollinger Good point. Updated.Cheesecloth

© 2022 - 2024 — McMap. All rights reserved.