Is it valid to use a zero-sized non-static array function parameter?

Is it valid, according to ISO C (any version), to specify a zero-sized array parameter?

The standard seems ambiguous. While it's clear that zero-sized arrays are invalid, array function parameters are special:

C23::6.7.6.3/6:

A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type", where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.

As long as you don't use static, the size specified between [] is effectively ignored. As I understand the quoted paragraph, the compiler isn't allowed to make any assumptions at all about the pointer.
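The adjustment described in the quoted paragraph can be observed directly. The following is a minimal sketch, assuming a C11 compiler (for _Generic); the function names param_is_pointer and check_adjustment are hypothetical and do not come from the question.

```c
#include <assert.h>

/* Declared with array syntax; the 16 is only documentation. */
int param_is_pointer(char buf[16])
{
    /* &buf has type char ** because buf was adjusted to char *.
       Had buf stayed an array, &buf would be char (*)[16]. */
    return _Generic(&buf, char **: 1, char (*)[16]: 0);
}

/* Hypothetical demo helper, not from the question. */
int check_adjustment(void)
{
    /* The function's type is int (char *), so this initialization
       needs no cast. */
    int (*fp)(char *) = param_is_pointer;
    char storage[16] = {0};

    assert(fp(storage) == 1);
    return fp(storage);
}
```

This only shows that a valid array parameter is treated as a pointer after adjustment; it says nothing about whether a zero size is accepted before the adjustment happens.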

So, the following code should be conforming, right?

void h(char *start, char past_end[0]);

#define size 100
void j(void)
{
        char dst[size];
        h(dst, dst+size);
}

I use past_end[0] as a sentinel pointer to one-past-the-end (instead of a size; it's much more comfortable in some cases). The [0] makes it clear that the pointer is one past the end rather than the actual end, which readers might otherwise confuse when all they see is a pointer. A pointer to the actual end would be marked as end[1], to be clear.
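For readers unfamiliar with the start/one-past-the-end convention, here is a minimal sketch of a callee that uses such a sentinel. It declares the sentinel as a plain pointer, which is the form every compiler accepts without a diagnostic; fill_range and fill_demo are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes c into every element of [start, past_end)
   and returns the number of elements written. past_end is never
   dereferenced; it is only compared against. */
size_t fill_range(char *start, char *past_end, char c)
{
    size_t n = 0;

    for (char *p = start; p != past_end; p++) {
        *p = c;
        n++;
    }
    return n;
}

/* Hypothetical demo mirroring the call in the question. */
size_t fill_demo(void)
{
    char dst[100];
    size_t n = fill_range(dst, dst + sizeof dst, 'x');

    assert(dst[0] == 'x' && dst[99] == 'x');
    return n;
}
```

Forming dst + sizeof dst is well defined because C permits a pointer one past the last element, as long as it is not dereferenced.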

GCC thinks it's not conforming:

$ gcc -Wall -Wextra -Wpedantic -pedantic-errors -std=c17 -S ap.c 
ap.c:1:26: error: ISO C forbids zero-size array ‘past_end’ [-Wpedantic]
    1 | void h(char *start, char past_end[0]);
      |                          ^~~

Clang seems to agree:

$ clang -Wall -Wextra -Wpedantic -pedantic-errors -std=c17 -S ap.c 
ap.c:1:30: warning: zero size arrays are an extension [-Wzero-length-array]
void h(char *start, char past_end[0]);
                                  ^
1 warning generated.

If I don't ask for strict ISO C, GCC still warns (differently), while Clang relaxes:

$ cc -Wall -Wextra -S ap.c 
ap.c: In function ‘j’:
ap.c:7:9: warning: ‘h’ accessing 1 byte in a region of size 0 [-Wstringop-overflow=]
    7 |         h(dst, dst+size);
      |         ^~~~~~~~~~~~~~~~
ap.c:7:9: note: referencing argument 2 of type ‘char[0]’
ap.c:1:6: note: in a call to function ‘h’
    1 | void h(char *start, char past_end[0]);
      |      ^
ap.c:7:9: warning: ‘h’ accessing 1 byte in a region of size 0 [-Wstringop-overflow=]
    7 |         h(dst, dst+size);
      |         ^~~~~~~~~~~~~~~~
ap.c:7:9: note: referencing argument 2 of type ‘char[0]’
ap.c:1:6: note: in a call to function ‘h’
    1 | void h(char *start, char past_end[0]);
      |      ^
$ clang -Wall -Wextra -S ap.c 

I reported this to GCC, and there seems to be disagreement:

https://gcc.gnu.org/pipermail/gcc/2022-December/240277.html

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108036

Is there any requirement that array parameters comply with the same constraints as actual arrays?

Moreover, if that proves to be true, what about the following?

#include <stddef.h>  /* for size_t */

void f(size_t sz, char arr[sz]);

Is the compiler entitled, under strict ISO C mode, to assume that the array will always have at least one element, even if I didn't use static, just because I used array syntax? If so, that would probably be a regression in the language.

Scincoid answered 12/12, 2022 at 23:47 Comment(14)
"The [0] clearly tells that it's one past the end" Why? It's quite unclear and not self-documenting at all. Rather, the "de facto standard" way to deal with iterations from start to end is to have the item named end and then point 1 item beyond the array. Similarly, the endptr of strtol family points 1 item beyond the valid string, the C++ iterators use end to point 1 item beyond the array, and so on.Tavia
Idiomatic form of a "start/end" loop would be: for(const type* i = start; i != end; i++). Which requires end to point 1 item beyond the array. This is AFAIK the very reason why C allows us to point 1 item beyond an array, as long as we don't dereference that location.Tavia
I use past_end[0] as a sentinel pointer to one-past-the-end (instead of a size; it's much more comfortable in some cases). What? That'd confuse me. It's very idiomatic to pass an array with its size to a function..Roily
@marco-a Chaining string-copy functions that truncate, while deferring truncation detection until after all the chained calls, can only be done with pointers (okay, you can do it with sizes, but I dare you to write it so that it is readable and not error-prone). See: <software.codidact.com/posts/285946/287522#answer-287522>Scincoid
@Tavia Yes, the de facto standard is to use end to refer to the last valid pointer in many cases. However, I also see a lot of code that uses end to refer to the last byte. That inconsistency is too much for my taste, and in fact I found bugs in a code base where a given function was implemented with end meaning past_end, while at the call site it was being passed the actual end; off-by-one, you can guess. I wanted to use unambiguous syntax to eliminate that source of bugs.Scincoid
@Tavia re: "The [0] clearly tells that it's one past the end" Why?: Because a valid pointer that has no addressable storage necessarily has to be one past the end of the array; otherwise it has at least 1 element (assuming non-null). If you have a pointer [3], it means that there are 3 remaining elements; [1] means one remaining element; [0] is after the last element. And the name helps. Of course, if you haven't seen it before, it might be a bit surprising and will cause a WTF moment, but after that small learning curve it can be very informative.Scincoid
@alx I don't know where you're going with this. "chaining string-copy functions that truncate, while deferring truncation detection to after all chained calls, can only be done with pointers": this presumes that this kind of logic is valid to begin with, which I can't validate. In my personal opinion, I don't think it's clean to pass the end of an array to a function. Your linked post talks about an "improved" string copy function (which you (conveniently) wrote yourself, by the way) that isn't adding any benefit over existing functions such as strncpy.Roily
@alx Why would I want to chain function calls in a language that wasn't designed for that use case? You chain function calls in a language like JavaScript or PHP, but not C. My hair stands on end just thinking about "chaining" function calls in C - ugh.Roily
@alx In case you don't know how it's done: you call a function, you validate if that function call executed properly and then you do the next thing. This is how things are done in C.Roily
@marco-a strncpy(3)? That's a function designed to write to fixed-width buffers such as utmpx(5). It's been long misused as if it were strlcpy(3), but it's not, and it's a source of bugs when used that way (I hope it's not necessary to quote anything here). Anyway, I'll quote a discussion in GCC (in which I participate, for full disclosure), just in case: <lore.kernel.org/linux-man/…>. BTW, I just fixed several such bugs today in the shadow package.Scincoid
re: Yes, I designed the stpecpy() function. But @Tavia reviewed it and helped improve it; and so did other programmers I know (in private). Do you find issues with that function? I invite you to discuss them in that forum. I'm open to improvement. In fact, I'm about to post a minor improvement to accept NULL, to allow chaining with a variant of snprintf(3) (the underlying issue there was that snprintf(3) uses int, for the curious). That came up in an NGINX discussion about fixing some cases of UB when calling snprintf(3).Scincoid
@marco-a re: C is not designed to chain copy functions: That's far from the truth. Would you mind explaining the useless return value of strcpy(3) and strcat(3)? It was for this exact line of code: strcat (strcpy (d, s1), s2); This line of code has been literally copied from an ISO C document: <open-std.org/JTC1/SC22/WG14/www/docs/n2349.htm>. And of course, such code goes back to K&R (I don't have that book handy to quote it, though).Scincoid
@alx Yeah, it's prefixed with The idiomatic (though far from ideal) way to append two strings is by calling the strcpy and strcat functions as follows lol.Roily
strcat (strcpy (d, s1), s2); I don't write code like that, neither should anyone else.Roily

Is it valid, according to ISO C (any version), to specify a zero-sized array parameter?

The C standard is clear. C 2018 6.7.6.2 1, which is a constraints paragraph, says:

In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero…

Since it is a constraint, the compiler must issue a diagnostic message about it. However, as with most things in C, the compiler may accept it anyway.

The adjustment of an array parameter to a pointer is irrelevant; that must come later, since a parameter declaration cannot be adjusted to a pointer until there is a declaration to adjust. So the declaration has to be parsed first, and it is subject to the constraints on array declarators.

The [0] clearly tells that it's one past the end, and not the actual end, which as a pointer, readers might confuse.

You might use it to tell human readers that, but it does not mean that to the compiler. [static n] tells the compiler there are at least n elements, and [n] has even less meaning than that. It is valid to pass a subarray to a function, that is, to pass a pointer into the middle of an array when the function is intended to access only a subset that reaches neither the start nor the end of the original array. So, even if [0] were accepted, it would not necessarily mean the pointer points to the end of the array. It would be valid to point anywhere into an array.
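The difference between [static n] and [n], and the subarray case, can be sketched as follows. This is an illustrative example only, assuming a C99 or later compiler; sum4, first_of, and subarray_demo are hypothetical names.

```c
#include <assert.h>

/* [static 4]: the caller must provide access to at least 4 elements. */
int sum4(const int v[static 4])
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Plain [4]: the parameter is adjusted to const int *v; the 4 is a
   hint for human readers. */
int first_of(const int v[4])
{
    return v[0];
}

/* Hypothetical demo: both calls pass pointers into the middle of data,
   so neither parameter points at the start or the end of the array. */
int subarray_demo(void)
{
    int data[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int s = sum4(data + 3);      /* elements 3, 4, 5, 6 */
    int f = first_of(data + 5);  /* element 5 */

    assert(s == 18);
    assert(f == 5);
    return s + f;
}
```

Both calls stay within the stated bounds on purpose: some compilers treat even a non-static bound as a hint and may warn when the argument is visibly smaller, as the GCC output in the question shows.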

Herrod answered 13/12, 2022 at 0:6 Comment(1)
I know [0] is ignored by the compiler. I use it for human readers, as you guessed. :)Scincoid

This is an interesting corner case.

Section 6.7.6.2p1 of the C11 standard specifying a constraint for array declarators states:

In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

What you've shown is a constraint violation, which requires a diagnostic (a warning or an error) and is considered undefined behavior. At the same time, because the array is declared as a function parameter, it gets adjusted to pointer type as per the passage you quoted.

Strictly speaking, the diagnostic the compiler issues is correct, and in fact the compiler is required to issue it in order to be a conforming implementation. However, I'd have a hard time arguing that it makes sense for a compiler to reject a program that has char past_end[0] as a function parameter when it is equivalent to char *past_end.

Cove answered 13/12, 2022 at 0:10 Comment(11)
I think it was established in some language-lawyer discussion that an array in a parameter list has to be a valid array before the "array adjustment" rule takes place. For example, int x[][] gets rejected since it is an array of incomplete element type; the same section (array declarators) that you quoted here applies to that scenario as well. "however I'd have a hard time arguing that it makes sense for a compiler to reject such a program" As in the other examples, it will reject int x[][0]. How is it otherwise going to perform pointer arithmetic on the decayed type int (*)[0]?Tavia
@Tavia For the int x[][0] example, yes it makes sense that it is invalid as the zero-size array portion isn't adjusted. It's the int x[0] case being adjusted to int * that seems questionable.Cove
To make any sense of it, the compiler must first determine if there is a valid declaration. The text describing parameter adjustment even says (6.7.5.3) "A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to type’’". Meaning that the declaration part must first be examined.Tavia
@Lundin: In contexts where a compiler wouldn't have to know or care about the size of an array type, what is gained by requiring that either the size or a dummy placeholder be specified and even going so far as to require that compilers recognize a special syntax for a dummy placeholder?Menon
@Menon To make the compiler behave consistently in the cases where it matters, such as the int[][0] example.Tavia
@Lundin: For what practical purposes would a compiler that extends the language to accept such constructs be less useful than one which doesn't (and may perhaps be slower because of it)?Menon
@Menon Try to explain how pointer arithmetic on the type int(*)[0] should reasonably behave and then you'll have the answer to that question.Tavia
@Lundin: Given that C allows pointers to incomplete types, I don't see zero-sized types as posing any problem. In order for ptr1-ptr2 to be valid, the pointers must point to a type of non-zero size, and evaluation of ptr1+integer could only be usefully executed in cases where ptr1 pointed to a type of non-zero size, but being able to have array declarations collapse to zero size if the arrays will sometimes be needed and sometimes not (e.g. char padding[sizeof (blob) - 64];) would be more useful than requiring that programmers jump through hoops to achieve such effects. Further, in cases where...Menon
...no possible value an expression could yield would have any possible impact on program behavior, requiring that a compiler evaluate the expression anyhow necessitates extra work for no useful purpose.Menon
@Tavia You (and the standard) are mixing pointers with arrays unnecessarily. The question of having arrays of zero-sized arrays or pointers to them, and how they behave regarding pointer arithmetic, is unrelated to this topic. IMO, 0-sized arrays would help reduce conditional code that adds padding when it's non-zero (as #supercat mentioned) and a few other cases, but that's unrelated to array syntax for parameters.Scincoid
As @Menon said, array syntax for pointer parameters is effectively just syntactic sugar, and I see no reason at all why the compiler should treat it any differently. Adding superfluous checks for information that will be discarded is unnecessary. It may be done for simplifying compilers, so it may be something that has to be a constraint violation by ISO C, but then defined by quality implementations like GCC...Scincoid

An important thing to understand about the C89 Standard (which is also relevant when looking at subsequent versions) is that it involved compromises between the compiler writers who didn't want to change the behavior of existing compilers that would reject certain constructs, and programmers who used other compilers that would accept those constructs and who didn't want to have to change their code.

In many such situations, the compromise that was reached was that the Standard would impose a constraint that would require that conforming compilers issue a diagnostic, but compilers whose customers would regard the constraint as silly would be free to accept the code after the diagnostic was issued. If the programmers would be satisfied with their code being "conforming", rather than "strictly conforming", they could then proceed to ignore the constraint if both they and the authors of their compiler thought it was silly.

There's a reason that the flag which enables warnings about zero-sized arrays is named "-pedantic". The authors of gcc recognized that the language would be better off without the constraint, but they provided an option to output a diagnostic in case it was violated, so as to satisfy the constraints demanded by pedants.

Menon answered 13/12, 2022 at 23:25 Comment(2)
However, GCC still warns in non-pedantic mode (it thinks there can be buffer overflow). When I suggested that this was a bug in GCC, there were different opinions. Some maintainers acknowledged the bug, while others (including a member of WG14) said that 0-sized arrays are not expected to be used for that, and the compiler is entitled to warn. We'll see how the bug report evolves...Scincoid
@alx: The Committee generally makes no attempt to anticipate purposes for which a construct might be used in non-portable programs, and compilers are given blanket permission to warn about anything they see fit. While I think there has long been a need for a document that would describe common language extensions, and provide a means of distinguishing implementations that support them from those that do not, the Standard has never sought to be such a document.Menon
