Is the operand of `sizeof` evaluated with a VLA?
R

3

27

An argument in the comments section of this answer prompted me to ask this question.

In the following code, bar is a pointer to a variable length array, so the result of sizeof(*bar) is determined at run time instead of compile time.

int foo = 100;
double (*bar)[foo];

The argument was about whether or not using sizeof evaluates its operand when the operand is a variable length array, making sizeof(*bar) undefined behavior when bar is not initialized.

Is it undefined behavior to use sizeof(*bar) because I'm dereferencing an uninitialized pointer? Is the operand of sizeof actually evaluated when the type is a variable length array, or does it just determine its type (how sizeof usually works)?


Edit: Everyone seems to be quoting this passage from the C11 draft. Does anyone know if this is the wording in the official standard?

Ruvalcaba answered 7/10, 2015 at 6:45 Comment(17)
Related post: Behavior of sizeof on variable length arrays (C only)Gothart
sample codeEllieellinger
Also see Is dereferencing null pointer valid in sizeof operation and the C++ version here.Like
@Ellieellinger The problem though is that if it is undefined behavior, it still might behave as you'd expect (even across platforms and compilers).Ruvalcaba
@ShafikYaghmour Both of those deal with the compile time evaluation of sizeof, not the runtime evaluation.Ruvalcaba
It does not need to dereference. So it is not UB.Ellieellinger
@Ellieellinger No, it doesn't need to, but the question is really about whether it's undefined behavior according to the standard.Ruvalcaba
It does not make sense to actually dereference the pointer, because the type of the pointed-to object is already known. "Evaluated" here means that foo needs to be evaluated, as in sizeof(double[foo]).Ellieellinger
@Ellieellinger It might not make sense to, but that's what the standard currently says.Ruvalcaba
Note that the part to be evaluated is the focus.Ellieellinger
E.g. check the type of *bar => the type is double[foo] => evaluate foo (the operand is evaluated, but using the value foo had at the time bar was declared).Ellieellinger
@Ellieellinger The argument is really about what the standard says, not what practically happens.Ruvalcaba
Why bring up the standard? The compiler does not need to actually perform a dereference in order to determine the type. sizeof(*bar) and sizeof(double[foo]) are the same; that is all I have to say.Ellieellinger
"The size is determined from the type of the operand." => the type is double[foo] => "If the type of the operand is a variable length array type, the operand is evaluated; " This is the same as sizeof(double[foo]). It is necessary to evaluate foo. There is no need to actually dereference in the first step.Ellieellinger
5.1.2.3/4: "In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."Ellieellinger
@BLUEPIXY: Yeah, that's why it definitely works in practice on compilers that aren't evil and/or totally stupid. But apparently a DeathStation 9000 C implementation might be allowed to break code that depended on it, because of the way the C standard words it.Preparative
There is a proposal that addresses this issue. See www9.open-std.org/JTC1/SC22/WG14/www/docs/n2838.htmAnhwei
C
16

Yes, this causes undefined behaviour.

In N1570 6.5.3.4/2 we have:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
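The two branches of that paragraph can be observed through side effects in the operand. The following is only a minimal sketch: the function names are hypothetical, and it assumes a compiler with VLA support (C99, or C11 without `__STDC_NO_VLA__`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies sizeof to a non-VLA operand.
   The operand i++ has type int, so it is not evaluated. */
int counter_after_sizeof_int(void)
{
    int i = 0;
    size_t s = sizeof(i++);        /* operand not evaluated: i stays 0 */
    assert(s == sizeof(int));
    return i;
}

/* Hypothetical helper: applies sizeof to a VLA type name.
   The size expression n++ has to be evaluated to obtain the size. */
int counter_after_sizeof_vla_type(void)
{
    int n = 3;
    size_t s = sizeof(char[n++]);  /* evaluated: size is 3, n becomes 4 */
    assert(s == 3);
    return n;
}
```

Neither helper dereferences anything, so this sketch sidesteps the question of whether sizeof *bar is undefined; it only illustrates that the standard treats the two kinds of operand differently.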

Now we have the question: is the type of *bar a variable length array type?

Since bar is declared as pointer to VLA, dereferencing it should yield a VLA. (But I do not see concrete text specifying whether or not it does).

Note: Further discussion could be had here; perhaps it could be argued that *bar has type double[100], which is not a VLA.

Supposing we agree that the type of *bar is actually a VLA type, then in sizeof *bar, the expression *bar is evaluated.

bar is indeterminate at this point. Now looking at 6.3.2.1/1:

if an lvalue does not designate an object when it is evaluated, the behavior is undefined

Since bar does not point to an object (by virtue of being indeterminate), evaluating *bar causes undefined behaviour.

Cabin answered 7/10, 2015 at 6:59 Comment(21)
I am still confused, is the type of *bar VLA? Why would it be evaluated if its type is not VLA? Can you clarify that?Menke
@iharob well, double (*bar)[foo]; declares bar as pointer to VLA, therefore dereferencing it should give a VLA. It'd be good if we could find concrete text to confirm thatCabin
So this means that if one does this: int vla[size]; int *pointer = vla; printf("%zu\n", sizeof(*pointer)); will not be undefined but printf("%zu\n", sizeof(*vla)); will?Menke
I completely misunderstood this as you can see from the comment above ... Sorry.Menke
After int *pointer, the type of pointer is int *, which is not a VLA type. *vla has type int, which is not a VLA type, so that is not evaluated either.Cabin
@Cabin In my simple case, the compiler may even have optimized *bar to be double[100], but there are probably more complex cases where that wouldn't happen.Ruvalcaba
I see your point now. However, I still see no reason to evaluate the operand. C does not store the length of an array in the array, so the compiler has to keep track of the size elsewhere anyway. For int *i; sizeof(*i), there is also no need to evaluate *i and that is a well accepted construct. I rather suspect some kind of misphrasing: "the operand is evaluated" should be "its operand ..." (or the like). Note that gcc actually does behave as I would expect. Just a simple thought: Why would it not suffice only to evaluate the index of the operand, instead of the whole operand?Goldplate
Note that by "evaluate" I mean dereferencing (which includes an access), i.e. runtime code. Of course the compiler has to parse the operand to detect the VLA, as much as for a struct pointer. But this would not be a problem; just the access due to dereferencing would be.Goldplate
Curious discussion, I haven't ever considered that this could be UB. Reading 6.7.6 declarators, there's this statement: A full declarator is a declarator that is not part of another declarator. The end of a full declarator is a sequence point. If, in the nested sequence of declarators in a full declarator, there is a declarator specifying a variable length array type, the type specified by the full declarator is said to be variably modified. Furthermore, any type derived by declarator type derivation from a variably modified type is itself variably modified.Theis
If I understand that correctly, the pointer in this case would be regarded as the type "pointer to a VLA".Theis
@Olaf "evaluate" and "dereference" are different; e.g. if p is a pointer then p; evaluates p but doesn't dereference it. The difference may be more apparent if you replace p with a function that returns a pointer. Then the function is called.Cabin
@Theis Yeah. It still seems unclear to me exactly what the type of *bar is. Are double[100] (vla) and double[100] (not vla) different types?Cabin
@Cabin According to the cited text, those two array declarations are definitely different types. For the type system to be consistent, I would then assume that an array pointer to a variably modified array is a type of its own.Theis
@M.M: They have to. It becomes clearer with a 2D VLA. Here, the inner dimension is required to multiply the index of the outer dim. For most architectures, different code has to be generated and different metadata has to be stored. Actually, the non-VLAs int a[3], b[5] are also different types. If bounds checking is available (optional in C11), or for cases like this, the compiler also has to store the size for a later sizeof.Goldplate
@M.M: If we consider that "evaluate" does not imply dereferencing it, then where is the problem? It is just the dereferencing which invokes UB. Note that if you have something like sizeof(f(4)), f need not be called either. It is just the return type, which is given with the declaration of f. Note that the expression (i.e. f(4)) is not evaluated according to the cited paragraph. And 6.5.3.4p4 does not exclude VLAs either.Goldplate
@Olaf evaluating f does not dereference f, but evaluating *f doesCabin
@M.M: With the same explanation int *f; sizeof(*f); would have to. But that is perfectly legal according to the cited paragraph and not evaluated.Goldplate
@Olaf I don't know why you keep bringing int *f; sizeof(*f); up. sizeof does not evaluate the operand when the operand is not a VLA, so *f is not evaluated there.Cabin
@M.M: I know that very well! And exactly that is the question: Why would these two cases have to be treated differently? The VLA expression has to be evaluated, but not the (nested) expression yielding the VLA: *f need not be evaluated, but only parsed to get the resulting type (as for the int *), which is a VLA, which then has to be evaluated (i.e. the index). If *f were not evaluated, there would be no UB.Goldplate
*f yields a VLA. "parsed to get the resulting type" is not part of the standard. Either *f is evaluated or it isn't, there is no middle ground. The two cases are treated differently because the standard says that there are two different cases (VLA and not-VLA).Cabin
I think that this is the correct answer. If we go by the letter of the standard, this behavior is technically undefined.Ruvalcaba
B
16

Two other answers have already quoted N1570 6.5.3.4p2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

According to that paragraph from the standard, yes, the operand of sizeof is evaluated.

I'm going to argue that this is a defect in the standard; something is evaluated at run time, but the operand is not.

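There are proposals to fix this: N2838 (www9.open-std.org/JTC1/SC22/WG14/www/docs/n2838.htm) and N3187 (open-std.org/jtc1/sc22/wg14/www/docs/n3187.htm).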

Let's consider a simpler example:

int len = 100;
double vla[len];
printf("sizeof vla = %zu\n", sizeof vla);

According to the standard, sizeof vla evaluates the expression vla. But what does that mean?

In most contexts, evaluating an array expression yields the address of the initial element -- but the sizeof operator is an explicit exception to that. We might assume that evaluating vla means accessing the values of its elements, which has undefined behavior since those elements have not been initialized. But there is no other context in which evaluation of an array expression accesses the values of its elements, and absolutely no need to do so in this case. (Correction: If a string literal is used to initialize an array object, the values of the elements are evaluated.)

When the declaration of vla is executed, the compiler will create some anonymous metadata to hold the length of the array (it has to, since assigning a new value to len after vla is defined and allocated doesn't change the length of vla). All that has to be done to determine sizeof vla is to multiply that stored value by sizeof (double) (or just to retrieve the stored value if it stores the size in bytes).
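A small sketch of that point: the size of vla is fixed when its declaration is executed, so a later assignment to len does not change what sizeof vla reports. The function name is hypothetical, len is assumed to be positive, and VLA support is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: declares a VLA of len doubles, changes len
   afterwards, and returns sizeof vla as measured after the change.
   Assumes len > 0. The array elements are never read. */
size_t vla_size_after_len_change(int len)
{
    double vla[len];            /* length is fixed here */
    size_t before = sizeof vla;

    len *= 2;                   /* does not resize vla */
    size_t after = sizeof vla;

    assert(before == after);
    return after;
}
```

Calling vla_size_after_len_change(100) should yield 100 * sizeof (double), which is consistent with the compiler keeping the length in hidden storage rather than re-reading len.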

sizeof can also be applied to a parenthesized type name:

int len = 100;
printf("sizeof (double[len]) = %zu\n", sizeof (double[len]));

According to the standard, the sizeof expression evaluates the type. What does that mean? Clearly it has to evaluate the current value of len. Another example:

size_t func(void);
printf("sizeof (double[func()]) = %zu\n", sizeof (double[func()]));

Here the type name includes a function call. Evaluating the sizeof expression must call the function.
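To make the call visible, the sketch below counts how often the function runs. The counter, the return value 10, and the wrapper name are assumptions for illustration only; the wrapper resets the counter on each call.

```c
#include <assert.h>
#include <stddef.h>

static int calls = 0;           /* hypothetical call counter */

/* Stand-in for func() from the text; returns an arbitrary length. */
static size_t func(void)
{
    calls++;
    return 10;
}

/* Returns how many times func() ran while evaluating the sizeof. */
int calls_made_by_sizeof(void)
{
    calls = 0;
    size_t s = sizeof (double[func()]);   /* func() must run to get the size */
    assert(s == 10 * sizeof (double));
    return calls;
}
```

The only thing evaluated here is the size expression inside the type name; no array object exists at all, so there are no elements that could be accessed.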

But in all of these cases, there's no actual need to evaluate the elements of the array object (if there is one), and no point in doing so.

sizeof applied to anything other than a VLA can be evaluated at compile time. The difference when sizeof is applied to a VLA (either an object or a type) is that something has to be evaluated at run time. But the thing that has to be evaluated is not the operand of sizeof; it's just whatever is needed to determine the size of the operand, which is never the operand itself.

The standard says that the operand of sizeof is evaluated if that operand is of variable length array type. That's a defect in the standard.

Getting back to the example in the question:

int foo = 100;
double (*bar)[foo] = NULL;
printf("sizeof *bar = %zu\n", sizeof *bar);

I've added an initialization to NULL to make it even clearer that dereferencing bar has undefined behavior.

*bar is of type double[foo], which is a VLA type. In principle, *bar is evaluated, which would have undefined behavior since bar is a null pointer here (and uninitialized in the question). But again, there is no need to dereference bar. The compiler will generate some code when it processes the type double[foo], including saving the value of foo (or foo * sizeof (double)) in an anonymous variable. All it has to do to evaluate sizeof *bar is to retrieve the value of that anonymous variable. And if the standard were updated to define the semantics of sizeof consistently, it would be clear that evaluating sizeof *bar is well defined and yields 100 * sizeof (double) without having to dereference bar.
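Until the wording is settled, one way to stay clear of the disputed case is to make bar point at a real object before applying sizeof to *bar. This is only a sketch under that assumption; the function name is hypothetical and foo is assumed to be positive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns sizeof *bar where bar points to a
   valid, allocated array of foo doubles. Returns 0 if allocation fails. */
size_t size_through_valid_pointer(int foo)
{
    assert(foo > 0);

    /* bar has type "pointer to VLA of foo doubles". */
    double (*bar)[foo] = malloc(sizeof (double[foo]));
    if (bar == NULL)
        return 0;

    /* Even if *bar is evaluated, it designates an object here. */
    size_t s = sizeof *bar;

    free(bar);
    return s;
}
```

With a valid pointer the literal reading of the standard and the behavior of real compilers agree, so size_through_valid_pointer(100) should give 100 * sizeof (double) either way.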

Byline answered 7/10, 2015 at 15:36 Comment(20)
This is an interesting point of view, but can you really argue that the standard is incorrect? I mean, isn't the standard what makes C, C? You could argue certain laws aren't justified or even really practiced, but it's still the law.Ruvalcaba
@PCLuddite: My argument is that this is a defect in the standard. Taken literally, it imposes a requirement that is not necessary, that makes no sense, and that may be impossible in some cases (there's no definition of what it means to "evaluate" a type name), and that I'm reasonably sure does not reflect the intent of the authors. See here for a list of defect reports against the C11 standard. In any case, I'll re-phrase my answer slightly to say it's a defect rather than that it's "incorrect".Byline
"changing the value of len doesn't change the length of vla"? You mean "does". Thanks for the text. It actually reflects what I think, but I could not have expressed it so well in English. The problem is that the standard should differentiate between run-time evaluation (i.e. accessing items) and compile-time evaluation (i.e. parsing to get the "meta-data", e.g. type and index expression). There is interesting information in the gcc doc: "The length of an array is computed once ... remembered ... sizeof.".Goldplate
@PCLuddite: That would be neither the first nor the last paragraph which has to be corrected by a corrigendum. Just read the WG14 documents released after release of the C11 standard.Goldplate
@Olaf: No, I mean that changing the value of len doesn't change the length of vla. I meant that given int len = 100; int vla[len]; len = 200;, changing the value of len to 200 doesn't affect the length of vla. I've updated my answer to clarify that.Byline
I see now. Yes, that is exactly what gcc does according to the documentation (sorry, had to shorten quite a lot to fit into a comment) and which looks like the most reasonable approach. I wonder if there is a way to send a link to this to the C WG14 standardisation committee. I did not find any request for clarification for this (they are quite hidden anyway).Goldplate
@Olaf True, but until a correction is released, it's still standard.Ruvalcaba
@Olaf: I started a discussion on this point on the comp.std.c newsgroup in 2012; the Google Groups archive is here. It didn't really go anywhere -- and the C standard committee doesn't have any official affiliation with comp.std.c anyway.Byline
@PCLuddite: Honestly: If the standard contradicts common sense and I use a compiler which actually does follow common sense, is widely used, well supported and does not lock me to a specific vendor, I do not wait until the committee straightens out things they have had >16 years (or 12 years, counting the time between the last two releases) to fix (that was the polite version of my actual thoughts). Panta rhei.Goldplate
It seems to me that the root of the defect is : what is the type of *bar ? (or any expression whose type is an array that was declared as a VLA, but does not actually name that array).Cabin
@M.M: I disagree. The type of *bar is a VLA. The type is not bound to the name, but the object. But in C, this binding is not stored with the object itself, but only known to the compiler, so there is never need to access the object itself to get its type, but that can always be deduced by parsing the expression and using the stored meta-data. And this is where the defect is: "evaluate" here should mean "parsing" or "evaluate to get the type", not read the value. Only the (unnecessary) latter actually constitutes UB.Goldplate
I was about to post the very question of whether it is really UB to use, say sizeof vla / sizeof vla[0] when vla[] contains only indeterminate values, when I came across this answer. It seems that the committee still hasn't seen fit to address this in the C17 Standard.Selfcommand
Why is *bar "of type int[foo]" if it is being declared as double *bar[foo]? Would the type be double[foo]?Mccrary
@OmarDarwish: Yes. I've fixed it. Thanks for finding this error.Byline
I guess the intention of the standard was to address expressions like sizeof(int[fun()]), not typedef int X[foo()]; sizeof(X); X x; sizeof x;. AFAIK there are no VLA compound literals. Thus it is not possible to form an expression within which a VLA type is defined, am I right? If so, then the evaluation is pointless whenever a sizeof expression is used, because expressions that affect the size of a VLA are only evaluated when a VLA type is introduced.Anhwei
@Anhwei sizeof(int[fun()]) is an expression within which a VLA type is defined. You're right, there are no VLA compound literals. gcc, clang, and even tcc handle your example typedef int X[foo()]; sizeof(X); X x; sizeof x; correctly. I would argue they do so by doing what the standard should have said, not what it actually does say. (foo() is called when the typedef is reached.)Byline
@KeithThompson, by the expression I meant the operand of sizeof. int[fun()] has no value, it's not even an expression, is it? After a bit of research, I found *(int(*)[fun()])XXX. A pointer XXX is cast to a newly introduced VM type and dereferenced. To make things even more confusing, GCC evaluates *(int(*)[fun()])XXX while not evaluating (int(*)[fun()])XXX and &*(int(*)[fun()])XXX. It makes 6.5.3.4p2 even more nonsensical.Anhwei
There is a proposal that addresses this issue. It must be a defect in the standard. See www9.open-std.org/JTC1/SC22/WG14/www/docs/n2838.htmAnhwei
@Anhwei See also open-std.org/jtc1/sc22/wg14/www/docs/n3187.htm (I've added both links to my answer).Byline
Wow. Thank you for the link. I will read it carefully. It is great that the issue is noticed and is addressed by the committee.Anhwei
M
5

Indeed, the Standard seems to imply that the behaviour is undefined.

Re-quoting N1570 6.5.3.4/2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

I think the wording from the Standard is confusing: "the operand is evaluated" does not mean that *bar will be evaluated. Evaluating *bar does not in any way help compute its size. sizeof(*bar) does need to be computed at run time, but the code generated for this has no need to dereference bar; it will more likely retrieve the size information from a hidden variable holding the result of the size computation at the time of bar's instantiation.

Meliamelic answered 7/10, 2015 at 8:30 Comment(1)
I agree. Note that gcc seems to behave as expected (once more).Goldplate
