Does not evaluating the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?

First of all, I've seen this question about C99, whose accepted answer references the "operand is not evaluated" wording in the C99 Standard draft. I'm not sure that answer applies to C++03. There's also this question about C++, whose accepted answer cites similar wording as well as the wording "In some contexts, unevaluated operands appear. An unevaluated operand is not evaluated."

I have this code:

 int* ptr = 0;
 void* buffer = malloc( 10 * sizeof( *ptr ) );

The question is - is there a null pointer dereference (and so UB) inside sizeof()?

C++03 5.3.3/1 says: "The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id."

The linked answers cite this or similar wording and use the "is not evaluated" part to deduce that there's no UB.

However, I cannot find where exactly the Standard links evaluation to having or not having UB in this case.

Does "not evaluating" the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?

Maidenly answered 25/2, 2015 at 8:4 Comment(15)
@ParkYoung-Bae That doesn't really help much.Maidenly
@juanchopanza May I wonder where exactly that question has an answer to mine? Thank you.Maidenly
@ParkYoung-Bae Seriously? Here's what it quotes: "In some contexts, unevaluated operands appear. An unevaluated operand is not evaluated."Maidenly
I re-opened, but I'm not sure the standard has to explicitly say an unevaluated operand does not cause UB.Flummery
@Flummery Sure it should do so, perhaps it is a chain of statements.Maidenly
Then it would be something about the chain of statements that causes UB.Flummery
IMVHO behavior (whether defined or not) takes place during execution, so can only be caused by executing a statement or evaluating an expression. An expression can not cause any behavior if it is not evaluated, like in if(0) { int*p; *p = 0;}.Lives
A better example would be sizeof (1/0). The standard is unclear on whether dereferencing a null pointer is defined, and consensus seems to be that it should be defined, although subsequently applying the lvalue-to-rvalue conversion would be undefined.Mesognathous
@hvd : but then §4.1 says : "When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed."Grantham
@SanderDeDycker Right. So it wouldn't demonstrate the problem the OP is asking about.Mesognathous
Well, in C++03, in order to prove that dereferencing a pointer leads to UB, you want to prove there is an lvalue-to-rvalue conversion. Unfortunately, this aspect of the standard is under-specified, but it is not a large leap to say that if an expression is unevaluated then there is no lvalue-to-rvalue conversion either.Deucalion
@ShafikYaghmour In C++03, [conv.lval]/2 states "When an lvalue-to-rvalue conversion occurs within the operand of sizeof the value contained in the referenced object is not accessed, since that operator does not evaluate its operand." C++11 contains a similar specification, but using the more general unevaluated operands instead of sizeof.Peon
@Peon I see that now, I did not have the time to look it up when I made the comment. Seems like it is saying that even though conceptually an lvalue-to-rvalue conversion is happening it does not count as an access but I am not totally sure. Which would be consistent with the logic used in issue 232Deucalion
Consider this: int* ptr = 0; if (0) *ptr; The expression *ptr is not evaluated because of the if, so there's no undefined behavior. Similarly, sizeof(*ptr) has no undefined behavior because *ptr is not evaluated (and the standard explicitly says it's not evaluated). I presume you find the if case unambiguous. Why is the sizeof case troubling?Calculation
@KeithThompson The subjective difference is that in the if example the statement is simply unreachable whereas in the sizeof example it is not necessarily unreachable.Maidenly
D
11

I believe this is currently underspecified in the standard, like many issues such as What is the value category of the operands of C++ operators when unspecified?. I don't think it was intentional; as hvd points out, it is probably obvious to the committee.

In this specific case I think we have evidence of what the intention was. It comes from the GB 91 comment from the Rapperswil meeting, which says:

It is mildly distasteful to dereference a null pointer as part of our specification, as we are playing on the edges of undefined behaviour. With the addition of the declval function template, already used in these same expressions, this is no longer necessary.

The comment also suggested an alternate expression. It refers to this expression, which is no longer in the standard but can be found in N3090:

noexcept(*(U*)0 = declval<U>())

The suggestion was rejected on the grounds that the expression does not invoke undefined behavior, because it is unevaluated:

There is no undefined behavior because the expression is an unevaluated operand. It's not at all clear that the proposed change would be clearer.

This rationale applies to sizeof as well, since its operand is unevaluated.
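To make that concrete, here is a minimal sketch, assuming a C++11 compiler (for static_assert, noexcept and std::declval). The alias U and the helper size_via_null_deref are names I made up for illustration. Both checks are resolved at compile time, which is consistent with the operand never being evaluated:

```cpp
#include <cstddef>
#include <utility>  // std::declval

typedef int U;  // hypothetical stand-in for the U in the N3090 expression

// Both operands are unevaluated: the compiler only looks at the type of the
// expression (and, for noexcept, at whether the operations could throw).
static_assert(sizeof(*(U*)0) == sizeof(U),
              "only the type of *(U*)0 is used");
static_assert(noexcept(*(U*)0 = std::declval<U>()),
              "assigning one int to another cannot throw");

// Hypothetical helper: the result is a compile-time constant, so nothing is
// dereferenced when this function runs.
std::size_t size_via_null_deref() {
    return sizeof(*(U*)0);
}
```

This shows how compilers treat the expression; it is not a substitute for wording in the standard.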

I say underspecified, but I wonder if this is covered by section 4.1 [conv.lval], which says:

The value contained in the object indicated by the lvalue is the rvalue result. When an lvalue-to-rvalue conversion occurs within the operand of sizeof (5.3.3) the value contained in the referenced object is not accessed, since that operator does not evaluate its operand.

It says the value contained is not accessed, which, if we follow the logic of issue 232, means there is no undefined behavior:

In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior

This is somewhat speculative since the issue is not settled yet.

Deucalion answered 25/2, 2015 at 15:56 Comment(0)
A
6

Since you explicitly asked for standard references - [expr.sizeof]/1:

The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id.

[expr]/8:

In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7, 7.1.6.2). An unevaluated operand is not evaluated.

Because the expression (i.e. the dereference) is never evaluated, it is not subject to some constraints that it would normally violate. Only its type is inspected. In fact, the standard itself dereferences null pointers in an example in [dcl.fct]/12:

A trailing-return-type is most useful for a type that would be more complicated to specify before the declarator-id:

template <class T, class U> auto add(T t, U u) -> decltype(t + u);

rather than

template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);

— end note ]
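As a rough check of that note, the sketch below defines both forms and asserts that they deduce the same return type. It assumes C++11; the names add_trailing and add_leading are mine, chosen only so both declarations can coexist in one file:

```cpp
#include <type_traits>

// Trailing-return-type form from the note (hypothetical name).
template <class T, class U>
auto add_trailing(T t, U u) -> decltype(t + u) { return t + u; }

// Leading form from the note: the null pointers are dereferenced only inside
// decltype, which is an unevaluated operand, so only the type of the sum is used.
template <class T, class U>
decltype((*(T*)0) + (*(U*)0)) add_leading(T t, U u) { return t + u; }

// int + double has type double in both spellings.
static_assert(std::is_same<decltype(add_trailing(1, 2.5)), double>::value,
              "trailing form deduces double");
static_assert(std::is_same<decltype(add_leading(1, 2.5)), double>::value,
              "leading form deduces double");
```

Calling either function, e.g. add_leading(1, 2.5), just performs the addition; the null pointer expressions exist only in the declaration.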

Amosamount answered 25/2, 2015 at 9:1 Comment(4)
I've cited all of the above in my question. There's a missing link between "evaluation" and causing behavior.Maidenly
Thanks. It looks reasonable, although I agree with sharptooth that it misses a step in the reasoning. It's possible that it's so obvious to the authors of the standard that they didn't bother spelling it out.Mesognathous
@hvd Well, I actually just jumped to the first answer and then directly started writing my own - I misunderstood what exactly the asker wanted a proof for (maybe I should start reading questions more thoroughly). And I do believe that you won't find a single sentence actually confirming that unevaluated code can do certain things without inducing UB.Amosamount
If the unevaluated operands "caused behaviour" then decltype(t + u) would cause UB because t and u are not initialized yet...Complementary
A
4

The specification only says that dereferencing a pointer that is NULL is UB. Since sizeof is not a real function, and it doesn't use its operand for anything other than getting the type, it never dereferences the pointer. That's WHY it works. Someone else can get all the points for looking up the spec wording that states that "the argument to sizeof doesn't get referenced".

Note that it's also entirely legal to do int arr[2]; size_t s = sizeof(arr[-111100000]); too - it doesn't matter what the index is, because sizeof never actually "does anything" to the argument passed.
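A small sketch of the array case, assuming nothing beyond standard C++; element_size and same_size_as_int are hypothetical names. The fact that the result can be used as an array bound suggests it is computed entirely at compile time:

```cpp
#include <cstddef>

int arr[2];

// The index is wildly out of range, but sizeof only needs the type of
// arr[...], which is int, so no memory is accessed.
std::size_t element_size() {
    return sizeof(arr[-111100000]);
}

// The same expression is a constant expression, so it can size another array.
char same_size_as_int[sizeof(arr[-111100000])];
```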

Another example to show how it's "not doing anything" would be something like this:

int func()
{
    int *ptr = reinterpret_cast<int*>(32);
    *ptr = 7;
    return 42;
}

size_t size = sizeof(func()); 

Again, this wouldn't crash, because func() is just resolved by the compiler to the type that it produces.
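One way to observe this without relying on a crash is to give the function a visible side effect. This is only a sketch: call_count and size_of_call are names I've invented, and the function body is deliberately harmless so that it can also be called normally:

```cpp
#include <cstddef>

int call_count = 0;  // hypothetical counter, bumped on every real call

int func() {
    ++call_count;
    return 42;
}

// sizeof(func()) only needs the return type of func, so func is not called
// and call_count stays untouched.
std::size_t size_of_call() {
    return sizeof(func());
}
```

After calling size_of_call(), call_count should still be 0, and the returned value should equal sizeof(int).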

Equally, if sizeof actually "does something" with the argument, what would happen when you do this:

   size_t size = sizeof(char[10000000000]);

Would it create a 10000000000-byte stack allocation, then give the size back after it crashed the code because there aren't enough megabytes of stack? [In some systems, stack size is counted in bytes, not megabytes]. And whilst nobody writes code like that, you could easily come up with something similar using a typedef of buffer_type as an array of char, or some kind of struct with large content.
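The typedef variant could look roughly like this. It is a sketch under the assumption that a 1 GiB array type is representable on the target (I picked 1 GiB rather than the figure above so it also fits a 32-bit size_t); buffer_type_size is a hypothetical helper name:

```cpp
#include <cstddef>

// A 1 GiB array type. No object of this type is ever created; the typedef
// only names the type.
typedef char buffer_type[1024 * 1024 * 1024];

// sizeof works on the type alone, so nothing is allocated on the stack or
// anywhere else.
std::size_t buffer_type_size() {
    return sizeof(buffer_type);
}
```

Since sizeof(char) is 1 by definition, the function returns 1073741824 without any buffer of that size ever existing.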

Archoplasm answered 25/2, 2015 at 8:49 Comment(14)
What I really miss is the Standard wording which backs this "doesn't do anything" claim.Maidenly
@Maidenly Isn't that what "not evaluated" means?Flummery
@Flummery I don't know and I cannot find the answer in The Standard and that's why I asked the question in the first place.Maidenly
So, are you doubting that what I'm saying is correct, or just doubting that the standard states it somewhere? Or somehow thinking that "Maybe, out there somewhere, is a compiler that generates code and accesses the content of the thing passed into sizeof"?Archoplasm
@MatsPetersson I simply cannot find the wording in the Standard. Yes, your claims sound reasonable, but they are just your opinion unless they are backed by Standard wording.Maidenly
Well, vote for the answer above that quotes the spec then, because it clearly says "unevaluated" - my reading of that is "there is no code generated from the expression itself", since that would just lead to completely impossible situations.Archoplasm
"Again, this wouldn't crash" -- Right. It can't crash if it doesn't compile. :) "error: invalid conversion from ‘int’ to ‘int*’ [-fpermissive]" (GCC) or "error: cannot initialize a variable of type 'int *' with an rvalue of type 'int'" (clang) Even if func is not called, it still has to compile in order for the program to be valid.Mesognathous
Edited to have a reinterpreted cat - eh - cast in there.Archoplasm
@MatsPetersson Won't fly - should be reinterpret_cast<int*>(32).Maidenly
Isn't that what I've just added? (Subject to spelling?)Archoplasm
@MatsPetersson No, you didn't include the parentheses.Mesognathous
Now you are being pedantic. It was meant as a "look at this silly code", not as "please compile and run this, it works".Archoplasm
Then just go back to the previous version of your answer. That was still wrong, but at least some compilers can be convinced to accept that.Mesognathous
My favorite example is sizeof(1/0), which yields sizeof(int) rather than crashing with a divide-by-zero :)Directrix
