Why do constant expressions have an exclusion for undefined behavior?
I was researching what is allowed in a core constant expression, which is covered in section 5.19 Constant expressions, paragraph 2, of the draft C++ standard. It says:

A conditional-expression is a core constant expression unless it involves one of the following as a potentially evaluated subexpression (3.2), but subexpressions of logical AND (5.14), logical OR (5.15), and conditional (5.16) operations that are not evaluated are not considered [ Note: An overloaded operator invokes a function.—end note ]:

It then lists the exclusions in the bullets that follow, which include (emphasis mine):

an operation that would have undefined behavior [ Note: including, for example, signed integer overflow (Clause 5), certain pointer arithmetic (5.7), division by zero (5.6), or certain shift operations (5.8) —end note ];

Huh? Why do constant expressions need this clause to cover undefined behavior? Is there something special about constant expressions that requires undefined behavior to have a special carve out in the exclusions?

Does having this clause give us any advantages or tools we would not have without it?

For reference, this looks like the last revision of the proposal for Generalized Constant Expressions.
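To make the two quoted rules concrete, here is a minimal sketch. The helper names `checked_div` and `increment_is_positive` are hypothetical and not taken from the standard or the proposal. It shows that a subexpression with undefined behavior does not disqualify a constant expression as long as it sits in a branch of `?:` or `&&` that is not evaluated.

```cpp
#include <limits>

// Hypothetical helper: when d == 0 the division sits in the branch of ?:
// that is not evaluated, so it is not considered.
constexpr int checked_div(int n, int d)
{
    return d != 0 ? n / d : 0;
}

// Hypothetical helper: && short-circuits, so v + 1 is only evaluated
// when it cannot overflow.
constexpr bool increment_is_positive(int v)
{
    return v < std::numeric_limits<int>::max() && v + 1 > 0;
}

static_assert(checked_div(7, 2) == 3, "ordinary division is constant");
static_assert(checked_div(7, 0) == 0, "the skipped division is not considered");
static_assert(!increment_is_positive(std::numeric_limits<int>::max()),
              "the skipped addition is not considered");

// By contrast, an initializer that actually evaluates the division by zero
// falls under the quoted exclusion and is not a constant expression:
// constexpr int bad = 7 / 0;
```

The guards here only cover the cases named in the comments; `checked_div` does not, for example, guard against dividing the minimum `int` by -1.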

Heathen answered 23/1, 2014 at 21:12 Comment(3)
I think any implementation is free to accept ill-formed programs (that then have undefined behaviour). I.e. an implementation could accept constant expressions with UB as an extension (e.g. with implementation-defined semantics instead of UB). By specifying the program as ill-formed, a diagnostic message is required from the implementation.Cavin
@dyp: But the reason that diagnostics aren't required for UB in general is that it can be very difficult to diagnose.Skewbald
@Cavin the standard definitely is not clear on that, although the comment by Howard I linked to seems to indicate it is the expected behavior.Heathen
H
43

The wording is actually the subject of defect report #1313 which says:

The requirements for constant expressions do not currently, but should, exclude expressions that have undefined behavior, such as pointer arithmetic when the pointers do not point to elements of the same array.

The resolution is the wording we have now, so the exclusion was clearly intended. What tools does it give us?

Let's see what happens when we try to create a constexpr variable with an expression that contains undefined behavior. We will use clang for all of the following examples. This code (see it live):

#include <limits>

constexpr int x = std::numeric_limits<int>::max() + 1 ;

produces the following error:

error: constexpr variable 'x' must be initialized by a constant expression
    constexpr int x = std::numeric_limits<int>::max() + 1 ;
                  ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: value 2147483648 is outside the range of representable values of type 'int'
    constexpr int x = std::numeric_limits<int>::max() + 1 ;
                                       ^

This code (see it live):

constexpr int x = 1 << 33 ;  // Assuming 32-bit int

produces this error:

error: constexpr variable 'x' must be initialized by a constant expression
    constexpr int x = 1 << 33 ;  // Assuming 32-bit int
             ^   ~~~~~~~
note: shift count 33 >= width of type 'int' (32 bits)
    constexpr int x = 1 << 33 ;  // Assuming 32-bit int
                  ^

and this code which has undefined behavior in a constexpr function:

constexpr const char *str = "Hello World" ;      

constexpr char access( int index )
{
    return str[index] ;
}

int main()
{
    constexpr char ch = access( 20 ) ;
}

produces this error:

error: constexpr variable 'ch' must be initialized by a constant expression
    constexpr char ch = access( 20 ) ;
                   ^    ~~~~~~~~~~~~

 note: cannot refer to element 20 of array of 12 elements in a constant expression
    return str[index] ;
           ^

Well, that is useful: the compiler can detect undefined behavior in a constexpr context, or at least what clang believes is undefined. Note that gcc behaves the same except in the case of undefined right and left shifts: gcc will usually produce a warning in these cases but still treats the expression as constant.
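One practical way to lean on this is to force evaluation with `static_assert`, which turns the check into a compile-time test. The following is only a sketch: `midpoint` is a hypothetical helper, and it assumes `0 <= lo <= hi`.

```cpp
#include <limits>

// Hypothetical helper: midpoint of lo and hi, assuming 0 <= lo <= hi.
// Under that assumption hi - lo cannot overflow. The naive (lo + hi) / 2
// would overflow for the second check below, and a compiler that rejects
// undefined behavior in constant expressions would then refuse to compile
// the static_assert.
constexpr int midpoint(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}

constexpr int big = std::numeric_limits<int>::max();

static_assert(midpoint(0, 10) == 5, "small values");
static_assert(midpoint(big - 2, big) == big - 1, "values near the maximum");
```

Because a `static_assert` condition must be a constant expression, any undefined operation reached while evaluating it should surface as a compile error rather than as a silently wrong value.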

We can use this functionality via SFINAE to detect whether an addition expression would overflow. The following contrived example was inspired by dyp's clever answer here:

#include <iostream>
#include <limits>

template <typename T1, typename T2>
struct addIsDefined
{
     template <T1 t1, T2 t2>
     static constexpr bool isDefined()
     {
         return isDefinedHelper<t1,t2>(0) ;
     }

     template <T1 t1, T2 t2, decltype( t1 + t2 ) result = t1+t2>
     static constexpr bool isDefinedHelper(int)
     {
         return true ;
     }

     template <T1 t1, T2 t2>
     static constexpr bool isDefinedHelper(...)
     {
         return false ;
     }
};


int main()
{    
    std::cout << std::boolalpha <<
      addIsDefined<int,int>::isDefined<10,10>() << std::endl ;
    std::cout << std::boolalpha <<
     addIsDefined<int,int>::isDefined<std::numeric_limits<int>::max(),1>() << std::endl ;
    std::cout << std::boolalpha <<
      addIsDefined<unsigned int,unsigned int>::isDefined<std::numeric_limits<unsigned int>::max(),std::numeric_limits<unsigned int>::max()>() << std::endl ;
}

which results in (see it live):

true
false
true

It is not evident that the standard requires this behavior, but this comment by Howard Hinnant suggests that it does:

[...] and is also constexpr, meaning UB is caught at compile time
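For comparison, the same question ("would this addition overflow?") can also be answered with plain arithmetic, without depending on how a given compiler treats undefined behavior during substitution. This is a different technique from the SFINAE example above, and `add_is_representable` is a hypothetical name.

```cpp
#include <limits>

// Hypothetical helper: true when a + b fits in int.
// Neither subtraction below can overflow: max - b is only computed for
// b >= 0, and min - b only for b < 0.
constexpr bool add_is_representable(int a, int b)
{
    return b >= 0 ? a <= std::numeric_limits<int>::max() - b
                  : a >= std::numeric_limits<int>::min() - b;
}

static_assert(add_is_representable(10, 10), "");
static_assert(!add_is_representable(std::numeric_limits<int>::max(), 1), "");
static_assert(!add_is_representable(std::numeric_limits<int>::min(), -1), "");
```

The arithmetic version has to be written per operation and per type, whereas the SFINAE version asks the compiler directly. The trade-off is that the SFINAE version only works where compilers agree on what counts as a constant expression.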

Update

Somehow I missed Issue 695, Compile-time calculation errors in constexpr functions, which revolves around the wording of section 5 paragraph 4, which used to say (emphasis mine going forward):

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression appears where an integral constant expression is required (5.19 [expr.const]), in which case the program is ill-formed.

and goes on to say:

intended as an acceptable Standardese circumlocution for “evaluated at compile time,” a concept that is not directly defined by the Standard. It is not clear that this formulation adequately covers constexpr functions.

and a later note says:

[...]There is a tension between wanting to diagnose errors at compile time versus not diagnosing errors that will not actually occur at runtime.[...]The consensus of the CWG was that an expression like 1/0 should simply be considered non-constant; any diagnostic would result from the use of the expression in a context requiring a constant expression.

which, if I am reading it correctly, confirms the intention was to be able to diagnose undefined behavior at compile time in a context requiring a constant expression.

We cannot say definitively that this was the intent, but it does strongly suggest it was. The difference in how clang and gcc treat undefined shifts does leave some room for doubt.
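The CWG consensus quoted above can be sketched as follows. The function names are hypothetical. The point is that defining `divide` is fine, and a diagnostic is only expected when a call that divides by zero is used where a constant expression is required.

```cpp
// Hypothetical function: divides without checking the divisor.
constexpr int divide(int n, int d)
{
    return n / d;
}

// A context that requires a constant expression: the call is evaluated
// during compilation and is accepted because no undefined operation occurs.
constexpr int five = divide(10, 2);
static_assert(five == 5, "");

// An ordinary call with run-time arguments: no constant expression is
// required here, so nothing is diagnosed for the mere possibility that
// d could be zero.
inline int divide_at_runtime(int n, int d)
{
    return divide(n, d);
}

// Only a use like the following asks for a constant and would be rejected:
// constexpr int bad = divide(1, 0);
```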

I filed a gcc bug report: Right and left shift undefined behavior not an error in a constexpr. Although it seems like this is conforming, it does break SFINAE and we can see from my answer to Is it a conforming compiler extension to treat non-constexpr standard library functions as constexpr? that divergence in implementation observable to SFINAE users seems undesirable to the committee.

Heathen answered 23/1, 2014 at 21:12 Comment(6)
Asking and answering your own question in the same minute??Nydianye
@ZacHowland please see Can I answer my own questions, even if I knew the answer before asking?Heathen
Unfortunately, what you have appears to be a lucky synergy, and not actually a rationale for excluding undefined behaviors from constexpr evaluation in the first place and therefore doesn't answer the question you asked. (no rationale is provided in the DR either)Skewbald
@BenVoigt I reworded my question slightly. In my mind I had asked not only why but also what we gain by having it, and I realized that I left that out of my question as it was originally worded.Heathen
Yes, the SFINAE synergy looks quite desirable, I just doubt anyone realized it when proposing the rule.Skewbald
Given extern int foo[];, should a compiler be forbidden from treating foo+3-foo as a constant expression except when foo contains three or more elements? What if the compiler has no way of knowing?Optometrist
S
22

When we talk about undefined behavior, it is important to remember that the Standard leaves the behavior undefined for these cases. It does not prohibit implementations from making stronger guarantees. For example some implementations may guarantee that signed integer overflow wraps around, while others may guarantee saturation.

Requiring compilers to process constant expressions involving undefined behavior would limit the guarantees that an implementation could make, restricting them to producing some value without side effects (what the Standard calls indeterminate value). That excludes a lot of the extended guarantees found in the real world.

For example, some implementation or companion standard (e.g. POSIX) may define the behavior of integral division by zero to generate a signal. That's a side effect which would be lost if the expression were calculated at compile-time instead.

So, these expressions are rejected at compile-time to avoid loss of side effects in the execution environment.
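If code wants one of those stronger behaviors inside a constant expression, it can ask for it explicitly instead of relying on what the implementation does with the undefined case. Below is a minimal sketch of wrapping addition. `wrapping_add` is a hypothetical helper, and the second check assumes the usual two's-complement `int` as on GCC and Clang targets.

```cpp
#include <limits>

// Hypothetical helper: wrapping addition carried out in unsigned arithmetic,
// which wraps by definition. Converting the result back to int is
// implementation-defined before C++20 and modular from C++20 on; it is not
// undefined, so the whole expression can still be a constant expression.
constexpr int wrapping_add(int a, int b)
{
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

static_assert(wrapping_add(1, 2) == 3, "");

// Assumes two's-complement int with modular conversion (see lead-in).
static_assert(wrapping_add(std::numeric_limits<int>::max(), 1)
                  == std::numeric_limits<int>::min(), "");
```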

Skewbald answered 23/1, 2014 at 22:27 Comment(12)
+1 thank you, I was hoping to get an answer that actually went into the rationale. I spent a lot of time trying to find that rationale last night but I don't think there is anything public that covers it.Heathen
Sadly I don't know if this is the rationale considered by the committees. But it is why I agree with the rule.Skewbald
Is there any reason that the Standard couldn't/shouldn't be written that if an expression is used in a place requiring a compile-time integer constant, then even in cases where the Standard would not otherwise impose any requirements regarding the behavior of that expression, the compiler must either define the behavior of that expression as yielding an integer value with no side-effects, or must issue a diagnostic? So given e.g. int foo=-1<<4; a compiler which defined (-x) << y as equivalent to -(x<<y) could silently yield int foo=-16; but one where that shift would trap...Optometrist
...must yield a diagnostic? In cases where a compiler would regard an expression as having a well-defined meaning, it would seem annoying to forbid it from making that meaning available in a compile-time constant.Optometrist
@supercat: Yes, that would violate the principle of least surprise. Side effects that suddenly disappear depending on the context = highly surprising. And, as mentioned in my answer, it might also violate a requirement of a companion standard.Skewbald
@supercat: Also, changing the rules in that way would break SFINAE (see Shafik's answer for an example).Skewbald
@BenVoigt: Disappearance of side-effects would be highly surprising; my point was that if an implementation says that int x=-1; int y=x<<4; would set y to -16 with no side-effects, then that same implementation should be able to regard (-1)<<4 as a const-expression -16. While there are corner-cases where such a rule could cause different template substitutions, such issues already exist in many other places; for example, if x is a uint32_t constant equal to 0xFFFFFFFFu, then (uint32_t)(x*x) would be a constant on any platform where int was 16 or 32 bits...Optometrist
...or larger than 64 bits, but be Undefined Behavior on platforms where int was between 40 and 64 bits. I would posit that allowing implementations to yield a value of 1 from that expression regardless of the size of int is more helpful than having the behavior depend on the size of int.Optometrist
@Optometrist see my updated answer, at least in the case of implementation adding constexpr to standard library functions, the committee found breaking SFINAE to be undesirable and hence forbid it in this case.Heathen
@ShafikYaghmour: IMHO, the Standard should split "forbidden" behavior and "optionally-defined" behavior (which would be a good thing for the language in other ways too) and say that if expressions have a well-defined mathematical meaning, even if the Standard would not normally require implementations to calculate them, then an implementation may regard such compile-time expressions of that form as constant if such expressions would also be calculated correctly at run-time. Since various platforms could have different successful substitutions for various reasons including the size of int...Optometrist
...it could be advantageous to allow template substitution in cases where a platform would be able to handle certain constructs even though the Standard would not otherwise require it. In any case, if substitution is disallowed when it would result in Undefined Behavior, that would imply that the term "Undefined Behavior" means something other than "behavior over which the Standard imposes no requirements".Optometrist
@Optometrist "it could be advantageous to allow template substitution in cases where a platform would be able to handle certain constructs"... No, not really. The value to developers of portable software of requiring diagnostics for undefined behavior in certain contexts far outweighs the value of silently working slightly more efficiently, until it doesn't work at all. C++ has never been a "Do What I Mean" language; if that is the philosophy you desire, then there are many other languages that better meet your needs.Skewbald
D
6

There is another point to excluding undefined behavior from constant expressions: constant expressions should, by definition, be evaluated by the compiler at compile time. Allowing a constant expression to invoke undefined behavior would allow the compiler itself to show undefined behavior. And a compiler that formats your hard-drive because you compile some evil code is not something you want to have.
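As a rough illustration of where the compiler has no choice but to evaluate the expression itself, the sketch below uses a constexpr function as an array bound and as a template argument. `square` and `Tag` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>

// Hypothetical function used in the contexts below.
constexpr std::size_t square(std::size_t n)
{
    return n * n;
}

// Array bound: the size must be known while compiling.
int grid[square(3)];

// Template argument: the value becomes part of the type.
template <std::size_t N>
struct Tag
{
    static constexpr std::size_t value = N;
};

static_assert(sizeof(grid) / sizeof(grid[0]) == 9, "");
static_assert(Tag<square(4)>::value == 16, "");
```

In contexts like these, an operand with undefined behavior would have to be dealt with by the compiler during translation, which is the situation the exclusion rules out.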

Dina answered 23/1, 2014 at 21:55 Comment(5)
I don't think a constant expression is required to be evaluated at compile-time in all contexts. For array bounds etc., it is clear that this is "required" implicitly, but I'm not sure if there's a general requirement.Cavin
(Also, I don't think the Standard requires any behaviour of the compiler, only of the program.)Cavin
@dyp: Good point, with the as-if rule, it can be evaluated at run time. There is no requirement as the answer says.Thrave
@Cavin Yes, you are right, I changed the wording accordingly. My mistake, my mind used the purpose of the feature for its definition. Quite a good heuristic most of the time, but I'll never win a medal as a language lawyer with it...Dina
Constant expressions are not "invoked" by the compiler, they are "processed" by it. There is no reason for such processing ever to format your harddrive. A constant expression involving undefined behavior would typically process to an indeterminate value. But that would prevent implementations from making strong guarantees about behavior that the Standard leaves undefined.Skewbald
