Endianness in constexpr

I want to create a constexpr function that returns the endianness of the system, like so:

constexpr bool IsBigEndian()
{
    constexpr int32_t one = 1;
    return (reinterpret_cast<const int8_t&>(one) == 0);
}

Now, since the function will get executed at compile time rather than on the actual target machine, what guarantee does the C++ spec give to make sure that the correct result is returned?

Gaye answered 16/6, 2016 at 18:4 Comment(5)
Good question! This used to be a problem (other compilers for other languages) w.r.t. floating point constants/expressions & cross compilers (back in the day when there were different floating point formats). I'd like to know the C++ standard answer.Revanchism
Hard to have any practical meaning. I can't imagine two architectures which can execute the same compiled source, but differ in endianness.Aerometeorograph
@πάντα ῥεῖ: That's not this question at all. This is about a specific case of detecting endian-ness.Jaquelinejaquelyn
@Aerometeorograph What about cross-compiling?Endamoeba
@RichardCritten, cross-compilation for different architecture will have to respect this architecture endianness for all intents and purposes, I suppose... But I concede it's a good point.Aerometeorograph

None. In fact, the program is ill-formed. From [expr.const]:

A conditional-expression e is a core constant expression unless the evaluation of e, following the rules of the abstract machine (1.9), would evaluate one of the following expressions:
— [...]
— a reinterpret_cast.
— [...]

And, from [dcl.constexpr]:

For a constexpr function or constexpr constructor that is neither defaulted nor a template, if no argument values exist such that an invocation of the function or constructor could be an evaluated subexpression of a core constant expression (5.20), or, for a constructor, a constant initializer for some object (3.6.2), the program is ill-formed; no diagnostic required.


The practical approach is to rely on macros that your compiler provides to describe the byte order of the target machine. For instance, GCC defines __BYTE_ORDER__. Note that the #else branch treats every order other than little-endian as big-endian:

constexpr bool IsBigEndian() {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return false;
#else
    return true;
#endif
}
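A slightly more defensive variant of the same idea is sketched below. It assumes a toolchain that offers either the C++20 std::endian enumeration from <bit> (detected through the __cpp_lib_endian feature-test macro) or the GCC/Clang __BYTE_ORDER__ macros, and it stops compilation when neither is present. The names IsBigEndianTarget and kTargetIsBigEndian are made up for this illustration.

```cpp
// Pull in <version> when it exists so that the feature-test macro is visible.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_endian)
#  include <bit>  // std::endian (C++20)
#endif

// Hypothetical helper: true only when the target is big-endian.
// A mixed-endian target yields false in both branches.
constexpr bool IsBigEndianTarget()
{
#if defined(__cpp_lib_endian)
    return std::endian::native == std::endian::big;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
#else
#  error "No known compile-time byte-order indicator on this toolchain"
#endif
}

// The result is usable wherever a constant expression is required.
constexpr bool kTargetIsBigEndian = IsBigEndianTarget();
```

Both branches describe the target rather than the machine running the compiler, so the value should stay correct when cross-compiling.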
Soubise answered 16/6, 2016 at 18:12 Comment(5)
It's not the reason it's ill-formed. The real reason it's ill-formed is a violation of the strict aliasing rules. What you quoted just means IsBigEndian won't be constexpr.Misstate
@uhohsomebodyneedsapupper Violating strict aliasing is UB. A constexpr function that can't be evaluated as a const expression is ill-formed.Soubise
@uhohsomebodyneedsapupper there is not necessarily a violation. int8_t might be char, and aliasing through char* is allowed.Aerometeorograph
@SergeyA: Aliasing is allowed, but the only way you're allowed to access the object through that pointer is to copy it elsewhere. The standard provides no guarantees on what the value stored there will be. It only guarantees that if you copy an int32_t-sized block of memory through a char*, then you have the value-representation of a valid int32_t.Jaquelinejaquelyn
@NicolBolas, it makes no representation of the actual value, but this is exactly what OP wants to check. It is not UB, it allows access to underlying representation, and OP is checking representation to come to conclusions.Aerometeorograph

As stated by Barry, your code is not legal C++. However, even if you took away the constexpr part, it would still not be legal C++. Your code violates strict aliasing rules and therefore represents undefined behavior.

Indeed, there is no way in C++ to detect the endian-ness of an object without invoking undefined behavior. Casting it to a char* doesn't work, because the standard doesn't require big or little endian order. So while you could read the data through a byte, you would not be able to legally infer anything from that value.

And type punning through a union fails because you're not allowed to type pun through a union in C++ at all. And even if you did... again, C++ does not restrict implementations to big or little endian order.

So as far as C++ as a standard is concerned, there is no way to detect this, whether at compile-time or runtime.
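For illustration only, the byte-inspection approach can be sketched as follows. Copying an object's bytes with std::memcpy is well-defined, but the values found there are whatever the implementation chose, so the classification rests on assumptions the standard does not make: 8-bit bytes, an available std::uint32_t, and one of the usual byte orders. ByteOrder and ProbeByteOrder are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical labels for the layouts this probe can tell apart.
enum class ByteOrder { Big, Little, Mixed };

// Copies the object representation of 0x00010203 into a byte array and
// classifies the layout by the first byte. Runtime only, not constexpr.
inline ByteOrder ProbeByteOrder()
{
    const std::uint32_t probe = 0x00010203u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // byte-wise copy of the representation

    switch (bytes[0]) {
    case 0x00: return ByteOrder::Big;     // most significant byte stored first
    case 0x03: return ByteOrder::Little;  // least significant byte stored first
    default:   return ByteOrder::Mixed;   // anything else, e.g. a PDP-style order
    }
}
```

Whatever this returns is a fact about one implementation, not something the standard promises.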

Jaquelinejaquelyn answered 16/6, 2016 at 18:31 Comment(19)
But why? It is not undefined behavior! It is not defined behavior, and those are different. OP is doing exactly this - he is accessing the underlying representation (which is allowed) and then making a decision based on representation. I do not see UB here.Aerometeorograph
@SergeyA: Behavior that is not defined is by definition undefined. That is exactly what the word "undefined" means: not defined. Also, the question is asking what the standard says about such code. The answer is (aside from not being legal) that the standard does not define the behavior of this function.Jaquelinejaquelyn
I disagree. A good test of undefined vs not defined is the following - can it spawn Nasal Demons? (or, more realistically, can the compiler produce code which is not logically equivalent to the C++ code). In case of undefined behavior, it can. In case of not defined behavior, it cannot. In particular, in the given case, no nasal demons would be summoned; instead, 1 or 0 would be returned based on the representation of boolean.Aerometeorograph
@SergeyA: This question is tagged "language-lawyer". That means that the only "test of undefined vs not defined" that matters is what the C++ specification says. Also, that's a terrible definition of undefined behavior regardless of the tag. What is defined or undefined is an objective standard; indeed, that's the whole point of having a standard to begin with.Jaquelinejaquelyn
With that I do agree! I didn't see the attorney tag before, and tried to argue from a practical perspective. From the attorney tag perspective, the question makes no sense whatsoever since the Standard doesn't cover endianness at all.Aerometeorograph
@NicolBolas: If an implementation documentation indicates that storing 1.00f into a float will cause it to hold the bytes 0x80 00 00 00, and also specifies that storing byte values 0x80 00 00 00 into a long will cause it to hold -2147483648, then in the absence of any rule to the contrary, those facts would effectively define the behavior of storing 1.00f into a union as a float and reading it as a long. It is only the rule prohibiting such action that makes it undefined.Breastpin
@supercat: Perhaps. But that would be because the implementation has chosen to define behavior that the standard does not. This makes any such code dependent on those implementations and thus reliant on that non-standard behavior.Jaquelinejaquelyn
@NicolBolas: Many implementations specify the precise representation formats they use, and 99.9% of implementations use one of a small number of sets of type representations, so in the absence of rules explicitly un-defining the behavior, 99.9% of implementations would have to exhibit one of a small number of behaviors. While 99.9% isn't 100%, it wouldn't be exactly "non-standard" either.Breastpin
@supercat: There is a difference between "common" and standard. This question asked about the latter, not the former.Jaquelinejaquelyn
@NicolBolas: If an implementation has a uint32_t type and a uint8_t type, the Standard would allow it to use any of 32! possible representations of uint32_t (a huge, but finite, number). I would be surprised, however, if any non-contrived implementations which have those types would not use one of four orderings, identifiable by reading the first byte of the representation of 0x00010203. The Standard may allow other forms, but is it realistically likely that any non-contrived implementation that defines uint32_t and uint8_t will ever use them?Breastpin
@NicolBolas: whenever almost-everybody-in-existence complies with certain X, it means that X goes beyond simply being "common", and into the realm of "de-facto standard"... From the point of "language-lawyers" - I would argue it is similar to reading law interpretations in real-world use cases (opposed to reading the law itself) ;-).Noellenoellyn
@No-BugsHare: There's still a difference between "de-facto standard" and "de-jure standard". This question is asking about the latter, not the former. Your analogy is wrong; in common-law systems, prior rulings are "the law itself"; case law has the full force of the text of the law. The "language-lawyer" tag is not asking about what happens in real systems; it's a tag specifically for asking about what the standard says.Jaquelinejaquelyn
@NicolBolas: 'There's still a difference between "de-facto standard" and "de-jure standard".' - sure, though nobody-besides-language-lawyers-on-stack-overflow (and WG21+compiler-writers) ever cares about the latter. Moreover, I happen to know a few people around WG21 who are currently working on bringing "de-jure" closer to "de-facto" (I don't think that endianness is currently on the list, though). Or, in other words - de-jure standard is certainly not a gospel, it does change - and de-facto affects these changes in a very significant manner.Noellenoellyn
@No-BugsHare: But the OP asked about the C++ standard. Not what the majority of C++ compilers do, not what might be in the C++ standard in the future, and so forth. And even with some of those standard changes that have been proposed, union-based type punning is not one of them. The way C++20 handles this will be std::bit_cast.Jaquelinejaquelyn
@NicolBolas: to define it - it will take more than bit_cast<> (at least as I read it). Overall, there is a dozen ways to have an API which allow to detect endianness (hey, reinterpret_cast<> is already standard, or memcpy() can be made constexpr, or bit_cast<> can be used, or...). IIRC, the main problem is that to define not just an API, but the outcome of this API, we'd need to define the very term "byte order" :-( (and the last time I checked, the only attempts to define it were related to "network byte order", which doesn't bring us closer to "byte order observed on this CPU" :-( ).Noellenoellyn
@No-BugsHare: First, type-punning is about more than just endian-ness. Second, there's already an (accepted) proposal to define the endian-ness of the implementation.Jaquelinejaquelyn
@NicolBolas: thanks for pointing it out, will keep fingers crossed for it :-). BTW, when/if we have this p0463r1- then we won't really need any additional stuff such as bit_cast<>, and will be able to rely on good old memcpy() :-); as for constexpr functions - with p0463r1 in our hands, we'll be able to write our own constexpr version of memcpy() easily (with __LITTLE_ENDIAN__ available, I had to do it myself just a few days ago). Also, I have to point out that this p0463r1 is a prominent example of "people working to bring de-jure standard in line with de-facto one" ;-).Noellenoellyn
@No-BugsHare: "BTW, when/if we have this p0463r1- then we won't really need any additional stuff such as bit_cast<>, and will be able to rely on good old memcpy()" Why use an opaque function like memcpy when what you're clearly trying to do is bit-cast? That function wasn't created because we couldn't do that job; it was created because it makes it abundantly clear what it is doing. And it verifies that the types are of the kind on which that operation is legal. Also, while bit-casting helps with endian conversion, that's not the only thing you use bit-casting for.Jaquelinejaquelyn
@NicolBolas: "Why..." - good ol' Occam's Razor, per chance? This kind of things is necessary in sooooo limited number of scenarios (almost-exclusively marshalling-related), that creating a new concept just to make this-0.001%-of-code a bit safer is IMO a biiiig overkill.Noellenoellyn
