_Bool type and strict aliasing

I was trying to write some macros for type safe use of _Bool and then stress test my code. For evil testing purposes, I came up with this dirty hack:

_Bool b=0;
*(unsigned char*)&b = 42;

Given that _Bool is 1 byte on the implementation (sizeof(_Bool)==1), I don't see how this hack violates the C standard. It shouldn't be a strict aliasing violation.

Yet when running this program through various compilers, I get problems:

#include <stdio.h>

int main(void)
{
  _Static_assert(sizeof(_Bool)==1, "_Bool is not 1 byte");

  _Bool b=0;
  *(unsigned char*)&b = 42;
  printf("%d ", b);
  printf("%d", b!=0 );

  return 0;
}

(The code relies on the default argument promotion of _Bool to int in the variadic printf call.)

Some versions of gcc and clang give output 42 42, others give 0 0, even with optimizations disabled. I would have expected 42 1.

It would seem that the compilers assume that _Bool can only be 1 or 0, yet at the same time they happily print 42 in the first case.

Q1: Why is this? Does the above code contain undefined behavior?

Q2: How reliable is sizeof(_Bool)? C17 6.5.3.4 does not mention _Bool at all.

Nicolette answered 4/9, 2018 at 10:9 Comment(13)
6.7.2.1 has interesting footnote that may be relevant: "124) While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit."Ulick
@Ulick That's a non-normative foot note regarding the use of bit-fields. I don't see how it is relevant here.Nicolette
How can the output be 42 42? The second printf can only print 1 or 0.Helmut
@StoryTeller Indeed. But that's what I get with gcc/mingw, hence the question. Maybe a bug in the standard lib?Nicolette
O_O I tried it. My mind is seriously blown right now. This joins my collection of UB examples.Helmut
@StoryTeller Why is it UB though?Nicolette
I don't know. Though I was initially shocked by the second 42, it kinda makes sense in retrospect. Because b != 0 for a _Bool can be optimized to simply b. I'm scratching my head still though.Helmut
@RbMm Character types are an exception in strict aliasing rules. Optimizer cannot cause UB based on that here.Ulick
In a similar example, the _Bool optimization combines with optimizations for transforming branches to arithmetic operations, producing strange-looking results for very natural code. The optimization of if (b) x++; into x+=(the representation of )b; confirms that Clang treats _Bool representations other than 0 and 1 as trap values triggering UB. gcc.godbolt.org/z/wPq4zqAttar
@Ulick The aliasing rules are asymmetric with respect to character types. When the effective type of a datum is a non-character type, you can access that datum via a character type. But when the effective type of a datum is a character type, you can not access that datum via a non-character type.Hurleigh
Please post the assembly code for the posted question. Then we can easily determine what the compiler was thinkingAudy
Interesting corner case: the whole C99 _Bool semantics is a hack. It would have been fine to impose all this headache on implementors if they had also added boolean and/or bit-field arrays, but, as specified, it does not provide any real improvement over enum { false, true }; typedef unsigned char _Bool;Damaging
For Q2, sizeof(_Bool) is at least 1. C17 6.7.2.1/4 footnote 124 says: "While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit." (Of course, _Bool has no sign bit, only value bits and padding bits.) C23 final draft n3054 6.2.6.1/2 says: "The type bool shall have one value bit and (sizeof(bool)*CHAR_BIT) - 1 padding bits." (Spot the minor typographical error in the text: they used a hyphen character instead of a minus sign character.)Huckaback

Q1: Why is this? Does the above code contain undefined behavior?

Yes, it does. The store is valid, but subsequently reading that as a _Bool is not.

6.2.6 Representations of types

6.2.6.1 General

5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...]

Q2: How reliable is sizeof(_Bool)? C17 6.5.3.4 does not mention _Bool at all.

It will reliably tell you the number of bytes that are needed to store one _Bool. 6.5.3.4 also doesn't mention int, but you're not asking whether sizeof(int) is reliable, are you?

Lysine answered 4/9, 2018 at 11:1 Comment(13)
So _Bool a; *(unsigned char*)&a = 42; printf("%d", *(unsigned char*)&a); is valid?Miyasawa
@Miyasawa As far as I know, yes, that is perfectly valid.Lysine
Isn't the existence of trap representations implementation-defined? In that case it wouldn't always be UB.Ulick
@Ulick - It doesn't need to be a trap representation to have UB. The quote above doesn't at all require the access to trap. Simply yielding 42 for a _Bool is enough UB.Helmut
@StoryTeller Quotation is missing lower half of the paragraph which says: "Such a representation is called a trap representation."Ulick
@Ulick You're right that this is a trap representation regardless of what behaviour the implementation gives it. As for implementation-defined, I don't think so. There are types for which certain implementation-defined aspects allow us to deduce that there are no trap representations (such as sizeof(int) * CHAR_BIT == 32 && INT_MIN+1 == -2147483647 && INT_MAX == 2147483647), but no blanket requirement to document all trap representations.Lysine
@Ulick - Fair enough. Though I think the definition in 3.19.4 is more satisfactory in this case. It's simply not a valid representation of a bool. Ergo, UB.Helmut
I agree with the previous comments, this is merely the formal definition of a trap representation. I've never heard of booleans with trap representation. How do you explain this? coliru.stacked-crooked.com/a/8b5ede2a92714caf The output is 42 42. Yet C17 6.2.6.1 §6 says "The value of a structure or union object is never a trap representation"Nicolette
Actually the linked snippet is even stranger since there's an assignment from _Bool to _Bool and the compiler had a chance to assume 0 or 1.Nicolette
@Nicolette no_trap_t no_trap = {b}; is still invalid because it reads a trap representation. What p6 means is that if that were valid (or if you get a trap representation in a structure member some other way), then performing no_trap = no_trap; is also valid: even though it copies a member which has a trap representation, the structure as a whole does not.Lysine
6.2.6.2 p1 and its footnote 53 (non-normative I know, but displays intent of standard) seem to suggest that unsigned integer types (incl. _Bool according to 6.2.5 p6) can have combinations of padding bits that are not trap reps. In that case it would be possible to have a conforming program which would give output 0 0 (assuming LSB is value bit and rest are padding).Ulick
@Ulick Yes, an implementation is allowed to decide that. There is a requirement that all bits zero is a valid representation of zero for all integer types (6.2.6.2p5), but it need not be the only valid representation of zero.Lysine
@Lundin: The purpose of guaranteeing that a struct or union is never a trap representation is essentially to say that a compiler may only replace struct assignment with member-by-member assignment if there are no possible bit patterns the structure might hold that would cause such operations to have side-effects. If a struct member holds a bit pattern which is a trap representation or shares its meaning with any other, I think the Standard allows the corresponding member of the destination structure to hold any equivalent bit pattern (or any pattern at all if source is a trap rep).Gapeworm
