Is it undefined behavior to read and compare padding bytes of a POD type?

Today I encountered some code that looks roughly like the following snippet. Both valgrind and UndefinedBehaviorSanitizer detected reads of uninitialized data.

#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_pod_v

template <typename T>
void foo(const T& x)
{
    static_assert(std::is_pod_v<T> && sizeof(T) > 1);
    auto p = reinterpret_cast<const char*>(&x);

    std::size_t i = 1;
    for(; i < sizeof(T); ++i)
    {
        if(p[i] != p[0]) { break; }
    }

    // ...
}

The aforementioned tools complained about the p[i] != p[0] comparison when an object containing padding bytes was passed to foo. Example:

struct obj { char c; int* i; };
foo(obj{'b', nullptr});
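
To make the padding in this example visible, here is a minimal sketch. The helper name padding_after_c is hypothetical, and the actual byte count depends on the ABI (commonly 7 bytes on 64-bit targets, but that is an assumption about the platform, not a guarantee).

```cpp
#include <cstddef>  // std::size_t, offsetof

struct obj { char c; int* i; };

// Hypothetical helper: number of bytes between the end of `c` and the start
// of `i`. These bytes belong to the object but to none of its members.
constexpr std::size_t padding_after_c()
{
    return offsetof(obj, i) - (offsetof(obj, c) + sizeof(char));
}
```

Any non-zero result means the loop in foo will walk over bytes that no member initializer ever wrote.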

Is it undefined behavior to read padding bytes from a POD type and compare them to something else? I couldn't find a definitive answer either in the Standard or on StackOverflow.

Pinckney answered 22/11, 2017 at 14:37 Comment(0)

The behaviour of your program is implementation defined on two counts:


1) Prior to C++14: if your char is a signed type using a ones' complement or sign-magnitude representation, comparing +0 and -0 might produce a surprising result.

The truly watertight way would be to use a const unsigned char* pointer. This obviates any concern about a ones' complement or sign-magnitude char, which is abolished from C++14 onwards.


Since (i) you own the memory, (ii) you are taking a pointer to x, (iii) an unsigned char cannot contain a trap representation, and (iv) char, unsigned char, and signed char are exempted from the strict aliasing rules, the behaviour on using const unsigned char* to read uninitialised memory is perfectly well defined.


2) But since you don't know what that uninitialised memory contains, the value you read from it is unspecified. Because the char types cannot contain trap representations, that makes the program's behaviour implementation defined.
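
As a rough sketch of the const unsigned char* approach: all_bytes_equal is a hypothetical name, and std::is_trivially_copyable stands in for the original std::is_pod_v check. The usage shown after the block sidesteps the unspecified-value problem by filling the whole object, padding included, with std::memset before inspecting it.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memset (used by the caller, see below)
#include <type_traits>  // std::is_trivially_copyable

struct obj { char c; int* i; };

// Returns true if every byte of the object representation equals the first.
// Bytes are read through unsigned char, which has no trap representations
// and is exempt from the strict aliasing rules.
template <typename T>
bool all_bytes_equal(const T& x)
{
    static_assert(std::is_trivially_copyable<T>::value && sizeof(T) > 1,
                  "byte-wise inspection needs a trivially copyable type");
    auto p = reinterpret_cast<const unsigned char*>(&x);
    for (std::size_t i = 1; i < sizeof(T); ++i)
    {
        if (p[i] != p[0]) { return false; }
    }
    return true;
}
```

A caller that wants a reproducible answer can write obj o; std::memset(&o, 0, sizeof o); before assigning members, so that no byte handed to all_bytes_equal is left unwritten.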

Delectate answered 22/11, 2017 at 14:41 Comment(14)
Isn't it UB? Do padding bytes have a defined value? - Affection
No, there is no "mechanism" for UB to manifest itself when reading via unsigned char*. - Delectate
What will happen with if (*p==42) { }, if unsigned char *p points to a padding byte? Will the branch be taken or not? - Affection
@geza: I've added the missing final paragraph. - Delectate
Hmm, I thought that reading from uninitialized memory is UB. Maybe I'm wrong, I've never actually checked the standard about this. - Affection
@geza: Only if you violate alignment requirements, have the potential for trap representations in the receiving type, or if the compiler is able to store the object in a register (which it can't if an address is taken - I'm careful to point that out). - Delectate
Could implementation defined mean UB here? I mean, can an implementation say for an implementation-defined thing that it is actually UB? Because I don't really see how an implementation could say what will happen here. In my experience, compilers simply ignore padding bytes and don't initialize them. So a comparison on them has an undefined result; an implementation won't tell you what will happen at *p==42. Where am I wrong? Maybe it's time for me to look at the standard about this :) - Affection
@Affection I expect implementations to just read that byte and use whatever value happens to be in the memory. That doesn't count as undefined behavior in the sense of the standard. Although I did expect the standard to say UB here also, so who knows. - Mcguire
..., and (iv) the explicit exemption of char types from strict aliasing rules. Already using uint8_t instead of char can kill your well-definedness. - Soupspoon
@cmaster: I think I'm going to pinch that. - Delectate
Is std::byte from C++17 also excluded from the strict aliasing rules? The cppreference page on strict aliasing doesn't mention it (en.cppreference.com/w/c/language/object#Strict_aliasing), but the page on reinterpret_cast does (en.cppreference.com/w/cpp/language/reinterpret_cast). - Unequal
I've added another answer. I didn't find that it would be implementation defined. Am I wrong somewhere? - Affection
Disagree with your final point. Behaviour which depends on unspecified values is unspecified, not implementation-defined. - Makkah
@Makkah yes there was a missing step. Fixed. - Delectate

It depends on the conditions.

If x is zero-initialized, then padding has zero bits, so this case is well defined (8.5/6 of C++14):

To zero-initialize an object or reference of type T means:

— if T is a scalar type (3.9), the object is initialized to the value obtained by converting the integer literal 0 (zero) to T;

— if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

— if T is a (possibly cv-qualified) union type, the object’s first non-static named data member is zero-initialized and padding is initialized to zero bits;

— if T is an array type, each element is zero-initialized;

— if T is a reference type, no initialization is performed.
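
One way to see the zero-initialized case in action is an object with static storage duration, which is zero-initialized before anything else happens to it. This is only a sketch: the type padded, the variable g_zeroed and the helper nonzero_bytes are hypothetical names, and it assumes a platform where the int value 0 is represented by all-zero bits.

```cpp
#include <cstddef>  // std::size_t

struct padded { char c; int n; };  // hypothetical type, padding follows `c` on most ABIs

// Static storage duration: zero-initialized, so members and padding are zero bits.
static padded g_zeroed;

// Counts the non-zero bytes in an object's representation,
// reading each byte through unsigned char.
template <typename T>
std::size_t nonzero_bytes(const T& x)
{
    auto p = reinterpret_cast<const unsigned char*>(&x);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        if (p[i] != 0) { ++count; }
    }
    return count;
}
```

Under that assumption, nonzero_bytes(g_zeroed) inspects only determinate bytes, which is the situation described by the quoted rule.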

However, if x is default-initialized, then nothing is specified about the padding, so it has an indeterminate value (inferred from the fact that padding is not mentioned here) (8.5/7):

To default-initialize an object of type T means:

— if T is a (possibly cv-qualified) class type (Clause 9), the default constructor (12.1) for T is called (and the initialization is ill-formed if T has no default constructor or overload resolution (13.3) results in an ambiguity or in a function that is deleted or inaccessible from the context of the initialization);

— if T is an array type, each element is default-initialized;

— otherwise, no initialization is performed.

And comparing indeterminate values is UB in this case: the comparison produces an indeterminate value in an evaluation that none of the listed exceptions covers (8.5/12):

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. — end note ] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

— If an indeterminate value of unsigned narrow character type (3.9.1) is produced by the evaluation of:

......— the second or third operand of a conditional expression (5.16),

......— the right operand of a comma expression (5.18),

......— the operand of a cast or conversion to an unsigned narrow character type (4.7, 5.2.3, 5.2.9, 5.4),

or

......— a discarded-value expression (Clause 5), then the result of the operation is an indeterminate value.

— If an indeterminate value of unsigned narrow character type is produced by the evaluation of the right operand of a simple assignment operator (5.17) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

— If an indeterminate value of unsigned narrow character type is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.

Affection answered 22/11, 2017 at 17:42 Comment(0)

Bathsheba's answer correctly describes the letter of the C++ standard.

The bad news is that all the modern compilers I have tested (GCC, Clang, MSVC, and ICC) ignore the letter of the standard on this point. They instead treat the bald statement in Annex J.2 of the C standard

[the behavior is undefined if] the value of an object with automatic storage duration is used while it is indeterminate

as if it were 100% normative, in both C and C++, even though Annex J is not normative. This applies to all possible read accesses to uninitialized storage, including those carefully performed through unsigned char *, and, yes, including read accesses to padding bytes.

Moreover, if you were to file a bug report, I am confident that you would be told that, to the extent the normative text of the standard does not agree with what they are doing, it is the standard that is defective.

The good news is that you will only incur UB upon access to padding bytes if you inspect the contents of the padding bytes. Copying them around is OK. In particular, if you initialize all the named fields of a POD structure, it will be safe to copy it around by structure assignment and by memcpy, but it will not be safe to compare it to another such structure using memcmp.
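
To illustrate the distinction, here is a minimal sketch using the obj type from the question. The helper names same_value and copy_bytes are hypothetical. Copying moves the padding along without looking at it, while the comparison only ever touches the named members.

```cpp
#include <cstring>  // std::memcpy

struct obj { char c; int* i; };

// Compares only the named members, so padding bytes are never inspected.
inline bool same_value(const obj& a, const obj& b)
{
    return a.c == b.c && a.i == b.i;
}

// Copies the whole object representation, padding included,
// without ever examining the copied bytes.
inline void copy_bytes(obj& dst, const obj& src)
{
    std::memcpy(&dst, &src, sizeof(obj));
}
```

A memcmp over the same two objects would read the padding, which is exactly the access the tools in the question complained about.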

Larainelarboard answered 23/11, 2017 at 3:29 Comment(7)
Can you link to some evidence that the compiler treats reading uninitialized padding as unsigned char as UB? The valgrind report does not necessarily indicate UB, just unspecified behaviour. - Makkah
@M.M: Consider uint16_t a,b; unsigned char *ap=&a, *bp=&b,t0,t1; ...; t0=bp[0]^1; t1=bp[1]^1; ap[0]=t0^1; ap[1]=t1^1; Both clang and gcc will treat the sequence of unsigned char-based operations as equivalent to a=b; even in cases where the latter would be undefined, and may generate code that leaves a holding a value outside the range 0-65535. - Tit
@Tit this question was about padding bytes in structs, so there is no issue such as what value they hold. Your example is substantively different. - Makkah
@M.M: The Standard would imply that using an unsigned char* to read out data from any object should always yield a value 0..CHAR_MAX, even when the value of the object is uninitialized, but gcc and clang don't uphold that. If a struct in automatic storage is written, but no special effort is made with regard to padding bytes, I know of nothing that would require that compilers define the behavior of padding bytes more thoroughly than the behavior of anything else in the struct. - Tit
@Tit the standard doesn't say any such thing. Reading objects of indeterminate value produces an indeterminate value, which is UB except for a handful of circumstances, all of which involve either discarding the value or assigning it to another variable which thereby becomes indeterminate also. (This is all in C++14 dcl.init/12) - Makkah
@M.M: Reading an object via unsigned char* is by my reading one of the exceptions which is explicitly not UB (the fact that one is using a pointer would imply that the object's address has been taken). Are you implying that reading via unsigned char* is UB despite the rules that would seem to allow it, or that the behavior is defined in a way that could yield a value outside the range of unsigned char? - Tit
@Tit Indeterminate values do not have a range, so your comments about "range outside of unsigned char" don't even start to make sense. The terminology "reading an object" is not used by dcl.init/12, and I can't understand what you are trying to ask me, sorry. The rules allow producing an indeterminate value via an evaluation in certain cases only. IDK which cases you are counting as "reading an object". - Makkah
