What are the restrictions on modifying an object through a pointer to its byte representation in C++?
I was confused by the following paragraph about type aliasing from cppreference (source):

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType and DynamicType are similar.
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

Suppose I have an object of a trivial type (such as a scalar) whose size is larger than 1 byte. In what ways, if at all, am I allowed to modify the byte representation of the object through a pointer to a different type without invoking undefined behaviour? For example:

int x = 5, y = 10;
std::byte* x_bytes = reinterpret_cast<std::byte*>(&x);

//#1: replacing the entire representation:
std::memcpy(x_bytes, &y, sizeof(int));

//#2: changing a random byte in the representation:
x_bytes[0] = (std::byte)3;

Are both of these operations allowed, or only #1?
The problem is that I don't know how to interpret the paragraph I quoted. The three bullets are exceptions to the rule that "Whenever an attempt is made to read or modify the stored value [...] the behavior is undefined", which would imply that both reading and writing are allowed if one of the bullets is applicable. However, the third bullet only mentions the "examination of the object representation", which implies read-only access.
I tried to find an appropriate standard page describing this problem in more detail, but I haven't been able to, so this was all I had that was relevant to the problem.

Graiae answered 20/5, 2021 at 13:24 Comment(0)
C
5

Are both of these operations allowed

Yes. There is no rule saying that you must modify all or nothing. Modifying a single byte is allowed.


However, the third bullet only mentions the "examination of the object representation", which implies read-only access.

The standard rule doesn't use such wording. This is the rule from the latest draft:

[basic.lval]

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • a char, unsigned char, or std::byte type.

Access is defined as:

[defns.access]

⟨execution-time action⟩ read or modify the value of an object
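As a minimal sketch of what "access" covers, the second bullet of the quoted rule can be exercised with both a read and a write. The helper names read_as_unsigned and write_as_unsigned are hypothetical, and the values are kept non-negative so that int and unsigned int represent them identically:

```cpp
// Hypothetical helper (not from the answer): reads an int object through a
// glvalue of its corresponding unsigned type, which the second bullet permits.
unsigned int read_as_unsigned(const int& value) {
    const unsigned int* alias = reinterpret_cast<const unsigned int*>(&value);
    return *alias;  // an "access" in the sense of [defns.access]: a read
}

// Hypothetical helper: a write through the same kind of alias is also an "access".
void write_as_unsigned(int& target, unsigned int new_value) {
    unsigned int* alias = reinterpret_cast<unsigned int*>(&target);
    *alias = new_value;  // an "access": a modification
}
```

This only illustrates that the aliasing rule itself does not distinguish reads from writes; it says nothing about byte-wise access, which is the contested part of the question.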


Of course, modifying bytes by index is dubious from a portability perspective: different systems store the bytes of a multi-byte object in different orders, so the byte at a given index has a different significance on different systems.

Different behaviour on different systems is often undesirable.
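To make the portability concern concrete, here is a small sketch. Both function names are hypothetical, and std::uint32_t is assumed to be available. The first helper detects at run time which byte sits at index 0; the second changes the least significant byte by value rather than by index, so its result does not depend on byte order:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the byte at index 0 of a multi-byte integer is
// its least significant byte (little-endian), detected at run time.
bool first_byte_is_least_significant() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // copy out the byte at index 0
    return first == 1;
}

// Hypothetical helper: replaces the least significant byte by value, so the
// result is the same regardless of how the bytes are laid out in memory.
std::uint32_t set_low_byte(std::uint32_t value, unsigned char low) {
    return (value & 0xFFFFFF00u) | low;
}
```

When the intent is "change the low 8 bits", arithmetic like set_low_byte states that intent directly, whereas indexing into the representation only happens to do so on little-endian systems.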

Cask answered 20/5, 2021 at 13:31 Comment(3)
Thank you very much for your answer, especially for linking the standard page. - Graiae
There are also [basic.life]/1.5 and /5 which talk about reusing the storage; work is underway to establish that the bytes act like subobjects so as not to end the lifetime of the ordinary object. - Permafrost
@DavisHerring the paper you've linked also explicitly says that in-place modification of the object representation is still undefined behavior (in revision 5). - Chrissie
C
1

You are not allowed to arbitrarily modify an object through a std::byte*. In your example, #1 is okay, but #2 is undefined behavior.

Firstly, note that any object is type-accessible through a glvalue of type std::byte (the wording is slightly different in C++20, but has the same meaning). This means that you can access any object through a std::byte*, such as x_bytes in your example, and "access" means both reading and modifying by definition.

However, the standard leaves the effect of doing this undefined, with some exceptions. Notably, [basic.types.general] p2 says that you can copy the underlying bytes of an object of trivially copyable type into a byte array and back, after which the object holds its original value; p3 adds that copying the underlying bytes of one such object into another object of the same type gives the second object the same value. In your example, std::memcpy(x_bytes, &y, sizeof(int)); is therefore required to work exactly like x = y;.
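The two guaranteed cases can be sketched as follows. The function names are hypothetical, and unsigned char stands in for std::byte so that the sketch does not depend on C++17:

```cpp
#include <cstring>

// Hypothetical helper: copies the bytes of `source` into a byte buffer and
// from there into another int object.
int copy_via_byte_buffer(int source) {
    unsigned char buffer[sizeof(int)];
    std::memcpy(buffer, &source, sizeof(int));   // object -> byte array
    int result = 0;
    std::memcpy(&result, buffer, sizeof(int));   // byte array -> object
    return result;
}

// Mirrors #1 from the question: the whole representation of x is replaced
// with the representation of y.
int replace_representation(int x, int y) {
    unsigned char* x_bytes = reinterpret_cast<unsigned char*>(&x);
    std::memcpy(x_bytes, &y, sizeof(int));
    return x;
}
```

In both functions every byte of the destination is overwritten with bytes taken from a valid int, which is the situation the quoted paragraphs describe.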

However, the standard never defines what happens when you modify an object through a std::byte* or what value you obtain when reading an object through a std::byte*. Therefore:

  • Modifying an object through std::byte* is undefined behavior by omission.
  • Reading individual bytes through std::byte* gives you an unspecified value at best, and is undefined behavior by omission at worst.
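A commonly used way to sidestep in-place writes is to copy the representation into a separate byte array, change the array, and copy the bytes into a new object. The sketch below assumes std::uint32_t (which has no padding bits), and with_byte_replaced is a hypothetical name. The resulting value still depends on the implementation's byte order, and the standard's round-trip guarantee strictly covers only unmodified bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns a copy of `value` whose byte at `index` has
// been replaced, without writing into the original object in place.
std::uint32_t with_byte_replaced(std::uint32_t value, std::size_t index,
                                 unsigned char replacement) {
    assert(index < sizeof(value));
    unsigned char bytes[sizeof(value)];
    std::memcpy(bytes, &value, sizeof(value));    // copy the representation out
    bytes[index] = replacement;                   // modify the separate array
    std::uint32_t result = 0;
    std::memcpy(&result, bytes, sizeof(result));  // copy the bytes into a new object
    return result;
}
```

For the question's #2, with_byte_replaced(5, 0, 3) would yield 3 on a little-endian system and 0x03000005 on a big-endian one.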

Also note that reinterpret_cast doesn't give you a pointer to the "underlying bytes" (i.e. the object representation). It's still a pointer to the original object.

Related work

Note that the C++ object model is currently highly defective. Many of the issues should be resolved by P1839: Accessing object representations, which makes major changes to object representations.

The paper also states (see Non-goals):

This paper does not propose to make in-place modification of the object representation valid, i.e. writing into the underlying bytes, only reading them.

Also see the Known issues section in the paper.

Chrissie answered 6/6 at 6:39 Comment(0)
