C++ value representation of non-trivially-copyable types

The current draft of the C++ standard (March 2019) has the following paragraph ([basic.types] p.4); the sentence I want to highlight is the last one:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

Why is the highlighted sentence limited to trivially copyable types? Is it because some bits from the value representation of a non-trivially-copyable object may be outside its object representation? This answer, as well as this one, implies this.

However, in the answers linked above, the conceptual value of the object is based on semantics that are introduced by the user. In the example from the first linked answer:

#include <string>

class some_other_type
{
    int a;
    std::string s;
};

the user decides that the value of an object of type some_other_type includes the characters belonging to string s.

I tried to think of examples where some bits of the value representation of a non-trivially-copyable object lie outside its object representation implicitly, i.e., where the implementation has to do this rather than the user arbitrarily deciding it.

One example I came up with is that the value representation of a base class subobject with virtual functions may include bits from the object representation of the complete object it belongs to, because the base class subobject may behave differently (may "have a different value") than it would if it were a complete object itself.

Another example I thought of is that a vtable may also be part of the value representation of an object whose vtable pointer points to it.

Are these examples correct? Are there other examples?

Did the standard committee introduce the highlighted sentence because the semantic "value" of an object may be decided by the user (as in the two linked answers), because implementations may choose (or be forced) to do this, or both?

Thank you.

Darling asked on 18/3/2019 at 16:07. Comments (6):
The conceptual value introduced by users is an open-ended concept. How can the standard say anything meaningful about it? – Zink
@StoryTeller So the standard added this sentence because implementations may do that? – Darling
I got tripped up by this: "The value representation of an object of type T is the set of bits that participate in representing a value of type T." I can't make sense of it other than as a circular definition: the bits that represent the value make up the representation of the value... – Phalanger
@Darling - I don't follow. That sentence only says that an implementation can tell what 1110000100010001111 means if it's the pattern where a trivially copyable object resides, i.e., that the bit pattern is enough to determine the value if you know enough about how an implementation handles bits. – Zink
@StoryTeller My question is: why can't an implementation tell what 1110000100010001111 means if it were the object representation of a non-trivially-copyable object? Is it because there are some other bits (outside of this object representation) that help decide what value the object has? – Darling
@TimRandall The standard does not define "a value of type T" except for trivially copyable types (where it is "one discrete element of an implementation-defined set of values"). It only says that the bits that define that value are called the "value representation" (and that, for trivially copyable types, the value representation has to be fully contained inside the object representation). – Geometrize

In my interpretation, the focus of the sentence you highlighted is this part:

For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

Essentially, [basic.types]#4 of the standard says "each object has a set of bits O that are its object representation and a set of bits that are its value representation V. The set P = O without V are the padding bits. For trivially copyable types, V is a subset of O". The latter is important because it means that copying around the O set of bits also safely copies around the V for trivially copyable types, thus the value is preserved. How you define V for other types is of no concern here (set it to the entire abstract machine if you want).


To answer the revised question asked in the comments:

why can't an implementation tell what 1110000100010001111 means if it were the object representation of a non-trivially-copyable object? Is it because there are some other bits (outside of this object representation) that help decide what value the object has?

Let's take std::string as an example. It is not trivially copyable because it has to deal with memory management.

If two std::string objects had the same bit pattern, would they mean the same thing?

No. There is at least one implementation (gcc's libstdc++) that indicates the small string optimization (SSO) by having the string's buffer pointer point into the string object itself. Upon destruction, the buffer is deallocated if (and only if) the pointer does not point to that exact location.

Clearly, two std::string objects residing in different locations would have to (in this implementation) represent the same (small) string value with different bit patterns (the buffer pointers would have to be different). And more importantly, the same bit pattern in two objects can mean very different things - it might indicate SSO in one case but not the other.

As you can see, there is additional information participating in the value representation of each std::string here: Its location in memory (i.e. the value of this). How exactly that is represented in terms of bits is not further specified by the standard.

Geometrize answered on 18/3/2019 at 16:40. Comments (13):
Thank you. But in your initial example, V cannot depend on the value of this, as it is not stored explicitly anywhere, and the paragraph states that a value representation is a set of bits. That was my initial thought: that the bits in the value representation of a non-trivially-copyable object are somewhere outside its object representation. That is why I asked "Is it because some bits from the value representation of a non-trivially-copyable object may be outside its object representation?". – Darling
@Darling There is a set of bits that implies the value of this, otherwise you couldn't call member functions of the object. Those bits are not required to take the form of an explicit pointer value (with value equal to this). FWIW I agree with you that the standard seems overly pedantic here, but in the end this paragraph is about trivially copyable types (where this definition is useful). Whether these definitions are meaningful for other types is irrelevant, I reckon. – Geometrize
@Darling In other words, feel free to set V to the entire abstract machine for non-trivially copyable types. I don't think (or see why) the standard would care about that. Their point is "trivially copyable implies V is a subset of O". – Geometrize
This is what I was thinking just now. Your string example can be reduced to a general one in which the bits inside the object are not enough to determine its value (i.e., its value representation is not contained inside its object representation). Where the remaining bits of its value representation can be found does not matter; they are somewhere inside the abstract machine (although in my opinion this seems like a stretch of the standard's wording). – Darling
Also, how can two string objects (in different memory locations) have the same object representation? Does one point into the other's internal buffer? – Darling
@Darling Nobody said they had to be alive at the same time ;) You could first heap-allocate a std::string and store a small string in it, then deallocate that and create another std::string elsewhere that just so happens to allocate its buffer at the same place where the old one's buffer was. Unlikely but absolutely plausible (and certainly even forceable with allocators). – Geometrize
I understand. So the examples that have been discussed all rely on the fact that the (semantics of the) value of an object is decided by the user. Are the implicit examples in my question (i.e., where the implementation decides) correct? – Darling
@Darling Even in those cases the value is user-determined in my eyes, simply because the standard doesn't define "value" further. Maybe you as a user don't care about the vtable or the complete object your base class object is in? I don't think you'll get confirmation either way from the standard here. – Geometrize
I agree. But if it is always up to the user, consider struct A { int *x; }: if I, as the user, decide that the value of an instance of type A also depends on the value at the location x points to, then the value representation is not inside the object representation, despite A being trivially copyable. – Darling
@Darling That's your mistake then. To make your idea conform to the C++ standard, you need to make A not trivially copyable. (Note that the standard doesn't care if you break it; it's your problem in the end.) – Geometrize
But I believe my example is not ill-formed, as I do not think the standard says that if I want my program to have certain semantics and be "correct", I must make my class not trivially copyable. I know this last question was not very meaningful. With my original question I tried to find out the rationale behind the highlighted sentence in the quoted paragraph (whether the author had in mind that users may introduce their own semantics, that there are cases in the language in which implementations are forced to do this, or both). – Darling
Copying bits is not a clearly defined concept. See my Q about ptrs "copying". – Mickimickie
