C++ Union Member Access And Undefined Behaviour
Asked Answered
A

3

5

I am currently working on a project in which I am provided the following structre. My work is C++ but the project uses both C and C++. The same structure definition is used by both C and C++.

typedef struct PacketHeader {
    //Byte 0
    uint8_t  bRes                           :4;
    uint8_t  bEmpty                         :1;
    uint8_t  bWait                          :1;
    uint8_t  bErr                           :1;
    uint8_t  bEnable                        :1;
    //Byte 1
    uint8_t  bInst                          :4;
    uint8_t  bCount                         :3;
    uint8_t  bRres                          :1;
    //Bytes 2, 3
    union {
        uint16_t wId;    /* Needed for Endian swapping */
        struct{
            uint16_t wMake                  :4;
            uint16_t wMod                   :12;
        };
    };
} PacketHeader;

Depending on how instances of the structure are used, the required endianness of the structure can be big or little endian. As the first two bytes of the structure are each single bytes, these don't need altering when the endianness changes. Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to swap to acheive the desired endianness. To acheive the endianness swap, we have been performing the following:

//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);

uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;

myHeader.wId = packetIdFlipped;

The function BuildPacketHeader(uint8_t) assigns values to the members wMake and wMod explicitly, and does not write to the member wId. My question is regarding the safety of reading from the member wId inside the returned instance of the structure.

Questions such as Accessing inactive union member and undefined behavior?, Purpose of Unions in C and C++, and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.

Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:

[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static datamember of an object of this standard-layout union type is active and is one of the standard-layout structs, itis permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3.— end note]

Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?

Is the unnamed struct the active member as it was the last member written to in the function call?

Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)

Thanks in advance

Arsenite answered 21/3, 2019 at 13:35 Comment(4)
Anonymous structs are ill-formed in C++. I.e. this: struct{ /* members */ };Grandaunt
@Grandaunt Thank you - I hadn't even realised that. Looking into it led me a comment Similar to union, an unnamed member of a struct whose type is a struct without name is known as anonymous struct. Every member of an anonymous struct is considered to be a member of the enclosing struct or union. This applies recursively if the enclosing struct or union is also anonymous here. Does this mean that the code above actually is not UB as each member of the inner struct and union are considered members of struct PacketHeader?Arsenite
It is UB. The union being anonymous doesn't change anything.Grandaunt
Don't seem to find the edit comment button but, on closer reading, I think that link was referring to C anyhow. Many thanks, once againArsenite
C
1

Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?

Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.

and is this what is meant by common initial sequence?

The common initial sequence is when two standard layout types share the same members in the same order. In

struct foo
{
    int a;
    int b;
};

struct bar
{
    int a;
    int b;
    int c;
};

a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.

This is not your case though since wId isn't a standard layout type struct.

Chadwell answered 21/3, 2019 at 13:47 Comment(3)
Would wrapping wId in a struct ID{ ... } suffice to provide a standard layout type? And, if so, would it share a common initial sequence with the following, anonymous (in the provided example - will be changed now that eerorika pointed out it is ill defined) struct, as each consist of a uint16_t initial member? Or does the fact that the latter struct's member is a bit-field mean they are not common types?Arsenite
I should read more thoroughly before asking more questions, my apologies. The links you posted linked to another page, clarifying and if they are bit-fields, they have the same width.Arsenite
@Arsenite That still would not be legal. Both structs would have to have a bit field for it to be legal, as you found out.Chadwell
W
4

The function BuildPacketHeader(uint8_t) assigns values to the members wMake and wMod explicitly, and does not write to the member wId. My question is regarding the safety of reading from the member wId inside the returned instance of the structure.

Yes, it's UB. It does not mean it's not working, just that it may not work. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).

Wyck answered 21/3, 2019 at 13:56 Comment(1)
Thank you for the reply and the memcpy hint. Is this still UB considering the inner union and struct are both anonymous (as pointed out by eerorika), and the following comment Similar to union, an unnamed member of a struct whose type is a struct without name is known as anonymous struct. Every member of an anonymous struct is considered to be a member of the enclosing struct or union. This applies recursively if the enclosing struct or union is also anonymous., source EDIT: The link seems to be referring to C anyhow, my apologiesArsenite
C
1

Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?

Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.

and is this what is meant by common initial sequence?

The common initial sequence is when two standard layout types share the same members in the same order. In

struct foo
{
    int a;
    int b;
};

struct bar
{
    int a;
    int b;
    int c;
};

a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.

This is not your case though since wId isn't a standard layout type struct.

Chadwell answered 21/3, 2019 at 13:47 Comment(3)
Would wrapping wId in a struct ID{ ... } suffice to provide a standard layout type? And, if so, would it share a common initial sequence with the following, anonymous (in the provided example - will be changed now that eerorika pointed out it is ill defined) struct, as each consist of a uint16_t initial member? Or does the fact that the latter struct's member is a bit-field mean they are not common types?Arsenite
I should read more thoroughly before asking more questions, my apologies. The links you posted linked to another page, clarifying and if they are bit-fields, they have the same width.Arsenite
@Arsenite That still would not be legal. Both structs would have to have a bit field for it to be legal, as you found out.Chadwell
A
1

What the C++ standard kind of says is given two structs A and B and the following untion:

union U
{
  A a;
  B b;
};

The following is valid code:

U u;

A a;
u.a = a;
a = u.a;

B b;
u.b = b;
b = u.b;

You read and write the same type. This is obviously correct code.

But the problem comes when you have the following code:

A a;
B b;
u.a = a;
b = u.b;

What do we know about A and B? First in the union they share the same memory space. Now the C++ standard has plainly declared it as undefined behavior.

But that does not mean it's fully out of the window. C99 comes into play, since it is the normative base and there are weak guarantees about unions. That is if the union member have the same memory layout they are compatible and each structs first memory address is the same. So if you can ensure that your structs / union members are all padded in the correct way the operation is safe, even if C++ says it's undefined.

Finally from a pragmatic standpoint, if you don't mess with padding and get the standard layout, the compiler will generally do the right thing, since that is a quite old usage pattern in C and breaking this will break LOTS of code.

Aguascalientes answered 21/3, 2019 at 13:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.