What is type punning and what is the purpose of it?

Asked 23/5, 2017 at 14:21 Answered 25/2, 2019 at 21:49

Type punning

A form of pointer aliasing where two pointers refer to the same location in memory but represent that location as different types. The compiler will treat both "puns" as unrelated pointers. Type punning has the potential to cause dependency problems for any data accessed through both pointers.

What is this article trying to say? What happens if I use type punning, and what happens if I don't?

Banket answered 23/5, 2017 at 14:21 Comment(3)

What exactly isn't clear? The fact that you already give an answer yourself doesn't make it easier to answer the question – Fluoridate 23/5, 2017 at 14:22

Possible duplicate of What is the strict aliasing rule? – Luby 23/5, 2017 at 14:22

Possible duplicate of Unions and type-punning – Graecize 24/5, 2017 at 6:46

As it says, type punning is when you have two pointers of different type, both pointing at the same location. Example:

// BAD CODE
uint32_t data;
uint32_t* u32 = &data;
uint16_t* u16 = (uint16_t*)&data; 
*u16 = ... // de-referencing invokes undefined behavior

This code invokes undefined behavior in C++ (and C) since you aren't allowed to access the same memory location through pointers of non-compatible types (with a few special exceptions). This is informally called a "strict aliasing violation" since it violates the strict aliasing rule.
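For contrast, here is a minimal sketch of one commonly used well-defined alternative in C: copying the bytes with memcpy instead of dereferencing a cast pointer. The helper names split_u32, join_u16 and round_trip are hypothetical and not part of the original answer, and which half ends up in out[0] depends on the platform's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a uint32_t
   into two uint16_t values. memcpy works on the underlying bytes,
   so no uint32_t object is ever accessed through a uint16_t lvalue. */
void split_u32(uint32_t value, uint16_t out[2])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical inverse: rebuild the uint32_t from the two halves. */
uint32_t join_u16(const uint16_t in[2])
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}

/* Usage: splitting and rejoining gives back the original value,
   whatever the byte order of the machine. */
uint32_t round_trip(uint32_t value)
{
    uint16_t halves[2];
    split_u32(value, halves);
    return join_u16(halves);
}
```

Compilers typically turn such a fixed-size memcpy into a plain register move, so this form usually costs nothing compared with the cast.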

Another way of doing type punning is through unions:

// BAD C++ CODE
typedef union
{
  uint32_t u32;
  uint16_t u16 [2];
} my_type;

my_type mt;
mt.u32 = 1;
std::cout << mt.u16[0]; // access union data through another member, undefined behavior

This is also undefined behavior in C++ (but allowed and perfectly fine in C).
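As a sketch of the C side of that statement, the same union compiled as C might look like the following. The helper name first_half is hypothetical; the value it returns is the low or the high half of the input depending on byte order, so portable code should not rely on which one it gets.

```c
#include <stdint.h>

/* Same layout as the union shown above. */
typedef union
{
    uint32_t u32;
    uint16_t u16[2];
} my_type;

/* Hypothetical helper: write one member, then read another.
   In C the stored bytes are reinterpreted as the type of the member
   that is read; in C++ this same read is undefined behavior. */
uint16_t first_half(uint32_t value)
{
    my_type mt;
    mt.u32 = value;
    return mt.u16[0];   /* low or high half, depending on byte order */
}
```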

Sabina answered 23/5, 2017 at 14:29 Comment(10)

I thought that union behavior worked in C++ for pod types – Ethelda 28/5, 2017 at 18:15

@Cort it works on most platforms, but it’s undefined behavior. – Incisive 16/2, 2020 at 19:41

Is uint16_t* u16 = (uint16_t*)&data; really undefined behavior? AFAIK nothing will break unless you actually dereference u16 here. – Microcrystalline 1/11, 2022 at 15:58

@Microcrystalline Strictly speaking the UB doesn't occur on that line indeed, unless it leads to a misaligned pointer. I updated the answer. – Sabina 1/11, 2022 at 16:4

is there an official resource where it is confirmed "undefined behavior in C++" for such union use? – Bandeau 19/1 at 10:33

Downvoted, as this answer merely adds an example, but provides no insights of why this is bad and gives too little detail. E.g. addressing a chunk of memory as char* for serialisation is definitely not UB, but the reader might draw an opposite conclusion from the answer. – Differ 9/2 at 10:9

@Differ Maybe you should post an answer yourself then. But strict aliasing is a FAQ and the linked post addresses that in detail. If we were merely to repeat what was already said in such posts, the question should have been closed as a duplicate instead. Also this question just asked what the term means, not if it is good practice or not. – Sabina 9/2 at 11:15

you show some bad code, how about showing good code as well? – Eustashe 11/6 at 18:2

@TheFool That would be impossible since C++ does not support union type punning. Thereby making it an unsuitable language for things like low level microcontroller register maps/embedded systems programming. The problem isn't the code but the language - simply port it to C and the "bad" code turns into good code. – Sabina 12/6 at 6:54

@Lundin, ok, thanks for the reply. I'm writing C anyway, so I will use unions. – Eustashe 12/6 at 10:40

Type punning and aliasing are distinct but related concepts that some compiler writers seem unable to distinguish despite their being largely orthogonal.

Type punning refers to situations in which storage is written as one type and read as another type, typically for the purpose of allowing a value to be interpreted as a sequence of bits, allowing a sequence of bits to be interpreted as a value, or allowing a value to be used as another type whose representation matches, at least in the portion of interest. For example, the latter form of type punning may be useful in situations where one may have pointers to a variety of structure types, all of which share a Common Initial Sequence, and may need to operate on common-initial-sequence members of all of those structures despite the structures' different types. Note that even though the Standard includes explicit guarantees which would suggest that the latter form of type punning is supposed to be useful, compilers that confuse it with aliasing don't support such constructs.
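A minimal sketch of the Common Initial Sequence case in C, using hypothetical types (circle, rect, shape) that are not from the original answer. Both structures start with an int member, and the common part is inspected through the union while the union's complete type is visible, which is the situation the Standard's guarantee describes.

```c
/* Hypothetical structure types sharing a common initial sequence:
   both begin with an int member named kind. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

union shape
{
    struct circle c;
    struct rect   r;
};

/* Inspect the common initial part through the circle member,
   regardless of which structure the union currently holds. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}

/* Usage: store a struct rect, then read its kind via struct circle. */
int rect_kind_via_circle(int kind)
{
    union shape s;
    s.r.kind = kind;
    s.r.width = 1.0;
    s.r.height = 3.0;
    return shape_kind(&s);
}
```

The access here goes through a union member-access expression. Taking separate struct pointers into the union and passing those around is the pattern that the answer says some compilers do not handle as expected.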

Aliasing refers to a different concept in which storage is accessed using two or more simultaneously-active but seemingly-unrelated means, in ways that interact with each other. Given something like:

int test1(int *p1, int *p2)
{
  *p1 = 1;
  *p2 = 2;
  return *p1;
}

if p1==p2, then p1 and p2 will alias since p1 will be used to access the storage identified by p2 sometime between the creation and last use of p2, in a context wherein p1 cannot have been created from p2 [it's possible that p1 might have been created from p2 before the function was called, but there's no way p1 could have been derived from p2 within the function]. Because the Standard allows aliasing between lvalues that identify the same type, however, the above construct would have defined behavior when p1==p2, despite the fact that p1 and p2 alias.
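To make the two cases concrete, here is a small sketch that repeats the function above and adds two hypothetical callers (call_aliased and call_distinct are illustrative names only).

```c
/* Same function as above. */
int test1(int *p1, int *p2)
{
    *p1 = 1;
    *p2 = 2;
    return *p1;
}

/* p1 == p2: the second store overwrites the first, so the result is 2. */
int call_aliased(void)
{
    int x = 0;
    return test1(&x, &x);
}

/* Distinct objects: the store through p2 does not affect *p1. */
int call_distinct(void)
{
    int x = 0, y = 0;
    return test1(&x, &y);
}
```

Because both pointers have the same type, the compiler has to allow for the aliased case and reload *p1 after the store through p2.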

On the other hand, given something like:

struct s1 {int x; };
struct s2 {int x; };
union s1s2 {struct s1 v1; struct s2 v2; } uarr[100];

int test1(int i, int j)
{
  int temp;
  { struct s1 *p1 = &uarr[i].v1; temp = p1->x; }

  if (temp)
    { struct s2 *p2 = &uarr[j].v2; p2->x = 1; }

  { struct s1 *p3 = &uarr[i].v1; temp = p3->x; }
  return temp;
}

Here, the pointers p1, p2, and p3 have obviously-disjoint lifetimes and consequently are not simultaneously active and do not alias each other. Each pointer is independently derived from uarr, and the lifetime of each pointer will end prior to the next use of uarr. Consequently, this code makes use of type punning to access the same storage as both a struct s1 and a struct s2, but as written does not exploit aliasing since all the accesses to the storage in question are visibly derived from the same root-level object uarr.

Unfortunately, even though type-based access rules were intended (according to both the Rationale and a footnote) to indicate when things are allowed to alias, some compilers interpret them in ways that make language features such as the Common Initial Sequence guarantee essentially useless, since they use the type-access rules as an excuse to rewrite the code in such a way as to remove the derivation of p3 from uarr, thus introducing aliasing where there had been none.

Ettieettinger answered 25/2, 2019 at 21:49 Comment(0)

There are perfectly good reasons to use punning. Imagine you want to transmit data over a serial link but the data is actually a packed structure of different types. The packed structure is sent as a BYTE array, but to display the data which is of different types...

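#include <stdio.h>

int main(void)
{
    unsigned char a[10] = {1,2,3,4,5,6,7,8,9,0};
    unsigned int x,y,z;

    /* Note: reading a char array through an unsigned int* breaks the
       strict aliasing rule, and a+1 may be misaligned. The output shown
       below is what typical little-endian compilers with a 4-byte
       unsigned int produce, not something the language guarantees. */
    x = *(unsigned int*) a;        /* bytes 0..3 */
    y = *(unsigned int*) (a+1);    /* bytes 1..4 */
    z = *((unsigned int*) a+1);    /* bytes 4..7: the cast applies before the +1 */

    printf("x = %08X, y = %08X, z = %08X\n",x,y,z);

    return 0;
}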

Answer: x = 04030201, y = 05040302, z = 08070605

Note that this is little endian (LSB in lower memory)
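A hedged sketch of how the same three values could be decoded without pointer casts, by assembling each value from individual bytes. The names read_u32_le and sample are hypothetical; sample holds the same bytes as the array a above. This form does not depend on the host's byte order or on the alignment of the buffer, because it states the little-endian wire format explicitly.

```c
#include <stdint.h>

/* Same bytes as the array a in the example above. */
static const unsigned char sample[10] = {1,2,3,4,5,6,7,8,9,0};

/* Hypothetical helper: decode four bytes, least significant first,
   into a 32-bit value. Only unsigned char reads are performed, so
   neither alignment nor strict aliasing is an issue. */
uint32_t read_u32_le(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With this helper, read_u32_le(sample), read_u32_le(sample + 1) and read_u32_le(sample + 4) correspond to x, y and z in the example above.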

Uyekawa answered 23/2, 2019 at 20:28 Comment(0)
