C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

Just a couple of weeks ago, I learned that the C++ Standard has a strict aliasing rule. I had asked a question about shifting bits: rather than shifting each byte one at a time, I wanted to maximize performance by loading the data into one of my processor's native registers (32 or 64 bits wide) and shifting 4 or 8 bytes in a single instruction.

This is the code I wanted to avoid:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

for (int i = 0; i < 3; ++i)
{
  buffer[i] <<= 4; 
  buffer[i] |= (buffer[i + 1] >> 4);
}
buffer[3] <<= 4;

And instead, I wanted to use something like:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };
unsigned int *p = (unsigned int*)buffer; // unsigned int is 32 bit on my platform
*p <<= 4;

Someone pointed out in a comment that my proposed solution violated the C++ aliasing rules, because p was of type unsigned int* while buffer was an array of unsigned char, and I was dereferencing p to perform the shift. (Please ignore possible issues of alignment and byte order; I handle those outside of this snippet.) I was quite surprised to learn about the strict aliasing rule, since I regularly operate on data from buffers, casting it from one type to another, and have never had any issue. Further investigation revealed that the compiler I use (MSVC) doesn't enforce strict aliasing rules, and since I only develop with gcc/g++ in my spare time as a hobby, I likely just hadn't encountered the issue yet.
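One way to get the single wide shift without dereferencing an unsigned int* that points into the char buffer is to assemble the integer from the bytes, shift it once, and write the bytes back. The following is only a minimal sketch and not the asker's code: the helper name shift_left_4 is hypothetical, and it assumes the buffer should be treated as one big-endian 32-bit value, which is what the byte-wise loop computes. Optimizing compilers often reduce this pattern to a single load, shift, and store, but that is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): shifts a 4-byte buffer
// left by 4 bits, treating it as a single big-endian 32-bit value.
// The buffer is only ever accessed as unsigned char, so no object is
// accessed through a pointer of an unrelated type.
inline void shift_left_4(unsigned char (&buf)[4])
{
    // Assemble the bytes into one integer, most significant byte first.
    std::uint32_t v = (std::uint32_t(buf[0]) << 24) |
                      (std::uint32_t(buf[1]) << 16) |
                      (std::uint32_t(buf[2]) << 8)  |
                       std::uint32_t(buf[3]);

    v <<= 4; // one shift covers all four bytes

    // Write the result back, most significant byte first.
    buf[0] = static_cast<unsigned char>(v >> 24);
    buf[1] = static_cast<unsigned char>(v >> 16);
    buf[2] = static_cast<unsigned char>(v >> 8);
    buf[3] = static_cast<unsigned char>(v);
}
```

Because the byte order is spelled out in the shifts, the result does not depend on the endianness of the machine.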

So then I asked a question about Strict Aliasing Rules and C++'s Placement new operator:

IsoCpp.org offers a FAQ regarding placement new and they provide the following code example:

#include <new>        // Must #include this to use "placement new"
#include "Fred.h"     // Declaration of class Fred
void someCode()
{
  char memory[sizeof(Fred)];     // Line #1
  void* place = memory;          // Line #2
  Fred* f = new(place) Fred();   // Line #3 (see "DANGER" below)
  // The pointers f and place will be equal
  // ...
}

The example is simple enough, but I'm asking myself, "What if someone calls a method on f, e.g. f->talk()?" At that point we would be dereferencing f, which points to the same memory location as memory (of type char*). I've read in numerous places that there is an exemption allowing variables of type char* to alias any type, but I was under the impression that it wasn't a "two-way street": a char* can alias (read/write) any type T, but type T can only be used to alias a char* if T itself is char*. As I'm typing this, that doesn't make any sense to me, so I'm leaning towards the belief that the claim that my initial bit-shifting example violated the strict aliasing rule is false.

Can someone please explain what is correct? I've been going nuts trying to understand what is legal and what is not (despite having read numerous websites and SO posts on the topic).

Thank you

Moxa answered 16/5, 2016 at 17:38 Comment(3)
If calling member functions of f were undefined behavior, that would make placement new kind of useless, wouldn't it? - Sauls
"which points to the same memory location as memory (of type char*)" - the type of memory is not char *. On line 2 the array decays to a pointer, but that doesn't mean memory is a pointer. And, even if memory was a pointer, the type of the memory location it pointed to would be char, not char *. - Threemaster
It seems that your question is about whether f->talk() is OK; I think it would improve the question to delete all the preamble (the stuff before "So then"). - Tersina
E
8

The aliasing rule means that the language only promises that a pointer dereference is valid (i.e. does not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have D* d pointing to a valid D, accessing the object through the pointer returned by static_cast<B*>(d) is OK, but accessing it through the one returned by reinterpret_cast<B*>(d) is not. The latter may fail to account for the layout of the B sub-object inside D.
  • You access it through a pointer to char. Since char is byte-sized and byte-aligned, any object that can be read through a D* can also be read byte by byte through a char*.
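Both permitted kinds of access can be sketched as follows. This is an illustration under stated assumptions, not code from the answer: the types B, Other, and D and the helpers count_nonzero_bytes and read_b_through_base are hypothetical. The byte view uses unsigned char, which the standard also permits for inspecting object representations.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts the non-zero bytes in the object
// representation of `value` by reading it through an unsigned char pointer.
template <typename T>
std::size_t count_nonzero_bytes(const T& value)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        if (bytes[i] != 0)
            ++count;
    }
    return count;
}

// Hypothetical class hierarchy in which B is not the first base of D,
// so the B sub-object need not start at the same address as the D object.
struct B     { int b = 1; };
struct Other { int o = 2; };
struct D : Other, B { int d = 3; };

// static_cast adjusts the address to the B sub-object; reinterpret_cast
// would reuse the address of the D object unchanged.
inline int read_b_through_base(D& d)
{
    B* base = static_cast<B*>(&d);
    return base->b;
}
```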

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias in both directions between POD types and char arrays, provided the char array has the appropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common outcomes are:

  • The code might Just Work (TM) because on your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.
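The "start with an array of the right type" approach might look like the sketch below. The helper name fill_ints_from_bytes is hypothetical, and the sketch assumes the caller supplies at least sizeof(int) * N bytes that were themselves produced from int values on the same platform.

```cpp
#include <cstddef>

// Hypothetical helper: fills an int array by writing its bytes through an
// unsigned char pointer. The storage is declared as int, so it has the
// size and alignment of int by construction.
// Assumption: src points to at least sizeof(int) * N readable bytes.
template <std::size_t N>
void fill_ints_from_bytes(int (&dest)[N], const unsigned char* src)
{
    unsigned char* out = reinterpret_cast<unsigned char*>(dest);
    for (std::size_t i = 0; i < sizeof(dest); ++i)
    {
        out[i] = src[i];
    }
}
```

The ints are then read back through the array itself, so no int* ever points into storage that was declared as char.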

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object is non-POD (e.g. because it has virtual functions). After all, any allocator class will likely use placement new to create objects in the storage that it obtains. Note that this is only necessarily true if you use placement new; other uses of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect, because some values in the object need to be treated specially to maintain class invariants.
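As a minimal sketch of that addendum, the code below constructs an object in a suitably aligned byte array and then calls a member function on it. The Fred defined here is a hypothetical stand-in for the class in the FAQ example, and placement_new_demo is an illustrative name; alignas requires C++11 or later.

```cpp
#include <new> // required for placement new

// Hypothetical stand-in for the Fred class from the FAQ example.
struct Fred
{
    int calls = 0;
    int talk() { return ++calls; } // returns how many times talk() has run
};

inline int placement_new_demo()
{
    // Raw storage with the size and the alignment that Fred needs.
    alignas(Fred) unsigned char memory[sizeof(Fred)];

    Fred* f = new (memory) Fred(); // a Fred object now lives in the storage
    f->talk();
    int result = f->talk();        // accesses a real Fred through a Fred*

    f->~Fred();                    // objects made by placement new are destroyed explicitly
    return result;
}
```

The calls go through the pointer returned by placement new, which points to an actual Fred object, so the access is of the "actual class" kind described above rather than a reinterpretation of char storage.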

Exact answered 16/5, 2016 at 20:32 Comment(9)
Doesn't [basic.stc.dynamic.allocation] guarantee that your example does have the correct alignment? - Hanford
@Hurkyl new is not required to return the same address that was returned by the allocator (i.e. it could return some offset into it). So, no, I don't see the allocator requirement as guaranteeing that new char[3*sizeof(int)] will necessarily return a pointer with suitable alignment for int, even if the standard allocator is guaranteed to do so. - Threemaster
[expr.new]/11 guarantees proper alignment. Nonetheless, it is UB, because there is no living int object anywhere, only a bunch of chars. - Isochronous
@Isochronous If alignment is guaranteed, it is not UB to reinterpret_cast POD types into one another (I have updated my answer). The values that will be obtained are implementation-defined due to things like byte order, but not undefined. It only becomes UB when you move to non-POD types, since you might be breaking class invariants, missing vtable pointers, etc. - Halftimbered
Thanks, the examples and explanation of reinterpret/static cast were insightful. - Moxa
Your second example is not very good; it might be defined behaviour depending on what some_buffer points to, e.g. if it came from new int[3]. - Tersina
This does not address the question of whether f->talk() is OK. - Tersina
@Tersina Functions can indeed be called if the object was (correctly) created through placement new. I have added a section at the end to that effect. - Halftimbered
Please address whether unsigned char is also valid or not, since it is also byte-sized and byte-aligned. - Policewoman
U
0

As a matter of fact, explanations of the standard's rule on pointer type punning that are framed in terms of 'strict aliasing' are not necessarily correct or easy to understand. The standard doesn't mention 'strict aliasing', and I find the original standard wording easier to understand and reason about.

In essence, it says that you can only access an object through a pointer to a related type that is suited to access this object (such as the same type or a related class type) or through a pointer to char.

As you see, the question of 'two-way street' is not even applicable.

Ushijima answered 16/5, 2016 at 17:44 Comment(2)
I don't see how the two-way question is not applicable. - Kayseri
"or through a pointer to char*" - no, it's through a pointer to char. You have introduced a level of indirection. - Threemaster
