Do I understand C/C++ strict-aliasing correctly?

I've read this article about C/C++ strict aliasing. I think the same applies to C++.

As I understand it, compilers rely on strict aliasing to rearrange code for performance. That's why two pointers of different (and, in the C++ case, unrelated) types cannot refer to the same memory location.

Does this mean that problems can occur only if the memory is modified (apart from possible problems with memory alignment)?

For example, take network protocol handling or deserialization. I have a dynamically allocated byte array, and the packet struct inside it is properly aligned. Can I reinterpret_cast the buffer to my packet struct?

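char const* buf = ...; // dynamically allocated
unsigned int i = *reinterpret_cast<unsigned int const*>(buf + shift); // shift is chosen so that buf + shift satisfies the alignment requirements of unsigned int; the target type must be const because reinterpret_cast cannot cast away constness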
Sledge answered 6/9, 2011 at 14:27 Comment(4)
No you can't reinterpret_cast -- you may be interested in the example I use in my answer here. It calls out a very similar situation to what you're trying to do: stackoverflow.com/questions/98650/… - Myrnamyrobalan
How is this question different from “Where can I find documentation on C++ memory alignment across different platforms/compilers?”? - Taunyataupe
@Konrad Rudolph: Because it has a different title! :) Seriously, there I asked about memory alignment and received really good answers. Here I ask about strict aliasing in relation to compiler code rearrangement, and again received a good answer unrelated to memory alignment (actually just a comment; otherwise it would be a good candidate for acceptance). - Sledge
@Doug: How can this code fail? - Johnathanjohnathon

The problem here is not strict aliasing so much as structure representation requirements.

First, it is safe to alias between char, signed char, or unsigned char and any one other type (in your case, unsigned int). This allows you to write your own memory-copy loops, as long as they're defined using a char type. This is authorized by the following language in C99 (§6.5):

 6. The effective type of an object for an access to its stored value is the declared type of the object, if any. [Footnote: Allocated objects have no declared type] [...] If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

 7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [Footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]

  • a type compatible with the effective type of the object,
  • [...]
  • a character type.
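
A minimal sketch of the kind of char-based copy loop the first paragraph of this answer refers to. The name copy_bytes is hypothetical and not from the original answer; it only illustrates that reading and writing an object's bytes through unsigned char is permitted by the rule quoted above.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original answer): copies n bytes from
// src to dst one unsigned char at a time. Accessing the bytes of any object
// through a character type is allowed by the aliasing rules.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
}
```

In practice std::memcpy does the same job and is usually the better choice; the loop is shown only to make the rule concrete.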

Similar language can be found in the C++0x draft N3242, §3.10/10, although it is less clear about when the 'dynamic type' of an object is assigned. (I'd appreciate any further references on what the dynamic type is of a char array into which a POD object has been copied, as a char array, with proper alignment.)

As such, aliasing is not a problem here. However, a strict reading of the standard indicates that a C++ implementation has a great deal of freedom in choosing a representation of an unsigned int.

As one random example, an unsigned int might be a 24-bit integer stored in four bytes, with 8 padding bits interspersed. If any of those padding bits does not match a certain (constant) pattern, the value is a trap representation, and reading it through the pointer may crash. Is this a likely implementation? Perhaps not. But there have historically been systems with parity bits and other oddities, so by a strict reading of the standard, reading directly from the network into an unsigned int is not kosher.

Now, padding bits are mostly a theoretical issue on most systems today, but they're worth noting. If you plan to stick to PC hardware, you don't really need to worry about them (but don't forget your ntohl calls: endianness is still a problem!)
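
One way to sidestep both the cast and the byte-order question is to copy or decode the bytes explicitly. The sketch below is an illustration, not code from the original answer: the function names are hypothetical, and it assumes 8-bit bytes and a platform that provides std::uint32_t.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the bytes into a real std::uint32_t object.
// The source needs no particular alignment, and no pointer of the "wrong"
// type is ever dereferenced. The result is in the host's byte order.
std::uint32_t load_u32_native(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical helper: decodes a big-endian (network order) 32-bit value
// byte by byte, so the result does not depend on the host's endianness.
std::uint32_t load_u32_be(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

load_u32_be plays the role of a memcpy followed by ntohl, without depending on a platform-specific header.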

Structures make it even worse, of course: alignment and padding depend on your platform. I have worked on an embedded platform on which all types have an alignment of 1, so no padding is ever inserted into structures. Using the same structure definitions on multiple platforms can therefore produce inconsistent layouts. You can either manually work out the byte offsets of the structure members and reference them directly, or use a compiler-specific alignment directive to control padding.
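
The "work out the byte offsets manually" option might look like the sketch below. The packet layout, the PacketHeader struct, and parse_header are all made up for illustration (the question does not show its packet format); the sketch assumes 8-bit bytes and big-endian fields on the wire.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wire layout (not from the original post), fields big-endian:
//   offset 0: 1 byte  type
//   offset 1: 2 bytes length
//   offset 3: 4 bytes sequence number
struct PacketHeader {
    std::uint8_t  type;
    std::uint16_t length;
    std::uint32_t sequence;
};

const std::size_t kWireHeaderSize = 7;

// Fills `out` from the first kWireHeaderSize bytes of `p`, reading each field
// at its fixed wire offset. The in-memory padding of PacketHeader is
// irrelevant, because the struct is never overlaid on the buffer.
// Returns false if the buffer is too short.
bool parse_header(const unsigned char* p, std::size_t size, PacketHeader& out) {
    if (size < kWireHeaderSize) {
        return false;
    }
    out.type = p[0];
    out.length = static_cast<std::uint16_t>((p[1] << 8) | p[2]);
    out.sequence = (static_cast<std::uint32_t>(p[3]) << 24) |
                   (static_cast<std::uint32_t>(p[4]) << 16) |
                   (static_cast<std::uint32_t>(p[5]) << 8)  |
                    static_cast<std::uint32_t>(p[6]);
    return true;
}
```

Here the 7-byte wire header would not match the layout of PacketHeader on a typical compiler, which inserts padding after type and length; parsing field by field makes that mismatch harmless.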

So you must be careful when directly casting from a network buffer to native types or structures. But the aliasing itself is not a problem in this case.

Puparium answered 6/9, 2011 at 14:48 Comment(10)
I thought that it was safe to create a char* reference to a pointer of a different type but not the other way around. - Murry
No, it goes both ways as long as you maintain proper alignment. - Puparium
@Mark B: mentioned doc says the same. So what's the reason to cast to char* if you cannot cast back? - Sledge
@Andy T I believe it's to allow you to copy raw bytes around (say with memcpy). - Murry
@Mark B: What's next? I copied and now I need to convert it to original state? How can I do this if converting from char* to any other type is UB? - Sledge
@Mark, Andy: Updated with references to the C99 and C++0x standards (draft in the C++0x case) - Puparium
It is definitely also a strict-aliasing problem, independent of alignment concerns. Even if your alignment/representation concerns are met, the compiler may still make optimisations that break your code. Your answer doesn't address the question, but I won't downvote because it does contain useful information! - Mikesell
@Oli, my read of the standard language here is that, if you manage to restore a series of bytes that happen to match the implementation's representation of ints, into a properly aligned portion of a char array, at that moment it assumes an 'effective type' of an int, and thus can be accessed by an int pointer. Thus, no strict aliasing problem. Is there a problem with my analysis? - Puparium
@bdonlan: I've just realised that we're talking about dynamically-allocated memory, in which case I believe your interpretation is correct! - Mikesell
@Oli, indeed, a statically allocated array would have an effective type of char statically assigned - moreover, there's no strictly conforming way of properly aligning the object inside a statically allocated array (since casting to an integer type and modulo isn't guaranteed to give you the right results) - Puparium

Actually, this code already has UB at the point where you dereference the reinterpret_cast integer pointer, without even needing to invoke the strict-aliasing rules. Not only that, but if you aren't careful, reinterpreting the buffer directly as your packet structure could cause all sorts of issues, depending on struct packing and endianness.

Given all that, and that you're already invoking UB, I suspect that it's "likely to work" on multiple compilers, and you're free to take that (possibly measurable) risk.

Murry answered 6/9, 2011 at 14:49 Comment(3)
It's not undefined behavior if the original char* was created by reinterpret_casting an unsigned int*, or if it is correctly aligned, and the memory it points to contains a memcpy of an unsigned int. - Designedly
@James Kanze doesn't the OP say that the char* block is "dynamically allocated" (which would preclude it being a reinterpreted unsigned)? - Murry
But would ensure that it was sufficiently aligned, and that an actual unsigned int could have been memcpyed into it. - Designedly
