Language standards try to strike a balance between the sometimes competing interests of programmers who use the language and compiler writers who want to apply a broad set of optimizations to generate reasonably fast code. Keeping variables in registers is one such optimization: the compiler tries to allocate registers to variables that are "live" in a section of a program. A store through a pointer could, in principle, write anywhere in the program's address space, which would invalidate every variable currently held in a register. Sometimes the compiler could analyze a program and figure out where a pointer could or could not be pointing, but the C (and C++) language standards consider this an undue burden, and for "system" types of programs it is often an impossible task. So the standards relax the constraints by specifying that certain constructs lead to "undefined behavior", which lets the compiler writer assume they never happen and generate better code under that assumption. In the case of strict aliasing, the compromise reached is that if you store to memory through one pointer type, objects of a different type are assumed to be unchanged (with a few exceptions, such as character types). They can therefore stay in registers, and loads and stores of those other types can be reordered with respect to the pointer store.
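A minimal sketch of that assumption (the function name store_then_reload is hypothetical, not from the question): because a float store is assumed not to modify an int object, an optimizing compiler may compile the final load as a constant instead of re-reading memory.

```c
/* Hypothetical example. Under strict aliasing the compiler may assume
   that the store through fp (a float lvalue) cannot change *ip (an int
   object), so it is allowed to return the constant 1 without reloading
   *ip from memory. */
int store_then_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to touch any int object */
    return *ip;   /* may be compiled as "return 1" */
}
```

Called with two distinct objects, e.g. store_then_reload(&i, &f), the behavior is well defined and the result is 1 whether or not the compiler reloads. Only a caller that made ip and fp point at the same storage would expose the difference, and that caller would be invoking undefined behavior.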
There are many examples of these kinds of optimizations in the paper "Undefined Behavior: What Happened to My Code?":
http://pdos.csail.mit.edu/papers/ub:apsys12.pdf
The paper includes an example of a strict-aliasing violation in the Linux kernel. Apparently the kernel avoids the problem by telling the compiler not to use the strict-aliasing rule for optimizations: "The Linux kernel uses -fno-strict-aliasing to disable optimizations based on strict aliasing."
```c
struct iw_event {
    uint16_t len;               /* Real length of this stuff */
    ...
};

static inline char * iwe_stream_add_event(
    char * stream,              /* Stream of events */
    char * ends,                /* End of stream */
    struct iw_event *iwe,       /* Payload */
    int event_len )             /* Size of payload */
{
    /* Check if it's possible */
    if (likely((stream + event_len) < ends)) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
```
Figure 7: A strict aliasing violation, in include/net/iw_handler.h of the Linux kernel, which uses GCC’s -fno-strict-aliasing to prevent possible reordering.
2.6 Type-Punned Pointer Dereference
C gives programmers the freedom to cast pointers of one type to another. Pointer casts are often abused to reinterpret a given object with a different type, a trick known as type-punning. By doing so, the programmer expects that two pointers of different types point to the same memory location (i.e., aliasing). However, the C standard has strict rules for aliasing. In particular, with only a few exceptions, two pointers of different types do not alias [19, 6.5]. Violating strict aliasing leads to undefined behavior.
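One of those few exceptions (this sketch is not from the paper) is character types: C allows any object's bytes to be read through an lvalue of character type. The hypothetical helper byte_sum below relies on that exception to inspect an object's representation without violating the rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the bytes of an object's representation.
   Reading any object through unsigned char is permitted by the aliasing
   rule, so this is well defined for every object type. */
unsigned byte_sum(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}
```

For a uint32_t holding 0x01020304u the sum is 1 + 2 + 3 + 4 = 10, independent of byte order, because only the order of the bytes differs between platforms.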
Figure 7 shows an example from the Linux kernel. The function first updates iwe->len, and then copies the content of iwe, which contains the updated iwe->len, to a buffer stream using memcpy. Note that the Linux kernel provides its own optimized memcpy implementation. In this case, when event_len is a constant 8 on 32-bit systems, the code expands as follows.
```c
iwe->len = 8;
*(int *)stream = *(int *)((char *)iwe);
*((int *)stream + 1) = *((int *)((char *)iwe) + 1);
```
The expanded code first writes 8 to iwe->len, which is of type uint16_t, and then reads iwe, which points to the same memory location of iwe->len, using a different type int. According to the strict aliasing rule, GCC concludes that the read and the write do not happen at the same memory location, because they use different pointer types, and reorders the two operations. The generated code thus copies a stale iwe->len value. The Linux kernel uses -fno-strict-aliasing to disable optimizations based on strict aliasing.
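The paper traces the stale read to the kernel's own memcpy expanding into int accesses. As a sketch (not from the paper or the kernel), the same pattern written against the standard library memcpy, which C defines as copying characters, stays within the aliasing rules, so the copy cannot be moved above the store to len. struct demo_event and demo_add_event are hypothetical; the real struct iw_event has further fields that the figure elides.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte event, loosely modeled on struct iw_event. */
struct demo_event {
    uint16_t len;
    uint16_t cmd;   /* hypothetical field */
    uint32_t value; /* hypothetical field */
};

/* Updates the length, then copies the whole struct with the standard
   memcpy. memcpy accesses memory as characters, which may alias any
   type, so the compiler must keep the copy after the store to ev->len. */
char *demo_add_event(char *stream, char *ends,
                     struct demo_event *ev, int event_len)
{
    if (stream + event_len < ends) {
        ev->len = (uint16_t)event_len;
        memcpy(stream, ev, (size_t)event_len);
        stream += event_len;
    }
    return stream;
}
```

Reading the first two bytes of the stream back with memcpy then yields the updated length rather than a stale one.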
Answers
1) What optimizations could the compiler perform in this aliasing case?
The language standard is very specific about the semantics (behavior) of a strictly conforming program; the burden is on the compiler writer or language implementor to get it right. Once the programmer crosses the line and invokes undefined behavior, the standard is clear that the burden of proving that the code will work as intended falls on the programmer, not on the compiler writer. In this case the compiler has been nice enough to warn that undefined behavior has been invoked, although it is under no obligation to do even that. People will sometimes, annoyingly, tell you that at this point "anything can happen", usually followed by some joke or exaggeration.

In the case of your program, the compiler could generate code that is "typical for the platform": store the value of something to localval, then load from localval and store the result at DataPtr, as you intended. But understand that it is under no obligation to do so. It sees the store to localval as a store to an object of type uint32, and it sees the dereference in (*(const float32*)((const void*)(&localval))) as a load of type float32. It concludes that these cannot be to the same location, so it may keep localval in a register holding something, while the load reads from the stack slot reserved for localval (the "automatic" storage the register would be spilled to if the compiler ever needed to), which was never written and is therefore uninitialized. It may or may not store localval to memory before dereferencing the pointer and loading from memory. Depending on what follows in your code, it may even decide that localval is never used and that the assignment of something has no side effect, treat that assignment as dead code, and not perform it at all, not even into a register.
2) As both would occupy the same size (correct me if not), what could be the side effects of such a compiler optimization?
The effect could be that an undefined value is stored at the address pointed to by DataPtr.
3) Can I safely ignore the warning or turn off aliasing?
That is specific to the compiler you are using. If the compiler documents a way to turn off strict-aliasing optimizations (for GCC, -fno-strict-aliasing, as in the kernel example above), then yes, subject to whatever caveats the compiler documents.
4) What if the compiler hasn't performed such an optimization and my program isn't broken after my first compilation? Can I safely assume that the compiler will behave the same way every time (i.e., not perform the optimization)?
Maybe, but you cannot rely on it. Sometimes very small changes in another part of your program can change what the compiler does with this code. Consider, for example, what happens if the function is inlined: its body can be mixed in with, and optimized together with, some other part of your code; see this SO question.
5) Does the aliasing rule apply to a void * typecast too, or only to casts to the standard types (int, float, etc.)?
You cannot dereference a void *, so what matters for aliasing is the type of the lvalue you finally access the object through, i.e., the type of your final cast. (In C++, a static_cast or reinterpret_cast would also refuse to convert a pointer to const into a pointer to non-const.)
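A minimal sketch, assuming the question's float32 and uint32 are typedefs for float and uint32_t: C11 _Generic reports the type of the lvalue that the question's expression would read, and because _Generic does not evaluate its controlling expression, nothing is actually accessed. The macro IS_FLOAT_ACCESS and the function access_type_is_float are hypothetical names.

```c
#include <stdint.h>

/* Evaluates to 1 if the expression has type float (const float is
   listed too, since lvalue conversion normally drops the qualifier).
   The controlling expression is not evaluated, so no undefined
   behavior occurs here. */
#define IS_FLOAT_ACCESS(expr) \
    _Generic((expr), float: 1, const float: 1, default: 0)

int access_type_is_float(void)
{
    uint32_t localval = 0;
    /* The intermediate void * cast changes nothing: the lvalue that
       would be read has type const float, not uint32_t. */
    return IS_FLOAT_ACCESS(*(const float *)(const void *)&localval);
}
```

The function returns 1: the access type comes from the final cast, and the void * step in between is invisible to the aliasing rule.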
6) What are the effects if I disable the aliasing rules?
See your compiler's documentation. In general you will get slower code. If you do this (as the Linux kernel chose to do in the example from the paper above), limit it to a small compilation unit containing only the functions where it is necessary.
Conclusion
I understand your questions come from curiosity and from trying to better understand how this works (or might not work). You mentioned that the code is required to be portable; by implication, the program is required to be compliant and must not invoke undefined behavior (remember, the burden is on you if it does). In this case, as you pointed out in the question, one solution is to use memcpy. As it turns out, not only does that make your code compliant and therefore portable, it also does what you intend in the most efficient way possible: with current gcc at optimization level -O3, the compiler converts the memcpy into a single instruction that stores the value of localval at the address pointed to by DataPtr. See it live on Coliru and look for the movl %esi, (%rdi) instruction.
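A minimal sketch of the memcpy approach, assuming the question's uint32 and float32 are 32-bit typedefs for uint32_t and float and that float uses IEEE 754 binary32 (the check below relies on 0x3F800000 being the bit pattern of 1.0f). store_bits is a hypothetical name; the question's actual function is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide; compilation fails otherwise. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical version of the question's code: copies the bit pattern
   of something into the float object at DataPtr. memcpy copies the
   object representation as characters, so no lvalue of the wrong type
   is ever used and the aliasing rule is not violated. */
void store_bits(float *DataPtr, uint32_t something)
{
    uint32_t localval = something;
    memcpy(DataPtr, &localval, sizeof localval);
}
```

Because memcpy makes no assumption about the alignment of its arguments, this also sidesteps the alignment concern raised in the comments below; with optimization enabled, gcc is reported above to reduce the copy to a single 32-bit store.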
… memcpy. Code might not work correctly if these two types have different alignment requirements. Not likely when both types have the same size, but it is a possibility, and that shouldn't be ignored when writing portable code. – Indebted

memcpy is standard compliant, therefore portable, and in this case it is as efficient as it can be... the compiler generates one instruction, a 32-bit store: check it out coliru.stacked-crooked.com/a/0c8fecda1194b87b, look for movl %esi, (%rdi) – Corpse