Why does optimisation kill this function?
Asked Answered
P

3

50

We recently had a lecture in university about programming specials in several languages.

The lecturer wrote down the following function:

inline u64 Swap_64(u64 x)
{
    u64 tmp;
    (*(u32*)&tmp)       = Swap_32(*(((u32*)&x)+1));
    (*(((u32*)&tmp)+1)) = Swap_32(*(u32*) &x);

    return tmp;
}

While I totally understand that this is also really bad style in terms of readability, his main point was that this part of code worked fine in production code until they enabled a high optimization level. Then, the code would just do nothing.

He said that all the assignments to the variable tmp would be optimized out by the compiler. But why would this happen?

I understand that there are circumstances where variables need to be declared volatile so that the compiler doesn't touch them even if he thinks that they are never read or written but I wouldn't know why this would happen here.

Planetarium answered 4/1, 2014 at 15:8 Comment(12)
This code exhibits undefined behavior. The compiler is legally allowed to do anything at all (see "nasal demons"). "Optimized away to a no-op" is one possible manifestation of undefined behavior.Decalescence
look for strict aliasing.Ecstatic
Some compilers can produce assembly output (e.g. gcc -S). I'd be interested to see what it produced in each case.Bailey
I just tried with gcc (Ubuntu/Linaro 4.7.2-5ubuntu1) 4.7.2 and it gives me the expected results for all optimization levels. (Of course this doesn't prove anything but I tried to find an optimization level at which it fails)Apostate
@Apostate Unfortunately, I don't know which compiler setup was used as this was just an example in textual form, no sample project.Planetarium
@leemes, check my other answer.Materfamilias
@Apostate I was unfortunately also unable to reproduce but I am also not able to reproduce the first example from Understanding Strict Aliasing so that make me feel better but again as you said it is undefined so not being able reproduce is not necessarily surprising.Ecclesiolatry
In addition to string aliasing I thought using casts as an lvalue was also illegal. (This is just from memory though, could not find the chapter and verse...)Aekerly
@Planetarium if you figure out the details, it would nice to know especially if it can help reproduce similar results in different contexts. If you really want to understand the topic better I doubt you will find a better reference than the Understanding Strict Aliasing article I linked in my answer.Ecclesiolatry
@Aekerly the casts are not used as lvalues. They are dereferenced first.Alike
@Alike - I am fairly certain I have seen compilers complain at me for exactly this construct. (lvalue based on dereference of pointer cast)Aekerly
@Aekerly "fairly sure" doesn't mean much :p some compilers might warn about the aliasing violation, but C99 6.5.3.2 paragraph 4 says that the result of dereferencing a pointer (other than a function pointer) is an lvalue, and it doesn't impose any further restrictions on such lvalues. Note however that the result of the de-reference will be UB (as well as the potential aliasing violation, the result of the cast itself is not defined). Seeing as the UB can only occur at run-time the compiler is obliged to compile the code regardless.Alike
E
48

This code violates the strict aliasing rules which makes it illegal to access an object through a pointer of a different type, although access through a *char ** is allowed. The compiler is allowed to assume that pointers of different types do not point to the same memory and optimize accordingly. It also means the code invokes undefined behavior and could really do anything.

One of the best references for this topic is Understanding Strict Aliasing and we can see the first example is in a similar vein to the OP's code:

uint32_t swap_words( uint32_t arg )
{
  uint16_t* const sp = (uint16_t*)&arg;
  uint16_t        hi = sp[0];
  uint16_t        lo = sp[1];

  sp[1] = hi;
  sp[0] = lo;

 return (arg);
} 

The article explains this code violates strict aliasing rules since sp is an alias of arg but they have different types and says that although it will compile, it is likely arg will be unchanged after swap_words returns. Although with simple tests, I am unable to reproduce that result with either the code above nor the OPs code but that does not mean anything since this is undefined behavior and therefore not predictable.

The article goes on to talk about many different cases and presents several working solution including type-punning through a union, which is well-defined in C991 and may be undefined in C++ but in practice is supported by most major compilers, for example here is gcc's reference on type-punning. The previous thread Purpose of Unions in C and C++ goes into the gory details. Although there are many threads on this topic, this seems to do the best job.

The code for that solution is as follows:

typedef union
{
  uint32_t u32;
  uint16_t u16[2];
} U32;

uint32_t swap_words( uint32_t arg )
{
  U32      in;
  uint16_t lo;
  uint16_t hi;

  in.u32    = arg;
  hi        = in.u16[0];
  lo        = in.u16[1];
  in.u16[0] = lo;
  in.u16[1] = hi;

  return (in.u32);
}

For reference the relevant section from the C99 draft standard on strict aliasing is 6.5 Expressions paragraph 7 which says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:76)

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

and footnote 76 says:

The intent of this list is to specify those circumstances in which an object may or may not be aliased.

and the relevant section from the C++ draft standard is 3.10 Lvalues and rvalues paragraph 10

The article Type-punning and strict-aliasing gives a gentler but less complete introduction to the topic and C99 revisited gives a deep analysis of C99 and aliasing and is not light reading. This answer to Accessing inactive union member - undefined? goes over the muddy details of type-punning through a union in C++ and is not light reading either.


Footnotes:

  1. Quoting comment by Pascal Cuoq: [...]C99 that was initially clumsily worded, appearing to make type-punning through unions undefined. In reality, type-punning though unions is legal in C89, legal in C11, and it was legal in C99 all along although it took until 2004 for the committee to fix incorrect wording, and the subsequent release of TC3. open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm
Ecclesiolatry answered 6/1, 2014 at 18:0 Comment(25)
No, its explicitly NOT undefined behavior. Instead, it gives an unspecified value, which might be any value (including a trap value if those exist), but otherwise has well-defined behavior. So the result of type punning via a union can't do ANYTHING, but it might give any value (including appearing to do nothing). But for types that can't have trap values (such as unsigned ints), it won't crash.Guerrilla
yes, but you quoted the C99 spec, not the C++ spec. Also, I believe the C rules apply even in C++ for standard layout unions. While non-standard layout unions are possible in C++11, they're not terribly useful.Guerrilla
@ChrisDodd I fixed up the wording and added relevant references. Pascal Cuoq's comment here is very interesting wrt to C99 and explains some of the confusion in regards to C99 and type-punning that many including me had.Ecclesiolatry
How about clarifying the difference: The effective type in C is the dynamic type in C++, unless the object was declared with a static type (i.e. is not anonymous / dynamically allocated memory). In C++ the rules are the same but use the dynamic instead of the effective type.Adeno
@Adeno they do behave differently but I am not sure how explaining the difference would help here. Do you have a specific example in mind here?Ecclesiolatry
@ShafikYaghmour: How about writing your own allocator using a static block of memory (char storage[BigN])? Possible in C++, UB in C.Adeno
The strict aliasing rules aren't even necessarily violated, since the cast from a u64 * to a u32 * results in an unspecified pointer value (I mean, that particular conversion is not defined, other than that casting it back to the u64 * should yield the original pointer value).Alike
@davmac: I believe that unless u32 had coarser alignment requirements than u64 (which would be very odd), casting u64* to u32* is required to yield a u32* which, if cast to unsigned char*, will yield the same result as casting the u64* directly to type unsigned char*. That does not imply that all pointers types have the same bitwise representation, but defines the meaning of the pointer much more rigidly than you imply (though there are limits as to how one can actually use the pointer).Lumumba
@Lumumba there's no such requirement in the language spec. The conversions that have a defined value are essentially to another pointer type and back (but not just to another pointer type) and to a char * and (arguably) "to a sub-object at it's [an objects] beginning". But if you go to char * via another pointer type, then it's not specified what (if anything) the resulting pointer points to (at least in C).Alike
@davmac: Given typedef union { u64 ll; u32 ww[2] unsigned char bb[8];} u3264; u64 *p; would (u32*)p be allowed to yield something different from ((u3264*)p)->ww;? Are there any implementations where it would usefully do so? If the answer to the first question is yes, but the answer to the second question is no, what advantage could there be to not requiring the former expression to behave like the latter?Lumumba
@Lumumba Q1: This gets hairy quickly. I think you meant "would *(u32 *)p be allowed...". I believe this potentially falls into the "sub-object at its beginning" category, so assuming that p actually points to an object of type u3264, and ww is the active union member, then it must be the case that *(u32*)p == ((u3264 *)p)->ww (i.e. the answer is no, although this relies on the notion that each member of a union is "at its beginning" which isn't strongly supported in the standard, I think). Without those constraints the answer is "yes". For Q2, I doubt it...Alike
@davmac: The type of ww is (u32*).Lumumba
@Lumumba Ah I see, I missed that it was an array. My answer is the same: given the same constraints as above, (u32*)p == ((u3264 *)p)->ww is assured, IIUC, due to the "subobject at the beginning" clause.Alike
@Lumumba in fact C99 6.7.2.1: "A pointer to a union object, suitably converted, points to each of its members". This is similar to the "subject at the beginning" clause for structs, but is explicitly for unions. The standard doesn't elaborate on "suitably converted" so I interpret it as "converted to a pointer of the appropriate type".Alike
@davmac: FYI, I finally figured out something I hadn't realized: accessing any item in a union accesses all of them, and all of those accesses have to be valid under the type rules for behavior to be defined. So passing a pointer to a union member to code that dereferences it using a pointer of any type other than the union type or a character type is Undefined Behavior.Lumumba
@supercat, no. That interpretation would disallow unions with members whose types were incompatible by 6.5p7 - effectively making unions completely pointless.Alike
@Lumumba (see footnote 37 from 6.2.5p22, which says that “an object with union type can contain only one member at a time”. You can't access members that don't exist at the time; only the "active" member is accessed. The issue of aliasing violation only arises when trying to access the wrong member, the member that the union object doesn't currently contain).Alike
@davmac: Given union flint {float f; int i;} u;, the statement u.i=23; uses an lvalue of type "union flint" to modify flint, flint.i, and flint.u. The first is value because the lvalue type matches that of the object. The second and third are value because the lvalue type is a union containing int and float, respectively.Lumumba
@davmac: If you think such an interpretation would be horribly restrictive and stupid, I with you 100%, since they make it impossible to write many kinds of code efficiently, but to have any assurance that code will work on gcc without -fno-strict-aliasing it will need to abide by those silly rules.Lumumba
@Lumumba 6.5.2.3 "A postfix expression followed by the . operator designates a member of a structure or union object. The value is that of the named member, and is an lvalue ..." - so, the type of the lvalue in the access for your example is that of the member, not that of the union. But, this discussion doesn't belong here.Alike
@davmac: It must be possible to access inactive members. Writing to a member is an access, and it is only by writing to a member that it becomes active. Further, whether or not one regards using an int* to write an int within a flint as an access to the float, it's certainly an access to the flint; would you dispute that? Is there anything in the Standard that would allow such an access? Further. saying myFlint.i to write to myFlint is clearly using type flint in the operation, is it not?Lumumba
Let us continue this discussion in chat.Alike
gcc allows -fno-strict-aliasing so you can break the rules but why is this so?Petrel
@Nubcake: The C Standard was written to describe an already-existing language which allowed programmers to exploit type punning on platforms where doing so would be useful. Because some tasks required this ability, but tasks that didn't require it could be processed more efficiently if compilers didn't have to accommodate it, the Standard allows implementations intended for low-level programming tasks to process such constructs "in a documented fashion characteristic of the environment", without requiring that implementations that aren't intended to be suitable for such use do likewise.Lumumba
@Nubcake: GCC may be configured to operate in a manner suitable for low-level programming by disabling most optimizations, or in a manner that may generate faster code but is unsuitable for such purposes by enabling maximum optimizations.Lumumba
E
47

In C++, pointer arguments are assumed not to alias (except char*) if they point to fundamentally different types ("strict aliasing" rules). This allows some optimizations.

Here, u64 tmp is never modified as u64.
A content of u32* is modified but may be unrelated to 'u64 tmp' so may be seen as nop for u64 tmp.

Ecstatic answered 4/1, 2014 at 15:26 Comment(3)
So reversing the order of 8 char*s in an analogous way is defined behavior?Apostate
@leemes: Yes, because char and unsigned char "alias" all other types (C++11 3.10p10).Bolger
Thanks for this answer! Will look into strict aliasing!Planetarium
M
10

g++ (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1:

> g++ -Wall -std=c++11 -O0 -o sample sample.cpp

> g++ -Wall -std=c++11 -O3 -o sample sample.cpp
sample.cpp: In function ‘uint64_t Swap_64(uint64_t)’:
sample.cpp:10:19: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     (*(uint32_t*)&tmp)       = Swap_32(*(((uint32_t*)&x)+1));
                   ^
sample.cpp:11:54: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     (*(((uint32_t*)&tmp)+1)) = Swap_32(*(uint32_t*) &x);
                                                      ^

Clang 3.4 doesn't warn in any optimization level, which is curious...

Materfamilias answered 4/1, 2014 at 15:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.