Violating strict-aliasing, even without any casting?

I think I'm really asking: is aliasing "transitive"? If the compiler knows that A might alias B, and B might alias C, then surely it should remember that A might therefore alias C. Or perhaps this "obvious" transitive logic isn't actually required?

An example, for clarity. The most interesting example, to me, of a strict-aliasing issue:

// g++    -fstrict-aliasing -std=c++11 -O2
#include <iostream>

union
{   
    int i;
    short s;
} u;
int     * i = &u.i;

int main()
{   

    u.i = 1; // line 1
    *i += 1; // line 2

    short   & s =  u.s;
    s += 100; // line 3

    std::cout
        << " *i\t" <<  *i << std::endl // prints 2
        << "u.i\t" << u.i << std::endl // prints 101
        ;

    return 0;
}

With g++ 5.3.0 on x86_64 (but not with clang 3.5.0), this program produces the output above, where *i and u.i print different numbers. But they should print exactly the same number, because i is initialized with int * i = &u.i; and i never changes.

I have a theory: when 'predicting' the value of u.i, the compiler asks which lines might affect the contents of u.i. That obviously includes line 1. It also includes line 2, because an int* can alias an int member of a union, and line 3, because anything that can affect one union member (u.s) can affect another member of the same union. But when predicting *i, it doesn't realise that line 3 can affect the int lvalue *i.

Does this theory seem reasonable?

I find this example funny because there is no casting in it. I managed to break strict aliasing without doing any casting.

Parallax answered 28/9, 2016 at 20:40 Comment(4)
en.cppreference.com/w/cpp/language/union - Perquisite
Firstly, union-based type punning is only allowed in C. Secondly, the permission to type-pun is only given when union members are accessed directly as union members. Otherwise, your transitivity would immediately obliterate all aliasing restrictions, since the compiler would generally have to assume that two unrelated pointers might point to members of one union object. - Liability
(I'm now trying to delete this question. I never knew that C++ unions were so different from those in C. But I can't delete it. Sorry for the dumb question, folks!) - Parallax
I've just asked the C-based version of this question here: #39758158 - Parallax

Reading from an inactive member of a union is undefined behavior in C++. (It is legitimate in C99 and C11.)

So, all in all, the compiler isn't required to assume or remember anything.

Standardese:

N4140 §9.5 [class.union]/1

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
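Under that rule, the question's program is only well-defined while it reads the member it last wrote. Below is a minimal sketch of a defined variant, assuming the goal is to look at the int's bytes as a short: lines 1 and 2 keep i as the active member, and std::memcpy replaces the read of the inactive member s. The union name IntOrShort and both function names are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical named version of the question's unnamed union.
union IntOrShort {
    int i;
    short s;
};

// Mirrors lines 1 and 2 of the question: i stays the active member,
// so reading it through the pointer or through the member agrees.
int bump_active_member() {
    IntOrShort u;
    u.i = 1;         // i becomes the active member
    int* ip = &u.i;
    *ip += 1;        // writes the active member through a pointer: fine
    return u.i;      // reads the active member: fine, yields 2
}

// Copies the first sizeof(short) bytes of an int into a short.
// Which bytes those are depends on the platform's byte order, but this
// is not undefined behavior, unlike reading the inactive member s.
short leading_bytes_as_short(int value) {
    short out;
    std::memcpy(&out, &value, sizeof out);
    return out;
}
```

Here every read targets a live object of the type being read, so the compiler has no aliasing assumption it can exploit to make two views of the same value disagree.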

Tzong answered 28/9, 2016 at 20:41 Comment(5)
I thought I'd read a lot of stuff about strict aliasing on SO, but I never noticed that before. I think I need to take a break from strict-aliasing questions :) - Parallax
I've just managed to replicate the same issue in C. I guess I should ask another question. I can't delete this question, and I guess I shouldn't change this question to make it about C. - Parallax
I lack the required knowledge to say whether the compiler is required to consider aliasing of otherwise legally cast variables in C. - Tzong
@AaronMcDaid Just ask a new question; this one is not that bad either. - Perquisite
Thanks @BaummitAugen. I've just asked another question in C: #39758158 - Parallax

In C++, it is only allowed to read from the union member that was last written to.

Aliasing outside unions is only allowed between 'similar' types (for details, please see this Q/A) and through char/unsigned char. You may access an object of another type through a char/unsigned char lvalue, but you may not access a char/unsigned char object through an lvalue of another type. If the latter were allowed, then all objects would have to be treated as possibly aliasing any other object, because they could be 'transitively aliased', as you describe, through char/unsigned char.

But because this is not the case, the compiler can safely assume that only objects of 'similar' types and char/unsigned char alias each other.
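The one-way nature of the char exception can be sketched in C++. This is a minimal illustration, assuming a two's-complement int (the representation on all current mainstream platforms); nonzero_bytes and int_from_bytes are hypothetical names. Reading an int through unsigned char is the allowed direction. For the other direction, the sketch copies the bytes into a real int with std::memcpy instead of dereferencing the buffer through an int*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Allowed direction: examining any object's bytes through an
// unsigned char lvalue. Counts the non-zero bytes of an int.
std::size_t nonzero_bytes(const int& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t k = 0; k < sizeof value; ++k) {
        if (bytes[k] != 0) {
            ++count;
        }
    }
    return count;
}

// Disallowed direction, under the rules discussed in this thread:
//     int bad = *reinterpret_cast<const int*>(buf);  // accesses bytes as an int
// (buf may also be misaligned for int). A defined alternative copies
// the bytes into a genuine int object.
int int_from_bytes(const unsigned char* buf) {
    int out;
    std::memcpy(&out, buf, sizeof out);
    return out;
}
```

For example, nonzero_bytes(0) returns 0, and nonzero_bytes(-1) returns sizeof(int), because every byte of -1 is 0xFF.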

Ferrite answered 28/9, 2016 at 20:58 Comment(10)
Applying aliasing non-transitively in cases where that makes sense would be more helpful than blocking useful forms of aliasing for fear of transitivity. For example, it would make sense to say that if T1 and T2 appear together in one union, and T1 and T3 appear together in another, accesses via T1* should be presumed to "dirty" things of types T2 and T3, and accesses via T2* or T3* should "dirty" things of type T1, but accesses via T2* need not dirty things of type T3, nor vice versa. - Johanna
@Johanna Wouldn't the compiler have to know all unions in all TUs if this were allowed? - Ferrite
The rule for use of common initial sequences, which the authors of gcc don't like and blithely ignore, would only require compilers to consider unions whose complete declaration is visible at the point of use. That rule only covers access to common initial sequences of structures within a union (rather than use of unions for other kinds of type punning), but the principle could be applied elsewhere (trading off optimization for semantic expressiveness). - Johanna
If the rule were interpreted so as to allow an S to be used as a T whenever a complete declaration of the union was visible where the object was used as an S or where it was used as a T, that would allow many useful constructs which used to be supported but are now supported only in the -fno-strict-aliasing dialect of gcc. For example, if many structures have a common initial sequence, and each such structure is accompanied by a definition of a union containing that type and a specific type that contains just that sequence, then a function which accepts a pointer to the latter struct... - Johanna
...would be able to operate upon any of the structure types that were associated with it. The authors of gcc seem to think recognizing aliasing in that case would totally trash optimizations, but having code which accepts the "common" type presume that it might alias the others would hurt optimization far less than requiring programmers to write it in such a way that the compiler would have to presume it capable of aliasing any object of any type, anywhere, whose address has ever been exposed to the outside world. - Johanna
I have no problem with the idea that compilers should not be required to presume that aliasing might occur in cases where they would have no reason to expect it. I have a big problem with the idea that compilers given something like uint32_t get_float_bits(float *f) { return *(uint32_t*)f; } should use the rules to infer that f cannot possibly point to a float, rather than using the typecast as an indication that a float is likely to be read as a uint32_t. - Johanna
What I miss most in C++ is the ability to use a union to reinterpret a buffer as a struct. C allows it, but C++ doesn't, which I really don't understand, because it forces the programmer to use an 'unnecessary', possibly expensive memcpy. - Ferrite
I don't know that such structures are in practice any less safe in C++ than in C; clang and gcc both push very aggressive "interpretations" of the Standard which severely limit what can be done 100% reliably (it's sometimes unclear which code-breaking behaviors result from bugs and which from design, but if a pattern isn't supported reliably, it's not reliable). By contrast, in C++ I think it should be possible to populate storage as one PODS, then manually invoke the destructor and use placement new to place a new object on the old storage. - Johanna
Yes, I think in practice some techniques not involving a memcpy work well; the problem is the standard. I have the impression memcpy is the only standard-sanctioned way, but I'm not sure. - Ferrite
In C++, I think the combination of explicitly invoking a trivial destructor and then using placement new will have defined behavior. In C, even memmove isn't guaranteed safe: if memmove is used to copy something with a declared type to a region of allocated storage, the Standard would allow a compiler to presume that the destination cannot alias anything of any other type, and in at least some cases (whether by bug or design) gcc seems to exploit that freedom. - Johanna
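The get_float_bits example from the comments can be written without the pointer cast. This is a minimal sketch, assuming float is a 32-bit IEEE 754 type: std::memcpy copies the float's object representation into a uint32_t, which is the standard-sanctioned route mentioned above. Mainstream optimizing compilers typically reduce such a small fixed-size memcpy to a single load, so it is usually not expensive in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Defined counterpart of *(uint32_t*)f: copy the bytes instead of
// reading a float object through a uint32_t lvalue.
std::uint32_t get_float_bits(const float* f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, f, sizeof bits);
    return bits;
}
```

With IEEE 754 floats, 1.0f yields 0x3F800000 and -0.0f yields 0x80000000. In C++20, std::bit_cast<std::uint32_t>(*f) from <bit> expresses the same copy directly.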
