Can an object have more than one effective type?

Consider the following code on a platform where the ABI does not insert padding into unions:

union { int xi; } x;
x.xi = 1;

I believe that the second line exhibits undefined behaviour as it violates the strict aliasing rule:

  1. The object referred to by x.xi is the same object as the object referred to by x. Both are the same region of storage and the term object is defined in ISO 9899:2011 §3.15 as:

    object

    1 region of data storage in the execution environment, the contents of which can represent values

    2 NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

    As an object is nothing more than a region of storage, I conclude that since x and x.xi occupy the same storage, they are the same object.

  2. The effective type of x is union { int xi; } as that's the type it has been declared with. See §6.5 ¶6:

    6  The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.


    87) Allocated objects have no declared type.

    By the wording of ¶6 it is also clear that each object can only have one effective type.

  3. In the statement x.xi = 1; I access x through the lvalue x.xi, which has type int. This is not one of the types listed in §6.5 ¶7:

    7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)

    • a type compatible with the effective type of the object,
    • a qualified version of a type compatible with the effective type of the object,
    • a type that is the signed or unsigned type corresponding to the effective type of the object,
    • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    • a character type.

    88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

  4. Therefore, the second line exhibits undefined behaviour.
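For contrast, here is a minimal sketch of accesses that ¶7 clearly does permit for an object whose declared (and therefore effective) type is int. The function name permitted_reads_agree is hypothetical and not taken from the standard or from any library; the sketch assumes a non-negative value so that the int and unsigned representations agree.

```c
#include <stddef.h>

/* Hypothetical helper: reads an int object through several lvalue types
   that §6.5 ¶7 lists as permitted. Returns 1 if every read observed the
   stored value, 0 otherwise. Assumes value >= 0. */
int permitted_reads_agree(int value)
{
    int n = value;                  /* declared type int, so effective type int */
    int m = 0;                      /* destination for the byte-wise copy */
    const int *q = &n;              /* qualified version of a compatible type */
    unsigned *u = (unsigned *)&n;   /* unsigned type corresponding to int */
    unsigned char *src = (unsigned char *)&n;  /* character type */
    unsigned char *dst = (unsigned char *)&m;
    size_t i;

    /* Copy the representation byte by byte through character lvalues. */
    for (i = 0; i < sizeof n; i++)
        dst[i] = src[i];

    return *q == value && *u == (unsigned)value && m == value;
}
```

None of these lvalue types is a union type, which is why the argument in step 3 turns on whether x and x.xi are one object or two.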

As this interpretation is clearly wrong, where lies my misreading of the standard?

Dogmatist answered 5/8, 2016 at 8:7 Comment(0)

The error is thinking that x and x.xi are the same object.

The union is an object, and it contains member objects [1]. They are distinct objects, each with its own type.


[1] Quoted from ISO/IEC 9899:201x §6.2.5 ¶20 (Types):
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
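A minimal sketch of that distinction, assuming a C11 compiler (for _Generic). The tag u and both function names are hypothetical and added only for illustration; the union has the same shape as the one in the question.

```c
#include <stddef.h>

union u { int xi; };  /* tag added for illustration only */

/* Returns 1 if the union object and its member object start at the same
   address, which the standard guarantees for a suitably converted pointer. */
int member_shares_address(void)
{
    union u x;
    x.xi = 1;  /* store through an lvalue of type int, the member's declared type */
    return (void *)&x == (void *)&x.xi;
}

/* Returns 1 if the expression x has the union type while x.xi has type int.
   _Generic inspects types only; it does not evaluate its operand. */
int types_differ(void)
{
    union u x = { 1 };
    int whole_is_union = _Generic(x, union u: 1, default: 0);
    int member_is_int = _Generic(x.xi, int: 1, default: 0);
    return whole_is_union && member_is_int;
}
```

Both functions should return 1: the two objects share an address, yet the lvalues that designate them have different types. So the access x.xi = 1 is checked against the member's effective type, int.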

Timon answered 5/8, 2016 at 8:19 Comment(9)
How is this consistent with the definition of "object" as "region of data storage"? Does it mean that a region of data storage is not necessarily an object, or not necessarily a single object? I.e., is the equivalence only in one direction? How can we know, then, when a region of data storage is an object and when it is not?Pravit
@JohannesSchaub-litb I suggest you ask a new question.Timon
The definition of the equality operator for pointers says "Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) ...". This appears to indicate that a subobject and its parent object are one and the same object.Pravit
@JohannesSchaub-litb The quote you posted (see the text in brackets) answers that question.Timon
But that quote says "pointers to the same object (including a pointer to an object and a subobject at its beginning)", which is another way of saying "an object and a subobject at its beginning are the same object". Correct me if I am wrong, but the "including" here means that the case of an object and its subobject at the beginning is included in the condition "point to the same object". Otherwise, what is the meaning of "including" after the statement about same objects?Pravit
The interesting thing about this quote is that it indicates that two objects can be distinct even though they occupy the exact same storage region.Dogmatist
@FUZxxl This seems to be true for aggregate types and unions.Timon
@FUZxxl: The storage returned from a malloc() allocation needs to be a single object to make pointer arithmetic, pointer comparisons, and memcpy() work. The Standard should have defined a new term like "object-view" for its pointer-access-type restrictions, with the latter term referring to the portion of an object which has been accessed via a certain pointer and will be accessed again using that "same" pointer without any intervening typecasts being performed upon it.Pronator
@JohannesSchaub-litb: I think the use of the word "including" in that definition is a little sloppy; "or" would have been more accurate. For example, given struct { int x; int y; } obj;, obj and obj.x are (obviously, I think) not the same object -- but (void*)&obj == (void*)&obj.x. There is an interesting question as to whether an object and an initial subobject of the same size are the same object, but the definition of pointer equality doesn't answer it.Anorthite

Outside of the rules which forbid the use of pointers to access things of other types, the term "object" refers to a contiguous allocation of storage. Each individual variable of automatic or static duration is an independent object (since an implementation could arbitrarily scatter them throughout memory) but any region of memory created by malloc would be a single object--effectively of type "char[]", no matter how many different ways the contents therein were indexed and accessed.
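As a rough sketch of what indexing one malloc region in different ways looks like, the following carves an int and a double out of a single allocation. The function name carve_and_sum is hypothetical, and the sketch assumes the reading of §6.5 ¶6 under which each store into allocated storage gives that sub-region an effective type; it is not a claim about how any particular compiler treats such code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: places an int and a double inside one malloc'd
   block and returns their sum, or -1.0 if the allocation fails. */
double carve_and_sum(int i, double d)
{
    /* Round the offset of the double up to a multiple of its alignment. */
    size_t align = _Alignof(double);
    size_t dbl_off = (sizeof(int) + align - 1) / align * align;
    unsigned char *block = malloc(dbl_off + sizeof(double));
    int *ip;
    double *dp;
    double sum;

    if (block == NULL)
        return -1.0;

    ip = (int *)block;                 /* malloc storage is suitably aligned */
    dp = (double *)(block + dbl_off);  /* aligned by construction above */

    *ip = i;  /* the store gives these bytes effective type int */
    *dp = d;  /* the store gives these bytes effective type double */

    sum = *ip + *dp;  /* each read uses the type of the preceding store */
    free(block);
    return sum;
}
```

Calling carve_and_sum(2, 0.5) should yield 2.5. Each sub-region is only ever read with the type it was last written with, so the sketch stays within ¶7 under either reading of "object".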

The C89 rules regarding pointer type access could be made workable if, in addition to the special rule for character-pointer types, there were a corresponding rule for suitably-aligned objects of character-array types, or for objects with no declared type that were effectively "char[]". Interpreting the rules in such a fashion would limit their application to objects that had declared types. This would have allowed most of the optimizations that would have been practical in 1989, but as compilers got more sophisticated it became more desirable to be able to apply similar optimizations to allocated storage; since the rules weren't set up for that, there was little clarity as to what was or was not permissible.

By 1999, there was a substantial overlap between the kinds of pointer-based accesses some programs needed to do and the kinds of pointer-based accesses that compilers were assuming programs wouldn't do. Any single C99 standard would therefore have either required that some C99 implementations be made less efficient than they had been, or else allowed C99 compilers to behave arbitrarily with a large corpus of code that relies upon techniques that some compilers didn't support.

The authors of C99, rather than resolving the situation by defining directives to specify different aliasing modes, attempted to "clarify" it by adding language that either requires applying a different definition of "object" from the one used elsewhere, or else requires that each allocated region hold either one array of a single type or a single structure which may contain a flexible array member. The latter restriction might be usable in a language being designed from scratch, but would effectively invalidate a huge amount of C code. Fortunately or unfortunately, however, the authors of the Standard were able to get away with such sloppy drafting since compiler writers were, at least until recently, more interested in doing what was necessary to make a compiler useful than in doing the minimum necessary to comply with the poorly-written Standard.

If one wants to write code that will work with a quality compiler, ensure that any aliasing is done in ways that a compiler would have to be obtuse to ignore (e.g. if a function receives a parameter of type T*, casts it to U*, and then accesses the object through the U*, a compiler that's not being obtuse should have no trouble recognizing that the function might really be accessing a T). If one wants to write code that will work with the most obtuse compiler imaginable... that's impossible, since the Standard doesn't require that an implementation be capable of processing anything other than a possibly-contrived and useless program. If one wants to write code that will work on gcc, the authors' willingness to support constructs will be far more relevant than what the Standard has to say about them.

Pronator answered 5/8, 2016 at 20:26 Comment(6)
Though, what makes you believe that the word object has a different meaning within §6.5 (if I understood you correctly)? There doesn't seem to be any text indicating this interpretation.Dogmatist
@FUZxxl: A lot of code written prior to 1999 will use "malloc" to receive a large allocation and then use pointer arithmetic to sub-allocate space from it to hold many kinds of structures. In some cases, groups of consecutive structures may be passed to memcpy, fwrite, etc. Making such code compatible with an interpretation of the language standard which would require that a malloc() allocation only be treated as a single object of a single type would essentially require a complete rewrite, and many kinds of I/O would require that data be duplicated into an "unsigned char[]".Pronator
@FUZxxl: Perhaps the authors of C99 did intend to make the Standard inapplicable to a large corpus of existing code, but if they did I would think it better to keep the code and toss the Standard than vice versa. I prefer to think they intended to allow for the possibility that regions of storage within an object acquire types through usage. That would make the intention almost reasonable, though the way it's written needlessly cripples the semantics available to the programmer in ways that offer little optimization benefit.Pronator
@FUZxxl: If the Standard had included directives to specify when it was necessary to reinterpret or recycle storage as different types (if storage is reinterpreted, it would contain the bit pattern written as the old type, reinterpreted as the new type; if it's recycled, it would contain Unspecified bit patterns), and indicated that code may only include "#pragma STDC ALIAS_EXPLICIT" if all such reinterpretations are thus marked, that would allow many more optimizations than are possible under the present standard, without breaking any code.Pronator
Would have been nice to know whether the author understands gcc to be a "quality" compiler – or, as others have said, an "adversarial" compiler :). Or put another way: is there any current compiler that could be called a "quality compiler" according to the behavior described in the last paragraph?Hysteria
@hmijail: GCC can be configured to behave as a quality compiler, at the expense of disabling many optimizations which would have been useful if performed less adversarially.Pronator
