Can a type which is a union member alias that union?

Prompted by this question:

The C11 standard states that a pointer to a union can be converted to a pointer to each of its members. From Section 6.7.2.1p17:

The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

This implies you can do the following:

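#include <stdio.h>

union u {
    int a;
    double b;
};

int main(void)
{
    union u myunion;
    int *i = (int *)&myunion;       /* points to myunion.a */
    double *d = (double *)&myunion; /* points to myunion.b */

    myunion.a = 2;
    printf("*i=%d\n", *i);
    myunion.b = 3.5;
    printf("*d=%f\n", *d);
    return 0;
}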

But what about the reverse: in the case of the above union, can an int * or double * be safely converted to a union u *? Consider the following code:

#include <stdio.h>

union u {
    int a;
    double b;
};

void f(int isint, union u *p)
{
    if (isint) {
        printf("int value=%d\n", p->a);
    } else {
        printf("double value=%f\n", p->b);
    }
}

int main()
{
    int a = 3;
    double b = 8.25;
    f(1, (union u *)&a);
    f(0, (union u *)&b);
    return 0;
}

In this example, pointers to int and double, both of which are members of union u, are passed to a function where a union u * is expected. A flag is passed to the function to tell it which "member" to access.

Assuming, as in this case, that the member accessed matches the type of the object that was actually passed in, is the above code legal?

I compiled this on gcc 6.3.0 with both -O0 and -O3 and both gave the expected output:

int value=3
double value=8.250000
Singleminded answered 4/2, 2019 at 14:52 Comment(1)
You do not even need the aliasing rules to see that this may have behavior not defined by the standard. If the double requires eight-byte alignment, then the union does too. But the int a may have only four-byte alignment, in which case the behavior of converting &a to union u * is not defined. – Apure

Regarding strict aliasing, there is no issue in going from pointer-to-type (for example &a) to pointer-to-union containing that type. It is one of the exceptions to the strict aliasing rule, C17 6.5/7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object, /--/
- an aggregate or union type that includes one of the aforementioned types among its members

So this is fine as far as strict aliasing goes, as long as the union contains an int/double. And the pointer conversion in itself is well-defined too.

The problem comes when you try to access the contents, for example the contents of an int as a larger double. This is probably UB for multiple reasons - I can think of at least C17 6.3.2.3/7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (footnote 69) for the referenced type, the behavior is undefined.

The non-normative footnote provides more information:

69) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

Hellas answered 4/2, 2019 at 15:11 Comment(2)
I think the alignment is the key here. I hadn't considered that angle, and as per the passage you quoted this ends up being a hard no, barring potentially ugly hacks that force alignment. – Singleminded
@Singleminded Or even more fundamentally: object size. But that's kind of covered by alignment, implicitly. – Hellas

In this example, pointers to int and double, both of which are members of union u, are passed to a function where a union u * is expected. A flag is passed to the function to tell it which "member" to access.

Assuming, as in this case, that the member accessed matches the type of the object that was actually passed in, is the above code legal?

With respect to the strict aliasing rule, you seem to be focusing your analysis on the types of the union members. However, given

union a_union {
    int member;
    // ...
} my_union, *my_union_pointer;

, I would be inclined to argue that expressions of the form my_union.member and my_union_pointer->member express accessing the stored value of an object of type union a_union in addition to accessing an object of the member's type. Thus, if my_union_pointer does not actually point to an object whose effective type is union a_union then there is indeed a violation of the strict aliasing rule -- with respect to type union a_union -- and the behavior is therefore undefined.

Assimilate answered 4/2, 2019 at 15:56 Comment(13)
I'm not sure this applies, at least not in the case of ->. 6.5.2.3p4: A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. If the first expression is a pointer to a qualified type, the result has the so-qualified version of the type of the designated member. – Singleminded
Well, @dbush, I take that text to be supportive of my position, in that it is written in terms of a member of the object to which the operand expression points. But if you don't see your way clear to that, then it still leaves you with UB, because if the operand does not actually point to a union then there is no member of that union from which to draw a value, and no definition of any other way to evaluate the expression. – Assimilate
There is no concept of "pointer to several effective types at once". Sure, that part of the standard is badly written, but it doesn't mention any of your speculations. If it did, they wouldn't have made exceptions applying to aggregate and union types. Notably, these rules don't care where pointers point, only through which type the data is accessed. – Hellas
@Lundin, as I know you are aware, a pointer has only one target type. But multiple objects with different effective types can certainly overlap in memory, and the standard certainly recognizes that converting a pointer's type can change which of those the result points to. But the standard does not provide any definition of the behavior of u->b if u in fact does not point to an object whose effective type is u's target type, regardless of how the effective type of that object compares to the result of u being converted to some other pointer type. – Assimilate
The effective type is either the union or the type of the member, depending on how the value access was done. If you did something like *my_union = *my_other_union; then the effective type is the union type. But if you did my_union->foo = 5; then the effective type is int, etc. But you can still access the whole memory area where that effective type int resides as a union, since it is a union type containing a type compatible with the effective type (int). – Hellas
Well, I guess that's the crux of our disagreement, @Lundin. The definition of the behavior of -> says "The value is that of the named member of the object to which the first expression points" (emphasis added). Lacking any clear reason to do otherwise, I am inclined to take that as written, which leaves the behavior undefined if the first expression in fact does not point to an object of its target type. Doing so furthermore hangs together consistently with the strict aliasing rule and the possibility of UB from misalignment when converting pointers. – Assimilate
@Lundin: The use of a freshly-derived reference in the context of its derivation is not aliasing. The question of exactly when references should be recognized as "freshly derived" was (and still should be, mostly) viewed as an implementation issue, with the proviso that certain constructs should be recognizable by all non-garbage implementations. The authors of the Standard didn't think it necessary to explicitly say that a compiler should recognize that a function that takes the address of a union member and then uses the resulting pointer before doing anything else... – Dieter
...might actually access the resulting union object, but I think that's most likely because they never imagined that it would become fashionable for compilers to be willfully blind to such things. – Dieter
@Dieter I think you are mixing up C and C++? In C, all objects in a union exist at the same time, or type punning wouldn't be possible. – Hellas
@Lundin: If all objects exist at the same time, then accessing any of them with an lvalue other than union type or character type would violate the 6.5p7 constraint. Yes, that would allow conforming-but-obtuse compilers to totally break the language, but more importantly it shows that 6.5p7 cannot plausibly have been meant to be regarded as a hard constraint. – Dieter
@Lundin: If one recognizes that 6.5p7 is only meant to apply in cases where objects are not visibly freshly related to each other, and is not intended to encourage compilers to ignore obvious relationships, then the rule would make sense but needlessly restrict many optimizations. The intention would have been clearer if the only things that were allowed to alias were references to the same object, elements of a common larger array, or an array and elements thereof, but the rule made clear that use of a freshly-formed pointer isn't aliasing. – Dieter
@JohnBollinger: The rules as written are simply unworkable unless one limits their application to cases that involve aliasing between seemingly-unrelated objects. The Effective Type Rule (N1570 6.5p6) is particularly bad, since by the definition of "object" used everywhere else, the type of every object is statically assigned [except for VLAs where the element type is static and the length dynamic]. The notion that storage has an effective type is a fundamentally broken and useless abstraction. Recognizing pointers as having an effective type would be better, though... – Dieter
...that's probably not the best way to describe aliasing. – Dieter

The Standard gives no general permission to access a struct or union object using an lvalue of member type, nor--so far as I can tell--does it give any specific permission to perform such access unless the member happens to be of character type. Nor does it define any means by which the act of casting an int* into a union u* can create one which did not already exist. Instead, the creation of any storage that will ever be accessed as a union u implies the simultaneous creation of a union u object within that storage.

Rather, the Standard (references quoted from the C11 draft N1570) relies upon implementations to heed footnote 88 ("The intent of this list is to specify those circumstances in which an object may or may not be aliased.") and to recognize that the "strict aliasing rule" (6.5p7) should only be applied when an object is referenced both via an lvalue of its own type and via a seemingly-unrelated lvalue of another type during some particular execution of a function or loop [i.e. when the object aliases some other lvalue].

The question of when two lvalues may be viewed as "seemingly unrelated", and when an implementation should be expected to recognize a relationship between them, is a Quality of Implementation issue. Clang and gcc seem to recognize that lvalues with forms unionPtr->value and unionPtr->value[index] are related to *unionPtr, but seem unable to recognize that pointers to such lvalues have any relationship to unionPtr. They will thus recognize that both unionPtr->array1[i] and unionPtr->array2[j] access *unionPtr (since array subscripting via [] seems to be treated differently from array-to-pointer decay), but will not recognize that *(unionPtr->array1+i) and *(unionPtr->array2+j) do likewise.

Addendum--standard reference:

Given

union foo {int x;} foo,bar;
void test(void)
{
  foo=bar;   // 1
  foo.x = 2; // 2
  bar=foo;   // 3
}

The Standard would describe the type of foo.x as int. If the second statement didn't access the stored value of foo, then the third statement would have no effect. Thus, the second statement accesses the stored value of an object of type union foo using an lvalue of type int. Looking at N1570 6.5p7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:(footnote 88)

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

Footnote 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

Note that there is no permission given above to access an object of type union foo using an lvalue of type int. Because 6.5p7 is a "shall" requirement appearing outside a constraint, any violation thereof invokes UB (N1570 4p2) even if the behavior of the construct would otherwise be defined by the Standard.

Dieter answered 4/2, 2019 at 21:30 Comment(5)
"The Standard gives no general permission to access a struct or union object using an lvalue of member type" Sure it does; the first part of the question quotes the relevant part. Meaning if you go from union pointer to member type pointer through a cast, you end up with, for example, an int* pointing at an int. Which is of course fine, and strict aliasing does not even apply. – Hellas
@Lundin: According to 6.5p7, an object of type union foo shall have its stored value accessed only by lvalues of type union foo, other structs or unions that have the union foo as a member, or character types, and any program that violates that constraint invokes UB even if its behavior would be defined elsewhere in the Standard in the absence of that constraint. – Dieter
According to that logic, this is UB then: int* iptr = &my_union.my_int; *iptr = 0; That would break the whole language. Each member of an aggregate/union must always be regarded as having the same effective type as its declared type, or nothing makes sense. – Hellas
@Lundin: The authors of clang/gcc think that construct is UB, and neither will process it consistently. I think they're right about it being UB, but only because the authors of the Standard made no effort to forbid compilers from behaving in obtusely useless fashion, a fact they explicitly acknowledge in the Rationale. The purpose of 6.5p7 is to indicate the circumstances when things may alias, not the circumstances in which pointers/lvalues which are visibly freshly derived from objects of other types may be used to access those objects. – Dieter
@Lundin: For an example where clang/gcc fail to meaningfully process address-of with a union member, see godbolt.org/z/Lw5zOP (the generated code for test ignores the possibility that operations on arr[j] may affect arr[i], and test2 would ignore the possibility of i==0, j==0, where (&arr[0].s1+i)->m should behave equivalently to arr[0].s1.m, but doesn't). – Dieter

No. It's not formally correct.

In C you can do almost anything, and it may appear to work, but constructs like this are time bombs: any future modification could lead to a serious failure.

The union reserves enough memory to hold the largest of its elements:

The size of a union is sufficient to contain the largest of its members.

In the reverse direction, the space may not be enough.

Consider:

union myunion
{
    char a;
    int b;
    double c;
};

void f(void)
{
    char c;
    ((union myunion *)&c)->b = 0; /* writes sizeof(int) bytes into a 1-byte object */
}

Will create a memory corruption.

As for the rest of the standard's definition:

The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

This reinforces the point that each union member starts at the union's start address and, implicitly, requires the compiler to align the union on a boundary suitable for each of its members, that is, to choose an alignment that is correct for every member. Because the standard alignments are normally powers of 2, as a rule of thumb the union will be aligned on the boundary of the member requiring the strictest alignment.

Duotone answered 4/2, 2019 at 15:50 Comment(2)
"Will create a memory corruption." Not necessarily. It can fail in other ways, too. – Termagant
@AndrewHenle Of course, but in the example provided, where an int is supposed to be larger than a char, depending on the machine's default alignment, there is a high probability of overwriting the next allocation. – Duotone
