Union with anonymous struct with flexible array member

Consider the following two examples:

1.

#include <stdio.h>

union test{
  struct {
      int a;
      int b[];
  };
};


int main(void){
    union test test;
    test.a = 10;
    printf("test.b[0] = %d", test.b[0]); //prints 0, UB?
}

2.

#include <stdio.h>

union test{
    int a;
    int b[]; //error: flexible array member in union
};


int main(void){
    union test test;
    test.a = 10;
    printf("test.b[0] = %d", test.b[0]);
}

The behavior is unclear to me. I expected both examples to behave the same (i.e. the first example would also fail to compile), given 6.7.2.1(p13):

The members of an anonymous structure or union are considered to be members of the containing structure or union.

So I read the wording as follows: if a union contains an anonymous struct as a member, the members of that anonymous struct are considered members of the containing union. That would make the union itself contain a flexible array member.

Question: Why does the first example compile fine instead of failing like the second one?

Gault answered 5/6, 2019 at 14:7 Comment(0)
14

NOTE: this answer has been substantively modified since first being written, reflecting a change to the committee's position after publication of the documents on which the original version of the answer relied.

The members of an anonymous structure or union are considered to be members of the containing structure or union.

This is a tricky provision to interpret, and indeed it has been the subject of at least two defect reports against the standard. The intention, as supported by the committee in its response to DR 499, is that anonymous structures are treated for layout purposes as if the structure itself were the member of the containing structure or union, while access to its members is expressed as if they were members of the containing structure or union.

The accepted position on DR 502, on the other hand, holds that even an anonymous struct containing a flexible array member as its only member is allowed if it is the last member of the structure (not union) containing it, and at least one other precedes it.

I find those a bit inconsistent, but the unifying theme across them seems to be that the intent of the standard in this area comes down to layout. A flexible array member inside an anonymous struct is allowed as long as it comes at the end of the layout of the innermost named structure or union, which must have non-zero size from consideration of the other members, taking into consideration the fact that members of an anonymous struct do not overlap, regardless of whether the anonymous struct appears inside a union.

The proposed committee response to DR 502 (which differs from its initial position) is consistent with that. It holds that anonymous structures inside a structure or union must obey the same rules as other structures with respect to flexible array members, notwithstanding the provision you ask about.

The committee does not appear to have decided the specific question you asked, but the theme of its decisions seems clear: the "considered to be members of the containing structure or union" wording is intended to be interpreted narrowly, as a statement only about the syntax for accessing members of anonymous structures and unions. Thus, that provision has nothing to say about whether anonymous structures may contain FAMs; the general rules about when and where a structure may contain one still apply. Those rules allow your first case.

Halsy answered 5/6, 2019 at 14:40 Comment(3)
DR 502 has apparently not actually been accepted, according to the minutes of the April 2018 meeting (n2239.pdf). What I find interesting here is that the logic in DR502 which would allow struct { int i; struct { int a[]; }; }; would also allow struct { int i; union { int a[]; };}; (IMO). In practice, neither clang nor gcc seem to allow either of these. (I noticed this today because I tried to remove the 0 from a header which included struct { ...; union { char* c; Node* n[0]; }; };, which both gcc and clang seem to be happy with, thanks to the gcc extension.) - Flavory
Here's a better reference for DR502 having been closed: open-std.org/jtc1/sc22/wg14/www/docs/n2257.htm#dr_502. So I guess I'm out of luck here. According to that resolution, "the effect is easily achieved." I suppose the simple resolution for me is to change the 0 to a 1. - Flavory
Thanks, @rici, it appears that the committee changed course between 10/2016 and 4/2017. That was not evident from the document I was relying upon (and that I linked), but it is abundantly clear from your link. I will revise the answer shortly. - Halsy
6

The second case fails to compile because a flexible array member is permitted only as a member of a structure, not directly as a member of a union. That's straightforward.

Next, in the first case, trying to access b[0] would be undefined behavior, as no memory has been allocated for it.

Quoting C11, §6.7.2.1/P18

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. [...] If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

That said,

The members of an anonymous structure or union are considered to be members of the containing structure or union.

That is for the access purpose; the layout remains unchanged. See, in your first example, you're accessing a (and b) as if they were direct members of the union.

To clarify,

#include <stdio.h>

union test{
    struct {
        int p;
        float q;
    } t;                //named structure member
    struct {
        int a;
        int b[];
    };                  //anonymous structure member
    char pqr;
};


int main(void){
    union test test;
    test.t.p = 20;   // you have to use the structure member name to access the elements
    test.pqr = 'c';     // direct access, as member of union
    test.a = 10;        // member of anonymous structure, so it behaves as if direct member of union
}
Bedesman answered 5/6, 2019 at 14:14 Comment(2)
That is for the access purpose. Can you please expand a bit? The members of an anonymous structure or union are considered to be members of the containing structure or union. goes in the Semantics section, while the fact that unions cannot contain flexible array members is a constraint. So it looks a bit contradictory to me that we can interpret things in the Semantics section as a violation of a constraint... - Gault
After thinking about it for a while, That is for the access purpose. definitely makes sense. I removed my previous comment. - Gault
6

The (C11) standard says in §6.7.2.1 Structure and union specifiers ¶3 — a constraint:

¶3 A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.

Note that only structures can (directly) contain a flexible array member — unions cannot.

The first case is legitimate. The second is not.

(It's §6.7.2.1 ¶18 that defines the term flexible array member.)

Incidentally, in the first version of the question, the printf() statement in the first example was accessing an element of the array that was not allocated — a defect that has since been fixed in revision 2. Writing union test test; gives you an array of size 0. You must use dynamic memory allocation (or some other mechanism) to allocate a union or structure with sufficient space for a non-empty FAM. Similar comments applied to the second example too.

Some Name says in a comment

But since in the first case the structure is an anonymous so the members of the structure should be considered as members of the containing union making the union to contain a flexible array member. As I quoted The members of an anonymous structure or union are considered to be members of the containing structure or union.

Note that the anonymous structure doesn't lose its shape just because it is embedded into a union. One difference is that the offset of b in union test cannot be 0 — which is completely different from normal members of a union. Normally, all members of a union start at offset 0. Mostly, though, that says that given a variable union test u;, you can refer to u.a and u.b. In times past, you would have had to specify a name for the structure: union test { struct { int a; int b[]; } s; }; and have used u.s.a or u.s.b to access the elements of the structure within the union. That doesn't affect where a flexible array member is allowed; only the notation used to access it.

Calan answered 5/6, 2019 at 14:15 Comment(5)
But since in the first case the structure is an anonymous so the members of the structure should be considered as members of the containing union making the union to contain a flexible array member. As I quoted The members of an anonymous structure or union are considered to be members of the containing structure or union. - Gault
@SomeName Just addressed this in my answer. :) - Bedesman
One difference is that the offset of b in union test cannot be 0, which is completely different from normal members of a union. Normally, all members of a union start at offset 0. Mostly, though, that says that given a variable union test u;, you can refer to u.a and u.b. In times past, you would have had to specify a name for the structure: union test { struct { int a; int b[]; } s; }; and have used u.s.a or u.s.b to access the elements of the structure within the union. That doesn't affect where a flexible array member is allowed; only the notation used to access it. - Calan
Now, you see, when you put “must” in italics like that, then we have to go digging through the C standard to see if it is true. What if I put the structure in union foo with another member that is an array of a bazillion char? Then defining union foo x allocates plenty of memory. Can I use the flexible array member then? - Lumbye
Please don't change the question once you've got answers, at least not in a way that invalidates parts of those answers. - Calan
3

It was certainly always the intention that untagged composites in an anonymous composite retain their shape. But that was not explicit in the wording of §6.7.2.1p13. The wording was revised in C18 to (emphasis added):

  1. An unnamed member whose type specifier is a structure specifier with no tag is called an anonymous structure; an unnamed member whose type specifier is a union specifier with no tag is called an anonymous union. The members of an anonymous structure or union are considered to be members of the containing structure or union, keeping their structure or union layout. This applies recursively if the containing structure or union is also anonymous.

See http://www.iso-9899.info/wiki/The_Standard for links to the C18 standard and freely-available draft (pdf)

Flavory answered 5/6, 2019 at 14:50 Comment(8)
Is the part you emphasized already added into some draft? If so could you please give a reference? N1570 does not contain it. - Gault
@SomeName. Done. - Flavory
Has the intention of the Committee mattered in the last decade or so? The authors of the Standard clearly and explicitly stated in their published Rationale a desire to uphold the Spirit of C and the principles behind it, and many "ambiguities" would be irrelevant for compilers that try to uphold the Spirit of C. The Standards Committee never intended that compiler writers focus on whether various useful behaviors are mandated by the Standard, or are merely "popular extensions" which quality compilers will support, but which aren't required for conformance. - Kristinakristine
@supercat: the intentions of the committee matter to the people who care about the intentions of the committee. I believe all major compilers are represented on the committee; other than that, I suppose the document matters most to those who for some reason regard it as biblical. I don't fall into that category; I think we both agree that common sense is a better guide. - Flavory
@rici: I would think common sense would imply that if u is the address of a union with member m of struct type T with member sm, a sequence like T *p = &u->m; p->sm = 123; should be equivalent to u->m.sm = 123;, at least in cases where nothing (except p) accesses or addresses the union between the formation of p and the last use thereof. I can't really think what else the address-of operator would usefully mean. Neither gcc nor clang recognize such equivalence, however. - Kristinakristine
@supercat: Sorry, you lost me. Does your example have something to do with this question? Does it relate to some other question on SO which I haven't happened to see? A simple -O3 -S gcc-8 test compile showed the same assembly code generated for both expressions. So what am I missing? - Flavory
There are many situations where what used to be common sense would suggest that compilers should support constructs beyond what the Standard would mandate, but where compiler writers treat the lack of a mandate as an invitation to break such constructs. Common sense carries no weight with the way some compilers treat unions. I would think common sense would imply that code which takes the address of a union member should be able to access it at least in cases where a compiler could see that was the last action done with the union lvalue. If common sense isn't honored there... - Kristinakristine
...I wouldn't rely on it being honored anywhere. - Kristinakristine
