Essentially, if I have
typedef struct {
    int x;
    int y;
} A;
typedef struct {
    int h;
    int k;
} B;
and I have A a, does the C standard guarantee that ((B*)&a)->k is the same as a.y?
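Setting the standard's wording aside, code that depends on A and B sharing a layout can at least have the compiler check that assumption. This is a minimal sketch that assumes a C11 compiler for static_assert. second_member_offset is a hypothetical helper added only for illustration. Matching offsets say nothing about aliasing, so they do not by themselves make ((B*)&a)->k a defined access.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

typedef struct {
    int x;
    int y;
} A;

typedef struct {
    int h;
    int k;
} B;

/* These checks only confirm what *this* implementation does. They are not
   a guarantee from the standard, but the build fails loudly if the layouts
   ever differ. */
static_assert(sizeof(A) == sizeof(B), "A and B differ in size");
static_assert(offsetof(A, x) == offsetof(B, h), "first members differ");
static_assert(offsetof(A, y) == offsetof(B, k), "second members differ");

/* Hypothetical helper: byte offset of A's second member. */
size_t second_member_offset(void)
{
    return offsetof(A, y);
}
```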
Are C-structs with the same member types guaranteed to have the same layout in memory?
Almost yes. Close enough for me.
From n1516, Section 6.5.2.3, paragraph 6:
... if a union contains several structures that share a common initial sequence ..., and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This means that if you have the following code:
struct a {
    int x;
    int y;
};
struct b {
    int h;
    int k;
};
union {
    struct a a;
    struct b b;
} u;
If you assign to u.a, the standard says that you can read the corresponding values from u.b. Given this requirement, it stretches the bounds of plausibility to suggest that struct a and struct b can have different layouts. Such a system would be pathological in the extreme.
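The union rule can be exercised directly. This is a minimal sketch that assumes the structs from the example. union ab and read_k_after_writing_y are hypothetical names. Every access goes through the union object u, which is exactly the situation 6.5.2.3p6 describes.

```c
#include <assert.h>

struct a { int x; int y; };
struct b { int h; int k; };

/* The union type is complete and visible here, so the common initial
   sequence guarantee lets us store through u.a and inspect the
   corresponding members through u.b. */
union ab {
    struct a a;
    struct b b;
};

/* Hypothetical helper: writes via one member, reads via the other. */
int read_k_after_writing_y(int value)
{
    union ab u;
    u.a.x = 0;
    u.a.y = value;
    return u.b.k;   /* same position in the common initial sequence as u.a.y */
}
```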
Remember that the standard also guarantees that:
Structures are never trap representations.
Addresses of fields in a structure increase (a.x is always before a.y).
The offset of the first field is always zero.
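The last two guarantees can be checked at compile time, and the first-member rule can be used directly. This is a sketch that assumes C11 (static_assert, offsetof). first_member_of and y_after_x are hypothetical helpers. The pointer cast relies on C11 6.7.2.1p15, which says that a pointer to a structure object, suitably converted, points to its initial member.

```c
#include <assert.h>
#include <stddef.h>

struct a { int x; int y; };

/* Guaranteed for every struct: no padding before the first member, and
   members are laid out in declaration order. */
static_assert(offsetof(struct a, x) == 0, "first member must be at offset 0");
static_assert(offsetof(struct a, x) < offsetof(struct a, y),
              "members appear in declaration order");

/* Hypothetical helper: reads the first member through a converted
   struct pointer. */
int first_member_of(int x, int y)
{
    struct a s = { x, y };
    int *first = (int *)&s;   /* points to s.x */
    return *first;
}

/* Hypothetical helper: compares member addresses within one object;
   only the addresses are used, never the (uninitialized) values. */
int y_after_x(void)
{
    struct a s;
    return (char *)&s.y > (char *)&s.x;
}
```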
You rephrased the question,
does the C standard guarantee that ((B*)&a)->k is the same as a.y?
No! The standard very explicitly does not guarantee that:
struct a { int x; };
struct b { int x; };
int test(int value)
{
    struct a a;
    a.x = value;
    return ((struct b *) &a)->x;
}
This is an aliasing violation.
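If the goal is only to get a's value into a struct b, copying the bytes avoids the aliasing violation, because each object is then read only through its own type. This is a minimal sketch that assumes the two sizes match, which it checks with static_assert. test_with_memcpy is a hypothetical variant of test.

```c
#include <assert.h>
#include <string.h>

struct a { int x; };
struct b { int x; };

static_assert(sizeof(struct a) == sizeof(struct b), "sizes must match for the copy");

/* Instead of reading a through a struct b pointer, copy its bytes into a
   genuine struct b object; each object is accessed only through its own type. */
int test_with_memcpy(int value)
{
    struct a a;
    struct b b;
    a.x = value;
    memcpy(&b, &a, sizeof b);
    return b.x;   /* first member of each struct is at offset 0 */
}
```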
struct a and struct b is visible where code inspects the struct member, a conforming and non-buggy compiler will recognize the possibility of aliasing. Some compiler writers who only want to abide by the standard when it suits them will break such code even though the Standard guarantees that it will work; that merely means their compilers are not conforming. –
Prowler
-fstrict-aliasing is also the default mode for those compilers, and its purpose is to allow compilers to use optimizations that the standard tries to allow them to use. At least GCC, Clang and ICC do not implement this rule. MSVC does not use strict aliasing, so it is conformant. In my experience, C rules that were not adopted by the C++ standard are usually buggy. Clang uses C++ semantics in at least a few cases (this one and loop termination, at least) even when compiling in C mode. –
Flameproof
someStruct.member = value; unless member has a character type. Problems with this rule became apparent with Defect Report #28, but the response reached a correct conclusion using a totally nonsensical rationale, and that silly rationale formed the basis for C99's unnecessary and unworkable "effective type" rules. –
Prowler
xs will have the same address and will be accessed using the same type, int. –
Isologous
x has the same type in both. The problem is that a has type struct a, and you're accessing it through an lvalue of type struct b. Here is a link that shows how a compiler will optimize based on aliasing: gcc.godbolt.org/z/7PMjbT. Try removing -fstrict-aliasing and see how the generated code changes. –
Kataway
Piggybacking on the other replies with a warning about section 6.5.2.3. Apparently there is some debate about the exact wording of "anywhere that a declaration of the completed type of the union is visible", and at least GCC doesn't implement it as written. There are a few tangential C WG defect reports here and here with follow-up comments from the committee.
Recently I tried to find out how other compilers (specifically GCC 4.8.2, ICC 14, and clang 3.4) interpret this using the following code from the standard:
// Undefined, result could (realistically) be either -1 or 1
struct t1 { int m; } s1;
struct t2 { int m; } s2;
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    u.s1.m = -1;
    return f(&u.s1, &u.s2);
}
GCC: -1, clang: -1, ICC: 1 and warns about the aliasing violation
// Global union declaration, result should be 1 according to a literal reading of 6.5.2.3/6
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
    struct t1 s1;
    struct t2 s2;
};
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    union u u;
    u.s1.m = -1;
    return f(&u.s1, &u.s2);
}
GCC: -1, clang: -1, ICC: 1 but warns about aliasing violation
// Global union definition, result should be 1 as well.
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
    struct t1 s1;
    struct t2 s2;
} u;
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    u.s1.m = -1;
    return f(&u.s1, &u.s2);
}
GCC: -1, clang: -1, ICC: 1, no warning
Of course, without strict-aliasing optimizations all three compilers return the expected result every time. Since clang and gcc give the same result in every case, the only real information comes from ICC's lack of a diagnostic on the last one. This also aligns with the example given by the standards committee in the first defect report mentioned above.
In other words, this aspect of C is a real minefield, and you'll have to verify that your compiler is doing the right thing even if you follow the standard to the letter. All the worse, since it's intuitive that such a pair of structs ought to be compatible in memory.
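One way to sidestep the disagreement between compilers is to make every access through the union lvalue. This assumes the caller can pass the union itself rather than pointers to its members. f_via_union and g_via_union are hypothetical reworkings of f and g from the test code. GCC's documentation for -fstrict-aliasing explicitly permits type punning when the memory is accessed through the union type, so this version should return 1 whether or not optimizations are enabled.

```c
#include <assert.h>

struct t1 { int m; };
struct t2 { int m; };

union u {
    struct t1 s1;
    struct t2 s2;
};

/* Same logic as f(), but every access goes through the union object, so
   the compiler can see that s1.m and s2.m overlap. */
static int f_via_union(union u *pu)
{
    if (pu->s1.m < 0)
        pu->s2.m = -pu->s2.m;
    return pu->s1.m;
}

int g_via_union(void)
{
    union u u;
    u.s1.m = -1;
    return f_via_union(&u);
}
```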
union, not raw pointers to the contained types. This, however, defeats the point of using a union in the first place, to my mind. I've got a question about this clause - specifically its notable (and perhaps accidental?) exclusion from C++ - over here: https://mcmap.net/q/246365/-union-39-punning-39-structs-w-quot-common-initial-sequence-quot-why-does-c-99-but-not-c-stipulate-a-39-visible-declaration-of-the-union-type-39/2757035 –
Gorget
structs can be 'punned' very clear. I've collected this and much more into an answer on my linked question –
Gorget
m as part of the CIS, then given S1 *p1; S2 *p2;, a presumption that p1->m might alias p2->m does not seem overly pessimistic. If such access weren't required, why declare the union type? –
Prowler
struct foo {int x;} *p, it;, something like p=&it; p->x=4; would invoke UB since it uses an lvalue of type int to modify an object of type struct foo, but the authors of the Standard expect that compiler writers won't be so obtuse as to pretend they shouldn't treat that as defined. The Standard has never made any reasonable attempt to specify the full range of semantics that should be supported by an implementation targeting any particular platform and purpose. The nonsensical "effective type" rules can't even... –
Prowler
This sort of aliasing specifically requires a union type. C11 §6.5.2.3/6:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This example follows:
The following is not a valid fragment (because the union type is not visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g()
{
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    /* ... */
    return f(&u.s1, &u.s2);
}
The requirements appear to be that (1) the object being aliased is stored inside a union and (2) the definition of that union type is in scope.
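A common pattern that meets both requirements is the tagged union. Every member struct begins with the same tag field, and the tag is read through the union object while the union's complete type is in scope. This sketch is modeled on the valid-fragment example that C11 attaches to 6.5.2.3. union node, node_value, the make_and_read_* helpers and the NODE_* tag values are hypothetical.

```c
#include <assert.h>

/* Every member struct starts with an int tag, so the tag lies in the
   common initial sequence of all three. */
union node {
    struct { int alltypes; } n;
    struct { int type; int intnode; } ni;
    struct { int type; double doublenode; } nf;
};

enum { NODE_INT = 0, NODE_DOUBLE = 1 };   /* hypothetical tag values */

/* The union's complete type is visible here, and the tag is inspected
   through the union object itself. */
double node_value(const union node *u)
{
    if (u->n.alltypes == NODE_DOUBLE)
        return u->nf.doublenode;
    return (double)u->ni.intnode;
}

double make_and_read_double(double d)
{
    union node u;
    u.nf.type = NODE_DOUBLE;
    u.nf.doublenode = d;
    return node_value(&u);
}

double make_and_read_int(int i)
{
    union node u;
    u.ni.type = NODE_INT;
    u.ni.intnode = i;
    return node_value(&u);
}
```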
For what it's worth, the corresponding initial-subsequence relationship in C++ does not require a union. And in general, such union dependence would be extremely pathological behavior for a compiler. If there's some way the existence of a union type could affect a concrete memory model, it's probably better not to try to picture it.
I suppose the intent is that a memory access verifier (think Valgrind on steroids) can check a potential aliasing error against these "strict" rules.
union members - via both GCC and Clang. See @ecatmur's test on my question here about why this clause was left out of C++: https://mcmap.net/q/246365/-union-39-punning-39-structs-w-quot-common-initial-sequence-quot-why-does-c-99-but-not-c-stipulate-a-39-visible-declaration-of-the-union-type-39/2757035 Any thoughts readers might have on this difference would be very welcome. I suspect this clause should be added to C++ and was just accidentally omitted for 'inheritance' from C99, where it was added (C99 did not have it). –
Gorget
structs in a union must happen via a locally visible instance of said union. (B) N685 was a misreading of that, applied to the union type and aliasing, mandating complexity that most implementors disagreed with and ignored. (C) The C++ reflector quoted shows a conscious decision to ignore N685 –
Gorget
foo *p,*q;, a presumption that ((bar*)p)->x=5; might alter q would be considered "overly pessimistic", or that a quality compiler shouldn't be capable of recognizing aliasing in such cases. –
Prowler
struct S {int x;} s={0}; s.x=1; as UB because it modifies an object of type struct S using an lvalue of type int, in violation of 6.5p7, I think it's pretty clear... –
Prowler
I want to expand on @Dietrich Epp's answer. Here is a quote from C99:
6.7.2.1 point 14 ... A pointer to a union object, suitably converted, points to each of its members ... and vice versa.
Which means we can copy the memory from a struct to a union containing it:
#include <string.h>  /* memcpy */
struct a
{
    int foo;
    char bar;
};
struct b
{
    int foo;
    char bar;
};
union ab
{
    struct a a;
    struct b b;
};
void test(struct a *aa)
{
    union ab ab;
    memcpy(&ab, aa, sizeof *aa);
    // ...
}
C99 also says:
6.5.2.3 point 5 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence ..., and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types .... for a sequence of one or more initial members.
Which means the following will also be legal after the memcpy:
ab.a.bar;
ab.b.bar;
The struct could be initialized in a separate translation unit and the copying is done in the standard library (out of the control of the compiler).
Thus, memcpy will copy the value of the object of type struct a byte by byte, and the compiler has to ensure the result is valid for both structs.
The compiler cannot do anything other than generate instructions that read from the corresponding memory offset for both of those lines, thus the address needs to be the same.
Even though it is not stated explicitly, I would say the standard implies that C-structs with the same member types have the same layout in memory.
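The argument in this answer can be exercised end to end. This is a minimal sketch that assumes the struct a, struct b and union ab definitions from the answer. views_agree is a hypothetical helper that performs the memcpy and then reads the common initial sequence through both members.

```c
#include <assert.h>
#include <string.h>

struct a { int foo; char bar; };
struct b { int foo; char bar; };

union ab {
    struct a a;
    struct b b;
};

/* Hypothetical helper: build a struct a, copy its bytes into the union,
   then inspect the common initial sequence through both members.
   Returns 1 when both views agree with each other and with the source. */
int views_agree(int foo, char bar)
{
    struct a src = { foo, bar };
    union ab ab;
    memcpy(&ab, &src, sizeof src);
    return ab.a.foo == foo && ab.b.foo == foo
        && ab.a.bar == bar && ab.b.bar == bar;
}
```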