Nested structs and strict aliasing in C

Please consider the following code:

enum { PERSON = 1 };  /* assumed tag value; PERSON is not defined in the original snippet */

typedef struct {
  int type;
} object_t;

typedef struct {
  object_t object;
  int age;
} person_t;

int age(object_t *object) {
  if (object->type == PERSON) {
    return ((person_t *)object)->age;
  } else {
    return 0;
  }
}

Is this legal code, or does it violate the C99 strict aliasing rule? Please explain why it is legal or illegal.

Uninterested answered 7/12, 2011 at 13:54 Comment(18)
@outis: What exactly do you mean by "class"? (Uninterested)
@user1085684 This is not a homework assignment or an interview question, is it? (Ultramundane)
@dasblinkenlight No, I'm writing a VM and need to determine an object's type at runtime. That's why I need some kind of information in every runtime object that I can access without knowing its type. (Uninterested)
The first member of each struct needs to be an int, so you can check the type and then cast correctly. (Prehistory)
Why should this even work? object_t has nothing to do with person_t? (Sterile)
While I agree with others that the question gives a bad example with persons and age, I see nothing wrong with the principle behind it. In fact, many structures in the old Amiga operating system used this scheme. (Poseur)
@Prehistory That would break the strict aliasing rule. (Uninterested)
@Sterile It's a rather common trick: you can cast a struct pointer to a pointer to its first member, because the memory layout is guaranteed to be the same (so you can treat a person_t* as an object_t*, since its first member is an object_t). For example, the entire object hierarchy of the GTK toolkit is built around this concept. (Lambrequin)
@Sterile But person_t has something to do with object_t. (Uninterested)
@Lambrequin Yes, I know, but does it break the strict aliasing rule? (Uninterested)
Well, hopefully someone can answer that instead of pointing out how the code can be rewritten :-) (Lambrequin)
@Lambrequin The problem with the OP's example is that he doesn't treat a person_t* as an object_t*, but the other way around. And that direction is unsafe even with the type check, because the check could be cheated. (Ramsden)
@mouviciel: The problem is not that it could be cheated; the problem is that this may violate the strict-aliasing rules. (Cathey)
@Ramsden But that is not the question; it's about whether the aliasing rule is violated. (Uninterested)
@OliCharlesworth The two problems are linked. (Ramsden)
@Ramsden How exactly are they linked? (Uninterested)
person_t includes object_t: aliasing is allowed. object_t does not include person_t: aliasing is not allowed. (Ramsden)
@Ramsden In that case, that might be the answer. Having a type field used for discrimination is a common way to implement your own type/object system. Such a system, of course, relies on the information being correct. It breaks down if, e.g., object->type is incorrect, just as strlen() breaks down (gives you undefined behavior) if you pass it a NULL pointer or something that's not a string. (Lambrequin)

The strict aliasing rule is about two pointers of different types referencing the same location in memory (ISO/IEC 9899/TC2). Your example reinterprets the address of an object_t object as the address of a person_t. However, it does not access any memory location inside object_t through the reinterpreted pointer, because age is located past the end of object_t. Since the memory locations accessed through the two pointer types are not the same, I'd say the code does not violate the strict aliasing rule. FWIW, gcc -fstrict-aliasing -Wstrict-aliasing=2 -O3 -std=c99 seems to agree with that assessment and does not produce a warning.

This is not enough to conclude that the code is legal, though. Your example assumes that the address of a nested structure is the same as that of its outer structure. As it happens, the C99 standard makes this a safe assumption:

6.7.2.1-13. A pointer to a structure object, suitably converted, points to its initial member

The two considerations above make me think that your code is legal.
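Here is a minimal self-contained sketch of the pattern discussed above. It assumes hypothetical tag values `OTHER` and `PERSON`, since the question uses `PERSON` without defining it. It also assumes a hypothetical `as_person()` helper that centralizes the checked downcast. The upcast is written as `&p.object`, which by 6.7.2.1-13 has the same address as `p` itself.

```c
#include <assert.h>

/* Hypothetical type tags; the question uses PERSON without defining it. */
enum { OTHER = 0, PERSON = 1 };

typedef struct {
  int type;
} object_t;

typedef struct {
  object_t object; /* must stay the first member */
  int age;
} person_t;

/* Hypothetical helper: checked downcast from the common header.
   Valid only if object really is the initial member of a person_t
   (C99 6.7.2.1p13: "... and vice versa"). */
static person_t *as_person(object_t *object) {
  assert(object->type == PERSON);
  return (person_t *)object;
}

int age(object_t *object) {
  if (object->type == PERSON) {
    return as_person(object)->age;
  }
  return 0;
}
```

For a `person_t p`, callers can write either `age(&p.object)` or `age((object_t *)&p)`. Both pass the same address.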

Ultramundane answered 7/12, 2011 at 14:42 Comment(4)
@Uninterested I liked the question too, because I got an opportunity to learn something new while answering it. The second part was a little counter-intuitive to me: I did not know that C prohibits compilers from adding padding before the initial member of a structure. Anyway, if you think a post answers your question, it's a good idea to accept it: your accept rate will go up, and you will earn a badge in the process. Good luck with your VM project! (Ultramundane)
What if it dereferenced a memory location within the object_t boundary? Would ((person_t *)object)->object.type + ((person_t *)object)->age be okay? (The code has no meaning; it is written only to demonstrate access to memory within the object_t boundary.) (Firstnighter)
"Suitably converted" is just about the worst wording the Standard authors could have picked here. Does it mean the pointer must be converted to void*, or that we can actually use a different struct pointer type? (Samuele)
@larsmans I am pretty sure that by "suitably converted" they mean "converted to the type of the initial member of the outer struct". (Ultramundane)

http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

As an add-on to the accepted answer, here is the full citation from the standard, including the important part that the other answer omitted, plus one more passage:

6.7.2.1-13: Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

6.3.2.3-7: A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. [...]
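The two passages above can be checked directly. The sketch below uses a C99-compatible compile-time check that the first member sits at offset 0 ("no padding at its beginning"). It then converts in both directions ("and vice versa" plus 6.3.2.3-7). The names `person_from_object`, `round_trip_ok` and `object_is_first` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int type; } object_t;
typedef struct { object_t object; int age; } person_t;

/* 6.7.2.1p13: no padding at the beginning, so the offset must be 0.
   C99 has no static_assert; a negative array size fails to compile. */
typedef char object_is_first[offsetof(person_t, object) == 0 ? 1 : -1];

/* Hypothetical helper: recover the enclosing person_t from its header.
   Only valid if o really is the object member of some person_t. */
static person_t *person_from_object(object_t *o) {
  return (person_t *)o;
}

/* Returns 1 if the struct -> initial member -> struct round trip
   yields the original pointer, as the quoted passages require. */
static int round_trip_ok(person_t *p) {
  object_t *o = (object_t *)p; /* struct -> initial member */
  assert(o == &p->object);
  return person_from_object(o) == p; /* initial member -> struct */
}
```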

I find your example to be a perfect place for a void pointer:

int age(void *object);  /* same body as before; only the parameter type changes */

Why? Because your obvious intention is to pass different "object" types to such a function, and it retrieves the information according to the encoded type. In your version, you need a cast each time you call the function: age((object_t*)person);. The compiler will not complain when you pass the wrong pointer, so there is no type safety involved anyway. You might as well use a void pointer and avoid the cast when calling the function.

Alternatively, you could call the function with age(&person->object), of course, each time you call it.

Winn answered 7/12, 2011 at 16:59 Comment(0)

The strict aliasing rule limits the types through which you may access an object (a region of memory). There are a few places in the code where the rule might come into play: within age() and when calling age().

Within age(), the thing to consider is object. ((person_t *)object)->age is an lvalue expression because it has an object type and designates an object (a region of memory). However, that branch is only reached if object->type == PERSON. So (presumably) the object really is a person_t, meaning its effective type is person_t, and the access doesn't violate strict aliasing. In particular, strict aliasing allows access through:

  • a type compatible with the effective type of the object,

When calling age(), you will presumably be passing an object_t* or a pointer to a type that descends from object_t, that is, a struct that has an object_t as its first member. This is allowed under:

  • an aggregate or union type that includes one of the aforementioned types among its members

Furthermore, the point of strict aliasing is to let the compiler avoid reloading values into registers. If an object is modified through one pointer, objects pointed to by pointers of an incompatible type are assumed to be unchanged and thus don't need to be reloaded. The code doesn't modify anything, so it shouldn't be affected by this optimization.

Bannerman answered 7/12, 2011 at 14:54 Comment(2)
Whilst this code doesn't modify anything, we don't know what code runs after age returns. If the next line modifies the memory, is the compiler free to reorder things so that the modification happens before age is called? (Cathey)
That won't affect anything as long as the call to age is kept as a function call rather than inlined, since there still won't be any modification within age. Moreover, if age is inlined, there won't be a separate object variable, so the aliasing created by the function call will vanish. (Bannerman)

One acceptable approach that the standard explicitly condones is to make a union of structs that share an identical initial sequence, like so:

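typedef int Foo, Bar, Zoo, Car;  /* placeholder payload types (hypothetical) */

struct tag  { int value;                };
struct obj1 { int tag;    Foo x; Bar y; };
struct obj2 { int tag;    Zoo z; Car w; };

typedef union object_
{
  struct tag  tag;   /* members need names; "struct tag;" alone declares nothing */
  struct obj1 obj1;
  struct obj2 obj2;
} object_t;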

Now you can pass an object_t *p, examine p->tag.value with impunity, and then access the appropriate union member.
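A minimal sketch of that dispatch, assuming hypothetical tag values `OBJ1` and `OBJ2`, a hypothetical `payload()` function, and `int` as a stand-in for the placeholder payload types. The shared tag is read through `p->tag.value`, which C99 6.5.2.3/5 permits because all three structs begin with a common initial sequence (one `int`) and the complete union type is visible.

```c
#include <assert.h>

typedef int Foo, Bar, Zoo, Car; /* placeholder payload types (hypothetical) */

struct tag  { int value; };
struct obj1 { int tag; Foo x; Bar y; };
struct obj2 { int tag; Zoo z; Car w; };

typedef union object_ {
  struct tag  tag;
  struct obj1 obj1;
  struct obj2 obj2;
} object_t;

enum { OBJ1 = 1, OBJ2 = 2 }; /* hypothetical tag values */

/* Inspect the common initial member through the tag view, then read
   the member that the tag says is currently stored in the union. */
int payload(const object_t *p) {
  switch (p->tag.value) {
  case OBJ1: return p->obj1.x;
  case OBJ2: return p->obj2.z;
  default:   assert(!"unknown tag"); return 0;
  }
}
```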

Physiologist answered 7/12, 2011 at 14:43 Comment(5)
That would make all objects as big as the biggest member of object_t, and I want to avoid this if possible. (Uninterested)
@Johannes: True. I'm not sure, but I have a feeling that the same rules would allow you to inspect the initial element of any struct from a collection of structs that share the same initial element. I don't have a standard reference for that, though it would seem to follow from the union requirement. (Physiologist)
According to cellperformance.beyond3d.com/articles/2006/06/…, casting via a union isn't strictly defined either. (Cathey)
@OliCharlesworth: We're not casting, though; we're just inspecting the initial element. See C99 6.5.2.3/5. (Physiologist)
@KerrekSB: The authors of gcc have their own notion of what the Common Initial Sequence rule is supposed to mean. They interpret it in a way that makes it basically useless, and then claim it would be silly to have the compiler support such a useless rule, so they don't. (Lordship)
