OO Polymorphism in C, aliasing issues?

A colleague and I are trying to implement a simple polymorphic class hierarchy. We're working on an embedded system and are restricted to using a C compiler. We have a basic design that compiles without warnings (-Wall -Wextra -fstrict-aliasing -pedantic) and runs fine under gcc 4.8.1.

However, we are a bit worried about aliasing issues as we do not fully understand when this becomes a problem.

To demonstrate, we have written a toy example with an 'interface' IHello and two classes implementing this interface, 'Cat' and 'Dog'.

#include <stdio.h>

/* -------- IHello -------- */
struct IHello_;
typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

/* Helper function */
void SayHello(const IHello* self, const char* greeting)
{
    self->SayHello(self, greeting);
}

/* -------- Cat -------- */
typedef struct Cat_
{
    IHello hello;
    const char* name;
    int age;
} Cat;

void Cat_SayHello(const IHello* self, const char* greeting)
{
    const Cat* cat = (const Cat*) self;
    printf("%s I am a cat! My name is %s and I am %d years old.\n",
           greeting,
           cat->name,
           cat->age);
}

Cat Cat_Create(const char* name, const int age)
{
    static const IHello catHello = { Cat_SayHello };
    Cat cat;

    cat.hello = catHello;
    cat.name = name;
    cat.age = age;

    return cat;
}

/* -------- Dog -------- */
typedef struct Dog_
{
    IHello hello;
    double weight;
    int age;
    const char* sound;
} Dog;

void Dog_SayHello(const IHello* self, const char* greeting)
{
    const Dog* dog = (const Dog*) self;
    printf("%s I am a dog! I can make this sound: %s I am %d years old and weigh %.1f kg.\n",
           greeting,
           dog->sound,
           dog->age,
           dog->weight);
}

Dog Dog_Create(const char* sound, const int age, const double weight)
{
    static const IHello dogHello = { Dog_SayHello };
    Dog dog;

    dog.hello = dogHello;
    dog.sound = sound;
    dog.age = age;
    dog.weight = weight;

    return dog;
}

/* Client code */
int main(void)
{
    const Cat cat = Cat_Create("Mittens", 5);
    const Dog dog = Dog_Create("Woof!", 4, 10.3);

    SayHello((IHello*) &cat, "Good day!");
    SayHello((IHello*) &dog, "Hi there!");

    return 0;
}

Output:

Good day! I am a cat! My name is Mittens and I am 5 years old.

Hi there! I am a dog! I can make this sound: Woof! I am 4 years old and weigh 10.3 kg.

We're pretty sure the 'upcast' from Cat and Dog to IHello is safe since IHello is the first member of both these structs.

Our real concern is the 'downcast' from IHello to Cat and Dog respectively in the corresponding interface implementations of SayHello. Does this cause any strict aliasing issues? Is our code guaranteed to work by the C standard or are we simply lucky that this works with gcc?

Update

The solution that we eventually decide to use must be standard C and cannot rely on e.g. gcc extensions. The code must be able to compile and run on different processors using various (proprietary) compilers.

The intention with this 'pattern' is that client code shall receive pointers to IHello and thus only be able to call functions in the interface. However, these calls must behave differently depending on which implementation of IHello was received. In short, we want behaviour identical to the OOP concept of interfaces and classes implementing them.

We are aware of the fact that the code only works if the IHello interface struct is placed as the first member of the structs which implement the interface. This is a limitation that we are willing to accept.
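The "first member" limitation can be enforced by the compiler rather than by convention. The following is only a sketch: STATIC_CHECK and Cat_AssertLayout are hypothetical names that do not appear in our real code, and the macro relies on the common negative-array-size trick so that it still works on C89 compilers.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* Hypothetical helper (not from the post): the array size becomes -1 when
 * 'cond' is false, so a violated check is a compile error even in C89.
 * C11 compilers could use _Static_assert instead. */
#define STATIC_CHECK(name, cond) \
    typedef char static_check_##name[(cond) ? 1 : -1]

typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

typedef struct Cat_
{
    IHello hello;       /* must remain the first member */
    const char* name;
    int age;
} Cat;

/* Stops compiling if 'hello' is ever moved away from offset 0. */
STATIC_CHECK(cat_hello_is_first, offsetof(Cat, hello) == 0);

/* Runtime view of the same fact, e.g. for a unit test on a new compiler. */
void Cat_AssertLayout(const Cat* cat)
{
    assert((const void*) cat == (const void*) &cat->hello);
}
```

With a check like this next to every implementing struct, reordering members by accident shows up at build time instead of as a misbehaving cast.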

According to the question "Does accessing the first field of a struct via a C cast violate strict aliasing?":

§6.7.2.1/13:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

The aliasing rule reads as follows (§6.5/7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

According to the fifth bullet above and the fact that structures contain no padding at the top we are fairly sure that 'upcasting' a derived struct that implements the interface to a pointer to the interface is safe, i.e.

Cat cat;
const IHello* catPtr = (const IHello*) &cat; /* Upcast */

/* Inside client code */
void Greet(const IHello* interface, const char* greeting)
{
    /* Users do not need to know whether interface points to a Cat or Dog. */
    interface->SayHello(interface, greeting); /* Dereferencing should be safe */
}

The big question is whether the 'downcast' used in the implementation of the interface function(s) is safe. As seen above:

void Cat_SayHello(const IHello* hello, const char* greeting)
{
    /* Is the following statement safe if we know for
     * a fact that hello points to a Cat?
     * Does it violate strict aliasing rules? */
    const Cat* cat = (const Cat*) hello;
    /* Access internal state in Cat */
}

Also note that changing the signature of the implementation functions to

Cat_SayHello(const Cat* cat, const char* greeting);
Dog_SayHello(const Dog* dog, const char* greeting);

and commenting out the 'downcast' also compiles and runs fine. However, this generates a compiler warning for function signature mismatch.

Labia answered 17/7, 2015 at 13:44 Comment(9)
It would be much easier if the structs cat and dog each had their own pointer to their own function. Then you would just call: cat.SayHello(&cat, "Good day!"); - Unicellular
And then it would be easier not to have a pointer in the first place :p - Quinary
If we did not want polymorphic behaviour then we could just use two separate functions, e.g. void Cat_SayHello(Cat* cat) and void Dog_SayHello(Dog* dog). However, this is not sufficient. The idea here is that the client code only receives an IHello* and can then only call functions implemented by this interface, but the result will differ depending on which implementation of IHello was passed to the client code. - Labia
For an excellent article/tutorial on incomplete datatypes, encapsulation, data-hiding, dynamic linkage/late binding and object oriented approaches to dynamic data-structures, see Object Oriented Programming in ANSI-C. While it is written at, and requires, a fairly in-depth knowledge of C, it is well worth the effort required to digest the material. It covers a number of topics not included in most C books or tutorials. Specifically, see the examples in Chapter 2. - Dolph
It's perfectly safe, yes. However, you should understand that if you happen to have a Kitten "class" with SayHello pointing to Kitten_SayHello(), invoking Greet((Cat *)kitten, greeting); will invoke Kitten_SayHello(), not Cat_SayHello() as intended by the programmer. You'd need a helper function-like macro that mutates the type information temporarily to make it work that way. That could obviously get ugly very easily. E.g. IHello->Cat->Kitten means casting Kitten to Cat would involve manipulating kitten->cat.hello.SayHello, rather than a simple cast that allows Cat_SayHello(). - Extramural
Why is it not possible to go with C++? - Cultivable
This looks like a good article: cs.rit.edu/~ats/books/ooc.pdf - Firearm
The rules you posted should allow for both the upcast and downcast. Since IHello is compatible with Cat, and a value can be accessed through a pointer to "a type compatible with the effective type of the object", casting Cat to IHello should be legal. Since Cat is a struct that contains IHello, and a value can be accessed through a pointer to "an aggregate or union type that includes one of the aforementioned types", casting IHello to Cat should be legal. - Cosmetic
Side note: using SayHello(&cat.hello, "Hi there!"); (compiler check) instead of SayHello((IHello*) &cat, "Good day!"); (no check at all) would be nice. - Footton

I've been doing objects in C for many years, using exactly the kind of composition you are doing here. I'm going to recommend that you not do the simple cast you are describing, but to justify that I need an example. For instance, take a timer callback mechanism used with a layered implementation:

typedef struct MSecTimer_struct MSecTimer;
struct MSecTimer_struct {
     DoubleLinkedListNode       m_list;
     void                       (*m_expiry)(MSecTimer *);
     unsigned int               m_ticks;
     unsigned int               m_needsClear: 1;
     unsigned int               m_user: 7;
};

When one of these timers expires the managing system calls the m_expiry function and passes in the pointer to the object:

timer->m_expiry(timer);

Then take a base object that does something amazing:

typedef struct BaseDoer_struct BaseDoer;
struct BaseDoer_struct
{
     DebugID      m_id;
     void         (*v_beAmazing)(BaseDoer *);  //object's "virtual" function
};

//BaseDoer's version of BaseDoer's 'virtual' beAmazing function
void BaseDoer_v_BaseDoer_beAmazing( BaseDoer *self )
{
    printf("Basically, I'm amazing\n");
}

My naming system has a purpose here, but that's not really the focus. We can see a variety of object oriented function calls that might be needed:

typedef struct DelayDoer_struct DelayDoer;
struct DelayDoer_struct {
     BaseDoer     m_baseDoer;
     MSecTimer    m_delayTimer;
};

//DelayDoer's version of BaseDoer's 'virtual' beAmazing function
void DelayDoer_v_BaseDoer_beAmazing( BaseDoer *base_self )
{
     //instead of just casting, have the compiler do something smarter
     DelayDoer *self = GetObjectFromMember(DelayDoer,m_baseDoer,base_self);

     MSecTimer_start(&self->m_delayTimer, 1000);  //make them wait for it
}

//DelayDoer::DelayTimer's version of MSecTimer's 'virtual' expiry function
void DelayDoer_DelayTimer_v_MSecTimer_expiry( MSecTimer *timer_self )
{
    DelayDoer *self = GetObjectFromMember(DelayDoer,m_delayTimer,timer_self);
    BaseDoer_v_BaseDoer_beAmazing(&self->m_baseDoer);
}

I've been using the same macro for GetObjectFromMember since around 1990, and somewhere along the line the Linux kernel created the same macro and called it container_of (the parameters are in a different order though):

  #define GetObjectFromMember(ObjectType,MemberName,MemberPointer) \
              ((ObjectType *)(((char *)MemberPointer) - ((char *)(&(((ObjectType *)0)->MemberName)))))

which relies on (technically) undefined behavior (dereferencing a NULL object), but is portable to every old (and new) C compiler I've ever tested. The newer version requires the offsetof macro, which is part of the standard (as of C89 apparently). Note that, as written below, it also uses typeof and a statement expression, which are GCC extensions rather than standard C:

#define container_of(ptr, type, member) ({ \
            const typeof( ((type *)0)->member ) *__mptr = (ptr); \
            (type *)( (char *)__mptr - offsetof(type,member) );})

I, of course, prefer my name, but whatever. Using this method makes your code not rely on putting the base object first, and also makes the second use case possible, which I find very useful in practice. All of the aliasing compiler issues are managed within the macro (casting through the char * I think, but I'm not really a standards lawyer).
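For readers who need to stay within standard C, here is a minimal sketch of the same idea built only on offsetof from <stddef.h>, without typeof or statement expressions. CONTAINER_OF, Base, Timer and Doer are hypothetical names chosen for illustration, not the types from my example above. This form loses the type check that the typeof line provides, and it casts away const if a const member pointer is passed in. Whether stepping from a member back to its container through char * is blessed by every reading of the standard is debated, but it is the widely used form.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, NULL */

/* Hypothetical standard-C variant: no typeof, no statement expression. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative stand-ins for a base object and an embedded timer. */
typedef struct Base_  { int id; } Base;
typedef struct Timer_ { unsigned int ticks; } Timer;

typedef struct Doer_
{
    Base  base;     /* first member */
    Timer timer;    /* NOT the first member, so a plain cast would be wrong */
    int   payload;
} Doer;

/* Callback that only receives the embedded timer and recovers the owner. */
int Doer_PayloadFromTimer(Timer* timer_self)
{
    Doer* self;
    assert(timer_self != NULL);
    self = CONTAINER_OF(timer_self, Doer, timer);
    return self->payload;
}

/* The same macro also covers the "first member" case. */
int Doer_PayloadFromBase(Base* base_self)
{
    Doer* self;
    assert(base_self != NULL);
    self = CONTAINER_OF(base_self, Doer, base);
    return self->payload;
}
```

Because the offset is computed per member, the same struct can expose several embedded "interfaces", and each callback can find its way back to the owning object.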

Fleecy answered 13/11, 2015 at 18:11 Comment(1)
Thank you for your detailed answer! We are currently using a macro similar to your GetObjectFromMember. As you explained, this allows us to put the base object anywhere in the struct and also enables us to use 'multiple inheritance'. - Labia

From the section of the standard that you quoted:

A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa

It's definitely safe to convert a pointer like &cat->hello into a Cat pointer, and similarly for &dog->hello, so the casts in your SayHello functions should be fine.

At the call site, you're doing the opposite: converting a pointer to a structure into a pointer to the first element. That's also guaranteed to work.
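As a rough illustration of both directions, the sketch below reuses the Dog layout from the question. Dog_AsHello and Dog_FromHello are hypothetical helper names, not functions from the question. The upcast takes the address of the member, so no cast is needed there; the downcast is only valid when the IHello really is the hello member of a Dog.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

typedef struct Dog_
{
    IHello hello;       /* initial member */
    double weight;
    int age;
    const char* sound;
} Dog;

/* Upcast: struct pointer to pointer-to-initial-member. Taking the member's
 * address lets the compiler check the types, unlike a cast. */
const IHello* Dog_AsHello(const Dog* dog)
{
    return &dog->hello;
}

/* Downcast: pointer-to-initial-member back to the struct. Only valid when
 * 'self' really points at the 'hello' member of a Dog object. */
const Dog* Dog_FromHello(const IHello* self)
{
    assert(self != NULL);
    return (const Dog*) self;
}
```

Converting up and then down again yields the original Dog pointer, which is what the quoted "and vice versa" wording promises.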

Lemming answered 13/11, 2015 at 22:48 Comment(0)
