easy struct inheritance & pseudo-polymorphism vs strict aliasing
If anybody answers my question, please don't tell me to use C++.

So, I'm making a small library in C that uses an object-oriented approach. I chose the less common of the two main approaches to inheritance in C: copying the members of the base type to the beginning of the derived type. Something like this:

struct base {
  int a;
  int b;
  char c;
};

struct derived {
  int a;
  int b;
  char c;
  unsigned int d;
  void (*virtual_method)(int, char);
};

This approach is less popular than the other one (an instance of the base type as the first member of the derived type) because

  1. technically, there is no standardized guarantee that the first common members of the base and derived structs will have the same offsets. However, with the exception of cases when one of the structs is packed and the other is not, they will have the same offsets on most, if not all, known compilers.
  2. this approach's most serious flaw: it violates strict aliasing. Casting a pointer to a derived struct to its base type and then dereferencing the pointer is technically undefined behaviour.

However, it also has its benefits compared to the other approach:

  1. Less verbosity: accessing a member of a derived struct that has been inherited is the same as accessing one that has not been inherited, instead of casting to the base type and then accessing the needed member;
  2. This is actually real inheritance and not composition;
  3. It is as easy to implement as the other approach, although a little preprocessor abuse may be needed;
  4. We can get a half-baked form of actual multiple inheritance, where we can inherit from several base types, but can cast to only one of them.
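
The "preprocessor abuse" mentioned in point 3 could look roughly like the sketch below. The names BASE_MEMBERS, CHECK_SAME_OFFSET and derived_sum are hypothetical and not part of the question; the C11 _Static_assert checks are one possible way to catch the offset problem from the first list at compile time. They do nothing about the strict aliasing issue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: the base members are written once and pasted
   into every struct that "inherits" them. */
#define BASE_MEMBERS \
    int a;           \
    int b;           \
    char c;

struct base {
    BASE_MEMBERS
};

struct derived {
    BASE_MEMBERS
    unsigned int d;
    void (*virtual_method)(int, char);
};

/* Compile-time layout check (C11): the build fails if a shared member
   ends up at a different offset in the two structs. */
#define CHECK_SAME_OFFSET(B, D, m) \
    _Static_assert(offsetof(struct B, m) == offsetof(struct D, m), \
                   "offset mismatch for member " #m)

CHECK_SAME_OFFSET(base, derived, a);
CHECK_SAME_OFFSET(base, derived, b);
CHECK_SAME_OFFSET(base, derived, c);

/* Inherited and own members are accessed the same way. */
static unsigned int derived_sum(const struct derived *p)
{
    return (unsigned int)p->a + (unsigned int)p->b + p->d;
}
```

Calling derived_sum on a struct derived shows the "less verbosity" point: p->a and p->d use the same syntax, with no cast and no nested member name.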

I have been looking into possibilities for making my library compile and work correctly with compilers that enforce strict aliasing (like gcc) without the user needing to turn it off manually. Here are the possibilities that I've looked into:

  1. Unions. These are, sadly, a no-no for several reasons:

    1. Verbosity returns! To follow the standard's rules for accessing the first common members of two structs via a union, one must (as of C99) explicitly use the union to access the first common members. We'd need special syntax to access members of each type in the union!
    2. Space. Consider an inheritance hierarchy. We have a type that we want to be able to cast to from each of its derived types. And we want to do it for every type. The only feasible union-employing solution I see is a union of the entire hierarchy that would have to be used to convert instances of a derived type to a base type. And it would have to be just as large as the most derived type in the entire hierarchy...
  2. Using memcpy instead of direct dereferencing (like here). That looks like a nice solution. However, the function call incurs overhead, and yes, once again, verbosity. As I understand it, what memcpy does can also be done manually by casting a pointer to a struct to a pointer to char and then dereferencing it, something like this: (member_type)(*((char*)(&struct_pointer->member))) = new_value; Gah, verbosity again. Well, this can be wrapped in a macro. But will that still work if we've cast our pointer to a pointer to an incompatible type, and then cast it to char* and dereferenced it? Like this: (member_type)(*((char*)(&((struct incompatible_type*)struct_pointer)->member))) = new_value;

  3. Declaring all instances of types that we're going to cast as volatile. I wonder why this doesn't come up often. volatile is, as I understand, used to tell the compiler that the memory pointed to by a pointer may change unexpectedly, thus cancelling optimizations based on the assumption that a segment of pointed-to memory is not going to change, which is the cause of all strict-aliasing problems. This is, of course, still undefined behaviour; but can't it be a feasible cross-platform solution for "hackishly" disabling strict aliasing optimisations for certain instances of certain types?
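
For reference, the union route from point 1 might be sketched as follows. The union name any_object and the helper functions are hypothetical. The sketch relies on the standard's common initial sequence rule, which requires the access to go through the union while its complete declaration is visible.

```c
#include <assert.h>

struct base    { int a; int b; char c; };
struct derived { int a; int b; char c; unsigned int d; };

/* Hypothetical wrapper: every object that may be viewed as either type
   has to be stored in this union. */
union any_object {
    struct base    as_base;
    struct derived as_derived;
};

/* Reads an inherited member through the base view. The access is spelled
   through the union, which is the verbosity the question objects to. */
static int object_get_a(const union any_object *o)
{
    return o->as_base.a;
}

/* Fills the union as a derived object and reads it back as a base. */
static int union_demo(void)
{
    union any_object o;
    o.as_derived.a = 10;
    o.as_derived.b = 20;
    o.as_derived.c = 'x';
    o.as_derived.d = 30u;
    return object_get_a(&o) + o.as_base.b;
}
```

The space concern is visible here too: the union can never be smaller than its largest member, so an object that is only ever used as a base still occupies at least sizeof(struct derived) bytes.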

Aside from the questions above, here are two more:

  1. Is something I said above erroneous?
  2. Have I missed something that could help in my case?
Brookins answered 16/1, 2015 at 9:35 Comment(2)
As of C11, anonymous structs are permissible. See if this question has anything useful. Loki
@MikeofSST: sorry, this is a bad example. An anonymous struct is, by the standard, incompatible with another struct (including non-anonymous ones) even if they share the same members. This might serve as a workaround for strict aliasing in compilers that follow the non-standard practice of allowing the usage of typedefs for anonymous structs, but follow the standard in other regards. However these are very few... Brookins
H
4

I don't think your idea about casting via char* is valid. The rule is:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types

A sub-expression of your expression is compatible but the overall expression isn't compatible.

I think the only realistic approach is composition:

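struct base {
  int a;
  int b;
  char c;

  /* the first parameter plays the role of `this` */
  void (*virtual_method)(struct base * /*this*/, int, char);

};

struct derived {
    struct base base;  /* base instance as the first member */
    unsigned int d;
};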

I realize that's an intellectually unappealing way to achieve inheritance.

PS: I haven't put your virtual member function pointer in my derived class. It needs to be accessible from base so needs to be declared there (assuming it's a polymorphic function that exists for both base and derived). I've also added a this parameter to flesh out the model a touch.
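
To make the composition model concrete, here is a minimal sketch of how dispatch could work with it. The function names (derived_method, derived_init, call_virtual) are made up for illustration, and the method signature is changed to return int so the result is easy to check. The downcast is well defined because a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <assert.h>

struct base {
    int a;
    int b;
    char c;
    /* "Virtual" method: receives the object as an explicit `this`-style argument. */
    int (*virtual_method)(struct base *self, int x);
};

struct derived {
    struct base base;   /* must be the first member */
    unsigned int d;
};

/* Override for derived objects. The downcast is valid only because `self`
   really points at the `base` member that starts a struct derived. */
static int derived_method(struct base *self, int x)
{
    struct derived *me = (struct derived *)self;
    return self->a + (int)me->d + x;
}

/* Hypothetical initializer that installs the override. */
static void derived_init(struct derived *p, int a, unsigned int d)
{
    p->base.a = a;
    p->base.b = 0;
    p->base.c = 0;
    p->base.virtual_method = derived_method;
    p->d = d;
}

/* Code that only knows about struct base. */
static int call_virtual(struct base *obj, int x)
{
    return obj->virtual_method(obj, x);
}
```

A caller passes &d.base (or casts &d) to call_virtual. The price is that inherited members are spelled d.base.a rather than d.a, which is the verbosity trade-off discussed in the question.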

Handle answered 16/1, 2015 at 10:46 Comment(3)
Thank you for your answer. Your statement on my idea about casting to char* seems perfectly correct. So, if I wanted to manually emulate what memcpy does, I'd have to do something like this for each byte-chunk of the value I need to assign? *((char*)(&struct_pointer->member)) = (char)chunk_of_new_value Doesn't look like it's worth it just to avoid the potential overhead of calling memcpy... Brookins
@Mints97 yes, what you end up doing is treating structures as blocks of memory into and out of which you copy bytes. It would be very cumbersome and error-prone. However, you shouldn't necessarily expect an overhead. Some compilers (incl. gcc) inline intrinsic functions like memcpy(). There's potentially no reason to think x=y is faster than (or even significantly different from) memcpy(&x,&y,sizeof(x)). You can macro that up... Handle
@DanAllen: A compiler with sensible aliasing rules should be able to outperform one which requires memcpy, since the source argument to memcpy is allowed to alias any kind of data in cases where the destination has a declared type, while sensible aliasing rules would make it possible to have the compiler recognize accesses via baseType* as aliasing any derived type, without having to presume that such accesses might alias almost anything of any type. Willable
A
1

memcpy should be the way to go. Don't worry about function call overhead. More often than not, there's none. memcpy is usually a compiler intrinsic, which means the compiler should inline the most efficient possible code for it, and it should know where it can optimize memcpy calls out.

Don't cast pointers to incompatible pointers and then dereference. That's a road towards undefined behavior.
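
In plain standard C, without any GNU extensions, the basic idea might look like the following sketch. The names base_sum and derived_call_base_sum are hypothetical, and the code assumes the shared members sit at the same offsets in both structs, which the standard does not guarantee for this kind of inheritance.

```c
#include <assert.h>
#include <string.h>

struct base    { int a; int b; char c; };
struct derived { int a; int b; char c; unsigned int d; };

/* The copy below takes sizeof(struct base) bytes, so the derived struct
   must be at least that large. */
_Static_assert(sizeof(struct derived) >= sizeof(struct base),
               "derived must be at least as large as base");

/* A base "method" that knows nothing about struct derived. */
static int base_sum(const struct base *x)
{
    return x->a + x->b;
}

/* Copy the leading bytes into a real struct base, then call the method
   on the copy. No struct derived is ever accessed through a struct base
   lvalue, so the aliasing rule is not involved. */
static int derived_call_base_sum(const struct derived *p)
{
    struct base tmp;
    memcpy(&tmp, p, sizeof tmp);
    return base_sum(&tmp);
}
```

This covers read-only methods. A method that modifies the object would also need the copy written back afterwards, and if a derived member happens to live in what is tail padding of the base struct, copying sizeof(struct base) bytes back could overwrite it.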

If you accept gcc's statement expressions and ##__VA_ARGS__, you could have a MC_base_method(BaseType,BaseMethod,Derived_ptr,...) macro that calls a BaseMethod with Derived_ptr and ... correctly, as long as you can work with a copy of a struct as if it were the original (e.g., no pointers to the struct's own members).

Here's an example with some additional OOP-supporting macro sugar:

//Helper macros for some C++-like OOP in plain C 
#define MC_t_alias(Alias, ...)  typedef __VA_ARGS__ Alias               //like C++'s  using 
#define Struct(Nm,...) MC_t_alias(Nm, struct Nm); struct Nm __VA_ARGS__ //auto-typedef'd structs

#define ro const //readonly -- I don't like the word const

//Helper macros for method declarations following my 
//Type__method(Type* X, ...) naming convention
#define MC_mro(Tp,Meth, ...) Tp##__##Meth(Tp ro*X, ##__VA_ARGS__)

#include <stdio.h>
#include <string.h>
//I append _d to my data structs so I know they're data structs
Struct(base_d, {
  int a;
  int b;
  char c;
});

Struct(derived_d, {
  int a;
  int b;
  char c;
  unsigned int d;
  void (*virtual_method)(derived_d*, int, char);
});

//print method is unaware of derived_d 
//it takes a `base_d const *X` (the mro (method, readonly) macro hides that argument (X==`this` in common OOP speak))
int MC_mro(base_d,print) 
{
    return printf("{ a=%d b=%d c=%d }", X->a, X->b, X->c);
}

/*
    Call a (nonvirtual) base method 
*/

#define MC_base_method(BaseType, Method, Derived_p, ...)                       \
({                                                                             \
    int _r; /*if you conventionally return ints*/                                \
            /*otherwise you'll need __typeof__ to get the type*/               \
    BaseType _b;                                                               \
    memcpy(&_b, Derived_p, sizeof(_b));                                        \
    _r = BaseType##__##Method(&_b, ##__VA_ARGS__);                             \
    /*sync back -- for non-readonly methods */                                 \
    /*a smart compiler might be able to get rid of this for ro method calls*/  \
    memcpy(Derived_p, &_b, sizeof(_b));                                        \
    _r;                                                                        \
})


int main()
{
    derived_d d = {1,2,3,4};
    MC_base_method(base_d, print, &d);
}

I consider it the compiler's job to optimize the memcpy calls out. However, if it doesn't and your structs are huge, you're screwed. Same if your structs contain pointers to their own members (i.e., if you can't work with a byte-for-byte copy as if it were the original).

Adjunction answered 8/1, 2017 at 13:48 Comment(0)
