Is it safe to cast a C struct to another with fewer elements?
B

9

11

I'm trying to do OOP in C (just for fun) and I've come up with a method to do data abstraction by having a struct with the public part and a larger struct with the public part first and then the private part. This way, the constructor creates the whole struct and returns it cast to the small struct. Is this correct, or could it fail?

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// PUBLIC PART (header)
typedef struct string_public {
    void (*print)( struct string_public * );
} *string;

string string_class_constructor( const char *s );
void string_class_destructor( string s );

struct {
    string (*new)( const char * );
    void (*delete)( string );
} string_class = { string_class_constructor, string_class_destructor };


// TEST PROGRAM ----------------------------------------------------------------
int main() {
    string s = string_class.new( "Hello" );
    s->print( s );
    string_class.delete( s ); s = NULL;
    return 0;
}
//------------------------------------------------------------------------------

// PRIVATE PART
typedef struct string_private {
    // Public part
    void (*print)( string );
    // Private part
    char *stringData;
} string_private;

void print( string s ) {
    string_private *sp = (string_private *)( s );
    puts( sp->stringData );
}

string string_class_constructor( const char *s ) {
    string_private *obj = malloc( sizeof( string_private ) );
    obj->stringData = malloc( strlen( s ) + 1 );
    strcpy( obj->stringData, s );
    obj->print = print;
    return (string)( obj );
}

void string_class_destructor( string s ) {
    string_private *sp = (string_private *)( s );
    free( sp->stringData );
    free( sp );
}
Babble answered 10/3, 2015 at 16:5 Comment(4)
string is a typedef for string_public and also a member name in string_private, this is really confusing.Lidless
You are right, I should rename it, but it was just a test. Anyway as a member is the actual array of characters and as a typedef is the "class" with its methods.Arevalo
"to cast a C struct to another with less elements?" is misleading. There is no casting from one C struct to another here. OTOH there is casting from one pointer type to another pointer type. Suggest "to cast a C struct * to another that points to a struct with less elements?"Karim
fewer elements, not less.Sonorant
S
12

In theory, this could be unsafe. Two separately declared structs are allowed to have different internal layouts, as there is no positive requirement for them to be compatible. In practice, a compiler is highly unlikely to actually generate different layouts for two identical member lists (unless there's an implementation-specific annotation somewhere, at which point all bets are off, but you'd know about this).

The conventional solution is to take advantage of the fact that a pointer to any given struct, suitably converted, is guaranteed to point to that struct's first member (i.e. structs do not have leading padding: C11 6.7.2.1p15). That means that you can force the leading members of two structs to be not only the same, but strictly compatible, by using a value struct of a shared type in the leading position for both of them:

struct shared {
    int a, b, c;
};
struct Foo {
    struct shared base;
    int d, e, f;
};
struct Bar {
    struct shared base;
    int x, y, z;
};

void work_on_shared(struct shared * s) { /**/ }

//...
struct Foo * f = //...
struct Bar * b = //...
work_on_shared((struct shared *)f);
work_on_shared((struct shared *)b);

This is perfectly compliant and guaranteed to work, because packing the shared elements into a single leading struct means that only the position of the leading element of Foo or Bar is ever explicitly relied upon.


In practice, alignment isn't likely to be the problem that bites you. A much more pressing concern is aliasing (i.e. the compiler is allowed to assume that pointers to incompatible types do not alias). A pointer to a struct can always be converted to a pointer to its first member and back, and accessing that member this way is permitted by the aliasing rules, so the shared base strategy will give you no problems. Using types that the compiler isn't required to treat as compatible could cause it to emit incorrectly optimised code in some circumstances, which can be a very difficult Heisenbug to find if you aren't aware of it.
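To make the shared-base strategy concrete, here is a minimal compilable sketch of the example above. The initial values and the function name shared_base_demo are assumptions added for illustration; only struct shared, Foo, Bar and work_on_shared come from the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Types mirroring the answer's example. */
struct shared { int a, b, c; };
struct Foo { struct shared base; int d, e, f; };
struct Bar { struct shared base; int x, y, z; };

/* Works on the shared part only; knows nothing about Foo or Bar. */
void work_on_shared(struct shared *s) {
    s->a += 1;
}

/* Hypothetical demo: modifies both objects through struct shared
   pointers and returns the sum of the two updated 'a' members. */
int shared_base_demo(void) {
    struct Foo foo = { { 1, 2, 3 }, 4, 5, 6 };
    struct Bar bar = { { 10, 20, 30 }, 40, 50, 60 };

    /* The first member sits at offset 0, so the conversion is valid. */
    assert(offsetof(struct Foo, base) == 0);
    assert((void *)&foo == (void *)&foo.base);

    work_on_shared((struct shared *)&foo);  /* cast, as in the answer */
    work_on_shared(&bar.base);              /* same effect, no cast   */

    return foo.base.a + bar.base.a;         /* 2 + 11 */
}
```

Taking the address of the base member, as in the second call, avoids the cast altogether and is usually the easier form to review.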

Supersaturate answered 10/3, 2015 at 16:40 Comment(3)
Also pointer arithmetic is problematic. struct shared *b; struct foo *f = b; --> &b[1] != &f[1];Karim
So using this technique (base struct instance as first member of derived structs) there is no need to turn off strict aliasing (contrary to op's solution as underlined by Mints97 in his answer below) ?Turbit
@Turbit yes, if you check C11 6.5 paragraph 7, this is mentioned as one of the situations where aliasing is allowed in all circumstances (fifth example). This is always well-defined without extensions.Supersaturate
S
1

Here's what I would do if you're really intent on hiding the definition of string_private.

First, you should declare the struct containing the class definition extern, or it will be defined in every translation unit that includes the header. Move the definition to the .c file. Otherwise, very little changes in the public interface.

string_class.h:

#ifndef STRING_CLASS_H
#define STRING_CLASS_H
// PUBLIC PART (header)
typedef struct string_public {
    void (*print)( struct string_public * );
} *string;

string string_class_constructor( const char *s );
void string_class_destructor( string s );

typedef struct {
    string (*new)( const char * );
    void (*delete)( string );
} string_class_def; 

extern string_class_def string_class;

#endif

In the string_class source, declare a private structure type, not seen outside the translation unit. Make the public type a member of that struct. Constructor will allocate the private struct object, but return a pointer to the public object contained within. Use offsetof magic to cast from public back to private.

string_class.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include "string_class.h"

typedef struct string_private {
    void (*print)( string );
    char *string;
    struct string_public public;
} string_private;

string_class_def string_class = { string_class_constructor, string_class_destructor };

void print( string s ) {
    /* this ugly cast is where the "Magic"  happens.  Basically,
       it converts the string into a char pointer so subtraction will
       work on byte boundaries.  Then subtracts the offset of public 
       from the start of string_private to back up to a pointer to 
       the private object. "offsetof" should be in <stddef.h>*/
    string_private *sp = (string_private *)( (char*) s - offsetof(struct string_private, public));
    // Private part
    puts( sp->string );
}

string string_class_constructor( const char *s ) {
    string_private *obj = malloc( sizeof( string_private ) );
    obj->string = malloc( strlen( s ) + 1 );
    strcpy( obj->string, s );
    obj->public.print = print;
    return (string)( &obj->public );
}

void string_class_destructor( string s ) {
    string_private *sp = (string_private *)( (char*) s - offsetof(struct string_private, public));
    free( sp->string );
    free( sp );
}
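
The offsetof arithmetic used in print and the destructor can be factored into one helper. The sketch below is an illustration only: CONTAINER_OF, struct pub, struct priv and secret_of are hypothetical names, not part of the code above. It is only valid when the member pointer really does point inside an object of the enclosing type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: recover a pointer to the enclosing struct from
   a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct pub { int id; };

struct priv {
    const char *secret;   /* private data, placed before the public part */
    struct pub pub_part;  /* public part; it does not have to come first */
};

/* Given only the public pointer, reach the private field. */
const char *secret_of(struct pub *p) {
    struct priv *owner = CONTAINER_OF(p, struct priv, pub_part);
    return owner->secret;
}

/* Hypothetical demo: returns 1 if the round trip finds the right object. */
int container_of_demo(void) {
    struct priv obj = { "hidden", { 42 } };
    struct pub *handle = &obj.pub_part;

    assert(CONTAINER_OF(handle, struct priv, pub_part) == &obj);
    return secret_of(handle)[0] == 'h';
}
```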

Usage goes unchanged...

main.c:

#include <stdlib.h> // just for NULL
#include "string_class.h"

// TEST PROGRAM ----------------------------------------------------------------
int main() {
    string s = string_class.new( "Hello" );
    s->print( s );
    string_class.delete( s ); s = NULL;
    return 0;
}
//------------------------------------------------------------------------------
Stagger answered 10/3, 2015 at 16:32 Comment(1)
That's the idea, as you see it says "header" in the comment: "// PUBLIC PART (header)". In fact, to do it correctly you should have used the "static" keyword to ensure data hiding properly.Arevalo
E
1

Well, it might work, but it is not a very safe way to do things. Essentially you are just trying to ‘hide’ access to the object's private data by casting the structure short. The data is still there, it just can’t be accessed semantically. The problem with this approach is that you need to know exactly how the compiler is ordering the bytes in the structure or you will get varying results from the cast. From memory this is not defined in the C spec (someone else can correct me on this).

A better way would be to just prefix the private properties with private_ or something like that. If you really really want to limit scope, then create a static local data array inside the class’s .c file and append a ‘private’ data structure to this each time you create a new object. Essentially you are then keeping the private data inside the C module and making use of the c file scoping rules to give you your private access protection, though this is really a lot of work for nothing.
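
A rough sketch of the static-table idea follows. All names (string_new, string_get, string_delete, private_table, MAX_STRINGS) and the fixed table size are assumptions made for illustration. The public struct carries only an index, and the private data lives in a file-scope static array that other translation units cannot name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- what the header would expose (hypothetical) ---- */
typedef struct string_public {
    size_t slot;   /* opaque index into the private table */
} string_public;

/* ---- what would live only in the .c file ---- */
#define MAX_STRINGS 16

struct string_private_data {
    char *data;
    int in_use;
};

/* File-scope static: invisible outside this translation unit. */
static struct string_private_data private_table[MAX_STRINGS];

string_public *string_new(const char *s) {
    for (size_t i = 0; i < MAX_STRINGS; i++) {
        if (!private_table[i].in_use) {
            string_public *obj = malloc(sizeof *obj);
            char *copy = malloc(strlen(s) + 1);
            if (obj == NULL || copy == NULL) {
                free(obj);
                free(copy);
                return NULL;
            }
            strcpy(copy, s);
            private_table[i].data = copy;
            private_table[i].in_use = 1;
            obj->slot = i;
            return obj;
        }
    }
    return NULL;   /* table full */
}

const char *string_get(const string_public *s) {
    assert(s->slot < MAX_STRINGS && private_table[s->slot].in_use);
    return private_table[s->slot].data;
}

void string_delete(string_public *s) {
    if (s == NULL) return;
    free(private_table[s->slot].data);
    private_table[s->slot].data = NULL;
    private_table[s->slot].in_use = 0;
    free(s);
}
```

The fixed-size table keeps the sketch short; a real module would need a growth strategy and, if used from several threads, locking.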

Also your OO design is a bit confusing. The string class is really a string factory object creating strings objects, and it would be clearer if you separated out these two things.

Embody answered 10/3, 2015 at 16:33 Comment(2)
"The string class is really a string factory object" It pretends to be the static part of the class, with the constructor, the destructor, and other static functions and variables that could be added.Arevalo
Oh okay I see what you're doing.Embody
U
1

C does not guarantee that it will work, but generally it does. In particular, C explicitly leaves most aspects of the representation of struct values unspecified (C99 6.2.6.1), including whether the representation of values of your smaller struct will be the same as the layout of the corresponding initial members of the larger struct.

If you want an approach that C guarantees will work, then give your subclass a member of its superclass's type (not a pointer to such). For example,

typedef struct string_private {
    struct string_public parent;
    char *string;
} string_private;

That requires different syntax for accessing "inherited" members, but you can be absolutely sure that ...

string_private *my_string;
/* ... initialize my_string ... */
function_with_string_parameter((string) my_string);

... works (given that you have typedefed "string" as struct string_public *). Moreover, you can even avoid casts like so:

function_with_string_parameter(&my_string->parent);

How useful any of this may be is an altogether different question, however. Using object-oriented programming is not an appropriate objective in itself. OO is a tool for organizing your code that has some notable advantages, but you can write in OO style without mimicking the specific syntax of any particular OO language.
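
Applied to the question's string class, the embedded-parent layout described above might look like the following sketch. The names string_create, string_destroy, print_impl and the data member are assumptions chosen for illustration; the string_public type is the one from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Public part, as in the question. */
typedef struct string_public {
    void (*print)(struct string_public *);
} *string;

/* Private part: the public struct is embedded as the FIRST member. */
typedef struct string_private {
    struct string_public parent;
    char *data;
} string_private;

static void print_impl(string s) {
    assert(s != NULL);
    /* Valid: s points at the first member of a string_private. */
    string_private *sp = (string_private *)s;
    puts(sp->data);
}

string string_create(const char *text) {
    string_private *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->data = malloc(strlen(text) + 1);
    if (obj->data == NULL) { free(obj); return NULL; }
    strcpy(obj->data, text);
    obj->parent.print = print_impl;
    return &obj->parent;   /* no cast needed on the way out */
}

void string_destroy(string s) {
    if (s == NULL) return;
    string_private *sp = (string_private *)s;
    free(sp->data);
    free(sp);
}
```

A caller would use it the same way as in the question: create the object, call s->print( s ), then destroy it.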

Uniseptate answered 10/3, 2015 at 16:44 Comment(0)
P
1

In most cases, this is all right with an initial sequence of any length, since all known compilers will give the common members of the two structs the same padding. If they didn't give them the same padding, they'd have a hell of a time following this requirement of the C standard:

One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them.

I really can't imagine how a compiler would handle this if the "initial sequence" would be padded differently in the two structs.
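
For reference, the union guarantee quoted above covers code like this sketch. The struct and function names are hypothetical. Note that the access goes through the union type, which is the situation the quoted text describes; it is not the same as casting between unrelated struct pointers.

```c
#include <assert.h>

struct small { int kind; int size; };
struct large { int kind; int size; double payload; };

/* The union type is visible wherever the members are accessed. */
union any_header {
    struct small s;
    struct large l;
};

/* Inspect the common initial part through the 'small' view, even though
   the union currently holds a 'large'. */
int kind_of(const union any_header *h) {
    return h->s.kind;
}

/* Hypothetical demo: stores a 'large', reads the shared prefix as 'small'. */
int union_cis_demo(void) {
    union any_header h;

    assert(sizeof h >= sizeof(struct large));
    h.l.kind = 7;
    h.l.size = 16;
    h.l.payload = 3.5;

    return kind_of(&h) + h.s.size;   /* 7 + 16 */
}
```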

But there is one serious "but": strict aliasing should be turned off for this setup to work reliably.

Strict aliasing is a rule that basically lets the compiler assume that two pointers of incompatible types do not reference the same memory location. Therefore, if you cast a pointer to your larger struct to a pointer to the smaller one (or vice versa), read the value of a member in their initial sequence through one of them, change that value through the other, and then read it again through the first pointer, the compiler may act as if it has not changed. I.e.:

struct smaller_struct {
    int memb1;
    int memb2;
};

struct larger_struct {
    int memb1;
    int memb2;
    int additional_memb;
};

/* ... */

struct larger_struct l_struct, *p_l_struct;
struct smaller_struct *p_s_struct;

p_l_struct = &l_struct;
p_s_struct = (struct smaller_struct *)p_l_struct;

p_l_struct->memb1 = 1;
printf("%d", p_l_struct->memb1); /* Outputs 1 */

p_s_struct->memb1 = 2;

printf("%d", p_l_struct->memb1); /* May output 1 with strict aliasing enabled; outputs 2 with it disabled */

You see, a compiler which uses strict-aliasing optimisations (like GCC at -O2 and above, where -fstrict-aliasing is enabled by default) wants to make life easier for itself: it assumes that two pointers of incompatible types can't reference the same memory location. So, when you write through p_s_struct->memb1, it may conclude that nothing changed the value of p_l_struct->memb1 (which it knows to be 1), so it won't "check" memb1's actual value and will just output 1.

A way to circumvent this could be declaring your pointers as pointing to volatile data (which means telling the compiler that this data can be changed from elsewhere without it noticing), but the standard doesn't guarantee this to work.

Please note that all said above applies to structs that are not packed in a special way by the compiler.

Protoplast answered 10/3, 2015 at 16:49 Comment(1)
The primary reason that the CIS guarantee is useful is that it allows functions to process multiple kinds of structures interchangeably. Given that the authors of the Standard expressly recognize the possibility of an implementation being conforming and yet also being of such poor quality as to be useless, the fact that it doesn't mandate that compilers honor CIS guarantees in useful fashion doesn't mean that implementations that make no effort to do so should not be recognized as being of poor quality.
M
1

Whether or not this code will work on a given compiler depends upon the quality, target platform, and intended usage of the compiler in question. There are two places you might run into trouble:

  1. On some platforms, the fastest way to write the last member of a structure may disturb padding bits or bytes that follow it. If that object is part of the Common Initial Sequence shared with a longer structure, and bits that were used as padding in the shorter one are used to hold meaningful data in the longer one, such data might get disturbed when writing the last field in the shorter type. I don't think I've seen any compilers actually do this, but the behavior would be allowable, which is why the CIS rule only allows for "inspection" of common members.

  2. While quality compilers should seek to uphold the Common Initial Sequence guarantees in useful fashion, the Standard treats support for such things as a Quality of Implementation issue, and it has become more fashionable for some compilers to interpret N1570 6.5p7 in the lowest-quality fashion they think the Standard would allow, unless invoked with -fno-strict-aliasing. From my observation, icc seems to support the CIS guarantees in -fstrict-aliasing mode, but both gcc and clang process a low-quality dialect that for all practical purposes ignores the Common Initial Sequence rule even in cases where pointers are never aliased within their respective lifetimes.
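
Point 1 above can be made concrete with a small layout probe. This is only a sketch with hypothetical struct and function names. It shows where a longer struct's extra member may land relative to the shorter struct's tail padding, and the result is implementation-dependent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct shorter { int a; char b; };          /* usually has tail padding  */
struct longer  { int a; char b; char c; };  /* c may sit in that padding */

/* Returns 1 if longer.c lies within the bytes that a struct shorter
   occupies (i.e. inside its tail padding), 0 otherwise. */
int tail_padding_overlaps(void) {
    return offsetof(struct longer, c) < sizeof(struct shorter);
}

/* Prints the relevant sizes and offsets for the current implementation. */
void report_layout(void) {
    /* The shared prefix is laid out identically on mainstream ABIs. */
    assert(offsetof(struct shorter, b) == offsetof(struct longer, b));

    printf("sizeof(struct shorter)     = %zu\n", sizeof(struct shorter));
    printf("offsetof(struct longer, c) = %zu\n", offsetof(struct longer, c));
    printf("c overlaps shorter's bytes = %d\n", tail_padding_overlaps());
}
```

If tail_padding_overlaps() returns 1, a whole-struct store through a struct shorter pointer that also rewrites the padding could clobber c in a struct longer object, which is the hazard described in point 1.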

Use a good compiler and your code will work. Use a poor-quality compiler, or one that is configured to behave in poor-quality fashion, and your code will fail.

Medicine answered 13/7, 2018 at 20:12 Comment(0)
M
0

Casting from one struct to another is unreliable because the types are incompatible. What you can rely on though is that if the first elements of the parent struct are all at the top of the child struct and in the same order, then a reinterpret cast will let you do what you want. Like so:

struct parent {
  int data;
  char *more_data;
};

struct child {
  int data;
  char *more_data;
  double even_more_data;
};

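int main(void) {
  struct child c = {0};

  /* bad: does not compile, C has no cast between struct types */
  /* struct parent p1 = (struct parent) c; */

  /* good: convert the pointer, then read through it */
  struct parent p2 = *(struct parent *) &c;
  (void) p2;

  return 0;
}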

This is exactly how Python implements its object-oriented programming at the C level.

Madewell answered 10/3, 2015 at 16:10 Comment(7)
C does not have explicit reinterpret cast. All casts in C are reinterpret cast.Aldwin
Then the difference is on casting the pointer vs casting the struct right? I'm doing it as you say then, so I think it's fine, but thanks for the clarification.Arevalo
Calling them reinterpret casts does help those from a C++ perspective. Furthermore, although I'm not into C as much as C++, it's hard to believe that all casts in C are such.Provence
@Aldwin C casting is not always reinterpret casts, e.g. float a = 2.5; float b = (float) (int) a;Madewell
@Madewell you are right. I forgot about using casts for numeric conversions.Aldwin
@JotaGe Since you are doing casts on pointers, you are doing it correctly.Madewell
Per the standard, (all) C cast expressions convert the value of the operand to the specified type. However, converting a value from one object pointer type to another object pointer type is an exercise in relabelling; it is comparable to reinterpreting the pointer's referrent.Uniseptate
A
0

If I remember correctly, this type of casting is undefined behaviour per the standard. But, GCC and MS C both guarantee that this will work as you think.

So, for example:

#include <stdint.h>

struct small_header {
    char     ident[5];
    uint32_t header_size;
};

struct bigger_header {
    char     ident[5];
    uint32_t header_size;
    uint32_t important_number;
};

You can cast them back and forth and access the first two members safely. Of course, if you have a small one and cast it to the big one, accessing the important_number member will get you UB.

Edit:

This guy makes a nice article about this:

Type punning isn't funny: Using pointers to recast in C is bad.

Assyria answered 10/3, 2015 at 16:22 Comment(5)
not UB, rather well defined, and guaranteed by the standardRevegetate
@Revegetate care to support that with the standard section?Assyria
hm, the only one I find is only valid is the structs are part of the same union. (6.5.2.3) Gonna look some moreRevegetate
"If I remember correctly, this type of casting is undefined behaviour per the standard." That's the answer I was looking for, If you (or anyone) can say it for sure I think it's the correct answer.Arevalo
The type access rules in 6.5p7 make no effort to define all the cases a quality implementation should handle in predictable fashion. It doesn't even allow for things like struct s {int x;} foo = {0}; foo.x=1; but instead relies upon compiler writers to use some common sense. Unfortunately, the way the Standard is written has led compiler writers to think quality compilers shouldn't be expected to handle any constructs beyond those listed, except for those that no program could plausibly be expected to do without.Medicine
A
0

Another elegant way to extend structs with a common part (like OOP)

#define BASE_T \
    int a;     \
    int b;     \
    int c;

struct Base_t {
    BASE_T
};
struct Foo_t {
    BASE_T
    int d, e, f;
};
struct Bar_t {
    BASE_T
    int x, y, z;
};

void doBaseStuff(struct Base_t * pBase) {
    pBase->a = 1;
    pBase->b = 2;
    pBase->c = 3;
}

int main() {
    struct Foo_t foo;
    struct Bar_t bar;
    doBaseStuff((struct Base_t*) &foo);
    doBaseStuff((struct Base_t*) &bar);
    bar.a = 0; // I can directly access on properties of BASE_T, without doing any cast
    foo.e = 6;
    return 0;
}

This code compiles as C99. For C89, replace the // comment with /* */. Do not leave any whitespace after the line-continuation backslashes (\) in BASE_T.
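
If a C11 compiler is available, the layout assumption behind the macro approach can be checked at compile time. This is a sketch: the _Static_assert lines and the base_prefix_matches function are additions for illustration, not part of the answer. These checks only cover member offsets; they do not address the aliasing concerns raised in the other answers.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_T \
    int a;     \
    int b;     \
    int c;

struct Base_t { BASE_T };
struct Foo_t  { BASE_T int d, e, f; };

/* Compile-time layout checks (C11): the build fails if a member moves. */
_Static_assert(offsetof(struct Foo_t, a) == offsetof(struct Base_t, a), "a moved");
_Static_assert(offsetof(struct Foo_t, b) == offsetof(struct Base_t, b), "b moved");
_Static_assert(offsetof(struct Foo_t, c) == offsetof(struct Base_t, c), "c moved");

/* Hypothetical runtime version of the same check; returns 1 on a match. */
int base_prefix_matches(void) {
    return offsetof(struct Foo_t, a) == offsetof(struct Base_t, a)
        && offsetof(struct Foo_t, b) == offsetof(struct Base_t, b)
        && offsetof(struct Foo_t, c) == offsetof(struct Base_t, c);
}
```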

Apheliotropic answered 28/11, 2019 at 15:44 Comment(0)
