Casting one struct pointer to another - C
Asked Answered
A

3

38

Please consider the following code.

enum type {CONS, ATOM, FUNC, LAMBDA};

typedef struct{
  enum type type;
} object;

typedef struct {
  enum type type;
  object *car;
  object *cdr;
} cons_object;

object *cons (object *first, object *second) {
  cons_object *ptr = (cons_object *) malloc (sizeof (cons_object));
  ptr->type = CONS;
  ptr->car = first;
  ptr->cdr = second;
  return (object *) ptr;
}

In the cons function, variable ptr is of type cons_object*. But in the return value it is converted to type of object*.

  1. I am wondering how this is possible because cons_object and object are different structs.
  2. Are there any issues in doing stuff like this?

Any thoughts!

Arcturus answered 22/9, 2010 at 4:8 Comment(0)
E
39

This is fine and is a fairly common technique for implementing "object-orientation" in C. Because the memory layout of structs is well-defined in C, as long as the two object share the same layout then you can safely cast pointers between them. That is, the offset of the type member is the same in the object struct as it is in the cons_object struct.

In this case, the type member tells the API whether the object is a cons_object or foo_object or some other kind of object, so you might be see something like this:

void traverse(object *obj)
{
    if (obj->type == CONS) {
        cons_object *cons = (cons_object *)obj;
        traverse(cons->car);
        traverse(cons->cdr);
    } else if (obj->type == FOO) {
        foo_object *foo = (foo_object *)obj;
        traverse_foo(foo);
    } else ... etc
}

More commonly, I've seem implementations where the "parent" class is defined as the first member of the "child" class, like so:

typedef struct {
    enum type type;
} object;

typedef struct {
    object parent;

    object *car;
    object *cdr;
} cons_object;

This works in largely the same way, except you've got a strong gaurantee that the memory layout of the child "classes" will be the same as the parents. That is, if you add a member to the 'base' object, it'll automatically be picked up by the children and you won't have to manually make sure all of the structures are in sync.

Erode answered 22/9, 2010 at 4:14 Comment(8)
interesting. this knowledge is going to simplify my code a lot. thanks.Arcturus
You should mention that this is legitimate C with well-defined behavior, and not a "hack" or invocation of "undefined behavior".Millican
re. object-in-cons_object - you can also use macros to make the typecasting a little safer in this case, e.g. #define OBJECT(x) &((x)->parent). This has no runtime cost (it's a memory address equal to x) but means you don't accidentally cast something strange.Sulphanilamide
Thank you. That makes sense of some Git code I was looking at.Algar
It would be cool if someone cited a standard, and explained why the original example does not contradict the strict aliasing rule as claimed in this answer: https://mcmap.net/q/405195/-casting-one-struct-pointer-to-another-c Nested structs also mentioned at: #8416917Sadden
Having matching memory layout does not make it legal to cast between struct pointers as per the strict-aliasing rule. There's actually a PEP on fixing this very issue in CPython's object implementation: python.org/dev/peps/pep-3123. Obviously these casts seem reasonable and almost always work but UB is UB.Haemophilic
Very late to the party, but I also agree that this answer is misleading - two unrelated structs with the same layout may not alias one another.Sergius
@OliverCharlesworth: The answer is correct for the language invented by Dennis Ritchie, and also for dialects which the vast majority of compilers can be configured to process. The Standard relegates such ability to the status of a "popular extension" which implementations aren't required to support, but passes no judgement as to whether dialects that omit it should be considered suitable for any particular purpose.Styx
W
21

To add to Dean's answer, here's something about pointer conversions in general. I forgot what the term is for this, but a pointer to pointer cast performs no conversion (in the same way int to float is). It is simply a reinterpretation of the bits that they point to (all for the compiler's benefit). "Non-destructive conversion" I think it was. The data doesn't change, only how the compiler interprets what is being pointed at.

e.g.,
If ptr is a pointer to an object, the compiler knows that there is a field with a particular offset named type of type enum type. On the other hand if ptr is cast to a pointer to a different type, cons_object, again it will know how to access fields of the cons_object each with their own offsets in a similar fashion.

To illustrate imagine the memory layout for a cons_object:

                    +---+---+---+---+
cons_object *ptr -> | t | y | p | e | enum type
                    +---+---+---+---+
                    | c | a | r |   | object *
                    +---+---+---+---+
                    | c | d | r |   | object *
                    +---+---+---+---+

The type field has offset 0, car is 4, cdr is 8. To access the car field, all the compiler needs to do is add 4 to the pointer to the structure.

If the pointer was cast to a pointer to an object:

                    +---+---+---+---+
((object *)ptr)  -> | t | y | p | e | enum type
                    +---+---+---+---+
                    | c | a | r |   |
                    +---+---+---+---+
                    | c | d | r |   |
                    +---+---+---+---+

All the compiler needs to know is that there is a field called type with offset 0. Whatever is in memory is in memory.

The pointers don't even have to be related whatsoever. You can have a pointer to an int and cast it to a pointer to an cons_object. If you were to access the car field, it's just like any ordinary memory access. It has a certain offset from the beginning of the structure. In this case, what is in that memory location is unknown but that's unimportant. To access a field, only the offset is needed and that information is found in the type's definition.

A pointer to an int points to a block of memory:

                        +---+---+---+---+
int             *ptr -> | i | n | t |   | int
                        +---+---+---+---+

Casted to a cons_object pointer:

                        +---+---+---+---+
((cons_object *)ptr) -> | i | n | t |   | enum type
                        +---+---+---+---+
                        | X | X | X | X | object *
                        +---+---+---+---+
                        | X | X | X | X | object *
                        +---+---+---+---+
Whisper answered 22/9, 2010 at 4:37 Comment(1)
"but a pointer to pointer cast performs no conversion (in the same way int to float is). It is simply a reinterpretation of the bits that they point to (all for the compiler's benefit)" That's false. Pointers to different types can have different representation, they don't even have to be the same size. c-faq.com/null/machexamp.htmlConstrictor
B
17

Using separate structs violates the strict aliasing rule and is undefined behaviour: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

Using an embedded struct as in Dean's last example is fine.

Bernardo answered 22/9, 2010 at 6:58 Comment(1)
AFAIK, this answer is correct and the (currently) accepted answer is wrong. See also #47711085Sergius

© 2022 - 2024 — McMap. All rights reserved.