How does the C offsetof macro work? [duplicate]

Possible Duplicate:
Why does this C code work?
How do you use offsetof() on a struct?

I read about this offsetof macro on the Internet, but the explanation I found doesn't say what it is used for.

#define offsetof(a,b) ((int)(&(((a*)(0))->b)))

What is it trying to do and what is the advantage of using it?

Ri answered 26/10, 2011 at 1:47 Comment(1)
That offsetof macro is incorrect. They should cast to size_t, not int, and they should probably subtract (char*)0 from the result before casting, even though it's a null pointer constant. - Deficiency

It has no advantages and should not be used, since it invokes undefined behavior (and uses the wrong type - int instead of size_t).

The C standard defines an offsetof macro in stddef.h that actually works, for cases where you need the offset of a member within a structure, such as:

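#include <stddef.h>

/* Type tags used by the descriptor table below. */
enum field_type { INT, CHARPTR };

struct foo {
    int a;
    int b;
    char *c;
};

/* Describes one member of struct foo: its name, its type and its
   byte offset from the start of the struct. */
struct struct_desc {
    const char *name;
    enum field_type type;
    size_t off;
};

static const struct struct_desc foo_desc[] = {
    { "a", INT, offsetof(struct foo, a) },
    { "b", INT, offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};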

which would let you programmatically fill the fields of a struct foo by name, e.g. when reading a JSON file.

Koloski answered 26/10, 2011 at 1:55 Comment(12)
I am sorry - how does the offsetof macro cause undefined behavior, especially since it is defined in the C standard? - Londrina
The standard offsetof macro from stddef.h does not invoke UB. Defining your own hack to compute offsets this way does invoke UB. - Koloski
Please quote me the standard reference that says defining your own version of the macro causes undefined behaviour. - Londrina
@Adrian: He didn't say, "defining your own version of the macro causes undefined behaviour." He specifically said, "Defining your own hack to compute offsets this way does invoke UB." In the code, at this point: ((a*)(0))-> you've invoked undefined behavior by dereferencing null. - Fusco
@GMan - where the hell have you dereferenced null? It's casting null to an a* pointer. And is "Defining your own hack" a technical term for some code that I don't know after 20 years of C programming? Let's see how Linux defines it: #ifndef offsetof # define offsetof(T,F) ((unsigned int)((char *)&((T *)0L)->F - (char *)0L)) #endif Hmm, looks very similar to the OP's. - Londrina
@Adrian: x->b is defined as (*x).b. In our case x is (a*)0, and *x dereferences null. And congrats: after twenty years of C you still don't know what implementation-defined behavior is. Quoting a specific definition on a specific implementation at a specific time has nothing to do with the language definition of the macro. The language states the effects of the macro and that's it; it doesn't define an implementation. I mean hell, you quoted the standard yourself; where in there does it state the definition of the macro? - Fusco
@AdrianCornish: The implementation is allowed to define offsetof however it likes as long as it implements the correct behavior. Your application does not have this privilege because it can't define the behavior of anything; it can only use already-defined language constructs. That's how C works. - Koloski
6.5.2.3 does not use the word "dereference", but specifies it as "the named member of the object to which the first expression points". Since (a*)(0) does not point to an object of type a, the behavior is undefined (by virtue of not being defined). - Koloski
[I love language standard debates] Which subclause? (Are you using C99?) I cannot find one. I would argue 6.3.2.3 Pointers, paragraph 1: "A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer." - Londrina
The text you cited is irrelevant. No pointer to incomplete or object type is converted to a pointer to void in the bogus macro. - Koloski
@R.. what is the type of foo_desc here? Did you mean food_desc[] instead? - Clasp
I added the include directive because it just felt right after this edit suggestion got rejected. - Slob

R.. is correct in his answer to the second part of your question: this code is not advisable with a modern C compiler.

But to answer the first part of your question, what this is actually doing is:

(
  (int)(         // 4.
    &( (         // 3.
      (a*)(0)    // 1.
     )->b )      // 2.
  )
)

Working from the inside out, this is ...

  1. Casting the value zero to the struct pointer type a*
  2. Getting the struct field b of this (illegally placed) struct object
  3. Getting the address of this b field
  4. Casting the address to an int

Conceptually, this places a struct object at memory address zero and then finds out what the address of a particular field is. Because the object starts at address zero, that address is numerically equal to the field's offset from the start of the struct. This could allow you to figure out the offsets in memory of each field in a struct so you could write your own serializers and deserializers to convert structs to and from byte arrays.

Of course, if you actually dereferenced a null pointer your program would crash, but here the whole expression is evaluated by the compiler and no null pointer is actually dereferenced at runtime.

On most of the systems C originally ran on, an int was the same size as a pointer, so this actually worked.

Clambake answered 26/10, 2011 at 2:15 Comment(1)
Excellent! Thank you. The key for me was placing a struct object at memory address zero and then finding out what the address of a particular field is. - Sluggish

It's finding the byte offset of a particular member of a struct. For example, if you had the following structure:

struct MyStruct
{
    double d;
    int i;
    void *p;
};

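Then, on a typical 32-bit platform, you'd have offsetof(struct MyStruct, d) == 0, offsetof(struct MyStruct, i) == 8, and offsetof(struct MyStruct, p) == 12 (that is, the member named d is 0 bytes from the start of the structure, etc.). On a typical 64-bit platform p would instead be at offset 16, because the pointer needs 8-byte alignment.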

The way it works is that it pretends an instance of your structure exists at address 0 (the ((a*)(0)) part), then takes the address of the intended structure member and casts it to an integer. Although dereferencing an object at address 0 would ordinarily be an error, in practice taking the address works because the address-of operator & and the member access operator -> effectively cancel each other out.

It's typically used for generalized serialization frameworks. If you have code for converting between some kind of wire data (e.g. bytes in a file or from the network) and in-memory data structures, it's often convenient to create a mapping from member name to member offset, so that you can serialize or deserialize values in a generic manner.

Schuller answered 26/10, 2011 at 1:58 Comment(1)
The question is C, so we still have to use struct MyStruct. ;) - Deficiency

The implementation of the offsetof macro is really irrelevant.

The actual C standard defines it in 7.17.3:

offsetof(type, member-designator)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

Trust Adam Rosenfield's answer.

R.. is completely wrong: it has many uses, especially telling when code is non-portable across platforms.

(OK, it's C++, but we use it in static template compile-time assertions to make sure our data structures do not change size between platforms/versions.)

Londrina answered 26/10, 2011 at 2:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.