How does one use `offsetof` to access a field in a standard-conforming way?

Let's suppose I have a struct and extract the offset to a member:

struct A {
    int x;
};

size_t xoff = offsetof(struct A, x);

How can I, given a pointer to struct A, extract the member in a standard-conforming way? Assume, of course, that we have a valid struct A* and a correct offset. One attempt would be something like:

int getint(struct A* base, size_t off) {
    return *(int*)((char*)base + off); 
}

This will probably work, but note that pointer arithmetic only seems to be defined in the standard when the pointers point into the same array (or one past its end), which need not be the case here. So technically that construct would seem to rely on undefined behaviour.

Another approach would be

int getint(struct A* base, size_t off) {
    return *(int*)((uintptr_t)base + off);
}

This would probably also work, but note that intptr_t is not required to exist, and as far as I know arithmetic on intptr_t need not yield the correct result (for example, I recall that some CPUs can handle non-byte-aligned addresses, which would suggest that intptr_t increases in steps of 8 for each char in an array).

It looks like there's something forgotten in the standard (or something I've missed).

Bacteria answered 24/5, 2016 at 11:57 Comment(5)
I'm pretty sure aliasing to char* and pointers that point into the same object (not necessarily array) are both valid. Waiting for an authoritative answer though.Crayton
(char *)base can be used to move around anywhere within base (and one past the end). Any object behaves like an array of size 1.Jackquelinejackrabbit
return *(int*)((char*)base + off); can readily fail as int access may be unaligned. E.g. int access can cause bus fault on odd address. OTOH OP did say "Assuming ... we have a correct struct A* and a correct offset"Mme
Best to access a field with that field's type or unsigned char (No traps, no padding).Mme
It is not clear why the code does not use A->x to access the field. What do you want to do that A->x does not provide? If all the code has is A and the offset to field x, the lack of the field type/size prevents accessing it in a conforming manner.Mme

Per the C Standard, 7.19 Common definitions <stddef.h>, paragraph 3, offsetof() is defined as:

The macros are

NULL

which expands to an implementation-defined null pointer constant; and

offsetof(*type*, *member-designator*)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type).

So, offsetof() yields an offset in bytes.

And 6.2.6.1 General, paragraph 4 states:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.

Since CHAR_BIT is the number of bits in a byte, and sizeof(char) is 1 by definition, a char is exactly one byte.

So, this is correct, per the standard:

int getint(struct A* base, size_t off) {
    return *(int*)((char*)base + off); 
}

That converts base to a char * and adds off bytes to the address. If off is the result of offsetof(struct A, x), the resulting address is the address of x within the struct A that base points to.
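A minimal sketch of that approach, assuming the offset really does come from offsetof on the same struct type. The extra member y and the helper getint_memcpy are hypothetical additions for illustration, not part of the question; the memcpy variant copies the bytes out rather than dereferencing a cast pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct A {
    int x;
    int y;   /* hypothetical extra member, added for illustration */
};

/* Read the int located `off` bytes into *base through a cast pointer. */
int getint(struct A *base, size_t off) {
    return *(int *)((char *)base + off);
}

/* Same read, but copying the bytes out of the object representation
   instead of dereferencing a cast pointer (hypothetical helper). */
int getint_memcpy(const struct A *base, size_t off) {
    int value;
    memcpy(&value, (const unsigned char *)base + off, sizeof value);
    return value;
}
```

For a struct A initialised with { 10, 20 }, getint(&a, offsetof(struct A, y)) would be expected to return 20, and the memcpy variant the same.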

Your second example:

int getint(struct A* base, size_t off) {
    return *(int*)((intptr_t)base + off);
}

is dependent upon the result of the addition of the signed intptr_t value with the unsigned size_t value being unsigned.
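The conversion in question can be illustrated with int and unsigned int, used here only as stand-ins for intptr_t and size_t, whose relative ranks are implementation-specific. When both operands have the same rank, the signed one is converted to the unsigned type before the operation. The function names are hypothetical.

```c
#include <assert.h>

/* Adds a signed int to an unsigned int. Under the usual arithmetic
   conversions the int operand is converted to unsigned int first,
   so the addition is performed modulo UINT_MAX + 1. */
unsigned int add_mixed(int s, unsigned int u) {
    return s + u;
}

/* Returns 1 if s < u holds after the same conversion, else 0. */
int less_mixed(int s, unsigned int u) {
    return s < u;
}
```

With these definitions, add_mixed(-1, 1u) wraps around to 0, while less_mixed(-1, 1u) is 0 because -1 becomes UINT_MAX before the comparison. If intptr_t had a higher rank than size_t and could represent all size_t values, the result would instead be signed.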

Compendious answered 24/5, 2016 at 12:46 Comment(8)
The cited parts are quite irrelevant. The relevant parts of the standard would be the ones in 6.5 concerning pointer aliasing, or perhaps the parts regarding pointer arithmetic. I don't see how the second example would fail. intptr_t is an unsigned integer type, not a pointer type. It does not do any pointer arithmetic so your assumptions are incorrect.Geoid
@Geoid - Yes, you're right. For some reason, I read intptr_t as int *. Revising the answer now, but first I need to think about what happens if intptr_t is signed.Compendious
Agree with @Geoid except intptr_t is a signed integer type vs. uintptr_tMme
@chux - Yes, it's signed. But I'm trying to remember what happens when adding a signed and unsigned int. The OP's code may be valid.Compendious
@Andrew Henle Adding signed type to equal rank unsigned type --> unsigned type.Mme
@chux Yes. I've spent most of my programming life trying to avoid relying on such conversion rules, so I don't know them offhand.Compendious
@Geoid Also, I don't know why you think the sections I quoted are irrelevant. They support the OP's method of using a char * and offset from offsetof() to find the address of a structure's field. There may very well be other sections that also support the OPs method.Compendious
The OP seems to know how offsetof works. The main concern here is whether you can do pointer arithmetic on a struct, and pointer conversions. And regarding implicit conversion rules: they are there whether you like them or not. Not knowing about them might cause you to write code that relies upon them, by accident.Geoid

The reason the standard (6.5.6) only allows pointer arithmetic within arrays is that structs may have padding bytes to satisfy alignment requirements. So doing pointer arithmetic inside a struct is indeed formally undefined behavior.

In practice, it will work as long as you know what you are doing. base + off cannot fail, because we know that there is valid data there and it is not misaligned, given that it is accessed properly.

Therefore (intptr_t)base + off is indeed much better code, as there is no longer any pointer arithmetic, but just plain integer arithmetic. Because intptr_t is an integer, it is not a pointer.

As pointed out in a comment, this type is not guaranteed to exist; it is optional as per 7.20.1.4/1. I suppose that for maximum portability you could switch to other types that are guaranteed to exist, such as intmax_t or ptrdiff_t. It is, however, debatable whether a C99/C11 compiler without support for intptr_t is useful at all.

(There is a small type issue here, namely that intptr_t is a signed type, and not necessarily compatible with size_t. You might get implicit type promotion issues. It is safer to use uintptr_t if possible.)

The next question then is if *(int*)((intptr_t)base + off) is well-defined behavior. The part of the standard regarding pointer conversions (6.3.2.3) says that:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

For this specific case, we know that we have a correctly aligned int there, so it is fine.

(I don't believe that any pointer aliasing concerns apply either. At least compiling with gcc -O3 -fstrict-aliasing -Wstrict-aliasing=2 doesn't break the code.)
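A sketch of the integer-arithmetic variant, using uintptr_t as suggested above and guarding on UINTPTR_MAX, which <stdint.h> defines only when the optional type exists. The function name getint_uintptr and the extra member y are hypothetical. Note the assumption that converting the adjusted integer back to a pointer lands on the member: the standard leaves that conversion implementation-defined, although it holds on common flat-address platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct A {
    int x;
    int y;   /* hypothetical extra member, added for illustration */
};

#ifdef UINTPTR_MAX   /* defined only if the optional uintptr_t exists */
int getint_uintptr(struct A *base, size_t off) {
    /* Plain unsigned integer arithmetic, no pointer arithmetic. */
    uintptr_t addr = (uintptr_t)base + off;
    /* Converting back is implementation-defined (assumed to work here). */
    return *(int *)addr;
}
#else
int getint_uintptr(struct A *base, size_t off) {
    /* Fallback when uintptr_t is unavailable: byte-pointer arithmetic. */
    return *(int *)((char *)base + off);
}
#endif
```

On such a platform, calling getint_uintptr(&a, offsetof(struct A, y)) on a struct A initialised with { 10, 20 } would be expected to return 20.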

Geoid answered 24/5, 2016 at 13:49 Comment(2)
"Because intptr_t is an integer, ... is guaranteed to exist ... compiler (C99/C11)" --> "intptr_t ... uintptr_t These types are optional." §7.20.1.4 1Mme
@chux Ah, I learnt something new then! :) Will edit the answer, thanks.Geoid