Using offsetof to access struct member
C

2

14

I have the following code:

#include <stddef.h>

int main() {
  struct X {
    int a;
    int b;
  } x = {0, 0};

  void *ptr = (char*)&x + offsetof(struct X, b);

  *(int*)ptr = 42;

  return 0;
}

The assignment through ptr performs indirect access to x.b.

Is this code well defined according to any of the C standards?

I know that:

  • *(char*)ptr = 42; is defined, though its effect is implementation-defined.
  • ptr == (void*)&x.b

I guess that accessing the data pointed to by ptr via int * does not violate the strict aliasing rule, but I'm not fully sure that the standard guarantees that.

Carpo answered 29/10, 2021 at 16:13 Comment(3)
x.b is an object with effective (and declared) type int, and its stored value is accessed by an lvalue expression of type int, so that is perfectly legitimate. - Spartacus
Isn't that the point of the offsetof macro? - Heinz
@EugeneSh. The question is whether it works for access via the int type, not only char. - Carpo
B
14

Yes, this is perfectly well defined, and is exactly how offsetof is intended to be used. You do the pointer arithmetic on a pointer to character type, so that it is done in bytes, and then cast back to the actual type of the member.

You can see for instance 6.3.2.3 p7 (all references are to C17 draft N2176):

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

So (char *)&x is a pointer to x converted to a pointer to char, therefore it points to the lowest addressed byte of x. When we add offsetof(struct X, b) (say it's 4) then we have a pointer to byte 4 of x. Now offsetof(struct X, b) is defined to return

the offset in bytes, to the structure member, from the beginning of its structure [7.19p3]

so 4 is in fact the offset from the beginning of x to x.b. Hence byte 4 of x is the lowest byte of x.b, and that's what ptr points to; in other words, ptr is a pointer to x.b, but of type char *. When we cast it back to int *, we have a pointer to x.b which is of the type int *, exactly the same as we would get from the expression &x.b. So dereferencing this pointer accesses x.b.
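The chain of reasoning above can be checked with a small sketch. It assumes the same struct X as in the question; the helper names member_b_via_offset and check_member_access are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct X {
  int a;
  int b;
};

/* Hypothetical helper: reach x->b through byte arithmetic, as in the question. */
static int *member_b_via_offset(struct X *x) {
  /* (char *)x points to the lowest addressed byte of *x (6.3.2.3 p7). */
  char *base = (char *)x;
  /* Advance by the byte offset of b, then convert to the member's type. */
  return (int *)(base + offsetof(struct X, b));
}

/* Returns 1 if the derived pointer compares equal to &x.b
   and a write through it lands in x.b without touching x.a. */
static int check_member_access(void) {
  struct X x = {0, 0};
  int *p = member_b_via_offset(&x);
  *p = 42;
  return p == &x.b && x.b == 42 && x.a == 0;
}
```

Calling check_member_access() is expected to return 1 on a conforming implementation, which is the claim the paragraph above argues for.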


A question arose in the comments about this last step: when ptr is cast back to int *, how do we know we indeed have a pointer to the int x.b? This is a bit less explicit in the standard but I think it is the obvious intent.

However, I think we can also derive it indirectly. Hopefully we agree that ptr above is a pointer to the lowest addressed byte of x.b. Now by the same passage of 6.3.2.3 p7 quoted above, taking a pointer to x.b and converting it to char *, as in (char *)&x.b, would also yield a pointer to the lowest addressed byte of x.b. As they are pointers of the same type which point to the same byte, they are the same pointer: ptr == (char *)&x.b.

Then we look at the preceding sentences of 6.3.2.3 p7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

There are no problems with alignment here, because char has the weakest alignment requirement (6.2.8 p6). So converting (char *)&x.b back to int * must recover a pointer to x.b, i.e. (int *)(char *)&x.b == &x.b.

But ptr is the same pointer as (char *)&x.b, so we may substitute them in this equality: (int *)ptr == &x.b.

Obviously *&x.b produces an lvalue designating x.b (6.5.3.2 p4), hence so does *(int *)ptr.
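The two intermediate equalities used in this derivation can be written out as a sketch. The function name check_round_trip is hypothetical; the struct mirrors the one in the question. Note that this only demonstrates that the pointers compare equal, which is exactly the step the comments below debate.

```c
#include <assert.h>
#include <stddef.h>

struct X {
  int a;
  int b;
};

/* Returns 1 if both equalities from the derivation hold for a local struct X. */
static int check_round_trip(void) {
  struct X x = {1, 2};

  /* Step 1: the offset-derived pointer and (char *)&x.b point to the same byte. */
  char *ptr = (char *)&x + offsetof(struct X, b);
  int same_byte = (ptr == (char *)&x.b);

  /* Step 2: converting to char * and back compares equal to the original. */
  int round_trip = ((int *)(char *)&x.b == &x.b);

  /* Consequence: (int *)ptr compares equal to &x.b and reads x.b's value. */
  int substituted = ((int *)ptr == &x.b) && (*(int *)ptr == 2);

  return same_byte && round_trip && substituted;
}
```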


There is no problem with strict aliasing (6.5p7). First, determine the effective type of x.b using 6.5p6:

The effective type of an object for an access to its stored value is the declared type of the object, if any. [Then explanations on what to do if it doesn't have a declared type.]

Well, x.b does have a declared type, which is int. So its effective type is int.

Now to see if the access is legal under strict aliasing, see 6.5p7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,

[more options not relevant here]

We are accessing x.b through the lvalue expression *(int *)ptr, which has type int. And int is compatible with int per 6.2.7p1:

Two types have compatible type if their types are the same. [Then other conditions under which they may also be compatible].


A perhaps more familiar example of the same technique is indexing into an array by bytes. If we have

int arr[100];
*(int *)((char *)arr + (17 * sizeof(int))) = 42;

then this is equivalent to arr[17] = 42;.
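As a minimal sketch of that equivalence (the helper name int_at_byte_index is hypothetical), the byte-scaled address can be compared against the ordinary subscript:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address element i of an int array using byte arithmetic only. */
static int *int_at_byte_index(int *arr, size_t i) {
  /* Scale the index by the element size manually, then convert back to int *. */
  return (int *)((char *)arr + i * sizeof(int));
}

/* Returns 1 if the byte-indexed write is visible through arr[17]. */
static int check_array_byte_index(void) {
  int arr[100] = {0};
  *int_at_byte_index(arr, 17) = 42;
  return arr[17] == 42 && int_at_byte_index(arr, 17) == &arr[17];
}
```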

This is how generic routines like qsort and bsearch are implemented. If we try to qsort an array of int, then within qsort all the pointer arithmetic is done in bytes, on pointers to character type with the offsets manually scaled by the object size passed as an argument (which here would be sizeof(int)). When qsort needs to compare two objects, it casts them to const void * and passes them as arguments to the comparator function, which casts them back to const int * to do the comparison.
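To make the pattern concrete, here is a simplified linear search written in the style of bsearch. It is only a sketch under that assumption, not how any particular C library implements qsort or bsearch; linear_find, cmp_int and check_linear_find are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic routine: linear search over n elements of `size` bytes each. */
static void *linear_find(const void *key, const void *base, size_t n, size_t size,
                         int (*cmp)(const void *, const void *)) {
  const char *bytes = base; /* all pointer arithmetic is done in bytes */
  for (size_t i = 0; i < n; i++) {
    const void *elem = bytes + i * size; /* offset scaled manually by size */
    if (cmp(key, elem) == 0) {
      return (void *)elem; /* drop const, as bsearch does */
    }
  }
  return NULL;
}

/* The comparator converts the void pointers back to the real element type. */
static int cmp_int(const void *a, const void *b) {
  int x = *(const int *)a;
  int y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Returns 1 if a present key is found at the right element and an absent key is not. */
static int check_linear_find(void) {
  int arr[] = {5, 3, 9, 7};
  int key = 9;
  int missing = 4;
  int *hit = linear_find(&key, arr, 4, sizeof arr[0], cmp_int);
  return hit == &arr[2] &&
         linear_find(&missing, arr, 4, sizeof arr[0], cmp_int) == NULL;
}
```

The generic routine never learns the element type; only the caller-supplied comparator converts back to const int *, which is the same division of labour described above.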

This all works fine and is clearly an intended feature of the language. So I think we needn't doubt that the use of offsetof in the current question is similarly an intended feature.

Barringer answered 12/11, 2021 at 0:31 Comment(9)
You cite a very important rule: When a pointer to an object is converted to a pointer to a character type ..., could you cite the rule about the reverse conversion, from a pointer to character type pointing to a byte to a pointer to some other type, guaranteeing that the pointer will point to the object of the specified type? Without this rule, offsetof is useless. - Traffic
@LanguageLawyer, not completely useless: offsetof could still be used by accessing objects via char* or by memcpy. However, its usefulness would be severely limited. - Carpo
@LanguageLawyer, as you said. The crucial part is whether the shifted char* can be cast to int* and dereferenced without invoking UB. - Carpo
I added a section to address this point. - Barringer
I'm not sure about the argument that if two pointers a and b are equal (i.e. a == b) then a can be replaced by b in other expressions. For example, assuming that there is no padding, the C standard guarantees that &x.a + 1 == &x.b. However, only &x.b can be dereferenced, while dereferencing &x.a + 1 invokes UB. - Carpo
Re “But ptr is the same pointer as (char *)&x.b”: I agree with @Carpo here: We know that ptr == (char *) &x.b, but that does not mean they are the same. E.g., in floating-point, -0. == +0., but signbit(-0.) != signbit(+0.). Theoretically, a pointer could be, say, a thousand bytes and contain not only a memory address but also provenance information. I expect any reasonable compiler would support using ptr to access x.b, but I cannot say a super strict reading of the C standard specifies it. - Cythera
Okay, well, it's the best I've got. I'm still convinced that the intent of the standard is that this is required to work. Unless someone can find a better proof, then maybe a defect report is in order. - Barringer
My understanding is that your argument is that if *(int*)ptr were not a valid designator of x.b then *(int*)(char*)&x.b would not be either. I must admit that the implication sounds absurd, especially since *(int*)(void*)&x.b is valid and void* has the same representation as char*. - Carpo
I guess that your answer is right. The only missing part is citing port70.net/~nsz/c/c11/n1570.html#6.5.9p6. It basically says that if two pointers are equal then exactly one of 4 conditions must be met: 1. Both NULL (not this case). 2. The same object. 3. One past the same object (not this case). 4. One past an object, the other the next object (not this case). So by elimination both (int*)ptr and &x.b must point to the same int object. - Carpo
C
3

I believe that this is perfectly legal; in fact, I've just encountered a similar technique used in a book I'm reading (not that it matters).

Here's why I think this is legal:

void *ptr = (char*)&x + offsetof(struct X, b);

First, &x yields a pointer to struct X. If we used that type for pointer arithmetic, each increment of &x by 1 would advance the address by sizeof(struct X) bytes. Since offsetof returns a distance in bytes from the beginning of the struct, we need to convert &x to a pointer to a byte-sized type, in this case char *. Since sizeof(char) is defined to be 1, increasing a char * by 1 advances exactly 1 byte. This is why character types are specifically called out in Section 6.5 (Expressions), paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: (footnote 88)

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

The result is now a pointer of type char * to the start of x.b. It is correctly aligned for int, so no undefined behavior is invoked here. Why? Because offsetof returns a distance in bytes from the beginning, and we have been doing byte-wise arithmetic on the pointer through the char * cast, so the result points at exactly the beginning of b.
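The difference in step size between the two pointer types can be seen in a short sketch. The helper names struct_step and char_step are hypothetical, and struct X mirrors the one in the question.

```c
#include <assert.h>
#include <stddef.h>

struct X {
  int a;
  int b;
};

/* Byte distance covered by adding 1 to a struct X *. */
static size_t struct_step(struct X *x) {
  /* x + 1 is a valid one-past-the-end pointer; it is only compared, never dereferenced. */
  return (size_t)((char *)(x + 1) - (char *)x);
}

/* Byte distance covered by adding 1 to a char * derived from the same address. */
static size_t char_step(struct X *x) {
  char *p = (char *)x;
  return (size_t)((p + 1) - p);
}
```

For a struct X x, struct_step(&x) equals sizeof(struct X) while char_step(&x) equals 1, which is why the byte offset from offsetof has to be added to a char * rather than to &x directly.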

Since we've reached the start of the object we want, we no longer need the result to have type char *. It is converted to the generic pointer type void * when stored in ptr, and later cast to int * before being dereferenced to give us access to x.b.

Since b is an int, and the final expression *(int*)ptr has type int, we are following the standard under the "a type compatible with the effective type of the object" clause above (or one of the other ones; please correct me if I'm wrong).

Collodion answered 13/11, 2021 at 3:25 Comment(0)
