Pointer difference across members of a struct?

The C99 standard states that:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object

Consider the following code:

struct test {
    int x[5];
    char something;
    short y[5];
};

...

struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);

This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.

But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?

My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?

My only suspicion is segmented memory architectures, where the members might end up in different segments. Is that the case?

I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.

EDIT:

As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.

Ames answered 3/11, 2014 at 9:54 Comment(5)
And in your example, pointer arithmetic on void is prohibited anyway. If the types aren't the same, how can you subtract them?Suspensory
You are correct that arithmetic on void pointers is not allowed by the standard and it is a GNU extension. Imagine that both pointers are char * then.Ames
gcc allows pointer arithmetic on void* by treating sizeof(void) as 1 (same as a char*). So for the purposes of your question, it makes no difference.Satiable
This is necessary to allow bounds-checking implementations, which is, afaik, the intent of the standard committee (or at least was for C89) to allow it. I believe an implementation checking bounds reliably can (must) catch this case (that is, it is UB, although it works in reality). Such an implementation would break a lot of existing code, though. And the standard is a little vague about its notion of an object which makes it hard to get an exact answer.Tercel
@mafso: In the absence of aliasing rules, every object could be viewed as a member of a union which contained a member of every type that could occupy the space. Given struct {int x, y;} foo;, if for some integer value n, the address of foo.y would equal ((int*)&foo)+n, then &foo.x and &foo.y would be the addresses of elements 0 and n of an array of integers that starts at address &foo. Unfortunately, the authors of the Standard ham-fistedly threw in aliasing rules that depend upon details of "objects" which they never defined because the language had no need for them.Vogue
L
3

Subtraction and relational operators (on type char*) between the addresses of members of the same struct are well defined.

Any object can be treated as an array of unsigned char.

Quoting N1570 6.2.6.1 paragraph 4:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

...

My only suspicion is segmented memory architectures, where the members might end up in different segments. Is that the case?

No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.

Last answered 3/11, 2014 at 16:33 Comment(8)
+1 I think this answer gets to the point better than mine does.Aviculture
I'm not convinced. If a pointer into an object could always be treated as a pointer into the outermost enclosing object, for example the struct hack would also be legal...Tercel
@Tercel Well, the standard's index does have an entry: struct hack, see flexible array member...Aviculture
@user694733: Flexible array members were added to the standard in 1999 as a replacement for the struct hack, which was of questionable legality.Last
@KeithThompson I know. I just had to. :)Aviculture
@mafso: It's too bad the Standard made zero-size arrays a constraint violation. If it had instead said that a zero-sized array within a struct inserts padding necessary to force the proper alignment and then yields the resulting address without allocating anything, that would have offered more useful semantics than flexible array members, but on most implementations the feature would cost nothing (just change the minimum array bound from 1 to 0). The struct hack could then be justified by saying that the maximum subscript for an array, in a large-enough allocation, is (len-(size_t)1).Vogue
@supercat: We have flexible array members instead. They work. Deal with it.Last
@KeithThompson: They only work for objects of allocated duration; there's no standard way to create a structure of static or automatic duration which is compatible with a function that expects a structure with an FAM.Vogue
S
2

Pointer arithmetic requires that the two pointers being subtracted point into the same object, because it doesn't make sense otherwise. The quoted section of the standard specifically refers to two unrelated objects such as int a[5]; and int b[5];. Pointer arithmetic also needs to know the type of the object that the pointers point to (I am sure you are aware of this already).

i.e.

int a[5];
int *p = &a[1]+1; 

Here p is calculated by knowing that &a[1] points to an int object, so the address is advanced by 4 bytes (assuming sizeof(int) is 4).

Coming to the struct example, I don't think it can possibly be defined in a way to make pointer arithmetic between struct members legal.

Let's take the example,

struct test {
    int x[5];
    char something;
    short y[5];
};

Pointer arithmetic on void pointers is not allowed by the C standard (compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc, which treats void* like char* and allows it. So,

printf("%td\n", q - p);

is equivalent to

printf("%td\n", (char*)q - (char*)p);

as pointer arithmetic is well defined if both pointers point within the same object and are character pointers (char* or unsigned char*).

Using correct types, it would be:

struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);

Now, how can q - p be performed? Based on sizeof(int) or sizeof(short)? And how would the size of char something;, which sits between the two arrays, be accounted for? (In fact, subtracting pointers to incompatible types is a constraint violation, so a conforming compiler must issue a diagnostic for this code.)

That should explain why it's not possible to perform pointer arithmetic on objects of different types.

Even if all members are of the same type (so the type issue above does not arise), it's better to use the standard macro offsetof (from <stddef.h>) to get the distance between struct members, which has a similar effect to pointer arithmetic between members:

printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));

So I see no necessity to define pointer arithmetic between struct members by the C standard.

Satiable answered 3/11, 2014 at 10:47 Comment(5)
Good answer, but you're forgetting that it is allowed when the pointers are both of type char* and point into the same object. Without that, it's impossible to define offsetof.Kiln
Of course. But I am not sure where I contradict that or imply?Satiable
You're not directly contradicting it, but it is an important "loophole" to mention, IMHO.Kiln
TBH, I struggled where I could logically include it other than as a disjoint fact after your comment. Edited. Thanks.Satiable
Pointer arithmetic and relational operators (< <= > >=) between pointers to distinct objects doesn't necessarily not make sense. The language could have made the result unspecified rather than undefined behavior, and required it to behave consistently (so that &x < &y && &y < &z implies &x < &z, and so forth). And on many systems, it actually does work that way. The standard made such operations undefined because they can be difficult to implement consistently on some architectures, and because that extra implementation effort wouldn't buy you anything particularly useful.Last
A
1

Yes, you are allowed to perform pointer arithmetic on the bytes of a structure:

N1570 - 6.3.2.3 Pointers p7:

... When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

This means that, for the programmer, the bytes of the structure shall be seen as a contiguous area, regardless of how it may have been implemented in hardware.

Not with void* pointers though; that is a non-standard compiler extension. As stated in the paragraph quoted from the standard, it applies only to pointers to character types.

Edit:

As mafso pointed out in the comments, the above is only true as long as ptrdiff_t, the type of the subtraction result, has enough range for the result. Since the range of size_t can be larger than that of ptrdiff_t, a big enough structure could have addresses that are too far apart.

Because of this, it's preferable to use the offsetof macro on the structure members and calculate the result from those.

Aviculture answered 3/11, 2014 at 10:27 Comment(16)
+1, I also believe that the word "elements" in the rule I quoted is used to distinguish between mere char * pointers and properly aligned pointers of the type that corresponds to the array elements.Ames
This answer seems to imply that padding could be considered an object which I highly doubt was intended.Saccharin
This answer misses the point. 1. If p + n compares equal to q, this does not imply that q - p is defined. 2. The important thing is, what's the object here (the whole structure or just the member). I tend to interpret the standard in a way saying the latter.Tercel
@ShafikYaghmour This clause from the standard seems only to cover the fact you can get valid pointer this way. Whether or not it's possible to dereference said pointer is covered elsewhere in the standard.Aviculture
@Tercel Are you saying that the result of (char*)&s + offsetof(s, s.y) == q - p is not necessarily true? The rules for pointer arithmetic are rather strict, and I cannot see how an implementation could do so without breaking the rules.Aviculture
ptrdiff_t is allowed to be way smaller than size_t (the standard doesn't say anything about their relation), so it is possible, that the difference is just undefined (if q-p is defined, the result is as expected). (I've asked a question about this some time ago.) For structure smaller than 2**15 this is always defined, and it's not really related to the problem here, my point was, that the implication "If p+n is defined, so is p+n-n" is not true (which is silently assumed in your post).Tercel
@Tercel You are right about possible ptrdiff_t overflow; I have edited the answer to accommodate that.Aviculture
Is a pointer to padding a valid pointer? Can you justify that claim?Saccharin
As I said, the ptrdiff_t issue isn't really related, it was just a counter-example to the wrong implication "p+n defined => p+n-n defined", which you still assume.Tercel
@ShafikYaghmour The paragraph I cited says that you will get pointers to the remaining bytes of the object, up to the size of the object. And sizeof(anyObject) equals the full size of an object, including padding. So to me it seems pretty clear that the pointer is valid in the sense that it is a valid address for pointer arithmetic, not in the sense that it would necessarily be safe to dereference.Aviculture
@Tercel What are you trying to say? If we keep within limits of our types, then p+n-n is well defined.Aviculture
@user694733: Sure it is well-defined (if within limits and p+n was valid (and actually used!)), but that's the actual point here (and defined in 6.5.6 p9). And it is only defined for pointers within the same array object. And that's the question here: What is the object? The member (a pointer to which is converted to char * and in which case taking the difference is UB) or the containing structure (what meets common interpretation, but is nowhere defined in the standard, as far as I know)?Tercel
@Tercel My interpretation is that the object is the containing structure. 3.15 describes an object as "region of data storage..." "...contents of which can represent values". A structure fits this description. Also, the original pointers to the arrays are obtained through the struct (with the syntax s.x). If we took a char* pointer to s, 6.5.6p7 says that a pointer to an object can be treated as a pointer to the first element of an array. 6.3.2.3p7 guarantees that, for the purposes of pointer arithmetic, this area can be considered contiguous. In other words, you can treat the memory region of the structure as a byte array. ...Aviculture
@Tercel ...So there seems to be no evidence that 6.5.6p8 and p9 couldn't be applied in this case. If x and y were separate arrays, this would of course not work, but since they are contained in the same struct, the previously mentioned paragraphs give the guarantee. The standard doesn't explicitly mention this case as being UB (which admittedly isn't much of a guarantee), so there is not really any evidence to prove otherwise. Sorry for the wall of text :)Aviculture
The notion of object is also interesting for aliasing rules, see e.g. this question about restrict. And, as I mentioned above, a bounds-checking implementation is the next problem (see e.g. here and here). Further discussion is probably better in chat.Tercel
@Tercel I only skimmed through your links, but from the answers and their comments it would seem that bounds-checking comes into play at the dereferencing stage. In any case, I'll read more on this, and you are right; further discussion should be in chat. Let's leave this for now. Luckily I haven't had, and don't have in the foreseeable future, any need to access structures other than the usual safe way, so I'm not in a hurry to solve this. :)Aviculture
S
1

I believe the answer to this question is simpler than it appears, the OP asks:

but why should that result be "undefined"?

Well, let's look at the definition of undefined behavior in the draft C99 standard, section 3.4.3:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

It is simply behavior for which the standard does not impose a requirement, which perfectly fits this situation: the results are going to vary depending on the architecture, and attempting to specify the results in a portable manner would probably have been difficult, if not impossible. This leaves the question: why would they choose undefined behavior as opposed to, let's say, implementation-defined or unspecified behavior?

Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created. This is consistent with the fact that we are provided with offsetof, which removes the one potential need for pointer subtraction of unrelated objects.

Although the standard does not really define the term invalid pointer, we get a good description in Rationale for International Standard—Programming Languages—C which in section 6.3.2.3 Pointers says (emphasis mine):

Implicit in the Standard is the notion of invalid pointers. In discussing pointers, the Standard typically refers to “a pointer to an object” or “a pointer to a function” or “a null pointer.” A special case in address arithmetic allows for a pointer to just past the end of an array. Any other pointer is invalid.

The C99 rationale further adds:

Regardless how an invalid pointer is created, any use of it yields undefined behavior. Even assignment, comparison with a null pointer constant, or comparison with itself, might on some systems result in an exception.

This strongly suggests to us that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object. The definition of object says:

region of data storage in the execution environment, the contents of which can represent values

and notes:

When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

I don't see how we can reason about the type or the value of the padding between members of a struct, so padding is not an object; or at least this strongly indicates that padding is not meant to be considered an object.

Saccharin answered 3/11, 2014 at 15:35 Comment(1)
I don't see how a pointer to padding could be an invalid pointer. Padding is not an object, but part of an object. After all, the standard guarantees that padding will exist; only its value is unspecified (6.2.6.1p1). See Keith Thompson's answer.Aviculture
A
0

I should point out the following:

from the C99 standard, section 6.7.2.1:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

It isn't so much that the result of pointer subtraction between members is undefined as that it is unreliable (i.e. not guaranteed to be the same between different instances of the same struct type when the same arithmetic is applied).

Ammoniate answered 3/11, 2014 at 10:31 Comment(0)
