Can an equality comparison of unrelated pointers evaluate to true?

Section 6.5.9 of the C standard regarding the == and != operators states the following:

2 One of the following shall hold:

  • both operands have arithmetic type;
  • both operands are pointers to qualified or unqualified versions of compatible types;
  • one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
  • one operand is a pointer and the other is a null pointer constant.

...

6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)

7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Footnote 109:

109) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.

This would seem to indicate you could do the following:

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

This should be legal since we are using an address one element past the end of an array (which in this case is a single object treated as an array of size 1) without dereferencing it. More importantly, one of these two statements would be required to output 1 if one variable immediately followed the other in memory.

However, testing didn't seem to pan this out. Given the following test program:

#include <stdio.h>

struct s {
    int a;
    int b;
};

int main()
{
    int a;
    int b;
    int *x = &a;
    int *y = &b;

    printf("sizeof(int)=%zu\n", sizeof(int));
    printf("&a=%p\n", (void *)&a);
    printf("&b=%p\n", (void *)&b);
    printf("x=%p\n", (void *)x);
    printf("y=%p\n", (void *)y);

    printf("addr: a precedes b: %d\n", ((&a)+1) == &b);
    printf("addr: b precedes a: %d\n", &a == ((&b)+1));
    printf("pntr: a precedes b: %d\n", (x+1) == y);
    printf("pntr: b precedes a: %d\n", x == (y+1));

    printf("  x=%p,   &a=%p\n", (void *)(x), (void *)(&a));
    printf("y+1=%p, &b+1=%p\n", (void *)(y+1), (void *)(&b+1));

    struct s s1;
    x=&s1.a;
    y=&s1.b;
    printf("addr: s.a precedes s.b: %d\n", ((&s1.a)+1) == &s1.b);
    printf("pntr: s.a precedes s.b: %d\n", (x+1) == y);
    return 0;
}

Compiler is gcc 4.8.5, system is CentOS 7.2 x64.

With -O0, I get the following output:

sizeof(int)=4
&a=0x7ffe9498183c
&b=0x7ffe94981838
x=0x7ffe9498183c
y=0x7ffe94981838
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 1
  x=0x7ffe9498183c,   &a=0x7ffe9498183c
y+1=0x7ffe9498183c, &b+1=0x7ffe9498183c
addr: s.a precedes s.b: 1

We can see here that an int is 4 bytes, that the address of a is 4 bytes past the address of b, and that x holds the address of a while y holds the address of b. However, the comparison &a == ((&b)+1) evaluates to false while the comparison x == (y+1) evaluates to true. I would expect both to be true, as the addresses being compared appear identical.

With -O1, I get this:

sizeof(int)=4
&a=0x7ffca96e30ec
&b=0x7ffca96e30e8
x=0x7ffca96e30ec
y=0x7ffca96e30e8
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 0
  x=0x7ffca96e30ec,   &a=0x7ffca96e30ec
y+1=0x7ffca96e30ec, &b+1=0x7ffca96e30ec
addr: s.a precedes s.b: 1
pntr: s.a precedes s.b: 1

Now both comparisons evaluate to false even though (as before) the addresses being compared appear to be the same.

This seems to point to undefined behavior, but based on how I read the above passage it seems this should be allowed.

Note also that the comparison of the addresses of adjacent objects of the same type in a struct prints the expected result in all cases.

Am I misreading something here regarding what is allowed (meaning this is UB), or is this version of gcc non-conforming in this case?

Knives answered 30/8, 2017 at 17:46 Comment(22)
Did you mean (&a + 1) == &b and (&b + 1) == &a?Raglan
@Raglan That's correct. The checks are to see if one element past the address of a is the same as the address of b, or if one element past the address of b is the same as the address of a. The extra parentheses in ((&a)+1) == &b aren't strictly needed.Knives
FWIW, my run produces addr: a precedes b: 0 addr: b precedes a: 1 - a match. (GNU C11 (GCC) version 5.4.0 (i686-pc-cygwin) compiled by GNU C version 5.4.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3). IMO, your compiler is non-compliant.Travesty
It would be interesting to see a binary dump of the values of &a, (&b)+1, not the void* converted values.Travesty
If I am reading the assembly for gcc 7.2 -O3 correctly, it seems that all the "something precedes something" instructions are all optimized away to 0 at compile time (lines 49 - 64 in the assembly).Cataclinal
Suggest trying with the latest gcc ref. Good luck!Travesty
@Groo Interesting. Does this suggest the check is UB?Knives
I guess so, I can't say I read it from the standard. For example this simplified code for -O0 uses lea rax, [rbp-8], add rax, 4, lea rdx, [rbp-4], after which it seems like rax should be equal to rdx. However with -O3 it simply optimized both checks away.Cataclinal
@chux I found the same with gcc 5.4.0, although I still see that 0 is printed for all lines with optimization greater than 0.Knives
FWIW: I changed the code to use arrays for a and b instead of "just" int. That way the code fulfills the last part of point 6 (i.e. "one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space"). So the pointers should compare equal in one of the two cases. However, I also found that at -O0 one comparison is true, as expected, while at -O1 (and higher) both comparisons are false. I'm not a language lawyer but this seems a compiler bug to me.Polymerization
FWIW: I also tried clang 3.8.0 and it always had one match, regardless of optimization level.Polymerization
As I read it "Can an equality comparison of unrelated pointers evaluate to true?" is certainly true for the case presented here. Yet the "Can an equality comparison of a pointer + 1 and its next higher adjacent pointer evaluate to false?" remains open. This question is not so much of can p+1 == q (when p,q are sequential) as much as must it be true?Travesty
@Knives BTW, thank-you for making a and b the same type. Considering variant types opens up many more rabbit holes.Travesty
@chux Clause 2 specifically disallows different types but allows different qualifiers, i.e. const int *, int *, register int *, etc.Knives
@chux I think the key phrase in addressing the question is "that happens to immediately follow the first array object in the address space" and what exactly this means.Knives
I do read the outcome false as a compiler bug in GCC, no matter how hard I read...Tellurium
See the extended discussion in gcc.gnu.org/bugzilla/show_bug.cgi?id=61502Inutility
@Knives a and b are not assigned. Does assigning them and using their values change the results? a=1; b = 2; .... return a + b;?Travesty
@chux I tried initializing them and printing the values, but no change.Knives
This is a duplicate of #36036282 (related: stackoverflow.com/questions/40809553) - I'll close as dupe unless you still have questions not covered by that thread?Radian
@Radian I don't think it qualifies as a dup. That question seems to be primarily discussing objects in different translation units and focuses on conversion to uintptr_t, while this one is regarding objects in the same translation unit. The answers there also don't reference the gcc bugzilla tickets discussing the issue.Knives
BTW, in both clang and gcc, comparisons involving adjacent objects may not only yield inconsistent results, but may also cause operations which are conditionally executed to yield results whose behavior isn't even consistent with that of each individual comparison arbitrarily yielding true or false.Floozy

Can an equality comparison of unrelated pointers evaluate to true?

Yes, but ...

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

There are, by my interpretation of the C standard, three possibilities:

  • a immediately precedes b
  • b immediately precedes a
  • neither a nor b immediately precedes the other (there could be a gap, or another object, between them)

I played around with this some time ago and concluded that GCC was performing an invalid optimization on the == operator for pointers, making it yield false even when the addresses are the same, so I submitted a bug report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611

That bug was closed as a duplicate of another report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502

The GCC maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent and that the comparison of their addresses might show them to be adjacent or not, within the same run of the program. As you can see from my comments on the second Bugzilla ticket, I strongly disagree. In my opinion, without consistent behavior of the == operator, the standard's requirements for adjacent objects are meaningless, and I think we have to assume that those words are not merely decorative.

Here's a simple test program:

#include <stdio.h>
int main(void) {
    int x;
    int y;
    printf("&x = %p\n&y = %p\n", (void*)&x, (void*)&y);
    if (&y == &x + 1) {
        puts("y immediately follows x");
    }
    else if (&x == &y + 1) {
        puts("x immediately follows y");
    }
    else {
        puts("x and y are not adjacent");
    }
}

When I compile it with GCC 6.2.0, the printed addresses of x and y differ by exactly 4 bytes at all optimization levels, but I get y immediately follows x only at -O0; at -O1, -O2, and -O3 I get x and y are not adjacent. I believe this is incorrect behavior, but apparently, it's not going to be fixed.

clang 3.8.1, in my opinion, behaves correctly, showing x immediately follows y at all optimization levels. Clang previously had a problem with this; I reported it:

https://bugs.llvm.org/show_bug.cgi?id=21327

and it was corrected.

I suggest not relying on comparisons of addresses of possibly adjacent objects behaving consistently.

(Note that relational operators (<, <=, >, >=) on pointers to unrelated objects have undefined behavior, but equality operators (==, !=) are generally required to behave consistently.)

Fell answered 30/8, 2017 at 22:45 Comment(13)
Interesting that a+1 == b may be true, yet a+1 >= b is UB per my reading of §6.5.9 6 and §6.5.8 5.Travesty
@Keith: "maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent" - if true, I would find this as yet another reason for the compiler not to optimize the expression away. In this case, compiler is acting like the adjacency is consistently false. Even the standard describes it as an "object that happens to immediately follow the first array", so it's completely unrelated to the fact whether it really follows it or not.Cataclinal
IMHO the standard committee brought just enough rope to hang themselves: paragraph 7 introduces this as-if rule (the same with operator +) but in paragraph 6 they explicitly restrict validity of the one-past-the-end rule to array objects - a wording that under the impression of paragraph 7 now should instead read "is a pointer to the next valid alignment past an object". Does gcc behave the same if you use int x[1]; int y[1]; and &y[0] == &x[0] + 1?Colophon
There are many situations where it is useful to guarantee that applying the equality operator to pointers will yield an equivalence relation. There are also some situations, however, where useful optimizations might be performed if there were a standard way to waive that guarantee with regard to comparisons involving a pointer to an object and a past-one pointer of a different, unrelated, object. The proper way to facilitate such optimizations would be to add a means of waiving the guarantee when it isn't needed.Floozy
@Vroomfondel: The Standard clearly specifies that if p and q both point at different objects, they will compare unequal; likewise, if both point just past different objects. The only defined situation where unrelated pointers would be allowed to compare equal is when one points at an object and the other points just past an unrelated object. If the Standard didn't allow that, compilers would be essentially required to add padding after every object. The only question is whether a comparison between two particular pointers might sometimes report them equal and sometimes not.Floozy
@Floozy um, no, the padding wouldn't be necessary as it could simply be defined as UB. IIRC the one-past-the-end rule was originally only defined for array objects, so a pointer past a non-array object was UB in previous standards. That made reasoning about comparison of non-array object pointers which do not point to objects, unnecessary in the first place. I can't come up with a convincing example but there must be one, otherwise the committee wouldn't have included the new wording. The unlucky thing is that they didn't adapt the wording in p.6 which I think is the reason for the gcc bug.Colophon
@Vroomfondel: Both the ability to process things of arbitrary type as sequences of character values, and the ability to create "just-past" pointers without UB have been essential features of the language since its inception, and nothing in any published rationale I've seen even hinted at a desire to remove such features. If sizeof q is 8, then code must be able to evaluate ((unsigned char*)&q)+8 without UB, regardless of the type of q.Floozy
@Vroomfondel: If an implementation is not going to regard two pointers as being "consistently" equal, it must ensure that their representations will never be observed to be equal. If an implementation stored pointers as a base address plus offset, that would naturally imply that a pointer just past the end of declared object o would have a different representation from a pointer to the start of an adjacent object (or any part of any object other than o, for that matter) even if such pointers were allowed to compare as equal.Floozy
@Floozy was the wording of paragraph 7 really in C90? I think we are talking past each other. All I'm saying is that this gcc bug could be explained by misreading p.6 & 7 due to the unclear formulation in 6, talking about array objects. p.7 says pointers to valid objects are like pointers to array objects. The nitpicking that is sometimes necessary in the standard can lead to the conclusion that the one-past rule of p.6 does not apply iff the object is not an array object, notwithstanding the exception from p.7.Colophon
@Vroomfondel: C was in wide use long before the publication of C89, and the authors of C89 explicitly stated that they wanted to avoid breaking existing code. They were far more interested in mandating behavior in cases where compilers might behave differently in the absence of a mandate, than in cases where all compilers had always behaved the same and they saw no reason to believe future compilers might do otherwise. If it would be essentially impossible for a compiler to yield Standard-mandated behavior in some cases without yielding useful behaviors in others...Floozy
...waste ink ordering compiler writers to yield the latter useful behaviors, since compiler writers would do so with or without such a mandate. Nothing in any published rationale I've seen suggests that such omissions were supposed to be taken as invitations for "clever" compilers to figure out how to yield the Standard-mandated behaviors without also yielding other useful behaviors other code might rely upon. To the contrary, the authors explicitly recognize that it would be possible to make an implementation that is simultaneously conforming but useless.Floozy
@supercat: Yes, yes, we know. You really should start a blog so you can just link to it rather than rewriting the same points again and again and again.Fell
@KeithThompson: I was responding to Vroomfondel, who seemed to think that the ability to use the just-past rule for non-array objects wasn't part of the language prior to C99. If just about every twentieth-century compiler treated it the same way, that would suggest to me it should have been viewed as part of the language whether or not the authors of the Standard saw fit to mention it explicitly.Floozy
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

is perfectly well-defined code, but probably more by luck than by judgement.

You are allowed to take the address of a scalar and set a pointer one past that address. So &a + 1 is valid, but &a + 2 is not. You are also allowed to compare the value of a pointer of the same type with the value of any other valid pointer using == and !=, although pointer arithmetic is only valid within arrays.

Your assertion that the addresses of a and b tell you anything about how these are placed in memory is bunk. To be clear, you cannot "reach" b by pointer arithmetic on the address of a.

As for

struct s {
    int a;
    int b;
};

The standard guarantees that the address of the struct is the same as the address of a, but an arbitrary amount of padding is allowed to be inserted between a and b. Again, you can't reach the address of b by any pointer arithmetic on the address of a.

Columba answered 30/8, 2017 at 18:50 Comment(7)
But what about point 6. in OP's question (6.5.9)? I.e. one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. Combined with point 7., it seems like it matches this situation.Cataclinal
@groo when defining int a; int b; nothing can be assumed about memory location of these two variables...Puzzlement
@4386427 Part 2 of section 6.5.9 states that the two pointers must be "pointers to qualified or unqualified versions of compatible types". A pointer to an int and a pointer to a float are not compatible, however exceptions are made for void * and NULL.Knives
@4386427 I suppose he meant "of the same type".Puzzlement
@Jean-BaptisteYunès: the question is whether it's defined behavior to compare them. In this case, they are next to each other, as you can see from their dumped addresses, so it's compiler which is assuming they aren't.Cataclinal
@Groo Yes, it is defined in this special case; I misread the full question. His compiler behaves strangely.Puzzlement
The first snippet contains unspecified behaviour (0 1, 1 0 and 0 0 are all permitted). Also this doesn't answer the question, and the results may tell you something about how the objects are placed in memory (a 1 would indicate that one object immediately follows the other).Radian

Can an equality comparison of unrelated pointers evaluate to true?

Yes. C specifies when this is true.

Two pointers compare equal if and only if ... or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. C11dr §6.5.9 6

To be clear: adjacent variables in code do not need to be adjacent in memory, yet can be.


The code below demonstrates that it is possible. It uses a memory dump of an int* in addition to the conventional "%p" and (void*).

Yet OP's code and output do not reflect this. Given the "compare equal if and only if" part of the above spec, IMO, OP's compilation is non-compliant. For variables p, q of the same type that are adjacent in memory, either &p+1 == &q or &p == &q+1 must be true.

No opinion if the objects differ in type - OP does not ask that IAC.


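#include <stdio.h>
#include <stdlib.h>

/* Print a pointer with %p, followed by the bytes of its object representation. */
void print_int_ptr(const char *prefix, int *p) {
  printf("%s %p", prefix, (void *) p);
  /* Overlay the pointer with a byte array so its representation can be read. */
  union {
    int *ip;
    unsigned char uc[sizeof (int*)];
  } u = {p};
  for (size_t i=0; i< sizeof u.uc; i++) {
    printf(" %02X", u.uc[i]);
  }
  printf("\n");
}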

int main(void) {
  int b = rand();
  int a = rand();
  printf("sizeof(int) = %zu\n", sizeof a);
  print_int_ptr("&a     =", &a);
  print_int_ptr("&a + 1 =", &a + 1);
  print_int_ptr("&b     =", &b);
  print_int_ptr("&b + 1 =", &b + 1);
  printf("&a + 1 == &b: %d\n", &a + 1 == &b);
  printf("&a == &b + 1: %d\n", &a == &b + 1);
  return a + b;
}

Output

sizeof(int) = 4
&a     = 0x28cc28 28 CC 28 00
&a + 1 = 0x28cc2c 2C CC 28 00  <-- same bit pattern
&b     = 0x28cc2c 2C CC 28 00  <-- same bit pattern
&b + 1 = 0x28cc30 30 CC 28 00
&a + 1 == &b: 1                <-- compare equal
&a == &b + 1: 0
Travesty answered 30/8, 2017 at 22:2 Comment(4)
This has been discussed a lot in the past, but gcc deliberately fails to comply with C11 because they deem that compliance in this case would pessimize the code (and I agree); it has been proposed for C2X that this changes to be unspecifiedRadian
@Radian Interesting part about gcc failing to comply. Any reference?Travesty
Not offhand, noRadian
I think the Standard would allow an implementation to behave as though unrelated objects are sometimes adjacent and sometimes not, if a pointer just past the end of one object has a different representation from a pointer to the start of an adjacent object in memory. On any commonplace hardware, however, the cost of ensuring the pointer representations are distinct would almost certainly exceed any optimization benefit achievable from such license.Floozy

The authors of the Standard weren't trying to make it "language-lawyer-proof", and as a consequence, it is somewhat ambiguous. Such ambiguity will not generally be a problem when compiler writers make a bona fide effort to uphold the Principle of Least Astonishment, since there is a clear non-astonishing behavior, and any other behavior would have astonishing consequences. On the other hand, it does mean those compiler writers who are more interested in whether optimizations can be justified under any reading of the Standard than in whether they will be compatible with existing code can find interesting opportunities to justify incompatibility.

The Standard doesn't require that pointers' representations bear any relationship to the underlying physical architecture. It would be perfectly legitimate for a system to represent each pointer as a combination of a handle and an offset. A system which represented pointers in such fashion would be free to move the objects represented thereby around in physical storage as it saw fit. On such a system, the first byte of object #57 might follow immediately after the last byte of object #23 at one moment in time, but might be at some completely unrelated location at some other moment. I see nothing in the Standard that would prohibit such an implementation from reporting a "just past" pointer for object #23 as equal to a pointer to object #57 when the two objects happened to be adjacent, and as unequal when they happened not to be.

Further, under the as-if rule, an implementation that would be justified in moving objects around in such fashion and having a quirky equality operator, as a result, would be allowed to have a quirky equality operator whether or not it physically moved objects around in storage.

If, however, an implementation specifies how pointers are stored in RAM, and such definition would be inconsistent with the behavior described above, that would compel the implementation to implement the equality operator in a fashion consistent with that specification. Any compiler that wants to have a quirky equality operator must refrain from specifying a pointer-storage format that would be inconsistent with such behavior.

Further, the Standard would seem to imply that if code observes that two pointers with defined values have identical representations, they must compare equal. Reading an object using a character type and then writing that same sequence of character-type values into another object should yield an object equivalent to the original; such equivalence is a fundamental feature of the language. If p is a pointer "just past" one object, and q is a pointer to another object, and their representations are copied to p2 and q2, respectively, then p2 must compare equal to p and q2 to q. If the decomposed character-type representations of p and q are equal, that would imply that q2 was written with the same sequence of character-type values as p2, which would, in turn, imply that all four pointers must be equal.

Consequently, while it would be allowable for a compiler to have quirky equality semantics for pointers which are never exposed to code that might observe their byte-level representation, such behavioral license would not extend to pointers which are thus exposed. If an implementation defines a directive or setting that invites compilers to have individual comparisons arbitrarily report equal or unequal when given pointers to the end of one object and the start of another whose placement would only be observable via such comparison, the implementation wouldn't have to worry about conformance in cases where pointer representations are observed. Otherwise, though, even if there are cases where conforming implementations would be allowed to have quirky comparison semantics, that doesn't mean any quality implementation should do so unless invited to, or unless a pointer just past the end of one object would naturally have a different representation from a pointer to the start of the next.

Floozy answered 21/9, 2017 at 23:28 Comment(0)
