Is it unspecified behavior to compare pointers to different arrays for equality?

3

18

The equality operators have the semantic restrictions of relational operators on pointers:

The == (equal to) and the != (not equal to) operators have the same semantic restrictions, conversions, and result type as the relational operators except for their lower precedence and truth-value result. [C++03 §5.10p2]

And the relational operators have a restriction on comparing pointers:

If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified. [§5.9p2]

Is this a semantic restriction which is "inherited" by equality operators?

Specifically, given:

int a[42];
int b[42];

It is clear that (a + 3) < (b + 3) is unspecified, but is (a + 3) == (b + 3) also unspecified?

Durwood answered 5/2, 2011 at 21:26 Comment(5)
Interesting question. If it was so, what about all the self-assignment tests if (this != &other)Syzran
Except for the opaque phrasing, it seems really quite simple: The standard fully specifies under which circumstances two pointers compare not-equal. Which pointer value (address) of two unrelated objects is larger however, is simply (and - IMHO - rather obviously) unspecified.Shoot
@Martin: What about a segmented architecture with near pointers having same offset, but for different segments? I don't think you'd want equality to be fully specified in this case, and the standard requires this comparison case to be well-formed (must compile, execute, etc.), near as I can tell.Durwood
the standard does require meaningful results, even in that case -- i.e., == must only be true if both of them are null pointers, or else refer to the same object (and the converse for !=, of course).Alvita
In segmented memory the == and != must compare the segment part of the pointer too.Lighterman
15

The semantics for op== and op!= explicitly say that the mapping holds except for their truth-value result. So you need to look at what is defined for their truth-value result. If the text says that the result is unspecified, then it is unspecified. If it defines specific rules, then it is not. It says in particular:

Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address
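
A minimal sketch of what that rule means for the arrays in the question, assuming C++11 or later (for nullptr). The names same_element and check_equality_cases are hypothetical, made up for this illustration only:

```cpp
#include <cassert>

// Hypothetical helper: true when p and q are both null or designate
// the same int object. Built-in == is fully specified for this.
bool same_element(const int* p, const int* q) {
    return p == q;
}

// Walks through the cases on two unrelated arrays.
void check_equality_cases() {
    int a[42] = {};
    int b[42] = {};
    const int* null1 = nullptr;
    const int* null2 = nullptr;

    assert(!same_element(a + 3, b + 3));  // different objects: not equal
    assert(same_element(a + 3, &a[3]));   // same object: equal
    assert(same_element(null1, null2));   // both null: equal
    assert(!same_element(a + 3, null1));  // null vs. non-null: not equal
}
```

Neither pointer here is a one-past-the-end pointer, so the special case discussed in the comments on the other answer does not come into play.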

Snodgrass answered 5/2, 2011 at 21:31 Comment(3)
I was not reading it as the unspecified-ness of the result being included in "except for their truth-value result", since that would seem to negate "have the same semantic restrictions". I'm not sure this answer is the best way to interpret or not, but it would resolve this question.Durwood
+1 : I totally agree about the interpretation of except for their truth-value result. It is horrible standardese though :-)Shoot
The quoted text has been superseded by DR1652Ustkamenogorsk
13

The equality operators (== and !=) produce specified results as long as the pointers are to objects of the same type. Given two pointers to the same type, exactly one of the following is true:

  1. both are null pointers, and they compare equal to each other.
  2. both are pointers to the same object, and they compare equal to each other.
  3. they are pointers to different objects, and they compare not-equal to each other.
  4. at least one is not initialized, and the result of the comparison is not defined (and, in fact, the comparison itself may never happen--just trying to read the pointer to do the comparison gives undefined behavior).

Under the same constraints (both pointers are to the same type of object), the result from the ordering operators (<, <=, >, >=) is only specified if both of them are pointers to the same object, or to separate objects in the same array (and for this purpose, a "chunk" of memory allocated with malloc, new, etc., qualifies as an array). If the pointers refer to separate objects that are not part of the same array, the result is unspecified. If one or both of the pointers have not been initialized, you have undefined behavior.

Despite that, the comparison templates in the standard library (std::less, std::greater, std::less_equal and std::greater_equal) all yield a meaningful result, even when the built-in operators do not. In particular, they are required to yield a total ordering. As such, you can get an ordering if you want it, just not with the built-in comparison operators (though, of course, if either or both of the pointers are uninitialized, the behavior is still undefined).
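
As a rough sketch of that last point, assuming C++11 or later, here is one way to order pointers into unrelated arrays without touching built-in <. The function names sorted_by_address and exactly_one_less are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sorts pointers that may refer to unrelated objects. std::less on a
// pointer type is required to give a strict total order, even where
// the built-in < would be unspecified.
std::vector<const int*> sorted_by_address(std::vector<const int*> ptrs) {
    std::sort(ptrs.begin(), ptrs.end(), std::less<const int*>());
    return ptrs;
}

// In a total order, two distinct pointers are ordered one way or the
// other, and a pointer is never ordered before itself.
bool exactly_one_less(const int* p, const int* q) {
    std::less<const int*> before;
    return before(p, q) != before(q, p);
}
```

Which of a + 3 and b + 3 comes first is still up to the implementation; the guarantee is only that the answer is consistent.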

Alvita answered 5/2, 2011 at 21:52 Comment(9)
The relational operators are defined, but unspecified in the given case. None of the comparison operators (when used on pointers) are undefined. I'm asking about == and not <, and std::equal_to says it uses == without being included in the special allowance for std::less, etc.Durwood
@Fred -- the operators are defined, but the results are not. I guess I could try to re-word that to be a bit more clear, but (IMO) what I've said is already easier to understand than the wording in the standard.Alvita
The results are unspecified, which is distinctly defined differently from undefined (as in undefined behavior). A terminological nitpick, perhaps, but important since UB has severe implications; and one which I found surprising, since I had also thought they were UB.Durwood
At least IMO, saying a result is not defined is different from saying that the code has undefined behavior, but I've done a bit of editing to ensure against that misunderstanding.Alvita
I think you are using "a result is not defined" to mean exactly what the standard means with "unspecified", which, in contrast to "implementation-defined", doesn't require documentation and thus doesn't require consistency. (I base the lack of consistency on behavior that's undocumented can always include non-obvious and unspecified factors affecting it.) We both definitely agree it's different from undefined, but if you do mean what the standard does with "unspecified", then I think it's best to use that word when talking standardese.Durwood
I would like to point out that confusingly for C, comparing "unrelated" pointers gives rise to undefined behavior (6.5.8.5).Agio
Your case 3 has a special case that should be mentioned: comparison of one-past-the-end pointer of one object, to the start pointer of another object. In the C++11 original text it actually said this case should compare equal; however DR 1652 changed/clarified it to be unspecifiedUstkamenogorsk
Late comment, I know – but 'exactly one of the following is true' lacks yet a case: one pointer null pointer, the other one pointing to a valid object... Would a (dangling) pointer to a destructed object count as 'uninitialised'???Clavicle
@Aconcagua: You're right, there is another case there. Comparing a null pointer to a non-null pointer should always compare as not equal. After you've destroyed the pointee object, attempting to use the pointer in any way is pretty much like it's uninitialized (comparison can fail completely), but in practice that's now so rare it hardly matters (the usual case it would fail would be on a segmented architecture, which can fail completely once a segment ceases to exist).Alvita
6

Since there's confusion on conformance semantics, these are the rules for C++. C uses a completely different conformance model.

  1. Undefined behaviour is an oxymoronic term: it means the translator, NOT your program, may do as it pleases. This generally means it can generate code which will also do anything it pleases (but that is a deduction). Where the Standard says behaviour is undefined, the text is actually of no significance to the user, in the sense that eliding this text will not change the requirements the Standard imposes on translators.

  2. Ill-formed program means that, unless otherwise specified, the behaviour of the translator is rigidly defined: it is required to reject your program and issue a diagnostic message. The primary special case here is the One-Definition Rule: if you breach that, your program is ill-formed but no diagnostic is required.

  3. Implementation defined imposes a requirement on the translator that it contain documentation specifying the behaviour explicitly. In this special case Undefined Behaviour can be the result but must be explicitly stated.

  4. Unspecified is a stupid term which means that the behaviour comes from a set. In this sense well-defined is just a special case where the set of permitted behaviours contains only one element. Unspecified does not require documentation, so in some sense it also means the same as implementation defined without documentation.

In general, the C++ Standard is not a Language Standard, it is a model for a language Standard. To generate an actual Standard you have to plug in various parameters. The easiest of these to recognize are the implementation defined limits.

There are a couple of silly conflicts in the Standard. For example, a legitimate translator can reject every apparently good C++ program on the basis that you are required to supply a main() function but the translator only supports identifiers of 1 character. This problem is resolved by the notion of QOI, or Quality of Implementation. It basically says: who cares, no one is going to buy that compiler just because it is conforming.

Technically, the unspecified nature of operator < when the pointers are to unrelated objects is probably intended to mean: you will get some kind of result, either true or false, but your program will not crash. However, this is not the correct meaning of unspecified, so that is a Defect: unspecified imposes a burden on the Standard's writers to document the set of allowed behaviours, because if the set is open, it is equivalent to undefined behaviour.

I actually proposed std::less as a solution to the problem that some data structures require keys to be totally ordered, but pointers are not totally ordered by operator <. On most machines using linear addressing less is the same as <, but the less operation on, say, an x86 processor is potentially more expensive.
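
To illustrate the kind of data structure meant here, a small sketch, assuming nothing beyond the standard library: std::set uses std::less<Key> as its default comparator, so pointers to unrelated objects can serve as keys. The function count_distinct is a hypothetical name for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts how many distinct addresses appear in [first, last).
// The set orders its keys with std::less<const int*>, its default
// comparator, so the pointed-to ints may live in unrelated arrays.
std::size_t count_distinct(const int* const* first, const int* const* last) {
    // first and last delimit one array of pointers, so built-in <=
    // is specified for this particular check.
    assert(first <= last);
    std::set<const int*> seen(first, last);
    return seen.size();
}
```

For example, given int a[42] and int b[42], the sequence {a + 3, b + 3, a + 3, b} contains three distinct addresses.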

Sheilasheilah answered 6/2, 2011 at 2:59 Comment(3)
What term would you prefer to describe the situation where there is no guarantee as to the value an expression will produce, but code may nonetheless legitimately evaluate the expression if it is prepared for any value that may result? For example, suppose f(x), f(y), and f(z) should ideally all yield the same value, but at most one of x, y, or z may be corrupt. If operations on a corrupt value yield indeterminate result, one could (absent side-effects) safely say temp = f(x); return temp==f(y) ? temp : f(z); even if one couldn't determine whether x, y, or z was corrupt.Hemihedral
If invoking f() on a corrupt value were not guaranteed safe, then if one didn't have some other means of telling whether x, y, or z was corrupt one couldn't handle corruption. On the other hand, if the invocation is safe, then even though one wouldn't know when storing temp whether the value was good or not, one would nonetheless be able to determine that later.Hemihedral
The implementation is not required to reject an ill-formed program. It must generate a diagnostic but it could carry on to generate a binary or do anything else.Ustkamenogorsk
