How does pointer comparison work in C? Is it ok to compare pointers that don't point to the same array?

In K&R (The C Programming Language 2nd Edition) chapter 5 I read the following:

First, pointers may be compared under certain circumstances. If p and q point to members of the same array, then relations like ==, !=, <, >=, etc. work properly.

Which seems to imply that only pointers pointing to the same array can be compared.

However when I tried this code

    char t = 't';
    char *pt = &t;
    char x = 'x';
    char *px = &x;

    printf("%d\n", pt > px);

1 is printed to the screen.

First of all, I thought I would get undefined behavior or some type of error, because pt and px aren't pointing to the same array (at least in my understanding).

Also, is pt > px true because both pointers point to variables stored on the stack, and the stack grows down, so the memory address of t is greater than that of x?

I get more confused when malloc is brought in. Also in K&R in chapter 8.7 the following is written:

There is still one assumption, however, that pointers to different blocks returned by sbrk can be meaningfully compared. This is not guaranteed by the standard which permits pointer comparisons only within an array. Thus this version of malloc is portable only among machines for which the general pointer comparison is meaningful.

I had no issue comparing pointers that pointed to space malloced on the heap to pointers that pointed to stack variables.

For example, the following code worked fine, with 1 being printed:

    char t = 't';
    char *pt = &t;
    char *px = malloc(10);
    strcpy(px, pt);
    printf("%d\n", pt > px);

Based on my experiments with my compiler, I'm being led to think that any pointer can be compared with any other pointer, regardless of where they individually point. Moreover, I think pointer arithmetic between two pointers is fine, no matter where they individually point because the arithmetic is just using the memory addresses the pointers store.

Still, I am confused by what I am reading in K&R.

The reason I'm asking is because my prof. actually made it an exam question. He gave the following code:

    struct A {
        char *p0;
        char *p1;
    };

    int main(int argc, char **argv) {
        char a = 0;
        char *b = "W";
        char c[] = { 'L', 'O', 'L', 0 };

        struct A p[3];
        p[0].p0 = &a;
        p[1].p0 = b;
        p[2].p0 = c;

        for (int i = 0; i < 3; i++) {
            p[i].p1 = malloc(10);
            strcpy(p[i].p1, p[i].p0);
        }
    }

What do these evaluate to:

  1. p[0].p0 < p[0].p1
  2. p[1].p0 < p[1].p1
  3. p[2].p0 < p[2].p1

The answer is 0, 1, and 0.

(My professor does include the disclaimer on the exam that the questions are for a Ubuntu Linux 16.04, 64-bit version programming environment)

(editor's note: if SO allowed more tags, that last part would warrant additional OS- and architecture-specific tags, assuming the point of the question / class was specifically low-level OS implementation details rather than portable C.)

Toback answered 29/12, 2019 at 2:1 Comment(21)
You are maybe confusing what is valid in C with what is safe in C. Comparing two pointers to the same type can always be done (checking for equality, for example) however, using pointer arithmetic and comparing > and < is only safe when used within a given array (or memory block).Divorcement
@AdrianMole So basically using pointer arithmetic comparing > and < can be done, but it's not safe right?Toback
Moreover, wouldn't strcpy(px, pt); invoke UB, as pt is not a NUL terminated string?Verdure
As an aside, you should not be learning C from K&R. For a start, the language has been through a lot of changes since then. And, to be honest, the example code in there was from a time when terseness rather than readability was valued.Adhibit
No, it isn't guaranteed to work. It can fail in practice on machines with segmented memory models. See Does C have an equivalent of std::less from C++? On most modern machines, it will happen to work despite UB.Modicum
Your confusion is this: "I thought I would get undefined or some type or error". You would in Java or Python. C doesn't check assumptions for you. Your code isn't valid, but that just means you get undefined behavior. It doesn't mean you'll see an error. In C it's up to you to make sure your code is valid. Comparing pointers within an array does what you think it does. Comparing other pointers will still give you an answer, but the language doesn't define what that answer will mean (if anything).Nubile
@Adam: Close, but this is actually UB (unless the compiler the OP was using, GCC, does choose to define it. It might). But UB doesn't mean "definitely explodes"; one of the possible behaviours for UB is working the way you expected!! This is what makes UB so nasty; it can work right in a debug build and fail with optimization enabled, or vice versa, or break depending on surrounding code. Comparing other pointers will still give you an answer, but the language doesn't define what that answer will mean (if anything). No, crashing is allowed. It's truly UB.Modicum
@Shisui: It would be more accurate to say that pointer comparisons outside the same memory block (which may or may not be thought of as an array) simply won't return a meaningful answer, at least on systems with virtual memory.Travel
@PeterCordes that's what I said, it's UB. OP expects it to throw an error of some kind like you would get in other languages. My point is that C doesn't do that.Nubile
@Adam: Oh yes, nevermind the first part of my comment, I misread yours. But you claim Comparing other pointers will still give you an answer. That's not true. That would be an unspecified result, not full UB. UB is much worse and means your program could segfault or SIGILL if execution reaches that statement with those inputs (at any point before or after that actually happens). (Only plausible on x86-64 if the UB is visible at compile time, but in general anything can happen.) Part of the point of UB is to let the compiler make "unsafe" assumptions while generating asm.Modicum
@Adam: If you want an unspecified result, cast to intptr_t or uintptr_t before comparing. On a machine with a flat memory model, that will actually do what you want.Modicum
@PeterCordes: Nothing in the Standard would say anything meaningful about the result of such comparisons. Indeed, given int p[3],*q=p+1,*r=p+2; uintptr_t p1=(uintptr_t)p, p2=(uintptr_t)p, q1=(uintptr_t)q, r1=(uintptr_t)r;, the resulting uintptr_t values could be ranked p1 < q1 < r1 < p2, and such a thing could plausibly happen on a 32-bit segmented-mode compiler for the 80386 even though pointer comparisons among p, q, and r would be transitively ranked.Lilla
Are you sure the questions didn't dereference the pointers? UB aside, the code as-is compares pointers to locals with pointers returned by malloc, which you can't predict without knowing the internal state of the memory allocator at the time of execution. Maybe it's a trick question requiring knowledge of local vs heap regions on Ubuntu, and the strings themselves are just a red herring?Calicle
@PeterCordes read the second part of the sentence you quoted. I never said the answer is "correct" or even "useful". But there will be a value returned, for the reasons you explained. We agree. I sense OP has assumptions from a non-C background and that all they need is to have those assumptions pointed out and explained.Nubile
Can such a comparison be coded? Yes, certainly. I doubt there's a compiler or runtime system in existence that would throw an error about this. What does the result mean, though? Is it useful to know that pt contains an address that is "greater" than px? Maybe, but most often, maybe not.Illimani
@Adam: I think it's an important point to make for someone new to C that UB doesn't just mean the value might be wrong, it can corrupt other things and even lead to crashes. Whether or not that's likely for GCC targeting x86-64, I thought we were talking about C in general. In a program that encounters UB, it doesn't make sense to say that a value is returned at all. If what you said was true, if(p1<p2){} could safely exist in a portable program; the only diff would be whether the empty if body executed. But UB means your whole program could crash, or literally anything.Modicum
@Adam: Like I said earlier, if you want to portably definitely get a result that might be meaningless, cast pointers to intptr_t. Comparing intptr_t is always safe (assuming pointers don't convert to trap representations). In most implementations it will be the same as actual pointer comparison, but in the ISO C abstract machine it avoids UB.Modicum
@PeterCordes: The result using intptr_t or uintptr_t may be meaningless even in cases where the Standard would have defined the behavior of a pointer comparison. Are you aware of any commercially-designed compilers which aggressively interpret the Standard as an excuse to process actions whose natural platform behavior would have no side effects in such a way as to arbitrarily disrupt the behavior of surrounding code, or is such behavior unique to clang, gcc, or compilers derived from them?Lilla
@Peter Cordes: Or you could have if (p1 < p2) { printf(".lt.\n"); } else { printf(".ge.\n"); }, the only problem being that you would potentially get different results every time you ran the program. Undefined behavior, but not dangerous.Travel
@supercat: no, I'm not, and I don't think in practice even GCC / clang break this. In practice comparing pointers is probably better, and we can say that quality implementation mostly do define the behaviour. My replies to \@Nubile were just making the point that UB can in theory / in general be much worse than an undefined result. I'm not saying that's going to actually be the case for pointer comparison on a compiler anyone wants to use.Modicum
@PeterCordes: Clang and gcc will sometimes break code which observes that a pointer "just past" one object is coincidentally equal to another object. Given e.g. extern int x[],y[]; void test(int i) { y[0] = 1; int *p = y+i; if (p == x+10) *p=2; return y[0];}, clang will replace the write to *p with a write to x[10], but then ignore the possibility that the resulting access to x[10] (which in source code was an access to y[i]!) might affect y[0].Lilla

According to the C11 standard, the relational operators <, <=, >, and >= may only be used on pointers to elements of the same array or struct object. This is spelled out in section 6.5.8p5:

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

Note that any comparisons that do not satisfy this requirement invoke undefined behavior, meaning (among other things) that you can't depend on the results to be repeatable.

In your particular case, for both the comparison between the addresses of two local variables and the comparison between the address of a local and a dynamic address, the operation appeared to "work"; however, the result could change by making a seemingly unrelated change to your code, or even by compiling the same code with different optimization settings. With undefined behavior, just because the code could crash or generate an error doesn't mean it will.

As an example, an x86 processor running in 8086 real mode has a segmented memory model that builds a 20-bit address from a 16-bit segment and a 16-bit offset. In that case an address doesn't convert exactly to a single integer.

The equality operators == and != however do not have this restriction. They can be used between any two pointers to compatible types or NULL pointers. So using == or != in both of your examples would produce valid C code.

However, even with == and != you could get some unexpected yet still well-defined results. See Can an equality comparison of unrelated pointers evaluate to true? for more details on this.

Regarding the exam question given by your professor, it makes a number of flawed assumptions:

  • A flat memory model exists where there is a 1-to-1 correspondence between an address and an integer value.
  • That the converted pointer values fit inside an integer type.
  • That the implementation simply treats pointers as integers when performing comparisons without exploiting the freedom given by undefined behavior.
  • That a stack is used and that local variables are stored there.
  • That a heap is used to pull allocated memory from.
  • That the stack (and therefore local variables) appears at a higher address than the heap (and therefore allocated objects).
  • That string constants appear at a lower address than the heap.

If you were to run this code on an architecture and/or with a compiler that does not satisfy these assumptions then you could get very different results.

Both examples also exhibit undefined behavior when they call strcpy, since the right operand (in some cases) points to a single character and not a null-terminated string, causing the function to read past the bounds of the given variable.

Billiebilling answered 29/12, 2019 at 2:12 Comment(16)
My professor does include the disclaimer on the exam that the questions are for a Ubuntu Linux 16.04, 64-bit version programming environment, so I guess that covers the list of assumptions you listed? I know that it's a linear memory address soToback
@Toback Even given that, you still shouldn't depend on the results. Compilers can get very aggressive when it comes to optimization and will use undefined behavior as an opportunity to do so. It's possible that using a different compiler and/or different optimization settings can generate different output.Billiebilling
@Shisui: It will in general happen to work on machines with a flat memory model, such as x86-64. Some compilers for such systems might even define the behaviour in their documentation. But if not, then "insane" behaviour can happen because of compile-time-visible UB. (In practice I don't think anyone wants that so it's not something mainstream compilers look for and "try to break".)Modicum
Like if a compiler sees that one path of execution would lead to < between malloc result and a local variable (automatic storage, i.e. stack), it could assume that path of execution is never taken and just compile the whole function to a ud2 instruction (raises an illegal-instruction exception which the kernel will handle by delivering a SIGILL to the process). GCC / clang do this in practice for other kinds of UB, like falling off the end of a non-void function. godbolt.org is down right now it seems, but try copy/pasting int foo(){int x=2;} and note the lack of a retModicum
@Shisui: TL:DR: it's not portable C, despite the fact that it happens to work fine on x86-64 Linux. Making assumptions about the results of the comparison is just crazy, though. If you aren't in the main thread, your thread stack will have been dynamically allocated using the same mechanism malloc uses to get more memory from the OS, so there's no reason to assume that your local vars (thread stack) are above malloc dynamically-allocated storage.Modicum
@dbush: re: how pointer comparison (might) work on a segmented machine like x86 real mode with 32-bit seg:off pointers able to represent the same linear address in multiple possible ways: see the thought experiments in Does C have an equivalent of std::less from C++?.Modicum
Also see open-std.org/jtc1/sc22/wg14/www/docs/… - ISO C has considered changing it to possibly implementation-defined. (Working on collecting these comments into a proper answer.)Modicum
@PeterCordes That's a really interesting read. It's good to know the standard committee is considering some "common sense" semantics regarding pointer comparisons in architectures where it makes sense to allow it.Billiebilling
@PeterCordes: What's needed is to recognize various aspects of behavior as "optionally defined", such that implementations may define them or not, at their leisure, but must indicate in testable fashion (e.g. predefined macro) if they do not do so. Additionally, instead of characterizing that any situation where the effects of an optimization would be observable as "Undefined Behavior", it would be far more useful to say that optimizers may regard certain aspects of behavior as "non-observable" if they indicate that they do so. For example, given int x,y;, an implementation...Lilla
...that stated that it regarded the relative placement of x and y as non-observable would be allowed to evaluate x<y or (&x)+1 == &y as arbitrarily yielding 0 or 1, without any requirement for consistency, but any particular invocation of that expression would be required to yields 0 or yields 1, with no side effects in either case. Programs needing stronger semantics could then refuse to run on implementations configured to behave in such fashion, while having defined behavior on platforms configured for the required semantics. Unfortunately, clang and gcc would...Lilla
...be required to reject a lot of programs unless the back-ends are substantially reworked to offer more semantic guarantees than their optimizers are designed to accommodate; they would no longer be able to claim that any programs they don't support are "broken".Lilla
@PeterCordes: "Falling off the end" of a non-void function is not UB at all. Only use of the return value by the caller is UB.Point
@R..: Oh, do C and C++ differ on that? I notice that GCC and clang only optimize away the ret in C++ mode, not with -xcModicum
@PeterCordes: Yes, I believe they do.Point
@R..: ok, then my example of how compilers handle some kinds of UB in practice only works when compiled as C++. oops. But other than that, it works very well to illustrate just how "broken" the generated asm can be.Modicum
@R..: On many platforms, an implementation could at no cost allow a wrapper function to handle interchangeably the cases where the called function returns a value which should be given to the wrapper's caller, or where the wrapper's return value is ignored; unfortunately, the Standard provides no means of identifying implementations that can uphold that guarantee.Lilla

The primary issue with comparing pointers into two distinct arrays of the same type is that the arrays themselves need not be placed in any particular relative position: either one could end up before the other.

First of all, I thought I would get undefined behavior or some type of error, because pt and px aren't pointing to the same array (at least in my understanding).

No: the behavior is undefined, and in practice the result depends on the implementation and other unpredictable factors.

Also is pt>px because both pointers are pointing to variables stored on the stack, and the stack grows down, so the memory address of t is greater than that of x? Which is why pt>px is true?

There isn't necessarily a stack. When one exists, it need not grow down. It could grow up. It could even be non-contiguous in some bizarre way.

Moreover, I think pointer arithmetic between two pointers is fine, no matter where they individually point because the arithmetic is just using the memory addresses the pointers store.

Let's look at the C specification, §6.5.8 on page 85 which discusses relational operators (i.e. the comparison operators you're using). Note that this does not apply to direct != or == comparison.

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. ... If the objects pointed to are members of the same aggregate object, ... pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values.

In all other cases, the behavior is undefined.

The last sentence is important. While I cut down some unrelated cases to save space, there's one case that's important to us: two arrays, not part of the same struct/aggregate object1, and we're comparing pointers to those two arrays. This is undefined behavior.

Here, your compiler just inserted some sort of CMP (compare) machine instruction that numerically compares the pointers, and you got lucky. But UB is a pretty dangerous beast: literally anything can happen. Your compiler could optimize out the whole function, including visible side effects. It could spawn nasal demons.

1Pointers into two different arrays that are part of the same struct can be compared, since this falls under the clause where the two arrays are part of the same aggregate object (the struct).

Decry answered 29/12, 2019 at 2:12 Comment(2)
More importantly, with t and x being defined in the same function, there's zero reason to assume anything about how a compiler targeting x86-64 will lay out locals in the stack frame for this function. The stack growing downward has nothing to do with declaration order of variables in one function. Even in separate functions, if one could inline into the other then the "child" function's locals could still mix with the parent's.Modicum
your compiler could optimize out the whole function including visible side effects Not an overstatement: for other kinds of UB (like falling off the end of a non-void function) g++ and clang++ really do that in practice: godbolt.org/z/g5vesB they assume that path of execution is not taken because it leads to UB, and compile any such basic blocks to an illegal instruction. Or to no instructions at all, just silently falling through to whatever asm is next if that function ever were called. (For some reason gcc doesn't do this, only g++).Modicum

You were then asked what

p[0].p0 < p[0].p1
p[1].p0 < p[1].p1
p[2].p0 < p[2].p1

evaluate to. The answer is 0, 1, and 0.

These questions reduce to:

  1. Is the heap above or below the stack?
  2. Is the heap above or below the string-literal section of the program?
  3. Same as [1].

And the answer to all three is "implementation defined". Your prof's questions are bogus; they are based on the traditional unix layout:

<empty>
text
rodata
rwdata
bss
<empty, used for heap>
...
stack
kernel

but several modern unices (and alternative systems) do not conform to those traditions. Unless they prefaced the question with "as of 1992", make sure to give a -1 on the eval.

Economizer answered 29/12, 2019 at 7:29 Comment(10)
Not implementation defined, undefined! Think of it this way, the former may vary between implementations but the implementations should document how behaviour is decided. The latter means behaviour may vary in any manner and the implementation doesn't have to tell you squat :-)Adhibit
@paxdiablo: According to Rationale by the authors of the Standard, "Undefined behavior... also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The Rationale further says "The goal is to give the programmer a fighting chance to make powerful C programs that are also highly portable, without seeming to demean perfectly useful C programs that happen not to be portable, thus the adverb strictly." Commercial compiler writers understand this, but some other compiler writers don't.Lilla
There is another implementation-defined aspect: pointer comparison is signed, so depending upon the machine / os / compiler, some addresses may be interpreted as negative. For example, a 32-bit machine that placed the stack at 0xc<<28 would likely show the automatic variables at a lesser address than the heap or rodata.Economizer
@mevets: Does the Standard specify any situation in which the signedness of pointers in comparisons would be observable? I would expect that if a 16-bit platform allows objects greater than 32768 bytes, and arr[] is such an object,, the Standard would mandate that arr+32768 compare greater than arr even if a signed pointer comparison would report otherwise.Lilla
I don't know; the C standard is orbiting in Dante's ninth circle, praying for euthanasia. The OP specifically referenced K&R and an exam question. #UB is debris from a lazy working group.Economizer
@mevets: I think the root cause for UB posing the problems that it does is that the Standard is self-referentially contradictory in situations where parts of the Standard and an implementation's documentation specify the behavior in some circumstance, but some other part X of the Standard characterizes a more general encompassing action as undefined. If there were no difference in emphasis between X saying an action is undefined, and X saying nothing about it, then in the above situation the action should be expected to behave as described despite X, but that's not quite what's intended.Lilla
@mevets: The authors of clang and gcc interpret the Standard as given X absolute priority over everything else, but that's even further from intent. If the authors of the Standard had explicitly said that when such contradictions arise, quality implementations should behave as described when practical, and clearly document any situations in which they will deviate from such behavior, that would have made clear that low-quality implementations could behave as clang and gcc do without being non-conforming, but also made clear the Standard did not endorse such behavior.Lilla
@mevets: Better yet would be if the Standard made clear that implementations should behave as described in ways and cases that matter. Conforming implementations are deliberately allowed to deviate in ways or cases that wouldn't matter to their users, and consequently allowed to deviate even in ways that would. Compiler writers were expected to know better than the Committee about what cases would matter to their users, and the Committee expected the marketplace to resolve such issues (rather than being undermined by compilers whose maintainers are hostile to user complaints).Lilla
alt.standards.advocacy.yawn.Economizer
@mevets: Sorry--I should have said that failure to explicitly state things that should be obvious to any reasonable readers (the failing of the C89 group) doesn't really constitute "laziness" in my book, and I don't think laziness is the problem with later committees either. The problem nowadays is a deadlock between ideological groups that won't be resolvable, no matter how much effort is expended, without recognizing the legitimacy of multiple C dialects.Lilla

On almost any remotely-modern platform, pointers and integers have an isomorphic ordering relation, and pointers to disjoint objects are not interleaved. Most compilers expose this ordering to programmers when optimizations are disabled, but the Standard makes no distinction between platforms that have such an ordering and those that don't, and does not require that any implementation expose such an ordering to the programmer even on platforms that would define it. Consequently, some compiler writers perform various kinds of optimizations and "optimizations" based upon an assumption that code will never use relational operators on pointers to different objects.

According to the published Rationale, the authors of the Standard intended that implementations extend the language by specifying how they will behave in situations the Standard characterizes as "Undefined Behavior" (i.e. where the Standard imposes no requirements) when doing so would be useful and practical, but some compiler writers would rather assume programs will never try to benefit from anything beyond what the Standard mandates, than allow programs to usefully exploit behaviors the platforms could support at no extra cost.

I'm not aware of any commercially-designed compilers that do anything weird with pointer comparisons, but as compilers move to the non-commercial LLVM for their back end, they're increasingly likely to nonsensically process code whose behavior had been specified by earlier compilers for their platforms. Such behavior isn't limited to relational operators, but can even affect equality/inequality. For example, even though the Standard specifies that a comparison between a pointer to one object and a "just past" pointer to an immediately-preceding object will compare equal, gcc and LLVM-based compilers are prone to generate nonsensical code if programs perform such comparisons.

As an example of a situation where even equality comparison behaves nonsensically in gcc and clang, consider:

extern int x[],y[];
int test(int i)
{
    int *p = y+i;
    y[0] = 4;
    if (p == x+10)
        *p = 1;
    return y[0];
}

Both clang and gcc will generate code that will always return 4 even if x is ten elements, y immediately follows it, and i is zero, resulting in the comparison being true and p[0] being written with the value 1. I think what happens is that one pass of optimization rewrites the function as though *p = 1; were replaced with x[10] = 1;. The latter code would be equivalent if the compiler interpreted *(x+10) as equivalent to *(y+i), but unfortunately a downstream optimization stage recognizes that an access to x[10] would only be defined if x had at least 11 elements, which would make it impossible for that access to affect y.

If compilers can get that "creative" with a pointer-equality scenario whose behavior is described by the Standard, I would not trust them to refrain from getting even more creative in cases where the Standard doesn't impose requirements.

Lilla answered 29/12, 2019 at 16:33 Comment(0)

It's simple: comparing pointers does not make sense in general, as memory locations for objects are never guaranteed to be in the order you declared them. The exception is arrays: &array[0] is lower than &array[1]. That's what K&R points out. In practice, struct member addresses are also in the order you declare them, in my experience, but there are no guarantees on that... Another exception is comparing pointers for equality: when one pointer is equal to another, you know it's pointing to the same object, whatever that is. Bad exam question if you ask me. Depending on a "Ubuntu Linux 16.04, 64-bit version programming environment" for an exam question? Really?

Chlorinate answered 30/12, 2019 at 23:13 Comment(2)
Technically, arrays aren't really an exception since you don't declare arr[0], arr[1], etc separately. You declare arr as a whole so the ordering of individual array elements is a different issue to that described in this question.Adhibit
Structure elements are guaranteed to be in order, which guarantees that one may use memcpy to copy a contiguous part of a structure and affect all of the elements therein and not affect anything else. The Standard is sloppy about terminology as to what kinds of pointer arithmetic can be done with structures or malloc() allocated storage. The offsetof macro would be rather useless if one couldn't to the same kind of pointer arithmetic with the bytes of a struct as with a char[], but the Standard doesn't expressly say that the bytes of a struct are (or can be used as) an array object.Lilla

Pointers are just integers, like everything else in a computer. You absolutely can compare them with < and > and produce results without causing a program to crash. That said, the standard does not guarantee that those results have any meaning outside of array comparisons.

In your example of stack-allocated variables, the compiler is free to place those variables in registers or at stack memory addresses, in any order it chooses. Comparisons such as < and > therefore won't be consistent across compilers or architectures. However, == and != aren't so restricted; comparing pointer equality is a valid and useful operation.

Tatiana answered 29/12, 2019 at 2:14 Comment(15)
The word stack appears exactly zero times in the C11 standard. And undefined behaviour means anything can happen (including program crash).Adhibit
@Adhibit Did I say it did?Tatiana
You mentioned stack-allocated variables. There is no stack in the standard, that's just an implementation detail. The more serious issue with this answer is the contention you can compare pointers with no chance of a crash - that's just wrong.Adhibit
@Adhibit I don't think being a standards maven is a useful practice. What compiler crashes on pointer comparisons in practice? For that matter what non-exotic platform operates without a stack? Agree to disagree I thinkTatiana
@nickelpro: If one wishes to write code that is compatible with the optimizers in gcc and clang, it is necessary to jump through a lot of silly hoops. Both optimizers will aggressively seek out opportunities to draw inferences about what things will be accessed by pointers whenever there's any way the Standard can be twisted to justify them (and even sometimes when there isn't). Given int x[10],y[10],*p;, if code evaluates y[0], then evaluates p>(x+5) and writes *p without modifying p in the interim, and finally evaluates y[0] again, ...Lilla
... the Standard would allow the compiler to treat the evaluation of p > (x+5) as evidence that p cannot possibly be usable to access y[0]. I don't know of any cases where present versions of clang and gcc would make those particular "optimizations", but I would not trust them to refrain from making them unless there's a major change in some language trends.Lilla
Also the assertion "Pointers are just integers" is plain wrong on some (e.g. segmented) architectures.Supposititious
@Peter-ReinstateMonica: Pointers were just integers on the original language Dennis Ritchie invented, and implementations targeting platforms where such treatment would make sense. The Standard loosened the specs to make the language usable on more exotic platforms, but I've seen no effort that was intended to weaken their semantics on code that wanted to exploit features of the target architecture.Lilla
@Lilla I think I agree with what you say but that does not invalidate what I said.Supposititious
nickelpro, agree to agree to disagree but your answer is still fundamentally wrong. I liken your approach to that of the people that use (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') instead of isalpha() because what sane implementation would have those characters discontinuous? The bottom line is that, even if no implementation you know has a problem, you should be coding to the standard as much as possible if you value portability. I do appreciate the label "standards maven" though, thanks for that. I may put in on my CV :-)Adhibit
Don't get me wrong, I didn't downvote you (others have already done so no real point in me doing that). But I have to call out clearly incorrect answers, at a minimum to let the authors fix the problems.Adhibit
@Adhibit It's cool, I respect and understand my position is unpopular and my answer is significantly worse than the more comprehensive answers provided. When I provided my answer this question was unanswered completely. As an aside: I don't value portability until there's a demonstrated need, since 99% of code will only ever run on a single arch + os.Tatiana
@Tatiana You must be young and inexperienced. (This is not meant as an insult, but I am working with a compiler from 1998, written for DOS, compiling software which runs on an i386 embedded system and will most likely need to be ported to ARM. That compiler is running on its 4th OS, now on a 64-bit system.)Supposititious
@Tatiana You want to write your code as bug-free as possible. Playing loose with conventions and standards is a great way to write buggy code. Nobody had heard of the iPhone a dozen years ago, yet now people want to use an iPhone instead of their PC. You never know where your code will go, and you don't know when a compiler update might cause a nightmare for your program.Forkey
@Peter-ReinstateMonica: What you say is true of the language the Standard was written to describe, but compiler writers that don't need to worry about paying customers process a language where pointer comparisons can (and sometimes do) have weird and wacky side effects.Lilla
I
-5

What A Provocative Question!

Even a cursory scan of the responses and comments in this thread reveals how emotive your seemingly simple and straightforward query turns out to be.

It should not be surprising.

Inarguably, misunderstandings around the concept and use of pointers represent a predominant cause of serious failures in programming in general.

Recognition of this reality is readily evident in the ubiquity of languages designed specifically to address, and preferably to avoid altogether, the challenges pointers introduce. Think C++ and other derivatives of C, Java and its relations, Python and other scripting languages -- merely the more prominent and prevalent ones, ordered more or less by the severity with which they deal with the issue.

Developing a deeper understanding of the underlying principles must therefore be pertinent to every individual who aspires to excellence in programming -- especially at the systems level.

I imagine this is precisely what your teacher means to demonstrate.

And the nature of C makes it a convenient vehicle for this exploration. Less clearly than assembly -- though perhaps more readily comprehensible -- and still far more explicitly than languages based on deeper abstraction of the execution environment.

Designed to facilitate deterministic translation of the programmer’s intent into instructions that machines can comprehend, C is a system level language. While classified as high-level, it really belongs in a ‘medium’ category; but since none such exists, the ‘system’ designation has to suffice.

This characteristic is largely responsible for making it a language of choice for device drivers, operating system code, and embedded implementations. Furthermore, a deservedly favoured alternative in applications where optimal efficiency is paramount; where that means the difference between survival and extinction, and therefore is a necessity as opposed to a luxury. In such instances, the attractive convenience of portability loses all its allure, and opting for the lack-lustre performance of the least common denominator becomes an unthinkably detrimental option.

What makes C -- and some of its derivatives -- quite special, is that it allows its users complete control -- when that is what they desire -- without imposing the related responsibilities upon them when they do not. Nevertheless, it never offers more than the thinnest of insulations from the machine, wherefore proper use demands exacting comprehension of the concept of pointers.

In essence, the answer to your question is sublimely simple and satisfyingly sweet -- in confirmation of your suspicions. Provided, however, that one attaches the requisite significance to every concept in this statement:

  • The acts of examining, comparing and manipulating pointers are always and necessarily valid, while the conclusions drawn from the results depend on the validity of the values contained, and thus need not be.

The former is both invariably safe and potentially proper, while the latter can only ever be proper when it has been established as safe. Surprisingly -- to some -- so establishing the validity of the latter depends on and demands the former.

Of course, part of the confusion arises from the effect of the recursion inherently present within the principle of a pointer -- and the challenges posed in differentiating content from address.

You have quite correctly surmised,

I'm being led to think that any pointer can be compared with any other pointer, regardless of where they individually point. Moreover, I think pointer arithmetic between two pointers is fine, no matter where they individually point because the arithmetic is just using the memory addresses the pointers store.

And several contributors have affirmed: pointers are just numbers. Sometimes something closer to complex numbers, but still no more than numbers.

The amusing acrimony in which this contention has been received here reveals more about human nature than programming, but remains worthy of note and elaboration. Perhaps we will do so later...

As one comment begins to hint; all this confusion and consternation derives from the need to discern what is valid from what is safe, but that is an oversimplification. We must also distinguish what is functional and what is reliable, what is practical and what may be proper, and further still: what is proper in a particular circumstance from what may be proper in a more general sense. Not to mention; the difference between conformity and propriety.

Toward that end, we first need to appreciate precisely what a pointer is.

  • You have demonstrated a firm grip on the concept, and like some others may find these illustrations patronizingly simplistic, but the level of confusion evident here demands such simplicity in clarification.

As several have pointed out: the term pointer is merely a special name for what is simply an index, and thus nothing more than any other number.

This should already be self-evident in consideration of the fact that all contemporary mainstream computers are binary machines that necessarily work exclusively with and on numbers. Quantum computing may change that, but that is highly unlikely, and it has not come of age.

Technically, as you have noted, pointers are more accurately addresses; an obvious insight that naturally introduces the rewarding analogy of correlating them with the ‘addresses’ of houses, or plots on a street.

  • In a flat memory model: the entire system memory is organized in a single, linear sequence: all houses in the city lie on the same road, and every house is uniquely identified by its number alone. Delightfully simple.

  • In segmented schemes: a hierarchical organization of numbered roads is introduced above that of numbered houses so that composite addresses are required.

    • Some implementations are still more convoluted, and the totality of distinct ‘roads’ need not sum to a contiguous sequence, but none of that changes anything about the underlying.
    • We are necessarily able to decompose every such hierarchical link back into a flat organization. The more complex the organization, the more hoops we will have to hop through in order to do so, but it must be possible. Indeed, this also applies to ‘real mode’ on x86.
    • Otherwise the mapping of links to locations would not be bijective, as reliable execution -- at the system level -- demands that it MUST be.
      • multiple addresses must not map to singular memory locations, and
      • singular addresses must never map to multiple memory locations.

Bringing us to the further twist that turns the conundrum into such a fascinatingly complicated tangle. Above, it was expedient to suggest that pointers are addresses, for the sake of simplicity and clarity. Of course, this is not correct. A pointer is not an address; a pointer is a reference to an address, it contains an address. Like the envelope sports a reference to the house. Contemplating this may lead you to glimpse what was meant with the suggestion of recursion contained in the concept. Still; we have only so many words, and talking about the addresses of references to addresses and such, soon stalls most brains at an invalid op-code exception. And for the most part, intent is readily garnered from context, so let us return to the street.

Postal workers in this imaginary city of ours are much like the ones we find in the ‘real’ world. No one is likely to suffer a stroke when you talk or enquire about an invalid address, but every last one will balk when you ask them to act on that information.

Suppose there are only 20 houses on our singular street. Further pretend that some misguided, or dyslexic soul has directed a letter, a very important one, to number 71. Now, we can ask our carrier Frank, whether there is such an address, and he will simply and calmly report: no. We can even expect him to estimate how far outside the street this location would lie if it did exist: roughly 2.5 times further than the end. None of this will cause him any exasperation. However, if we were to ask him to deliver this letter, or to pick up an item from that place, he is likely to be quite frank about his displeasure, and refusal to comply.

Pointers are just addresses, and addresses are just numbers.

Verify the output of the following:

void foo( void *p ) {
   printf("%p\t%zu\t%d\n", p, (size_t)p, (void *)(size_t)p == p);
}

Call it on as many pointers as you like, valid or not. Please do post your findings if it fails on your platform, or your (contemporary) compiler complains.

Now, because pointers are simply numbers, it is inevitably valid to compare them. In one sense this is precisely what your teacher is demonstrating. All of the following statements are perfectly valid -- and proper! -- C, and when compiled will run without encountering problems, even though neither pointer need be initialized and the values they contain therefore may be undefined:

  • We are only calculating result explicitly for the sake of clarity, and printing it to force the compiler to compute what would otherwise be redundant, dead code.
void foo( size_t *a, size_t *b ) {
   size_t result;
   result = (size_t)a;
   printf("%zu\n", result);
   result = a == b;
   printf("%zu\n", result);
   result = a < b;
   printf("%zu\n", result);
   result = a - b;
   printf("%zu\n", result);
}

Of course, the results are meaningless when either a or b is undefined (read: not properly initialized) at the point of testing, but that is utterly irrelevant to this part of our discussion. These snippets, as too the following statements, are guaranteed -- by the ‘standard’ -- to compile and run flawlessly, notwithstanding the IN-validity of any pointer involved.

Problems only arise when an invalid pointer is dereferenced. When we ask Frank to pick up or deliver at the invalid, non-existent address.

Given any arbitrary pointer:

int *p;

While this statement must compile and run:

printf("%p", (void *)p);

... as must this:

size_t foo( int *p ) { return (size_t)p; }

... the following two, in stark contrast, will still readily compile, but fail in execution unless the pointer is valid -- by which we here merely mean that it references an address to which the present application has been granted access:

printf("%d", *p);
size_t foo( int *p ) { return *p; }

How subtle the change? The distinction lies in the difference between the value of the pointer -- which is the address, and the value of the contents: of the house at that number. No problem arises until the pointer is dereferenced; until an attempt is made to access the address it links to. In trying to deliver or pick up the package beyond the stretch of the road...

By extension, the same principle necessarily applies to more complex examples, including the aforementioned need to establish the requisite validity:

int* validate( int *p, int *head, int *tail ) { 
    return p >= head && p <= tail ? p : NULL; 
}

Relational comparison and arithmetic offer identical utility to testing equivalence, and are equivalently valid -- in principle. However, what the results of such computation would signify, is a different matter entirely -- and precisely the issue addressed by the quotations you included.

In C, an array is a contiguous buffer, an uninterrupted linear series of memory locations. Comparison and arithmetic applied to pointers that reference locations within such a singular series are naturally, and obviously meaningful in relation both to each other, and to this ‘array’ (which is simply identified by the base). Precisely the same applies to every block allocated through malloc, or sbrk. Because these relationships are implicit, the compiler is able to establish valid relationships between them, and therefore can be confident that calculations will provide the answers anticipated.

Performing similar gymnastics on pointers that reference distinct blocks or arrays offers no such inherent and apparent utility. All the more so since whatever relation exists at one moment may be invalidated by a subsequent reallocation, after which it is highly likely to change, or even be inverted. In such instances the compiler cannot obtain the information needed to establish the confidence it had in the previous situation.

You, however, as the programmer, may have such knowledge! And in some instances are obliged to exploit that.

There ARE, therefore, circumstances in which EVEN THIS is entirely VALID and perfectly PROPER.

In fact, that is exactly what malloc itself has to do internally when the time comes to try merging reclaimed blocks -- on the vast majority of architectures. The same is true of the operating system's allocator, like the one behind sbrk: there the comparisons are more frequent, span more disparate entities, and matter more critically -- and they remain relevant on platforms where this malloc may not be. And how many of those are not written in C?

The validity, security and success of an action is inevitably the consequence of the level of insight upon which it is premised and applied.

In the quotes you have offered, Kernighan and Ritchie are addressing a closely related, but nonetheless separate issue. They are defining the limitations of the language, and explaining how you may exploit the capabilities of the compiler to protect you by at least detecting potentially erroneous constructs. They are describing the lengths the mechanism is able -- is designed -- to go to in order to assist you in your programming task. The compiler is your servant, you are the master. A wise master, however, is one that is intimately familiar with the capabilities of his various servants.

Within this context, undefined behaviour serves to indicate potential danger and the possibility of harm; not to imply imminent, irreversible doom, or the end of the world as we know it. It simply means that we -- ‘meaning the compiler’ -- are not able to make any conjecture about what this thing may be or represent, and for this reason we choose to wash our hands of the matter. We will not be held accountable for any misadventure that may result from the use, or mis-use, of this facility.

In effect, it simply says: ‘Beyond this point, cowboy: you are on your own...’

Your professor is seeking to demonstrate the finer nuances to you.

Notice what great care they have taken in crafting their example; and how brittle it still is. By taking the address of a, in

p[0].p0 = &a;

the compiler is coerced into allocating actual storage for the variable, rather than placing it in a register. It being an automatic variable, however, the programmer has no control over where that storage is assigned, and so is unable to make any valid conjecture about what will follow it. This is why a must be set equal to zero for the code to work as expected.

Merely changing this line:

char a = 0;

to this:

char a = 1;  // or ANY other value than 0

causes the behaviour of the program to become undefined. At minimum, the first answer will now be 1; but the problem is far more sinister.

Now the code is inviting of disaster.

While it remains standard-conformant C that is sure to compile, its behaviour is now undefined, and it may fail in execution on various grounds. For now there are multiple problems -- none of which the compiler is able to recognize.

strcpy will start at the address of a, and proceed beyond this to consume -- and transfer -- byte after byte, until it encounters a null.

The p1 pointer has been initialized to a block of exactly 10 bytes.

  • If a happens to be placed at the end of a block and the process has no access to what follows, the very next read -- of p0[1] -- will elicit a segfault. This scenario is unlikely on the x86 architecture, but possible.

  • If the area beyond the address of a is accessible, no read error will occur, but the program still is not saved from misfortune.

  • If a zero byte happens to occur within the ten starting at the address of a, it may still survive, for then strcpy will stop and at least we will not suffer a write violation.

  • If it is not faulted for reading amiss, but no zero byte occurs in this span of 10, strcpy will continue and attempt to write beyond the block allocated by malloc.

    • If this area is not owned by the process, the segfault should immediately be triggered.

    • The still more disastrous -- and subtle -- situation arises when the following block is owned by the process, for then the error cannot be detected, no signal can be raised, and so it may ‘appear’ still to ‘work’, while it is actually overwriting other data, your allocator’s management structures, or even code (in certain operating environments).

This is why pointer related bugs can be so hard to track. Imagine these lines buried deep within thousands of lines of intricately related code, that someone else has written, and you are directed to delve through.

Nevertheless, the program must still compile, for it remains perfectly valid and standard conformant C.

These kinds of errors, no standard and no compiler can protect the unwary against. I imagine that is exactly what they are intending to teach you.

Paranoid people constantly seek to change the nature of C to dispose of these problematic possibilities and so save us from ourselves; but that is disingenuous. This is the responsibility we are obliged to accept when we choose to pursue the power and obtain the liberty that more direct and comprehensive control of the machine offers us. Promoters and pursuers of perfection in performance will never accept anything less.

Portability, and the generality it represents, is a fundamentally separate consideration and all that the standard seeks to address:

This document specifies the form and establishes the interpretation of programs expressed in the programming language C. Its purpose is to promote portability, reliability, maintainability, and efficient execution of C language programs on a variety of computing systems.

Which is why it is perfectly proper to keep it distinct from the definition and technical specification of the language itself. Contrary to what many seem to believe, generality is antithetical to the exceptional and the exemplary.

To conclude:

  • Examining and manipulating pointers themselves is invariably valid and often fruitful. Interpretation of the results may or may not be meaningful, but calamity is never invited until the pointer is dereferenced; until an attempt is made to access the address it links to.

Were this not true, programming as we know it -- and love it -- would not have been possible.

Isochronize answered 29/12, 2019 at 2:52 Comment(34)
This answer is unfortunately inherently invalid. You can't reason anything about undefined behaviour. The comparison need not be done at machine level.Ashanti
No, undefined behaviour means that anything that's possible can happen. Literally. It's just as valid that your program will crash (or delete all your files) as it is valid to return a correct or incorrect comparison result.Adhibit
@paxdiablo, We are talking about C here, aren't we? Until one dereferences an invalid pointer, no ill can result from examining the value, and that includes comparing said value to an other.Isochronize
Ghii, actually no. If you look at C11 Annex J and 6.5.8, the act of comparison itself is UB. Dereferencing is a separate issue.Adhibit
No, UB can still be harmful even before a pointer is dereferenced. A compiler is free to completely optimize a function with UB into a single NOP, even though this obviously changes visible behavior.Decry
@Antti Haapala. Really? What would the alternative be? This is C. It is not a scripting language like Python, or bytecode like Java, nor does it even have the compile-time function of C++. It is compiled directly into machine code and executed on the CPU.Isochronize
paxdiablo: the only applicable reference in C11 Annex J is '— Pointers that do not point to the same aggregate or union (nor just beyond the same array object) are compared using relational operators (6.5.8).'. This explicitly confirms my argument, not yours.Isochronize
@GhiiVelte: Some compiler writers have a mentality that suggests that if I'm entitled to assume someone will show up at a party, and I know that they won't be able to make it if a certain road is impassible, I should feel free to ignore any "BRIDGE OUT" signs on that road because if the bridge really were out, that would contradict the assumption that the person would be at the party.Lilla
@supercat. Certainly! However, when the compiler elides the instruction it is guaranteed not to cause a problem directly. UB will result from the interpretation attached and the actions taken in response, not from the act of examining the value of the pointer.Isochronize
@GhiiVelte: Not necessarily. The authors of clang and gcc have expressed a philosophy that if a compiler can determine that some inputs would cause some downstream action to invoke UB, upstream code should ignore the possibility of such actions occurring. For example, given unsigned mul16(unsigned short x, unsigned short y) { return (x*y) & 0xFFFF; }, if gcc can determine that the second iteration of a loop would invoke mul16(32769, 65535); it will exit the loop after the first branch without checking the loop condition.Lilla
@GhiiVelte: I'd be interested to know how often such inferences are actually helpful in non-contrived cases. I can certainly imagine that they would appear to offer big savings in code size if one doesn't care about whether the resulting program behaves usefully, but I'm curious how often they yield more efficient programs that actually work usefully.Lilla
@supercat: Perhaps we are talking at cross-purposes? My intent was only to point out that examining the value of a pointer (as opposed to the value at the address pointed-to), cannot be classed as UB, since if it were it would never be possible to determine whether a pointer is in fact valid, or not. void foo( int *x ) { printf("%p\n",x); } will print garbage if x is uninitialized, but cannot trigger an access violation, or any other misfortune.Isochronize
@Ghii, Annex J (the bit I mentioned) is the list of things that are undefined behaviour, so I'm not sure how that supports your argument :-) 6.5.8 explicitly calls out the comparison as UB. For your comment to supercat, there's no comparison going on when you print a pointer so you're probably right that it will not crash. But that's not what the OP was asking about. 3.4.3 is also a section you should look at: it defines UB as behaviour "for which this International Standard imposes no requirements".Adhibit
printf serves only to side-step supercat's bridge and force the compilation of the statement. void foo( int *x, int *y ) { printf("%d\n", y - x); } likewise must compile on all standard conforming compilers, and run without a hitch. In order to suffer catastrophe, it is necessary to dereference the invalid pointer, eg.: printf("%d\n", *x);Isochronize
@paxdiablo: The Standard makes no attempt to specify everything necessary to make an implementation suitable for any particular purpose. The fact that the Standard does not require that conforming implementations be capable of meaningfully processing a construct does not imply any judgment that failure to do so would not make implementations unsuitable for some, or even most, purposes.Lilla
@GhiiVelte: The way gcc and clang process pointer comparisons can cause other parts of the code to behave nonsensically. For example, given extern int x[],y[], a comparison between y+i and x+10 may result in the compiler replacing an access to y[i] with an access to x[10] and then ignoring the possibility that it might affect the value of y[0].Lilla
@GhiiVelte, you keep stating things that are just plainly wrong, despite that being pointed out to you. Yes, the snippet you posted must compile, but your contention that it runs without a hitch is incorrect. I suggest you actually read the standard, particularly (in this case) C11 6.5.6/9, keeping in mind that the word "shall" indicates a requirement: "When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object".Adhibit
You can examine pointers regardless of their validity, what you can't do reliably is compare or subtract them except under limited conditions.Adhibit
"It is compiled directly into machine code and executed on the CPU" Wrong, wrong, wrong. It may not be as thick an abstraction as C++ is, but it's still an abstraction. Someone has taught you falsehoods about C, and you're clinging onto them very firmly.Carmancarmarthen
Technically you are correct: conventionally it is compiled into assembly language, which then is passed through an assembler to produce the corresponding machine code, and different implementations are possible, but rare. Does harping on this detail add to, or detract from the discussion?Isochronize
@LightnessRacesBY-SA3.0: The authors of the Standard explicitly said that they did not wish to preclude the use of language a "high-level assembler". Not all implementations are required to be suitable for use in such fashion, but the language the Standard was written to describe is far closer to a high level assembler than to the language the clang and gcc optimizers are designed to process.Lilla
@LightnessRacesBY-SA3.0: When the Standard says that there is no difference in emphasis between categorizing an action as UB versus failing to specify it, does that mean that the categorization as UB should take absolute priority, or that the latter statement should mean "not generally meaningful", while allowing for the possibility that other parts of the Standard or an implementation's documentation might describe how a construct will behave in some particular cases even if the construct isn't described generally?Lilla
@Lilla Doesn't really matter since the semantics of comparing two unrelated pointers isn't described elsewhere.Carmancarmarthen
@LightnessRacesBY-SA3.0: Many platforms define a global transitive ordering of all addresses, and many implementations specify that their pointers are nothing more than platform addresses, and inherit the semantics thereof. The C Standard would allow implementations to target platforms that don't define a global transitive ordering, or to process pointers in ways different from how the underlying platform processes addresses, but on an implementation that specifies pointer semantics mirroring those of a platform that specifies a global ordering, pointers have a specified global ordering.Lilla
@LightnessRacesBY-SA3.0: On a platform where an attempted relational comparison between pointers to objects in different segments might trap, it may be expensive for an implementation to guarantee anything about the effect of such comparisons, and the value of trying to guarantee anything may be limited. The authors of the Standard wanted to avoid putting an unreasonable burden on implementations targeting such platforms. That does not imply that they did not expect that implementations would support a global transitive ordering on platforms where it would make sense to do so.Lilla
@Lilla You're using a lot of long words, but comparing two unrelated pointers has undefined behaviour and honestly that's just the end of it. Yes, it may still work on some platforms (historically it does) unless the implementation makes use of the UB to do strange things around the comparison. Just don't "do UB". Simples!Carmancarmarthen
@paxdiablo: According to the Standard, the term "shall" in a run-time constraint is only applicable to strictly conforming programs. Violation of a run-time constraint means that nothing an implementation does would make it non-conforming, but the authors of the Standard explicitly recognize in the published Rationale document that implementations may usefully extend the semantics of the language by meaningfully processing programs in cases beyond those required by the Standard; the decision of when to do so is a Quality of Implementation issue outside the Standard's jurisdiction.Lilla
@LightnessRacesBY-SA3.0: If one only regards a combination of source text and input data as having defined behavior if there is some circumstance in which the Standard would impose at least some requirements upon how at least one possible implementation would process it, what fraction of such combinations would have defined behavior? The only reason that C is even remotely usable is that implementations process many programs usefully in cases where the Standard would impose no requirements.Lilla
@Lilla Not really; there is usually no need to write programs with undefined behaviour per the standard, and this is one of those times. Not really sure what you're trying to get at, but it's off-topic.Carmancarmarthen
@LightnessRacesBY-SA3.0: Why did the authors of the Standard explicitly say that it was appropriate for implementations to extend the available semantics by defining behaviors in cases the Standard characterized as UB, say that implementations may process UB by behaving "in a documented fashion characteristic of the environment", and say that they did not wish to demean useful programs that happened not to be portable?Lilla
@Lilla So that they can do that thing if they want to? What's the confusion?Carmancarmarthen
@LightnessRacesBY-SA3.0: Would such statements make any sense if the authors of the Standard intended that programmers must avoid at all cost any actions the Standard characterizes as Undefined Behavior? To me, they'd only make sense if the intention was that implementations for platforms that provide such semantics allow programmers to exploit them when practical (with the question of when to support such behaviors being a Quality of Implementation issue best resolved by a "marketplace" of compiler vendors seeking to meet their customers' needs rather than the Committee).Lilla
@Lilla I don't really care what the authors of the standard intended. Writing code with undefined behaviour is really silly, unless you have a very good reason to do it. End of story.Carmancarmarthen
@LightnessRacesBY-SA3.0: So when the Standard says there's no difference in emphasis between characterizing an action as UB versus failing to say anything about it, you view that as implying that the former characterization trumps everything else the Standard or an implementation's documentation might happen to say about it. Got it.Lilla

© 2022 - 2024 — McMap. All rights reserved.