Is memcpy of a pointer the same as assignment?

Introduction: This question is part of my collection of C and C++ (and C/C++ common subset) questions regarding the cases where pointer objects with strictly identical byte-wise representations are allowed to have different "values", that is, to behave differently for some operation (including having defined behavior on one object and undefined behavior on the other).

Following another question which caused much confusion, here is a question about pointer semantics that will hopefully clear things up:

Is this program valid in all cases? The only interesting part is in the "pa1 == pb" branch.

#include <stdio.h>
#include <string.h>

int main() {
    int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        int *p;
        printf ("pa1 == pb\n"); // interesting part
        memcpy (&p, &pa1, sizeof p); // make a copy of the representation
        memcpy (&pa1, &p, sizeof p); // pa1 is a copy of the bytes of pa1 now
        // and the bytes of pa1 happen to be the bytes of pb
        *pa1 = 2; // does pa1 legally point to b?
    }
    else {
        printf ("pa1 != pb\n"); // failed experiment, nothing to see
        pa1 = &a[0]; // ensure well defined behavior in printf
    }
    printf ("b = %d *pa1 = %d\n", b, *pa1);
    return 0;
}

I would like an answer based on standard quotes.

EDIT

By popular demand, here is what I want to know:

  • is a pointer's semantic "value" (its behavior according to the specification) determined only by its numerical value (the numerical address it contains), for a pointer of a given type?
  • if not, is it possible to copy only the physical address contained in a pointer while leaving out the associated semantics?

Here, let's say that some one-past-the-end pointer happens to accidentally point to another object; how can I use such a one-past-the-end pointer to access the other object?

I have the right to do anything, except use a copy of the address of the other object. (It's a game to understand pointers in C.)

IOW, I try to recycle dirty money just like the mafia. But I recycle a dirty pointer by extracting its value representation. Then it looks like the clean money, I mean pointer. Nobody can tell the difference, no?

Pang answered 17/8, 2015 at 11:1 Comment(6)
What is memcpy( &p, &pa1, sizeof p ) and memcpy( &pa1, &p, sizeof p ) supposed to do, in your own words? Also, you really should add some lines about your intent with the memcmp (see my comment to Sourav Ghosh).Auld
@Auld copy physical value, just like assignment of int; not transfer semanticPang
It would really have helped if you had written some text about what you are trying to prove here, exactly. As far as I understand, you try to 1) construct a pointer that's "illegal" to dereference, under the terms of the standard (*pa1 = &a[0] + 1), then 2) prove that it's the same bit pattern as another, legal pointer (*pb = &b; if (memcmp (&pa1, &pb, sizeof pa1) == 0)), then 3) do dereference the "illegal" pointer "legally" because it's the same bit-pattern as a "legal" pointer (*pa1 = 2). Right? (Your edit ninja'd this comment...)Auld
@Auld I try to extract the bit pattern and value of &b without saying &b. It's a charade.Pang
You said you're trying to use a pointer to one object to access an unrelated object. The whole problem with this is that the compiler optimizations won't see this access and they will assume the second object is unchanged. With pointers the same value you would think the machine code would be the same. But that is not true because the compiler never even bothers to read the object's memory because it KNOWS it could not have changed, since no valid pointer wrote to it.Reneerenegade
@ZanLynx "compiler optimizations won't see this access" I kind of guessed that. The problem here is that what compilers commonly do isn't the definition of the language (C, C++, other languages). The behavior of compilers should be justifiable by the std. The std should be enough to determine which programs have well defined behavior (or rather, which program executions, a program trace, with its interaction with the user, have well defined behavior). Of course the std should be realistic, and not ask for compilers to do silly things or pessimized codegen.Pang

The question was:

Is this program valid in all cases?

The answer is "no, it is not".


The only interesting part of the program is what happens within the block guarded by the if statement. It is somewhat difficult to guarantee the truth of the controlling expression, so I've modified it somewhat by moving the variables to global scope. The same question remains: is this program always valid?

#include <stdio.h>
#include <string.h>

static int a[1] = { 2 };
static int b = 1;
static int *pa1 = &a[0] + 1;
static int *pb = &b;

int main(void) {
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        int *p;
        printf ("pa1 == pb\n"); // interesting part
        memcpy (&p, &pa1, sizeof p); // make a copy of the representation
        memcpy (&pa1, &p, sizeof p); // pa1 is a copy of the bytes of pa1 now
        // and the bytes of pa1 happen to be the bytes of pb
        *pa1 = 2; // does pa1 legally point to b?
    }
}

Now the guarding expression is true on my compiler. (Of course, because these variables have static storage duration, a compiler cannot really prove that they are not modified by something else in the interim.)

The pointer pa1 points to just past the end of the array a, and is a valid pointer, but must not be dereferenced, i.e. *pa1 has undefined behaviour given that value. The case is now made that copying this value to p and back again would make the pointer valid.

The answer is no, this is still not valid, but it is not spelt out very explicitly in the standard itself. The committee response to C standard defect report DR 260 says this:

If two objects have identical bit-pattern representations and their types are the same they may still compare as unequal (for example if one object has an indeterminate value) and if one is an indeterminate value attempting to read such an object invokes undefined behavior. Implementations are permitted to track the origins of a bit-pattern and treat those representing an indeterminate value as distinct from those representing a determined value. They may also treat pointers based on different origins as distinct even though they are bitwise identical.

I.e. if pa1 and pb are pointers of the same type and memcmp (&pa1, &pb, sizeof pa1) == 0 is true, you cannot even conclude that pa1 == pb necessarily holds, let alone that copying the bit pattern of the undereferenceable pointer pa1 to another object and back again would make pa1 valid.

The response continues:

Note that using assignment or bitwise copying via memcpy or memmove of a determinate value makes the destination acquire the same determinate value.

i.e. it confirms that memcpy (&p, &pa1, sizeof p); will cause p to acquire the same value as pa1, which it didn't have before.
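To make the DR 260 point concrete, here is a minimal sketch (the function name copies_agree is hypothetical) in which a valid pointer to b is copied once by assignment and once with memcpy. Both copies acquire the determinate value of the source, so writing through the memcpy'd copy modifies b.

```c
#include <string.h>

/* Hypothetical helper: copies a valid pointer to b by assignment and by
   memcpy, writes through the memcpy'd copy, and reports whether both
   copies agree with each other and with b. */
int copies_agree(void)
{
    int b = 1;
    int *pb = &b;
    int *by_assign = pb;                          /* ordinary assignment */
    int *by_memcpy;
    memcpy(&by_memcpy, &pb, sizeof by_memcpy);    /* bitwise copy of pb */
    *by_memcpy = 2;                               /* writes b */
    return by_assign == by_memcpy && *by_assign == 2 && b == 2;
}
```

The difference from the question's program is the source: here pb was obtained from &b, so the copies carry a value that legally points to b.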


This is not just a theoretical problem - compilers are known to track pointer provenance. For example the GCC manual states that

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

i.e. were the program written as:

#include <stdint.h>
#include <string.h>

int main(void) {
    int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        uintptr_t tmp = (uintptr_t)&a[0]; // integer value of the address of a[0]
        tmp += sizeof (a[0]); // integer value of the address of a[1]
        pa1 = (int *)tmp;
        *pa1 = 2; // pa1 still would have the bit pattern of pb,
                  // hold a valid pointer just past the end of array a,
                  // but not legally point to b
    }
    return 0;
}

the GCC manual points out that this is explicitly not legal.
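For contrast, here is a sketch of the round trip the GCC manual does allow: the integer is not modified between the two conversions, so the resulting pointer references the same object as the original. It assumes the implementation provides the optional uintptr_t type (mainstream ones do); roundtrip_store is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: converts &x to uintptr_t and straight back,
   without any integer arithmetic, then stores through the result. */
int roundtrip_store(int value)
{
    int x = 0;
    uintptr_t u = (uintptr_t)(void *)&x;   /* pointer -> integer */
    int *q = (int *)(void *)u;             /* integer -> pointer, unchanged */
    *q = value;                            /* q still references x */
    return x;
}
```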

Maryjanemaryjo answered 11/12, 2018 at 4:44 Comment(8)
What’s not legal in this example is to “use integer arithmetic to avoid the undefined behavior of pointer arithmetic,” i.e. the line tmp += sizeof (a[0]); followed by converting tmp back into a pointer. This MCVE just compares and copies the object representations without modifying them, which is legal.Excitor
I also think that “which it didn’t have before” is an unsound inference. Does the Committee Response you quote mean that memmove( &p, &p, sizeof(p) ); causes p to acquire the value of p, which it didn’t have before?Excitor
tmp += sizeof (a[0]); and converting tmp to pointer are both well-defined here according to how GCC behaves on x86. But dereferencing it is not OK because GCC tracks the provenance of pointer values stored in objects of type uintptr_t.Scrotum
@Excitor I believe you're reading too much into the part you quoted. In memcpy (&p, &pa1, sizeof p); p simply did not have the value of pa1 before in this particular piece of code.Ahouh
@AnttiHaapala I think we agree that the standard doesn’t say that modifying a uintptr_t value derived from a pointer, then converting it back, yields a valid pointer? That section of the manual does say that copying the value around unmodified and converting it back is legal. It also applies to architectures other than x86.Excitor
@IljaEverilä Possibly. That makes more sense than my reading.Excitor
That answer validates my question. The intent of the standard committee is to follow intuition and the expectations of compiler writers. Of course the formalisation of that intuition is quite problematic. I take issue with "C Standard does not prohibit an implementation from tracking the provenance of the bit-pattern representing a value." (DR 260) without a) a definition of "provenance" (a difficult notion) and b) a justification that existing std text indeed allows that. Either way, there is no doubt which way the committee wants to go.Pang
@Pang indeed, there is a proposal for it for a later C revisionScrotum

A pointer is simply an unsigned integer whose value is the address of some location in memory. Overwriting the contents of a pointer variable is no different than overwriting the contents of normal int variable.

So yes, doing e.g. memcpy (&p, &pa1, sizeof p) is equivalent to the assignment p = pa1, but might be less efficient.


Let's try it a bit differently instead:

You have pa1 which points to some object (or rather, one beyond some object), then you have the pointer &pa1 which points to the variable pa1 (i.e. where the variable pa1 is located in memory).

Graphically it would look something like this:

+------+     +-----+     +-------+
| &pa1 | --> | pa1 | --> | &a[1] |
+------+     +-----+     +-------+

[Note: &a[0] + 1 is the same as &a[1]]
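The note can be checked directly. In this minimal sketch (hypothetical function name), the three spellings denote the same one-past-the-end pointer; forming &a[1] does not access a[1], because & applied to a [] expression is evaluated as a + 1 without the implied dereference.

```c
/* Hypothetical helper: compares three ways of writing the
   one-past-the-end pointer of a one-element array. */
int one_past_forms_agree(void)
{
    int a[1] = { 0 };
    int *p1 = &a[0] + 1;
    int *p2 = a + 1;
    int *p3 = &a[1];   /* same as a + 1; no access to a[1] takes place */
    return p1 == p2 && p2 == p3;
}
```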

Belvia answered 17/8, 2015 at 11:8 Comment(11)
"but less efficient" – most compilers treat these functions as builtins, so it's possible that the same code is emitted in both cases.Kickoff
OK, so are you saying there is no undefined behavior when dereferencing a one-past-the-end pointer that happens by chance to point to some object?Pang
@Pang But there is no dereferencing of either pointer. If you had written e.g. pa1 or p (without the address-operator) then yes you would have undefined behavior.Belvia
But then I do *pa1 = 2;Pang
@Pang Well, that will lead to UB. The memcmp or memcpy calls will not. Also, there's no guarantee that later local variables will be higher or lower on the stack. There isn't actually any guarantee that there will be a stack.Belvia
But pa1 contains the address of b at this point. (At least the bit pattern of both are the same.)Pang
@Pang It's still undefined behavior. Dereferencing out of bounds leads to undefined behavior, there's no way around it. That it happens to work and be safe using a specific compiler on a specific system doesn't make it any less undefined.Belvia
Let us continue this discussion in chat.Pang
"A pointer is simply an unsigned integer" Is this simplifying the standardese, or explaining the implementation? Although it doesn't matter here I guess, pointers are somewhat magic in the standard, e.g. 6.2.4p2 "The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime."Robbery
@Robbery More implementation-wise. At least on PC-like systems.Belvia
The Standard in fact directly says that comparing pointers with == might not have the same result as comparing their object representations with memcmp.Excitor

Undefined behaviour: A play in n parts.

Compiler1 and Compiler2 enter, stage right.

int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;

[Compiler1] Hello, a, pa1, b, pb. How very nice to make your acquaintance. Now you just sit right there, we're going to look through the rest of the code to see if we can allocate you some nice stack space.

Compiler1 looks through the rest of the code, frowning occasionally and making some markings on the paper. Compiler2 picks his nose and stares out the window.

[Compiler1] Well, I'm afraid, b, that I have decided to optimize you out. I simply couldn't detect somewhere which modified your memory. Maybe your programmer did some tricks with Undefined Behaviour to work around this, but I'm allowed to assume that there is no such UB present. I'm sorry.

Exit b, pursued by a bear.

[Compiler2] Wait! Hold on a second there, b. I couldn't be bothered optimizing this code, so I've decided to give you a nice cosy space over there on the stack.

b jumps in glee, but is murdered by nasal demons as soon as he is modified through undefined behaviour.

[Narrator] Thus ends the sad, sad tale of variable b. The moral of this story is that one can never rely on undefined behaviour.

Derward answered 17/8, 2015 at 12:40 Comment(3)
Well, he takes b's address which is subsequently fed to functions, or this would indeed be a clear-cut case. ;-)Auld
I am not convinced there is UB here!Pang
The question requested citations from the standard. Which behavior here is undefined, specifically? For instance, comparing the two pointers would be UB, and copying between memory locations that overlap would be UB, and aliasing a pointer with a pointer of the wrong type would be UB. But this MCVE does none of those things.Excitor
   *pa1 = 2; // does pa1 legally point to b?

No, that pa1 points to b is purely coincidental. Note that a program must conform at compilation; that the pointer happens to have the same value at runtime doesn't matter.

Nobody can tell the difference, no?

The compiler optimizer can tell the difference! The compiler optimizer can see (through static analysis of the code) that b and is never accessed through a "legal" pointer, so it assumes is safe to keep b in a register. This decision is made at compilation.

Bottom line:

"Legal" pointers are pointers obtained from a legal pointer by assignment or by copying the memory. You can also obtain a "legal" pointer using pointer arithmetic, provided the resulting pointer is within the legal range of the array/memory block it was assigned/copied from. If the result of pointer arithmetic happens to point to a valid address in another memory block, the use of such a pointer is still UB.

Also note that pointer comparison is valid only if the two pointers are pointing to same array/memory block.
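That restriction applies to relational comparison (<, >); equality comparison (==) is defined for any two object pointers. Here is a sketch of a portable membership test built only on == (points_into is a hypothetical name, and it costs O(n) where a range check with < would be O(1)):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether p points to one of the n elements
   of arr, using only == so that unrelated pointers are never compared
   with < or >. */
bool points_into(const int *arr, size_t n, const int *p)
{
    for (size_t i = 0; i < n; ++i)
        if (arr + i == p)
            return true;
    return false;
}
```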

EDIT:

Where did it go wrong?

The standard states that accessing an array out-of-bounds results in undefined behaviour. You took a pointer that is out of bounds by one (one past the end of a), copied it and then dereferenced it.

The standard states that an out-of-bounds pointer may compare equal to a pointer to another object that happens to be placed adjacent in memory (6.5.9 pt 6). However, even though they compare equal, semantically they don't point to the same object.

In your case, you don't compare the pointers, you compare their bit patterns. Doesn't matter. The pointer pa1 is still considered to be a pointer to one past the end of an array.

Note that if you replace memcpy with some function you write yourself, the compiler won't know what value pa1 has but it can still statically determine that it cannot contain a "legally" obtained copy of &b.

Thus, the compiler optimizer is allowed to optimize the read/store of b in this case.

is a pointer's semantic "value" (its behavior according to the specification) determined only by its numerical value (the numerical address it contains), for a pointer of a given type?

No. The standard implies that valid pointers can only be obtained from objects using the address-of operator (&), by copying another valid pointer or by incrementing/decrementing a pointer inside the bounds of an array. As a special case, pointers one past the end of an array are valid but they must not be dereferenced. This might seem a bit strict but without it the possibility to optimize would be limited.
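The legitimate use of a one-past-the-end pointer is as an end marker: it can be formed and compared against, just never dereferenced. A minimal sketch (sum_ints is a hypothetical helper):

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints starting at arr. `end` is a
   one-past-the-end pointer that is formed and compared against,
   but never dereferenced. */
long sum_ints(const int *arr, size_t n)
{
    const int *end = arr + n;   /* valid: one past the last element */
    long total = 0;
    for (const int *p = arr; p != end; ++p)
        total += *p;            /* p is always strictly before end here */
    return total;
}
```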

if not, is it possible to copy only the physical address contained in a pointer while leaving out the associated semantics?

No, at least not in a way that is portable to any platform. In many implementations the pointer value is just the address. The semantics is in the generated code.

Oleta answered 17/8, 2015 at 12:26 Comment(23)
"Also note that pointer comparison is valid only if the two pointers are pointing to same array/memory block" what do you mean?Pang
@curiousguy: int x = 1, y = 2, *px = &x, *py = &y; -- since x and y are not in the same array, if ( px < py ) does not yield a defined result.Auld
If you compare &a[i] with &b, the result is not defined by the standard. The standard allows for magic, so that even if they compare equal, they don't have to point to the same memory.Margaretemargaretha
If an implementation defines the details of its type-layout rules (as most do), and if those rules imply that a one-past pointer for one object will hold the address of another, in what sense is the fact that the pointer holds the second object's address "coincidental"? The Standard may not require that an implementation define its layout rules such that the addresses would be equal, but if an implementation does describe its rules thusly, the fact that the addresses match is hardly coincidence.Ululant
Unfortunately, the Standard fails to make a distinction between constructs that are defined only on implementations which document their internals well enough to imply how they must behave, versus those which should be considered undefined even on implementations whose internals are documented enough to describe how they would behave absent compiler optimizations.Ululant
@Ululant Even if the standard leaves something as UB it is perfectly fine for an implementation to provide implementation-defined behaviour. The implementation can also state that the behaviour depends on the optimisation level. And a program can depend on such platform-specific behaviour, but comes with the cost of reduced portability.Margaretemargaretha
@KlasLindbäck: The problem is that very few specifications for anything define more than a tiny fraction of the aspects upon which consumers rely. The only way specs can be useful is if consumers are entitled to assume that if there's no plausible reason that something might not be/do X, it is/does X. If in the future a reason emerges that an object shouldn't do X, the proper resolution is to offer consumers a way of using an object that's just as good as having it do X, and then deprecate the old approach using X once the alternative is established.Ululant
@KlasLindbäck: If, as a cost-saving measure, a shirt vendor started selling "long-sleeved shirts" which only had one sleeve, should consumers who want two-sleeved shirts but didn't specify that be out of luck, or should the consumers be entitled to a refund on the basis that they ordered "normal" shirts, and it has long been recognized that normal shirts for humans have two sleeves?Ululant
@Ululant The vendors that sell you long-sleeved shirts with only one sleeve will go out of business. The same is true for C compilers. For example, if you read the documentation for gcc you will find that it not only allows you to choose sleeve length (optimization level), it also allows you to turn on warnings for code that might break when you enable a higher optimization level.Margaretemargaretha
@KlasLindbäck: From what I can tell, the documentation for gcc doesn't actually promise that settings like -fno-strict-aliasing or -fno-strict-overflow will actually yield the predictable behaviors they would seem to imply, and the authors of gcc have said they don't feel bound by any precedent that isn't explicitly documented. BTW, as of 6.2 gcc has aliasing bugs that make it botch code which is unambiguously defined by the Standard.Ululant
"The compiler optimizer can see" Yes, I know they often do that in practice. But whether they can legally do so is the question. So far, nobody could point to the specific statement that violates the standard.Pang
@Pang I've expanded on my answer. The strict aliasing concept is inferred in the standard so there is no single line or paragraph in the standard to refer to.Margaretemargaretha
@Pang Bo Persson's answer implies that using memcpy to copy a pointer doesn't necessarily result in a valid pointer.Margaretemargaretha
"valid pointers can only be obtained" That limited list implies that a cast doesn't produce a "valid pointer". It can't be right!Pang
@Pang A cast doesn't create a valid pointer. It can be used in some cases to transform a valid pointer to a valid pointer of another type. the source must still be a valid pointer.Margaretemargaretha
@KlasLindbäck "A cast doesn't create a valid pointer" It certainly does. You want to say that a cast must be from a valid pointer, and creates another pointer with the same numerical value. There is no new numerical value, like a byte wise copy from a valid pointer to another pointer object.Pang
@KlasLindbäck "The standard allows for magic," I get that this is the intent of the standard that magic is allowed. What I don't get is that bitwise copy is still possible on magical things. Accessing the representation breaks the magic (that is the whole point of these questions). Unless memcpy is magic, but then I will access the bytes, even the bits, until magic is broken. Unless there is a magic concept of copying the representation, but then copying is not well defined.Pang
@Pang The standard doesn't require that memcpy will correctly copy a pointer. As you can see in Bo Persson's answer, memcpy can break the magic in some cases on some platforms. If you aim for perfect portability you need to account for that. On the most common platforms, pointers are not magic and store the complete address and nothing else. On those platforms memcpy will correctly copy a pointer.Margaretemargaretha
@KlasLindbäck 1) When will memcpy not copy the pointer correctly?2) Is GNU/linux with GCC not a common platform?Pang
@Pang Bo Persson has given an example of a platform where memcpy would not copy all pointers correctly, 16 bit DOS/Windows. On gcc/Linux, pointers store the entire address and nothing else, so memcpy of a pointer does the same as assignment. Only difference is you don't need a cast if the pointers have different types when using memcpy.Margaretemargaretha
@KlasLindbäck I guess memcpy would work on "DOS/Windows" (in old 16 bits mode) on pointers into the same segment.Pang
@Pang Yes.Margaretemargaretha
@KlasLindbäck An implementation where memcpy() did not copy plain “pointers” correctly would violate the section of the C11 standard that enumerates, “Two pointers compare equal if and only if [....]” or the one that says any objects with the same object representation compare equal. An implementation with near and far pointers would need to make far the default, to conform to C11. Of course, the historical compilers long predate the 2011 Standard and cannot be expected to conform to it.Excitor

You have proven that it seems to work on a specific implementation. That doesn't mean that it works in general. In fact, it is undefined behavior where one possible outcome is exactly "seems to work".

If we go back to the MS-DOS era, we had near pointers (relative to a specific segment) and far pointers (containing both a segment and an offset).

Large arrays were often allocated in their own segment and only the offset was used as a pointer. The compiler already knew what segment contained a specific array, so it could combine the pointer with the proper segment register.

In that case, you could have two pointers with the same bit-pattern, where one pointer pointed into an array segment (pa) and another pointer pointed into the stack segment (pb). The pointers compared equal, but still pointed to different things.

To make it worse, far pointers with a segment:offset pair could be formed with overlapping segments so that different bit-patterns still pointed to the same physical memory address. For example 0100:0210 is the same address as 0120:0010.
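The aliasing comes from real-mode address translation, where the physical address is segment * 16 + offset. A small sketch (linear_address is a hypothetical helper, and it ignores the wrap-around at 1 MiB) reproduces the arithmetic for the two pairs above:

```c
#include <stdint.h>

/* Hypothetical helper: real-mode 8086 translation of segment:offset
   to a physical address (wrap-around at 1 MiB is ignored). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;   /* segment * 16 + offset */
}
```

Both 0100:0210 and 0120:0010 map to physical address 0x1210, even though their bit patterns differ.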

The C and C++ languages are designed so that this can work. That's why we have rules that comparing pointers only works (gives a total order) within the same array, and that pointers might not point to the same thing, even if they contain the same bit-pattern.

Struve answered 17/8, 2015 at 12:27 Comment(2)
So the compiler would have to track which pointers pointed to what array?Pang
@curiousguy: In the small and medium memory models, pointers with a far qualifier took four bytes to store and could access anything; those without a qualifier took two bytes to store and could access all objects created within the code via any means other than special "far malloc" calls; accesses via near pointers were often 2-3 times as fast as accesses via "far" pointers; while having to specify far pointers in some places was a bit of a nuisance, the performance benefits from using small or medium model instead of large model were often pretty huge.Ululant

Prior to C99, implementations were expected to behave as though the value of every variable of any type was stored as a sequence of unsigned char values; if the underlying representations of two variables of the same type were examined and found to be equal, that would imply that unless Undefined Behavior had already occurred, their values would generally be equal and interchangeable. There was a little bit of ambiguity in a couple places, e.g. given

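#include <stdlib.h>
#include <string.h>
int main(void) {
  char *p,*q;
  p = malloc(1);
  free(p);
  q = malloc(1);
  if (!memcmp(&p, &q, sizeof p))
    p[0] = 1; // p's value is indeterminate after free(p)
  free(q);
}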

every version of C has made abundantly clear that q may or may not equal p, and if q isn't equal to p, code should expect that anything might happen when p[0] is written. While the C89 Standard does not explicitly say that an implementation may only have p compare bitwise equal to q if a write to p would be equivalent to a write to q, such behavior would generally be implied by the model of variables being fully encapsulated in sequences of unsigned char values.

C99 added a number of situations where variables may compare bitwise equal but not be equivalent. Consider, for example:

extern int doSomething(char *p1, char *p2);
int act1(char * restrict p1, char * restrict p2)
  { return doSomething(p1,p2); }
int act2(char * restrict p)
  { return doSomething(p,p); }
char x[4];
int act3a(void) { return act1(x,x); }
int act3b(void) { return act2(x); }
int act3c(void) { return doSomething(x,x); }

Calling act3a, act3b, or act3c will cause doSomething() to be invoked with two pointers that compare equal to x, but if invoked through act3a, any element of x which is written within doSomething must be accessed exclusively using x, exclusively using p1, or exclusively using p2. If invoked through act3b, the function would gain the freedom to write elements using p1 and access them via p2 or vice versa. If invoked through act3c, the function could use p1, p2, and x interchangeably. Nothing in the binary representations of p1 or p2 would indicate whether they could be used interchangeably with x, but a compiler would be allowed to inline doSomething within act1 and act2 and have the behavior of those expansions vary according to what pointer accesses were allowed and forbidden.
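A smaller sketch of the same idea, using a hypothetical function add_into: the restrict qualifiers promise that, inside the function, an element modified through dst is not also accessed through src. Calling it with distinct arrays is fine; calling add_into(x, x, n) would pass two pointers with identical bit patterns and still be undefined behavior.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] += src[i]. The restrict qualifiers let the
   compiler assume the two ranges do not overlap, so the caller must pass
   non-overlapping arrays. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```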

Ululant answered 17/8, 2015 at 16:19 Comment(5)
n1548 allows for one and only one situation in which objects with the same object representation may legally be compared at all and do not compare equal: “Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.” It would be UB to compare these two pointers at all, which is what I think you meant, but the MCVE in the question doesn’t do that. It only compares the object representations.Excitor
@Davislor: In the first example above, the comparison of the bit representations is not UB, and if the representations of p and q don't match there will be no UB. If the comparisons do match, however, code would attempt to access p[0], which would invoke UB despite the fact that p's representation would have been observed to match q's.Ululant
Since the question requested direct citations, could you expand on what kind of UB that would invoke? Thanks.Excitor
@Davislor: N1570 6.2.4p2: "The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime." and Annex J.2: "The value of a pointer to an object whose lifetime has ended is used (6.2.4).". Once the free() is executed, the value of p becomes indeterminate. Nothing in the Standard would imply any means via which a comparison could cause the value of p to cease being indeterminate.Ululant
Ah! I apologize for the miscommunication: I was thinking of the MCVE in the OP, and you were thinking of the example in your answer. Yes, that is UB. I should have noticed that the variable names were different.Excitor

No. We cannot even infer that either branch of this code works given any particular result of memcmp(). The object representations that you compare with memcmp() might be different even if the pointers would be equivalent, and the pointers might be different even if the object representations match. (I’ve changed my mind about this since I originally posted.)

You try to compare an address one-past-the-end of an array with the address of an object outside the array. The Standard (§6.5.8.5 of draft n1548, emphasis added) has this to say:

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

It repeats this warning that the result of comparing the pointers is undefined, in appendix J.

Also undefined behavior:

An object which has been modified is accessed through a restrict qualified pointer to a const-qualified type, or through a restrict-qualified pointer and another pointer that are not both based on the same object

However, none of the pointers in your program are restrict-qualified. Neither do you do illegal pointer arithmetic.

You try to get around this undefined behavior by using memcmp() instead. The relevant part of the specification (§7.23.4.1) says:

The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2.

So memcmp() compares the bytes of the object representations. Whether the bytes of pa1 and pb match is already implementation-specific: they will be the same on some implementations, but not on others.

§6.2.6.1 of the Standard makes the following guarantee:

Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

What does it mean for pointer values to compare equal? §6.5.9.6 tells us:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

That last clause, I think, is the clincher. Not only can two pointers that compare equal have different object representations, but two pointers with the same object representation might not be equivalent if one of them is a one-past-the-end pointer like &a[0]+1 and another is a pointer to an object outside the array, like &b. Which is exactly the case here.

Excitor answered 11/12, 2018 at 23:52 Comment(15)
There is one thing in the last paragraph that has been proven false over and over again. memcmp() == 0 implies only bit-pattern equivalence; it neither implies value equivalence nor is required for it. See the committee response to DR 260: "They may also treat pointers based on different origins as distinct even though they are bitwise identical."Scrotum
After re-reading, I’ve changed my mind. Since a one-past-the-end pointer might compare equal to a different object, but not necessarily alias it, a conforming compiler could cause this code to fail even if the branch is taken.Excitor
@AnttiHaapala I've removed the paragraph you objected to and extensively revised.Excitor
@Davislor: The Standard deliberately refrains from trying to forbid compilers that are specialized for narrow purposes from behaving in ways that make them unsuitable for most others. It consequently makes no effort to forbid compilers from behaving in ways that would make them unsuitable for any purpose whatsoever. Instead, it expects compiler writers to make a bona fide effort to uphold the Spirit of C described in the charter and rationale documents, including "Don't prevent the programmer from doing what needs to be done".Ululant
@Davislor: The same problem alluded to here with memcpy also exists with round-trip conversions through uintptr_t. The Standard guarantees that pointers that are converted through uintptr_t will compare equal to the original, but doesn't mandate that they'll be usable for any particular purpose because they expected that implementations would allow such pointers to be used in cases that make sense, without having to be ordered to do so.Ululant
@Ululant As we have discussed before, the Standard does guarantee that the pointers compare equal, which is one purpose. §6.5.9.6 additionally enumerates the only cases in which pointers are allowed to compare equal. This question discusses the special case enumerated there. In all other cases, you are overstating things. For example, a pointer to an element in the middle of an array cannot possibly, even under the most deliberately-sadistic gotcha implementation, be subject to that special case, as it is neither one past the end of nor at the beginning of any array.Excitor
@Ululant Agreed that round-trip casts present the same issue.Excitor
"The object representations that you compare with memcmp() might be different even if the pointers would be equivalent" Indeed that's a given, and the reason why the question explicitly states "The only interesting part is in the "pa1 == pb" branch.". The question is only meaningful if they have the same representation.Pang
@Pang I see the code fragment has been heavily modified, but the answer stands: we cannot guarantee that the branch is taken, not even on a conforming compiler where the pointers would compare equal. So the rewrite does not guarantee truthiness of the guard expression. Furthermore, even in the case where the branch is taken, we cannot conclude that pa1 points to b. All we’re guaranteed, even then, is that the pointers would compare equal, but a one-past-the-end pointer and a pointer to a different array might compare equal and not be equivalent.Excitor
@Pang That said, any sane implementation would treat two pointers that compare equal as aliasing the same object.Excitor
@Excitor "but a one-past-the-end pointer and a pointer to a different array might compare equal and not be equivalent" I have no problem with that. The issue is whether pointer equality is (according to spec) or should be the same as math equality.Pang
@Pang “values that compare equal may have different object representations.”Excitor
@Pang So, the implication does not work in either direction: two pointers might compare equal but have different object representations, and also, two pointers might have the same object representation but might not compare equal.Excitor
@Excitor "two STUFF might have the same object representation but might not compare equal" Is there a proper name for that stuff? Which other types have this property?Pang
@Pang That’s a good question. The text explicitly gives one other case: NaNs. I can immediately think of one other (an implementation might support negative zero, and it is “unspecified” when negative zero converts to positive zero). Another corner case: a pointer might have a different object representation when converted to void*, which might cause bugs if code assumes the object representations of p and (void*)p are identical. These cannot happen on a mainstream implementation, though. I think some compilers track the provenance of integral values derived from pointers.Excitor

I say no, without resorting to the UB tarpit. Consider the following code:

extern int f(int x[3], int y[4]);

....
    int   a[7];
    return f(a, a) + f(a+4, a+3);
...

The C standard should not prevent me from writing a compiler which performs bounds checking; there are several available. A bounds-checking compiler would have to fatten the pointers by augmenting them with bounds information (*). So when we get to f():

....
    if (x == y) {
....

f() would be interested in the C notion of equality, that is, whether they point at the same location, not whether they have identical types. If you aren’t happy with this, suppose f() called g(int *s, int *t), and g() contained a similar test. The compiler would perform the comparison without comparing the fat.

The pointer size, sizeof(int *), would have to include the fat, so a memcmp() of two pointers would compare it as well, and could thus give a different result from the == comparison.

(*) Yes, you could instead store such info in a dynamic associative array, but that could result in the program aborting because of resource shortfalls, and may introduce tracking problems with memcpy, alloc and free.

PS: should we introduce a new tag for navel gazing?

Necrophobia answered 13/12, 2018 at 5:21 Comment(5)
PPS: there is no requirement that int a[1], b occupy contiguous memory locations; except within a structure, the compiler is under no contract to lay out independent variables in any way.Necrophobia
int a[1], b need not be contiguous in a struct, just increasing in addresses. An arbitrary number of padding bytes is allowed.Scrotum
Not arbitrary. Since a[1] is aligned to int, which is the size of a[1], then b is contiguous in a world of ints. Nitpicking, I agree, but arbitrary it is not. If you use some alignment attribute, it can become arbitrary, but that isn’t C; it is a language of your own design that looks like C.Necrophobia
It is true in practice on most platforms, which does not mean that it is intrinsically true. For example, a compiler could insert a canary after a[1] to ensure that it is not overwritten, and track the values it finds there... the standard does not forbid it, just like you said :DScrotum
1) I agree that bound checking is useful and allowing such implementations is desirable. 2) Storing information about C objects in a global association map does not seem compatible with allowing objects to be copied via memcpy, or a user re-implementation of that function via unsigned char *.Pang

The question, as I understand it, is:

Is memcpy of a pointer the same as assignment?

And my answer would be yes.

memcpy is basically an optimized assignment for variable-length data that has no memory alignment requirements. It's pretty much the same as:

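#include <stddef.h>

void slow_memcpy(void * target, const void * src, size_t len) {
  unsigned char * t = target;      /* byte view of the destination */
  const unsigned char * s = src;   /* byte view of the source */
  for (size_t i = 0; i < len; ++i)
  {
    t[i] = s[i];                   /* copy one byte of the object representation */
  }
}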

is a pointer's semantic "value" (its behavior according to the specification) determined only by its numerical value (the numerical address it contains), for a pointer of a given type?

Yes. There are no hidden data fields in C, so the pointer's behavior is totally dependent on its numerical data content.

However, pointer arithmetic is resolved by the compiler and depends on the pointer's type.

Arithmetic on a char * str pointer uses char units (i.e., str[1] is one char away from str[0]), while arithmetic on an int * p_num pointer uses int units (i.e., p_num[1] is one int away from p_num[0]).

Are two pointers with identical bit patterns allowed to have different behavior? (edit)

Yes and no.

They point to the same location in the memory and in this sense they are identical.

However, pointer resolution might depend on the pointer's type.

For example, dereferencing a uint8_t * reads only 8 bits from memory, whereas dereferencing a uint64_t * reads 64 bits starting at the same address.

Another difference is pointer arithmetic, as described above.

However, when using functions such as memcpy or memcmp, the pointers will behave the same.


So why does everybody say "No"?

Well, that's because the code in your example doesn't reflect the question in the title. The code’s behavior is undefined, as clearly explained by the many answers.

(edit):

The issues with the code have little to do with the actual question.

Consider, for example, the following line:

int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;

In this case, pa1 points to a[1], one past the end of a, which is out of bounds.

This pretty much throws the code into undefined behavior territory, which distracted many answers away from the actual question.

Margueritamarguerite answered 16/12, 2018 at 17:14 Comment(10)
@Pang I updated my answer to address the question about the code (or the issues people might have found with it) and to add more details about the different behaviors that identical pointers might possess... but for all intents and purposes, the value of a pointer is its numerical value; any difference in behavior is compiler controlled.Margueritamarguerite
@Pang, it's a basic idiom used to calculate the out-of-bounds boundary... but you're testing its value for a different purpose, in a way that is compiler and system dependent... as explained by others in their answers (I'm just pointing out that this distracted them from the original question, don't blame the messenger).Margueritamarguerite
When are the pointers identical? The answer isn’t language specific, it’s system and compiler specific (or, in C language terms, the behavior wasn’t defined by the C specifications, hence, it’s undefined behavior). The answers that ignore the original question focus on this point, which is clearly beside the point.Margueritamarguerite
The pointers have identical representations when they do. Then the comparison of their representation (via memcmp) indicates they have identical representation. Are you saying that such test is disallowed in C?Pang
@Pang, it’s not disallowed, but its results are undefined, resulting in code that acts differently on different systems, compilers or optimization levels (with the same compiler). This is considered undesirable in code. For example, one compiler might optimize away the pointers and leave only a and b on the stack. One system might place b before a in memory. Suddenly, the code is unpredictable.Margueritamarguerite
People write code that branches on stuff that's implementation specific all the time. It's often hard to avoid, and impossible to avoid while writing efficient code; what matters is that the code still runs correctly in all cases.Pang
@Pang , I’m not arguing with you about your code. I was just explaining where other answers might have gone off topic. I’m pretty sure your argument isn’t with me.Margueritamarguerite
The issue isn't whether any memory layout and distance between distinct unrelated objects is guaranteed: it's NOT. Definitely. The issue is whether two pointers to the same byte, of the same type, can be semantically different.Pang
Are you saying that doing byte wise comparison on any object which isn't a char array, in C, makes the behavior of the program undefined? Or just non deterministic?Pang
@Pang , I feel that you are deeply misunderstanding what I wrote. All I said was that, in practice, the bit value of pointers defines their content, so that pointers with the same bit value will point to the same location in the memory. I'm not sure why my answer appears unclear, but I'm sorry if that's the case.Margueritamarguerite
