C17 says the following about memcpy [7.24.2.1p2]:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
The common interpretation is that you may not copy overlapping regions of memory. But that is not quite the same thing, since it is possible to have non-overlapping regions that are part of the same object.
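To make the distinction concrete, here is a minimal sketch of what "non-overlapping regions" means at the byte level. The helper name byte_ranges_overlap and the demo function are my own, not anything from the standard. The comparison goes through uintptr_t, which assumes a flat address space where that conversion preserves address order; the standard does not guarantee this, and uintptr_t itself is an optional type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the n-byte regions starting at a and b
   share at least one byte, 0 otherwise. Assumes converting to uintptr_t
   preserves address order (typical on flat-memory implementations, but
   not guaranteed by the standard). */
static int byte_ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return ua < ub ? (ub - ua < n) : (ua - ub < n);
}

/* Two halves of one object: disjoint as byte ranges, yet both lie in buf. */
static void demo_disjoint_halves(void)
{
    unsigned char buf[4] = {0};
    assert(!byte_ranges_overlap(buf, buf + 2, 2)); /* bytes 0-1 vs bytes 2-3 */
    assert(byte_ranges_overlap(buf, buf + 1, 2));  /* bytes 0-1 vs bytes 1-2 */
}
```

The check only says whether the byte ranges intersect. Whether that is what the standard means by "objects that overlap" is exactly what is in doubt here.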
Imagine a typical implementation where sizeof(unsigned int) == 4 and sizeof(unsigned short) == 2. For simplicity, suppose further that there are no trap representations, and that alignof(unsigned short) <= alignof(unsigned int). Consider:
unsigned int x = 0xdeadbeef;
unsigned short *p = (unsigned short *)&x;
memcpy(&x, p+1, 2);
Interpretation #1: I copied between x and x, and the object x of course overlaps itself, so I caused UB.
Interpretation #2: Since the <string.h> functions manipulate "objects treated as arrays of character type" [7.24.1p1], I am effectively treating x as an array unsigned char[4], and I just copied elements 2 and 3 of that array to elements 0 and 1. Those unsigned char objects do not overlap in any way, so I have not caused UB. I get the same effect as if I did unsigned char *q = (unsigned char *)&x; q[0] = q[2]; q[1] = q[3]; (Of course the resulting value of x will be implementation-defined and could be 0xdeaddead or 0xbeefbeef or something else.)
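The byte-wise reading in Interpretation #2 can be sketched as follows. I use uint32_t in place of unsigned int to pin the width to 4 bytes with no padding bits; that substitution and the function names are my own assumptions. The assertion further assumes a little-endian or big-endian byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copies bytes 2 and 3 of x onto bytes 0 and 1 using
   plain unsigned char accesses, which are allowed to alias any object.
   No memcpy is involved, so the overlap rule does not come into play. */
static uint32_t copy_high_index_bytes_down(uint32_t x)
{
    unsigned char *q = (unsigned char *)&x;
    q[0] = q[2];
    q[1] = q[3];
    return x;
}

static void demo_bytewise_copy(void)
{
    uint32_t r = copy_high_index_bytes_down(UINT32_C(0xdeadbeef));
    /* Little-endian yields 0xdeaddead, big-endian yields 0xbeefbeef.
       Other byte orders (assumed absent here) could yield something else. */
    assert(r == UINT32_C(0xdeaddead) || r == UINT32_C(0xbeefbeef));
}
```

This shows only what the result would be if Interpretation #2 holds; it does not settle whether the memcpy call itself is defined.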
Which interpretation is correct, if either?
Does it make any difference if I instead write memcpy(p, p+1, 2), in which case I am arguably copying between the non-overlapping objects p[1] and p[0]?
If there is something wrong with using unsigned short in the example above, consider instead:
unsigned int x = 0xdeadbeef;
unsigned char *p = (unsigned char *)&x;
memcpy(&x, p+2, 2);
// or
memcpy(p, p+2, 2);
to which all the same arguments should apply.
(This question is following up on a comment from ixSci on another question.)
… object is "region of memory"? – Bamboozlex
… may "live" in a memory region where sub 4-byte write involves a read-modify-write and the state of the other bytes are not assumable to be stable or accessible to read from. This is a fairly esoteric memory model, so I doubt memcpy(&x, p+1, 2); will be a problem on common architectures. Interesting question. – Concomitant
… chars that they consist of. What makes me think this is the deliberate wording "objects treated as arrays of character type" and "copies n characters from the object pointed to by s2 into the object pointed to by s1". It could've easily replaced "object" with "char array" if it just meant copying plain char arrays. – Benildas
… unsigned* to unsigned short* allowed by the Standard? I know C++ would generally disallow that, not sure about C though. – Benildas
… alignof(unsigned short) <= alignof(unsigned int), which I guess I should add as an assumption but would be true on any halfway reasonable implementation. Dereferencing p would violate strict aliasing but I didn't do that. But if worried you can change unsigned short to unsigned char, adjust sizes accordingly, and the basic issue remains. – Strategic
… unsigned char[2]. – Strategic