Can you memcpy between non-overlapping regions of the same object?

C17 says the following about memcpy [7.24.2.1p2]:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

The common interpretation is that you may not copy overlapping regions of memory. But that is not quite the same thing, since it is possible to have non-overlapping regions that are part of the same object.

Imagine a typical implementation where sizeof(unsigned int) == 4 and sizeof(unsigned short) == 2. For simplicity, suppose further that there are no trap representations, and that alignof(unsigned short) <= alignof(unsigned int). Consider:

unsigned int x = 0xdeadbeef;
unsigned short *p = (unsigned short *)&x;
memcpy(&x, p+1, 2);
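
The size and alignment assumptions stated for this example can be checked at compile time (the absence of trap representations cannot be checked this way). A minimal sketch, assuming a C11 compiler; the assertion messages are my own wording:

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */

/* Assumptions from the question, verified by the compiler. */
static_assert(sizeof(unsigned int) == 4,
              "example assumes a 4-byte unsigned int");
static_assert(sizeof(unsigned short) == 2,
              "example assumes a 2-byte unsigned short");
static_assert(alignof(unsigned short) <= alignof(unsigned int),
              "the cast to unsigned short * needs compatible alignment");
```

If any of these fail, the translation unit does not compile, so the rest of the example is only ever built on an implementation where the premises hold.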

Interpretation #1: I copied between x and x, and the object x of course overlaps itself, so I caused UB.

Interpretation #2: Since the <string.h> functions manipulate "objects treated as arrays of character type" [7.24.1p1], I am effectively treating x as an array unsigned char[4], and I just copied elements 2 and 3 of that array to elements 0 and 1. Those unsigned char objects do not overlap in any way, so I have not caused UB. I get the same effect as if I did unsigned char *q = (unsigned char *)&x; q[0] = q[2]; q[1] = q[3];. (Of course the resulting value of x will be implementation-defined and could be 0xdeaddead or 0xbeefbeef or something else.)
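
To make the equivalence claimed in interpretation #2 concrete, here is a minimal sketch of the byte-wise copy, which is well defined regardless of how the memcpy wording is read. The helper name copy_bytewise is hypothetical, and the sketch assumes a 4-byte unsigned int with no padding bits:

```c
#include <assert.h>  /* static_assert (C11) */

static_assert(sizeof(unsigned int) == 4,
              "sketch assumes a 4-byte unsigned int");

/* Hypothetical helper: copy bytes 2..3 of x's object representation
   onto bytes 0..1, one unsigned char at a time. */
unsigned int copy_bytewise(unsigned int x)
{
    unsigned char *q = (unsigned char *)&x;  /* character access is always allowed */
    q[0] = q[2];
    q[1] = q[3];
    return x;
}
```

On a little-endian implementation copy_bytewise(0xdeadbeef) yields 0xdeaddead; on a big-endian one it yields 0xbeefbeef.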

Which interpretation is correct, if either?

Does it make any difference if I instead write memcpy(p, p+1, 2), in which case I am arguably copying between the non-overlapping objects p[1] and p[0]?


In case there is something wrong with using unsigned short in the example above, consider instead:

unsigned int x = 0xdeadbeef;
unsigned char *p = (unsigned char *)&x;
memcpy(&x, p+2, 2);
// or
memcpy(p, p+2, 2);

to which all the same arguments should apply.
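
For comparison, either variant can be rewritten so that the question does not arise: memmove is specified to behave as if the bytes passed through a temporary array that overlaps neither operand [7.24.2.2p2], and two memcpy calls through an explicit temporary only ever copy between distinct objects. A sketch with hypothetical helper names, assuming a 4-byte unsigned int:

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* memcpy, memmove */

static_assert(sizeof(unsigned int) == 4,
              "sketch assumes a 4-byte unsigned int");

/* Hypothetical helper: route the two bytes through a separate array,
   so each memcpy call copies between two distinct objects. */
unsigned int copy_via_temp(unsigned int x)
{
    unsigned char tmp[2];
    memcpy(tmp, (unsigned char *)&x + 2, sizeof tmp);  /* x   -> tmp */
    memcpy(&x, tmp, sizeof tmp);                       /* tmp -> x   */
    return x;
}

/* Hypothetical helper: memmove permits overlapping operands outright. */
unsigned int copy_via_memmove(unsigned int x)
{
    unsigned char *p = (unsigned char *)&x;
    memmove(p, p + 2, 2);
    return x;
}
```

Both helpers produce the same value as the byte-by-byte assignment described in interpretation #2; neither settles which reading of the memcpy wording is correct.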


(This question follows up on a comment from ixSci on another question.)

Strategic answered 24/5, 2022 at 2:22 Comment(10)
Does it make any difference if the definition of object is "region of memory"? - Bamboozle
Nate, I suspect it remains UB (#1), as the 4-byte x may "live" in a memory region where a sub-4-byte write involves a read-modify-write, and the other bytes cannot be assumed to be stable or readable. This is a fairly esoteric memory model, so I doubt memcpy(&x, p+1, 2); will be a problem on common architectures. Interesting question. - Concomitant
I'm no expert, but just reading the Standard at face value, it seems to me like the "objects" referred to are the original objects that truly live at the memory location, not the chars that they consist of. What makes me think this is the deliberate wording "objects treated as arrays of character type" and "copies n characters from the object pointed to by s2 into the object pointed to by s1". It could've easily replaced "object" with "char array" if it just meant copying plain char arrays. - Benildas
Also, I'm not sure if this is orthogonal to the issue at hand, but is the cast from unsigned* to unsigned short* allowed by the Standard? I know C++ would generally disallow that, not sure about C though. - Benildas
@MCΔT: The cast should be fine [6.3.2.3p7], provided that alignof(unsigned short) <= alignof(unsigned int), which I guess I should add as an assumption but would be true on any halfway reasonable implementation. Dereferencing p would violate strict aliasing, but I didn't do that. If that worries you, you can change unsigned short to unsigned char, adjust sizes accordingly, and the basic issue remains. - Strategic
I think this question would be better reframed to be about C++, because it might be easier to answer in terms of C++, which has an explicit object model, lifetimes, etc. I also think this is one of those cases where both the C and C++ tags are appropriate, because C++ inherits memcpy from C and has stricter Standard wording. - Cryptonymous
Well, personally I'm more interested in the answer for C. You could post a new question for C++, or wait to see if this gets an answer and then check whether its logic would apply to C++ too. - Strategic
Dup of Question 1 from open-std.org/jtc1/sc22/wg14/www/docs/dr_042.html - Randers
@LanguageLawyer: So I guess that argues for interpretation #2? With slightly different reasoning: that the two non-overlapping objects in question are essentially of type unsigned char[2]. - Strategic
@LanguageLawyer after reading that I'm not sure the writers of the Standard could have written their intent in a more confusing manner! "an object is a region of data storage ... composed of contiguous sequences of one or more bytes", thus apparently any contiguous sequence of bytes is an object? That comes as a surprise to me... - Benildas
