How to implement memmove in standard C without an intermediate copy?
Asked Answered
C

5

38

From the man page on my system:

void *memmove(void *dst, const void *src, size_t len);

DESCRIPTION
The memmove() function copies len bytes from string src to string dst.
The two strings may overlap; the copy is always done in a non-destructive
manner.

From the C99 standard:

6.5.8.5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, theycompare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

The emphasis is mine.

The arguments dst and src can be converted to pointers to char so as to alleviate strict aliasing problems, but is it possible to compare two pointers that may point inside different blocks, so as to do the copy in the correct order in case they point inside the same block?

The obvious solution is if (src < dst), but that is undefined if src and dst point to different blocks. "Undefined" means you should not even assume that the condition returns 0 or 1 (this would have been called "unspecified" in the standard's vocabulary).

An alternative is if ((uintptr_t)src < (uintptr_t)dst), which is at least unspecified, but I am not sure that the standard guarantees that when src < dst is defined, it is equivalent to (uintptr_t)src < (uintptr_t)dst). Pointer comparison is defined from pointer arithmetic. For instance, when I read section 6.5.6 on addition, it seems to me that pointer arithmetic could go in the direction opposite to uintptr_t arithmetic, that is, that a compliant compiler might have, when p is of type char*:

((uintptr_t)p)+1==((uintptr_t)(p-1)

This is only an example. Generally speaking very little seems to be guaranteed when converting pointers to integers.

This is a purely academic question, because memmove is provided together with the compiler. In practice, the compiler authors can simply promote undefined pointer comparison to unspecified behavior, or use the relevant pragma to force their compiler to compile their memmove correctly. For instance, this implementation has this snippet:

if ((uintptr_t)dst < (uintptr_t)src) {
            /*
             * As author/maintainer of libc, take advantage of the
             * fact that we know memcpy copies forwards.
             */
            return memcpy(dst, src, len);
    }

I would still like to use this example as proof that the standard goes too far with undefined behaviors, if it is true that memmove cannot be implemented efficiently in standard C. For instance, no-one ticked when answering this SO question.

Cavitation answered 26/10, 2010 at 11:49 Comment(3)
6.5.8.5 seems very badly worded, one way or another. It starts with a simple statement: "he result depends on the relative locations in the address space of the objects pointed to." THEN it runs off all prescriptive without being at all clear what happens if one of the prescriptive rules conflicts with the relative-locations-in-address-space rule. Is it trying to prescribe how structs should be laid out, or how pointers should compare?Tonkin
+1 for well-explained and interesting question.Ingrid
This is not only academic, a blitter with support for overlapping regions has exactly this issue, see for example my patch at winehq.org/pipermail/wine-patches/2008-March/051766.html (it uses memmove for each line, and an outer for loop with step and starting lines selected by a pointer comparison).Tiatiana
M
24

I think you're right, it's not possible to implement memmove efficiently in standard C.

The only truly portable way to test whether the regions overlap, I think, is something like this:

for (size_t l = 0; l < len; ++l) {
    if (src + l == dst) || (src + l == dst + len - 1) {
      // they overlap, so now we can use comparison,
      // and copy forwards or backwards as appropriate.
      ...
      return dst;
    }
}
// No overlap, doesn't matter which direction we copy
return memcpy(dst, src, len);

You can't implement either memcpy or memmove all that efficiently in portable code, because the platform-specific implementation is likely to kick your butt whatever you do. But a portable memcpy at least looks plausible.

C++ introduced a pointer specialization of std::less, which is defined to work for any two pointers of the same type. It might in theory be slower than <, but obviously on a non-segmented architecture it isn't.

C has no such thing, so in a sense, the C++ standard agrees with you that C doesn't have enough defined behaviour. But then, C++ needs it for std::map and so on. It's much more likely that you'd want to implement std::map (or something like it) without knowledge of the implementation than that you'd want to implement memmove (or something like it) without knowledge of the implementation.

Mcdermott answered 26/10, 2010 at 12:18 Comment(6)
Oh, beautiful (comparison for equality of only valid pointers).Cavitation
I hope that by "beautiful", you mean "My eyes! The goggles do nothing!" ;-)Mcdermott
No, I mean "beautiful" because the pseudo-code is technically, as far as I can see, an answer to the question, rather than a discussion about why the C standards are the way they are.Cavitation
Notice that this same code is likely not correct in C++, where comparing a one-past-the-end pointer to another equal pointer does not have to return the "correct" answer. But you claimed only C, after all. (However, AFAIK in practice compilers optimize these one-past-the-end pointers the same way in C and C++, so this would still be a problem.)Eke
Yeah, you're correct. It should be dst + len - 1 at the end of the second line.Toxicosis
The reason being, dst + len would give false positives for two sequential arrays of size 1 in memory.Toxicosis
P
8

For two memory areas to be valid and overlapping, I believe you would need to be in one of the defined situations of 6.5.8.5. That is, two areas of an array, union, struct, etc.

The reason other situations are undefined are because two different objects might not even be in the same kind of memory, with the same kind of pointer. On PC architectures, addresses are usually just 32-bit address into virtual memory, but C supports all kinds of bizarre architectures, where memory is nothing like that.

The reason that C leaves things undefined is to give leeway to the compiler writers when the situation doesn't need to be defined. The way to read 6.5.8.5 is a paragraph carefully describing architectures that C wants to support where pointer comparison doesn't make sense unless it's inside the same object.

Also, the reason memmove and memcpy are provided by the compiler is that they are sometimes written in tuned assembly for the target CPU, using a specialized instruction. They are not meant to be able to be implemented in C with the same efficiency.

Provided answered 26/10, 2010 at 11:59 Comment(4)
Yes, but the specification of memmove does not say that it is invalid to call it with pointers inside different blocks. If it is invalid, the specification should say it so that programmers can use memcpy instead in this case.Cavitation
@Pascal- It shouldn't be invalid to do that. Your implementation of memmove should be smart enough to detect this and run memcpy instead. The programmer shouldn't have to worry about which function to call.Divinize
There is no way to detect that in just C, but yes, it needs to work in both cases. It isn't a requirement that memcpy and memmove need to be implementable with just C -- in fact, perfect versions probably cannot be, which is all the more reason why the standard wants compiler writers to provide it.Provided
@Lou Franco: +1 Standard headers and the standard library are required to implement functions and macros to do many things which cannot otherwise be done in portable C. As you note, that's a big part of why the standard macros and libraries exist in the first place. If there were some other efficient and portable way to copy blocks of memory that may or may not overlap, memmove would be redundant. One wouldn't expect to provide the functionality of stdarg.h or setjmp.h in a way that works on all platforms, without using stdarg.h; why should memmove() be any different?Morvin
D
2

For starters, the C standard is notorious for having problems in the details like this. Part of the problem is because C is used on multiple platforms and the standard attempts to be abstract enough to cover all current and future platforms (which might use some convoluted memory layout that's beyond anything we've ever seen). There is a lot of undefined or implementation-specific behavior in order for compiler writers to "do the right thing" for the target platform. Including details for every platform would be impractical (and constantly out-of-date); instead, the C standard leaves it up to the compiler writer to document what happens in these cases. "Unspecified" behavior only means that the C standard doesn't specify what happens, not necessarily that the outcome cannot be predicted. The outcome is usually still predictable if you read the documentation for your target platform and your compiler.

Since determining if two pointers point to the same block, memory segment, or address space depends on how the memory for that platform is laid out, the spec does not define a way to make that determination. It assumes that the compiler knows how to make this determination. The part of the spec you quoted said that result of pointer comparison depends on the pointers' "relative location in the address space". Notice that "address space" is singular here. This section is only referring to pointers that are in the same address space; that is, pointers that are directly comparable. If the pointers are in different address spaces, then the result is undefined by the C standard and is instead defined by the requirements of the target platform.

In the case of memmove, the implementor generally determines first if the addresses are directly comparable. If not, then the rest of the function is platform-specific. Most of the time, being in different memory spaces is enough to ensure that the regions don't overlap and the function turns into a memcpy. If the addresses are directly comparable, then it's just a simple byte copy process starting from the first byte and going forward or from the last byte and going backwards (whichever one will safely copy the data without clobbering anything).

All in all, the C standard leaves a lot intentionally unspecified where it can't write a simple rule that works on any target platform. However, the standard writers could have done a better job explaining why some things are not defined and used more descriptive terms like "architecture-dependent".

Divinize answered 26/10, 2010 at 13:13 Comment(7)
I understand the compromises of defining a standard for existing and future implementations. My question was not a complaint about C (millions of programmers can't be all wrong), but having to make one own's classification between undefined-but-acceptable constructs and plain undefined constructs is a burden on each low-level C programmer (and on me!). And I can imagine that sometimes, you have to write your own memmove-like function because standard memmove is not exactly what you need; you'd like to write it in C, and you'd like it to avoid the intermediate copy.Cavitation
Also, the C standard leaves things to define and document to the compiler writer, but some compilers are written by people who do not care to provide additional guarantees to those of the standard. I have heard of compiler writers wriggling out of penalties for generating incorrect code when accessing an union field other than the last one written to by invoking the standard. That shouldn't be allowed (gcc promotes this from undefined to implementation-defined, as an example of what compiler writers should typically document).Cavitation
@Pascal: there are potential good reasons for that union example - in order to make cast-through-union work, you might have to make exceptions to optimisations that rely on the strict aliasing rules. I don't think that compiler writers "should" do this (except that because they want a lot of existing incorrect code to work, they have to): I think programmers shouldn't use the idiom.Mcdermott
@Steve Jessop That was an embedded compiler. It had to provide a way to convert some bits from a type to another (again, without requiring an intermediate copy) or it was unusable for its intended purpose. Gcc's choice to allow this when done through an union and disallowing it through pointer casts at least gives you a way to do it.Cavitation
@Pascal: why does it have to provide a way to do it without a copy? That said, I think there are some cases where a union cast is valid in C99, assuming that you know the representations of the type involved. I can just never remember what they are.Mcdermott
@SteveJessop: Why does a language "have to" provide anything? If there are useful behaviors which compilers for the vast majority of platforms could make available to programmers without having to do anything other than refrain from assuming that programmers won't take advantage of them, is it more helpful to say that compilers should make such behaviors available unless programmers waive them, or at minimum define a standardized means by which programmers can specify when they need such behaviors [and which compilers that would make such behaviors available anyway could simply ignore]...Morvin
...or to say that programmers should write code which refrains from using the natural behavior in cases where there's a 100% chance that the code will be longer and more cumbersome and harder to compile efficiently, and a strong likelihood that the code will end up slower, all to allow for the possibility that such efforts might allow some compilers to yield more efficient code than if programmers weren't required to jump through such hoops?Morvin
I
2

Here's another idea, but I don't know if it's correct. To avoid the O(len) loop in Steve's answer, one could put it in the #else clause of an #ifdef UINTPTR_MAX with the cast-to-uintptr_t implementation. Provided that cast of unsigned char * to uintptr_t commutes with adding integer offsets whenever the offset is valid with the pointer, this makes the pointer comparison well-defined.

I'm not sure whether this commutativity is defined by the standard, but it would make sense, as it works even if only the lower bits of a pointer are an actual numeric address and the upper bits are some sort of black box.

Ingrid answered 26/10, 2010 at 15:45 Comment(3)
It doesn't necessarily work though if the upper bits of the integer value resulting from the converted pointer are the address, and the lower bits are the black box. Imagine a hypothetical system on which the smallest addressable unit of memory from assembly code is 4 bits. Then I think it's valid, and it's certainly fairly natural, for the conversion to uintptr_t to result in (uintptr_t)(ptr+1) == ((uintptr_t)ptr) + 2. The "black box" at the bottom here is just a single 0 bit, and could result in a false negative of the test "do they overlap?"Mcdermott
@Steve Jessop: Adding the offset to the (char*) before doing the integer cast would solve problems on at least some systems where pointers aren't linear (real-world example: on 80x86 systems, pointers when using "huge" mode (or pointers explicitly declared as "huge") are stored as 32 bits, and access 1MB of addressing space using the top 16 bits and the bottom 4 bits). I don't know of implementations where portable code could produce two pointers where integer and pointer comparison would yield different results (but not UB), but...Morvin
...I wouldn't be surprised if such systems exist, especially within the world of embedded microcontrollers. It's certainly possible to produce pointers on 80x86 systems using the "large" model (or pointers explicitly declared "far") which, when cast to integer, will appear not to overlap, and such pointers can be useful for various reasons, but casting such pointers to "huge" and then to integer, will yield consistent comparisons (such casts are, of course, non-portable, but so is the situation that creates their necessity).Morvin
O
0

I would still like to use this example as proof that the standard goes too far with undefined behaviors, if it is true that memmove cannot be implemented efficiently in standard C

But it's not proof. There's absolutely no way to guarantee that you can compare two arbitrary pointers on an arbitrary machine architecture. The behaviour of such a pointer comparison cannot be legislated by the C standard or even a compiler. I could imagine a machine with a segmented architecture that might produce a different result depending on how the segments are organised in RAM or might even choose to throw an exception when pointers into different segments are compared. This is why the behaviour is "undefined". The exact same program on the exact same machine might give different results from run to run.

The oft given "solution" of memmove() using the relationship of the two pointers to choose whether to copy from the beginning to the end or from the end to the beginning only works if all memory blocks are allocated from the same address space. Fortunately, this is usually the case although it wasn't in the days of 16 bit x86 code.

Oxbow answered 26/10, 2010 at 13:54 Comment(7)
I think it is proof that even if the language retains the fast comparison for pointers known to be in the same block, it must also provide a comparison that works as long as the pointers are of the same type. Steve Jessop's answer convinced me to switch to C++ :)Cavitation
@Pascal Cuoq: Why restrict all of the pointers to objects of the same type to coming from the same address space? It works for modern flat address space architectures, but in the general case, may not even be logically possible.Oxbow
"it must also provide a comparison" - well, what's actually needed for memmove is a way to test whether the memory regions overlap. If one points to "red" segment, and the other points to "blue" segment, then you know they don't overlap, which is enough that memmove can delegate to memcpy. You don't need there to be an order relationship between "red" and "blue" (although an implementation presumably could choose one, perhaps arbitrarily).Mcdermott
@Steve: I think the complaint most people make is that the implementation is not forced to impose such an order, and some people would like it to be forced to. Of course that would impose a huge performance hit to pointer comparison on such implementations. On the other hand, my complaint is that C still allows legacy things like non-flat memory. Thankfully POSIX adds some sanity.Ingrid
@R.. Agree with you except that the non flat memory model is not quite as legacy as those of us who program modern general purpose computers would like to think.Oxbow
@JeremyP: IMHO, the standard should regard comparisons of unrelated pointers as returning an unspecified (arbitrary) value from the set {0,1} rather than being undefined behavior. I'm certainly aware of architectures where comparisons of unrelated pointers would be meaningless (and may even not be transitive), but none which would lack a means of safely comparing pointers that may or may not be in range of each other (with the proviso that such comparisons may yield arbitrary results).Morvin
@JeremyP: Given such comparison operators, one could use the result of a comparison to do either do a top-down or bottom-up copy operation. If top-down copies are slower than bottom-up, the code could end up needlessly using the slow form half the time on non-overlapping blocks, but the code should at least be portable.Morvin

© 2022 - 2024 — McMap. All rights reserved.