Is it guaranteed to be safe to perform memcpy(0,0,0)?

I am not so well-versed in the C standard, so please bear with me.

I would like to know if it is guaranteed, by the standard, that memcpy(0,0,0) is safe.

The only restriction I could find is that if the memory regions overlap, then the behavior is undefined...

But can we consider that the memory regions overlap here?

Descendent asked 9/3, 2011 at 8:17 Comment(10)
Mathematically the intersection of two empty sets is empty.Zaragoza
I wanted to check what your (x)libC does for you, but as it's asm (elibc/glibc here), it's a bit too complicated for an early morning :)Purely
Why would you do that? By the way, overlapping memory regions are not the only reason for UB with memcpy.Chiapas
+1 I love this question both because it's such a strange edge case and because I think memcpy(0,0,0) is one of the weirdest pieces of C code I've seen.Selfemployed
@eq Do you really want to know, or are you implying that there are no situations when you would want it? Have you considered that the actual call might be, say, memcpy(outp, inp, len)? And that this could occur in code where outp and inp are dynamically allocated and are initially 0? This works, e.g., with p = realloc(p, len+n) when p and len are 0. I myself have used such a memcpy call -- while it is technically UB, I've never encountered an implementation where it isn't a no-op and don't ever expect to (see the sketch after these comments).Shelf
@Selfemployed memcpy(0, 0, 0) is most likely intended to represent a dynamic, not static invocation ... i.e., those parameter values need not be literals.Shelf
@eq, @templatetypedef: They are not literals but dynamic values... in 3rd-party software :/Descendent
@Jim Balter: Of course I have (thought about that). Whenever I use C, I try to stick to writing portable (i.e. well-defined) C, even if I have to write a few more conditionals - they are unlikely to be a performance bottleneck (and I would only remove them if they were; writing them isn't that hard), and the only way to find out is .. well, finding out. Whether they will one day 'save the day' or not is, really, irrelevant.Chiapas
Unlike C++, C doesn't treat null as a pointer to an array. (This is very strange.)Monzonite
Related: youtu.be/I8QJLGI0GOE?t=2100Tanaka
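
To make the scenario described in the comments concrete, here is a minimal sketch of the dynamic-buffer pattern that can lead to such a call; the function and variable names are invented for illustration:

#include <stdlib.h>
#include <string.h>

/* Sketch only: grow_buffer, inp, len and extra are made-up names.
 * On the first call inp may be NULL and len may be 0, so memcpy runs
 * with a null source pointer and a zero size -- technically UB per
 * the standard, although real implementations treat it as a no-op. */
static char *grow_buffer(char *inp, size_t len, size_t extra)
{
    char *outp = malloc(len + extra);
    if (outp == NULL)
        return NULL;
    memcpy(outp, inp, len);   /* inp may be NULL when len == 0 */
    free(inp);
    return outp;
}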

I have a draft version of the C standard (ISO/IEC 9899:1999), and it has some fun things to say about that call. For starters, it mentions (§7.21.1/2) in regards to memcpy that

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

The reference indicated here points to this:

If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.

So it looks like according to the C spec, calling

memcpy(0, 0, 0)

results in undefined behavior, because null pointers are considered "invalid values."

That said, I would be utterly astonished if any actual implementation of memcpy broke if you did this, since most of the intuitive implementations I can think of would do nothing at all if you said to copy zero bytes.
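
If you want code that stays strictly within what the standard guarantees, a simple option is to never let a zero-sized call reach memcpy at all. A minimal sketch, with memcpy_safe being a made-up name rather than a standard function:

#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skip the copy when there is nothing to copy,
 * so memcpy never receives the null/zero combination that the
 * standard leaves undefined. */
static void *memcpy_safe(void *dest, const void *src, size_t n)
{
    if (n != 0)
        memcpy(dest, src, n);   /* dest and src must be valid here */
    return dest;
}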

Selfemployed answered 9/3, 2011 at 8:24 Comment(20)
+1 I missed that paragraph and went straight for memcpy. Silly me. I should have known better than to assume that programmers would repeat that kind of information for every single function description.Julienne
I can affirm that the quoted parts from the draft standard are identical in the final document. There shouldn't be any troubles with such a call, but it would still be undefined behaviour you're relying on. So the answer to "is it guaranteed" is "no".Sorce
@DevSolar- Thanks for confirming this! It would have been really embarrassing if everything I said was completely wrong. :-)Selfemployed
No implementation that you will ever use in production will produce anything other than a no-op for such a call, but implementations that do otherwise are allowed and are reasonable ... e.g., a C interpreter or augmented compiler with error checking that rejects the call because it's non-conforming. Of course that wouldn't be reasonable if the Standard did allow the call, as it does for realloc(0, 0). The use cases are similar, and I've used them both (see my comment under the question). It's pointless and unfortunate that the Standard makes this UB.Shelf
@Chris Many programmers might, but not Dave Prosser, who was primarily responsible for the text.Shelf
@templatetypedef: Well, wrt real-world implementations which break on this, I think that SUSE 10sp3 libc.so.6 does... or so it seems.Descendent
@Matthieu M.- Cool! What does it do that breaks?Selfemployed
@templatetypedef: it produces a memory coredump, through a call to abort and a nice backtrace.Descendent
@Matthieu Does the backtrace show abort being called by memcpy? A call to abort sounds like an assert failing. It's possible that the memcpy library code has such an assert, although that would be quite silly.Shelf
@Jim: That was an assumption actually; the coredump is produced by a segfault, but the backtrace is nearly unusable (apart from the memcpy call) as the debug symbols have been stripped. And I don't really want to look at the assembly.Descendent
@Matthieu Ok, the segfault makes a lot more sense but is still shocking. I don't know the x86 hardware well, but apparently the movsb instruction is accessing the addressed memory even though the count is 0.Shelf
"I would be utterly astonished if any actual implementation of memcpy broke if you did this" - I've used one that would; in fact if you passed length 0 with valid pointers, it actually copied 65536 bytes. (Its loop decremented the length and then tested).Idioglossia
@MattMcNabb That implementation is broken.Shelf
@MattMcNabb: Add "correct" to "actual", maybe. I think we all have not-so-fond memories of old, ghetto C libraries and I'm not sure how many of us appreciate those memories being recalled. :)Thorman
Does one-past-the-end of an array count as a valid value?Idioglossia
memcpy can ask the CPU to prefetch the source memory, before doing any checks on the size.Ouphe
There doesn't appear to be a clear definition of invalid value in the C11 standard. I don't think the parenthesized example listing in 7.1.4 implies that a null pointer is an invalid value in all contexts (note 102's expression of "invalid values for dereferencing" further reinforces the notion that invalid values are contextual). Furthermore, the standard mentions "shall not be a null pointer" a great many times but memcpy doesn't come with it. Must be because null pointers should be valid arguments to memcpy when the size argument is zero.Blinkers
@PSkocik: The authors of the Standard make little effort to consider situations where all implementations would be expected to behave a certain way absent a clear and compelling reason to do otherwise, and few if any implementations would have any compelling reason to do otherwise. The only situation where such consideration would matter would be if some implementations had a compelling reason to do otherwise, and if the authors of the Standard don't know of any implementations that do anything unusual, they would have no way of judging whether their reasons for doing so were compelling.Stingo
@PSkocik: IMHO, the authors of the Standard didn't want to require programmers who were writing functions that receive pointer+size pairs on conventional target platforms to add explicit logic to prevent zero-byte reads or writes to/from null. On the other hand, they also likely didn't want to require that compilers targeting obscure platforms add extra code if their natural means of performing memcpy operations would otherwise have unwanted side effects. Most likely, they figured that the question was sufficiently unlikely to matter that there was no need to discuss it.Stingo
This answer makes no sense. The pointers have to be passed valid values as opposed to nonsense because on certain non-flat architectures you will observe a fault as soon as you load the pointer into the pointer register. NULL does not fault, though, so that's not the issue it's talking about.Pointsman
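
For what it's worth, here is one plausible shape of the broken loop described in the 65536-byte comment above; this is purely illustrative and not taken from any real C library:

#include <stddef.h>

/* A do/while loop that copies first and tests the decremented count
 * afterwards: with n == 0 the count wraps around, so a 16-bit counter
 * ends up copying 65536 bytes instead of none. */
static void *memcpy_decrement_then_test(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    do {
        *d++ = *s++;     /* copies at least one byte, even when n == 0 */
    } while (--n != 0);  /* 0 - 1 wraps to the maximum value of the type */
    return dest;
}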

Just for fun, the release notes for gcc-4.9 indicate that its optimizer makes use of these rules, and for example can remove the conditional in

#include <stddef.h>   /* size_t */
#include <string.h>   /* memmove */

int copy (int* dest, int* src, size_t nbytes) {
    memmove (dest, src, nbytes);
    if (src != NULL)
        return *src;
    return 0;
}

which then gives unexpected results when copy(0,0,0) is called (see https://gcc.gnu.org/gcc-4.9/porting_to.html).

I am somewhat ambivalent about the gcc-4.9 behaviour; the behaviour might be standards compliant, but being able to call memmove(0,0,0) is sometimes a useful extension to those standards.
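
A common way to keep the later NULL check meaningful is to call memmove only when there is something to move. This is my sketch of such a fix, not code taken from the porting guide:

#include <stddef.h>
#include <string.h>

int copy (int* dest, int* src, size_t nbytes) {
    if (nbytes != 0)
        memmove (dest, src, nbytes);   /* only reached with a non-zero size */
    if (src != NULL)   /* the compiler can no longer assume src != NULL on every path */
        return *src;
    return 0;
}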

Centrepiece answered 12/7, 2014 at 5:32 Comment(26)
Interesting. I understand your ambivalence but this is the heart of optimizations in C: the compiler assumes that developers follow certain rules and thus deduces that some optimizations are valid (which they are if the rules are followed).Descendent
@MatthieuM.: Yes, but this is particularly stupid. A rough corner was added to the specification of memcpy that doesn't exist in any implementation I know of, not to increase uniformity, but to deliberately break from it.Thorman
@tmyklebu: It would be reasonable on some platforms to implement memcpy as something like char *end=src+count;while(src<end){*dest++=*src++;} [saving the need to modify count during the loop]. Passing an invalid src pointer would trigger UB even if count is zero.Stingo
@supercat: I do not see where the UB would be triggered if count == 0.Thorman
@tmyklebu: Given char *p = 0; int i=something;, evaluation of the expression (p+i) will yield Undefined Behavior even when i is zero.Stingo
@supercat: That's bizarre. (And, in my opinion, stupid.) But 6.5.6 paragraph 8 does indeed imply that adding 0 to the null pointer is UB.Thorman
@tmyklebu: Having implementations trap at attempts to add a non-zero value to a null pointer would generally be a good thing if it could be done cheaply; having such a trap not occur when adding zero would add to the cost.Stingo
@supercat: I would probably find such an implementation bizarre and stupid. Remember Clippy from MS Word? Remember how he'd get up in your face when you were minding your own business trying to be productive? Wasn't that annoying? That's basically how I feel about surprising sharp corners in language specs.Thorman
@tmyklebu: Having all pointer arithmetic (other than comparisons) on a null pointer trap would IMHO be a good thing; whether memcpy() should be allowed to perform any pointer arithmetic on its arguments prior to ensuring a non-zero count is another question [if I were designing the standards, I would probably specify that if p is null, p+0 could trap, but memcpy(p,p,0) would do nothing]. A much bigger problem, IMHO, is the open-endedness of most Undefined Behavior. While there are some things which really should represent Undefined Behavior (e.g. calling free(p)...Stingo
...and subsequently performing p[0]=1;) there are a lot of things which should be specified as yielding indeterminate result (e.g. a relational comparison between unrelated pointers should not be specified as being consistent with any other comparison, but should be specified as yielding either a 0 or a 1), or should be specified as yielding a behavior slightly looser than implementation-defined (compilers should be required to document all possible consequences of e.g. integer overflow, but not specify which consequence would occur in any particular case).Stingo
@supercat: I agree with you on every one of your points except having arithmetic on null pointers trap. I just don't like the idea that constructing an invalid pointer, or even dereferencing it to produce an lvalue that doesn't refer to an object, is bad. (Lvalue-to-rvalue conversion when the lvalue doesn't refer to an object is bad, sure, but I don't see why we need to go beyond that at all.)Thorman
@tmyklebu: I particularly dislike the notion that Undefined Behavior should be usable as a form of compiler-exploitable assertion exempt from the normal rules of causality--that to me is what represents "Clippy"-level annoyance. Given uint16_t foo=f(); int bar=0; if (foo > 50000) bar=3; if (foo*foo > 16383) bar |= 1;, what good can come from allowing a 32-bit compiler to omit the first if?Stingo
@tmyklebu: Having arithmetic on null-pointers trap would in many cases be cheaper than ensuring that such arithmetic will not yield something that could be mistaken for a valid pointer. Given char *p; int flag;, what should char *q = p+123456789; int32_t d=q-p; if (flag) *q = 0; do if p is null and flag is zero? If non-zero? If the assignment to q asserts that p is non-null, then d can be assigned the literal 123456789, but there may not exist any trap pointer representation from which one could subtract a null pointer to yield 123456789.Stingo
@tmyklebu: Given a choice between having the assignment to q trap, or overwriting memory location 123456789, I'd much prefer the former. Having arithmetic on a null-pointer value yield a null result might be an acceptable alternate behavior, but that would make the value of d dependent upon the value of p.Stingo
@supercat: I don't understand why you care whether NULL + foo is a valid pointer to such an extent that you want to ensure that it never is. Messing with null pointer arithmetic in the way you describe would ruin the identity p + (q-p) == q that I'd want to hold in any hypothetical safe dialect of C.Thorman
@tmyklebu: Nasal demons are bad. If *q=0; can be executed without trapping when q holds an invalid pointer, it will cause nasal demons. I would posit that the value of preventing *q=0; from causing nasal demons whenever possible outweighs the value of being able to perform arithmetic on null pointers without it trapping. It's virtually impossible for a practical C-standard-compliant platform to guard against absolutely all conceivable nasal demons, but stores to null-derived invalid pointers are both common and easy to check for.Stingo
@tmyklebu: The only "use" I know of for arithmetic on null pointers is to do things which should really be done using standard macros like offsetof. Can you suggest any others?Stingo
@supercat: offsetof is one legitimate use. But I do not think that declaring things illegal in the absence of an argument for why they should be legal is sound language design policy.Thorman
@supercat: I've seen "legitimate" uses of two-past-the-end and one-before-the-beginning pointers. For instance, if you want to advance through a string by doing *++str, then you want to start with a one-before-the-beginning pointer. mmap likes to use (void *)-1 as an error return value. It just doesn't seem sensible to forbid computation of invalid pointers when they're never used to access memory.Thorman
@tmyklebu: User code should not care how offsetof is implemented; an implementation where subtracting null from the address of a field of a null object happens to yield that field's offset may implement offsetof with such an expression, but an implementation would be free to implement it other ways. Although many implementations of C store pointers as linear memory addresses, nothing in the standard requires any such thing, and tricks such as you describe may fail on architectures which store pointers other ways [e.g. a system might store pointers using a pair of numbers, such that...Stingo
...a statement like p->q=123 is internally implemented as something analogous to memory_regions[p_base][p_ofs+offsetof(p.q)] = 123;; such a design would make it possible to ensure that something like char *p=malloc(100); p[200]=0; will trap rather than overwriting some arbitrary object, and will allow the compiler to relocate malloc'ed memory regions if free space becomes fragmented, but may break if code tries to manipulate pointers in ways that assume a linear address space.Stingo
@supercat: I agree that writing to or reading from unallocated storage is and should be UB. I have no problem with simply computing an invalid address. I think subtraction of unrelated pointers should be implementation-defined, or at least "undefined but with a strong encouragement to implementations to define sensibly," and I see no reason why computing invalid pointers should trap. I don't buy your argument that the current wording allows implementations to reorganise storage, and, even if true, I don't see that as a compelling advantage.Thorman
@tmyklebu: Little is gained from allowing computations on invalid pointers to yield other pointers without being trapped; the only cases where such things would have any realistic hope of working are system-specific, and could be done in implementation-defined (rather than undefined) fashion by casting to intptr_t, performing the appropriate computations on the resulting value, and casting back. I would see great advantage to an extension to C which would specify p-q in three situations where it presently does not...Stingo
...specifically saying that if p and q are valid and p is not contained within the first s bytes of object q, then p-q may be any value not in the range 0..s-1, saying that if p and q were part of the same formerly-valid object, p-q will have the same value as when the object was valid, and saying that given p=realloc(q); p-q will hold an arbitrary non-zero value if the object was moved.Stingo
@tmyklebu: The latter could be replaced by a rule allowing == to be used (which should also be valid) but all three represent useful operations which are essentially impossible in purely-standard C. People trying to turn C into something it never was may howl at the idea of extending the set of formally-defined behaviors to include things that 99% of compilers supported anyway, but I hope those who want the language to be useful can get them anyway.Stingo
someone please tell me, why don't I get a stackoverflow badge "started a flame war" :-)Centrepiece

You can also consider this usage of memmove seen in Git 2.14.x (Q3 2017)

See commit 168e635 (16 Jul 2017), and commit 1773664, commit f331ab9, commit 5783980 (15 Jul 2017) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 32f9025, 11 Aug 2017)

It uses a helper macro MOVE_ARRAY which calculates the size based on the specified number of elements for us and supports NULL pointers when that number is zero.
Raw memmove(3) calls with NULL can cause the compiler to (over-eagerly) optimize out later NULL checks.

MOVE_ARRAY adds a safe and convenient helper for moving potentially overlapping ranges of array entries.
It infers the element size, multiplies automatically and safely to get the size in bytes, does a basic type safety check by comparing element sizes and unlike memmove(3) it supports NULL pointers iff 0 elements are to be moved.

#define MOVE_ARRAY(dst, src, n) move_array((dst), (src), (n), sizeof(*(dst)) + \
    BUILD_ASSERT_OR_ZERO(sizeof(*(dst)) == sizeof(*(src))))
static inline void move_array(void *dst, const void *src, size_t n, size_t size)
{
    if (n)
        memmove(dst, src, st_mult(size, n));
}

Examples:

- memmove(dst, src, (n) * sizeof(*dst));
+ MOVE_ARRAY(dst, src, n);

It uses the macro BUILD_ASSERT_OR_ZERO which asserts a build-time dependency, as an expression (with @cond being the compile-time condition which must be true).
The compilation will fail if the condition isn't true, or can't be evaluated by the compiler.

#define BUILD_ASSERT_OR_ZERO(cond) \
(sizeof(char [1 - 2*!(cond)]) - 1)

Example:

#define foo_to_char(foo)                \
     ((char *)(foo)                     \
      + BUILD_ASSERT_OR_ZERO(offsetof(struct foo, string) == 0))

As noted by user16217248 in the comments:

In C11 the BUILD_ASSERT_OR_ZERO macro is not necessary because we have static_assert()
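
For illustration, a C11 flavour of the same idea could look like the sketch below; this is not Git's code, MOVE_ARRAY_C11 is a made-up name, and Git's overflow-checking st_mult is replaced by a plain multiplication for brevity:

#include <assert.h>    /* static_assert (C11) */
#include <string.h>    /* memmove */

/* Hypothetical C11 variant: static_assert replaces the sizeof-array
 * trick, and the call is skipped when n is 0 so memmove never sees
 * NULL pointers. Note that dst, src and n are evaluated more than once. */
#define MOVE_ARRAY_C11(dst, src, n) do { \
        static_assert(sizeof(*(dst)) == sizeof(*(src)), \
                      "MOVE_ARRAY_C11: element sizes differ"); \
        if ((n) != 0) \
            memmove((dst), (src), sizeof(*(dst)) * (size_t)(n)); \
    } while (0)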

Understate answered 12/8, 2017 at 8:30 Comment(3)
The existence of optimizers that think "clever" and "dumb" are antonyms makes the test for n necessary, but more efficient code would generally be possible on an implementation that guaranteed that memmove(any,any,0) would be a no-op. Unless a compiler can replace a call to memmove() with a call to memmoveAtLeastOneByte(), the workaround to guard against clever/stupid compilers' "optimization" will generally result in an extra comparison a compiler won't be able to eliminate.Stingo
Note: In C11 the BUILD_ASSERT_OR_ZERO macro is not necessary because we have static_assert()Beaty
@user16217248 Good point, thank you for this feedback. I have included your comment in the answer for more visibility.Understate
