Writing memcpy conformant with strict aliasing
P

6

11

The general answer when asking "how does one implement memcpy function conformant with strict aliasing rules" is something along the lines of

void *memcpy(void *dest, const void *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ((char*)dest)[i] = ((const char*)src)[i];
    return dest;
}

However, if I understand correctly, the compiler is free to reorder a call to memcpy and a later access to dest, because it can reorder writes through char* with reads through any other pointer type (as I understand them, the strict aliasing rules only prevent reordering reads through char* with writes through any other pointer type).

Is this correct, and if so, is there a way to implement memcpy correctly, or should we just rely on the builtin memcpy?

Please note that this question concerns not only memcpy but any deserialization/decoding function.
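To make the deserialization case concrete, here is a minimal sketch of the kind of decoding function meant here. The names load_u32 and load_u32_demo are hypothetical, and the value is assumed to be stored in host byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 32-bit value stored in host byte order
   at any offset of a byte buffer, aligned or not. */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t value;
    memcpy(&value, buf, sizeof value);  /* copies the object representation */
    return value;
}

/* Round trip through an odd offset. The cast-based alternative,
   *(const uint32_t *)(buf + 1), would read an unsigned char array through
   an incompatible lvalue type and could also be misaligned. */
static void load_u32_demo(void)
{
    unsigned char buf[8] = { 0 };
    uint32_t in = 0xDEADBEEFu;
    memcpy(buf + 1, &in, sizeof in);
    assert(load_u32(buf + 1) == in);
}
```

The question is then whether the memcpy call inside such a helper may itself be written as a plain byte loop.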

Potful answered 31/7, 2014 at 13:57 Comment(8)
Compilers tend to recognize memcpy as a built-in function and do the right thing. As for how it works in standard C, you implement it with character types, as you mentioned. Anything else will be implementation-specific. - Yl
Real life memcpy is usually way more complicated, with copying in chunks of processor word size. - Jamin
@CodyGray, the question is, is the implementation with chars correct? From what I've understood (e.g. from #23848688), compilers can reorder writes to char* with reads from other pointer type, therefore they can simply swap the second and third lines in the following code: SomeData *dest, *src; memcpy(dest, src); dest->... - Potful
@OlegAndreev: You've misunderstood those answers. If you have a foo, you can read and write to it as a char array. If you have a char array, treating it as a foo is undefined behaviour. There's a notion of the underlying type of an object, and it must be compatible with the type of the pointer through which you access the object. The reason the compiler can reorder stuff in the other question is that there is UB. - Semidiurnal
@OlegAndreev: It's exactly the other way round. Access through char* is always well defined, access through the correct type is well defined, access through a signed/unsigned variant of the correct type is well-defined. But anyway, memcpy itself is by definition undefined for overlapping source and destination so this doesn't matter for a memcpy implementation. - Sulfate
@Semidiurnal Looks like I've misunderstood, yes, thank you. So, just to clarify: if I have a pointer to SomeObject, I can cast it to char* and reads/writes to the latter pointer will correctly affect the value of SomeObject, but if I originally have a char* pointer, it is incorrect to cast it to SomeObject*? - Potful
@OlegAndreev: You can cast it to SomeObject*. You cannot access it through the SomeObject*. - Semidiurnal
@Semidiurnal You can cast, but if the memory was not aligned, the resulting pointer will be meaningless, you can't even cast it back and round trip. - Headrest
H
6

The strict aliasing rule specifically exempts accesses through character types (see the last bullet point below), so the compiler will do the correct thing in your case. Type punning is a problem when an object is accessed through an incompatible non-character type, for example reading an int through a short*. There the compiler may assume the two accesses do not alias, and the behavior is undefined.

C99 §6.5/7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.
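
As a rough illustration of the last bullet, the sketch below reads the bytes of an arbitrary object through unsigned char*. The helper name xor_bytes is made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR together all bytes of any object. Reading the
   object through unsigned char * is covered by the character-type bullet. */
static unsigned char xor_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    unsigned char acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= p[i];
    return acc;
}

static void xor_bytes_demo(void)
{
    uint32_t word = 0x01020304u;
    /* 0x01 ^ 0x02 ^ 0x03 ^ 0x04 == 0x04, whatever the byte order
       (assuming 8-bit bytes) */
    assert(xor_bytes(&word, sizeof word) == 0x04);
}
```

Reading the same uint32_t through, say, a float* would not match any bullet in the list.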
Hako answered 31/7, 2014 at 14:6 Comment(0)
D
5

Since both (char*)dest and (char const*)src point to char, the compiler must assume that they might alias. Plus, there is a rule that says that a pointer to a character type can alias anything.

All of which is irrelevant for memcpy, since the actual signature is:

void* memcpy( void* restrict dest, const void* restrict src, size_t n );

which tells the compiler that there cannot be aliasing, because the user guarantees it. You cannot use memcpy to copy overlapping areas without incurring undefined behavior.

At any rate, there's no problem with the given implementation.
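
For the overlapping case, a small sketch of what to use instead. The helper name shift_right_by_one is hypothetical; memmove is the standard function that permits overlap:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the first n-1 bytes of buf one position to the
   right. Source and destination overlap, so memmove is used; calling memcpy
   here would be undefined behavior. */
static void shift_right_by_one(char *buf, size_t n)
{
    if (n > 1)
        memmove(buf + 1, buf, n - 1);
}

static void shift_right_by_one_demo(void)
{
    char s[] = "abcd";
    shift_right_by_one(s, 4);
    assert(memcmp(s, "aabc", 4) == 0);  /* s[0] is left untouched */
}
```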

Defame answered 31/7, 2014 at 14:15 Comment(2)
The OP is concerned about a compiler, on the basis of the implementation shown, reordering a call to their implementation of memcpy with read accesses to the region pointed to by dest. - Seminar
@PascalCuoq Which is covered by my first paragraph. - Defame
S
1

IANALL, but I don't think the compiler is allowed to mess things up in the way you describe. Strict aliasing is "implemented" in the spec by rendering undefined accesses to an object through an illegal pointer type, rather than by specifying another complicated partial order on object accesses.

Semidiurnal answered 31/7, 2014 at 14:2 Comment(1)
I'm not sure why you think specifying an order on object accesses is complicated. One could ditch the concepts of Effective Type as well as the performance-robbing "character type exception" if one recognizes that use of an lvalue of one type to create another opens up a window to use the new lvalue until the next time an old lvalue is used to access the storage in conflicting fashion or code enters a context where that occurs. Actions on the new lvalue which are within that window should be sequenced between other actions that precede it and other actions that follow it. - Naturalism
A
1

What everyone seems to be missing here, is that strict aliasing (6.5/7) depends on the term effective type (6.5/6). And effective type has explicit, special rules for the function memcpy (6.5/6):

If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

So I don't think it even makes sense to speak of strict aliasing inside the memcpy function. You can only speak of strict aliasing if you know the effective type. Now, how do you determine that, based on the above? Do the internals of memcpy count as a copy with memcpy or not?

It's like saying "in order to understand which effective type is used in memcpy, you must first understand which effective type is used in memcpy".

So I don't quite see how the question, or any of the answers posted, makes any sense.
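
For reference, the case that the quoted 6.5/6 rule clearly covers, seen from the caller's side, might look like the sketch below. The struct point type and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct point { int x, y; };  /* hypothetical type */

/* Returns x + y read back from allocated storage, or -1 if malloc fails. */
static int copy_and_read(void)
{
    struct point src = { 3, 4 };
    void *raw = malloc(sizeof src);  /* allocated storage has no declared type */
    if (raw == NULL)
        return -1;
    memcpy(raw, &src, sizeof src);   /* per 6.5/6, the effective type of the
                                        storage is now struct point */
    struct point *p = raw;
    int sum = p->x + p->y;           /* access through a compatible type */
    free(raw);
    return sum;
}

static void copy_and_read_demo(void)
{
    int r = copy_and_read();
    assert(r == 7 || r == -1);       /* -1 only if the allocation failed */
}
```

What the rule leaves open is how the same reasoning applies to the loop inside a hand-written memcpy.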

Advocation answered 14/10, 2016 at 11:54 Comment(1)
The rules there are really horrible since they don't define a means via which someone can examine code that uses character-type pointers and say whether it is "copying as an array of character type", nor do they specify exactly to what length code would need to go to be assured that the destination would be left with no effective type and could thus be read as any type. - Naturalism
C
0

Yes, you're missing something. The compiler may not reorder writes to dest and reads from dest. Now, since the reads from src happen before the writes to dest, and your hypothetical read from dest happens after the write to dest, it follows that the read from dest happens after the read from src.
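
A minimal sketch of that ordering, using the question's byte loop under the hypothetical name byte_copy (to avoid clashing with the library memcpy) and a made-up struct some_data:

```c
#include <assert.h>
#include <stddef.h>

/* The question's byte-wise copy, renamed for this example. */
static void *byte_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}

struct some_data { int id; double weight; };  /* hypothetical type */

static int copy_then_read(void)
{
    struct some_data src = { 42, 1.5 };
    struct some_data dest = { 0, 0.0 };
    byte_copy(&dest, &src, sizeof dest);
    /* dest has declared type struct some_data and was modified through a
       character type, so this read must observe the copied bytes. */
    return dest.id;
}

static void copy_then_read_demo(void)
{
    assert(copy_then_read() == 42);
}
```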

Coterie answered 31/7, 2014 at 14:3 Comment(0)
N
0

If an object has no declared type, any effective type it may acquire will only be effective until the next time the object is modified. Writing to an object using a pointer of character type counts as modifying it, thus unsetting the old type, but writing it via character-type pointer does not set a new type unless such operation occurs as part of "copying as an array of character type", whatever that means. Objects which have no effective type may be legally read with any type.

Since the effective-type semantics for "copying as an array of character type" would be the same as those for memcpy, a memcpy implementation could be written using character pointers for reading and writing. It may not set the effective type of the destination the way memcpy would be allowed to, but any behavior which would be defined when using memcpy would be defined identically if the destination were left with no effective type [as IMHO should have been the case with memcpy].

I'm not sure who came up with the idea that a compiler can assume that storage which has acquired an effective type keeps that effective type when it is modified using a char*, but nothing in the Standard justifies it. If you need your code to work with gcc, specify that it must be used with the -fno-strict-aliasing flag unless or until gcc starts honoring the Standard. There's no reason to bend over backward trying to support a compiler whose authors continually seek out new cases to ignore aliasing even in cases where the Standard would require them to recognize it.

Naturalism answered 28/10, 2016 at 15:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.