Do any compilers transfer effective type through memcpy/memmove?

Asked 1/12, 2015 at 1:34 Answered 8/9, 2016 at 23:28

c language-lawyer memcpy strict-aliasing

According to N1570 6.5/6:

If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

That would suggest that even on a system where "long" and some other integer type have the same representation, the following would invoke Undefined Behavior:

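#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* #if arithmetic is carried out in (u)intmax_t, so ~0UL == ~0U would hold
   everywhere; comparing the <limits.h> maxima selects the intended type. */
#if ULONG_MAX == UINT_MAX
  #define long_equiv int
#elif ULONG_MAX == ULLONG_MAX
  #define long_equiv long long
#else
#error Oops
#endif
long blah(void)
{
  long l;
  long_equiv l2;
  long_equiv *p = malloc(sizeof (long));
  l = 1234;
  memcpy(p, &l, sizeof (long));
  l2 = *p;
  free(p); // Added to address complaint about leak
  return l2;
}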

Since the object l clearly has effective type long and the object pointed to by p has no declared type, the memcpy should set the effective type of the storage to long. Because an lvalue of type long_equiv may not be used to read an object whose effective type is long, the code would invoke Undefined Behavior.

Given that prior to C99 memcpy was one of the standard ways to copy data of one type to storage of another type, the new rules about memcpy cause a lot of existing code to invoke Undefined Behavior. If the rule had instead been that using memcpy to write to allocated storage leaves the destination without any effective type, the behavior would be defined.

Are there any compilers which do not behave as though memcpy leaves the effective type of the destination unset when used to copy information to allocated storage, or should use of memcpy for purposes of data translation be considered "safe"? If some compilers do apply effective type of the source to the destination, what would be the proper way of copying data in type-agnostic fashion? What is meant by "copied as an array of character type"?

Airworthy answered 1/12, 2015 at 1:34 Comment(17)

Your code already invokes UB, because you don't free(p), but lose the pointer on return. Your point is not quite clear. IIRC, it was already UB in C89 to copy one type to another that way. It just happened to work and likely still works. The effective type is not "applied" by the compiler, but assumed. Like const qualified objects, the programmer guarantees this and the compiler just relies on it. For the last sentence, please read the description of memmove & memcpy. They are supposed to copy char-wise. – Armed 1/12, 2015 at 2:25

Did you mean ~0UL? Because on two's complement, ~0, ~0L, and ~0LL are all -1, and all equal to each other, regardless of representational size – Wilden 1/12, 2015 at 4:3

@Olaf: A memory leak is not undefined behavior. – Wilden 1/12, 2015 at 4:4

@BenVoigt: Thanks for the catch on ~0L vs ~0LL. – Airworthy 1/12, 2015 at 5:58

@Olaf: In C89 behavior would have been defined on a compiler for which int and long had the same representation; while it would be legal for an implementation to use different representations (even if they both have the same maximum value), such an implementation would be required to document that fact. It is permissible to read or write objects of any effective type using "unsigned char", and in C89 memcpy was defined as writing the unsigned char values at the destination to match those of the source, which would as noted be fully defined if the representations of "int" and "long" match. – Airworthy 1/12, 2015 at 6:6

@Airworthy my only concern here is that we are lifting N1570 6.5/6 out of the criteria for determining whether an issue exists with strict-alias violation. I'm not sure that focusing on one of the allowable criteria exceptions to strict aliasing is meant to provide carte-blanche to use the criteria as a stand-alone rule in and of itself. I read through all the criteria recently and the impression I got was those specific exceptions where strict aliasing was not violated was not intended to act as a recommendation (or validation) of that approach. The limit on modifying values would control – Heriberto 1/12, 2015 at 6:16

@DavidC.Rankin: As I read it, a compiler always has the option to ignore or "forget" the effective type of any object, but the quoted passage would allow a compiler to assume that no object of one type may be affected by a memcpy operation whose source is a value of another. I'm not sure in what contexts that would be useful, since in cases where the source and destination types are the same I would expect code would use the assignment operator with that type. I can't find a link to C89 rules, but I don't think they mentioned memcpy; they were also vague with regard to allocated storage. – Airworthy 1/12, 2015 at 6:37

I agree. I distinctly remember, after reading those sections twice, that they were clear-as-mud, because they seemed to fully contradict what 50% of the Questions/Answers on SO say with regard to strict aliasing. I even have a post I did that is still 80% complete (not yet filed) on that precise subject just because it was so ambiguously clear. – Heriberto 1/12, 2015 at 7:42

@BenVoigt: Point taken. It is just (very) bad style. But what if malloc returned a null pointer? OP does not catch that. – Armed 1/12, 2015 at 12:52

@Olaf: My point was that the Standard says the code engages in Undefined Behavior even if malloc returns successfully, and even if the types between which data is transferred have identical representations, but it's not clear how code is supposed to convert data from one type to another in representation-preserving fashion (in cases where representations are defined as being compatible) if the use of memcpy for that purpose is no longer allowed. – Airworthy 1/12, 2015 at 15:35

@supercat: I understand the question very well. As I understand the standard, it actually is UB and the C11 standard is quite clear about that. Actually, I do not see much sense in exploiting this anyway. Maybe you could provide a practical example where this is actually useful. The example given looks quite synthetic to me. If you need a specific size, for instance, you should use stdint.h types. – Armed 1/12, 2015 at 18:47

@Olaf: For some libraries, it makes more sense to define inputs and outputs in terms of the size of "int" [i.e. on machines where "int" is larger or smaller than 32 bits, the library should use that size]. For others, it makes sense to define them as being 32 bits regardless of the size of "int". On machines where "int" is known to be 32 bits, it should be possible to have a method that accepts an array of 32-bit integers of either type, interchangeably. On many compilers, using memcpy to read and write elements from an array would be awkward, but could be optimized to yield good code. – Airworthy 1/12, 2015 at 19:19

@supercat: I disagree. If you need 32 bit types, use (u)int32_t. Still no reason to use different types for in/out. If in doubt, typedef your own integer type and use that alias throughout your code. And if you need a "points to anything" pointer, use void *. – Armed 1/12, 2015 at 19:21

@Olaf: If programmers have to jump through hoops to prevent compilers from noticing that what they're trying to do is store either an int or an int32_t in such a way that a compiler would not be required to allow access via the other type, that would suggest that the generated code is apt to be terrible. Other problematic situations arise when one has a pointer which is known to point to one of several structure types with a common initial sequence. In C89, one could memcpy the header to a struct containing just the initial sequence, work with that, and if need be memcpy it back. – Airworthy 1/12, 2015 at 19:23

@supercat: For the structs: This is still allowed for a union type or identical anonymous structs as the very first field. Alternatively, you can use a struct with the common fields first and an anonymous union of structs for the rest. To me, "terrible" would be code which wildly converts between types, not code which consistently uses one single type. – Armed 1/12, 2015 at 19:30

@Olaf: The common-initial sequence rule is applicable to struct access via unions; it also used to be applicable to struct copying via memcpy. Using unions requires that (1) all code which uses any type in the union must know about all types that might need to participate; if those might potentially include union { int length; short dat[1];} moe;, union { int size; short dat[2];} larry; etc. up to some unknown limit, that could be rather inconvenient. Further, all accesses to union members must be done through the union. It is not legitimate to use pointers to union members... – Airworthy 1/12, 2015 at 20:17

...to access them unless one goes through character types, and the legality of even that is unclear in C99 due to the rules about access via "character arrays". – Airworthy 1/12, 2015 at 20:18

The C standard says that the effective type is transferred. Therefore, by definition, all conforming compilers transfer the effective type.

Your code sample causes undefined behaviour by violating the strict aliasing rule, because a value of effective type long is read by an lvalue of type long long.

This was also true in C89; I'm not sure what you are referring to with "new rules in C99" (other than the fact that long long was not in C89).

It is true that when C was standardized, some existing code had undefined behaviour. And it is also true that people continue to write code with undefined behaviour.

What is meant by "copied as an array of character type"?

This means copying character-by-character using a character type.
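As a rough illustration, the loop below copies an object one unsigned char at a time. The helper name copy_as_chars is hypothetical (it does not appear in the question), and treating such a loop as "copied as an array of character type" is one common reading of 6.5/6 rather than something the Standard spells out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst through
   unsigned char lvalues. The two regions must not overlap. */
void copy_as_chars(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];            /* each access uses a character type */
}
```

Called as copy_as_chars(&b, &a, sizeof a) with two long objects, b ends up holding the same value as a. What effective type the destination has afterwards, when it is allocated storage, is exactly the point under dispute here.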

what would be the proper way of copying data in type-agnostic fashion?

It's not possible to "erase effective type", so far as I know. To correctly read a value using a long long *, you must be pointing to a location of effective type long long.

In your code, for example:

// If we have confirmed that long and long long have the same size and representation
long long x;
memcpy(&x, p, sizeof x);
return x;
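A more complete sketch of the same idea follows. It assumes that long and the selected type share size and representation; the typedef selection via <limits.h> and the helper name read_as_equiv are illustrative choices, not part of the question's code.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumption: a type whose unsigned maximum matches ULONG_MAX has the
   same width as long. long_equiv is the name used in the question. */
#if ULONG_MAX == UINT_MAX
typedef int long_equiv;
#elif ULONG_MAX == ULLONG_MAX
typedef long long long_equiv;
#else
#error "no integer type matches long on this platform"
#endif

static_assert(sizeof(long_equiv) == sizeof(long), "sizes must match");

/* Hypothetical helper: copies the bytes at p into a declared object.
   x has declared type long_equiv, so its effective type is fixed by the
   declaration and is not changed by the memcpy. */
long_equiv read_as_equiv(const void *p)
{
    long_equiv x;
    memcpy(&x, p, sizeof x);
    return x;
}
```

Because the destination is a declared object, reading x as long_equiv is permitted whatever effective type the source storage had.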

Union aliasing is another option.
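A minimal sketch of the union approach, under the same assumption that the two types share size and representation. The union tag long_pun and the function pun_via_union are hypothetical names; a footnote to 6.5.2.3 in N1570 describes reading a member other than the one last stored as reinterpreting the object representation.

```c
#include <assert.h>
#include <limits.h>

/* Same illustrative type selection as in the question. */
#if ULONG_MAX == UINT_MAX
typedef int long_equiv;
#elif ULONG_MAX == ULLONG_MAX
typedef long long long_equiv;
#else
#error "no integer type matches long on this platform"
#endif

static_assert(sizeof(long_equiv) == sizeof(long), "sizes must match");

/* Hypothetical union used only for this sketch. */
union long_pun {
    long as_long;
    long_equiv as_equiv;
};

long_equiv pun_via_union(long value)
{
    union long_pun u;
    u.as_long = value;   /* store through one member */
    return u.as_equiv;   /* read the same bytes through the other */
}
```

Note that both accesses go through the union object itself; taking pointers to the members and using those elsewhere is a different situation.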

If you don't like all this then compile with -fno-strict-aliasing.

Esteresterase answered 1/12, 2015 at 3:4 Comment(8)

In C89, invoking memcpy on non-overlapping operands was semantically equivalent to a loop which copied every character of the source operand to the destination. Since operations which set the effective type of allocated storage do so only until the next access which modifies the stored value, and since character-type writes do not set the effective type, writing to allocated storage using a character-type pointer would erase the effective type thereof, and memcpy would do so as well in C99 if not for the quoted rule which so far as I know has no equivalent in C89. – Airworthy 1/12, 2015 at 5:46

@Airworthy i'll check the text and get back to you later – Esteresterase 1/12, 2015 at 5:47

If copying character-by-character transfers the effective type, rather than erasing the effective type (objects with no effective type can be read as any type) then what degree of machinations must one do to make the "character-type" exception to the aliasing rules useful? – Airworthy 1/12, 2015 at 5:52

@supercat: "... and since character-type writes do not set the effective type" contradicts "... or is copied as an array of character type" from the cited paragraph. If you copy one object to another as an array of char (like memcpy & memmove), you do transfer the type. It does not matter if you copy linearly or use a random sequence of single assignments to copy. And note that the mem-functions don't take a char *, but void *. – Armed 1/12, 2015 at 13:9

@supercat: And the exception for char is actually a legacy. It certainly would be better to have a distinct byte type or - possibly - allow void * to be bytes-pointers (like gcc does). But that would definitively break a lot of used code. – Armed 1/12, 2015 at 13:13

@Olaf: The proper thing for the Committee to have done would have been to define an intrinsic which must be used between a read and write of different non-character types, and should be used with character types, and then deprecate the use of character types to copy data without the intrinsic. As it is, a lot of loops that use character types run 2-8 times as many iterations as those using other types, and have their performance killed by the implicit aliasing barriers that get inserted on every iteration; an intrinsic, by contrast, could be placed by the programmer outside the loop. – Airworthy 1/12, 2015 at 14:7

@supercat: You really think a modern compiler will not optimise? Do you think memcpy always copies byte-wise, too? The standard only requires that the code behave "as-if". – Armed 1/12, 2015 at 14:15

@Olaf: Of course memcpy will physically use means other than character-by-character copying. The question is whether compilers are allowed to use knowledge of the source operand type to decide that the destination won't alias any other type, notwithstanding the fact that in many cases where programmers use memcpy rather than assignment the whole purpose is to be able to bypass the aliasing rules. – Airworthy 1/12, 2015 at 15:32

Experimentally, gcc 6.2 behaves in ways which would only be justifiable by regarding memmove as transferring the effective type of the source to the destination. If gcc can determine that the source and destination pointers match, it will treat the memory operand as only being readable via its earlier Effective Type, rather than as memory which was last written using a character type and may thus be accessed using any type. Such behavior would be unjustifiable without the rule that allows memcpy to transfer effective-type information.

On the other hand, gcc's behavior is sometimes not justifiable under any rule, so it's not necessarily clear whether gcc's behavior is a consequence of its authors' interpretation of the Standard, or whether it's simply broken. For example, if it can determine that the destination target of memcpy contains the same constant bit pattern as the source, it will treat the memcpy as a no-op even if the source held the type that would next be used to read the destination storage, and the destination held a different type the compiler had decided couldn't alias the next read.

Airworthy answered 8/9, 2016 at 23:28 Comment(0)
