Is it undefined behaviour to memcpy from an uninitialized variable?

Is using an uninitialized variable as the src for memcpy undefined behaviour in C?

#include <string.h>

void foo(int *to)
{
  int from;
  memcpy(to, &from, sizeof(from));
}
Kenley answered 28/10, 2015 at 14:20 Comment(5)
The behaviour is not undefined. Note, however, that the compiler may remove the memcpy altogether, since the value of *to at the end of the function is undefined anyway.Raskind
I assume that using *to later is undefined behaviour, just like using from directly, but this question is about the memcpy itself. I think *to = from would be undefined behaviour, even though the value is not "really" used.Kenley
Related question, for C++: https://mcmap.net/q/23290/-is-copying-trivially-copyable-objects-always-defined-in-c-14/103167Lovell
@Raskind that doesn't make any sense. There is no such thing as "undefined values".Dudleyduds
@Raskind So you are saying that the post condition of copying bytes is NOT that the copy is equal to the original, right?Spatterdash
Q
17

The C committee's proposed response to defect report 451 (Instability of uninitialized automatic variables) is:

The answer to question 3 is that library functions will exhibit undefined behavior when used on indeterminate values.

The question in the defect report had sought an exemption for memcpy and fwrite if this was indeed the case, saying:

[...] The fact that one wants to be able to copy uninitialized padding bytes in structs using memcpy without undefined behavior is the reason that using the value of an uninitialized object is not undefined behavior. This seems to suggest that an fwrite of a struct with uninitialized padding bytes should not exhibit undefined behavior.

This part of the proposed response seems to be aimed at that concern over uninitialized padding:

The committee also notes that padding bytes within structures are possibly a distinct form of "wobbly" representation.
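To make the padding concern concrete, here is a minimal sketch. The struct record type and both helper functions are hypothetical names invented for illustration, and whether padding actually exists after tag depends on the ABI. Every named member is initialized before the copy, so the only bytes in question are the padding bytes, which memcpy copies along with everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type; on most ABIs there is padding after `tag`. */
struct record {
    char tag;
    int  value;
};

/* Number of bytes in struct record not occupied by its named members. */
size_t record_padding_bytes(void)
{
    return sizeof(struct record) - (sizeof(char) + sizeof(int));
}

/* Copy a record whose members are all initialized. Any padding bytes
   are copied along with them, whatever they happen to hold. */
void copy_record(struct record *dst, const struct record *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```

After copy_record, only the named members of the destination are worth comparing; the content of the padding bytes is not something a program should rely on.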

We can see from defect report 338 (C99 seems to exclude indeterminate value from being an uninitialized register) that this is somewhat of a change from past expectations. It says, among other things:

[...] I believe the intent of excluding type unsigned char from having trap representations was to allow it to be used to copy (via memcpy) arbitrary memory, in the case that memory might contain trap representations for some types.[...]
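The idea in that quote, copying arbitrary memory through unsigned char, can be sketched roughly as below. copy_bytes is a hypothetical stand-in written for illustration, not how any particular library implements memcpy, and the sketch is only meant to be called on initialized objects.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise copy through unsigned char lvalues. Because unsigned char
   has no trap representations, each byte read is a plain value even if
   the bytes together would not form a valid value of the original type. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}
```

The regions must not overlap, which is the same restriction memcpy has.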

The blog post Reading indeterminate contents might as well be undefined covers the evolution of reading indeterminate values in C well and makes some more sense of the changes I mention above.

It is worth noting that this differs from C++, where reading an indeterminate value from a narrow unsigned char is not undefined behavior. Defect report 240 notes this difference:

The C committee is dealing with a similar issue in their DR338. According to this analysis, they plan to take almost the opposite approach to the one described above by augmenting the description of their version of the lvalue-to-rvalue conversion. The CWG did not consider that access to an unsigned char might still trap if it is allocated in a register and needs to reevaluate the proposed resolution in that light. See also issue 129.

Quilt answered 28/10, 2015 at 15:10 Comment(20)
I think this is the best answer so far, but ultimately the committee never really answers whether having memcpy read from an uninitialized variable is undefined behaviour by itself.Kenley
@TorKlingberg no it does not directly address that, although since the main concern with memcpy is padding and there is a bullet that covers that concern it does address it indirectly.Quilt
IMHO the C11 description of indeterminate values should be viewed as a bugfix for what was poorly or badly specified in earlier revisions (although the Itanium exception is a new addition).Dudleyduds
@Dudleyduds it is kind of funny since you are quoting the same article in your answer.Quilt
@ShafikYaghmour yeah, I quoted it as reference for the rationale about the Itanium clause, not for the rest of it! I guess you are doing the sameDudleyduds
@Dudleyduds we are kind of going off-topic here but I think the main point still stands, I don't personally have any issues with the post but then I like his work in general so perhaps I am biased.Quilt
@Dudleyduds multiplying an unspecified value by 2 and taking the result modulo 2^32 produces an even result. Only undefined behavior explains an odd value appearing as result of that computation. What you describe as “j *= 2 leaves the indeterminate value indeterminate” is not how “indeterminate” is described to work in the published standards, although it is how a proposed separate notion of “wobbly values” would work.Gustative
@PascalCuoq according to DR 451's Proposed Committee Response, "any operation performed on indeterminate values will have an indeterminate value as its result". So the result of 2 * indeterminate is indeterminate. In other words, indeterminate values are unstable and inspecting their contents may give different results from time to time. For rationale see the bullet point "Strong sentiment formed..." near the bottom. Compilers already perform this optimization.Dudleyduds
@Dudleyduds First, if that moderates your “rubbish” sentiment of the blog post, note that it predates DR 451 by several months. Second, in C99 the adjective “indeterminate” applies only to memory locations, not to the result of computations, so that a sentence like “the result of 2 * indeterminate is indeterminate” is nonsensical and cannot be said to clarify the intent of the standard (not that it was written with respect to that version of the standard). I personally find it amazing how much the committee has managed to break in the standard in what was supposed to be a conservative update.Gustative
OK, removed that comment. "indeterminate" is short for "indeterminate value", not just something that applies to memory locations. In fact the Itanium register issue reinforces that. I think DR451 shows that C11 was underspecified. You seem to be interpreting the definition of "indeterminate value" as "either an unspecified value or a trap representation" to mean that once a value is declared without initializer, it takes an unspecified value and is no longer indeterminate. However, checking use of the term "indeterminate value" throughout the rest of the standard, it's clear that [cont]Dudleyduds
uninitialized values continue to be indeterminate afterwards, until a non-indeterminate value is stored in them; so the result of reading such a value is unspecified each time (not just the first time). I doubt it was the committee's intent that uninitialized values be stable, or that int x; requires that x - x == 0 be true. (In C99 it was just undefined, C11 attempts to provide some definition, but not ideally so as evinced by the existence of DR451)Dudleyduds
@M.M: For unsigned types with no padding, I would suggest that while there is optimization value in allowing the lifetime of a variable to be deferred until the first time it is read or written, the value of any optimizations that could be achieved by allowing it to remain indeterminate even after it is observed is limited compared to the semantic value that could have been obtained by having the first read convert the value from "Indeterminate" to "Unspecified". Signed values are a different story, and it might be useful from both an optimization and semantic perspective to have...Epirus
...cardinal-number types whose range would be the same as unsigned types, but whose behavior would be more like those of signed types [e.g. if an unsigned-int-sized cardinal-number variable held UINT_MAX, incrementing it would not be required to wrap]. I wonder if the people who are deciding to grant all these freedoms to compiler writers have any appreciation for the foregone value of the semantic guarantees they're tossing out the window.Epirus
@Epirus I don't think there's any semantic value to be gained by making compilers emit code for working on uninitialized variables. Except possibly that sparse two-way array thing.Dudleyduds
@M.M: There is IMHO significant semantic value in saying that things may either behave predictably or trap in implementation-defined fashion, at the compiler's discretion, but they must do one or the other; there is also in some cases some value in being able to specify that certain initial values are "don't care" but should behave consistently. The two-way array thing is semantically useful, but it will be undermined if a compiler can take unsigned int x=arr1[index]; if (x < arr2_size && arr2[x]==index) ...act on valid item and decide that since x is indeterminate, it can be...Epirus
...zero when compared against size, and a value larger than arr2 when used as an array index. Do you see any realistically-useful optimizations which are gained by breaking the two-way array trick? (I wouldn't mind requiring code to use an intrinsic before using an indeterminate value to prevent a trap, btw).Epirus
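For readers unfamiliar with the two-way array trick discussed in the comment above, a rough sketch follows. All names here (struct sparse_set, CAPACITY, and the three functions) are hypothetical. Unlike the classic form of the trick, this version zeroes sparse[] up front so that every read is well-defined; the classic form skips that step and relies on the dense[x] == index cross-check to reject garbage, which is exactly the part the comment argues a compiler could undermine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 16u   /* hypothetical universe size */

struct sparse_set {
    unsigned int sparse[CAPACITY]; /* arr1 in the comment: index -> slot in dense */
    unsigned int dense[CAPACITY];  /* arr2 in the comment: slot -> index */
    unsigned int count;            /* arr2_size in the comment */
};

void sparse_set_init(struct sparse_set *s)
{
    assert(s != NULL);
    /* The classic trick leaves sparse[] uninitialized; it is zeroed here
       so that the reads in sparse_set_contains are well-defined. */
    for (unsigned int i = 0; i < CAPACITY; i++) {
        s->sparse[i] = 0;
    }
    s->count = 0;
}

bool sparse_set_contains(const struct sparse_set *s, unsigned int index)
{
    assert(s != NULL && index < CAPACITY);
    unsigned int x = s->sparse[index];
    /* Both checks must see the same x for the cross-check to mean anything. */
    return x < s->count && s->dense[x] == index;
}

void sparse_set_add(struct sparse_set *s, unsigned int index)
{
    if (sparse_set_contains(s, index)) {
        return;
    }
    s->dense[s->count] = index;
    s->sparse[index] = s->count;
    s->count++;
}
```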
The Itanium example comes to mind (the one that may trap even though there are no trap representations) . Are you suggesting something like int x = __unspecified; or something? Which means that x is not indeterminate, and an unspecified value is selected at that point and the variable is no longer 'wobbly' or optimizable?Dudleyduds
@M.M: If an implementation defines the trapping behavior on the Itanium that would satisfy "...or trap in implementation-defined fashion". I would be suggesting something like __unspecified for individual values, or __set_unspecified(ptr, size) to do a bunch at once. I wish the Standards Committee would be more pro-active about such things, because if different compilers implement different semantics on the meaning of __unspecified [as would be highly plausible] it may be hard for the Standards Committee to standardize a meaning which isn't inconsistent with any existing behaviors.Epirus
@M.M: More generally, I would suggest that adding optimization-focused directives but changing most forms of Undefined Behavior to Testably-Constrained behaviors (implementations must provide standard means of determining what is or is not guaranteed, but if they report that nothing is guaranteed they would be under no further obligation) would allow more optimizations than are presently possible with UB alone, while making code much more robust.Epirus
@PascalCuoq it is especially interesting to note the difference between how the C++ committee and the C committee chose to resolve this, and it is noted in one of the WG21 defects. It makes me wonder what the compiler implementers will do with this. Will gcc and clang treat these forms of UB differently in C than in C++? That seems like an awful result.Quilt
G
5

This is defined behaviour with respect to the action of copying, except if int has a trap representation on your system. Memory was allocated on the stack when int from was defined. The contents of this int are whatever happened to be at that location on the stack at that moment. Therefore the end result, the value of the int that is copied to *to, is not defined (indeterminate).

Other answers have quotes from the C standard saying that undefined behaviour occurs when the value of an uninitialised variable is "used", which obviously doesn't apply if you don't use the value. There is another mention in the C11 standard of undefined behaviour while copying/assigning uninitialised variables:

6.3.2.1p2

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

This also doesn't affect your code, because the address of from is taken when you call memcpy.
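As a rough illustration of where 6.3.2.1p2 draws the line, the sketch below shows a variant of the question's function that avoids the issue altogether. foo_initialized is a hypothetical name, and the contrasting cases are left in a comment rather than compiled, since one of them is the undefined case the clause describes.

```c
#include <assert.h>
#include <string.h>

/* Variant of the question's foo() that gives `from` a value before it
   is copied, so nothing indeterminate is ever read. */
void foo_initialized(int *to)
{
    int from = 0;

    assert(to != NULL);
    memcpy(to, &from, sizeof from);
}

/* For contrast only; deliberately not compiled:
 *
 *   int x;            // address never taken, could have been `register`
 *   int y = x;        // 6.3.2.1p2 applies: undefined behavior
 *
 *   int from;         // address is taken below, so 6.3.2.1p2 does not apply
 *   memcpy(to, &from, sizeof from);   // the case the question asks about
 */
```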

Another relevant part of the C11 standard is 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Some very old processors could have a trap representation for an int, either through software-visible parity bits or through "negative zero" on non-two's-complement architectures. x86 processors, for example, don't have trap representations for int.
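One partial way to probe this on a given implementation is to count padding bits, as in the sketch below. int_padding_bits is a hypothetical helper. A result of 0 rules out parity-style padding bits, but it is not a complete test: C11 6.2.6.2p2 still lets an implementation treat certain sign-bit patterns as trap representations, so treat the result as a hint rather than a guarantee.

```c
#include <assert.h>
#include <limits.h>

/* Padding bits in int = storage bits - (value bits + 1 sign bit).
   The value bits are counted from INT_MAX, whose value bits are all set. */
int int_padding_bits(void)
{
    int value_bits = 0;
    for (int v = INT_MAX; v != 0; v >>= 1) {
        value_bits++;
    }

    int storage_bits = (int)(sizeof(int) * CHAR_BIT);
    int padding = storage_bits - (value_bits + 1);

    assert(padding >= 0);
    return padding;
}
```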

Gemini answered 28/10, 2015 at 14:23 Comment(9)
I understand that the value in *to will be unspecified or uninitialised, but assuming *to is never actually used, is this code valid?Kenley
Yes, the code is valid. Once you declare int from you can legally copy the (random) value it contains.Gemini
Please provide a source for that statement. An ideal how C should behave isn't sufficient. In particular, does copying the same uninitialized variable twice result in the same value twice? After all, if it's truly "random data" being copied, it might copy a random bit of the stack, the same stack which memcpy probably uses.Yesterday
@Yesterday Copying the same initialized variable twice is not guaranteed to copy the same value twice. While this variable is being copied another thread may be writing to it. I am perfectly clear in my answer that there are no guarantees regarding the value of the variable being copied. I don't understand what your objection to that is. I will add a quote from the standard that wasn't mentioned in the other answers.Gemini
"The contents of this int is whatever happened to be on that location in the stack at that moment. " - this is an implementation detail. In general there might not be a stack (I have used one implementation that did not have a stack). In Standard C the int has indeterminate value; not some particular value that happened to be somewhere.Dudleyduds
@Dudleyduds I agree that the stack is an implementation detail. Even if the implementation doesn't have a stack it still has some sort of memory and the uninitialised variable occupies some space at a specific address (somewhere) in there.Gemini
It still doesn't have a particular value. For example int x; int y = x - x; does not produce y = 0 in generalDudleyduds
Correct. Since the address of x is never taken that is undefined behaviour in C11 (6.3.2.1p2)Gemini
@ManosNikolaidis: The issue isn't that another thread may be writing to it; the issue is that the Committee decided to give compilers room to make optimizations which would be unlikely to have any real value, at the expense of semantic sanity.Epirus
B
3

(Earlier I didn't notice that the address of from is passed.) No, that won't cause undefined behaviour; it's just that from has an indeterminate value. As long as you don't intend to use the value of the uninitialized variable, the program will have well-defined behaviour.

(Allocating space without initializing variables is not UB.)

Burlington answered 28/10, 2015 at 14:22 Comment(0)
