(Why) is using an uninitialized variable undefined behavior?

If I have:

unsigned int x;
x -= x;

it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (before the subtraction happens).

Two questions:

  • Is the behavior of this code indeed undefined?
    (E.g. Might the code crash [or worse] on a compliant system?)

  • If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?

    i.e. What is the advantage given by not defining the behavior here?

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Marauding asked 14/8, 2012 at 23:47 Comment(17)
possible duplicate of Why does the C standard leave use of indeterminate variables undefined?Timoshenko
@W'rkncacnter: If you look at the answer there, it's answering a slightly different question (why C doesn't initialize variables), not why the behavior is undefined.Marauding
@W'rkncacnter I disagree with that being a dupe. Regardless of whether what value it takes, the OP expects it to be zero after x -= x. The question is why accessing uninitialized values at all is UB.Detrude
It's interesting that the statement x=0; is typically converted to xor x,x in assembly. It's almost the same as what you are trying to do here, but with xor instead of subtraction.Boisterous
There's also What happens to a declared, uninitialized variable in C -- does it have a value, whose accepted answer definitely does address UB.Timoshenko
'i.e. What is the advantage given by not defining the behavior here? ' -- I would have thought that the advantage of the standard not listing the infinity of expressions with values that don't depend on one or more variables to be obvious. At the same time, @Paul, such a change to the standard would not make programs and libraries any bigger.Intestinal
Similar: stackoverflow.com/questions/25074180/…Teaspoon
@MattMcNabb: You should probably link the other one to this one, considering this one came 2 years earlier.Marauding
@Mehrdad OK, did a comment link. Both questions have good and established answers so closing as duplicate is probably not appropriate, although perhaps a moderator could do a merge.Teaspoon
@MattMcNabb: Yeah we can probably leave them as is, they're not quite duplicates I think.Marauding
@JimBalter: Allowing indeterminate values to behave strangely can allow useful optimizations. For example, given uint16_t foo(void) {uint16_t result; , followed by various statements, each of which may or may not write result and then return result;}, it may be helpful to have the compiler keep result in a 32-bit register and then return that. If anything stores a value to result, the compiler will ensure the value stored is 0..65535, but if nothing writes to result, keeping the return value within that range would require adding an extra instruction.Wrench
@Wrench One of your typical 4 year late non sequiturs. My comment was specifically about "expressions with values that don't depend on one or more variables" -- in this case, x - x. Were the Standard to specify that uint16_t foo(void) {uint16_t result; result -= result; return result;} returns 0, this would not make conformant programs and libraries bigger. We don't worry about buggy code producing larger binaries. We do want the compiler to be able to optimize conformant programs by taking advantage of undefined behavior, and the added specification wouldn't change that.Intestinal
This question was discussed on HackerNews, with responses from C experts, at news.ycombinator.com/item?id=22867059Marc
@MaxBarraclough: Wow, thanks a ton for sharing. This page they linked to was pretty enlightening. So, for anyone else reading this, the tl;dr seems to be that (a) the code is undefined; (b) if you take the address of the source, then it's unclear according to the standard whether it'd be undefined, but (c) compilers treat that as undefined too, so we might as well.Marauding
@Marauding I don't think your b) is accurate, see my comments in the HackerNews thread. Also see the comment by msebor, a C expert, which makes no mention of taking the address.Marc
@MaxBarraclough: I saw his comments; they don't contain any quotes from the standard to back them up, whereas people here have been quoting the standard. Note that another similar C expert there actually misremembered what the standard said about type-punning, and someone had to correct him. Did you see this comment below? It said this question was on the C committee's mailing list in 2015 and there was disagreement between the spec and their intentions. I think my summary captured it pretty darn accurately..Marauding
Does this answer your question? Why does the C standard leave use of indeterminate variables undefined?Halftimbered

Yes, this behavior is undefined, but for different reasons than most people are aware of.

First, using an uninitialized value is by itself not undefined behavior; the value is simply indeterminate. Accessing it is then UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers with a sort of extra "uninitialized" state that doesn't correspond to any value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
  • Here the addresses of a and b are taken, so their value is just indeterminate.
  • Since unsigned char never has trap representations, that indeterminate value is just unspecified; any value of unsigned char could happen.
  • At the end a must hold the value 0.
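
For concreteness, here is a self-contained version of that snippet (my own sketch of this answer's reading; memcpy and assert are standard C, nothing here is hypothetical):

#include <assert.h>
#include <string.h>

int main(void)
{
    unsigned char a, b;
    memcpy(&a, &b, 1); /* addresses taken: a and b hold unspecified values */
    a -= a;            /* an unspecified value minus itself: defined, 0 */
    assert(a == 0);    /* cannot fire under this answer's reading */
    return 0;
}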

Edit2: a and b have unspecified values:

3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.

Garzon answered 15/8, 2012 at 7:13 Comment(48)
Regarding your last point: I see why it's reasonable that a becomes 0 but how is that guaranteed by the standard? Before the assignment the value of a is indeterminate. Doesn't that include that two accesses might return different values? Or does the C standard guarantee that an indeterminate value stays the same indeterminate value between two accesses?Resemblance
@NikolaiRuhe. No it is not indeterminate, it is unspecified. Basically this means that the standard doesn't impose any particular value but it has a valid value in that range, usually this corresponds just to the bit pattern that is found at that address. This value is subtracted from itself, so result is 0.Garzon
The standard requires writing to a variable must cause all of the unsigned char constituent parts to be written with non-trap values. Does it require that variables which are not written must be non-trap forms? I would think a compiler running on a machine with parity-checked memory (e.g. the original IBM PC) should be allowed to fill undefined memory with trap vales if it were so inclined, such that any fetch would trigger a trap.Wrench
@supercat, a particular bit pattern may constitute a trap value only for a particular type, and be a regular value when interpreted as another type. So yes, under such architecture that you describe the individual bytes wouldn't be traps, and the composition of all these bytes when interpreted as int could be a trap. If with "trigger a trap" you mean "raise an implementation defined signal", then yes, an implementation could implement int like that.Garzon
Perhaps I'm missing something, but it seems to me that unsigneds can sure have trap representations. Can you point to the part of the standard that says so? I see in §6.2.6.2/1 the following: "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). ... this shall be known as the value representation. The values of any padding bits are unspecified. ⁴⁴⁾" with the comment saying: "⁴⁴⁾ Some combinations of padding bits might generate trap representations".Constantan
Continuing the comment: "Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types." - That's great once we have a valid value to work with, but the indeterminate value might be a trap representation before being initialized (e.g. parity bit set wrong).Constantan
@Constantan You're correct for all types other than unsigned char, but this answer is using unsigned char. Note though: a strictly conforming program can calculate sizeof(unsigned) * CHAR_BIT and determine, based on UINT_MAX, that particular implementations cannot possibly have trap representations for unsigned. After that program has made that determination, it can then proceed to do exactly what this answer does with unsigned char.Dorena
Can you explain how is that memcpy defined in regards with the first standard rule (6.3.2.1p2) you posted. I think your reasoning is not correct because you think that if an automatic variable has its address actually taken then it is exempt from the rule. My reasoning; even it's address is taken it still could have been declared with register, even if it wasn't in this case, therefore the behavior is undefined. The rule applies to any automatic object: that could have been declared with the register storage class. That doesn't mean it has to be. (I hope my comment was clear.) Thoughts?Paperhanger
@JensGustedt: Isn't the memcpy a distraction, i.e. wouldn't your example still apply if it were replaced by *&a = *&b;.Expander
@R.. I am not sure anymore. There is an ongoing discussion on the mailing list of the C committee, and it seems that all of this is a big mess, namely a large gap between what is (or has been) intended behavior and what is actually written up. What is clear though, is that accessing the memory as unsigned char and thus memcpy helps, the one for *& is less clear. I'll report once this settles down.Garzon
Just to add to the discussion: blogs.msdn.microsoft.com/oldnewthing/20040119-00/?p=41003. As far as I understand, UB trumps all other guarantees, including the guarantee that unsigned char has no trap representation.Glucoprotein
@Vlad, I don't have the impression that this has much to say. First it seems to be mostly about C++, and that is certainly different, here. And then this seem to be MS compilers, no? As I said, the intent expressed by the C committee seems to be that access as bytes (any of the character types) always has defined behavior.Garzon
@Jens: Okay. There is a discussion about NaT here, this comment suggests that NaT is not a trap representation. Given your claim "Accessing [indeterminate value] then is UB if the value happens to be a trap representation for the type" (did you mean "if and only if"?), there seems to be a contradiction. There is a linked defect report suggesting changes about unsigned char guarantees.Glucoprotein
@Jens: So all of this might still be related to the discussed topic.Glucoprotein
The NaT is a state of a hardware register. I think this is the origin of this idea of an object "could have been declared with the register storage class". As soon as you access the data through memory as bytes, it can't have the NaT state, that's the whole idea.Garzon
@JensGustedt: Do you happen to have a link to the C committee mailing list email that you mentioned? (if it's publicly visible)Marauding
@Vlad: There are times when useful optimizations could be achieved by allowing a read of an Indeterminate value to behave in a fashion contrary to any defined behavior for values of that type, even for types like uint16_t where every possible bit pattern for the underlying storage would have defined behavior. If such things aren't trap representations, what else could they be?Wrench
@supercat: Well, depends on the exact Standard wording. If the Standard requires that every uint16_t value is either a valid bit combination or a trap representation even in the presence of undefined behavior, than you are right. If however UB voids all the other requirements, then it can be anything including a nasal demon instance.Glucoprotein
@Vlad: If behavior would be defined unless an lvalue is read, but reading the lvalue might behavior inconsistent with its type (e.g. a uint16_t holding 65536), that would imply that the act of reading the lvalue would trigger UB. To me, that would in turn suggest that the lvalue held a trap representation.Wrench
@Paperhanger "even it's (sic) address is taken it still could have been declared with register" -- no, it couldn't. Just try reading the part of the Standard quoted, which reflects the constraints on the & operator: "could have been declared with the register storage class (never had its address taken)"Intestinal
In at least the draft of the C11 standard, Annex J.2 includes "the value of an object with automatic storage duration is used while it is indeterminate" in the list of undefined behavior. Now this annex isn't normative, and it's not clear that the standards body agrees at the cited sections, so maybe it is claiming too much in J.2. Is that your position? Because I read J.2 as saying that even the memcpy example would have UB.Chumley
After reading more, the story gets even more complicated. The C Committee Response to Defect Report #451 (and #260, linked there) indicates that indeterminate values are allowed to appear to change without direct actions of the program. That and other statements in the committee response would, I'd imagine, mean that a -= a would still result in an indeterminate value even if it's not true UB. Do you disagree, and think I'm off base there?Chumley
In your example the result will be unspecified, and not 0. See: open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm Note that this also applies for unspecified values.Vulva
@EvanED: What is needed to allow optimization without losing semantics is a recognition of non-deterministic values and ways of forcing them partially or fully determinate. I think it unfortunate that while some people think that if x is indeterminate, x & 15 should be fully determinate, others think it should be fully indeterminate. The former would impede optimizations more than necessary, while the latter would force programmers to clutter their source with code to block optimizations more than necessary. The solution IMHO would be to say...Wrench
...that a variable of type X holds at least one value of type X, but might hold more; if x and y are both of type uint32_t, then (x & y) would be allowed to yield any non-empty subset of the values formed by combinations of possible values for x and y. If x and y start out fully indeterminate, then after "xx = x & 3;" xx would hold one or more of {0,1,2,3} and after "yy = y & 10;", yy would hold one or more of {0,2,8,10}. The expression xx+yy would then yield one or more of {0,1,2,3,4,5,8,9,10,11,12,13}. While it might seem hard for compilers to track that...Wrench
...the main usefulness of indeterminate values would be to allow for compilers to use symbolic substitution to reorder operations, so that if e.g. a compiler which is given something "z=xx+yy;" followed sometime later by "w=z;" and later still by another "w=z;" it might replace the latter assignments with "w=(x & 3)+(y & 10);". If "x" or "y" changes in unexpected fashion, that might cause the two assignments to store different values, but it wouldn't cause any value outside the aforementioned set.Wrench
@Vulva in fact, under DR 451, a -= a results in a still being indeterminate (not merely unspecified): under that resolution, the apparent value is unspecified at each observation (aka. "wobbly")Teaspoon
@Teaspoon The report says this: From 3.19.2 it follows that if a type has no trap values, then indeterminate and unspecified values are the same. And in 3.19.3, it is stated explicitly that an unspecified value is chosen. Which implies that the value - after having been chosen - cannot change anymore. This is wrong. An unspecified value clearly can change at any 'observation': 3.19.3 1 unspecified value valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instanceVulva
@M.M: The more interesting question is the effect of a=a; a -= a;. If the second statement were performed in isolation, the two reads of a might potentially yield different values, since even after the first read nothing would have "set" the value of a. If a read of an Indeterminate value is guaranteed to yield some particular arbitrary value, then after a=a;, a should hold some possibly-unknown but no longer Unspecified value, so the subtract should yield 0. Unfortunately, some compilers don't recognize any way of forcing the compiler to turn a "wobbly" value into a usable one.Wrench
@Wrench you can turn a wobbly-valued variable into a usable one by assigning a non-wobbly value to it. There's very little use case for wanting non-wobbly garbageTeaspoon
@M.M: Various sparse-array and hash table algorithms do a key lookup via uint32_t index = map[key]; if (index < numItems && values[index]==key) ItemFound(...) else ItemNotFound(...); If key isn't in the table, map[key] could return any non-wobbly uint32_t value and code would correctly report that it's not found. If index gets assigned a wobbly value, though, there's no way to prevent an out-of-bounds array fetch.Wrench
@Wrench that's why I said "very little" instead of "none at all". And on modern OSs there is no penalty to making a large zero-initialized allocation.Teaspoon
@M.M: C is often used in freestanding implementations where there is no "modern OS" [or any OS for that matter], or where the purpose of the compiled code is to be the OS.Wrench
@Wrench typically, embedded devices would not require a sparse array so large that the initialization time is a measurable problem. Not saying never but it would be a very rare use case.Teaspoon
@M.M: I said or where the purpose of the compiled code is to be the OS, which could extend up to some rather large systems. In cases where an "optimizing" compiler would require a programmer to force the computer to do otherwise-unnecessary work, the value of any potential optimizations may be negated by the unnecessary work. A compiler that could achieve 90% of the optimizations while requiring 0% of the needless work would allow for more efficient code.Wrench
But shouldn't this behavior occur only for auto variables?Donte
@AlphaGoku: Yes, but a function might reasonably create structures of automatic duration without populating all the members. There are purposes where the benefits of being able to statically prove that all struct members will be written would outweigh the performance benefits of omitting writes that wouldn't be necessary at the machine level for a program to meet requirements, but there are others where the performance benefits of eliminating needless writes would be worth more. The Standard is intended to let implementations choose whichever approach would best serve their customers, not...Wrench
...to imply any judgment about which approach would be more useful in any particular situation.Wrench
I don't quite understand the sudden "that could have been declared with the register storage class" in the context of the code shown; it seems to clearly just be automatic storage that is not register.Nighttime
Oh wait, now I realize, but I don't like itNighttime
The "does not have character type" implies that uint32_t (for example) can have trap representation (if width of character type != width of uint32_t). However, uint32_t is guaranteed to have "no padding bits, and a two’s complement representation" (C11, 7.20.1.1). Then how uint32_t can have trap representation? Any examples?Coreen
You claimed "using an unitialized value is by itself not undefined behavior" but the C Standard says that using an indeterminate value of a variable with automatic storage duration (which this undoubtedly is) always classifies as undefined behavior. See section J.2 "Undefined Behavior".Trilogy
@BenVoigt, Annex J is not normative. And in fact you should read that one as "in some cases ..."Garzon
@JensGustedt: Indeed, in C89 the behavior would have been defined for all types which don't have trap representations, though even in the C89 days many implementations would not always have behaved in a manner consistent with storage holding an arbitrary bit pattern. Unfortunately, the Standard has no vocabulary to characterize a behavior which is less specific than behaving as though an object holds an unspecified bit pattern, but is more specific than "anything can happen" Undefined Behavior. Treating partially-initialized aggregates with loose semantics could allow optimizations...Wrench
...that would not be possible if all objects had to be regarded as holding some (possibly initially unknown and arbitrary) bit pattern, but only if programmers who only needed such loose semantics could leave objects partially uninitialized. Given struct foo t; extern struct foo x,y;, along with code that partially initializes t, saying that x=t; y=t; may leave portions of x and y that weren't set in t holding independent Unspecified bit patterns would seem better than having to either require that x and y match, or that such action would trigger "anything can happen" UB.Wrench
@Wrench Can you address my comment above about uint32_t? Note: I agree with Jens that "the NaT is a state of a hardware register", the NaT is not contained in the object ("region of data storage in the execution environment, the contents of which can represent values").Coreen
@pmor: The authors of the Standard made no particular effort to ensure that no constructs which should have a defined meaning were characterized as UB, but did seek to avoid saying more than they had to about the behavior of non-portable programs. There shouldn't be anything special about character types beyond the fact that they're the only types that are guaranteed to (1) exist, (2) have no trap representations, and (3) have no alignment requirements. The proposition that that other types exist with the first two characteristics would represent a non-portable assumption, though one...Wrench
...which would in practice have to be true on any platform where type uint32_t exists. I think NaT is a red herring, since even in C89 days there have been implementations where e.g. unsigned short x; unsigned long y; ...code that doesn't affect x ... y=x; could set y to a value beyond the range of an unsigned short, and I wouldn't be surprised if on some of them that could even happen with if (x < 65536) y=x;.Wrench

The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1)-1) and 2^(n-1)-1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment

unsigned x, y, z;   /* 0 */
y = 0;              /* 1 */
z = 4;              /* 2 */
x = - x;            /* 3 */
y = y + z;          /* 4 */
x = y + 1;          /* 5 */

When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:

r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;

The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.

For a more elaborate example, consider the following code fragment:

unsigned i, x;
for (i = 0; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written

unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is accessed is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:

  • during the first loop iteration, x is uninitialized by the time -x is evaluated.
  • -x has undefined behavior, so its value is whatever-is-convenient.
  • The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.

When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
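
To make that concrete, here is a sketch of my own (not the observed output of any particular compiler) of a function that such a compiler may legally gut:

unsigned zero_maybe(void)
{
    unsigned x;   /* automatic, address never taken: "register-like" */
    x -= x;       /* reads an indeterminate value: undefined behavior */
    return x;     /* the optimizer may return whatever is convenient */
}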

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
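
For flavor, here is a minimal POSIX sketch of that page-protection trick (an illustration of the idea only, not how any real debugging compiler is implemented):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Stand-in for an uninitialized variable: a page with no access rights. */
    unsigned *x = mmap(NULL, 4096, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED)
        return 1;

    /* Reading *x here would raise SIGSEGV, like the trap described above:
       unsigned y = *x - *x;  */

    /* An assignment first "maps the page normally"; reads are then fine. */
    mprotect(x, 4096, PROT_READ | PROT_WRITE);
    *x = 0;
    printf("%u\n", *x - *x);
    return 0;
}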

Hufnagel answered 15/8, 2012 at 0:51 Comment(5)
+1 Nice explanation, the standard is taking special care of that situation. For a continuation of that story see my answer below. (too long to have as a comment).Garzon
@JensGustedt Oh, your answer makes a very important point that I (and others) missed: unless the type has trap values, which for an unsigned type requires “wasting” at least one bit, x has an uninitialized value but the behavior on accessing would be defined if x didn't have register-like behavior.Ladysmith
@Gilles: at least clang makes the kind of optimizations you mentioned: (1), (2), (3).Glucoprotein
What practical advantage is there to having clang process things in that fashion? If downstream code never uses the value of x, then all operations on it could be omitted whether its value had been defined or not. If code following e.g. if (volatile1) x=volatile2; ... x = (x+volatile3) & 255; would be equally happy with any value 0-255 that x might contain in the case where volatile1 had yielded zero, I would think an implementation that would allow the programmer to omit an unnecessary write to x should be regarded as higher quality than one which would behave...Wrench
...in totally unpredictable fashion in that case. An implementation that would reliably raise an implementation-defined trap in that case might, for certain purposes, be regarded as being of higher quality yet, but behaving totally unpredictably seems to me like the lowest-quality form of behavior for pretty much any purpose.Wrench

Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which, if unhandled, could crash the program.

(6.2.6.1 on a late C11 draft says) Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)

Dumm answered 14/8, 2012 at 23:50 Comment(16)
Can you give at least one example of a bit pattern for an integer that can drive CPU crazy?Bookworm
@VladLazarenko: This is about C, not particular CPUs. Anyone can trivially design a CPU that has bit patterns for integers that drive it crazy. Consider a CPU that has a "crazy bit" in its registers.Locust
@VladLazarenko, depends on the CPU. There are none for integers on x86.Dumm
So can I say, then, that the behavior is well defined in case of integers and x86?Bookworm
@VladLazarenko: Well, if it's undefined, then no -- unless your compiler specifically says it's defined, you can't assume it's defined, because you can't assume the compiler will emit the instructions you expect (it will likely avoid doing so, for optimization).Marauding
Well, theoretically you could have a compiler which decided to only use 28-bit integers (on x86) and add specific code to handle each addition, multiplication (and so forth) and ensure that these 4 bits go unused (or emit a SIGSEGV otherwise). An uninitialized value could cause this.Dumm
I hate when someone insults everyone else because that someone doesn't understand the issue. Whether the behavior is undefined is entirely a matter of what the standard says. Oh, and there is nothing at all practical about eq's scenario ... it's entirely contrived.Intestinal
P.S. David Schwartz's idea under the other answer is a more practical idea and suggests another ... suppose that physical memory isn't allocated to virtual addresses until initialized or written to; then accessing an uninitialized variable could result in an access violation.Intestinal
@Glucoprotein Lazarenko: Itanium CPUs have a NaT (Not a Thing) flag for each integer register. The NaT Flag is used to control speculative execution and may linger in registers which aren't properly initialized before usage. Reading from such a register with a NaT bit set yields an exception. See blogs.msdn.com/b/oldnewthing/archive/2004/01/19/60162.aspxNoisy
This explanation is insufficient, it only states half of the story for the case that the value happens to be a trap representation. It still is UB by the standard, but for another reason. Please see my answer.Garzon
@Dumm you are just not given the good reasons. In this case the UB has nothing to do with trap representations. It comes from the fact that the address of the variable is never taken. So I take it back, you are not telling half the story, you are telling the wrong story.Garzon
@JensGustedt, too much real-world thinking spoils good theoretical issues :(Dumm
@Dumm probably my English isn't good enough to capture what you are trying to say. This is not a theoretical issue. Unspecific values may be used under certain circumstances, in my answer I have given valid code for that.Garzon
@JensGustedt, my example, it seems, is more of a theoretical issue for all but theoretical implementations of unsigned integer types.Dumm
This answer is incorrect where it states “So yes, the behavior is indeed undefined.” As the answers of myself and Jens Gustedt show (with citations from the C standard, which this answer does not provide), taking the value of an uninitialized object does not by itself cause undefined behavior. In C 1999, undefined behavior only occurs if certain other conditions are met, and those conditions are not met for integer types on most common systems. See Jens Gustedt’s answer for the C 2011 situation.Coldiron
@EricPostpischil: It is not uncommon for uninitialized variables to behave as though they have values outside the range of their type. IMHO, there should be a category of behavior to cover such things, which would--unlike Implementation-Defined behavior--not require an implementation to define what would happen in detail, but--unlike UB--would not grant compilers unlimited latitude either.Wrench

(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)

The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)

3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.

So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.

Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N-1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).

If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
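
If you want to test that premise on a given implementation, here is a small sketch of my own (not from the standard): count the value bits that UINT_MAX provides and compare them with the number of bits in the object representation.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* UINT_MAX has every value bit set, so popcounting it counts value bits. */
    int value_bits = 0;
    unsigned max = UINT_MAX;
    while (max) {
        value_bits += (int)(max & 1u);
        max >>= 1;
    }

    /* If every bit of the object representation is a value bit, there are
       no padding bits, hence no trap representations for unsigned int. */
    int object_bits = (int)(sizeof(unsigned int) * CHAR_BIT);
    printf("value bits: %d, object bits: %d, padding bits: %d\n",
           value_bits, object_bits, object_bits - value_bits);
    return 0;
}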

Coldiron answered 15/8, 2012 at 0:54 Comment(12)
If x has a trap representation, then x -= x might trap, right? Still, +1 for pointing out unsigned integers with no extra bits must have defined behavior -- it's clearly the opposite of the other answers and (according to the quote) it seems to be what the standard implies.Marauding
Yes, if the type of x has a trap representation, then x -= x might trap. Even simply x used as a value might trap. (It is safe to use x as an lvalue; writing into an object will not be affected by a trap representation that is in it.)Coldiron
unsigned types rarely have a trap representationGarzon
Quoting Raymond Chen, "On the ia64, each 64-bit register is actually 65 bits. The extra bit is called “NaT” which stands for “not a thing”. The bit is set when the register does not contain a valid value. Think of it as the integer version of the floating point NaN. ... if you have a register whose value is NaT and you so much as breathe on it the wrong way (for example, try to save its value to memory), the processor will raise a STATUS_REG_NAT_CONSUMPTION exception". I.e., a trap bit can be completely outside the value.Ossy
−1 The statement "If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior." fails to consider schemes like the ia64 NaT bits.Ossy
@Cheersandhth.-Alf: Even on conventional 32-bit machines, it would not be unusual for an uninitialized variable of type uint16_t to have a value outside the range 0..65535, and for a function of return type uint16_t that returns that variable to pass its value through to the caller without masking.Wrench
@supercat: uint16_t (from <stdint.h>) is an exact width type. And the C++ standard only permits three possible value encodings, which for 16 bits cannot produce "a value outside the range 0..65535", which you claim is "not unusual". I.e. you're just wrong about that. The problem isn't exceeding the value range, and in practice not even trap representations, but possible additional information about the value, or rather, about the lack of a specified value.Ossy
@Cheersandhth.-Alf: I've seen a number of compilers, including gcc's ARM compilers, generate code where registers allocated to uninitialized variables can hold arbitrary values which need not fit the variables' range. E.g. ARM gcc 4.8.2 given uint16_t foo(uint32_t x, uint32_t y, uint32_t z) { uint16_t q; if (x) q=x; return q; } will generate code that, if invoked from outside code, will return all 32 bits of z if x is zero.Wrench
@Cheersandhth.-Alf: If use of such variables is UB, such code is conforming. If it's not, such code might or might not be conforming, but it's been commonplace behavior for a long time and it allows more efficient code than would otherwise be possible [though in the above case gcc generates needlessly-inefficient code].Wrench
@supercat: I see what you mean, that bits outside the variable can be affected. And if e.g. the result of foo() is converted to 32 bits under an assumption that the higher bits of its 32-bit location are zero, then oops. So it's a real problem that I didn't think of.Ossy
@Cheersandhth.-Alf: I think the Standard regards use of Indeterminate Value as UB because that's easier than trying to describe everything that can happen, but I think that's unfortunate because there are many cases where code "passes through" values that may or may not be meaningful to recipients that may or may not use them (but who won't use them if they're not meaningful), and making any rvalue conversion of Indeterminate Values invoke Undefined Behavior makes it necessary to add code to ensure that Indeterminate Values can't get passed through.Wrench
@Cheersandhth.-Alf: I would like to see the Standard recognize the concept of storage locations holding a non-deterministic union of values, such that operations that must yield a definite result (e.g. an "if" test) can behave as though the storage location held any value it might hold, and other operations (like "+") can yield a non-deterministic union of all values that could have been yielded by source operands.Wrench

For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:

  • In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
  • Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
  • Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].

    The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.


[1]: C11 6.3.2.1:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

[2]: C11 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

[3] C11:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type
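
To contrast the first and third bullets in code, a sketch of my own (assuming a system without trap representations):

unsigned ub_case(void)
{
    unsigned x;        /* automatic, address never taken: reading it is UB [1] */
    return x;
}

unsigned unspecified_case(void)
{
    unsigned x;
    unsigned *p = &x;  /* x could no longer have been declared register */
    return *p;         /* merely an unspecified value, per [3] */
}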

Barrus answered 18/11, 2016 at 10:35 Comment(18)
I would argue this resolves to "it is always undefined behavior", as the C abstract machine -can- have trap representations. Just because your implementation does not use them does not make the code defined. In fact, from what I can tell, a strict reading would not even insist that the trap representations have to be in hardware; I don't see why a compiler could not decide a specific bit pattern is a trap, check for it every time the variable is read, and invoke UB.Concentric
Note that possibly unsigned char is exempt from this for reasons mentioned above.Concentric
@Concentric In the real world, 99.9999% of all computers are two's complement CPUs without trap representations. Therefore no trap representations is the norm, and discussing the behavior on such real-world computers is highly relevant. To assume that wildly exotic computers are the norm isn't helpful. Trap representations in the real world are so rare that the presence of the term trap representation in the standard is mostly to be regarded as a standard defect inherited from the 1980s. As is support for one's complement and sign & magnitude computers.Barrus
By the way, this is an excellent reason why stdint.h should always be used instead of the native types of C. Because stdint.h enforces 2's complement and no padding bits. In other words, the stdint.h types aren't allowed to be full of crap.Barrus
Again the committee response to the defect report says that: "The answer to question 2 is that any operation performed on indeterminate values will have an indeterminate value as a result." and "The answer to question 3 is that library functions will exhibit undefined behavior when used on indeterminate values."Husch
DRs 451 and 260Husch
@AnttiHaapala Yes I know of that DR. It doesn't contradict this answer. You may get an indeterminate value when reading an uninitialized memory location and it is not necessarily the same value every time. But that is unspecified behavior, not undefined behavior.Barrus
Detail on <stdint.h>: the optional exact-width integer types are required to have "no padding bits, and a two’s complement representation". The header's required minimum-width integer types and fastest minimum-width integer types are not specified to have no padding bits or a two’s complement representation.Noel
@AnttiHaapala: The authors of the Standard expect that people seeking to produce quality implementations will try to support behavioral guarantees beyond those mandated by the Standard in cases where the benefit to users would exceed the cost. For many implementations' intended purposes, it would be useful, and cost almost nothing, to guarantee that when the bytes of an object hold Indeterminate Values, every read of the object will behave as though those bytes at worst held a (possibly different) Unspecified Value. That may not be true of all implementations, however.Wrench
@AnttiHaapala: The DR says that implementations aren't required to offer such guarantees, which by default makes the question of whether to support them a Quality of Implementation issue. Neither implementations for purposes which would be incompatible with such guarantees, nor garbage-quality-but-conforming implementations, should be expected to uphold such guarantees, but the Standard is silent on the issue of whether general-purpose implementations that don't should be recognized as being of inferior quality. I think they should, but that's a matter of judgment.Wrench
gcc.godbolt.org/z/84f76PnWY I do not think that “unspecified behavior” applies to this example. You might as well call it undefined behavior; it would be shorter than trying to make a sentence to explain this. (The GCC developers are aware of this example and say that their compilation of it is according to the spirit of the standard, something something wobbly values that are still not described in C23.)Micmac
@PascalCuoq Sure it does, that example is making assumptions regarding the value of a variable with indeterminate value. The compiler is free to always assign the very same value to it, in case it fancies, and optimise accordingly.Barrus
@Barrus This sounds like the description of another example. In my example, when f is applied to 0, an unsigned char variable (the unsigned char type never has trap representations) has a value greater than 500. Or the word “value” is meaningless, to the point that one might as well save words and call the situation UB.Micmac
@PascalCuoq What I meant is that the compiler is allowed to assume that the contents of the unsigned char is always a certain fixed, unspecified value, known to the compiler but not to the programmer. This is the very definition of an unspecified value. Unspecified behavior means just that: the compiler is allowed to implement deterministic but perhaps surprising behavior if relied upon by the programmer, and it need not document how. Notably your example yields the same machine code for i>100 too, so this could as well be a gcc bug. clang generates different code.Barrus
@PascalCuoq What's important here from the "language lawyer" perspective is that nothing in the C standard says that accessing an indeterminate value results in UB, given the premises listed in this answer. What certain compilers do and don't is a conformance and/or quality of implementation question for that specific compiler. To just yell "undefined behavior" and run off into the woods, as gcc appears to do, is not conforming IMO.Barrus
@Lundin: One issue is that the Standard allows, probably deliberately, for compilers to assign a 32-bit register to hold an 8-bit unsigned char; value and then, if the value is used before initialization, behave as though it somehow holds a value outside the range 0-255. It also clearly allows a compiler that ensures that any register used for an unsigned char always holds a value 0-255 to omit code that be irrelevant if it does. As a general rule, the Standard makes no attempt to consider when compilers should or should not be allowed to combine optimizations which would be harmless...Wrench
...if applied separately, but would cause a program to completely unravel if combined. It would be useful if the Standard could recognize a category of implementations which will recognize when aspects of behavior that would have been irrelevant in code as written become relevant as a result of compiler optimization. Given e.g. unsigned char a; ... unsigned q=a; proc1(q); if (q < 256) proc2(q);, such an implementation could generate code that might store an unmasked value into q, pass it to proc1, and then skip proc2 if the value was over 255, or it could mask the value, but...Wrench
...it would not be allowed to generate code that could invoke proc2 if q is greater than 255.Wrench

Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.

Locust answered 14/8, 2012 at 23:50 Comment(13)
The compiler doesn't have to have special code for this either, though. Simply allocating the space (as always) and not intializing the variable gives it the correct behavior. I don't think that needs special logic.Marauding
@Mehrdad: That's completely false. Consider two cases: 1) Floating point numbers that have representations that don't return zero when subtracted from themselves such as NaNs. 2) Hardware that treats uninitialized memory specially. (In any event, that's not a problem. If you think no special code is needed, then great. The standard doesn't require any. So perfect. If any is needed though, the standard doesn't require the compiler to do it.)Locust
(1) They could've just said implementation-defined, or maybe required it for (unsigned?) integral types, since it isn't any extra work to "leave the contents as-is" anyway. (2) Hmm... I'm not sure I know what you mean. Like how would it treat uninitialized memory specially, and why could that be useful?Marauding
1) Sure, they could have. But I can't think of any argument that would make that any better. 2) The platform knows that the value of uninitialized memory cannot be relied on, so it's free to change it. For example, it can zero uninitialized memory in the background to have zeroed pages ready for use when needed. (Consider if this happens: 1) We read the value to subtract, say we get 3. 2) The page gets zeroed because it's uninitialized, changing the value to 0. 3) We do an atomic subtract, allocating the page and making the value -3. Oops.)Locust
Oooooh, very interesting! That makes a lot of sense, thanks! :)Marauding
Note that even unsigned types are allowed to have padding bits and thus trap representations.Expander
@Mehrdad: Something important to remember though is that those who designed the standard likely didn't have any specific scenario in mind. They could only guess what future computers and hardware would be like and couldn't reliably predict what effect their decisions would have. So they only required behavior that they felt they needed to require to allow people to build correct programs and they viewed every requirement as having potential cost that had to be justified by a benefit. Requiring any predictable behavior for uninitialized data failed that test, in their opinion.Locust
@DavidSchwartz please add your example to the answer --- it's one of the best I've seen.Tomfool
-1 because you give no justification for your claim at all. There are situations where it would be valid to expect that the compiler just takes the value that is written in the memory location.Garzon
@JensGustedt: I don't understand your comment. Can you please clarify?Locust
Because you just claim that there is a general rule, without referring to it. As such it is just an attempt of "proof by authority", which is not what I expect on SO. And for not effectively arguing why this couldn't be an unspecific value. The sole reason that this is UB in the general case is that x could be declared as register, that is, that its address is never taken. I don't know if you were aware of that (if so, you were hiding it effectively) but a correct answer must mention it.Garzon
This answer is incorrect where it states “Yes, it's undefined.” As the answers of myself and Jens Gustedt show (with citations from the C standard, which this answer does not provide), taking the value of an uninitialized object does not by itself cause undefined behavior. In C 1999, undefined behavior only occurs if certain other conditions are met, and those conditions are not met for integer types on most common systems. See Jens Gustedt’s answer for the C 2011 situation.Coldiron
@EricPostpischil: On many real compilers for 32-bit machines, an uninitialized variable of type uint16_t may hold values outside the range 0-65535. How would that be allowable if such values were not considered to be trap representations?Wrench

While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:

volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
  uint16_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z;
  return temp;  
}

A compiler for a platform like the ARM, where all instructions other than loads and stores operate on 32-bit registers, might reasonably process the code in a fashion equivalent to:

volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
  // Since x is never used past this point, and since the return value
  // will need to be in r0, a compiler could map temp to r0
  uint32_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z & 0xFFFF;
  return temp;  
}

If either volatile read yields a non-zero value, r0 will get loaded with a value in the range 0..65535. Otherwise it will hold whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of a value whose type is uint16_t but whose value is outside the range 0..65535, except to say that any action which could produce such a value invokes UB.

Wrench answered 8/8, 2016 at 16:51 Comment(14)
Interesting. So are you saying the accepted answer is wrong? Or are you saying it's right in theory but in practice compilers may do weirder things?Marauding
@Mehrdad: It is common for implementations to have behavior which goes beyond the bounds of what would be possible in the absence of UB. I think it would be helpful if the Standard recognized the concept of a partially-indeterminate value whose "allocated" bits will behave in a fashion that is, at worst, unspecified, but with additional upper bits that behave non-deterministically (e.g. if the result of the above function is stored to a variable of type uint16_t, that variable might sometimes read as 123 and sometimes 6553623). If the result ends up being ignored...Wrench
...or used in such a way that any possible ways it might be read would all yield final results meeting requirements, the existence of partially-indeterminate value shouldn't be a problem. On the other hand, there is nothing in the Standard which would allow for the existence of partially-indeterminate values in any circumstances where the Standard would impose any behavioral requirements whatsoever.Wrench
It seems to me that what you are describing is exactly what is in the accepted answer -- that if a variable could have been declared with register, then it may have extra bits that make the behavior potentially undefined. That's exactly what you're saying, right?Marauding
@Mehrdad: The accepted answer focuses on architectures whose registers have an extra "uninitialized" state, and trap if an uninitialized register is loaded. Such architectures exist, but are not commonplace. I describe a scenario where commonplace hardware may exhibit behavior which is outside the realm of anything contemplated by the C Standard, but would be usefully constrained if a compiler doesn't add its own additional wackiness to the mix. For example, if a function has a parameter that selects an operation to perform, and some operations return useful data but others don't,...Wrench
...then in the cases where a caller specifies an operation that doesn't return useful data, being able to return an unitialized value may allow slightly more efficient code generation than having to load a meaningless value.Wrench
I think if you read the accepted answer carefully, it does not say that this behavior only exists on architectures with trap representations. Rather, it says that IF such an architecture would have such a problem with a register variable, then the code has undefined behavior -- even if that's not the architecture you're actually targeting. Try re-reading it and let me know if you disagree.Marauding
@Mehrdad: From the accepted answer: Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain. If a 32-bit value used for a uint16_t has its upper bits set, that would represent a state outside the domain of uint16_t, but the processor would neither know nor care that the register was being used for a uint16_t, and would thus see nothing special about the value in the register.Wrench
That quote is talking about the treatment of variables, a C concept. What you just said is about the CPU's treatment of registers, an external concept. So of course the CPU doesn't necessarily know the variable's data type, but that's not what the quote is saying. The quote is saying, "when some values may be outside the domain on SOME architectures, the behavior is undefined in the language (i.e. everywhere), because on those particular architectures, it could have been a trap representation".Marauding
@Mehrdad: The quote suggests that the primary reason the behavior is undefined is the existence of hardware registers which recognize an "uninitialized" state. Much of the C Standard was predicated on a philosophy that if behavior was defined on some platforms but not others before C89 was published, leaving it undefined in the Standard should preserve that status quo; such a philosophy still holds in much of the world of commercial embedded compilers (excluding gcc), so the possibility of weird "natural" behavior may be very important in such contexts.Wrench
@Mehrdad supercat likes to post extended comments as answers. This non-answer has no bearing on the question or its accepted answer.Intestinal
@JimBalter: The question asks "why" the Standard says such actions invoke UB. For almost any question "Why does document X say Y" the obvious correct-but-unhelpful answer would be "Because that's what the authors wrote", but that would immediately prompt "What reasons would the authors have had for writing that". I therefore regard questions that ask why document X says Y as implicitly asking "What reasons would the authors of document X have had for saying Y". Do you regard such inferences as inappropriate?Wrench
A C compiler must implement the C language. How it does so is up to the compiler. If the rules of the language on a particular arch rule out undefined behaviour, the compiler has to implement it. For example, if a translation results in a potential trap representation where there must not be one according to C, the compiler has mistranslated the program. So on such architectures, the compiler might have to generate a different, but correct, sequence instead.Paresis
@RememberMonica: Indeed, but the authors of the Standard sought to deliberately classify as Undefined Behavior any situation where there might exist some implementation where a trap representation might cause weirdness, even if 99% of implementations were expected to behave in at least somewhat predictable fashion (e.g. yield some kind of likely-meaningless value).Wrench
