Is a^a or a-a undefined behaviour if a is not initialized?
Consider this program:

#include <stdio.h>

int main(void)
{
    unsigned int a;
    printf("%u %u\n", a^a, a-a);
    return 0;
}

Is it undefined behaviour?

On the face of it, a is an uninitialized variable. So that points to undefined behaviour. But a^a and a-a are equal to 0 for all values of a, at least I think that is the case. Is it possible that there is some way to argue that the behaviour is well defined?

Hellbox answered 1/8, 2014 at 6:30 Comment(10)
I would expect this to be well-defined as the value of a is unknown but fixed and it should not change. The question is whether the compiler would allocate the space for a and subsequently read from the garbage sitting there. If not, then the behaviour is undefined.Fleda
Hmm, so long as the variable isn't marked volatile then I would accept that as being defined behaviour. a ^= a is exactly equivalent to a = 0Sabadell
@martin: It is not fixed. The value is allowed to change. This is a very practical consideration. A variable can be assigned to a CPU register, but while it is uninitialized (i.e. its effective value-lifetime hasn't begun yet), that same CPU register can be occupied by a different variable. The changes in that other variable will be seen as an "unstable" value of this uninitialized variable. This is something that is often observed in practice with uninitialized variables.Finality
@AndreyT this is a nice explanationFleda
Never mind, found it, my mistake: stackoverflow.com/questions/20300665/…, and it was in fact for C.Copulate
@Copulate Yes, that all seems quite similar. But most of the discussion is in comments there, and the question was about what ^ means, the UB was accidental and incidental to the question. Here the focus of the question is all about the UB.Hellbox
@DavidHeffernan Absolutely, I was not suggesting to close as duplicate (the question linked is fairly low quality anyway), just that it could be worth a look.Copulate
@Copulate Thanks. Nice to have some more discussion on the topic from others. Appreciated.Hellbox
Testing for unsigned int a; printf("%d\n", !a); would even be closer to the crux of this post by eliminating the multiple access issues.Oscillograph
Similar: stackoverflow.com/questions/11962457/…Darkle

In C11:

  • It's explicitly undefined according to 6.3.2.1/2 if a never has its address taken (quoted below)
  • It could be a trap representation (which causes UB when accessed). 6.2.6.1/5:

Certain object representations need not represent a value of the object type.

Unsigned ints can have trap representations (e.g. if the representation includes a padding bit used as a parity bit, accessing a could cause a parity fault).

6.2.4/6 says that the initial value is indeterminate and the definition of that under 3.19.2 is either an unspecified value or a trap representation.

Further: in C11 6.3.2.1/2, as pointed out by Pascal Cuoq:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

This doesn't have the exception for character types, so this clause appears to supersede the preceding discussion; accessing a is immediately undefined even if no trap representations exist. This clause was added in C11 to support Itanium CPUs, which do actually have a trap state for registers.


Systems without trap representations: But what if we throw in &a; so that 6.3.2.1/2's objection no longer applies, and we are on a system that is known to have no trap representations? Then the value is an unspecified value. The definition of unspecified value in 3.19.3 is a bit vague; however, it is clarified by DR 451, which concludes:

  • An uninitialized value under the conditions described can appear to change its value.
  • Any operation performed on indeterminate values will have an indeterminate value as a result.
  • Library functions will exhibit undefined behavior when used on indeterminate values.
  • These answers are appropriate for all types that do not have trap representations.

Under this resolution, int a; &a; int b = a - a; results in b having indeterminate value still.

Note that if the indeterminate value is not passed to a library function, we are still in the realm of unspecified behaviour (not undefined behaviour). The results may be weird, e.g. if ( j != j ) foo(); could call foo, but the demons must remain ensconced in the nasal cavity.

Darkle answered 1/8, 2014 at 6:36 Comment(5)
Supposing that we knew there were no trap values, could we argue defined behaviour then?Hellbox
@DavidHeffernan You might as well treat access to indeterminate data as UB, because your compiler might, too, even if there are no trap values. Please see blog.frama-c.com/index.php?post/2013/03/13/…Kristoferkristoffer
@Pascal I get that now. That's the final para of Andrey's answer.Hellbox
@DavidHeffernan The examples go as far as 2 * j being odd, which is slightly worse than even the picture in Andrey's answer, but you get the idea.Kristoferkristoffer
When the C89 Standard was written, it was expected that implementations would specify many things that the Standard did not, and the authors of the Standard saw no reason to detail all the cases where an action should be considered defined on implementations that specify certain things (e.g. the fact that "unsigned int" has no trap representations) but undefined on implementations that don't (e.g. where reading an indeterminate bit pattern as an "unsigned int" might yield a trap representation).Dissident

Yes, it is undefined behavior.

Firstly, any uninitialized variable can have a "broken" (aka "trap") representation. Even a single attempt to access that representation triggers undefined behavior. Moreover, even objects of non-trapping types (like unsigned char) can still acquire special platform-dependent states (like NaT, Not-A-Thing, on Itanium) that might appear as a manifestation of their "indeterminate value".

Secondly, an uninitialized variable is not guaranteed to have a stable value. Two sequential accesses to the same uninitialized variable can read completely different values, which is why, even if both accesses in a - a are "successful" (not trapping), it is still not guaranteed that a - a will evaluate to zero.

Finality answered 1/8, 2014 at 6:37 Comment(10)
Have you got a citation for that final paragraph? If that is so, then we needn't even consider traps.Hellbox
@Matt McNabb: That applies exclusively to unsigned char type - the only type that has no trap representations. So, this is more of an exception from the general case.Finality
An example of an actual, real-life trap representation would be great.Moot
@Matt McNabb: Well, this might be an issue that was resolved differently through different versions of the language spec. But the resolution for DR#260 (open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm) states clearly and explicitly that variables with indeterminate values can change arbitrarily "by themselves".Finality
@ANdreyT that DR is from 2001, however C11 changed things again (as noted in Pascal's link). Maybe Resolution 3 could be considered to still hold though.Darkle
@Matt McNabb: DR#451 reasserted essentially the same decisions from DR#260 in both Oct 2013 and Apr 2014 open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm . The committee response for DR#451 explicitly states "This viewpoint reaffirms the C99 DR260 position"Finality
@AndreyT thanks, have updated my post to include DR451 .Darkle
@Moot The closest to a trap representation you may have at hand is signaling NaNs. en.wikipedia.org/wiki/NaN#Signaling_NaN Otherwise you need to get a computer with explicit parity bits, a sign-magnitude computer where -0 is considered a trap value, or something equally exotic.Kristoferkristoffer
Concerning "uninitialized variable is not guaranteed to have a stable value". Say code is unsigned int a, b; b = a; printf("%u %u\n", b^b, b-b); Here b is initialized, albeit to an unknown value, which would then be stable. b = a may fire a trap. But without a trap, should not the print result in "0 0"?Oscillograph
@chux: No. There is nothing that restricts undefined behavior to "does what you think, but if not, traps". Literally any behavior is permitted.Hatfield

If an object has automatic storage duration and its address is not taken, attempting to read it will yield Undefined Behavior. Taking the address of such an object and using pointers of type "unsigned char" to read out its bytes is guaranteed by the Standard to yield values of type "unsigned char", but not all compilers adhere to the Standard in that regard. ARM GCC 5.1, for example, when given:

  #include <stdint.h>
  #include <string.h>
  struct q { uint16_t x,y; };
  volatile uint16_t zz;
  int32_t foo(uint32_t x, uint32_t y)
  {
    struct q temp1,temp2;
    temp1.x = 3;
    if (y & 1)
      temp1.y = zz;
    memmove(&temp2,&temp1,sizeof temp1);
    return temp2.y;
  }

will generate code that will return x if y is even (so that temp1.y is never assigned), even if x is outside the range 0-65535. The Standard makes clear that unsigned character reads of Indeterminate Value are guaranteed to yield a value within the range of unsigned char, and the behavior of memmove is defined as equivalent to a sequence of character reads and writes. Thus, temp2 should have a value that could be stored into it via a sequence of character writes, but gcc replaces the memmove with a simple assignment and ignores the fact that the code took the addresses of temp1 and temp2.

Having a means of forcing a compiler to regard a variable as holding an arbitrary value of its type, in cases where any such value would be equally acceptable, would be helpful, but the Standard doesn't specify a clean means of doing so (save for storing some particular value, which would work but would often be needlessly slow). Even operations which should logically force a variable to hold a value representable as some combination of bits cannot be relied upon to work on all compilers. Consequently, nothing useful can be guaranteed about such variables.

Dissident answered 8/5, 2017 at 17:33 Comment(13)
To be fair, there is a defect report linked above about exactly what you can do with an indeterminate value, and part of the decision was to specify that passing an indeterminate value to any library function is UB. memmove is a library function so that would apply here.Laurentia
@BeeOnRope: If the authors of the Standard had included a means of resolving indeterminate values into at-worst-unspecified values, it would have been reasonable to require the use of such means before passing otherwise-indeterminate values to library functions. Given the lack of such means, the only thing I can read into their decision is that they are more interested in making a language "easy to optimize" than in maximizing its usefulness.Dissident
@BeeOnRope: Their rationale is that making behavior undefined shouldn't prevent compilers from defining behaviors when targeting processors and application fields where it would be practical and useful to do so. Unfortunately, whether or not such decisions by the Committee should have such an effect, it's obvious that they do.Dissident
I suppose, yes, they could have introduced some kind of T std::freeze(T v) method that would turn a "wobbly" indeterminate value into an unspecified-but-stable value. It would have "third order" usefulness though: using indeterminate value is already obscure and very rarely used, so adding a special construct just to solidify such values would seem to be just going further down the rabbit hole of what is already an obscure corner of the standard, and it would have to be supported in the core transformation/optimization phases of many compilers.Laurentia
@BeeOnRope: The ability to freeze values would have essentially zero cost outside those situations where it would be essential, and trying to debug optimized code in its absence is a sure path to insanity. If one writes foo=moo; if (foo < 100) bar(foo); and moo gets changed unexpectedly by some other thread, trying to diagnose when and where things went wrong may be essentially impossible. Being able to say foo=moo; freeze(foo); if (foo < 100) bar(foo); and have the compiler commit to a value for foo would make things a lot more robust.Dissident
@BeeOnRope: In addition, a fundamental tenet of secure systems programming often requires that behavior be constrained even in cases where client code does things it shouldn't such as passing a pointer to an object while another thread is modifying it. Having code use freeze in those particular places it's needed would be much cheaper than using more "sledge-hammer"-ish approaches or having to disable optimizations altogether.Dissident
I was thinking in the context of a single thread where indeterminism is introduced, e.g., by reads of uninitialized variables, but sure, that's a good point about concurrent writes. In other languages a "non-atomic" read like that into a local will certainly "lock in" the value. C and C++ are hamstrung by being implemented via compiled-to-native code across a wide variety of architectures. Decisions made to "support" architectures that died (like Itanium) sometimes later look pretty stupid.Laurentia
... although on most practical hardware, std::memory_order_relaxed gets you exactly what you want.Laurentia
@BeeOnRope: I find it odd the "Itanium clause" is so called, when the real issue goes back a lot further. On a typical processor that uses 32-bit instructions for everything but loads and stores (e.g. ARM, though the principle goes back far earlier), the simplest code for volatile uint16_t x; uint16_t test(uint32_t q, int mode) { uint16_t result; if (mode) result=x; return result; } would return q if mode is zero, even if q is greater than 65535.Dissident
@BeeOnRope: If nothing uses the result of such a function, that won't matter. Having a uint16_t hold values greater than 65,535, however, could cause weird behaviors downstream. A freeze could be used to guard against such things by forcing a uint16_t object to be clipped to that range.Dissident
@Laurentia Re: "wobbly": +1. See relevant DRs: 451, and (extra) 260.Apocrypha
@pmor: A fundamental problem with the Standard, which is responsible for most controversies related to it, is that it fails to accommodate the idea of optimizations which may cause behavior to be inconsistent, in limited ways, with a sequential-execution model, or to recognize that some tasks would require behavioral guarantees that are stronger than would be needed for other tasks. Classifying an action whose behavior would otherwise be defined as "anything can happen" UB only allows more optimizations than would allowing more limited freedom in cases where nothing a program might do...Dissident
...in response to even maliciously-contrived input would be considered unacceptable. While there are some tasks that will either be completely shielded from potentially malicious inputs, or run in an environment where nothing they could do would cause unacceptable harm, such tasks represent an extreme minority of the tasks for which people use the C language. Granting compilers more limited freedom to deviate from a "precise sequential execution" model would vastly increase the range of correct programs where compilers would be able to benefit from optimization-friendly rules.Dissident
