In C++26, are implementations required to "initialize" uninitialized variables to some fixed byte pattern?
In C++26, reading uninitialized variables is no longer undefined; it is now "erroneous" (What is erroneous behavior? How is it different from undefined behavior?).

However, the wording for this confuses me:

[basic.indet]/1.2

otherwise, the bytes have erroneous values, where each value is determined by the implementation independently of the state of the program.

(bold mine)

To me, this reads like the implementation must overwrite the values with something (e.g. 0xBEBEBEBE), because leaving them truly uninitialized might make them dependent on the "state of the program", contradicting the bold part.

Is my interpretation correct? Are implementations forced to overwrite uninitialized variables now?

Objection answered 25/7 at 9:39 Comment(2)
These days, with dataflow analysis in compilers being so good, I'd expect local use of uninitialised variables to be a compile-time error rather than a runtime one. Many compilers in debugging mode do use a distinctive fill pattern for uninitialised variables, but I'd expect optimising compilers to at least have an option not to bother. I'm also not sure how the spec can be met in practice, since there will be at least one legitimate state of the program in which a variable has been correctly initialised to the magic "uninitialised" fill value.Cortex
Under the proposal for C++26, an implementation would also be permitted to diagnose erroneous behaviour, either during translation or during program execution. Most modern compilers can already issue warnings about use of uninitialised variables (albeit, for historical reasons, those warnings are typically disabled by default), so if this is ratified in C++26, I would expect most compilers to simply issue a diagnostic if the value of an uninitialised variable is accessed.Futch
The linked P2795R5 says under Performance and security implications:

  • The automatic storage for an automatic variable is always fully initialized, which has potential performance implications. P2723R1 discusses the costs in some detail. Note that this cost even applies when a class-type variable is constructed that has no padding and whose default constructor initializes all members.
  • In particular, unions are fully initialized. ...

It also points out that although automatic locals can be annotated [[indeterminate]] to suppress this initialization, there's no way to avoid it for any temporaries.

So it seems like your interpretation is correct.

Oddly, it doesn't seem important what this magic value is - or even whether this initialization really happens - except that it can't be a trap pattern. As already pointed out there's no magic value of a byte that is unambiguously erroneous at runtime and still safe to load, copy, and compare.


Edit - why do I say it doesn't seem to matter what the magic value is, or even whether this initialization really happens?

  1. The motivation is to stop evaluation (i.e. glvalue-to-prvalue conversion) of uninitialized automatic variables being Undefined Behaviour. Instead it will be Erroneous Behaviour, which implementations are encouraged to diagnose.

    • If an implementation doesn't diagnose the erroneous behaviour, the result of the evaluation is valid.
  2. The above can't be contingent on a specific bit pattern if that bit pattern could ever be produced by a valid expression, without the risk of misfiring diagnostics.

    • No usual primitives have such magic bit patterns, except for the now-uncommon trap representation.

    • e.g. you couldn't use either a quiet or a signalling NaN to mark erroneous values, because if

      double fine = std::numeric_limits<double>::quiet_NaN(); // quiet_NaN() is a function
      double errn;
      
      std::isnan(fine); // not erroneous
      std::isnan(errn); // erroneous behaviour
      

      needs to treat both values differently, it can't be based on the bit pattern.

    • The same is trivially true for integer types, and anyway [basic.indet]/2 says

      Except in the following cases, ... if an erroneous value is produced by an evaluation, the behavior is erroneous and the result of the evaluation is the value so produced but is not erroneous

      where all the exclusions are related to "unsigned ordinary character type" and std::byte, so in:

      int errn;      // erroneous value
      foo(errn ^ 0); // 1, 2
      foo(errn);     // 3
      
      1. the XOR has erroneous behaviour, but if not diagnosed must produce a non-erroneous value with exactly the same bit-pattern as the erroneous input
      2. the call to foo with the non-erroneous value must not be diagnosed
      3. the call to foo with exactly the same bit-pattern may be diagnosed
  3. If the only goal is to prevent evaluation of uninitialized (automatic) variables escaping to UB, it's sufficient to require this kind of initialization only for types with trap representations.

    It may also be required to disable (or guard with diagnostic checks) some optimizations previously allowed by UB, but it's neither necessary nor sufficient for that to depend on a specific bit pattern.

Haphazard answered 25/7 at 10:49 Comment(5)
"or even whether this initialization really happens" Hmm, are you saying the wording looks like it doesn't force initialization? Because to me "independently of the state of the program" sounds like you can't keep the existing value.Objection
Agreed, but it also doesn't seem to require any behaviour that actually depends on the value. It definitely says it must happen, but it's far from clear why.Haphazard
To be honest, I don't understand the discussion about "important ... whether this initialization really happens". It's either required or not required, and it appears to be required.Objection
Initialization of erroneous values is required, and it's implied this is connected to erroneous behaviour, but it's not at all clear why. Requiring more work at runtime that doesn't achieve anything seems weird, which is why I was hoping someone would chip in with a justification.Haphazard
I can guess some possible justifications. It might prevent certain vulnerabilities based on pre-filling an uninitialized variable with certain values. It can also help debugging (even if there's no automatic reporting, if you see 0xBEBEBEBE (or something) in a variable, it immediately tells you something is wrong). It also removes random behavior from uninitialized reads, making the bugs caused by them reproducible.Objection
It depends on what you mean by "some fixed byte pattern".

Implementations can choose values that depend on the type of the variable, because the type is a static property of the program, not part of its state. In cases where you declare e.g. an aligned buffer of unsigned char or std::byte and later placement-new objects into it, note that it is the first step, obtaining the storage, that writes erroneous values into the memory; the placement-new step is a no-op if it performs a trivial default initialization. But the compiler could do some static analysis to see which type the buffer will provide storage for, and choose the erroneous values accordingly.

It's also theoretically possible that an implementation could choose the initial erroneous values using a true random number generator, but obviously the performance of that would be terrible, so it's unlikely for an actual implementation to do that.

Note that initial erroneous values do not apply to objects with static or thread storage duration (which are zero-initialized) or dynamic storage duration. It's also possible to opt out using the [[indeterminate]] attribute.

Perfectly answered 25/7 at 23:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.