What happens to a declared, uninitialized variable in C? Does it have a value?
H

9

175

If in C I write:

int num;

Before I assign anything to num, is the value of num indeterminate?

Hilaria answered 20/10, 2009 at 21:24 Comment(4)
Um, isn't that a defined variable, not a declared one? (I'm sorry if that's my C++ shining through...)Anticipant
No. I can declare a variable without defining it: extern int x; However defining always implies declaring. This is not true in C++, with static class member variables one can define without declaring, as the declaration must be in the class definition (not declaration!) and the definition must be outside of the class definition.Chestnut
ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.4.html Looks like defined means you have to initialize it, too.Hilaria
Related: Where do the values of uninitialized variables come from, in practice on real CPUs?. (And Is uninitialized local variable the fastest random number generator? - no, not remotely safe.)Jeanninejeans
C
234

Static variables (file scope and function static) are initialized to zero:

int x; // zero
int y = 0; // also zero

void foo() {
    static int x; // also zero
}

Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.

void foo() {
    int x;
    printf("%d", x); // the compiler is free to crash here
}

In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.

As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.

Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.

Chestnut answered 20/10, 2009 at 21:27 Comment(22)
oh no they aren't. They might be, in debug mode, when you aren't in front of a customer, on months with an R in, if you're luckyDynamics
what aren't? the static initialization is required by the standard; see ISO/IEC 9899:1999 6.7.8 #10Chestnut
first example is fine as far as I can tell. I'm less sure as to why the compiler might crash in the second one though :)Vendee
@Stuart: there's a thing called "trap representation", which is basically a bit pattern that does not denote a valid value, and may cause e.g. hardware exceptions at runtime. The only C type for which there's a guarantee that any bit pattern is a valid value is char; all others can have trap representations. Alternatively - since accessing uninitialized variable is U.B. anyway - a conforming compiler might simply do some checking and decide to signal the problem.Tampon
bdonian is correct. C has always been specified fairly precisely. Prior to C89 and C99, a paper by dmr specified all these things in the early 1970's. Even in the crudest embedded system, it only takes one memset() to do things right, so there is no excuse for a nonconforming environment. I have cited the standard in my answer.Villous
@Pavel: Fascinating - I'd love to know, by way of example, what the trap representation value is for a 32 bit unsigned integer on a 32 bit processor. I am having trouble imagining that there is some special reserved value for an int which can't be used and which no one has ever told us poor programmers about.Kelleykelli
@Software Monkey: Modern processors don't have trap representations for integers, but older processors may have had them - eg, there may be parity bits, or if the signed number representation includes signed zero, negative zero may be a trap representation. I'm not sure what processors actually have had integer trap representations in the past, but the C standard makes allowances for it, so I suppose there must have been something...Chestnut
Oh, and actually, on the modern end, there's that link to an example on the Itanium of a real modern trap representation in my answer, too :)Chestnut
@SoftwareMonkey: On many PC's which had uncached memory systems (generally pre-80486), it was common for each byte of memory to be stored as 9 bits, with the ninth bit being a parity for the other eight. An attempt to read a byte from memory whose parity was set wrong would trigger an NMI; the normal NMI vector would reset the screen to 40-column mode, display a message like "PARITY CHECK 1", and hang the computer. The original PC allowed code to explicitly force the state of the parity bits when writing to memory (the power-on-self-test did this to test the parity-check hardware).Wheen
@SoftwareMonkey: I don't know that any language compilers/debugging systems of that era added code to mis-set the parity bits of uninitialized memory, but they certainly could have done. In more modern times, the Visual Studio C++ debugger will often stop on code which attempts to access uninitialized stack data; I don't know the exact means it uses to do so.Wheen
@SoftwareMonkey You ask “I'd love to know, by way of example, what the trap representation value is for a 32 bit unsigned integer on a 32 bit processor.” One of these trap representations that an indeterminate int can have is the number that produces an odd result when multiplied by two: blog.frama-c.com/index.php?post/2013/03/13/…Smaltite
@PascalCuoq: What value of the valid range of 0 - 4G that 32 bits can represent does that? I buy the arguments around a incorrect parity bit or some such, but not that there's some magic "bit pattern" to which an unsigned int can be set that flags that it's "uninitialized".Kelleykelli
@SoftwareMonkey Did you read the link? The example is built so that no (unspecified) value does that (the example carefully uses unsigned int). This is the key argument in the claim of the linked blog post that reading from indeterminate memory is treated as undefined behavior by optimizing compilers, even if the architecture is known not to have trap values. I thought that was what we were discussing here.Smaltite
That result from the program (of 1) is not emitted because j*2 results in 1, but because J*2 is thrown away by the compiler and the conditional is effectively eliminated. This does not prove that some magic bit pattern produces 1 when multiplied by 2, only that the behavior is indeed undefined and the compiler can do whatever the hell it likes.Kelleykelli
Will the indeterminate value only depend on the compiler's choices and be determinate for a given binary? Or can the indeterminate value also depend on the whole system's state when running the program?Jobe
Reading them prior to assigning a value results in undefined behavior.: Disagree.Sarcoma
@haccks, since the value is indeterminate, and the value may therefore be a trap representation, and accessing a trap representation is UB, it follows that UB is among the set of permissible behaviors from accessing an uninitialized variable - and therefore, in general, it's UB. If your compiler does not have trap representations you may be able to make some assumptions about the behavior of the program that would not be possible with "true" UB, however.Chestnut
Whenever I try to print an uninitialized variable (int a;) in the compiler I’m using (gcc), it always says 0. Whenever I use the reference of any variable (&x or &a) before or after that printf call in the same function (and the uninitialized variable isn’t global), it prints things like 32765 or 128339264. I guess this could be the compiler putting in “specific, fixed values to make it obvious when looking in a debugger”, but it’s quite counter-intuitive when you’re new to C and just testing a few cases.Dewie
Is the compiler crashing a valid result of undefined behaviour?Cybil
@immibis everything is a valid result of undefined behaviour. The standard specifically does not impose a requirement on the compiler to not crash upon seeing undefined behaviour.Mandelbaum
This answer is an oversimplification. Reading an uninitialized variable with automatic storage duration may result in UB, if and only if: the variable did not have its address taken OR the system is a highly fictional/obsolete one with trap representations. Otherwise no UB. Details here: https://mcmap.net/q/15716/-why-is-using-an-uninitialized-variable-undefined-behaviorArgus
You can tell the age of this answer by the fact that it uses Itanium as an example of a modern architecture. :)Phosphate
V
64

0 if static or global, indeterminate if storage class is auto

C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.

This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.

This was the case in C89, see section 6.5.7 Initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.

This was the case in C99, see section 6.7.8 Initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.

As to what exactly indeterminate means, I'm not sure about C89, but C99 says:

3.17.2
indeterminate value

either an unspecified value or a trap representation

But regardless of what standards say, in real life, each stack page actually does start off as zero, but when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays you will see them eventually start neatly with zeroes.

You might wonder, why is it this way? A different SO answer deals with that question, see: https://mcmap.net/q/15749/-why-are-global-and-static-variables-initialized-to-their-default-values

Villous answered 20/10, 2009 at 21:38 Comment(5)
indeterminate usually (used to?) means it can do anything. It can be zero, it can be the value that was in there, it can crash the program, it can make the computer produce blueberry pancakes out of the CD slot. you have absolutely no guarantees. It might cause the destruction of the planet. At least as far as the spec goes... anyone who made a compiler that actually did anything like that would be highly frowned upon B-)Seaborne
In C11 N1570 draft, definition of indeterminate value can be found at 3.19.2.Borborygmus
Is it so that it always depends upon the compiler or the OS that what value it sets for static variable? For example, if someone writes an OS or a compiler of my own, and if they also set initial value by default for statics as indeterminate, is that possible?Jeunesse
@AdityaSingh, the OS can make it easier on the compiler but ultimately it's the compiler's primary responsibility to run the world's existing catalog of C code, and a secondary responsibility to meet the standards. It would certainly be possible to do it differently, but, why? Also, it's tricky to make static data indeterminate, because the OS will really want to zero the pages first for security reasons. (Auto variables are only superficially unpredictable because your own program has usually been using those stack addresses at an earlier point.)Villous
@BrianPostow No, that is not correct. See https://mcmap.net/q/15716/-why-is-using-an-uninitialized-variable-undefined-behavior. Using an indeterminate value causes unspecified behavior, not undefined behavior, save for the case of trap representations.Argus
K
15

It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.

As for automatic (local) variables, an uninitialized variable has indeterminate value. Indeterminate value, among other things, means that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code

int num;
int a = num;
int b = num;

does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept, this readily happens in practice as consequence of optimization.

So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. Uninitialized variable's behavior is different from that of a variable initialized with garbage.

Kado answered 20/10, 2009 at 21:37 Comment(1)
I can't understand (well I very well can) why this has much less upvotes than the one from DigitalRoss just a minute after :DMandelbaum
Z
9

Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example

Enough standards, let's look at an implementation :-)

Local variable

Standards: undefined behavior.

Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.

#include <stdio.h>
int main() {
    int i;
    printf("%d\n", i);
}

compile with:

gcc -O0 -std=c99 a.c

outputs:

0

and decompiles with:

objdump -dr a.out

to:

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       48 83 ec 10             sub    $0x10,%rsp
  40053e:       8b 45 fc                mov    -0x4(%rbp),%eax
  400541:       89 c6                   mov    %eax,%esi
  400543:       bf e4 05 40 00          mov    $0x4005e4,%edi
  400548:       b8 00 00 00 00          mov    $0x0,%eax
  40054d:       e8 be fe ff ff          callq  400410 <printf@plt>
  400552:       b8 00 00 00 00          mov    $0x0,%eax
  400557:       c9                      leaveq
  400558:       c3                      retq

From our knowledge of x86-64 calling conventions:

  • %rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4

  • %rsi is the second printf argument, thus i.

    It comes from -0x4(%rbp), which is the first 4-byte local variable.

    At this point, rbp is in the first page of the stack, which has been allocated by the kernel, so to understand that value we would need to look into the kernel code and find out what it sets that to.

    TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?

We can then also play with our own stack modifications and write fun things like:

#include <assert.h>

int f() {
    int i = 13;
    return i;
}

int g() {
    int i;
    return i;
}

int main() {
    f();
    assert(g() == 13);
}

Note that GCC 11 seems to produce a different assembly output, and the above code stops "working", it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?

Local variable in -O3

Implementation analysis at: What does <value optimized out> mean in gdb?

Global variables

Standards: 0

Implementation: .bss section.

#include <stdio.h>
int i;
int main() {
    printf("%d\n", i);
}

gcc -O0 -std=c99 a.c

compiles to:

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       8b 05 04 0b 20 00       mov    0x200b04(%rip),%eax        # 601044 <i>
  400540:       89 c6                   mov    %eax,%esi
  400542:       bf e4 05 40 00          mov    $0x4005e4,%edi
  400547:       b8 00 00 00 00          mov    $0x0,%eax
  40054c:       e8 bf fe ff ff          callq  400410 <printf@plt>
  400551:       b8 00 00 00 00          mov    $0x0,%eax
  400556:       5d                      pop    %rbp
  400557:       c3                      retq
  400558:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40055f:       00

# 601044 <i> says that i is at address 0x601044 and:

readelf -SW a.out

contains:

[25] .bss              NOBITS          0000000000601040 001040 000008 00  WA  0   0  4

which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.

The ELF standard then guarantees that the section named .bss is completely filled with zeros:

.bss This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

Furthermore, the type SHT_NOBITS is efficient and occupies no space on the executable file:

sh_size This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.

Then it is up to the Linux kernel to zero out that memory region when loading the program into memory when it gets started.

Zeringue answered 19/4, 2016 at 17:17 Comment(3)
I believe gcc -00 -std=c99 a.c should be gcc -O0 -std=c99 a.c?Topknot
Hope not that intruding. I am learning pipe via linuxhint.com/pipe_system_call_c I really did google search a lot, but I cannot found official reference say that file descriptor 3 means read and 4 mean write. Obviously I can get 0,1,2 from wikipedia.Topknot
@Mark no worries. Why do you care about the exact value of the file descriptors being 3/4, since you get them as return values of the pipe() call? See e.g. the man pipe example. Presumably 3/4 are picked by Linux kernel because 0 1 and 2 are already taken, and it just increments. Of course, next pipe will likely have 5/6. You could confirm this by trying to read Linux kernel source code. Likely POSIX does not specify the exact values: pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html so relying on them would not be portable.Zeringue
D
3

That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.

Dumps answered 20/10, 2009 at 21:28 Comment(0)
H
0

The basic answer is, yes it is undefined.

If you are seeing odd behavior because of this, it may depend on where it is declared. If within a function on the stack then the contents will more than likely be different every time the function gets called. If it is a static or module scope it is undefined but will not change.

Hessney answered 20/10, 2009 at 21:30 Comment(0)
W
0

Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.

As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:

#include <stdint.h>

uint16_t hey(uint32_t x, uint32_t mode)
{
  uint16_t q; 
  if (mode==1) q=2; 
  if (mode==3) q=4; 
  return q;
}
uint32_t wow(uint32_t mode)
{
  return hey(1234567, mode);
}

might very well result in wow() storing the values 1234567 and mode into registers 0 and 1, respectively, and calling hey(). Since x isn't needed within hey, and since functions are supposed to put their return value into register 0, the compiler may allocate register 0 to q. If mode is 1 or 3, register 0 will be loaded with 2 or 4, respectively, but if it is some other value, the function may return whatever was in register 0 (i.e. the value 1234567) even though that value is not within the range of uint16_t.

To avoid requiring compilers to do extra work to ensure that uninitialized variables never seem to hold values outside their domain, and avoid needing to specify indeterminate behaviors in excessive detail, the Standard says that use of uninitialized automatic variables is Undefined Behavior. In some cases, the consequences of this may be even more surprising than a value being outside the range of its type. For example, given:

void moo(int mode)
{
  if (mode < 5)
    launch_nukes();
  hey(0, mode);      
}

a compiler could infer that because invoking moo() with a mode which is greater than 3 will inevitably lead to the program invoking Undefined Behavior, the compiler may omit any code which would only be relevant if mode is 4 or greater, such as the code which would normally prevent the launch of nukes in such cases. Note that neither the Standard, nor modern compiler philosophy, would care about the fact that the return value from "hey" is ignored--the act of trying to return it gives a compiler unlimited license to generate arbitrary code.

Wheen answered 6/5, 2016 at 15:28 Comment(1)
in you examples, I don't see any foo() function which you are mentioning in your answerInsufficient
T
-1

If storage class is static or global then during loading, the BSS initialises the variable or memory location(ML) to 0 unless the variable is initially assigned some value. In case of local uninitialized variables the trap representation is assigned to memory location. So if any of your registers containing important info is overwritten by compiler the program may crash.

But some compilers may have a mechanism to avoid such a problem.

I was working with the NEC V850 series when I realised there is a trap representation which has bit patterns that represent undefined values for data types except for char. When I took an uninitialized char I got a zero default value due to the trap representation. This might be useful for anyone using the NEC V850ES.

Thundershower answered 30/5, 2013 at 10:29 Comment(1)
Your system is not compliant if you get trap representations when using unsigned char. They are explicitly not allowed to contain trap representations, C17 6.2.6.1/5.Argus
F
-4

As far as I have seen, it mostly depends on the compiler, but in most cases compilers assume the value to be 0.
I got a garbage value with VC++, while TC gave 0. I printed it like this:

int i;
printf("%d", i);
Fine answered 27/6, 2012 at 19:34 Comment(1)
If you get a deterministic value such as 0, your compiler most likely takes extra steps to make sure it gets that value (by adding code to initialize the variables anyway). Some compilers do this when doing "debug" compilation, but choosing the value 0 for these is a bad idea since it will hide faults in your code (a more proper thing would be to guarantee a really unlikely number like 0xBAADF00D or something similar). I think most compilers will just leave whatever garbage happens to occupy the memory as the value of the variable (i.e. it is in general not assumed to be 0).Selfdiscipline
