If in C I write:
int num;
Before I assign anything to num, is the value of num indeterminate?
Static variables (file scope and function static) are initialized to zero:
int x; // zero
int y = 0; // also zero
void foo() {
static int x; // also zero
}
Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.
void foo() {
int x;
printf("%d", x); // the compiler is free to crash here
}
In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.
As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.
Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.
[...] char; all others can have trap representations. Alternatively - since accessing an uninitialized variable is U.B. anyway - a conforming compiler might simply do some checking and decide to signal the problem. – Tampon
[...] int can have is the number that produces an odd result when multiplied by two: blog.frama-c.com/index.php?post/2013/03/13/… – Smaltite
[...] unsigned int). This is the key argument in the claim of the linked blog post that reading from indeterminate memory is treated as undefined behavior by optimizing compilers, even if the architecture is known not to have trap values. I thought that was what we were discussing here. – Smaltite
[...] j*2 results in 1, but because J*2 is thrown away by the compiler and the conditional is effectively eliminated. This does not prove that some magic bit pattern produces 1 when multiplied by 2, only that the behavior is indeed undefined and the compiler can do whatever the hell it likes. – Kelleykelli
[...] int a;) in the compiler I’m using (gcc), it always says 0. Whenever I use the reference of any variable (&x or &a) before or after that printf call in the same function (and the uninitialized variable isn’t global), it prints things like 32765 or 128339264. I guess this could be the compiler putting in “specific, fixed values to make it obvious when looking in a debugger”, but it’s quite counter-intuitive when you’re new to C and just testing a few cases. – Dewie
C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.
This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.
This was the case in C89, see section 6.5.7 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
This was the case in C99, see section 6.7.8 Initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
As to what exactly indeterminate means, I'm not sure about C89, but C99 says:
3.17.2
indeterminate value
either an unspecified value or a trap representation
But regardless of what standards say, in real life, each stack page actually does start off as zero, but when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays you will see them eventually start neatly with zeroes.
You might wonder, why is it this way? A different SO answer deals with that question, see: https://mcmap.net/q/15749/-why-are-global-and-static-variables-initialized-to-their-default-values
[...] indeterminate value can be found at 3.19.2. – Borborygmus
It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.
As for automatic (local) variables, an uninitialized variable has indeterminate value. Indeterminate value, among other things, means that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code
int num;
int a = num;
int b = num;
does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept; this readily happens in practice as a consequence of optimization.
So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. An uninitialized variable's behavior is different from that of a variable initialized with garbage.
Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example
Enough standards, let's look at an implementation :-)
Local variable
Standards: undefined behavior.
Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.
#include <stdio.h>
int main() {
int i;
printf("%d\n", i);
}
compile with:
gcc -O0 -std=c99 a.c
outputs:
0
and decompiles with:
objdump -dr a.out
to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 10 sub $0x10,%rsp
40053e: 8b 45 fc mov -0x4(%rbp),%eax
400541: 89 c6 mov %eax,%esi
400543: bf e4 05 40 00 mov $0x4005e4,%edi
400548: b8 00 00 00 00 mov $0x0,%eax
40054d: e8 be fe ff ff callq 400410 <printf@plt>
400552: b8 00 00 00 00 mov $0x0,%eax
400557: c9 leaveq
400558: c3 retq
From our knowledge of x86-64 calling conventions:
%rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4
%rsi is the second printf argument, thus i.
It comes from -0x4(%rbp), which is the first 4-byte local variable.
At this point, rbp is in the first page of the stack, which has been allocated by the kernel, so to understand that value we would need to look into the kernel code and find out what it sets that to.
TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?
We can then also play with our own stack modifications and write fun things like:
#include <assert.h>
int f() {
int i = 13;
return i;
}
int g() {
int i;
return i;
}
int main() {
f();
assert(g() == 13);
}
Note that GCC 11 seems to produce a different assembly output, and the above code stops "working"; it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?
Local variable in -O3
Implementation analysis at: What does <value optimized out> mean in gdb?
Global variables
Standards: 0
Implementation: .bss section.
#include <stdio.h>
int i;
int main() {
printf("%d\n", i);
}
gcc -O0 -std=c99 a.c
compiles to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 8b 05 04 0b 20 00 mov 0x200b04(%rip),%eax # 601044 <i>
400540: 89 c6 mov %eax,%esi
400542: bf e4 05 40 00 mov $0x4005e4,%edi
400547: b8 00 00 00 00 mov $0x0,%eax
40054c: e8 bf fe ff ff callq 400410 <printf@plt>
400551: b8 00 00 00 00 mov $0x0,%eax
400556: 5d pop %rbp
400557: c3 retq
400558: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40055f: 00
# 601044 <i> says that i is at address 0x601044 and:
readelf -SW a.out
contains:
[25] .bss NOBITS 0000000000601040 001040 000008 00 WA 0 0 4
which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.
The ELF standard then guarantees that the section named .bss is completely filled with zeros:
.bss
This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
Furthermore, the type SHT_NOBITS is efficient and occupies no space on the executable file:
sh_size
This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.
Then it is up to the Linux kernel to zero out that memory region when loading the program into memory when it gets started.
gcc -00 -std=c99 a.c should be gcc -O0 -std=c99 a.c? – Topknot
[...] pipe() call? See e.g. the man pipe example. Presumably 3/4 are picked by Linux kernel because 0 1 and 2 are already taken, and it just increments. Of course, next pipe will likely have 5/6. You could confirm this by trying to read Linux kernel source code. Likely POSIX does not specify the exact values: pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html so relying on them would not be portable. – Zeringue
That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.
The basic answer is, yes it is undefined.
If you are seeing odd behavior because of this, it may depend on where the variable is declared. If it is declared within a function (on the stack), the contents will more than likely be different every time the function gets called. If it has static or module scope, it is zero-initialized rather than undefined, and its initial value will not change between runs.
Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.
As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:
uint16_t hey(uint32_t x, uint32_t mode)
{
uint16_t q;
if (mode==1) q=2;
if (mode==3) q=4;
return q;
}
uint32_t wow(uint32_t mode)
{
return hey(1234567, mode);
}
might very well result in wow() storing the values 1234567 and mode into registers 0 and 1, respectively, and calling hey(). Since x isn't needed within hey, and since functions are supposed to put their return value into register 0, the compiler may allocate register 0 to q. If mode is 1 or 3, register 0 will be loaded with 2 or 4, respectively, but if it is some other value, the function may return whatever was in register 0 (i.e. the value 1234567) even though that value is not within the range of uint16_t.
To avoid requiring compilers to do extra work to ensure that uninitialized variables never seem to hold values outside their domain, and avoid needing to specify indeterminate behaviors in excessive detail, the Standard says that use of uninitialized automatic variables is Undefined Behavior. In some cases, the consequences of this may be even more surprising than a value being outside the range of its type. For example, given:
void moo(int mode)
{
if (mode < 5)
launch_nukes();
hey(0, mode);
}
a compiler could infer that because invoking moo() with a mode which is greater than 3 will inevitably lead to the program invoking Undefined Behavior, the compiler may omit any code which would only be relevant if mode is 4 or greater, such as the code which would normally prevent the launch of nukes in such cases. Note that neither the Standard, nor modern compiler philosophy, would care about the fact that the return value from "hey" is ignored--the act of trying to return it gives a compiler unlimited license to generate arbitrary code.
If the storage class is static or global, then during loading the variable or memory location (ML) is placed in the BSS and initialised to 0, unless the variable is initially assigned some value. In the case of local uninitialized variables, the memory location may hold a trap representation. So if any of your registers containing important info is overwritten by the compiler, the program may crash, but some compilers may have a mechanism to avoid such a problem.
I was working with the NEC V850 series when I realised there is a trap representation, which has bit patterns that represent undefined values for data types except for char. When I took an uninitialized char I got a zero default value due to the trap representation. This might be useful for anyone using the NEC V850ES.
As far as I have seen, it mostly depends on the compiler, but in most cases the value is assumed to be 0 by the compilers.
I got a garbage value in the case of VC++, while TC gave the value 0.
I print it like below:
int i;
printf("%d", i);
[...] 0 your compiler most likely goes extra steps to make sure it gets that value (by adding code to initialize the variables anyway). Some compilers do this when doing "debug" compilation, but choosing the value 0 for these is a bad idea since it will hide faults in your code (a more proper thing would be to guarantee a really unlikely number like 0xBAADF00D or something similar). I think most compilers will just leave whatever garbage happens to occupy the memory as the value of the variable (i.e. it's in general not assumed to be 0). – Selfdiscipline
[...] extern int x; However defining always implies declaring. This is not true in C++, with static class member variables one can define without declaring, as the declaration must be in the class definition (not declaration!) and the definition must be outside of the class definition. – Chestnut