Why are C++ primitive types not initialized like other types?

I know that, in C++, when you write

int i;

you cannot make any assumptions about the value that the variable will hold until you actually assign it a value. However, if you write

int i = int();

then you have the guarantee that i will be 0. So my question is, isn't this actually an inconsistency in the behavior of the language? I mean, if I have defined a class MyClass and write

MyClass myInstance;

I can rest assured that the default constructor without parameters of the class will be called to initialize myInstance (and the compiler will fail if there is none), because that's how the RAII principle goes. However, it seems that when it comes to primitive types, resource acquisition is not initialization anymore. Why is that?

I don't think that changing this behavior inherited from C would break any existing code (is there any code in the world that works on the assumption that no assumption can be made about the value of a variable?), so the main possible reason that comes to my mind is performance, for example when creating big arrays of primitive types; but still, I'd like to know if there is some official explanation to this.
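A minimal sketch of the declarations compared above, assuming a C++11 or later compiler for the brace form (`int()` also works in C++03). The function names are hypothetical, and only variables that have been given a value are ever read, since reading an uninitialized local is undefined behavior.

```cpp
#include <cassert>

// `int()` is value-initialization: i is guaranteed to be 0.
int value_initialized_paren() {
    int i = int();
    return i;
}

// C++11 brace form: `int j{};` is also value-initialization, so j == 0.
int value_initialized_brace() {
    int j{};
    return j;
}

// `int k;` is default-initialization: k holds an indeterminate value,
// so it has to be assigned before it is read.
int default_initialized_then_assigned() {
    int k;
    k = 7;
    return k;
}
```

With these definitions, `value_initialized_paren()` and `value_initialized_brace()` both return 0; the third function is only well-defined because of the assignment.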

Thanks.

Patent answered 24/4, 2014 at 8:33. Comments:
"is there any code in the world that works on the assumption that no assumption can be made about the value of a variable?" Yes, for entropy pools. - Taluk
I think you're looking at it from the wrong angle. Historically, variables were uninitialised, so the question should be why that changed for (some of) C++'s new class types, and that question is easier to answer. You're asking why the old types deviate from the norm, but the old types were the norm. - Eustoliaeutectic
You could try to ask why MyClass(); is not required for the constructor to be called, since that would have resulted in more consistency in your view. - Noncontributory
@Étienne Touché, cool example (although I hope they don't rely on uninitialized variables). - Patent
@Noncontributory You are right, but that's just because that way the compiler would confuse the object construction with a function declaration. - Patent
@Patent Some implementations do rely on them as one source of entropy among several; see for instance lwn.net/Articles/586427 - Taluk
@Étienne: This is a bad idea. This is considered undefined behavior by the C standard, and some implementations optimize this. - Greenwood

No. It is not an inconsistency.

What if your class is defined as:

struct MyClass
{
    int x;
    float y;
    char *z;
};

then this line does NOT do what you think it does:

MyClass myInstance; 

Assuming the above is declared inside a function, it is the same as:

int x; //assuming declared inside a function

In C++, types are broadly divided into three kinds, viz. POD, non-POD, and aggregates, and there is a clear distinction between them. Please read about them and their initialization rules (there are many topics on them; search this site). Also read about static initialization and dynamic initialization.
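A minimal sketch, reusing the answer's `MyClass` and assuming C++11 (for `{}` and `nullptr`); the helper function names are hypothetical. It illustrates that for such a struct, value-initialization has to be requested explicitly, in the same way `int i = int();` requests it for a plain `int`.

```cpp
#include <cassert>

// Same aggregate as in the answer: no user-declared constructor.
struct MyClass
{
    int x;
    float y;
    char *z;
};

// `MyClass a;` inside a function would leave x, y and z indeterminate,
// exactly like a plain `int x;`. Adding `{}` requests value-initialization
// of every member instead, which zeroes them.
MyClass make_value_initialized() {
    MyClass b{};             // x == 0, y == 0.0f, z == nullptr
    return b;
}

// `MyClass()` is the C++03-compatible spelling of value-initialization.
MyClass make_with_parens() {
    MyClass c = MyClass();
    return c;
}
```

Both helpers return an object whose members are all zero, whereas a bare `MyClass a;` in a function body guarantees nothing about them.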

Orv answered 24/4, 2014 at 8:41. Comments:
shades of gray -- standard layout, aggregate... ;-) - Casebound
I'm not sure aggregates are treated specially; aggregate initialization follows the rules of its members. And it's not POD vs. non-POD; a class with virtual functions but no user-defined constructors is not a POD, but it will be initialized as one. And you've missed one of the most important differences: static lifetime vs. other lifetimes. - Lientery
@JamesKanze: value initialization was introduced in C++03 (as the only new language feature, AFAIK) in order to clean up the initialization behavior for aggregates, in particular the non-POD ones. So there's special treatment. - Casebound
@Nawaz Thanks. I've just found this reference covering several aspects of PODs and aggregates here #4178675 and another about initialization here #620637 - Patent
@Alf The concept of value initialization was introduced in order to clarify issues that weren't clear in C++98. It didn't actually change anything with regard to what had been the intent (nor what compilers were actually doing in practice). - Lientery

The real reason, at least initially, was that C++ wanted all objects which are compatible with C to behave exactly as they would in C. The reason in C was (and still is) performance: zero initialization of objects with static lifetime is free (because the OS must initialize all memory that it gives the process anyway, for security reasons), while zero initialization of other objects costs run time. The performance rationale is less strong today than it was originally, because compilers are a lot better at determining that a variable will be initialized later and suppressing the zero initialization in such cases. But cases where the cost remains still exist, for example:

char buffer[1000];
strcpy( buffer, something );

If zero initialization were required, I don't know of any compiler that would be able to skip it here, even though it isn't necessary.
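A rough illustration of the static-versus-automatic difference described above, assuming C++11. The names `static_buffer`, `static_buffer_is_zeroed` and `copy_and_measure` are hypothetical, and the parameter `something` plays the role of the answer's source string.

```cpp
#include <cassert>
#include <cstring>

// Objects with static storage duration are zero-initialized before
// anything else runs, so this buffer starts out as 1000 '\0' bytes.
static char static_buffer[1000];

// Returns true if every byte of the static buffer is zero.
bool static_buffer_is_zeroed() {
    for (char ch : static_buffer) {
        if (ch != '\0') return false;
    }
    return true;
}

// An automatic array is left uninitialized: only the bytes strcpy writes
// (the string plus its terminating '\0') are defined afterwards.
// Zeroing it up front is opt-in, e.g. `char buffer[1000] = {};`.
std::size_t copy_and_measure(const char *something) {
    char buffer[1000];
    std::strcpy(buffer, something);   // caller must ensure the string fits
    return std::strlen(buffer);       // reads only the initialized prefix
}
```

The automatic `buffer` never pays for 1000 zero stores; the static one gets its zeros at no run-time cost on typical platforms.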

Lientery answered 24/4, 2014 at 9:20. Comments:
Also, the performance penalty for static initialization is likely to be smaller than for local variables, as static initialization happens only once. - Noncontributory
@Noncontributory The performance penalty for zero initialization of statics is 0, at least on classical general-purpose machines. - Lientery
If strcpy() is inlined in your particular variant of C++, a good compiler may be able to optimize the initialization away. Otherwise: little chance that would happen. - Scurrility
@Scurrility It would also have to prove that you never access beyond the trailing '\0', because strcpy won't initialize these. - Lientery

If you write

int i;

then whether it is initialized depends on the context.

  • Namespace scope → zero-initialized.
  • Local function scope → uninitialized.
  • Class member: depends on the constructors, if any.

The lack of initialization for a local variable is just for efficiency. For a very simple function that is called repeatedly at the lowest levels, this can matter. And C and C++ are languages used to construct the bottom levels of things.
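A minimal sketch of the contexts listed above, plus the function-local `static` case raised in the comments; all names are hypothetical and C++11 is assumed. The `EmptyCtor` instance illustrates that static lifetime, not the constructor, is what guarantees the zero.

```cpp
#include <cassert>

int at_namespace_scope;          // static storage duration: zero-initialized

int function_local_static() {
    static int counter;          // local scope but static lifetime: starts at 0
    return counter++;
}

// A class member is initialized only if a constructor does it.
struct WithCtor {
    int i;
    WithCtor() : i(5) {}         // member initialized explicitly
};

struct EmptyCtor {
    int i;
    EmptyCtor() {}               // leaves i alone
};

// A static instance is still zero-initialized before the constructor runs,
// so i is 0 even though EmptyCtor() never touches it.
static EmptyCtor static_instance;
```

An automatic `EmptyCtor` (declared inside a function) would, by contrast, leave `i` indeterminate.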

Casebound answered 24/4, 2014 at 8:40. Comments:
The initialization depends on object lifetime, not scope. Objects with static lifetime (even if they have local scope) are zero-initialized before anything else (and this includes class members, even if the class has a constructor). - Lientery
@JamesKanze: regarding the declaration int i; from the question and discussed in the answer, you're wrong. It can't be static unless it's at namespace scope, or in a static class-type object (in which case it depends on constructors, if any). I understand that you're referring, or meant to refer, to variables in general, but placing your remark as a comment on this answer makes it look to readers as if it contradicts something in the answer. - Casebound
void f() { static int i; /*...*/ }. The initialization does not depend on the context; it depends on the object lifetime and the type of the object. And of course, even in class MyClass { int i; MyClass() {} };, MyClass::i will be zero-initialized in a static instance of MyClass. - Lientery
@JamesKanze: note that you have introduced the keyword static, not present in the question or the answer. Similarly, by introducing the keyword unsigned you can change some properties of the variable, and so on. Generally, declarations change meaning as you add various keywords. - Casebound

When you set a local variable in a function to some value, then every time the function is called, the assignment takes place and the value is stored on the stack.

For example:

void func()
{
    int i = 0; // Every time `func` is called, 0 is stored on the stack
    // ...
}

This is something that you might want to avoid, particularly since C and C++ are also used for real-time systems, where every operation matters.

And by the way, when you declare MyClass myInstance, you can indeed rest assured that the default constructor is called, but you can choose whether or not you want to do anything in that constructor.

So the C and C++ languages allow you to make the same choice for primitive-type variables as well.
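A small sketch of that choice, using hypothetical types `Lazy` and `Eager` and a hypothetical function `sum_first_n`: the default constructor always runs, but only `Eager` pays for initializing its member, and a plain `int` is likewise initialized only when you ask for it.

```cpp
#include <cassert>

// The default constructor always runs, but what it does is up to the class.
struct Lazy {
    int value;
    Lazy() {}                 // does nothing: value is indeterminate unless
                              // the object has static storage duration
    void set(int v) { value = v; }
};

struct Eager {
    int value;
    Eager() : value(0) {}     // pays for initialization on every construction
};

// Primitives offer the same choice with different spelling.
int sum_first_n(int n) {
    int total = 0;            // opt in: this variable must start at 0
    int i;                    // opt out: the loop assigns it before it is read
    for (i = 1; i <= n; ++i) total += i;
    return total;
}
```

For example, `sum_first_n(4)` returns 10, and no cycles are spent zeroing `i` first.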

Wriest answered 24/4, 2014 at 8:41. Comments:
It's not real-time systems where it matters; it's ones which are CPU-intensive (and which take a lot of time). - Lientery
@JamesKanze I take it you never programmed on embedded real-time stuff? Under tight conditions a couple of these might just make all the difference between being able to use a high-level programming language and having to reach for assembler. Sometimes a few clock cycles is all the leeway you have. You don't want any unneeded things happening behind your back. Granted: C++ isn't usually the language of choice for real-time stuff. It is usually C with additional tweaks and optimizations for the specific platform. - Scurrility
@Scurrility Most of my early career was on embedded real-time stuff. If things were that tight, we'd use a faster processor. (There are exceptions, where I've had to juggle, but when I got to that point, I was writing in assembler.) The important thing about real-time is that the response time has a deterministic upper bound. So we'd use heap sort instead of quick sort (supposing we had to sort, which I've never seen), despite the fact that heap sort is typically a lot slower. - Lientery
@JamesKanze Seems we both started in the same business then. But where I was working, a faster CPU usually didn't exist. So we often ended up looking at the assembler generated by the C compiler and then hand-optimizing that. (I did a lot of Hitachi H8 stuff. The compiler was very bad at optimizing.) Or we went asm all the way for the really critical parts. I agree that determinism (and predictability) is everything in embedded real-time. Never had to sort either :-) - Scurrility
