You can do these things, largely because they're actually not all that difficult to do.
From the viewpoint of the compiler, having a function declaration inside another function is pretty trivial to implement. The compiler needs a mechanism to allow declarations inside of functions to handle other declarations (e.g., int x;
) inside a function anyway.
It will typically have a general mechanism for parsing a declaration. For the guy writing the compiler, it doesn't really matter at all whether that mechanism is invoked when parsing code inside or outside of another function--it's just a declaration, so when it sees enough to know that what's there is a declaration, it invokes the part of the compiler that deals with declarations.
In fact, prohibiting these particular declarations inside a function would probably add extra complexity, because the compiler would then need an entirely gratuitous check to see if it's already looking at code inside a function definition and based on that decide whether to allow or prohibit this particular declaration.
That leaves the question of how a nested function is different. A nested function is different because of how it affects code generation. In languages that allow nested functions (e.g., Pascal) you normally expect that the code in the nested function has direct access to the variables of the function in which it's nested. For example:
int foo() {
int x;
int bar() {
x = 1; // Should assign to the `x` defined in `foo`.
}
}
Without local functions, the code to access local variables is fairly simple. In a typical implementation, when execution enters the function, some block of space for local variables is allocated on the stack. All the local variables are allocated in that single block, and each variable is treated as simply an offset from the beginning (or end) of the block. For example, let's consider a function something like this:
int f() {
int x;
int y;
x = 1;
y = x;
return y;
}
A compiler (assuming it didn't optimize away the extra code) might generate code for this roughly equivalent to this:
stack_pointer -= 2 * sizeof(int); // allocate space for local variables
x_offset = 0;
y_offset = sizeof(int);
stack_pointer[x_offset] = 1; // x = 1;
stack_pointer[y_offset] = stack_pointer[x_offset]; // y = x;
return_location = stack_pointer[y_offset]; // return y;
stack_pointer += 2 * sizeof(int);
In particular, it has one location pointing to the beginning of the block of local variables, and all access to the local variables is as offsets from that location.
With nested functions, that's no longer the case--instead, a function has access not only to its own local variables, but to the variables local to all the functions in which it's nested. Instead of just having one "stack_pointer" from which it computes an offset, it needs to walk back up the stack to find the stack_pointers local to the functions in which it's nested.
Now, in a trivial case that's not all that terrible either--if bar
is nested inside of foo
, then bar
can just look up the stack at the previous stack pointer to access foo
's variables. Right?
Wrong! Well, there are cases where this can be true, but it's not necessarily the case. In particular, bar
could be recursive, in which case a given invocation of bar
might have to look some nearly arbitrary number of levels back up the stack to find the variables of the surrounding function. Generally speaking, you need to do one of two things: either you put some extra data on the stack, so it can search back up the stack at run-time to find its surrounding function's stack frame, or else you effectively pass a pointer to the surrounding function's stack frame as a hidden parameter to the nested function. Oh, but there's not necessarily just one surrounding function either--if you can nest functions, you can probably nest them (more or less) arbitrarily deep, so you need to be ready to pass an arbitrary number of hidden parameters. That means you typically end up with something like a linked list of stack frames to surrounding functions, and access to variables of surrounding functions is done by walking that linked list to find its stack pointer, then accessing an offset from that stack pointer.
That, however, means that access to a "local" variable may not be a trivial matter. Finding the correct stack frame to access the variable can be non-trivial, so access to variables of surrounding functions is also (at least usually) slower than access to truly local variables. And, of course, the compiler has to generate code to find the right stack frames, access variables via any of an arbitrary number of stack frames, and so on.
This is the complexity that C was avoiding by prohibiting nested functions. Now, it's certainly true that a current C++ compiler is a rather different sort of beast from a 1970's vintage C compiler. With things like multiple, virtual inheritance, a C++ compiler has to deal with things on this same general nature in any case (i.e., finding the location of a base-class variable in such cases can be non-trivial as well). On a percentage basis, supporting nested functions wouldn't add much complexity to a current C++ compiler (and some, such as gcc, already support them).
At the same time, it rarely adds much utility either. In particular, if you want to define something that acts like a function inside of a function, you can use a lambda expression. What this actually creates is an object (i.e., an instance of some class) that overloads the function call operator (operator()
) but it still gives function-like capabilities. It makes capturing (or not) data from the surrounding context more explicit though, which allows it to use existing mechanisms rather than inventing a whole new mechanism and set of rules for its use.
Bottom line: even though it might initially seem like nested declarations are hard and nested functions are trivial, more or less the opposite is true: nested functions are actually much more complex to support than nested declarations.
one
is a function definition, the other two are declarations. – Cossackclass three
, you'd use a lambda ... that's why they were added, to reduce exactly this verbiage. Also this question has nothing at all to do with MVP. – CanberraThis is legal, but why would I want this
– Speedballint two(int)
-- there are multiple reasons why this is allowed. One is that it makes parsing easier (Jerry Coffin's answer). Another is that because it is allowed, there's lots of legacy code (and even some modern code) that uses such constructs. A very compelling case needs to be made for a proposed change that revokes backwards compatibility. – Eruditionthree(42)
-- This is not the most vexing parse.three(42)
creates an instance ofclass three
. There is no stream insertion operator for this class (e.g.,std::ostream& operator<<(std::ostream&, const three&)
), so the compiler looks for a conversion operator that will convert an object of typethree
to a type for which a stream insertion operator does exist. That of coursethree::operator int()
. – Eruditionbut are not supported by GNU C++.
. – Speedballint f();
to be a function declaration when used at namespace scope but a variable declaration when used at function scope. That's a bit scary. – Eruditionint two(int bar);
you asked "This is legal, but why would I want this?" In modern code, you don't want to do this. The problem is that not all code compiled by the compiler is modern code. There's lots and lots of old code out there that does exactly this. In the early days of C, some even advocated that this was the "right way" to declare functions as it lets readers of the code see a function's fanout right up front. Some of that code still exists. And it still compiles, as-is. – Eruditionclass
approach,struct {int operator()(int i) {return i;}} func; func(42);
, see: ideone.com/0j8kTK – Equidistant