Why design a language with unique anonymous types?
Asked Answered
M

9

95

This is something that has always been bugging me as a feature of C++ lambda expressions: The type of a C++ lambda expression is unique and anonymous, I simply cannot write it down. Even if I create two lambdas that are syntactically exactly the same, the resulting types are defined to be distinct. The consequence is, that a) lambdas can only be passed to template functions that allow the compile time, unspeakable type to be passed along with the object, and b) that lambdas are only useful once they are type erased via std::function<>.

Ok, but that's just the way C++ does it, I was ready to write it off as just an irksome feature of that language. However, I just learned that Rust seemingly does the same: Each Rust function or lambda has a unique, anonymous type. And now I'm wondering: Why?

So, my question is this:
What is the advantage, from a language designer point of view, to introduce the concept of a unique, anonymous type into a language?

Mccaskill answered 30/7, 2020 at 12:1 Comment(21)
as always the better question is why not.Avaricious
"that lambdas are only useful once they are type erased via std::function<>" - no, they are directly useful without std::function. A lambda that has been passed to a template function can be called directly without involving std::function. The compiler can then inline the lambda into the template function which will improve runtime efficiency.Farlee
i disagree with your b). I rarely use lambdas inside std::functionDaphne
maybe you are not aware how many types you are using whose name you never get in touch with. Eg some_container::iterator is typically just an alias, what is the actual name of the type is irrelevantDaphne
@Avaricious The obvious answer to that is: To allow storing/passing of the type without forcing a template. I cannot pass something to a function implemented in another translation unit, unless I can write down its type.Mccaskill
try to see it the other way. If you had to give each lambdas type a name you would soon start to complain about that ;)Daphne
@idclev463035818 Whether I use a type name directly or indirectly is irrelevant to my question. some_container::iterator is not a unique type, nor is it anonymous. It's generally just a (convenience) type alias to int*, for example. I can write down that type, I can pass a some_container::iterator it to non-template functions, I can store it inside an object, etc. pp.Mccaskill
My guess, it makes implemenenting lambda's easier, and makes the language easier to understand. If you did allow the exact same lambda expression to be folded into the same type, then you'd need special rules to handle { int i = 42; auto foo = [&i](){ return i; }; } { int i = 13; auto foo = [&i](){ return i; }; } since the variable it's referring to is different, even though textually they are the same. If you just say they are all unique, you don't have to worry about trying to figure it out.Frivol
but you can give a name to a lambdas type as well and do all the same with that. lambdas_type = decltype( my_lambda);Daphne
@idclev463035818 No, I wouldn't complain. I don't complain about function pointer types either, and they carry exactly the same information that lambda types do. With the difference that I can declare a function to take an int (*callback)(Foo*, int, bool) parameter which is impossible for a lambda. With the lambda, I still need to go through the type erasure step with std::function<>.Mccaskill
@idclev463035818 lambdas_type = decltype( my_lambda); doesn't cut it. I still cannot assign any other object to a variable of type lambdas_type than my_lambda.Mccaskill
How would you do it with a non-unique type? How should a compiler determine if two lambdas have the same type?Marten
Counterexample: auto my_lambda [i = 0]() mutable { return i++; }; auto your_lambda = my_lambda; std::cout << my_lambda(); my_lambda = your_lambda; std::cout << my_lambda();Grower
@Marten How does the compiler determine if two functions have the same type? It joins the types of their arguments and return value together. isalnum has the same type as isspace, and it makes perfect sense to pass either to a filter function, for instance. The filter function only needs to know that it gets a pointer to an int (*)(int) function.Mccaskill
But what should be a type of a generic lambda [](auto) {}? Should it have a type, to begin with?Marten
And there is an implicit conversion to a common type for non-capturing lambdas, so anywhere you can useFilter(isalnum); you can useFilter(my_lambda);Grower
@cma "The filter function only needs to know that it gets a pointer to an int (*)(int) function." - That would unnecessarily limit any given filter function's utility. After all, it should be possible to pass any callable object. C++20 introduces callable concepts, Rust provides the Fn traits. Neither one requires that you can name the actual callable object you are passing.Shortridge
@cmaster-reinstatemonica “How does the compiler determine if two functions have the same type? It joins the types of their arguments and return value together.” Note that unlike C++, Rust does not do that. All functions have a different type, they just happen to be castable to function pointers.Kilby
@Kilby Which brings us back to my question: Why would the Rust developers want every single function to have its own type? Is there any advantage? What is that advantage? I mean, you never use the type of a function unless you are passing around a pointer to that function, so cluttering the code with these casts seems worse than useless, imho.Mccaskill
Details matter. I can pass a lambda without a capture to anything that satisfy its function signature. Counter question: A lambda has a type (because C++ is strongly typed), how do you avoid name collisions with the rest of any program.Garneau
@Garneau And again, you demonstrate that you missed the point of this question. It's not about C++, or Rust, it's about the use of unique anonymous types to a language designer. C++ and Rust only serve as a motivation and example to this question. Their peculiarities, like whether I can implicitly convert a lambda to a function pointer, bears no importance on my question, whatsoever.Mccaskill
P
86

Many standards (especially C++) take the approach of minimizing how much they demand from compilers. Frankly, they demand enough already! If they don't have to specify something to make it work, they have a tendency to leave it implementation defined.

Were lambdas to not be anonymous, we would have to define them. This would have to say a great deal about how variables are captured. Consider the case of a lambda [=](){...}. The type would have to specify which types actually got captured by the lambda, which could be non-trivial to determine. Also, what if the compiler successfully optimizes out a variable? Consider:

static const int i = 5;
auto f = [i]() { return i; }

An optimizing compiler could easily recognize that the only possible value of i that could be captured is 5, and replace this with auto f = []() { return 5; }. However, if the type is not anonymous, this could change the type or force the compiler to optimize less, storing i even though it didn't actually need it. This is a whole bag of complexity and nuance that simply isn't needed for what lambdas were intended to do.

And, on the off-case that you actually do need a non-anonymous type, you can always construct the closure class yourself, and work with a functor rather than a lambda function. Thus, they can make lambdas handle the 99% case, and leave you to code your own solution in the 1%.


Deduplicator pointed out in comments that I did not address uniqueness as much as anonymity. I am less certain of the benefits of uniqueness, but it is worth noting that the behavior of the following is clear if the types are unique (action will be instantiated twice).

int counter()
{
    static int count = 0;
    return count++;
}

template <typename FuncT>
void action(const FuncT& func)
{
    static int ct = counter();
    func(ct);
}

...
for (int i = 0; i < 5; i++)
    action([](int j) { std::cout << j << std::endl; });

for (int i = 0; i < 5; i++)
    action([](int j) { std::cout << j << std::endl; });

If the types were not unique, we would have to specify what behavior should happen in this case. That could be tricky. Some of the issues that were raised on the topic of anonymity also raise their ugly head in this case for uniqueness.

Phonsa answered 31/7, 2020 at 7:0 Comment(6)
Note that this is not really about saving work for a compiler implementer, but saving work for the standards maintainer. The compiler still has to answer all of the questions above for its specific implementation, but they are not specified in the standard.Agronomy
@Agronomy Putting such things together when implementing a compiler is far easier when you don't have to fit your implementation to someone else's standard. Speaking from experience, it is often far easier on a standards maintainer to overspecify functionality than it is to try to find the minimum amount to specify while still getting the desired functionality out of your language. As an excellent case study, look at just how much work they spent avoiding overspecifying memory_order_consume while still making it useful (on some architectures)Phonsa
Like everyone else, you make a compelling case for anonymous. But is it really such a good idea to force it to be unique as well?Febricity
It is not complexity of the compiler that matters here, but the complexity of the generated code. The point is not to make the compiler simpler, but to give it enough wiggle room to optimize all the cases and produce natural code for the target platform.Cybil
You can't capture a static variable.Subinfeudation
@Subinfeudation I wasn't intending to (although I did realize I had to change variables in my example, so that might have been confusing you)Phonsa
K
72

Lambdas are not just functions, they are a function and a state. Therefore both C++ and Rust implement them as an object with a call operator (operator() in C++, the 3 Fn* traits in Rust).

Basically, [a] { return a + 1; } in C++ desugars to something like

struct __SomeName {
    int a;

    int operator()() {
        return a + 1;
    }
};

then using an instance of __SomeName where the lambda is used.

While in Rust, || a + 1 in Rust will desugar to something like

{
    struct __SomeName {
        a: i32,
    }

    impl FnOnce<()> for __SomeName {
        type Output = i32;
        
        extern "rust-call" fn call_once(self, args: ()) -> Self::Output {
            self.a + 1
        }
    }

    // And FnMut and Fn when necessary

    __SomeName { a }
}

This means that most lambdas must have different types.

Now, there are a few ways we could do that:

  • With anonymous types, which is what both languages implement. Another consequence of that is that all lambdas must have a different type. But for language designers, this has a clear advantage: Lambdas can be simply described using other already existing simpler parts of the language. They are just syntax sugar around already existing bits of the language.
  • With some special syntax for naming lambda types: This is however not necessary since lambdas can already be used with templates in C++ or with generics and the Fn* traits in Rust. Neither language ever force you to type-erase lambdas to use them (with std::function in C++ or Box<Fn*> in Rust).

Also note that both languages do agree that trivial lambdas that do not capture context can be converted to function pointers.


Describing complex features of a languages using simpler feature is pretty common. For example both C++ and Rust have range-for loops, and they both describe them as syntax sugar for other features.

C++ defines

for (auto&& [first,second] : mymap) {
    // use first and second
}

as being equivalent to

{

    init-statement
    auto && __range = range_expression ;
    auto __begin = begin_expr ;
    auto __end = end_expr ;
    for ( ; __begin != __end; ++__begin) {

        range_declaration = *__begin;
        loop_statement

    }

} 

and Rust defines

for <pat> in <head> { <body> }

as being equivalent to

let result = match ::std::iter::IntoIterator::into_iter(<head>) {
    mut iter => {
        loop {
            let <pat> = match ::std::iter::Iterator::next(&mut iter) {
                ::std::option::Option::Some(val) => val,
                ::std::option::Option::None => break
            };
            SemiExpr(<body>);
        }
    }
};

which while they seem more complicated for a human, are both simpler for a language designer or a compiler.

Kilby answered 30/7, 2020 at 12:45 Comment(10)
Ok, so it's the history of lambdas being conceived purely as syntactic sugar to anonymous structs. But why force the struct to be anonymous? Why not define the C++ case as something like struct interface { virtual <someType> operator()(<some other types>) = 0; }; and struct __SomeName : public interface { virtual <someType> operator()(<some other types>) { <code> } private: <captures> };? That way, we could easily pass the respective interface pointer around. That is, what std::function<> is designed to achieve, but why not provide it directly from the language?Mccaskill
@cmaster-reinstatemonica Consider passing a lambda as a comparator argument for a sorting function. Would you really want to impose an overhead of virtual function calls here?Been
@cmaster-reinstatemonica because nothing is virtual-by-default in C++Grower
@cmaster - You mean force all users of lambdas to pay for dynamic dipatch, even when they won't need it?Concettaconcettina
@cmaster-reinstatemonica The best you will get is opt-in to virtual. Guess what, std::function does thatGrower
@StoryTeller-UnslanderMonica No, definitely not. I meant to give an analogy of how the type could be handled. The virtual is just the way that I could express it in C++ parlance. If it were built directly into the language, the pointer pair to code and capture object would be passed directly. But note two things: 1) The virtual dispatch is exactly what std::function<> does. 2) Even with the definition I gave, the compiler would optimize the virtual dispatch away when the type of the object is known to be __SomeName exactly.Mccaskill
@cmaster So you would rely on devirtualization, a notoriously unreliable form of optimization, over simply using existing language features to guarantee static dispatch? Or are you suggesting basically the way it works today, but with a type name in the syntax so it isn't "anonymous" anymore - which seems 99% useless to me?Mannerheim
The generated struct could be final (or whatever it's called in C++), so no virtual call overhead on usage in templates. But there'd still be the size overhead of the vptr.Antilepton
@trentcl No. I would want to define the language in such a way that I can write down the type of a pointer to a lambda, and point it to different lambdas with the same interface. It is an implementation detail how the closure is handled, that should not be relevant to the programmer, imho.Mccaskill
@cmaster-reinstatemonica any mechanism where you can re-point the function to be called will have situations with runtime overhead. That isn't the C++ way. You opt in with std::functionGrower
L
14

Cort Ammon's accepted answer is good, but I think there's one more important point to make about implementability.

Suppose I have two different translation units, "one.cpp" and "two.cpp".

// one.cpp
struct A { int operator()(int x) const { return x+1; } };
auto b = [](int x) { return x+1; };
using A1 = A;
using B1 = decltype(b);

extern void foo(A1);
extern void foo(B1);

The two overloads of foo use the same identifier (foo) but have different mangled names. (In the Itanium ABI used on POSIX-ish systems, the mangled names are _Z3foo1A and, in this particular case, _Z3fooN1bMUliE_E.)

// two.cpp
struct A { int operator()(int x) const { return x + 1; } };
auto b = [](int x) { return x + 1; };
using A2 = A;
using B2 = decltype(b);

void foo(A2) {}
void foo(B2) {}

The C++ compiler must ensure that the mangled name of void foo(A1) in "two.cpp" is the same as the mangled name of extern void foo(A2) in "one.cpp", so that we can link the two object files together. This is the physical meaning of two types being "the same type": it's essentially about ABI-compatibility between separately compiled object files.

The C++ compiler is not required to ensure that B1 and B2 are "the same type." (In fact, it's required to ensure that they're different types; but that's not as important right now.)


What physical mechanism does the compiler use to ensure that A1 and A2 are "the same type"?

It simply burrows through typedefs, and then looks at the fully qualified name of the type. It's a class type named A. (Well, ::A, since it's in the global namespace.) So it's the same type in both cases. That's easy to understand. More importantly, it's easy to implement. To see if two class types are the same type, you take their names and do a strcmp. To mangle a class type into a function's mangled name, you write the number of characters in its name, followed by those characters.

So, named types are easy to mangle.

What physical mechanism might the compiler use to ensure that B1 and B2 are "the same type," in a hypothetical world where C++ required them to be the same type?

Well, it couldn't use the name of the type, because the type doesn't have a name.

Maybe it could somehow encode the text of the body of the lambda. But that would be kind of awkward, because actually the b in "one.cpp" is subtly different from the b in "two.cpp": "one.cpp" has x+1 and "two.cpp" has x + 1. So we'd have to come up with a rule that says either that this whitespace difference doesn't matter, or that it does (making them different types after all), or that maybe it does (maybe the program's validity is implementation-defined, or maybe it's "ill-formed no diagnostic required"). Anyway, mangling lambda types the same way across multiple translation units is certainly a harder problem than mangling named types like A.

The easiest way out of the difficulty is simply to say that each lambda expression produces values of a unique type. Then two lambda types defined in different translation units definitely are not the same type. Within a single translation unit, we can "name" lambda types by just counting from the beginning of the source code:

auto a = [](){};  // a has type $_0
auto b = [](){};  // b has type $_1
auto f(int x) {
    return [x](int y) { return x+y; };  // f(1) and f(2) both have type $_2
} 
auto g(float x) {
    return [x](int y) { return x+y; };  // g(1) and g(2) both have type $_3
} 

Of course these names have meaning only within this translation unit. This TU's $_0 is always a different type from some other TU's $_0, even though this TU's struct A is always the same type as some other TU's struct A.

By the way, notice that our "encode the text of the lambda" idea had another subtle problem: lambdas $_2 and $_3 consist of exactly the same text, but they should clearly not be considered the same type!


By the way, C++ does require the compiler to know how to mangle the text of an arbitrary C++ expression, as in

template<class T> void foo(decltype(T())) {}
template void foo<int>(int);  // _Z3fooIiEvDTcvT__EE, not _Z3fooIiEvT_

But C++ doesn't (yet) require the compiler to know how to mangle an arbitrary C++ statement. decltype([](){ ...arbitrary statements... }) is still ill-formed even in C++20.


Also notice that it's easy to give a local alias to an unnamed type using typedef/using. I have a feeling that your question might have arisen from trying to do something that could be solved like this.

auto f(int x) {
    return [x](int y) { return x+y; };
}

// Give the type an alias, so I can refer to it within this translation unit
using AdderLambda = decltype(f(0));

int of_one(AdderLambda g) { return g(1); }

int main() {
    auto f1 = f(1);
    assert(of_one(f1) == 2);
    auto f42 = f(42);
    assert(of_one(f42) == 43);
}

EDITED TO ADD: From reading some of your comments on other answers, it sounds like you're wondering why

int add1(int x) { return x + 1; }
int add2(int x) { return x + 2; }
static_assert(std::is_same_v<decltype(add1), decltype(add2)>);
auto add3 = [](int x) { return x + 3; };
auto add4 = [](int x) { return x + 4; };
static_assert(not std::is_same_v<decltype(add3), decltype(add4)>);

That's because captureless lambdas are default-constructible. (In C++ only as of C++20, but it's always been conceptually true.)

template<class T>
int default_construct_and_call(int x) {
    T t;
    return t(x);
}

assert(default_construct_and_call<decltype(add3)>(42) == 45);
assert(default_construct_and_call<decltype(add4)>(42) == 46);

If you tried default_construct_and_call<decltype(&add1)>, t would be a default-initialized function pointer and you'd probably segfault. That's, like, not useful.

Lo answered 31/7, 2020 at 23:3 Comment(7)
"In fact, it's required to ensure that they're different types; but that's not as important right now." I wonder if there is a good reason to force uniqueness if equivalently defined.Febricity
Personally I think completely defined behavior is (almost?) always better than unspecified behavior. "Are these two function pointers equal? Well, only if these two template instantiations are the same function, which is true only if these two lambda types are the same type, which is true only if the compiler decided to merge them." Icky! (But notice that we have an exactly analogous situation with string-literal merging, and nobody is perturbed about that situation. So I doubt it'd be catastrophic to permit the compiler to merge identical types.)Lo
Well, whether two equivalent functions (barring as-if) may be identical is also a nice question. The language in the standard is not quite obvious for free and/or static functions. But that's outside the scope here.Febricity
Serendipitously, there has been discussion this very month on the LLVM mailing list about merging functions. Clang's codegen will cause functions with completely empty bodies to get merged almost "by accident": godbolt.org/z/obT55b This is technically non-conforming, and I think they'll likely patch LLVM to stop doing this. But yep, agreed, merging function addresses is also a thing.Lo
That example has other problems, namely missing return-statement. Don't they alone already make the code non-conforming? Also, I will look for the discussion, but did they show or assume merging equivalent functions is non-conforming to the standard, their documented behavior, to gcc, or just that some rely on it not happening?Febricity
@Deduplicator: As long as no caller tries to use f's return value, it's conforming. In this case, no caller even tries to call f, so we're definitely OK. The discussion thread is titled "[cfe-dev] Zero length function pointer equality", started on 2020-07-23 by David Blaikie.Lo
Hmpf. Blaikie (thread-starter) states he believes identical functions must compare unequal, a second one states they believe similar, a third links to a gold linker paper stating that some folding is not safe if code assumes it does not happen (really?!!), but I doubt anybody actually took the time to prove whether it is allowed or forbidden per the standard. Oh well.Febricity
C
12

(Adding to Caleth's answer, but too long to fit in a comment.)

The lambda expression is just syntactic sugar for an anonymous struct (a Voldemort type, because you can't say its name).

You can see the similarity between an anonymous struct and the anonymity of a lambda in this code snippet:

#include <iostream>
#include <typeinfo>

using std::cout;

int main() {
    struct { int x; } foo{5};
    struct { int x; } bar{6};
    cout << foo.x << " " << bar.x << "\n";
    cout << typeid(foo).name() << "\n";
    cout << typeid(bar).name() << "\n";
    auto baz = [x = 7]() mutable -> int& { return x; };
    auto quux = [x = 8]() mutable -> int& { return x; };
    cout << baz() << " " << quux() << "\n";
    cout << typeid(baz).name() << "\n";
    cout << typeid(quux).name() << "\n";
}

If that is still unsatisfying for a lambda, it should be likewise unsatisfying for an anonymous struct.

Some languages allow for a kind of duck typing that is a little more flexible, and even though C++ has templates that doesn't really help in making a object from a template that has a member field that can replace a lambda directly rather than using a std::function wrapper.

Cesium answered 30/7, 2020 at 12:39 Comment(5)
Thank you, that does indeed shed a little light on the reasoning behind the way lambdas are defined in C++ (I got to remember the term "Voldemort type" :-) ). However, the question remains: What is the advantage of this in the eyes of a language designer?Mccaskill
You could even add int& operator()(){ return x; } to those structsGrower
@cmaster-reinstatemonica • Speculatively... the rest of C++ behaves that way. To make lambdas use some sort of "surface shape" duck typing would be something very different from the rest of the language. Adding that kind of facility in the language for lambdas would probably be considered to be generalized for the entire language, and that would be a potentially huge breaking change. Omitting such a facility for just lambdas fits in with the strongly-ish typing of the rest of C++.Cesium
Technically a Voldemort type would be auto foo(){ struct DarkLord {} tom_riddle; return tom_riddle; }, because outside of foo nothing can use the identifier DarkLordGrower
@cmaster-reinstatemonica efficiency, the alternative would be to box & dynamically dispatch every lambda (allocate it on the heap and erase its precise type). Now as you note the compiler could deduplicate the anonymous types of lambdas, but you still wouldn't be able to write them down and it would require significant work for very little gain, so the odds are not really in favor.Choroid
G
10

C++ lambdas need distinct types for distinct operations, as C++ binds statically. They are only copy/move-constructable, so mostly you don't need to name their type. But that's all somewhat of an implementation detail.

I'm not sure if C# lambdas have a type, as they are "anonymous function expressions", and they immediately get converted to a compatible delegate type or expression tree type. If the do, it's probably an unpronouncable type.

C++ also has anonymous structs, where each definition leads to a unique type. Here the name isn't unpronouncable, it simply doesn't exist as far as the standard is concerned.

C# has anonymous data types, which it carefully forbids from escaping from the scope they are defined. The implementation gives a unique, unpronouncable name to those too.

Having an anonymous type signals to the programmer that they shouldn't poke around inside their implementation.

Aside:

You can give a name to a lambda's type.

auto foo = []{}; 
using Foo_t = decltype(foo);

If you don't have any captures, you can use a function pointer type

void (*pfoo)() = foo;
Grower answered 30/7, 2020 at 12:26 Comment(2)
The first example code still won't allow a subsequent Foo_t = []{};, only Foo_t = foo and nothing else.Mccaskill
@cmaster-reinstatemonica that's because the type isn't default constructable, not because of the anonymity. My guess is that's as much to do with avoiding having an even larger set of corner cases that you have to remember, as any technical reason.Grower
F
10

Why design a language with unique anonymous types?

Because there are cases where names are irrelevant and not useful or even counter-productive. In this case the ability abstract out their existence is useful because it reduces name pollution, and solves one of the two hard problems in computers science (how to name things). For the same reason, temporary objects are useful.

lambda

The uniqueness is not a special lambda thing, or even special thing to anonymous types. It applies to named types in the language as well. Consider following:

struct A {
    void operator()(){};
};

struct B {
    void operator()(){};
};

void foo(A);

Note that I cannot pass B into foo, even though the classes are identical. This same property applies to unnamed types.

lambdas can only be passed to template functions that allow the compile time, unspeakable type to be passed along with the object ... erased via std::function<>.

There's a third option for a subset of lambdas: Non-capturing lambdas can be converted to function pointers.


Note that if the limitations of an anonymous type are a problem for a use case, then the solution is simple: A named type can be used instead. Lambdas don't do anything that cannot be done with a named class.

Foxtail answered 30/7, 2020 at 12:51 Comment(0)
S
6

Why use anonymous types?

For types that are automatically generated by the compiler, the choice is to either (1) honor a user's request for the name of the type, or (2) let the compiler choose one on its own.

  1. In the former case, the user is expected to explicitly provide a name each time such a construct appears (C++/Rust: whenever a lambda is defined; Rust: whenever a function is defined). This is a tedious detail for the user to provide each time, and in the majority of cases the name is never referred to again. Thus it make sense to let the compiler figure out a name for it automatically, and use existing features such as decltype or type inference to reference the type in the few places where it is needed.

  2. In the latter case, the compiler need to choose a unique name for the type, which would probably be an obscure, unreadable name such as __namespace1_module1_func1_AnonymousFunction042. The language designer could specify precisely how this name is constructed in glorious and delicate detail, but this needlessly exposes an implementation detail to the user that no sensible user could rely upon, since the name is no doubt brittle in the face of even minor refactors. This also unnecessarily constrains the evolution of the language: future feature additions may cause the existing name generation algorithm to change, leading to backward compatibility issues. Thus, it makes sense to simply omit this detail, and assert that the auto-generated type is unutterable by the user.

Why use unique (distinct) types?

If a value has a unique type, then an optimizing compiler can track a unique type across all its use sites with guaranteed fidelity. As a corollary, the user can then be certain of the places where the provenance of this particular value is full known to the compiler.

As an example, the moment the compiler sees:

let f: __UniqueFunc042 = || { ... };  // definition of __UniqueFunc042 (assume it has a nontrivial closure)

/* ... intervening code */

let g: __UniqueFunc042 = /* some expression */;
g();

the compiler has full confidence that g must necessarily originate from f, without even knowing the provenance of g. This would allow the call to g to be devirtualized. The user would know this too, since the user has taken great care to preserve the unique type of f through the flow of data that led to g.

Necessarily, this constrains what the user can do with f. The user is not at liberty to write:

let q = if some_condition { f } else { || {} };  // ERROR: type mismatch

as that would lead to the (illegal) unification of two distinct types.

To work around this, the user could upcast the __UniqueFunc042 to the non-unique type &dyn Fn(),

let f2 = &f as &dyn Fn();  // upcast
let q2 = if some_condition { f2 } else { &|| {} };  // OK

The trade-off made by this type erasure is that uses of &dyn Fn() complicate the reasoning for the compiler. Given:

let g2: &dyn Fn() = /*expression */;

the compiler has to painstakingly examine the /*expression */ to determine whether g2 originates from f or some other function(s), and the conditions under which that provenance holds. In many circumstances, the compiler may give up: perhaps human could tell that g2 really comes from f in all situations but the path from f to g2 was too convoluted for the compiler to decipher, resulting in a virtual call to g2 with pessimistic performance.

This becomes more evident when such objects delivered to generic (template) functions:

fn h<F: Fn()>(f: F);

If one calls h(f) where f: __UniqueFunc042, then h is specialized to a unique instance:

h::<__UniqueFunc042>(f);

This enables the compiler to generate specialized code for h, tailored for the particular argument of f, and the dispatch to f is quite likely to be static, if not inlined.

In the opposite scenario, where one calls h(f) with f2: &Fn(), the h is instantiated as

h::<&Fn()>(f);

which is shared among all functions of type &Fn(). From within h, the compiler knows very little about an opaque function of type &Fn() and so could only conservatively call f with a virtual dispatch. To dispatch statically, the compiler would have to inline the call to h::<&Fn()>(f) at its call site, which is not guaranteed if h is too complex.

Spar answered 2/8, 2020 at 21:24 Comment(1)
The first part about choosing names misses the point: A type like void(*)(int, double) may not have a name, but I can write it down. I would call it a nameless type, not an anonymous type. And I would call cryptic stuff like __namespace1_module1_func1_AnonymousFunction042 name mangling, which is definitely not within the scope of this question. This question is about types that are guaranteed by the standard to be impossible to write down, as opposed to introducing a type syntax that can express these types in a useful way.Mccaskill
G
3

First, lambda without capture are convertible to a function pointer. So they provide some form of genericity.

Now why lambdas with capture are not convertible to pointer? Because the function must access the state of the lambda, so this state would need to appear as a function argument.

Goya answered 30/7, 2020 at 12:23 Comment(1)
Well, the captures should become part of the lambda itself, no? Just like they are encapsulated within an std::function<>.Mccaskill
G
3

To avoid name collisions with user code.

Even two lambdas with same implementation will have different types. Which is okay because I can have different types for objects too even if their memory layout is equal.

Garneau answered 31/7, 2020 at 12:26 Comment(9)
A type like int (*)(Foo*, int, double) does not run any risk of name collision with user code.Mccaskill
Your example does not generalize very well. While a lambda expression is only syntax it will evaluate to some struct especially with capture clause. Naming it explicitly could lead to name clashes of already existing structs.Garneau
Again, this question is about language design, not about C++. I can surely define a language where a lambda's type is more akin to a function pointer type than to a data structure type. The function pointer syntax in C++ and the dynamic array type syntax in C prove that this is possible. And that begs the question, why lambdas did not use a similar approach?Mccaskill
No you can't, because of variable currying (capturing). You need both a function and data to make it work.Turanian
@Turanian Oh, yes, I can. I could define a lambda to be an object containing two pointers, one for the capture object, and one for the code. Such a lambda object would be easy to pass around by value. Or I could pull tricks with a stub of code at the start of the capture object that takes its own address before jumping to the actual lambda code. That would turn a lambda pointer into a single address. But that's unnecessary as the PPC platform has proven: On PPC, a function pointer is actually a pair of pointers. That's why you can't cast void(*)(void) to void* and back in standard C/C++.Mccaskill
The first part describes exactly what C++ is doing, what is the point you're trying to make with these circuitous arguments? As to the second part, segmented pointers doesn't mean you get extra room inside your pointers, it means the addressing bus is larger than the register size -- again, what are you trying to prove here?Turanian
@Turanian PPC function pointers have nothing, absolutely nothing to do with X86 segmented memory. They are truly a pair of inidividually complete pointers where one points to the code (no surprise here) and the other points to the global data that's relevant for that code. This two-pointer method allows shared libraries to access their global data without relying on recovering their location from the program counter.Mccaskill
@Turanian So, my point is: It is possible to bundle two architectural pointers into a type called void(*)(void), and to pass that two-in-one pointer around. You said that I cannot design a language where a lambda's type is more akin to a function pointer than a data structure type. And you said that the reason for my inability was that I couldn't bundle the "function and data to make it work". But that is exactly what the PPC platform is doing with its function pointers: The type void(*)(void) is actually a pair of pointers where one provides the function and the other the data.Mccaskill
@cmaster-reinstatemonica a double-wide pointer is basically what is done inside std::function if the state is too large to be embedded directly (it stores it dynamically in that case), so yes, you can absolutely encode a stateful callable via two pointers, but you need to store the state somewhere. Using a class avoids resorting to dynamic storage, which I think should be avoided for such an impactful part of the language.Prenomen

© 2022 - 2024 — McMap. All rights reserved.