I discovered that compiling a relatively small amount of code that converts lambda functions to std::function<> values can take a very long time, particularly with the Clang compiler.
Consider the following dummy code that creates 100 lambda functions:
#if MODE==1
#include <functional>
using LambdaType = std::function<int()>;
#elif MODE==2
using LambdaType = int(*)();
#elif MODE==3
#include "function.h" // https://github.com/skarupke/std_function
using LambdaType = func::function<int()>;
#endif
static int total=0;
void add(LambdaType lambda)
{
    total += lambda();
}
int main(int argc, const char* argv[])
{
    add([]{ return 1; });
    add([]{ return 2; });
    add([]{ return 3; });
    // 96 more such lines...
    add([]{ return 100; });
    return total == 5050 ? 0 : 1;
}
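To reproduce the test without typing the 100 calls by hand, the repetitive lines can be generated. The following is only a sketch: make_add_lines and expected_total are hypothetical helper names that are not part of the original test file. It also shows where the 5050 in the return statement comes from, namely 1 + 2 + ... + 100.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not in the original file): returns the text of the
// n calls "add([]{ return i; });" for i = 1..n, one call per line.
std::string make_add_lines(int n)
{
    std::ostringstream out;
    for (int i = 1; i <= n; ++i)
        out << "add([]{ return " << i << "; });\n";
    return out.str();
}

// Sum 1 + 2 + ... + n, i.e. the value `total` should reach after n calls.
int expected_total(int n)
{
    return n * (n + 1) / 2;
}
```

Writing the output of make_add_lines(100) into the body of main should give a file equivalent to the one timed here, assuming the elided lines follow the same pattern as the ones shown.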
Depending on the MODE preprocessor macro, the code selects one of the following three ways to pass a lambda function to the add function:
- the std::function<> template class
- a simple C pointer to function (possible here only because the lambdas capture nothing)
- a fast replacement for std::function written by Malte Skarupke (https://probablydance.com/2013/01/13/a-faster-implementation-of-stdfunction/)
Whatever the mode, the program always exits with a regular 0 exit code.
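As a small illustration of why the function pointer mode only works without captures, here is a sketch using standard type traits. The names FnPtr, StdFn, call_both and demo_conversions are made up for this example.

```cpp
#include <functional>
#include <type_traits>

using FnPtr = int (*)();
using StdFn = std::function<int()>;

// Calls a plain function pointer and a std::function and adds the results.
int call_both(FnPtr p, StdFn f)
{
    return p() + f();
}

int demo_conversions()
{
    int base = 40;
    auto stateless = [] { return 2; };        // captures nothing
    auto stateful  = [base] { return base; }; // captures `base` by value

    // A captureless closure converts implicitly to a function pointer...
    static_assert(std::is_convertible<decltype(stateless), FnPtr>::value,
                  "captureless lambda converts to a function pointer");
    // ...whereas a capturing closure does not.
    static_assert(!std::is_convertible<decltype(stateful), FnPtr>::value,
                  "capturing lambda has no function pointer conversion");
    // Both kinds of closure can be stored in a std::function.
    static_assert(std::is_convertible<decltype(stateful), StdFn>::value,
                  "capturing lambda converts to std::function");

    return call_both(stateless, stateful); // 2 + 40
}
```

So MODE 2 is only a fair comparison for code like this test, where none of the lambdas needs state.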
But now look at compilation time with Clang:
$ time clang++ -c -std=c++11 -DMODE=1 lambdas.cpp
real 0m8.162s
user 0m7.828s
sys 0m0.318s
$ time clang++ -c -std=c++11 -DMODE=2 lambdas.cpp
real 0m0.109s
user 0m0.056s
sys 0m0.046s
$ time clang++ -c -std=c++11 -DMODE=3 lambdas.cpp
real 0m0.870s
user 0m0.814s
sys 0m0.051s
$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Wow. Going by the real times above, compilation is roughly 75 times slower in std::function mode than in pointer to function mode! And it is still almost 10 times slower with std::function than with its replacement.
How can this be? Is it a performance problem specific to Clang, or is it due to the inherent complexity of the std::function requirements?
I also tried compiling the same code with GCC 5.4 and Visual Studio 2015. There are big compile time differences there too, but they are not as large.
GCC:
$ time g++ -c -std=c++11 -DMODE=1 lambdas.cpp
real 0m1.179s
user 0m1.080s
sys 0m0.092s
$ time g++ -c -std=c++11 -DMODE=2 lambdas.cpp
real 0m0.136s
user 0m0.120s
sys 0m0.012s
$ time g++ -c -std=c++11 -DMODE=3 lambdas.cpp
real 0m1.994s
user 0m1.792s
sys 0m0.196s
Visual Studio:
C:\>ptime cl /c /DMODE=1 /EHsc /nologo lambdas.cpp
Execution time: 2.411 s
C:\>ptime cl /c /DMODE=2 /EHsc /nologo lambdas.cpp
Execution time: 0.270 s
C:\>ptime cl /c /DMODE=3 /EHsc /nologo lambdas.cpp
Execution time: 1.122 s
I am now considering using Malte Skarupke's implementation, both for slightly better runtime performance and for a big compile time improvement.
With -O3, gcc optimizes the function pointer version to 37 lines of assembly (on my machine). I think it can compute the result 5050 at compile time. For comparison: for std::function, gcc generates about 8000 lines of assembly. There seems to be a lot of complexity involved when using std::function, which would explain the compile time. – Oryx
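One plausible source of that complexity, stated here as an assumption rather than something measured: every lambda expression has its own closure type, so a type-erasing wrapper has to instantiate its internal templates once per lambda. The sketch below uses a hypothetical, heavily simplified tiny_function class. It is not how libc++, libstdc++ or Skarupke's function.h are actually implemented.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical, much-simplified stand-in for std::function<int()>.
class tiny_function {
    struct callable_base {
        virtual ~callable_base() = default;
        virtual int invoke() = 0;
    };

    // The compiler generates one copy of this class (vtable, invoke,
    // destructor) for every distinct callable type F that gets stored.
    template <typename F>
    struct callable_impl final : callable_base {
        F f;
        explicit callable_impl(F fn) : f(std::move(fn)) {}
        int invoke() override { return f(); }
    };

    std::unique_ptr<callable_base> impl_;

public:
    // Constructor template: instantiated separately for each closure type.
    template <typename F>
    tiny_function(F fn) : impl_(new callable_impl<F>(std::move(fn))) {}

    int operator()() { return impl_->invoke(); }
};

int demo_type_erasure()
{
    auto a = [] { return 1; };
    auto b = [] { return 1; };

    // Textually identical lambdas still have distinct closure types, so the
    // two constructions below need two separate sets of instantiations.
    static_assert(!std::is_same<decltype(a), decltype(b)>::value,
                  "each lambda expression has its own closure type");

    tiny_function fa(a);
    tiny_function fb(b);
    return fa() + fb(); // 1 + 1
}
```

With 100 lambdas, the test file would need 100 such sets of instantiations in std::function mode, whereas the function pointer mode needs none. A real std::function does considerably more per type (copying, small-buffer storage, target type queries), which may be part of why the cost differs so much between implementations.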