How can I make sense of C++ profiling data on Windows when a lot of code gets inlined by the compiler? I want to measure the code that actually runs, so by definition I have to profile an optimized build. But none of the tools I have tried manage to resolve inlined functions.
I have tried both the sampling profiler in Visual Studio 2017 Professional and VTune 2018. I have also tried enabling /Zo, but it does not seem to have any effect.
I have found the following resource which seems to indicate that only Visual Studio Ultimate or Premium support inline frame information - is this still true for Visual Studio 2017? https://social.msdn.microsoft.com/Forums/en-US/9df15363-5aae-4f0b-a5ad-dd9939917d4c/which-functions-arent-pgo-optimized-using-profile-data?forum=vsdebug
Here is some example code:
#include <cmath>
#include <random>
#include <iostream>

// 'inline' is only a hint; at /O2 both burn() and the uniform(engine)
// calls inside it are candidates for inlining into main().
inline double burn()
{
    std::uniform_real_distribution<double> uniform(-1E5, 1E5);
    std::default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 100000000; ++i) {
        s += uniform(engine);
    }
    return s;
}

int main()
{
    std::cout << "random sum: " << burn() << '\n';
    return 0;
}
Compile it with Visual Studio in Release mode, or on the command line with cl /O2 /Zi /Zo /EHsc main.cpp. Then profile it with the CPU Sampling Profiler in Visual Studio. At best, you will see something like this:
VTune 2018 looks similar on Windows. On Linux, perf and VTune have no problem showing frames from inlined functions. Is this feature, which in my opinion is crucial for C++ tooling, really missing from the non-Premium/Ultimate Visual Studio toolchains? How do people on Windows deal with this? And what is the point of /Zo, then?
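One stopgap, shown here only as a sketch, is to force the functions you want to see into separate out-of-line frames, so the profiler does not need inline-frame debug information to attribute samples to them. This deliberately perturbs the optimized code being measured. The macro PROFILE_NOINLINE and the functions draw() and burn_noinline() are hypothetical names; __declspec(noinline) (MSVC, clang-cl) and __attribute__((noinline)) (GCC, clang) are the real compiler attributes.

```cpp
#include <cassert>
#include <random>

// Hypothetical portability macro: MSVC and clang-cl (which defines _MSC_VER)
// accept __declspec(noinline); GCC and clang accept __attribute__((noinline)).
#if defined(_MSC_VER)
#define PROFILE_NOINLINE __declspec(noinline)
#else
#define PROFILE_NOINLINE __attribute__((noinline))
#endif

// One draw from the distribution, kept out of line so a sampling profiler
// has a real frame to attribute the uniform_real_distribution work to.
PROFILE_NOINLINE double draw(std::uniform_real_distribution<double>& uniform,
                             std::default_random_engine& engine)
{
    return uniform(engine);
}

// Same work as burn() in the question, but with a configurable iteration
// count and the per-iteration draw isolated in draw().
PROFILE_NOINLINE double burn_noinline(int iterations)
{
    assert(iterations >= 0);
    std::uniform_real_distribution<double> uniform(-1E5, 1E5);
    std::default_random_engine engine;  // default seed, so results are reproducible
    double s = 0;
    for (int i = 0; i < iterations; ++i) {
        s += draw(uniform, engine);
    }
    return s;
}
```

To try it, call burn_noinline(100000000) from main() in place of burn(). Because every draw() is now a real call, the profile includes call overhead that the fully inlined build does not have, so it shows roughly where time goes rather than giving an exact measurement of the original code.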
EDIT: I just tried compiling the minimal example above with clang, and it produces different but still unsatisfying results. I used clang 6.0.0 (trunk), built from LLVM rev 318844 and clang rev 318874. I compiled my code with clang++ -std=c++17 -O2 -g main.cpp -o main.exe and ran the resulting executable with the Sampling Profiler in Visual Studio again. The result is:
So now I see the burn function, but the source file information is lost. Also, uniform_real_distribution is still not shown anywhere.
EDIT 2: As suggested in the comments, I also tried clang-cl with the same arguments as cl above, i.e. clang-cl.exe /O2 /Zi /Zo /EHsc main.cpp. This produces the same results as clang.exe, but we also get somewhat working source mappings:
EDIT 3: I originally thought clang would magically solve this issue. It doesn't, sadly. Most inlined frames are still missing :(
EDIT 4: Inline frames are not supported in VTune for applications built with MSVC/PDB: https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/749363