I have an extremely simple program to measure how much time a function is taking.
#include <iostream>
#include <vector>
#include <chrono>
struct Foo
{
void addSample(uint64_t s)
{
}
};
void test(const std::vector<uint64_t>& samples)
{
uint32_t onlyCallTime = 0;
uint32_t loopOnlyTime = 0;
Foo stats;
std::chrono::high_resolution_clock::time_point callStart,callEnd;
auto start = callStart = callEnd = std::chrono::high_resolution_clock::now();
for(auto &s : samples)
{
callStart = std::chrono::high_resolution_clock::now();
loopOnlyTime += std::chrono::duration_cast<std::chrono::microseconds>(callStart-callEnd).count();
stats.addSample(s);
callEnd = std::chrono::high_resolution_clock::now();
onlyCallTime += std::chrono::duration_cast<std::chrono::microseconds>(callEnd-callStart).count();
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "overall duration: " << std::chrono::duration_cast<std::chrono::microseconds>(end-start).count() << std::endl;
std::cout << "only call duration: " << onlyCallTime << std::endl;
std::cout << "only loop duration: " << loopOnlyTime << std::endl;
}
int main()
{
std::vector<uint64_t> dataSetDecreasing;
for(uint32_t i = 0; i < 1000000; ++i)
dataSetDecreasing.push_back(1000000-i);
test(dataSetDecreasing);
}
The output is really confusing. Here are some examples:
overall duration: 56047
only call duration: 195
only loop duration: 285
overall duration: 40984
only call duration: 177
only loop duration: 243
overall duration: 47328
only call duration: 187
only loop duration: 177
How I see it is that callEnd-callStart
captures the call to addSample
+ the duration_cast
.
callStart-callEnd
captures everything else, so the loop initialization, iteration, condition and the second duration_cast
. What am I missing? Where are the other ~40000 microseconds going?
Compiled with g++ -Wall -Wextra -std=c++17 -O3
g++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
My OS is Fedora release 32 (Thirty Two)
perf stat
to report it about as precisely as the OS measures it. – BrahuiaddSample
is pointless. AtO3
compiler should be optimizing it out. You will have to add some observable effect for the function to make it to the executable. – Poetry