In Scott Meyers book I found an example of universal generic lambda expression that can be used to measure function execution time. (C++14)
auto timeFuncInvocation =
[](auto&& func, auto&&... params) {
// get time before function invocation
const auto& start = std::chrono::high_resolution_clock::now();
// function invocation using perfect forwarding
std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
// get time after function invocation
const auto& stop = std::chrono::high_resolution_clock::now();
return stop - start;
};
The problem is that you are measure only one execution so the results can be very differ. To get a reliable result you should measure a large number of execution.
According to Andrei Alexandrescu lecture at code::dive 2015 conference - Writing Fast Code I:
Measured time: tm = t + tq + tn + to
where:
tm - measured (observed) time
t - the actual time of interest
tq - time added by quantization noise
tn - time added by various sources of noise
to - overhead time (measuring, looping, calling functions)
According to what he said later in the lecture, you should take a minimum of this large number of execution as your result.
I encourage you to look at the lecture in which he explains why.
Also there is a very good library from google - https://github.com/google/benchmark.
This library is very simple to use and powerful. You can checkout some lectures of Chandler Carruth on youtube where he is using this library in practice. For example CppCon 2017: Chandler Carruth “Going Nowhere Faster”;
Example usage:
#include <iostream>
#include <chrono>
#include <vector>
auto timeFuncInvocation =
[](auto&& func, auto&&... params) {
// get time before function invocation
const auto& start = high_resolution_clock::now();
// function invocation using perfect forwarding
for(auto i = 0; i < 100000/*largeNumber*/; ++i) {
std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
}
// get time after function invocation
const auto& stop = high_resolution_clock::now();
return (stop - start)/100000/*largeNumber*/;
};
void f(std::vector<int>& vec) {
vec.push_back(1);
}
void f2(std::vector<int>& vec) {
vec.emplace_back(1);
}
int main()
{
std::vector<int> vec;
std::vector<int> vec2;
std::cout << timeFuncInvocation(f, vec).count() << std::endl;
std::cout << timeFuncInvocation(f2, vec2).count() << std::endl;
std::vector<int> vec3;
vec3.reserve(100000);
std::vector<int> vec4;
vec4.reserve(100000);
std::cout << timeFuncInvocation(f, vec3).count() << std::endl;
std::cout << timeFuncInvocation(f2, vec4).count() << std::endl;
return 0;
}
EDIT:
Ofcourse you always need to remember that your compiler can optimize something out or not. Tools like perf can be useful in such cases.
clock_gettime
.. gcc defines other clocks as:typedef system_clock steady_clock; typedef system_clock high_resolution_clock;
on Windows, useQueryPerformanceCounter
. – Shon