I am a bit confused about the implementation of the DoNotOptimize function of the Google Benchmark framework (definition from here):
template <class Tp>
inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp const& value) {
asm volatile("" : : "r,m"(value) : "memory");
}
template <class Tp>
inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp& value) {
#if defined(__clang__)
asm volatile("" : "+r,m"(value) : : "memory");
#else
asm volatile("" : "+m,r"(value) : : "memory");
#endif
}
So it materializes the variable and, if it is non-constant, also tells the compiler to forget anything it knew about its previous value ("+r" marks a read-modify-write operand).
It also always uses a "memory" clobber, which is a compiler barrier against reordering loads/stores, i.e. it makes sure all globally-reachable objects have their memory in sync with the C++ abstract machine and assumes they might have been modified.
I am far from being an expert in low-level code, but as far as I understand the implementation, the function serves as a read/write barrier. So, basically, it ensures that the value passed in is either in a register or in memory.
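For context, the way I use it looks roughly like this (a minimal sketch; Compute() is just a placeholder for the code under test):

#include <benchmark/benchmark.h>

// Placeholder for the function whose result should not be optimized away.
static int Compute() { return 42; }

static void BM_Compute(benchmark::State& state) {
  for (auto _ : state) {
    int result = Compute();
    // Forces the compiler to materialize `result` (register or memory)
    // and acts as a barrier for surrounding loads/stores.
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(BM_Compute);
BENCHMARK_MAIN();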
While this seems entirely reasonable when I want to preserve the result of a function that is being benchmarked, I am a bit surprised by the degree of freedom left to the compiler.
My understanding of the given code is that the compiler may insert a materialization point (i.e. spill the value to memory) whenever DoNotOptimize is called, which would imply a noticeable amount of overhead when it is executed repeatedly (e.g. in a loop).
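To illustrate the concern, consider a hot loop like the following (a hypothetical sketch; data and n are placeholders):

// If the compiler chooses the "m" alternative, acc may be stored to
// (and kept in sync with) memory on every iteration, and the "memory"
// clobber additionally forbids reordering loads/stores across each call.
long acc = 0;
for (long i = 0; i < n; ++i) {
  acc += data[i];
  benchmark::DoNotOptimize(acc);
}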
When the value that should not be optimized away is just a single scalar, it seems sufficient if the compiler merely ensures that it resides in a register.
Wouldn't it be a good idea to distinguish between pointers and non-pointers, for instance:
#include <type_traits>

template< class T >
inline __attribute__((always_inline))
void do_not_optimize( T&& value ) noexcept {
    // T may be deduced as a reference type here, so strip it before the check.
    if constexpr( std::is_pointer_v< std::remove_reference_t< T > > ) {
        // Pointers: keep the pointer in memory and add a full compiler barrier.
        asm volatile("" : "+m"(value) : : "memory");
    } else {
        // Plain values: forcing them into a register seems sufficient.
        asm volatile("" : "+r"(value) : :);
    }
}
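Intended usage would then look something like this (hypothetical; compute() and buffer are placeholders):

int x = compute();     // scalar: only needs to end up in a register
do_not_optimize( x );

int* p = buffer;       // pointer: the "memory" clobber keeps the pointee in sync as well
do_not_optimize( p );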