Consider this example:
#include <utility>

// runtime dominated by argument passing
template <class T>
void foo(T t) {}

int main() {
    int i(0);
    foo<int>(i);              // fast -- int is scalar type
    foo<int&>(i);             // slow -- lvalue reference overhead
    foo<int&&>(std::move(i)); // ???
}
Is foo<int&&>(std::move(i)) as fast as foo<int>(i), or does it involve pointer overhead like foo<int&>(i)?
EDIT: As suggested, running g++ -S gave me the same 51-line assembly file for foo<int>(i) and foo<int&>(i), but foo<int&&>(std::move(i)) resulted in 71 lines of assembly code (it looks like the difference came from std::move).
EDIT: Thanks to those who recommended g++ -S with different optimization levels. Using -O3 (and making foo noinline), I was able to get output which looks like xaxxon's solution.
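For anyone who wants to repeat that experiment, here is a minimal sketch of what such a test file might look like. It assumes GCC or Clang (the `__attribute__((noinline))` spelling is compiler-specific, and the question does not say how foo was actually made noinline). It also makes foo return a value computed from its parameter so the optimizer cannot discard the argument. The wrapper names by_value, by_lref and by_rref are hypothetical, added only to give each call its own labelled function in the assembly.

```cpp
#include <utility>

// Assumption: GCC/Clang attribute syntax; other compilers spell this differently.
// foo uses its parameter so the argument passing survives optimization.
template <class T>
__attribute__((noinline)) int foo(T t) {
    return t + 1;
}

// Hypothetical wrappers, one per instantiation, so each call site
// shows up under its own symbol in the generated assembly.
int by_value(int i) { return foo<int>(i); }
int by_lref(int i)  { return foo<int&>(i); }
int by_rref(int i)  { return foo<int&&>(std::move(i)); }
```

Compiling this with g++ -S -O3 and comparing the three foo instantiations should show whether the two reference versions receive an address and load through it while the by-value version receives the integer directly. The exact output depends on compiler version and target, so treat it as something to check rather than a guaranteed result.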
-O1 vs -O2 vs -O3 vs -Os). – Hockett
-S is virtually never meaningful without -Os, and especially not with -O0 or -O3. Only -S -Os produces near-readable assembler code that shows what's actually going on. That said, your template foo<> is not even trying to actually use its parameter. The optimizer will throw out what you try to look at. For proper analysis, define three non-template functions like int foo_noref(int arg) { return arg; } in a separate file and compile with -S -Os. Then do the same for the calls, e.g. void bar_noref() { int i = 0; foo_noref(i); }. – Kiely
is_integral will be your friend here. – Chaussure
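One possible reading of the is_integral hint (an assumption, since the comment does not elaborate) is to let a type trait choose the parameter type: integral types are passed by value, and everything else by const reference. The names param_t and echo below are hypothetical and only illustrate the idea.

```cpp
#include <string>
#include <type_traits>

// Hypothetical alias: T itself for integral types, const T& otherwise.
template <class T>
using param_t =
    typename std::conditional<std::is_integral<T>::value, T, const T&>::type;

// Compile-time checks that the alias picks the intended parameter type.
static_assert(std::is_same<param_t<int>, int>::value,
              "int should be passed by value");
static_assert(std::is_same<param_t<std::string>, const std::string&>::value,
              "std::string should be passed by const reference");

// T cannot be deduced through param_t, so callers name it explicitly,
// in the same style as foo<int>(i) in the question.
template <class T>
T echo(param_t<T> value) {
    return value;
}
```

A call then looks like echo<int>(i) or echo<std::string>(s). Whether this is faster in practice is still something to confirm in the optimized assembly.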