Do rvalue references have the same overhead as lvalue references?

Asked 14/8, 2018 at 3:0 Answered 14/8, 2018 at 10:43

Tags: c++ c++11 overhead pass-by-rvalue-reference

Consider this example:

#include <utility>

// runtime dominated by argument passing
template <class T>
void foo(T t) {}

int main() {
    int i(0);
    foo<int>(i); // fast -- int is scalar type
    foo<int&>(i); // slow -- lvalue reference overhead
    foo<int&&>(std::move(i)); // ???
}

Is foo<int&&>(std::move(i)) as fast as foo<int>(i), or does it involve pointer overhead like foo<int&>(i)?

EDIT: As suggested, running g++ -S gave me the same 51-line assembly file for foo<int>(i) and foo<int&>(i), but foo<int&&>(std::move(i)) resulted in 71 lines of assembly code (it looks like the difference came from std::move).

EDIT: Thanks to those who recommended g++ -S with different optimization levels -- using -O3 (and making foo noinline) I was able to get output which looks like xaxxon's solution.

Oleaster answered 14/8, 2018 at 3:0 Comment(11)

Premature optimization? – Pyrrolidine 14/8, 2018 at 3:4

@Rakete1111: Yes, this is just a curiosity. – Oleaster 14/8, 2018 at 3:6

Measure and find out. – Frobisher 14/8, 2018 at 3:17

Well, semantically a double copy is possible in the case of an rvalue reference, but in real cases I would expect the compiler to use pointers. After all, code with real rvalues (not ones made by std::move), and big ones at that (say, a std::vector constructed on the fly), would be better off with pass-by-non-const-pointer. – Behalf 14/8, 2018 at 3:17

"running g++ -S gave me the same 51-line assembly" - try it with different optimization levels (-O1 vs -O2 vs -O3 vs -Os). – Hockett 14/8, 2018 at 5:21

I second @JesperJuhl: -S is virtually never meaningful without -Os, and especially not with -O0 or -O3. Only -S -Os produces near-readable assembler code that shows what's actually going on. That said, your template foo<> is not even trying to actually use its parameter. The optimizer will throw out what you try to look at. For proper analysis, define three non-template functions like int foo_noref(int arg) { return arg; } in a separate file and compile with -S -Os. Then do the same for the calls void bar_noref() { int i = 0; foo_noref(i); }. – Kiely 14/8, 2018 at 11:42
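The setup suggested in the comment above might look like the following sketch. The names foo_lref, foo_rref, bar_lref and bar_rref are hypothetical, chosen to mirror foo_noref and bar_noref; unlike the comment's void bar_noref(), the callers here return the value so it can be checked. One caveat (an assumption about typical optimizer behavior): if the callees are visible in the same file, -Os may inline them into the callers, so to see actual calls, put the bar_* functions in a second file that only sees declarations of the foo_* functions.

```cpp
// Hypothetical translation unit, e.g. passing.cpp, compiled with
//   g++ -std=c++11 -S -Os passing.cpp
// so the generated assembly for each function can be compared side by side.
#include <utility>

// The three callees differ only in how the int is passed.
int foo_noref(int arg) { return arg; }   // by value
int foo_lref(int& arg) { return arg; }   // by lvalue reference
int foo_rref(int&& arg) { return arg; }  // by rvalue reference

// Callers: returning the result keeps the optimizer from discarding the
// calls as dead code.
int bar_noref() { int i = 0; return foo_noref(i); }
int bar_lref()  { int i = 0; return foo_lref(i); }
int bar_rref()  { int i = 0; return foo_rref(std::move(i)); }
```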

Yes, it adds the pointer, so every use of the referred object involves a pointer indirection. On the other hand, if the object you are referencing were bigger, then passing by value could incur the full cost of making a copy, even if you then only accessed one member within the function. Pass the object on to another method by value and you make another copy. Sometimes this is what you want, but generally it is good policy to pass simple values by value and larger objects by const ref. – Chaussure 14/8, 2018 at 15:10

@GemTaylor: Well put. I've been looking for some template metaprogramming tool to pass simple values by value and larger objects by const ref for arbitrary types, part of why I brought this question up. – Oleaster 15/8, 2018 at 19:43

@TaylorNichols When it comes to TMP everything is inlined, and the compiler optimiser should reduce most reference parameters back to the original declaration, so it shouldn't matter whether you use const references or value copies. – Chaussure 15/8, 2018 at 19:48

If it does make a difference, you can always add conditional SFINAE and have 2 versions, but I suspect mainly it won't make much difference. is_integral will be your friend here. – Chaussure 15/8, 2018 at 19:51

@GemTaylor: I'm mainly thinking about non-temporary variables, such as class members, which won't get inlined. Also if functions take n parameters I'd have 2^n versions so I'm still considering the cleanest implementation. – Oleaster 15/8, 2018 at 19:51
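One possible shape for the "by value if small, const ref otherwise" tool discussed in these comments is a type alias chosen with std::conditional, similar in spirit to boost::call_traits<T>::param_type. The sketch below is an assumption, not a standard facility: the alias name param_t, the two-pointer size threshold and the Big type are all hypothetical choices for illustration.

```cpp
#include <type_traits>

// Hypothetical alias: small trivially copyable types are passed by value,
// everything else by const reference. The threshold is an arbitrary choice.
template <class T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= 2 * sizeof(void*),
    T,
    const T&>::type;

// Illustrative aggregate: trivially copyable, but too large to copy cheaply.
struct Big { int data[64]; };

// T appears in a non-deduced context inside param_t<T>, so callers must
// spell it out explicitly, e.g. twice<int>(3).
template <class T>
T twice(param_t<T> x) { return x + x; }

// Non-template use: param_t<Big> resolves to const Big&.
inline int first_element(param_t<Big> b) { return b.data[0]; }
```

Because each parameter gets its own param_t, a function with n parameters needs one signature rather than 2^n overloads. The trade-off is lost template argument deduction, which is why twice must be called as twice<int>(3).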

In your specific situation, it's likely they are all the same. The resulting code from godbolt with gcc -O3 is https://godbolt.org/g/XQJ3Z4 for:

#include <utility>

// runtime dominated by argument passing
template <class T>
int foo(T t) { return t;}

int main() {
    int i{0};
    volatile int j;
    j = foo<int>(i); // fast -- int is scalar type
    j = foo<int&>(i); // slow -- lvalue reference overhead
    j = foo<int&&>(std::move(i)); // ???
}

is:

    mov     dword ptr [rsp - 4], 0 // foo<int>(i);
    mov     dword ptr [rsp - 4], 0 // foo<int&>(i);
    mov     dword ptr [rsp - 4], 0 // foo<int&&>(std::move(i)); 
    xor     eax, eax
    ret

The volatile int j is there so that the compiler cannot optimize away all the code: otherwise it would know that the results of the calls are discarded, and the whole program would optimize to nothing.

HOWEVER, if you force the function not to be inlined, by declaring it as int __attribute__ ((noinline)) foo(T t) { return t;}, then things change a bit:

int foo<int>(int):                           # @int foo<int>(int)
        mov     eax, edi
        ret
int foo<int&>(int&):                          # @int foo<int&>(int&)
        mov     eax, dword ptr [rdi]
        ret
int foo<int&&>(int&&):                          # @int foo<int&&>(int&&)
        mov     eax, dword ptr [rdi]
        ret

Compiler Explorer link for the above: https://godbolt.org/g/pbZ1BT

For questions like these, learn to love https://godbolt.org and https://quick-bench.com/ (Quick Bench requires you to learn how to use Google Benchmark properly).

Crus answered 14/8, 2018 at 3:15 Comment(4)

I like the volatile int trick. Theoretically, could the compiler also optimize away the calls to foo(i) because it knows the input is discarded? – Oleaster 14/8, 2018 at 3:28

@TaylorNichols Without the volatile, the whole program is optimized to nothing: godbolt.org/g/e3n6BA. With volatile, the compiler cannot assume that nothing happens between the assignments (something outside the code could observe j), so it has to actually do "the right thing", which means storing a value "as if" it had called the function. – Crus 14/8, 2018 at 3:31

That makes sense, so foo never even gets called and we just get j = i; in each case, hence the mov instructions. I suppose my question would be more relevant if foo were actually called. – Oleaster 14/8, 2018 at 3:39

well, then it looks like ref vs no-ref are a little different: godbolt.org/g/pbZ1BT – Crus 14/8, 2018 at 3:42

Efficiency of parameter passing depends on the ABI.

For example, on Linux the Itanium C++ ABI specifies that references are passed as pointers to the referred object:

3.1.2 Reference Parameters

Reference parameters are handled by passing a pointer to the actual parameter.

This is independent of the reference category (rvalue/lvalue reference).

For a broader view, I found this quote in a document on calling conventions from the Technical University of Denmark, which analyzes most major compilers:

References are treated as identical to pointers in all respects.

So rvalue and lvalue references involve the same pointer overhead on all of these ABIs whenever the call is not inlined.
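To check this on a given platform, one could compile a file like the sketch below with g++ -O2 -S and compare the output. The function names are hypothetical. Under the Itanium C++ ABI on x86-64, the pointer and both reference versions would be expected to load through the address in rdi (matching the mov eax, dword ptr [rdi] seen in the noinline output above), while the by-value version simply copies edi. Only the mangled names should differ.

```cpp
// Hypothetical comparison file: compile with g++ -O2 -S and diff the bodies.
int read_val(int v)         { return v; }   // value arrives in a register
int read_ptr(const int* p)  { return *p; }  // explicit pointer
int read_lref(const int& r) { return r; }   // lvalue reference: passed as a pointer
int read_rref(int&& r)      { return r; }   // rvalue reference: also passed as a pointer
```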

Peignoir answered 14/8, 2018 at 10:43 Comment(1)

Thanks -- I was hoping to find some official docs on this. I suppose rvalue references would have to use something like pointers, as moving from an object often involves resetting its members from a different scope. – Oleaster 15/8, 2018 at 19:49
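The point in this comment can be illustrated with a small move-only type: the callee moves out of the caller's object through the rvalue reference and resets it, which is only possible if the function receives that object's address rather than a copy. The Buffer type and consume function below are hypothetical, written just for this sketch.

```cpp
#include <utility>

// Hypothetical move-only type that owns one heap-allocated int.
struct Buffer {
    int* p;
    explicit Buffer(int v) : p(new int(v)) {}
    // Move constructor: steal the pointer and reset the source object.
    Buffer(Buffer&& other) noexcept : p(other.p) { other.p = nullptr; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    ~Buffer() { delete p; }
};

// Moves out of the caller's Buffer through the rvalue reference; the reset
// of b.p is visible in the caller's scope after the call returns.
int consume(Buffer&& b) {
    Buffer local(std::move(b));
    return *local.p;
}
```

After Buffer b(42); consume(std::move(b));, the caller's b.p is nullptr, which shows that the callee wrote to the caller's object through the reference.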
