Inefficiency of copy-and-swap idiom?
Asked Answered
O

3

10

I was testing some code where there is a std::vector data member inside a class. The class is both copiable and movable, and the operator= is implemented as described here using the copy-and-swap idiom.

If there are two vectors, say v1 with big capacity and v2 with small capacity, and v2 is copied to v1 (v1 = v2), the big capacity in v1 is kept after the assignment; this makes sense, since next v1.push_back() calls don't have to force new reallocations (in other words: freeing already available memory, then reallocating it to grow the vector doesn't make much sense).

But, if the same assignment is done with the class having the vector as data member, the behavior is different, and after the assignment the bigger capacity is not kept.

If the copy-and-swap idiom is not used, and copy operator= and move operator= are implemented separately, then the behavior is as expected (as for ordinary non-member vectors).

Why is that? Should we not follow copy-and-swap idiom and instead implement operator=(const X& other) (copy op=) and operator=(X&& other) (move op=) separately for optimum performance?

This is the output of a reproducible test with copy-and-swap idiom (note how in this case, after x1 = x2, x1.GetV().capacity() is 1,000, not 1,000,000):

C:\TEMP\CppTests>cl /EHsc /W4 /nologo /DTEST_COPY_AND_SWAP test.cpp
test.cpp

C:\TEMP\CppTests>test.exe
v1.capacity() = 1000000
v2.capacity() = 1000

After copy v1 = v2:
v1.capacity() = 1000000
v2.capacity() = 1000

[Copy-and-swap]

x1.GetV().capacity() = 1000000
x2.GetV().capacity() = 1000

After x1 = x2:
x1.GetV().capacity() = 1000
x2.GetV().capacity() = 1000

This is the output without copy-and-swap idiom (note how in this case x1.GetV().capacity() = 1000000, as expected):

C:\TEMP\CppTests>cl /EHsc /W4 /nologo test.cpp
test.cpp

C:\TEMP\CppTests>test.exe
v1.capacity() = 1000000
v2.capacity() = 1000

After copy v1 = v2:
v1.capacity() = 1000000
v2.capacity() = 1000

[Copy-op= and move-op=]

x1.GetV().capacity() = 1000000
x2.GetV().capacity() = 1000

After x1 = x2:
x1.GetV().capacity() = 1000000
x2.GetV().capacity() = 1000

Compilable sample code follows (tested with VS2010 SP1/VC10):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

class X
{
public:
    X()
    {
    }

    explicit X(const size_t initialCapacity)
    {
        m_v.reserve(initialCapacity);
    }

    X(const X& other)
        : m_v(other.m_v)
    {
    }

    X(X&& other)
        : m_v(move(other.m_v))
    {
    }

    void SetV(const vector<double>& v)
    {
        m_v = v;
    }

    const vector<double>& GetV() const
    {
        return m_v;
    }

#ifdef TEST_COPY_AND_SWAP     
    //
    // Implement a unified op= with copy-and-swap idiom.
    //

    X& operator=(X other)
    {
        swap(*this, other);       
        return *this;
    }

    friend void swap(X& lhs, X& rhs)
    {
        using std::swap;

        swap(lhs.m_v, rhs.m_v);
    }    
#else    
    //
    // Implement copy op= and move op= separately.
    //

    X& operator=(const X& other)
    {
        if (this != &other)
        {
            m_v = other.m_v;
        }
        return *this;
    }

    X& operator=(X&& other)
    {
        if (this != &other)
        {
            m_v = move(other.m_v);
        }
        return *this;
    }    
#endif

private:
    vector<double> m_v;
};    

// Test vector assignment from a small vector to a vector with big capacity.
void Test1()
{
    vector<double> v1;
    v1.reserve(1000*1000);

    vector<double> v2(1000);

    cout << "v1.capacity() = " << v1.capacity() << '\n';
    cout << "v2.capacity() = " << v2.capacity() << '\n';

    v1 = v2;
    cout << "\nAfter copy v1 = v2:\n";    
    cout << "v1.capacity() = " << v1.capacity() << '\n';
    cout << "v2.capacity() = " << v2.capacity() << '\n';
}    

// Similar to Test1, but now vector is a data member inside a class.
void Test2()
{
#ifdef TEST_COPY_AND_SWAP 
    cout << "[Copy-and-swap]\n\n";
#else
    cout << "[Copy-op= and move-op=]\n\n";
#endif

    X x1(1000*1000);

    vector<double> v2(1000);
    X x2;
    x2.SetV(v2);

    cout << "x1.GetV().capacity() = " << x1.GetV().capacity() << '\n';
    cout << "x2.GetV().capacity() = " << x2.GetV().capacity() << '\n';

    x1 = x2;
    cout << "\nAfter x1 = x2:\n";
    cout << "x1.GetV().capacity() = " << x1.GetV().capacity() << '\n';
    cout << "x2.GetV().capacity() = " << x2.GetV().capacity() << '\n';
}

int main()
{
    Test1();       
    cout << '\n';    
    Test2();
}
Oratorical answered 3/3, 2013 at 17:3 Comment(3)
@juanchopanza: Identical.Icelander
@juanchopanza: It doesn't make any difference. Note also that the typical pattern of copy-and-swap idiom is "free" swap().Oratorical
@Oratorical You're right, I completely forgot about the specializations of std::swap.Catharine
I
12

Copy-and-swap with a std::vector can indeed lead to performance loss. The main issue here is that copying a std::vector involves two distinct stages:

  1. Allocate new section of memory
  2. Copy stuff in.

Copy-and-swap can eliminate #2 but not #1. Consider what you would observe prior to the swap() call but after the assignment op is entered. You have three vectors- the one which is about to be overwritten, the one which is a copy, and the original argument.

This clearly implies that if the vector which is about to be overwritten had sufficient or excess capacity, there's a waste in the creation of the intermediate vector, and a loss in the extra capacity of the source. Other containers can behave this way as well.

Copy-and-swap is a great baseline, especially when it comes to exception safety, but it's not globally the highest-performant solution. If you're in a tight area, then other more specialized implementations can be more efficient- but be warned, exception-safety in this area is non-trivial, and sometimes impossible if not copy-and-swapped.

Icelander answered 3/3, 2013 at 17:18 Comment(6)
I think @better urbanite nailed the problem with his "Assignment preserves capacity. swap swaps capacity." statement. You have a good point on exception-safety: in fact, I can't figure out a different way to get strong exception-safety without copy-and-swap. So, should I choose between strong exception-safety with copy-and-swap or mutually exclusive top performance with manually implementing copy operator= and move operator= (but giving up strong exception-safety)? Is it not possible to have both?Oratorical
@Mr.C64: You can have both, but if there is more than one possibly-throwing-assignment member in your class, it is basically impossible to have strong exception safety for operator= without copy and swap.Icelander
So, if I have more than one vector data member in my class, I have to choose between preserving capacity implementing both copy operator= and move operator= separately, or having strong exception-safety with copy-and-swap (and not preserving capacity)?Oratorical
@Mr.C64: You can still fall-back to copy-and-swap in the copy case, and use assignment in the move case.Icelander
But if I fall-back to copy-and-swap in the copy case, I have the inefficiency of not preserving capacity...Oratorical
@Mr.C64: Yes, but you still have capacity-preservation in the move case. That's as good as you're going to get for a copy where multiple distinct assignments can throw.Icelander
B
5

In the X case, you are swapping the vectors, not using vector::operator=(). Assignment preserves capacity. swap swaps capacity.

Bowens answered 3/3, 2013 at 17:19 Comment(0)
A
2

If there are two vectors, say v1 with big capacity and v2 with small capacity, and v2 is copied to v1 (v1 = v2), the big capacity in v1 is kept after the assignment; this makes sense,

It doesn't to me.

After an assignment I expect the assigned-to vector to have the same value and state as the vector is assigned from. Why should I incur and have to drag around excess capacity.

From a quick scan of the standard I am not sure that the standard guarantees that capacity is kept constant across assignment from a smaller vector. (It would be kept across an invocation of vector::assign(...), so that may be the intent.)

If I care about memory efficiency, I have to call vector::shrink_to_fit() after assignment in many cases, if the assignment doesn't do this for me.

Copy and swap has shrink-to-fit semantics. Actually it was the usual C++98 idiom for shrink-to-fit of standard containers.

since next v1.push_back() calls don't have to force new reallocations (in other words: freeing already available memory, then reallocating it to grow the vector doesn't make much sense).

True, but that depends on your usage patterns. If you assign vectors and then continue to add to them, keeping any pre-existing capacity makes sense. If you assign a vector after you have built its contents, you may not want to keep excess capacity allocated.

But, if the same assignment is done with the class having the vector as data member, the behavior is different, and after the assignment the bigger capacity is not kept.

True, if you do copy and swap in that class. Doing that will also copy and swap contained vectors, and as mentioned above that is a way to achieve shrink to fit.

If the copy-and-swap idiom is not used, and copy operator= and move operator= are implemented separately, then the behavior is as expected (as for ordinary non-member vectors).

As discussed above: it is debatable whether that behavior is as expected.

But if it fits your usage patterns, i.e. if you want to continue to grow a vector after it was assigned from another that may have been smaller than the prior value, then you can indeed gain some efficiency by using something that does not drop existing excess capacity (for example vector::assign).

Why is that? Should we not follow copy-and-swap idiom and instead implement operator=(const X& other) (copy op=) and operator=(X&& other) (move op=) separately for optimum performance?

As discussed, if it fits your usage pattern and if performance of that assign and append sequence is critical, then you could indeed consider not using swap and copy for assignment. The main purpose of swap and copy is minimal implementation (avoidance of duplicated code) and strong exception safety.

If you opt for a different implementation for maximum performance, you'll have to take care of exception safety yourself and you'll pay a price in code complexity.

Adalbert answered 3/3, 2013 at 22:7 Comment(1)
I somewhat disagree, because you can always shrink_to_fit later, but you can't unshrink_to_fit.Icelander

© 2022 - 2024 — McMap. All rights reserved.