I'm no expert on these optimizations, but as I understand it, the delayed-evaluation techniques you're talking about work by defining the arithmetic operators on your matrix type so that A + B + C * D doesn't return a matrix: it returns a proxy object that can convert to a matrix. The conversion happens when the result is assigned to M, and the conversion code computes each cell of the result matrix by the most efficient means the library designers can come up with, avoiding temporary matrix objects.
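To make that concrete, here is a minimal sketch of the proxy idea, handling only addition. All the names here (Matrix, AddExpr) are made up for illustration, not taken from any real library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dense matrix, just enough to demonstrate the proxy technique.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }

    // Assigning any expression proxy evaluates it cell by cell, in one pass.
    template <class Expr>
    Matrix& operator=(const Expr& e) {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                (*this)(i, j) = e(i, j);  // no temporary Matrix is ever built
        return *this;
    }
};

// Proxy returned by operator+: stores references and computes nothing yet.
template <class L, class R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    double operator()(std::size_t i, std::size_t j) const {
        return lhs(i, j) + rhs(i, j);  // evaluated lazily, per cell
    }
};

AddExpr<Matrix, Matrix> operator+(const Matrix& a, const Matrix& b) {
    return {a, b};
}

template <class L, class R>
AddExpr<AddExpr<L, R>, Matrix> operator+(const AddExpr<L, R>& a, const Matrix& b) {
    return {a, b};  // chains: (A + B) + C builds a nested proxy
}
```

With this in place, M = A + B + C writes each cell of M exactly once and never materializes an intermediate matrix; a real library would add proxies for multiplication, scalar operations, and so on.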
So, suppose the program contains M = A + B + C * D;
If you did nothing clever other than implement operator+ in the usual way using operator+=, you'd get something like this once normal, C++03-style copy elision has kicked in:
Matrix tmp1 = C;
tmp1 *= D;
Matrix tmp2 = A;
tmp2 += B;
tmp2 += tmp1;
M = tmp2;
With the delayed evaluation, you might get something more like:
for (int i = 0; i < M.rows; ++i) {
    for (int j = 0; j < M.cols; ++j) {
        /* not necessarily the best matrix multiplication, but serves to illustrate */
        double c_times_d = 0;
        for (int k = 0; k < C.cols; ++k) {
            c_times_d += C[i][k] * D[k][j];
        }
        M[i][j] = A[i][j] + B[i][j] + c_times_d;
    }
}
whereas the "nothing clever" code would do a couple of separate addition loops and a lot more assignment.
As far as I'm aware, move semantics doesn't help much in this case. Nothing in what you've written permits us to move from A, B, C or D, so we're going to end up with the equivalent of:
Matrix tmp1 = C;
tmp1 *= D;
Matrix tmp2 = A;
tmp2 += B;
tmp2 += std::move(tmp1);
M = std::move(tmp2);
So move semantics haven't helped with anything other than the last bit, where maybe the rvalue versions of the operators are better than the regular ones. There's more available if you write std::move(A) + std::move(B) + std::move(C) * std::move(D), because then we wouldn't have to copy from C or A, but I still don't think the result is as good as the delayed-evaluation code.
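To see what those rvalue overloads buy you, here is a sketch of the "usual way" with an extra Matrix&& overload added. The Matrix type and its static copy counter are made up for instrumentation, not part of any real library:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal matrix; the static counter just records how many deep copies happen.
struct Matrix {
    std::vector<double> data;
    static int copies;
    explicit Matrix(std::size_t n, double v = 0.0) : data(n, v) {}
    Matrix(const Matrix& o) : data(o.data) { ++copies; }
    Matrix(Matrix&&) = default;
    Matrix& operator=(const Matrix& o) { data = o.data; ++copies; return *this; }
    Matrix& operator=(Matrix&&) = default;
    Matrix& operator+=(const Matrix& o) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] += o.data[i];
        return *this;
    }
};
int Matrix::copies = 0;

Matrix operator+(const Matrix& a, const Matrix& b) {
    Matrix tmp = a;          // unavoidable copy: both operands are lvalues
    tmp += b;
    return tmp;              // moved out (or elided), not copied
}

Matrix operator+(Matrix&& a, const Matrix& b) {
    a += b;                  // reuse the expiring temporary's buffer: no copy
    return std::move(a);
}
```

In M = A + B + C, only the A + B step copies (once, for tmp); the + C step picks the Matrix&& overload and recycles the temporary. But the whole of A + B still has to exist as a complete matrix at some point, which is exactly what delayed evaluation avoids.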
Basically, move semantics don't help with some important parts of the optimization provided by delayed evaluation:
1) With delayed evaluation, the intermediate results never need to actually exist as complete matrices. Move semantics don't save the compiler from creating the complete matrix A + B in memory at some point.
2) With delayed evaluation, we can start modifying M before we've finished computing the whole expression. Move semantics don't help the compiler reorder modifications: even if the compiler is smart enough to spot the potential opportunity, modifications to non-temporaries must be kept in their correct order whenever there's any danger of an exception being thrown, because if any part of A + B + C * D throws, then M must be left as it started.
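Point 2 is also why eagerly-evaluating code that wants the strong exception guarantee computes the result off to the side and only commits with a nothrow swap at the end. A sketch, with a made-up Matrix type and a simulated mid-expression failure:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy "matrix": just a vector of doubles, enough to show the commit pattern.
struct Matrix {
    std::vector<double> data;
};

// Simulates an addition step that can fail partway through the expression
// (in real code, e.g. an allocation failure).
Matrix addMayThrow(const Matrix& a, const Matrix& b, bool fail) {
    if (fail) throw std::runtime_error("simulated failure mid-expression");
    Matrix r = a;
    for (std::size_t i = 0; i < r.data.size(); ++i) r.data[i] += b.data[i];
    return r;
}

// Strong guarantee: build the whole result to the side, then commit with a
// nothrow swap. If anything throws, m is left exactly as it started.
void assignSum(Matrix& m, const Matrix& a, const Matrix& b, bool fail) {
    Matrix result = addMayThrow(a, b, fail);  // may throw; m untouched so far
    std::swap(m.data, result.data);           // nothrow commit
}
```

A delayed-evaluation library that writes straight into M gives up this guarantee (or has to fall back to a temporary when the target might be left half-written), which is the ordering constraint described above.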
For an A*B+C expression, the expression template could also optimize the computation by inserting FMA instructions, etc. – Unattached