I'm no expert on these optimizations, but as I understand it, the delayed-evaluation techniques you're talking about work by defining the arithmetic operators on your matrix type so that A + B + C * D doesn't return a matrix: it returns a proxy object that can convert to a matrix. The conversion happens when the result is assigned to M, and the conversion code computes each cell of the result matrix by the most efficient means the library designers can come up with, avoiding temporary matrix objects.
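To make that concrete, here is a minimal sketch of the proxy idea, handling only addition. All the names here (Matrix, AddExpr) are made up for illustration, not taken from any real library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dense matrix, just enough to demonstrate the proxy technique.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }

    // Assigning any expression proxy evaluates it cell by cell, in one pass.
    template <class Expr>
    Matrix& operator=(const Expr& e) {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                (*this)(i, j) = e(i, j);  // no temporary Matrix is ever built
        return *this;
    }
};

// Proxy returned by operator+: stores references and computes nothing yet.
template <class L, class R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    double operator()(std::size_t i, std::size_t j) const {
        return lhs(i, j) + rhs(i, j);  // evaluated lazily, per cell
    }
};

AddExpr<Matrix, Matrix> operator+(const Matrix& a, const Matrix& b) {
    return {a, b};
}

template <class L, class R>
AddExpr<AddExpr<L, R>, Matrix> operator+(const AddExpr<L, R>& a, const Matrix& b) {
    return {a, b};  // chains: (A + B) + C builds a nested proxy
}
```

With this in place, M = A + B + C writes each cell of M exactly once and never materializes an intermediate matrix; a real library would add proxies for multiplication, scalar operations, and so on.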
So, suppose the program contains M = A + B + C * D;
If you did nothing clever other than implement operator+ in the usual way using operator+=, you'd get something like this once normal, C++03-style copy elision has kicked in:
Matrix tmp1 = C;
tmp1 *= D;
Matrix tmp2 = A;
tmp2 += B;
tmp2 += tmp1;
M = tmp2;
With the delayed evaluation, you might get something more like:
for (int i = 0; i < M.rows; ++i) {
    for (int j = 0; j < M.cols; ++j) {
        /* not necessarily the best matrix multiplication, but serves to illustrate */
        double c_times_d = 0;
        for (int k = 0; k < C.cols; ++k) {
            c_times_d += C[i][k] * D[k][j];
        }
        M[i][j] = A[i][j] + B[i][j] + c_times_d;
    }
}
whereas the "nothing clever" code would do a couple of separate addition loops and a lot more assignment.
As far as I'm aware, move semantics doesn't help much in this case. Nothing in what you've written permits us to move from A, B, C or D, so we're going to end up with the equivalent of:
Matrix tmp1 = C;
tmp1 *= D;
Matrix tmp2 = A;
tmp2 += B;
tmp2 += std::move(tmp1);
M = std::move(tmp2);
So move semantics haven't helped with anything other than the last bit, where maybe the rvalue versions of the operators are better than the regular ones. There's more available if you write std::move(A) + std::move(B) + std::move(C) * std::move(D), because then we wouldn't have to copy from C or A, but I still don't think the result is as good as the delayed-evaluation code.
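To see what those rvalue overloads buy you, here is a sketch of the "usual way" with an extra Matrix&& overload added. The Matrix type and its static copy counter are made up for instrumentation, not part of any real library:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal matrix; the static counter just records how many deep copies happen.
struct Matrix {
    std::vector<double> data;
    static int copies;
    explicit Matrix(std::size_t n, double v = 0.0) : data(n, v) {}
    Matrix(const Matrix& o) : data(o.data) { ++copies; }
    Matrix(Matrix&&) = default;
    Matrix& operator=(const Matrix& o) { data = o.data; ++copies; return *this; }
    Matrix& operator=(Matrix&&) = default;
    Matrix& operator+=(const Matrix& o) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] += o.data[i];
        return *this;
    }
};
int Matrix::copies = 0;

Matrix operator+(const Matrix& a, const Matrix& b) {
    Matrix tmp = a;          // unavoidable copy: both operands are lvalues
    tmp += b;
    return tmp;              // moved out (or elided), not copied
}

Matrix operator+(Matrix&& a, const Matrix& b) {
    a += b;                  // reuse the expiring temporary's buffer: no copy
    return std::move(a);
}
```

In M = A + B + C, only the A + B step copies (once, for tmp); the + C step picks the Matrix&& overload and recycles the temporary. But the whole of A + B still has to exist as a complete matrix at some point, which is exactly what delayed evaluation avoids.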
Basically, move semantics don't help with some important parts of the optimization provided by delayed evaluation:
1) With delayed evaluation, the intermediate results never need to actually exist as complete matrices. Move semantics don't save the compiler from creating the complete matrix A + B in memory at some point.
2) With delayed evaluation, we can start modifying M before we've finished computing the whole expression. Move semantics don't help the compiler reorder modifications: even if the compiler is smart enough to spot the potential opportunity, modifications to non-temporaries must be kept in their correct order whenever there's any danger of an exception being thrown, because if any part of A + B + C * D throws, then M must be left as it started.
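Point 2 is also why eagerly-evaluating code that wants the strong exception guarantee computes the result off to the side and only commits with a nothrow swap at the end. A sketch, with a made-up Matrix type and a simulated mid-expression failure:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy "matrix": just a vector of doubles, enough to show the commit pattern.
struct Matrix {
    std::vector<double> data;
};

// Simulates an addition step that can fail partway through the expression
// (in real code, e.g. an allocation failure).
Matrix addMayThrow(const Matrix& a, const Matrix& b, bool fail) {
    if (fail) throw std::runtime_error("simulated failure mid-expression");
    Matrix r = a;
    for (std::size_t i = 0; i < r.data.size(); ++i) r.data[i] += b.data[i];
    return r;
}

// Strong guarantee: build the whole result to the side, then commit with a
// nothrow swap. If anything throws, m is left exactly as it started.
void assignSum(Matrix& m, const Matrix& a, const Matrix& b, bool fail) {
    Matrix result = addMayThrow(a, b, fail);  // may throw; m untouched so far
    std::swap(m.data, result.data);           // nothrow commit
}
```

A delayed-evaluation library that writes straight into M gives up this guarantee (or has to fall back to a temporary when the target might be left half-written), which is the ordering constraint described above.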
For an A*B+C expression, the expression template could also optimize the computation by inserting FMA instructions, etc. – Unattached