Expression templates and C++11

Asked 4/8, 2012 at 13:32 Answered 18/10, 2013 at 10:13

Let's look at one particular benefit of expression templates: ETs can be used to avoid vector-sized temporaries in memory which occur in overloaded operators like:

template<typename T>
std::vector<T> operator+(const std::vector<T>& a, const std::vector<T>& b)
{
  std::vector<T> tmp;   // vector-sized temporary
  for_each(...);
  return tmp;
}

In C++11 the return statement of this function applies move semantics. No copy of the vector. That's a win.

However, if I look at a simple expression like

d = a + b + c;

I see that the above function gets called twice (for both operator+) while the final assignment can be done with move semantics.

In total 2 loops are executed. Means that I put out a temporary and read it back in right after. For big vectors this falls out of cache. That's worse than expression templates. They can do the whole thing in just 1 loop. ETs can execute the above code equivalent to:

for(int i=0 ; i < vec_length ; ++i)
  d[i] = a[i] + b[i] + c[i];

I was wondering whether lambdas together with move semantics or any other new feature can do as good as ETs. Any thoughts?

Edit:

Basically, using the ET technique the compiler builds a parse tree that resembles the algebraic expression with it's type system. This tree consists of inner nodes and leaf nodes. The inner nodes represent operations (addition, multiplication, etc.) and the leaf nodes represent references to the data objects.

I tried to think of the whole computation process in the fashion of a stack machine: Take an operation from an operation stack and pull the next arguments from the argument stack and evaluate the operation. Put the result back on the stack waiting for the operation.

To represent these two different objects (operation stack and data leaf stack) I bundled together a std::tuple for the operations and a std::tuple for the data leaves into a std::pair<>. Initially I used a std:vector but that resulted in runtime overhead.

The whole process goes in two phases: Stack machine initialisation where the operation and argument stack are initialised. And the evaluation phase which is triggered by assigning the paired containers to the vector.

I created a class Vec which holds a private array<int,5> (the payload) and which features an overloaded assignment operator that takes the "expression".

The global operator* is overloaded for all combinations of taking Vec and "expression" to enable the correct handling also in the case where we have more than just a*b. (Notice, I switched for this educational example to the multiplication - basically to quickly spot the imull in the assembler.)

What is done first before the evaluation starts is "extracting" the values out of the involved Vec objects and initializing the argument stack. That was necessary to not have different kinds of objects lying around: Indexable vectors and non-indexable results. This is what the Extractor is for. Nice thing again: Variadic templates are used which in this case results in no run-time overhead (all this is done at compile time).

The whole thing works. The expression is nicely evaluated (I also added the addition, but that is left out here to fit the code). Below you can see the assembler output. Just raw compuation, exactly as you want it to be: En-par with ET technique.

Upshot. The new language features of C++11 offer the variadic templates which (along with template meta-programming) open up the area of compile time computation. I showed here how the benefits of variadic templates can be used to produce code as good as with the traditional ET technique.

#include<algorithm>
#include<iostream>
#include<vector>
#include<tuple>
#include<utility>
#include<array>



template<typename Target,typename Tuple, int N, bool end>
struct Extractor {
  template < typename ... Args >
  static Target index(int i,const Tuple& t, Args && ... args)
  {
    return Extractor<Target, Tuple,  N+1, 
             std::tuple_size<Tuple>::value == N+1>::
      index(i, t , std::forward<Args>(args)..., std::get<N>(t).vec[i] );
  }
};

template < typename Target, typename Tuple, int N >
struct Extractor<Target,Tuple,N,true>
{
    template < typename ... Args >
    static Target index(int i,Tuple const& t, 
            Args && ... args) { 
      return Target(std::forward<Args>(args)...); }
};

template < typename ... Vs > 
std::tuple<typename std::remove_reference<Vs>::type::type_t...>
extract(int i , const std::tuple<Vs...>& tpl)
{
  return Extractor<std::tuple<typename std::remove_reference<Vs>::type::type_t...>,
           std::tuple<Vs...>, 0,
           std::tuple_size<std::tuple<Vs...> >::value == 0>::index(i,tpl);
}


struct Vec {
  std::array<int,5> vec;
  typedef int type_t;

  template<typename... OPs,typename... VALs>
  Vec& operator=(const std::pair< std::tuple<VALs...> , std::tuple<OPs...> >& e) {
    for( int i = 0 ; i < vec.size() ; ++i ) {
      vec[i] = eval( extract(i,e.first) , e.second );
    }
  }
};




template<int OpPos,int ValPos, bool end>
struct StackMachine {
  template<typename... OPs,typename... VALs>
  static void eval_pos( std::tuple<VALs...>& vals , const std::tuple<OPs...> & ops )
  {
    std::get<ValPos+1>( vals ) =
      std::get<OpPos>(ops).apply( std::get<ValPos>( vals ) , 
                  std::get<ValPos+1>( vals ) );
    StackMachine<OpPos+1,ValPos+1,sizeof...(OPs) == OpPos+1>::eval_pos(vals,ops);
  }
};

template<int OpPos,int ValPos>
struct StackMachine<OpPos,ValPos,true> {
  template<typename... OPs,typename... VALs>
  static void eval_pos( std::tuple<VALs...>& vals , 
            const std::tuple<OPs...> & ops )
  {}
};



template<typename... OPs,typename... VALs>
int eval( const std::tuple<VALs...>& vals , const std::tuple<OPs...> & ops )
{
  StackMachine<0,0,false>::eval_pos(const_cast<std::tuple<VALs...>&>(vals),ops);
  return std::get<sizeof...(OPs)>(vals);
}




struct OpMul {
  static int apply(const int& lhs,const int& rhs)  {
    return lhs*rhs;
  }
};

std::pair< std::tuple< const Vec&, const Vec& > , std::tuple<OpMul> >
operator*(const Vec& lhs,const Vec& rhs)
{
  return std::make_pair( std::tuple< const Vec&, const Vec& >( lhs , rhs ) , 
             std::tuple<OpMul>( OpMul() ) );
}

template<typename... OPs,typename... VALs>
std::pair< std::tuple< const Vec&, VALs... > , std::tuple<OPs...,OpMul> >
operator*(const Vec& lhs,const std::pair< std::tuple< VALs... > , std::tuple<OPs...> >& rhs)
{
  return std::make_pair( std::tuple_cat( rhs.first , std::tuple< const Vec& >(lhs)  ) , 
             std::tuple_cat( rhs.second , std::tuple<OpMul>( OpMul() )  ) );
}

template<typename... OPs,typename... VALs>
std::pair< std::tuple< const Vec&, VALs... > , std::tuple<OPs...,OpMul> >
operator*(const std::pair< std::tuple< VALs... > , std::tuple<OPs...> >& lhs,
      const Vec& rhs)
{
  return std::make_pair( std::tuple_cat( lhs.first , std::tuple< const Vec& >(rhs)  ) , 
             std::tuple_cat( lhs.second , std::tuple<OpMul>( OpMul() ) ) );
}

int main()
{
  Vec d,c,b,a;


  for( int i = 0 ; i < d.vec.size() ; ++i ) {
    a.vec[i] = 10+i;
    b.vec[i] = 20+i;
    c.vec[i] = 30+i;
    d.vec[i] = 0;
  }

  d = a * b * c * a;

  for( int i = 0 ; i < d.vec.size() ; ++i ) 
    std::cout << d.vec[i] << std::endl;
}

Assembler generated with g++-4.6 -O3 (I had to put some runtime dependence into the vector initialization so that the compiler doesn't calculate the whole thing at compile time and you actually see the imull instaructions.)

imull   %esi, %edx
imull   32(%rsp), %edx
imull   %edx, %esi
movl    68(%rsp), %edx
imull   %ecx, %edx
movl    %esi, (%rsp)
imull   36(%rsp), %edx
imull   %ecx, %edx
movl    104(%rsp), %ecx
movl    %edx, 4(%rsp)
movl    72(%rsp), %edx
imull   %ecx, %edx
imull   40(%rsp), %edx
imull   %ecx, %edx
movl    108(%rsp), %ecx
movl    %edx, 8(%rsp)
movl    76(%rsp), %edx
imull   %ecx, %edx
imull   44(%rsp), %edx
imull   %ecx, %edx
movl    112(%rsp), %ecx
movl    %edx, 12(%rsp)
movl    80(%rsp), %edx
imull   %ecx, %edx
imull   %eax, %edx
imull   %ecx, %edx
movl    %edx, 16(%rsp)

Mulberry answered 4/8, 2012 at 13:32 Comment(11)

You should look up copy elision and RVO. Also, passing one of the vectors by value instead of making your own tmp copy could help. – Kosygin 4/8, 2012 at 13:39

Passing by value can't help (a copy is still made). (N)RVO doesn't help eliminating the additional loop – Mulberry 4/8, 2012 at 13:43

@Frank: are you sure? This is a strong candidate for RVO. – Very 4/8, 2012 at 13:47

Closely related stackoverflow question and answers are here: #10725799 – Earthenware 4/8, 2012 at 14:0

@HowardHinnant: it is so close that this question a semi-duplicate of the one you linked. – Very 4/8, 2012 at 14:6

How about this: liveworkspace.org/code/b7e444dfc8e46c5f0fd49ca0d98ea9c0 ? It's not as nice as operator+, but it works(it still requires some bounds checking, probably). – Folkway 4/8, 2012 at 14:19

@Folkway Thanks! But its not as powerful as ETs. There is one operation applied to all elements. – Mulberry 4/8, 2012 at 14:24

@Very Regards the thread: I read it. The issue is not answered to a satisfactory level. Also, as far as I know there is no ET library that operates on arbitrarily large matrices. – Mulberry 4/8, 2012 at 14:29

I don't think that you can achieve the power of expression templates without them.... unless the compiler succeeds somehow, but I would not hold my breath. C++ aliasing issues make it somewhat unlikely that the compiler would be able to achieve the transformations ET do on its own. – Schedule 4/8, 2012 at 17:56

@HowardHinnant You're right it is very close, but, they are distinct and therefore it is not a duplicate (even though similar material is in both). The linked article asks if move semantics can elide the copies and what are the differences with template metaprogramming. This post asks if C++11 features can do as good as ETs --which is distinct. I state no in my answer to this question. – Hypotenuse 4/8, 2012 at 22:41

@PaulPreney: Agreed. If I thought the linked question was a dupe, I would have said so and voted to close this one. I was just trying to help spread some relevant information. – Earthenware 4/8, 2012 at 23:56

I was wondering whether lambdas together with move semantics or any other new feature can do as good as ETs. Any thoughts?

Quick Answer

Move semantics are not a total panacea on their own --techniques such as expression templates (ETs) are still needed in C++11 to eliminate overheads such as moving data around! So, to answer your question quickly before diving into the rest of my answer, move semantics, etc. doesn't completely replace ETs as my answer illustrates below.

Detailed Answer

ETs typically return proxy objects to defer evaluation until later, so there is no immediate apparent benefit of C++11 language features until the code that triggers the computation. That said, one would not want to write ET code, however, that triggers run-time code generation during the building of the expression tree with the proxies. Nicely, C++11's move semantics and perfect forwarding can help avoid such overheads should that otherwise occur. (Such would not have been possible in C++03.)

Essentially, when writing ETs one wants to exploit the language features in a way to generate optimal code once the member function(s) of the involved proxy objects are invoked. In C++11 this will include using perfect forwarding, move semantics over copying, etc. if such is actually still needed over and above what the compiler can already do. The name of the game is to minimize the run-time code generated and/or maximize the run-time speed and/or minimize the run-time overhead.

I wanted to actually try some ETs with C++11 features to see if I could elide ALL intermediate temporary instance types with a a = b + c + d; expression. (As this was just a fun break from my normal activities so I did not compare it to or write ET code purely using C++03. Also I did not worry about all aspects of code polishing that appears below.)

To start with, I did not use lambdas --as I preferred to use explicit types and functions-- so I won't argue for/against lambdas with respect to your question. My guess is that they would be similar to using functors and performing no better than the non-ET code below (i.e., moves would be required) --at least until compilers can automatically optimize lambdas using their own internal ETs for such. The code I wrote, however, exploits move semantics and perfect forwarding. Here's what I did starting with the results and then finally presenting the code.

I created a math_vector<N> class where N==3 and it defines an internal private instance of std::array<long double, N>. The members are a default constructor, copy and move constructors and assignments, an initializer list constructor, a destructor, a swap() member, operator [] to access elements of the vector and operator +=. Used without any expression templates, this code:

{
  cout << "CASE 1:\n";
  math_vector<3> a{1.0, 1.1, 1.2};
  math_vector<3> b{2.0, 2.1, 2.2};
  math_vector<3> c{3.0, 3.1, 3.2};
  math_vector<3> d{4.0, 4.1, 4.2};
  math_vector<3> result = a + b + c + d;
  cout << '[' << &result << "]: " << result << "\n";
}

outputs (when compiled with clang++ 3.1 or g++ 4.8 with -std=c++11 -O3):

CASE 1:
0x7fff8d6edf50: math_vector(initlist)
0x7fff8d6edef0: math_vector(initlist)
0x7fff8d6ede90: math_vector(initlist)
0x7fff8d6ede30: math_vector(initlist)
0x7fff8d6edd70: math_vector(copy: 0x7fff8d6edf50)
0x7fff8d6edda0: math_vector(move: 0x7fff8d6edd70)
0x7fff8d6eddd0: math_vector(move: 0x7fff8d6edda0)
0x7fff8d6edda0: ~math_vector()
0x7fff8d6edd70: ~math_vector()
[0x7fff8d6eddd0]: (10,10.4,10.8)
0x7fff8d6eddd0: ~math_vector()
0x7fff8d6ede30: ~math_vector()
0x7fff8d6ede90: ~math_vector()
0x7fff8d6edef0: ~math_vector()
0x7fff8d6edf50: ~math_vector()

i.e., the four explicit constructed instances using initializer lists (i.e., the initlist items), the result variable (i.e., 0x7fff8d6eddd0), and, also makes an additional three objects copying and moving.

To only focus on temporaries and moving, I created a second case that only creates result as a named variable --all others are rvalues:

{
  cout << "CASE 2:\n";
  math_vector<3> result =
    math_vector<3>{1.0, 1.1, 1.2} +
    math_vector<3>{2.0, 2.1, 2.2} +
    math_vector<3>{3.0, 3.1, 3.2} +
    math_vector<3>{4.0, 4.1, 4.2}
  ;
  cout << '[' << &result << "]: " << result << "\n";
}

which outputs this (again when ETs are NOT used):

CASE 2:
0x7fff8d6edcb0: math_vector(initlist)
0x7fff8d6edc50: math_vector(initlist)
0x7fff8d6edce0: math_vector(move: 0x7fff8d6edcb0)
0x7fff8d6edbf0: math_vector(initlist)
0x7fff8d6edd10: math_vector(move: 0x7fff8d6edce0)
0x7fff8d6edb90: math_vector(initlist)
0x7fff8d6edd40: math_vector(move: 0x7fff8d6edd10)
0x7fff8d6edb90: ~math_vector()
0x7fff8d6edd10: ~math_vector()
0x7fff8d6edbf0: ~math_vector()
0x7fff8d6edce0: ~math_vector()
0x7fff8d6edc50: ~math_vector()
0x7fff8d6edcb0: ~math_vector()
[0x7fff8d6edd40]: (10,10.4,10.8)
0x7fff8d6edd40: ~math_vector()

which is better: only extra move objects are created.

But I wanted better: I wanted zero extra temporaries and to have the code as if I hard-coded it with the one normal coding caveat: all explicitly instantiated types would still be created (i.e., the four initlist constructors and result). To accomplish this I then added expression template code as follows:

a proxy math_vector_expr<LeftExpr,BinaryOp,RightExpr> class was created to hold an expression not computed yet,
a proxy plus_op class was created to hold the addition operation,
a constructor was added to math_vector to accept a math_vector_expr object, and,
"starter" member functions were added to trigger the creation of the expression template.

The results using ETs are wonderful: no extra temporaries in either case! The previous two cases above now output:

CASE 1:
0x7fffe7180c60: math_vector(initlist)
0x7fffe7180c90: math_vector(initlist)
0x7fffe7180cc0: math_vector(initlist)
0x7fffe7180cf0: math_vector(initlist)
0x7fffe7180d20: math_vector(expr: 0x7fffe7180d90)
[0x7fffe7180d20]: (10,10.4,10.8)
0x7fffe7180d20: ~math_vector()
0x7fffe7180cf0: ~math_vector()
0x7fffe7180cc0: ~math_vector()
0x7fffe7180c90: ~math_vector()
0x7fffe7180c60: ~math_vector()

CASE 2:
0x7fffe7180dd0: math_vector(initlist)
0x7fffe7180e20: math_vector(initlist)
0x7fffe7180e70: math_vector(initlist)
0x7fffe7180eb0: math_vector(initlist)
0x7fffe7180d20: math_vector(expr: 0x7fffe7180dc0)
0x7fffe7180eb0: ~math_vector()
0x7fffe7180e70: ~math_vector()
0x7fffe7180e20: ~math_vector()
0x7fffe7180dd0: ~math_vector()
[0x7fffe7180d20]: (10,10.4,10.8)
0x7fffe7180d20: ~math_vector()

i.e., exactly 5 constructor calls and 5 destructor calls in each case. In fact, if you ask the compiler to generate the assembler code between the 4 initlist constructor calls and the outputting of result one gets this beautiful string of assembler code:

fldt    128(%rsp)
leaq    128(%rsp), %rdi
leaq    80(%rsp), %rbp
fldt    176(%rsp)
faddp   %st, %st(1)
fldt    224(%rsp)
faddp   %st, %st(1)
fldt    272(%rsp)
faddp   %st, %st(1)
fstpt   80(%rsp)
fldt    144(%rsp)
fldt    192(%rsp)
faddp   %st, %st(1)
fldt    240(%rsp)
faddp   %st, %st(1)
fldt    288(%rsp)
faddp   %st, %st(1)
fstpt   96(%rsp)
fldt    160(%rsp)
fldt    208(%rsp)
faddp   %st, %st(1)
fldt    256(%rsp)
faddp   %st, %st(1)
fldt    304(%rsp)
faddp   %st, %st(1)
fstpt   112(%rsp)

with g++ and clang++ outputs similar (even smaller) code. No function calls, etc. --just a bunch of adds which is EXACTLY what one wants!

The C++11 code to achieve this follows. Simply #define DONT_USE_EXPR_TEMPL to not use ETs or don't define it at all to use ETs.

#include <array>
#include <algorithm>
#include <initializer_list>
#include <type_traits>
#include <iostream>

//#define DONT_USE_EXPR_TEMPL

//===========================================================================

template <std::size_t N> class math_vector;

template <
  typename LeftExpr,
  typename BinaryOp,
  typename RightExpr
>
class math_vector_expr
{
  public:
    math_vector_expr() = delete;

    math_vector_expr(LeftExpr l, RightExpr r) : 
      l_(std::forward<LeftExpr>(l)), 
      r_(std::forward<RightExpr>(r))
    {
    }

    // Prohibit copying...
    math_vector_expr(math_vector_expr const&) = delete;
    math_vector_expr& operator =(math_vector_expr const&) = delete;

    // Allow moves...
    math_vector_expr(math_vector_expr&&) = default;
    math_vector_expr& operator =(math_vector_expr&&) = default;

    template <typename RE>
    auto operator +(RE&& re) const ->
      math_vector_expr<
        math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
        BinaryOp,
        decltype(std::forward<RE>(re))
      >
    {
      return 
        math_vector_expr<
          math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
          BinaryOp,
          decltype(std::forward<RE>(re))
        >(*this, std::forward<RE>(re))
      ;
    }

    auto le() -> 
      typename std::add_lvalue_reference<LeftExpr>::type
      { return l_; }

    auto le() const ->
      typename std::add_lvalue_reference<
        typename std::add_const<LeftExpr>::type
      >::type
      { return l_; }

    auto re() -> 
      typename std::add_lvalue_reference<RightExpr>::type
      { return r_; }

    auto re() const -> 
      typename std::add_lvalue_reference<
        typename std::add_const<RightExpr>::type
      >::type
      { return r_; }

    auto operator [](std::size_t index) const ->
      decltype(
        BinaryOp::apply(this->le()[index], this->re()[index])
      )
    {
      return BinaryOp::apply(le()[index], re()[index]);
    }

  private:
    LeftExpr l_;
    RightExpr r_;
};

//===========================================================================

template <typename T>
struct plus_op
{
  static T apply(T const& a, T const& b)
  {
    return a + b;
  }

  static T apply(T&& a, T const& b)
  {
    a += b;
    return std::move(a);
  }

  static T apply(T const& a, T&& b)
  {
    b += a;
    return std::move(b);
  }

  static T apply(T&& a, T&& b)
  {
    a += b;
    return std::move(a);
  }
};

//===========================================================================

template <std::size_t N>
class math_vector
{
  using impl_type = std::array<long double, N>;

  public:
    math_vector()
    {
      using namespace std;
      fill(begin(v_), end(v_), impl_type{});
      std::cout << this << ": math_vector()" << endl;
    }

    math_vector(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(copy: " << &mv << ")" << endl;
    }

    math_vector(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(move: " << &mv << ")" << endl;
    }

    math_vector(std::initializer_list<typename impl_type::value_type> l)
    {
      using namespace std;
      copy(begin(l), end(l), begin(v_));
      std::cout << this << ": math_vector(initlist)" << endl;
    }

    math_vector& operator =(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(copy: " << &mv << ")" << endl;
      return *this;
    }

    math_vector& operator =(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(move: " << &mv << ")" << endl;
      return *this;
    }

    ~math_vector()
    {
      using namespace std;
      std::cout << this << ": ~math_vector()" << endl;
    }

    void swap(math_vector& mv)
    {
      using namespace std;
      for (std::size_t i = 0; i<N; ++i)
        swap(v_[i], mv[i]);
    }

    auto operator [](std::size_t index) const
      -> typename impl_type::value_type const&
    {
      return v_[index];
    }

    auto operator [](std::size_t index)
      -> typename impl_type::value_type&
    {
      return v_[index];
    }

    math_vector& operator +=(math_vector const& b)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += b[i];
      return *this;
    }

  #ifndef DONT_USE_EXPR_TEMPL

    template <typename LE, typename Op, typename RE>
    math_vector(math_vector_expr<LE,Op,RE>&& mve)
    {
      for (std::size_t i = 0; i < N; ++i)
        v_[i] = mve[i];
      std::cout << this << ": math_vector(expr: " << &mve << ")" << std::endl;
    }

    template <typename RightExpr>
    math_vector& operator =(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] = re[i];
      return *this;
    }

    template <typename RightExpr>
    math_vector& operator +=(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += re[i];
      return *this;
    }

    template <typename RightExpr>
    auto operator +(RightExpr&& re) const ->
      math_vector_expr<
        math_vector const&, 
        plus_op<typename impl_type::value_type>,
        decltype(std::forward<RightExpr>(re))
      >
    {
      return 
        math_vector_expr<
          math_vector const&, 
          plus_op<typename impl_type::value_type>, 
          decltype(std::forward<RightExpr>(re))
        >(
          *this, 
          std::forward<RightExpr>(re)
        )
      ;
    }

  #endif // #ifndef DONT_USE_EXPR_TEMPL

  private:
    impl_type v_;
};

//===========================================================================

template <std::size_t N>
inline void swap(math_vector<N>& a, math_vector<N>& b)
{
  a.swap(b);
}

//===========================================================================

#ifdef DONT_USE_EXPR_TEMPL

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a, 
  math_vector<N> const& b
)
{
  math_vector<N> retval(a);
  retval += b;
  return retval;
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a, 
  math_vector<N> const& b
)
{
  a += b;
  return std::move(a);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a, 
  math_vector<N>&& b
)
{
  b += a;
  return std::move(b);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a, 
  math_vector<N>&& b
)
{
  a += std::move(b);
  return std::move(a);
}

#endif // #ifdef DONT_USE_EXPR_TEMPL

//===========================================================================

template <std::size_t N>
std::ostream& operator <<(std::ostream& os, math_vector<N> const& mv)
{
  os << '(';
  for (std::size_t i = 0; i < N; ++i)
    os << mv[i] << ((i+1 != N) ? ',' : ')');
  return os;
}

//===========================================================================

int main()
{
  using namespace std;

  try
  {
    {
      cout << "CASE 1:\n";
      math_vector<3> a{1.0, 1.1, 1.2};
      math_vector<3> b{2.0, 2.1, 2.2};
      math_vector<3> c{3.0, 3.1, 3.2};
      math_vector<3> d{4.0, 4.1, 4.2};
      math_vector<3> result = a + b + c + d;
      cout << '[' << &result << "]: " << result << "\n";
    }
    cout << endl;
    {
      cout << "CASE 2:\n";
      math_vector<3> result =
        math_vector<3>{1.0, 1.1, 1.2} +
        math_vector<3>{2.0, 2.1, 2.2} +
        math_vector<3>{3.0, 3.1, 3.2} +
        math_vector<3>{4.0, 4.1, 4.2}
      ;
      cout << '[' << &result << "]: " << result << "\n";
    }
  }
  catch (...)
  {
    return 1;
  }
}

//===========================================================================

Hypotenuse answered 4/8, 2012 at 22:33 Comment(7)

Thanks Paul for your answer. In your code you applied nicely the expression template technique salted with new C++11 techniques. Great work. I like the overloaded operation classes. And yes, I agree that lambdas would basically render to the same performance as having named structs. I tried to circumvent the ET technique, if you fancy the question is extended by my code. – Mulberry 5/8, 2012 at 17:41

You're welcome! Your question is also very relevant. The new additions (especially variadic templates, move semantics, and perfect forwarding) actually has us all taking steps back to readjust and do things in a simpler way again --more like when we started coding! Perhaps for too long we've all been optimizing to effectively bypass the compiler. With C++11 this is no longer needed as well-thought out/simpler code yields better/optimal performance just by letting the compiler do what it does. So with C++11, I try to remind myself to resist temptation to optimize but to rethink more simply. – Hypotenuse 5/8, 2012 at 18:44

@Frank: I think this answer is ready to be accepted, don't you? – Lyallpur 12/2, 2013 at 14:14

your usage of std::move, e.g. static T apply(T&& a, T const& b) { a += b; return std::move(a); } doesn't seem correct to me. See: thbecker.net/articles/rvalue_references/section_06.html and https://mcmap.net/q/15702/-what-is-move-semantics , section Moving out of functions. – Sharpedged 25/9, 2013 at 19:38

The linked to thbecker.net page is different than this. On the thbecker.net page X x; is a declared locally within the function --one should simply do return x; (and if RVO does not apply, if it can be moved, the compiler will move it, or it will copy it). With my function you've quoted a is passed in as an rvalue reference. When an rvalue reference (it is not an lvalue reference due to the other overload with T const&) is passed as an argument, it acquires a name and becomes an lvalue --yet it still refers to an rvalue. Thus, return std::move(a); makes a an rvalue again to return it. – Hypotenuse 1/10, 2013 at 22:20

I know, I wasn't convinced this can improve anything, but here is the confirmation I was looking for: cpp-next.com/archive/2009/09/making-your-next-move PS. My edit has been rejected for some ridiculous reason, the code in its current shape is simply incorrect; blind reviewers. Act, Paul. – Sharpedged 2/10, 2013 at 10:4

Expression templates + perfect forwarding + auto is actually a recipe for disaster due to dangling references all over the place :/ – Subdiaconate 11/11, 2015 at 19:50

Here is the corrected version of Paul Preney's code. I have informed the author by email and in comments; I have written an edit, but it has been rejected by unqualified reviewers.

The error in the original code is that BinaryOp template parameter of math_vector_expr::operator+ is fixed.

#include <array>
#include <algorithm>
#include <initializer_list>
#include <type_traits>
#include <iostream>

//#define DONT_USE_EXPR_TEMPL

//===========================================================================

template <std::size_t N> class math_vector;
template <typename T> struct plus_op;


template <
  typename LeftExpr,
  typename BinaryOp,
  typename RightExpr
>
class math_vector_expr
{
  public:
    typedef typename std::remove_reference<LeftExpr>::type::value_type value_type;

    math_vector_expr() = delete;

    math_vector_expr(LeftExpr l, RightExpr r) :
      l_(std::forward<LeftExpr>(l)),
      r_(std::forward<RightExpr>(r))
    {
    }

    // Prohibit copying...
    math_vector_expr(math_vector_expr const&) = delete;
    math_vector_expr& operator =(math_vector_expr const&) = delete;

    // Allow moves...
    math_vector_expr(math_vector_expr&&) = default;
    math_vector_expr& operator =(math_vector_expr&&) = default;

    template <typename RE>
    auto operator +(RE&& re) const ->
      math_vector_expr<
        math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
        plus_op<value_type>,
        decltype(std::forward<RE>(re))
      >
    {
      return
        math_vector_expr<
          math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
          plus_op<value_type>,
          decltype(std::forward<RE>(re))
        >(*this, std::forward<RE>(re))
      ;
    }

    auto le() ->
      typename std::add_lvalue_reference<LeftExpr>::type
      { return l_; }

    auto le() const ->
      typename std::add_lvalue_reference<
        typename std::add_const<LeftExpr>::type
      >::type
      { return l_; }

    auto re() ->
      typename std::add_lvalue_reference<RightExpr>::type
      { return r_; }

    auto re() const ->
      typename std::add_lvalue_reference<
        typename std::add_const<RightExpr>::type
      >::type
      { return r_; }

    auto operator [](std::size_t index) const ->
      value_type
    {
      return BinaryOp::apply(le()[index], re()[index]);
    }

  private:
    LeftExpr l_;
    RightExpr r_;
};

//===========================================================================

template <typename T>
struct plus_op
{
  static T apply(T const& a, T const& b)
  {
    return a + b;
  }

  static T apply(T&& a, T const& b)
  {
    a += b;
    return std::move(a);
  }

  static T apply(T const& a, T&& b)
  {
    b += a;
    return std::move(b);
  }

  static T apply(T&& a, T&& b)
  {
    a += b;
    return std::move(a);
  }
};

//===========================================================================

template <std::size_t N>
class math_vector
{
  using impl_type = std::array<long double, N>;

  public:
    typedef typename impl_type::value_type value_type;

    math_vector()
    {
      using namespace std;
      fill(begin(v_), end(v_), impl_type{});
      std::cout << this << ": math_vector()" << endl;
    }

    math_vector(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(copy: " << &mv << ")" << endl;
    }

    math_vector(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(move: " << &mv << ")" << endl;
    }

    math_vector(std::initializer_list<value_type> l)
    {
      using namespace std;
      copy(begin(l), end(l), begin(v_));
      std::cout << this << ": math_vector(initlist)" << endl;
    }

    math_vector& operator =(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(copy: " << &mv << ")" << endl;
      return *this;
    }

    math_vector& operator =(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(move: " << &mv << ")" << endl;
      return *this;
    }

    ~math_vector()
    {
      using namespace std;
      std::cout << this << ": ~math_vector()" << endl;
    }

    void swap(math_vector& mv)
    {
      using namespace std;
      for (std::size_t i = 0; i<N; ++i)
        swap(v_[i], mv[i]);
    }

    auto operator [](std::size_t index) const
      -> value_type const&
    {
      return v_[index];
    }

    auto operator [](std::size_t index)
      -> value_type&
    {
      return v_[index];
    }

    math_vector& operator +=(math_vector const& b)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += b[i];
      return *this;
    }

  #ifndef DONT_USE_EXPR_TEMPL

    template <typename LE, typename Op, typename RE>
    math_vector(math_vector_expr<LE,Op,RE>&& mve)
    {
      for (std::size_t i = 0; i < N; ++i)
        v_[i] = mve[i];
      std::cout << this << ": math_vector(expr: " << &mve << ")" << std::endl;
    }

    template <typename RightExpr>
    math_vector& operator =(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] = re[i];
      return *this;
    }

    template <typename RightExpr>
    math_vector& operator +=(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += re[i];
      return *this;
    }

    template <typename RightExpr>
    auto operator +(RightExpr&& re) const ->
      math_vector_expr<
        math_vector const&,
        plus_op<value_type>,
        decltype(std::forward<RightExpr>(re))
      >
    {
      return
        math_vector_expr<
          math_vector const&,
          plus_op<value_type>,
          decltype(std::forward<RightExpr>(re))
        >(
          *this,
          std::forward<RightExpr>(re)
        )
      ;
    }

  #endif // #ifndef DONT_USE_EXPR_TEMPL

  private:
    impl_type v_;
};

//===========================================================================

template <std::size_t N>
inline void swap(math_vector<N>& a, math_vector<N>& b)
{
  a.swap(b);
}

//===========================================================================

#ifdef DONT_USE_EXPR_TEMPL

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a,
  math_vector<N> const& b
)
{
  math_vector<N> retval(a);
  retval += b;
  return retval;
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a,
  math_vector<N> const& b
)
{
  a += b;
  return std::move(a);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a,
  math_vector<N>&& b
)
{
  b += a;
  return std::move(b);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a,
  math_vector<N>&& b
)
{
  a += std::move(b);
  return std::move(a);
}

#endif // #ifdef DONT_USE_EXPR_TEMPL

//===========================================================================

template <std::size_t N>
std::ostream& operator <<(std::ostream& os, math_vector<N> const& mv)
{
  os << '(';
  for (std::size_t i = 0; i < N; ++i)
    os << mv[i] << ((i+1 != N) ? ',' : ')');
  return os;
}

//===========================================================================

int main()
{
  using namespace std;

  try
  {
    {
      cout << "CASE 1:\n";
      math_vector<3> a{1.0, 1.1, 1.2};
      math_vector<3> b{2.0, 2.1, 2.2};
      math_vector<3> c{3.0, 3.1, 3.2};
      math_vector<3> d{4.0, 4.1, 4.2};
      math_vector<3> result = a + b + c + d;
      cout << '[' << &result << "]: " << result << "\n";
    }
    cout << endl;
    {
      cout << "CASE 2:\n";
      math_vector<3> result =
        math_vector<3>{1.0, 1.1, 1.2} +
        math_vector<3>{2.0, 2.1, 2.2} +
        math_vector<3>{3.0, 3.1, 3.2} +
        math_vector<3>{4.0, 4.1, 4.2}
      ;
      cout << '[' << &result << "]: " << result << "\n";
    }
  }
  catch (...)
  {
    return 1;
  }
}

//===========================================================================

Sharpedged answered 18/10, 2013 at 10:13 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags