Algorithm for function evaluation by pairs (C++, STL)
Asked Answered
A

4

6

I need to apply a custom func to an STL containers by pairs -> that is:

// if c => {a,b,c,d,e,f,g}; // a,b,c,.. are just aliases for some object
my_algorithm(c.begin(),c.end(),[](auto a, auto b){ a + b }); // c++14

should resolve into something like this:

temp1 = a + b;
temp2 = c + d;
temp3 = e + f;
temp4 = temp1 + temp2;
temp5 = temp3 + g;
result = temp4 + temp5;

(I am sure this kind of algorithm has a proper name but I have no idea what this may be)

I have tried with std::accumulate, I am not sure if its implementation is defined by the standard, but in my case and with my compiler it seems to resolve to this (I think this is called pairwise summation, right?) :

temp1 = a + b;
temp2 = temp1 + c;
temp3 = temp2 + d;
// etc

which is more less the same I can get with

auto temp = c[0];
std::for_each(c.begin()+1,c.end(),[&temp](auto a){temp + a); // c++14

I browsed STL and Boost, but did not manage to find something relevant. Is there any library that provides such an algorithm? If not, any ideas for a good STL compliant implementation?

EDIT Just to add that I'm not really interested in adding the elements in the traditional sense - in that case order does not really matter. My function will do a more complex, weighted, kind of summation and will give different results if carried out this way. My question is more generic nevertheless.

Avent answered 9/2, 2016 at 20:39 Comment(13)
partial_sum() ? A bit unclear what result supposed to bePurser
std::accumulate is definitely defined by the standard to do a left fold. It seems that you want to bottom-up construct a balanced tree, which is possible but not in the standard library afaik. ("Construct a balanced tree" has different algorithms, depending on your definition of "balanced", which you only provide by a single example; I don't think that's precise enough for an implementation.)Marginal
Why don't you want to add them one after another?Freezedrying
Actually I know this kind of accumulation from CUDA, there it is called "parallel reduction"Freezedrying
I think there's no benefit to adding them this way unless you really will use multiple threads? Or if there is some floating-point issue you are concerned about, potentially... It's hard for me to tell how generally useful this kind of algorithm is, I don't think I would expect to find it in the standard librarySchweiz
I'm not really interested in adding them in the traditional sense. The function I pass in my case does a more complex, weighted, kind of summation and will give different results if carried out this way.Avent
Here's a naive and wasteful version. It doesn't take into account that the return type of func(a,b) may be different than func(temp3, g), and it keeps all the temporaries in memory the whole time...Atmospherics
It’s a long shot, but maybe this is an alternative worth investigating: are you in control of the data or the iteration? You are motivated in visiting a tree in a particular order, but all you have (for now) is the wrong order. Rather than 'rebuild' the tree from the iterators to fit your needs, can you tweak the program so that the data is correctly shaped to begin with, or so that the iterators visit your data in the order you want? Then you can fold/accumulate.Evaluate
@Luc, accumulate will always sum an element with another sum which is not what I need in this case regardless of the ordering.Avent
@severin not really, partial_sum does something different.Avent
@melak47, it's indeed a bit naive but maybe it's a startAvent
"Is there any library that provides such an algorithm?" "Questions asking us to recommend or find a ... software library ... are off-topic for Stack Overflow" But I'm not voting to close because the next part asks for a way to implement it, which is fine.Liew
@Andrian, sorry I know I should but I wan't aware of thatAvent
S
3

Here's my attempt at an STL-compatible solution, at C++11 standard:

#include <cassert>
#include <cmath>
#include <cstddef>

#include <array>
#include <iostream>
#include <iterator>

namespace detail {

  // Returns first power of two which is strictly less than n
  unsigned int pot_half(std::ptrdiff_t n) {
    assert(n > 1);
    return 1 << (static_cast<unsigned int>(ceil(log2(n))) - 1);
  }

} // end namespace detail

struct tree_fold_on_empty_range : std::exception {};

template <typename Iterator, typename F>
auto tree_fold(const Iterator & begin, const Iterator & end, F && func) -> decltype(func(*begin, *end)) {
  std::ptrdiff_t diff = end - begin;
  switch (diff) {
    case 0: throw tree_fold_on_empty_range{}; // or, return {}; ?
    case 1: return *begin;
    case 2: return func(*begin, *(begin + 1));
    default: {
      Iterator mid{begin};
      std::advance(mid, detail::pot_half(diff));
      return func(tree_fold(begin, mid, func), tree_fold(mid, end, func));
    }
  }
}

int main() {
  for (uint n = 2; n < 20; ++n) {
    std::cout << n << " -> " << detail::pot_half(n) << std::endl;
  }
  std::cout << std::endl;

  std::array<int, 8> test{1, 2, 3, 4, 5, 6, 7, 8};
  std::cout << tree_fold(test.begin(), test.end(), [](int a, int b){ return a + b; }) << std::endl;
  std::cout << tree_fold(test.begin(), test.end(), [](int a, int b){ return a - b; }) << std::endl;
}

Live on coliru also,

it gives this as the final output:

36
0

I believe this indicates that it is correct:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36
((1 - 2) - (3 - 4)) - ((5 - 6) - (7 - 8)) =
((-1) - (-1)) - ((-1) - (-1)) =
0 - 0 = 0

Note that the "correct" behavior on ranges that are not a power of two is a bit ambiguous. In my version what I do is always split a range of length n at the first power of two less than n. So if you give it a power of two, you always get a perfectly balanced binary tree. If you give it 6, you get something like this:

        /\
    /\       /\
  /\  /\

However there's nothing that says always dividing by two is not also correct, so that you would get a tree structure like this

        /\
    /\       /\
  /\       /\

So IMO your question is a bit underspecified. Maybe it doesn't matter to you, so long as the depth is O(log n)?

Schweiz answered 9/2, 2016 at 23:9 Comment(1)
Thanks for the example, it's really intriguing. I've not been necessarily looking for an algorithm that involves trees, so I haven't think of such details. I reckon it depends heavily on the situation: since we're talking about cases where the order of operations does matter, whether the behaviour on non-powers of two is acceptable is operation/case-dependent. It might work in my case tho, I have to run some tests..Avent
M
2

Since November 2015 I had been working in a so called VectorFuncRange container that resolvs that in a STL style in C++14.

I did my own beta version that works good to imitate a std::vector container but with a func_range() method that returns in O(log n) a function evaluation in a range, evaluating as a tree. I endorse that even evaluating as a tree internally they are just vectors and has O(1) random access, push_back in amortized O(1) and worst scenario O(log n) and etc. Some std::vector methods are not yet programmed by me, as emplace_back() and different constructs, but the main ones to use as a vector are. For test reasons I compare the rang_func() with range_func_dumb(), the second version evaluate the function in linear order.

VectorFuncRange.h my current version: http://pastebin.com/dnwznUqg A test code that do it in 5 different ways, with integers, matrix and other types and a lot of functions: http://pastebin.com/YdRfN0CQ

I had think about put in a public Git, but I guess I should organized more my code before that and I do not know if other people have interest to contribute.

Melisa answered 10/2, 2016 at 4:15 Comment(0)
K
1

You should take a look at the second form of std::transform : http://www.cplusplus.com/reference/algorithm/transform/

In pseudo-code near C++ 11, a STL implementation of your algorithm might look like this:

c = {a,b,c,d,e,f,g} // container of elements of type 'my_obj'
tmp = {a,b,c,d,e,f,g} // copy of 'c' to not impact 'c' while executing algorithm
while (tmp.size() > 1)
{
    // partition 'tmp' into even index elements 'c1' and odd index elements 'c2'
    // first iteration would look like this :
    // c1 = {a,c,e,g}
    // c2 = {b,d,f,identity} where 'idendity' is a new element (when 'tmp' size is odd) to match 'g' without impacting final result... identity = 0 for integers addition :)

    // overwrite first elements of 'tmp' with intermediate results
    std::transform(c1.cbegin(), c1.cend(), c2.cbegin(), tmp.begin(), std::plus<my_obj>()); // replace std::plus with any other binary operation including any proper lambda

    // cut 'tmp' ununsed upper half
    tmp.resize(size_t(0.5 * (tmp.size() + 1)));
}
my_obj result = tmp[0];

There is obviously a cost to copy 'c' at start and partition 'tmp' into two halves at each iteration. You decide how to optimize from here :)

Kickshaw answered 10/2, 2016 at 7:0 Comment(1)
thanks, it's a nice try but in my case a,b,c,d,e,f are rather costly to copy.. Maybe I can optimise somehow using some pointers, I might have a try and report.Avent
A
0

Considering a number of suggested solutions (particularly Chris Beck's) I came up with this algorithm which I'm now trying to further optimise. I've moved this to a different thread because the code opens up a number of issues that are worth discussing on their own respect I think.

Avent answered 10/2, 2016 at 15:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.