C++ accumulator library with ability to remove old samples
Asked Answered
S

3

7

In Boost.Accumulator, you can add samples to an accumulator and then extract statistical quantities from it. e.g:

acc(1.)
acc(2.)
acc(3.)
cout << mean; // 2

The library has lots of more complicated statistical quantities, such as skewness, kurtosis, or p_square_cumulative_distribution.

What I'd like to do is something like this:

acc(1.)
acc(2.)
acc(3.)
std::cout << mean(acc); // 2
acc.pop() // withdraw the first value (1.)
std::cout << mean(acc); // 2.5

pop() would work in a FIFO (First In First Out) manner. What I'm trying to do is calculate stats stuffs on my data in an on-line (incremental) fashion within a sliding time-window.

The accumulator would have to internally keep all the values.

I could do my own but I always like to check first for existing libraries, and there may be algorithm I'm not aware of that smartly compute the quantities when data is incoming or outgoing.

Server answered 26/9, 2012 at 0:51 Comment(3)
Boost accumulators do not store any data point values. That's the whole point. Thus you cannot "pop" off one data point. Imagine you have the max of 0, 0, 0, 5: How would you "pop off" the 5?Clamber
@KerrekSB I understand Boost does not do what I want, thus my question. Also, it does save the values in some cases, e.g. rolling_sum. Finally, in the rolling sum instance but probably for lots of other cases, you do have to keep all values but you don't have to use them all to compute the new quantity.Server
I know you're only looking to clear specific values but I think it's worth mentioning that you can clear the entire accumulator using acc = {}Precocity
P
11

Since you mentioned "sliding time window", one option is to use a rolling mean (there is also rolling sum and rolling count), which is the mean of the last N samples. Depending on your needs you could create separate accumulators with different window sizes.

typedef accumulator_set<double,
                stats<tag::rolling_mean>
                > my_accumulator;

my_accumulator acc(tag::rolling_window::window_size = 3);
acc(1.);
acc(2.);
acc(3.);
std::cout << rolling_mean(acc);
// Reset accumulator and use different window size
acc = my_accumulator(tag::rolling_window::window_size = 2);
acc(2.);
acc(3.);
std::cout << rolling_mean(acc);

Also, if you look at the implementation of these, they use boost/circular_buffer.hpp.

Purpurin answered 26/9, 2012 at 3:19 Comment(0)
M
2

You probably want to store the data in a std::deque instead of a vector, so both your insertion and deletion can have constant complexity. If you use a vector, one will inevitably be linear instead.

Other than that, it's a pretty simple matter of applying an algorithm to the collection. Strangely, however, I don't know of a collection of such algorithms already written and tested, despite seeming like a fairly obvious set of algorithms to have available.

For what it's worth, it is fairly trivial to build an adapter to feed data from a collection into an accumulator to compute the statistic(s) you can about. In a few cases the accumulator probably has to do a bit of extra work to compute results incrementally, but I'd guess it's pretty rare to lose enough efficiency to care about.

Morningglory answered 26/9, 2012 at 2:48 Comment(1)
Having to feed an accumulator at each iteration wouldn't be very efficient so I guess I'll make my own solution. On your more general remark, it would indeed be useful if a library included such algorithm, all the more as this wikipedia page suggests there are ways to efficiently compute higher-order moments in an "on-line sliding window" fashion.Server
M
0

You'll probably need to just keep all of your samples in a vector, and then accumulate them from the vector for each calculation. Something like this: https://mcmap.net/q/55751/-calculate-mean-and-standard-deviation-from-a-vector-of-samples-in-c-using-boost

Mcinnis answered 26/9, 2012 at 1:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.