Is there a way to calculate mean and standard deviation for a vector containing samples using Boost?
Or do I have to create an accumulator and feed the vector into it?
Is there a way to calculate mean and standard deviation for a vector containing samples using Boost?
Or do I have to create an accumulator and feed the vector into it?
Using accumulators is the way to compute means and standard deviations in Boost.
accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));
cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;
I don't know if Boost has more specific functions, but you can do it with the standard library.
Given std::vector<double> v
, this is the naive way:
#include <numeric>
double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();
double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);
This is susceptible to overflow or underflow for huge or tiny values. A slightly better way to calculate the standard deviation is:
double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();
std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());
UPDATE for C++11:
The call to std::transform
can be written using a lambda function instead of std::minus
and std::bind2nd
(now deprecated):
std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });
mean
calculated in the top part. –
Hertzog (v.size() - 1)
for v.size()
in the last line above: std::sqrt(sq_sum / (v.size() - 1))
. (For the first method, it's a little complicated: std::sqrt(sq_sum / (v.size() - 1) - mean * mean * v.size() / (v.size() - 1))
. –
Hertzog <cmath>
, <functional>
, <vector>
, <algorithm>
, etc. –
Hertzog std::inner_product
for sum of squares is very neat. –
Marilee -fopenmp
is added? or is a for loop better for the accumulations? –
Nicoline double stdev = std::sqrt( sq_sum / (v.size()-1) );
which could make quite a difference for small N. –
Sleek n
vs n-1
. This returns the population standard dev, so if this detail matters, make sure to adjust the code accordingly –
Legion If performance is important to you, and your compiler supports lambdas, the stdev calculation can be made faster and simpler: In tests with VS 2012 I've found that the following code is over 10 X quicker than the Boost code given in the chosen answer; it's also 5 X quicker than the safer version of the answer using standard libraries given by musiphil.
Note I'm using sample standard deviation, so the below code gives slightly different results (Why there is a Minus One in Standard Deviations)
double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m = sum / v.size();
double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
accum += (d - m) * (d - m);
});
double stdev = sqrt(accum / (v.size()-1));
std::end()
function was added by C++11 standard for cases when there is nothing like v.end()
. The std::end
can be overloaded for the less standard container -- see en.cppreference.com/w/cpp/iterator/end –
Paphian Using accumulators is the way to compute means and standard deviations in Boost.
accumulator_set<double, stats<tag::variance> > acc;
for_each(a_vec.begin(), a_vec.end(), bind<void>(ref(acc), _1));
cout << mean(acc) << endl;
cout << sqrt(variance(acc)) << endl;
second moment - squared mean
which will produce incorrect result if variance is very small because of rounding errors. It can actually produce negative variance. –
Halftrack Improving on the answer by musiphil, you can write a standard deviation function without the temporary vector diff
, just using a single inner_product
call with the C++11 lambda capabilities:
double stddev(std::vector<double> const & func)
{
double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
[](double const & x, double const & y) { return x + y; },
[mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
return std::sqrt(sq_sum / func.size() - 1);
}
I suspect doing the subtraction multiple times is cheaper than using up additional intermediate storage, and I think it is more readable, but I haven't tested the performance yet.
As for explanation why to use N-1 (as in func.size() - 1
), see these questions - note how the question states we have a "vector containing samples".
It seems the following elegant recursive solution has not been mentioned, although it has been around for a long time. Referring to Knuth's Art of Computer Programming,
mean_1 = x_1, variance_1 = 0; //initial conditions; edge case;
//for k >= 2,
mean_k = mean_k-1 + (x_k - mean_k-1) / k;
variance_k = variance_k-1 + (x_k - mean_k-1) * (x_k - mean_k);
then for a list of n>=2
values, the estimate of the standard deviation is:
stddev = std::sqrt(variance_n / (n-1)).
Hope this helps!
My answer is similar as Josh Greifer but generalised to sample covariance. Sample variance is just sample covariance but with the two inputs identical. This includes Bessel's correlation.
template <class Iter> typename Iter::value_type cov(const Iter &x, const Iter &y)
{
double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);
double mx = sum_x / x.size();
double my = sum_y / y.size();
double accum = 0.0;
for (auto i = 0; i < x.size(); i++)
{
accum += (x.at(i) - mx) * (y.at(i) - my);
}
return accum / (x.size() - 1);
}
2x faster than the versions before mentioned - mostly because transform() and inner_product() loops are joined. Sorry about my shortcut/typedefs/macro: Flo = float. CR const ref. VFlo - vector. Tested in VS2010
#define fe(EL, CONTAINER) for each (auto EL in CONTAINER) //VS2010
Flo stdDev(VFlo CR crVec) {
SZ n = crVec.size(); if (n < 2) return 0.0f;
Flo fSqSum = 0.0f, fSum = 0.0f;
fe(f, crVec) fSqSum += f * f; // EDIT: was Cit(VFlo, crVec) {
fe(f, crVec) fSum += f;
Flo fSumSq = fSum * fSum;
Flo fSumSqDivN = fSumSq / n;
Flo fSubSqSum = fSqSum - fSumSqDivN;
Flo fPreSqrt = fSubSqSum / (n - 1);
return sqrt(fPreSqrt);
}
for( float f : crVec ) { fSqSum += f * f; fSum += f; }
? –
Towe In order to calculate the sample mean with a better presicion the following r-step recursion can be used:
mean_k=1/k*[(k-r)*mean_(k-r) + sum_over_i_from_(n-r+1)_to_n(x_i)],
where r is chosen to make summation components closer to each other.
Create your own container:
template <class T>
class statList : public std::list<T>
{
public:
statList() : std::list<T>::list() {}
~statList() {}
T mean() {
return accumulate(begin(),end(),0.0)/size();
}
T stddev() {
T diff_sum = 0;
T m = mean();
for(iterator it= begin(); it != end(); ++it)
diff_sum += ((*it - m)*(*it -m));
return diff_sum/size();
}
};
It does have some limitations, but it works beautifully when you know what you are doing.
//means deviation in c++
/A deviation that is a difference between an observed value and the true value of a quantity of interest (such as a population mean) is an error and a deviation that is the difference between the observed value and an estimate of the true value (such an estimate may be a sample mean) is a residual. These concepts are applicable for data at the interval and ratio levels of measurement./
#include <iostream>
#include <conio.h>
using namespace std;
/* run this program using the console pauser or add your own getch, system("pause") or input loop */
int main(int argc, char** argv)
{
int i,cnt;
cout<<"please inter count:\t";
cin>>cnt;
float *num=new float [cnt];
float *s=new float [cnt];
float sum=0,ave,M,M_D;
for(i=0;i<cnt;i++)
{
cin>>num[i];
sum+=num[i];
}
ave=sum/cnt;
for(i=0;i<cnt;i++)
{
s[i]=ave-num[i];
if(s[i]<0)
{
s[i]=s[i]*(-1);
}
cout<<"\n|ave - number| = "<<s[i];
M+=s[i];
}
M_D=M/cnt;
cout<<"\n\n Average: "<<ave;
cout<<"\n M.D(Mean Deviation): "<<M_D;
getch();
return 0;
}
© 2022 - 2024 — McMap. All rights reserved.
second moment - squared mean
which will produce incorrect result if variance is very small because of rounding errors. It can actually produce negative variance. – Halftrack