Calculating moving average in C++

Asked 15/12, 2011 at 11:55 Answered 15/12, 2011 at 22:38

I am trying to calculate the moving average of a signal. The signal value (a double) is updated at random times. I am looking for an efficient way to calculate its time-weighted average over a time window, in real time. I could do it myself, but it is more challenging than I thought.

Most of the resources I've found on the internet calculate the moving average of a periodic signal, but mine updates at random times.

Does anyone know of good resources for that?

Thanks

Sheikdom answered 15/12, 2011 at 11:55 Comment(7)

What have you got so far? How do you know it is inefficient? – Cloverleaf 15/12, 2011 at 11:59

Interesting question, but being tagged C++ I expect to see the code you have. Right now, all I can say is you have to find a way to interpolate between given data points in input, and base your algorithm on a given timewindow and number of samples. – Kristlekristo 15/12, 2011 at 11:59

This may or may not be useful in your context, but an exponential moving average might be a suitable alternative to a fixed-window one. It's very easy to compute recursively. – Microcopy 15/12, 2011 at 12:0

It's also very cheap (O(1)) to compute a fixed-window moving average if your data-type is an integer. – Cloverleaf 15/12, 2011 at 12:3

This means the moving average value will use a varying number of values, depending on how much data entered during the time window. This will have impact on the variance of the moving average estimator. I concur with @aix in that an exponentially weighted MA sounds better without other information. – Piggish 15/12, 2011 at 12:26

Since the weight function is unknown (different time intervals), you won't be able to compute the moving average on the fly without retaining the last N values and computing the weighted average each time. – Petree 15/12, 2011 at 12:50

Related for exponential moving average: #1024360 – Limicolous 15/12, 2011 at 13:31
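For readers following the exponential moving average suggestion in the comments above, here is a minimal sketch of one common way to adapt it to irregular update times: the smoothing factor is derived from the elapsed time since the previous sample. The class name IrregularEma and the time constant tau are hypothetical names chosen for illustration, not something from the thread, and other weighting variants exist.

```cpp
#include <cassert>
#include <cmath>

// Exponentially weighted moving average for irregularly spaced samples.
// Assumption: tau is a time constant in the same units as the timestamps.
class IrregularEma {
public:
    explicit IrregularEma(double tau) : tau_(tau) { assert(tau > 0.0); }

    // Feed one sample; timestamps must be non-decreasing.
    void update(double time, double value) {
        if (!initialized_) {
            average_ = value;  // the first sample seeds the average
            initialized_ = true;
        } else {
            assert(time >= last_time_);
            const double dt = time - last_time_;
            // A longer gap gives the new sample more weight.
            const double alpha = 1.0 - std::exp(-dt / tau_);
            average_ += alpha * (value - average_);
        }
        last_time_ = time;
    }

    double average() const { return average_; }

private:
    double tau_;
    double average_ = 0.0;
    double last_time_ = 0.0;
    bool initialized_ = false;
};
```

This needs O(1) memory and no window bookkeeping, at the cost of not being a fixed-window average.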

The trick is the following: you get updates at random times via void update(int time, float value). However, you also need to track when an update falls off the time window, so you set an "alarm" that fires at time + N and removes the previous update from ever being considered again in the computation.

If this happens in real time, you can ask the operating system to call a method void drop_off_oldest_update(int time) at time + N.

If this is a simulation, you cannot get help from the operating system and you need to do it manually. In a simulation you would call methods with the time supplied as an argument (which does not correlate with real time). However, a reasonable assumption is that the calls are guaranteed to be such that the time arguments are increasing. In this case you need to maintain a sorted list of alarm time values, and for each update and read call you check if the time argument is greater than the head of the alarm list. While it is greater you do the alarm related processing (drop off the oldest update), remove the head and check again until all alarms prior to the given time are processed. Then do the update call.

I have so far assumed it is obvious what you would do for the actual computation, but I will elaborate just in case. I assume you have a method float read(int time) that you use to read the values. The goal is to make this call as efficient as possible, so you do not compute the moving average every time the read method is called. Instead you precompute the value as of the last update or the last alarm, and "tweak" this value by a couple of floating point operations to account for the passage of time since the last update (i.e. a constant number of operations, except perhaps for processing a list of piled-up alarms).

Hopefully this is clear -- this should be a quite simple algorithm and quite efficient.
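A minimal sketch of the scheme described above, for the simulation case where alarms are processed lazily on each call. It assumes the signal holds each value until the next update, counts the signal as 0 before the first update, and uses double timestamps instead of int for brevity. The class and member names (TimeWeightedWindow, closed_sum_, expire) are made up for illustration.

```cpp
#include <cassert>
#include <deque>

class TimeWeightedWindow {
public:
    explicit TimeWeightedWindow(double window) : window_(window) { assert(window > 0.0); }

    // Record a new value; times must be non-decreasing across all calls.
    void update(double time, double value) {
        expire(time);
        if (!updates_.empty()) {
            const Update& last = updates_.back();
            assert(time >= last.time);
            // The previous value's segment is now closed: add its full area.
            closed_sum_ += last.value * (time - last.time);
        }
        updates_.push_back(Update{time, value});
    }

    // Time-weighted average over [now - window, now].
    double read(double now) {
        if (updates_.empty()) return 0.0;
        expire(now);
        const double start = now - window_;
        const Update& first = updates_.front();
        const Update& last = updates_.back();
        assert(now >= last.time);
        // Closed segments plus the still-open segment of the latest value.
        double integral = closed_sum_ + last.value * (now - last.time);
        // Clip the part of the oldest segment that lies before the window.
        if (first.time < start) integral -= first.value * (start - first.time);
        return integral / window_;
    }

private:
    struct Update { double time; double value; };

    // "Alarm" processing: drop updates whose whole segment left the window,
    // keeping the most recent update at or before the window start.
    void expire(double now) {
        const double start = now - window_;
        while (updates_.size() >= 2 && updates_[1].time <= start) {
            closed_sum_ -= updates_[0].value * (updates_[1].time - updates_[0].time);
            updates_.pop_front();
        }
        if (updates_.size() == 1) closed_sum_ = 0.0;  // no closed segments left
    }

    double window_;
    double closed_sum_ = 0.0;  // area of all segments between stored updates
    std::deque<Update> updates_;
};
```

Each update is pushed once and popped once, so the work per call is constant on an amortized basis, and read only adds a couple of floating point operations on top of the cached sum.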

Further optimization: one of the remaining problems is if a large number of updates happen within the time window, then there is a long time for which there are neither reads nor updates, and then a read or update comes along. In this case, the above algorithm will be inefficient in incrementally updating the value for each of the updates that is falling off. This is not necessary because we only care about the last update beyond the time window so if there is a way to efficiently drop off all older updates, it would help.

To do this, we can modify the algorithm to do a binary search over the updates to find the most recent update before the time window. If relatively few updates need to be "dropped", one can incrementally update the value for each dropped update. But if many updates need to be dropped, one can recompute the value from scratch after dropping off the old updates.
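The binary search step could look roughly like this, assuming the updates are kept in a vector sorted by time. Update and find_prior_update are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Update { double time; double value; };

// Index of the most recent update with time <= window_start,
// or updates.size() if every update is inside the window.
// Precondition: updates is sorted by increasing time.
std::size_t find_prior_update(const std::vector<Update>& updates, double window_start) {
    assert(std::is_sorted(updates.begin(), updates.end(),
                          [](const Update& a, const Update& b) { return a.time < b.time; }));
    // First update strictly after window_start; the one before it is the prior update.
    auto it = std::upper_bound(updates.begin(), updates.end(), window_start,
                               [](double t, const Update& u) { return t < u.time; });
    if (it == updates.begin()) return updates.size();
    return static_cast<std::size_t>(it - updates.begin()) - 1;
}
```

Everything before the returned index can be erased in one go. The number of erased elements then tells you whether to adjust the sum incrementally or recompute it from scratch.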

Appendix on incremental computation: I should clarify what I mean by incremental computation in the sentence above about "tweaking" this value by a couple of floating point operations to account for the passage of time since the last update. Initial non-incremental computation:

start with

sum = 0;
updates_in_window = /* set of all updates within window */;
prior_update' = /* most recent update prior to window with timestamp tweaked to window beginning */;
relevant_updates = /* union of prior_update' and updates_in_window */;

then iterate over relevant_updates in order of increasing time:

for each update EXCEPT last {
    sum += update.value * time_to_next_update;
}

and finally

moving_average = (sum + last_update.value * time_since_last_update) / window_length;

Now if exactly one update falls off the window but no new updates arrive, adjust sum as:

sum -= prior_update'.value * time_to_next_update + first_update_in_last_window.value * time_from_first_update_to_new_window_beginning;

(note it is prior_update' which has its timestamp modified to start of last window beginning). And if exactly one update enters the window but no new updates fall off, adjust sum as:

sum += previously_most_recent_update.value * corresponding_time_to_next_update.

As should be obvious, this is a rough sketch but hopefully it shows how you can maintain the average such that it is O(1) operations per update on an amortized basis. But note further optimization in previous paragraph. Also note stability issues alluded to in an older answer, which means that floating point errors may accumulate over a large number of such incremental operations such that there is a divergence from the result of the full computation that is significant to the application.
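For comparison against an incrementally maintained sum, the full (non-incremental) computation from the appendix can be sketched as below. It assumes the updates are sorted by time and that the signal counts as 0 before the first update. Update and full_average are illustrative names. Clamping a segment's start to the window beginning plays the role of tweaking the timestamp of prior_update'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Update { double time; double value; };

// Time-weighted average over [now - window, now], recomputed from scratch.
// Precondition: updates is sorted by increasing time.
double full_average(const std::vector<Update>& updates, double now, double window) {
    assert(window > 0.0);
    const double start = now - window;
    double sum = 0.0;
    for (std::size_t i = 0; i < updates.size(); ++i) {
        // Each value holds from its own timestamp until the next update
        // (or until now for the last one), clipped to the window.
        const double seg_begin = std::max(updates[i].time, start);
        const double seg_end =
            (i + 1 < updates.size()) ? std::min(updates[i + 1].time, now) : now;
        if (seg_end > seg_begin) sum += updates[i].value * (seg_end - seg_begin);
    }
    return sum / window;
}
```

Running this occasionally and overwriting the cached value is one way to keep accumulated floating point error bounded. How often that is needed depends on the application.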

Glider answered 15/12, 2011 at 18:24 Comment(0)

If an approximation is OK and there's a minimum time between samples, you could try super-sampling. Have an array that represents evenly spaced time intervals that are shorter than the minimum, and at each time period store the latest sample that was received. The shorter the interval, the closer the average will be to the true value. The period should be no greater than half the minimum or there is a chance of missing a sample.
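A rough sketch of the super-sampling idea, using a ring buffer of evenly spaced slots. The names (SuperSampledAverage, period, slot_count) are invented for this example. It assumes non-negative, non-decreasing times, carries the last value forward into slots that received no sample, and treats the signal as 0 before the first sample. Choosing a period no greater than half the minimum sample spacing is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class SuperSampledAverage {
public:
    // The window covered is period * slot_count.
    SuperSampledAverage(double period, std::size_t slot_count)
        : period_(period), slots_(slot_count, 0.0) {
        assert(period > 0.0 && slot_count > 0);
    }

    // Store the latest sample in the slot for its time period.
    void update(double time, double value) {
        advance_to(time);
        slots_[static_cast<std::size_t>(current_) % slots_.size()] = value;
        last_value_ = value;
    }

    // Approximate moving average: plain mean of the slots.
    double average(double now) {
        advance_to(now);
        double sum = 0.0;
        for (double v : slots_) sum += v;
        return sum / static_cast<double>(slots_.size());
    }

private:
    // Move to the slot containing `time`, filling skipped slots with the held value.
    void advance_to(double time) {
        assert(time >= 0.0);
        const long long target = static_cast<long long>(std::floor(time / period_));
        assert(target >= current_);
        const long long size = static_cast<long long>(slots_.size());
        // At most one full lap of the ring ever needs to be rewritten.
        const long long steps = std::min(target - current_, size);
        for (long long s = 1; s <= steps; ++s) {
            slots_[static_cast<std::size_t>((target - steps + s) % size)] = last_value_;
        }
        current_ = target;
    }

    double period_;
    std::vector<double> slots_;
    long long current_ = 0;    // index of the most recent time period seen
    double last_value_ = 0.0;  // value carried into empty slots
};
```

The mean could also be kept as a running sum to avoid the loop in average, but with a small slot count the loop is usually cheap.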

Cicely answered 15/12, 2011 at 18:12 Comment(0)

#include <map>
#include <iostream>

// Sample - the type of a single sample
// Date - the type of a time notation
// DateDiff - the type of difference of two Dates    
template <class Sample, class Date, class DateDiff = Date>
class TWMA {
private:
  typedef std::map<Date, Sample> qType;
  const DateDiff windowSize; // The time width of the sampling window
  qType samples; // A set of sample/date pairs
  Sample average; // The answer

public:

  // windowSize - The time width of the sampling window
  TWMA(const DateDiff& windowSize) : windowSize(windowSize), average(0) {}

  // Call this each time you receive a sample
  void
  Update(const Sample& sample, const Date& now) {
    // First throw away all old data
    Date then(now - windowSize);
    samples.erase(samples.begin(), samples.upper_bound(then));

    // Next add new data
    samples[now] = sample;

    // Compute average: note: this could move to Average(), depending upon
    // precise user requirements.
    Sample sum = Sample();
    for(typename qType::iterator it = samples.begin();
        it != samples.end();
        ++it) {
      DateDiff duration(it->first - then);
      sum += duration * it->second;
      then = it->first;
    }
    average = sum / windowSize;
  }

  // Call this when you need the answer.
  const Sample& Average() { return average; }

};

int main () {
  TWMA<double, int> samples(10);

  samples.Update(1, 1);
  std::cout << samples.Average() << "\n"; // 1
  samples.Update(1, 2);
  std::cout << samples.Average() << "\n"; // 1
  samples.Update(1, 3);
  std::cout << samples.Average() << "\n"; // 1
  samples.Update(10, 20);
  std::cout << samples.Average() << "\n"; // 10
  samples.Update(0, 25);
  std::cout << samples.Average() << "\n"; // 5
  samples.Update(0, 30);
  std::cout << samples.Average() << "\n"; // 0
}

Asperity answered 15/12, 2011 at 22:38 Comment(2)

Thanks for the answer. One improvement that would be needed is to actually "cache" the value of the total average so we don't loop all the time. Also, it may be a minor point, but would it not be more efficient to use a deque or a list to store the values, since we assume that updates will come in the right order? Insertion would be faster than in the map. – Sheikdom 16/12, 2011 at 8:55

Yes, you could cache the value of sum. Subtract the values of the samples you erase, add the values of the samples you insert. Also, yes, a deque<pair<Sample,Date>> might be more efficient. I chose map for readability, and the ease of invoking map::upper_bound. As always, write correct code first, then profile and measure incremental changes. – Kahle 16/12, 2011 at 15:0
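A sketch of what the deque-plus-cached-sum variant discussed in these comments might look like. It keeps the weighting of the answer above (each sample is weighted by the time elapsed since the previous sample, or since the window start for the oldest one). It is not templated, uses double for both samples and dates, and assumes strictly increasing timestamps. CachedTWMA and inner are names invented for this example.

```cpp
#include <cassert>
#include <deque>
#include <utility>

class CachedTWMA {
public:
    explicit CachedTWMA(double windowSize) : windowSize(windowSize) { assert(windowSize > 0.0); }

    // Call this each time you receive a sample.
    void Update(double sample, double now) {
        const double then = now - windowSize;
        // Throw away old data from the front. The sample that becomes the new
        // front is weighted from `then` instead, so remove its cached term.
        while (!samples.empty() && samples.front().first <= then) {
            if (samples.size() >= 2) {
                inner -= (samples[1].first - samples[0].first) * samples[1].second;
            }
            samples.pop_front();
        }
        if (samples.size() <= 1) inner = 0.0;  // also sheds accumulated rounding error

        // Add new data; every sample except the front one has a cached term.
        if (!samples.empty()) {
            assert(now > samples.back().first);
            inner += (now - samples.back().first) * sample;
        }
        samples.push_back(std::make_pair(now, sample));

        // The front sample is weighted by its distance from the window start.
        const double head = (samples.front().first - then) * samples.front().second;
        average = (head + inner) / windowSize;
    }

    // Call this when you need the answer.
    double Average() const { return average; }

private:
    double windowSize;
    std::deque<std::pair<double, double> > samples;  // (date, sample) pairs
    double inner = 0.0;    // cached sum of duration * value for non-front samples
    double average = 0.0;
};
```

Fed the same sequence as main() in the answer, this should give the same six averages. As noted in the comments, subtracting from a cached floating point sum can drift, so profile and compare against the straightforward loop before relying on it.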

Note: Apparently this is not the way to approach this. Leaving it here for reference on what is wrong with this approach. Check the comments.

UPDATED - based on Oli's comment... not sure about the instability that he is talking about though.

Use a sorted map of "arrival times" against values. Upon arrival of a value, add the arrival time to the sorted map along with its value and update the moving average.

warning this is pseudo-code:

SortedMapType< int, double > timeValueMap;

void onArrival(double value)
{
    timeValueMap.insert( (int)time(NULL), value);
}

//for example this runs every 10 seconds and the moving window is 120 seconds long
void recalcRunningAverage()
{
    // you know that the oldest thing in the list is 
    // going to be 129.9999 seconds old
    int expireTime = (int)time(NULL) - 120;
    int removeFromTotal = 0;
    MapIterType i;
    // advance i to the first entry that is still inside the window
    for( i = timeValueMap.begin();
    (i != timeValueMap.end() && i->first < expireTime) ; ++i )
    {
    }

    // NOW REMOVE PAIRS TO LEFT OF i

    // Below needs to apply your time-weighting to the remaining values
    runningTotal = calculateRunningTotal(timeValueMap); 
    average = runningTotal/timeValueMap.size();
}

There... Not fully fleshed out but you get the idea.

Things to note: As I said the above is pseudo code. You'll need to choose an appropriate map. Don't remove the pairs as you iterate through as you will invalidate the iterator and will have to start again.
See Oli's comment below also.

Rags answered 15/12, 2011 at 12:22 Comment(5)

This doesn't work: it doesn't take into account what proportion of the window-length each value exists for. Also, this approach of adding and then subtracting is only stable for integer types, not floats. – Cloverleaf 15/12, 2011 at 12:29

@OliCharlesworth - sorry I missed some key points in the description (double and time-weighted). I will update. Thanks. – Rags 15/12, 2011 at 12:33

The time-weighting is yet another problem. But that's not what I'm talking about. I was referring to the fact that when a new value first enters the time window, its contribution to the average is minimal. Its contribution continues to increase until a new value enters. – Cloverleaf 15/12, 2011 at 12:35

Surely he can simply apply any algorithm he needs to the remaining values now? He has all the info he needs... the number of values, the values and their arrival times. – Rags 15/12, 2011 at 12:40

I don't think you can just divide the total by the count of events, you have to divide by the time span. Hopefully the weighting applied in calculateRunningTotal will account for this. – Cicely 16/12, 2011 at 0:4
