LINQ to calculate a moving average of a SortedList<dateTime,double>
Asked Answered
G

8

17

I have a time series in the form of a SortedList<dateTime,double>. I would like to calculate a moving average of this series. I can do this using simple for loops. I was wondering if there is a better way to do this using linq.

my version:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var mySeries = new SortedList<DateTime, double>();
            mySeries.Add(new DateTime(2011, 01, 1), 10);
            mySeries.Add(new DateTime(2011, 01, 2), 25);
            mySeries.Add(new DateTime(2011, 01, 3), 30);
            mySeries.Add(new DateTime(2011, 01, 4), 45);
            mySeries.Add(new DateTime(2011, 01, 5), 50);
            mySeries.Add(new DateTime(2011, 01, 6), 65);

            var calcs = new calculations();
            var avg = calcs.MovingAverage(mySeries, 3);
            foreach (var item in avg)
            {
                Console.WriteLine("{0} {1}", item.Key, item.Value);                
            }
        }
    }
    class calculations
    {
        public SortedList<DateTime, double> MovingAverage(SortedList<DateTime, double> series, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < series.Count(); i++)
            {
                if (i >= period - 1)
                {
                    double total = 0;
                    for (int x = i; x > (i - period); x--)
                        total += series.Values[x];
                    double average = total / period;
                    result.Add(series.Keys[i], average);  
                }

            }
            return result;
        }
    }
}
Gannes answered 2/3, 2011 at 11:17 Comment(2)
I would test it out before moving over to LINQ. Usually a simple hand written for-loop will beat LINQ in performance.Hint
After testing this, the hand coded non-Linq solution was indeed a better(read faster) solutionGannes
T
7

You already have an answer showing you how you can use LINQ but frankly I wouldn't use LINQ here as it will most likely perform poorly compared to your current solution and your existing code already is clear.

However instead of calculating the total of the previous period elements on every step, you can keep a running total and adjust it on each iteration. That is, change this:

total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];

to this:

if (i >= period) {
    total -= series.Values[i - period];
}
total += series.Values[i];

This will mean that your code will take the same amount of time to execute regardless of the size of period.

Tribe answered 2/3, 2011 at 11:23 Comment(3)
This is not really answering the question. The OP wants to know how to do it in Linq.Paleozoology
In my opinion, don't use LINQ is a valid answer to the question. LINQ is wonderful but it is the wrong tool here.Tribe
Actually, I really just wanted to know how to do it well. That said. at a later date, I may pull these values directly from an SQL DB. In this case, an all LINQ solution MAY be better. I will bench mark them to see which is faster.Gannes
R
19

In order to achieve an asymptotical performance of O(n) (as the hand-coded solution does), you could use the Aggregate function like in

series.Skip(period-1).Aggregate(
  new {
    Result = new SortedList<DateTime, double>(), 
    Working = List<double>(series.Take(period-1).Select(item => item.Value))
  }, 
  (list, item)=>{
     list.Working.Add(item.Value); 
     list.Result.Add(item.Key, list.Working.Average()); 
     list.Working.RemoveAt(0);
     return list;
  }
).Result;

The accumulated value (implemented as anonymous type) contains two fields: Result contains the result list build up so far. Working contains the last period-1 elements. The aggregate function adds the current value to the Working list, builds the current average and adds it to the result and then removes the first (i.e. oldest) value from the working list.

The "seed" (i.e. the starting value for the accumulation) is build by putting the first period-1 elements into Working and initializing Result to an empty list.

Consequently tha aggregation starts with element period (by skipping (period-1) elements at the beginning)

In functional programming this is a typical usage pattern for the aggretate (or fold) function, btw.

Two remarks:

The solution is not "functionally" clean in that the same list objects (Working and Result) are reused in every step. I'm not sure if that might cause problems if some future compilers try to parallellize the Aggregate function automatically (on the other hand I'm also not sure, if that's possible after all...). A purely functional solution should "create" new lists at every step.

Also note that C# lacks powerful list expressions. In some hypothetical Python-C#-mixed pseudocode one could write the aggregation function like

(list, item)=>
  new {
    Result = list.Result + [(item.Key, (list.Working+[item.Value]).Average())], 
    Working=list.Working[1::]+[item.Value]
  }

which would be a bit more elegant in my humble opinion :)

Reflex answered 3/3, 2011 at 0:48 Comment(0)
G
13

For the most efficient way possible to compute a Moving Average with LINQ, you shouldn't use LINQ!

Instead I propose creating a helper class which computes a moving average in the most efficient way possible (using a circular buffer and causal moving average filter), then an extension method to make it accessible to LINQ.

First up, the moving average

public class MovingAverage
{
    private readonly int _length;
    private int _circIndex = -1;
    private bool _filled;
    private double _current = double.NaN;
    private readonly double _oneOverLength;
    private readonly double[] _circularBuffer;
    private double _total;

    public MovingAverage(int length)
    {
        _length = length;
        _oneOverLength = 1.0 / length;
        _circularBuffer = new double[length];
    }       

    public MovingAverage Update(double value)
    {
        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Maintain totals for Push function
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled)
        {
            _current = double.NaN;
            return this;
        }

        // Compute the average
        double average = 0.0;
        for (int i = 0; i < _circularBuffer.Length; i++)
        {
            average += _circularBuffer[i];
        }

        _current = average * _oneOverLength;

        return this;
    }

    public MovingAverage Push(double value)
    {
        // Apply the circular buffer
        if (++_circIndex == _length)
        {
            _circIndex = 0;
        }

        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Compute the average
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled && _circIndex != _length - 1)
        {
            _current = double.NaN;
            return this;
        }
        else
        {
            // Set a flag to indicate this is the first time the buffer has been filled
            _filled = true;
        }

        _current = _total * _oneOverLength;

        return this;
    }

    public int Length { get { return _length; } }
    public double Current { get { return _current; } }
}

This class provides a very fast and lightweight implementation of a MovingAverage filter. It creates a circular buffer of Length N and computes one add, one subtract and one multiply per data-point appended, as opposed to the N multiply-adds per point for the brute force implementation.

Next, to LINQ-ify it!

internal static class MovingAverageExtensions
{
    public static IEnumerable<double> MovingAverage<T>(this IEnumerable<T> inputStream, Func<T, double> selector, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(selector(item));
            yield return ma.Current;
        }
    }

    public static IEnumerable<double> MovingAverage(this IEnumerable<double> inputStream, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(item);
            yield return ma.Current;
        }
    }
}

The above extension methods wrap the MovingAverage class and allow insertion into an IEnumerable stream.

Now to use it!

int period = 50;

// Simply filtering a list of doubles
IEnumerable<double> inputDoubles;
IEnumerable<double> outputDoubles = inputDoubles.MovingAverage(period);   

// Or, use a selector to filter T into a list of doubles
IEnumerable<Point> inputPoints; // assuming you have initialised this
IEnumerable<double> smoothedYValues = inputPoints.MovingAverage(pt => pt.Y, period);
Gerstner answered 14/4, 2014 at 22:1 Comment(2)
Thanks, the mightly for-loop laughs at the .Zip.Scan.Select(Tuple) approach!Gerstner
A few years later, but really, a solid approach.Toreutics
T
7

You already have an answer showing you how you can use LINQ but frankly I wouldn't use LINQ here as it will most likely perform poorly compared to your current solution and your existing code already is clear.

However instead of calculating the total of the previous period elements on every step, you can keep a running total and adjust it on each iteration. That is, change this:

total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];

to this:

if (i >= period) {
    total -= series.Values[i - period];
}
total += series.Values[i];

This will mean that your code will take the same amount of time to execute regardless of the size of period.

Tribe answered 2/3, 2011 at 11:23 Comment(3)
This is not really answering the question. The OP wants to know how to do it in Linq.Paleozoology
In my opinion, don't use LINQ is a valid answer to the question. LINQ is wonderful but it is the wrong tool here.Tribe
Actually, I really just wanted to know how to do it well. That said. at a later date, I may pull these values directly from an SQL DB. In this case, an all LINQ solution MAY be better. I will bench mark them to see which is faster.Gannes
N
7

This block

double total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];
double average = total / period;

can be rewritten as:

double average = series.Values.Skip(i - period + 1).Take(period).Sum() / period;

Your method may look like:

series.Skip(period - 1)
    .Select((item, index) =>
        new 
        {
            item.Key,            
            series.Values.Skip(index).Take(period).Sum() / period
        });

As you can see, linq is very expressive. I recommend to start with some tutorial like Introducing LINQ and 101 LINQ Samples.

Null answered 2/3, 2011 at 13:14 Comment(2)
Note the running time of O(n^2), since you need to skip more and more elements at every step (and afaik Skip(i) has to call IEnumerator.MoveNext i times). See my response for a solution in O(n) time ... (I just noticed the OPs comment below that he/she will possibly get the values from an SQL DB in the future. In this case I would every strongly discourage from this solution!)Reflex
@Andre You are welcome. @Reflex Yes, you are right. I try to write the most elegant solution, not the most efficient...Null
T
3

To do this in a more functional way, you'd need a Scan method which exists in Rx but not in LINQ.

Let's look how it would look like if we'd have a scan method

var delta = 3;
var series = new [] {1.1, 2.5, 3.8, 4.8, 5.9, 6.1, 7.6};

var seed = series.Take(delta).Average();
var smas = series
    .Skip(delta)
    .Zip(series, Tuple.Create)
    .Scan(seed, (sma, values)=>sma - (values.Item2/delta) + (values.Item1/delta));
smas = Enumerable.Repeat(0.0, delta-1).Concat(new[]{seed}).Concat(smas);

And here's the scan method, taken and adjusted from here:

public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> accumulator
)
{
    if (source == null) throw new ArgumentNullException("source");
    if (seed == null) throw new ArgumentNullException("seed");
    if (accumulator == null) throw new ArgumentNullException("accumulator");

    using (var i = source.GetEnumerator())
    {
        if (!i.MoveNext())
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }
        var acc = accumulator(seed, i.Current);

        while (i.MoveNext())
        {
            yield return acc;
            acc = accumulator(acc, i.Current);
        }
        yield return acc;
    }
}

This should have better performance than the brute force method since we are using a running total to calculate the SMA.

What's going on here?

To start we need to calculate the first period which we call seed here. Then, every subsequent value we calculate from the accumulated seed value. To do that we need the old value (that is t-delta) and the newest value for which we zip together the series, once from the beginning and once shifted by the delta.

At the end we do some cleanup by adding zeroes for the length of the first period and adding the initial seed value.

Transduction answered 19/6, 2013 at 22:58 Comment(3)
Just saw this. Very interesting! Will have to try it out to see if it improves on the C# for i loopGannes
@AndreP. besides being more efficient than brute force, the values are calculated in a lazy manner. So lets say you have 200k values, but then just write smas.Take(1000), it will only calculate the first 1000 moving average values.Transduction
After reading the problem (and not all the answers), I just devised the same thing (though I called my function AggregateSeq)Rat
G
2

Another option is to use MoreLINQ's Windowed method, which simplifies the code significantly:

var averaged = mySeries.Windowed(period).Select(window => window.Average(keyValuePair => keyValuePair.Value));
Globose answered 10/10, 2017 at 6:32 Comment(0)
K
0

I use this code to calculate SMA:

private void calculateSimpleMA(decimal[] values, out decimal[] buffer)
{
    int period = values.Count();     // gets Period (assuming Period=Values-Array-Size)
    buffer = new decimal[period];    // initializes buffer array
    var sma = SMA(period);           // gets SMA function
    for (int i = 0; i < period; i++)
        buffer[i] = sma(values[i]);  // fills buffer with SMA calculation
}

static Func<decimal, decimal> SMA(int p)
{
    Queue<decimal> s = new Queue<decimal>(p);
    return (x) =>
    {
        if (s.Count >= p)
        {
            s.Dequeue();
        }
        s.Enqueue(x);
        return s.Average();
    };
}
Krupp answered 1/4, 2015 at 15:2 Comment(0)
H
0

Here is an extension method:

public static IEnumerable<double> MovingAverage(this IEnumerable<double> source, int period)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    if (period < 1)
    {
        throw new ArgumentOutOfRangeException(nameof(period));
    }

    return Core();

    IEnumerable<double> Core()
    {
        var sum = 0.0;
        var buffer = new double[period];
        var n = 0;
        foreach (var x in source)
        {
            n++;
            sum += x;
            var index = n % period;
            if (n >= period)
            {
                sum -= buffer[index];
                yield return sum / period;
            }

            buffer[index] = x;
        }
    }
}
Hazlip answered 24/5, 2021 at 9:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.