Average function without overflow exception
M

18

20

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:

using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
                        {
                            long.MaxValue - 100, 
                            long.MaxValue - 200, 
                            long.MaxValue - 300
                        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}

Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception instead. This is because the implementation of the Average extension method (on my machine), as inspected with .NET Reflector, is:

public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}

I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).

But I still wonder if there's a pretty straightforward way to calculate the average of integers without an external library. Do you happen to know of such an implementation?

Thanks!!


UPDATE:

The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.

Thanks all!!

Microphysics answered 24/5, 2010 at 7:58 Comment(4)
You are going to convert your sum to double anyway, so why not use a double as the sum accumulator? There may be some small errors as a result of truncating long to the width of the mantissa.Obie
@ony: It feels like he doesn't have access to the Average function's code - why else would he use Reflector on it?Swift
@ANeves: That's just a variant of implementation as response to "I still wonder if there's".Obie
@PauliL - oops, I fixed it to the original values.Microphysics
F
18

This answer used to suggest storing the quotient and remainder (mod count) separately. That solution is less space-efficient and more code-complex.

In order to accurately compute the average, you must keep track of the total. There is no way around this, unless you're willing to sacrifice accuracy. You can try to store the total in fancy ways, but ultimately you must be tracking it if the algorithm is correct.

For single-pass algorithms, this is easy to prove by contradiction. Suppose the total of all preceding items cannot be reconstructed from the algorithm's entire state after processing those items. Now simulate feeding the algorithm a series of 0 items until the sequence is finished. The zeros do not change the total, so multiplying the resulting average by the count gives the total of the preceding items. Contradiction. Therefore a single-pass algorithm must be tracking the total in some sense.

Therefore the simplest correct algorithm will just sum up the items and divide by the count. All you have to do is pick an integer type with enough space to store the total. Using a BigInteger guarantees no issues, so I suggest using that.

// Requires System.Numerics.BigInteger (.NET 4.0 or later).
BigInteger total = BigInteger.Zero;
long count = 0;
foreach (long value in values)
{
    count += 1;
    total += value;
}
return (double)total / count; // warning: possible loss of accuracy, maybe return a Rational instead?
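If BigInteger is not available (as on .NET 3.5), one possible stand-in for the "integer type with enough space" is `decimal`. The sketch below is an assumption of mine rather than part of the answer above: it relies on `decimal` holding integers exactly up to about 7.9 × 10^28, which is enough room for several billion values of magnitude `long.MaxValue`. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DecimalAverage
{
    // Sums into a decimal accumulator, then divides once at the end.
    // Assumption: the total stays within decimal's range (about 7.9e28);
    // decimal arithmetic throws OverflowException if it does not.
    public static decimal Average(IEnumerable<long> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        decimal total = 0m;
        long count = 0;
        foreach (long value in source)
        {
            total += value; // implicit long -> decimal conversion is exact
            count++;
        }
        if (count == 0) throw new InvalidOperationException("Sequence contains no elements");

        return total / count;
    }
}
```

For the three values in the question, `DecimalAverage.Average(items)` should give 9223372036854775607 without throwing.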
Fluctuate answered 24/5, 2010 at 11:9 Comment(3)
+1 for more accuracy while handling any values within the Int64 range and concise codeNinetieth
pop quiz: now implement this without knowing the count a priori ;)Arbogast
I've actually thought about it more and... it's more time and space efficient to just store the total in an Int64 or BigInteger and do one division at the end. Also makes the unknown count case trivial.Fluctuate
G
13

If you're just looking for an arithmetic mean, you can perform the calculation like this:

public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source"); // Error.ArgumentNull is internal to System.Linq
    }

    double count = (double)source.Count();
    double mean = 0D;

    foreach(long x in source)
    {
        mean += (double)x/count;
    }

    return mean;
}

Edit:

In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.

Guaiacum answered 24/5, 2010 at 8:18 Comment(5)
Excellent answer - minimal loss of precision, minimal chance of overflow, and gets the right answer! +1 from me... However: IEnumerable doesn't have a .Count(), so you should maybe correct your parameter type (or make explicit that you're using Linq). Oh, and nice avatar ;)Nannettenanni
@Dan, IEnumerable does have a .Count(), given that you include a using statement for System.Linq.Noreen
If count is very large, and the elements are small, the loss of precision might not be negligible. The more elements you have and the smaller they are, the worse this performs...Inez
@Tomas - fair point - I missed the using in the OP. He's already had my +1 anyway ;-)Nannettenanni
@TomasAschan while Count() is accessible via LINQ, it will still be a bad choice here as it will potentially cause multiple enumeration of the ienumerable. It would be more adequate to pass the value in as a ICollection<T> which keeps track of its count.Agonizing
P
7

You may try the following approach:

Let the number of elements be N, and the numbers be arr[0], ..., arr[N-1].

You need to define 2 variables:

mean and remainder.

Initially mean = 0 and remainder = 0.

At step i, update mean and remainder as follows:

mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;

After N steps, mean holds the integer part of the answer and remainder / N is the fractional part (I am not sure you need it, but anyway). Since remainder always stays below N, nothing here grows beyond the size of the largest input.
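The steps above can be sketched in C# roughly as follows. This is a minimal illustration, assuming the values are available as an `IList<long>` so that N is known up front; the class name and the `out` parameter are hypothetical choices, not part of the answer.

```csharp
using System;
using System.Collections.Generic;

public static class QuotientRemainderAverage
{
    // Returns the integer part of the mean. The fractional part is
    // remainder / values.Count. Invariant after each step:
    // mean * n + remainder == sum of the values processed so far.
    public static long Average(IList<long> values, out long remainder)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (values.Count == 0) throw new InvalidOperationException("Sequence contains no elements");

        long n = values.Count;
        long mean = 0;
        remainder = 0;
        foreach (long value in values)
        {
            mean += value / n;       // whole part of this item's contribution
            remainder += value % n;  // leftover, smaller than n in magnitude
            mean += remainder / n;   // carry whole units out of the leftover
            remainder %= n;
        }
        return mean;
    }
}
```

With the question's three values, the method returns `long.MaxValue - 200` with a remainder of 0; for `{ 1, 2 }` it returns 1 with a remainder of 1, that is, 1 + 1/2.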

Plentiful answered 24/5, 2010 at 11:5 Comment(0)
N
2

If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.

// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31, 28, 24, 32, 36, 29 }; // Average: 30

List<int> diffs = new List<int>();

// This can probably be done more effectively in LINQ, but to show the idea:
int first = numbers.First();
foreach (int number in numbers.Skip(1))
{
    diffs.Add(number - first);
}
// diffs now contains { -3, -7, 1, 5, -2 }

// Divide by numbers.Count, not diffs.Count: the first value contributes a diff of 0.
var avgDiff = diffs.Sum() / numbers.Count; // -6 / 6 = -1

// To get the average value, just add the average diff to the first value:
var totalAverage = first + avgDiff; // 31 + (-1) = 30

You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.
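As a rough sketch of such a reusable method for `long` values: the names below are hypothetical, and the code assumes, as stated above, that the differences from the first value (and their sum) fit in a `long`. The `checked` expression makes the method throw an `OverflowException` instead of silently wrapping when that assumption does not hold.

```csharp
using System;
using System.Collections.Generic;

public static class OffsetAverage
{
    // Averages the differences from the first element and adds the result
    // back to it. Returns the mean truncated toward the first element's side
    // (integer division of the summed differences).
    public static long AverageFromFirst(IEnumerable<long> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        bool any = false;
        long first = 0;
        long diffSum = 0;
        long count = 0;
        foreach (long value in source)
        {
            if (!any)
            {
                first = value;
                any = true;
            }
            // Throws if a difference, or the running sum of differences, overflows.
            diffSum = checked(diffSum + (value - first));
            count++;
        }
        if (!any) throw new InvalidOperationException("Sequence contains no elements");

        return first + diffSum / count;
    }
}
```

For the question's values the differences are 0, -100 and -200, so the result is `long.MaxValue - 100 + (-300 / 3)`, that is, `long.MaxValue - 200`.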

Noreen answered 24/5, 2010 at 8:11 Comment(2)
If you're unlucky to have a list {long.MaxValue, long.MinValue+100, ... }, it still goes awry. But your idea seems nice.Swift
@Swift - for this to work I explicitly assumed that no two numbers should be more than long.MaxValue apart.Noreen
H
2

Here is how I would approach this problem. First, let's define a very simple RationalNumber class, which contains two properties, Dividend and Divisor, and an operator for adding two rational numbers. Here is how it looks:

public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }


    public static RationalNumber operator +( RationalNumber c1, RationalNumber c2 )
    {
        RationalNumber result = new RationalNumber();

        Int64 nDividend = ( c1.Dividend * c2.Divisor ) + ( c2.Dividend * c1.Divisor );
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nReminder = nDividend % nDivisor;

        if ( nReminder == 0 )
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor( nDividend, nDivisor );

            if ( nGreatestCommonDivisor != 0 )
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }

            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }

        return result;
    }


    private static Int64 FindGreatestCommonDivisor( Int64 a, Int64 b)
    {
        Int64 nRemainder;

        while ( b != 0 )
        {
            nRemainder = a % b;
            a = b;
            b = nRemainder;
        }

        return a;
    }


    // In a / b, a is the dividend and b is the divisor
    public Int64 Dividend   { get; set; }
    public Int64 Divisor    { get; set; }
}

The second part is really easy. Let's say we have an array of numbers. Their average is Sum(Numbers)/Length(Numbers), which is the same as Number[ 0 ] / Length + Number[ 1 ] / Length + ... + Number[ n ] / Length. To calculate this, we represent each Number[ i ] / Length as a whole number plus a rational part (the remainder over Length). Here is how it looks:

Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };

List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;

for ( Int32 i = 0; i < aValues.Length; ++i )
{
    Int64 nReminder = aValues[ i ] % aValues.Length;
    Int64 nWhole = aValues[ i ] / aValues.Length;

    nAverage += nWhole;

    if ( nReminder != 0 )
    {
        list.Add( new RationalNumber() { Dividend = nReminder, Divisor = aValues.Length } );
    }
}

RationalNumber rationalTotal = new RationalNumber();

foreach ( var rational in list )
{
    rationalTotal += rational;
}

nAverage = nAverage + ( rationalTotal.Dividend / rationalTotal.Divisor );

At the end we have a list of rational numbers and a whole number, which we sum together to get the average of the sequence without overflowing on the sum of the values. The same approach can be taken for any integer type, and there is no loss of precision in the whole part.

EDIT:

Why this works:

Define A as a set of numbers.

By definition, Average( A ) = SUM( A ) / LEN( A ), so

Average( A ) = A[ 0 ] / LEN( A ) + A[ 1 ] / LEN( A ) + A[ 2 ] / LEN( A ) + ..... + A[ N ] / LEN( A ).

Write each term as An = A[ n ] / LEN( A ) = Xn + ( Yn / LEN( A ) ). This is always possible, because dividing A by B gives a whole number X plus a remainder Y, that is, the rational number ( Y / B ).

So

Average( A ) = A1 + A2 + A3 + ... + AN = ( X1 + X2 + X3 + X4 + ... ) + ( Y1 + Y2 + ... ) / LEN( A );

Sum the whole parts, and sum the remainders by keeping them in rational number form. In the end we get one whole number and one rational, which summed together give Average( A ). Depending on what precision you'd like, you apply this only to the rational number at the end.

Hierarchize answered 24/5, 2010 at 9:25 Comment(2)
You are using misleading names (ComplexNumber? where's the real and imaginary parts?! - you probably meant RationalNumber - left and right for a GCD function?!). You are using modulos, divisions and the GCD algorithm during addition so I don't understand how this is faster than @Programming Hero's solution. You're not exactly clear about how and why it works either. -1.Nigelniger
I take your criticism and will update my answer. I rechecked my code for testing the speed. My mistake. I will correct my comment.Hierarchize
D
2

Simple answer with LINQ...

var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();

Depending on the size of the data set, you may want to force data through .ToList() or .ToArray() before running this, so that Count() is not re-evaluated by enumerating the source on each pass. (Or you can call Count() once and store the result before the .Select(..).Sum().)

Didi answered 24/5, 2010 at 10:56 Comment(0)
T
1

If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero'), you can calculate the average of their distances from long.MaxValue; the average of the numbers is then long.MaxValue less that.

However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...

Tojo answered 24/5, 2010 at 8:5 Comment(1)
This is about the same as my approach, but yours will fail for any negative number.Noreen
J
1

I guess there has to be a compromise somewhere. If the numbers are really getting that large, then a few low-order digits (say, the lowest 5 digits) might not affect the result much.

Another issue is when you don't know the size of the incoming dataset, especially in streaming/real-time cases. Here I don't see any solution other than (previousAverage*oldCount + newValue) / (oldCount <- oldCount+1)


Here's a suggestion:

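// Sketch only: double is one possible choice for the placeholder types.
const long Scale = 100000;    // drop the lowest five decimal digits
static double currentAverage; // average of the scaled-down values
static long count;

static void AddToCurrentAverage(long value)
{
    double newValue = (double)(value / Scale);
    count = count + 1;
    currentAverage = (currentAverage * (count - 1) + newValue) / count;
}

static double GetCurrentAverage()
{
    return currentAverage * Scale;
}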
Jackelynjackeroo answered 6/1, 2011 at 9:5 Comment(2)
PS: Based on the principle: If a + b = c then a/n + b/n = c/nJackelynjackeroo
Sorry, wiki's got a better one. Check en.wikipedia.org/wiki/Moving_average. Check formula at end of section "Cumulative moving average".Jackelynjackeroo
C
1

Averaging numbers of a specific numeric type in a safe way while also only using that numeric type is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).

Once you have this sum you then need to calculate sum/total to get the average, which you can do (although I wouldn't recommend it) by creating and then incrementing by total another instance of Int32WithBoundedRollover. After each increment you can compare it to the sum until you find out the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.

That being said, the current implementation isn't built for this (for instance, there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance-wise this doesn't matter too much for large averages, since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).

As far as your original question which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger only method the method that I produced is around 80% faster for the large (as in total number of data points) samples that I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly due to the difference between the int32 operations being done in hardware instead of software as the BigInteger operations are.

Corsiglia answered 30/3, 2014 at 3:4 Comment(1)
Nice project, I'll dive into it when I can.Microphysics
A
0

How about BigInteger in Visual J#?

Adventuresome answered 24/5, 2010 at 8:3 Comment(0)
H
0

If you're willing to sacrifice precision, you could do something like:

long num2 = 0L;
foreach (long num3 in source)
{
    num2 += 1L;
}
if (num2 <= 0L)
{
    throw new InvalidOperationException("Sequence contains no elements"); // Error.NoElements is internal to System.Linq
}
double average = 0;
foreach (long num3 in source)
{
    average += (double)num3 / (double)num2;
}
return average;
Hathorn answered 24/5, 2010 at 8:9 Comment(0)
S
0

Perhaps you can reduce every item by dividing it by the number of elements, calculate the average of the adjusted values, and then multiply that by the number of elements in the collection. However, this changes the number of floating-point operations involved, and the integer division discards each item's remainder.

var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
Sialoid answered 24/5, 2010 at 8:12 Comment(0)
C
0

You could keep a rolling average which you update once for each large number.

Colet answered 24/5, 2010 at 8:13 Comment(0)
T
0

Use the IntX library on CodePlex.

Trappings answered 24/5, 2010 at 8:29 Comment(0)
U
0

NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1)
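A minimal sketch of this update rule as a streaming accumulator, assuming a `double` average is acceptable (so for values near `long.MaxValue` the result is only accurate to about 15 or 16 significant digits). The class and member names are hypothetical.

```csharp
using System;

public sealed class RunningMean
{
    private double mean;
    private long count;

    // Applies NextAverage = CurrentAverage + (NewValue - CurrentAverage) / (CurrentObservations + 1).
    // The running value never exceeds the range of the inputs, so the sum cannot overflow.
    public void Add(long value)
    {
        count++;
        mean += (value - mean) / count;
    }

    public long Count
    {
        get { return count; }
    }

    public double Mean
    {
        get
        {
            if (count == 0) throw new InvalidOperationException("No values have been added");
            return mean;
        }
    }
}
```

The count does not need to be known in advance: call `Add` for each incoming value and read `Mean` whenever a current average is needed.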

Unhook answered 26/2, 2013 at 0:58 Comment(0)
L
0

Here is my version of an extension method that can help with this.

    public static long Average(this IEnumerable<long> longs)
    {
        long mean = 0;
        long count = longs.Count();
        foreach (var val in longs)
        {
            mean += val / count;
        }
        return mean;
    }
Lepto answered 3/4, 2013 at 15:42 Comment(1)
Thanks for posting your answer. However, this isn't actually an answer to the question asked. Answers on Stack Overflow are expected to be directly related to the question that is being asked. With a little bit of editing, though, it could be appropriate.Malmo
D
0

Let Avg(n) be the average in first n number, and data[n] is the nth number.

Avg(n)=(double)(n-1)/(double)n*Avg(n-1)+(double)data[n]/(double)n

This avoids overflow, but it loses precision when n is very large.

Deception answered 17/9, 2013 at 3:43 Comment(0)
D
0

For two positive numbers (or two negative numbers), I found a very elegant solution from here,

where an average computation of (a+b)/2 can be replaced with a+((b-a)/2).
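In C# this might look like the sketch below. It assumes, as stated above, that both values have the same sign, so that `b - a` cannot overflow; the `checked` expression turns a violation of that assumption into an `OverflowException`. The names are hypothetical.

```csharp
using System;

public static class PairAverage
{
    // Average of two longs without forming a + b.
    // Integer division truncates, so the result is rounded toward a.
    public static long Midpoint(long a, long b)
    {
        return checked(a + (b - a) / 2);
    }
}
```

For example, `PairAverage.Midpoint(long.MaxValue - 100, long.MaxValue - 300)` gives `long.MaxValue - 200`, whereas `(a + b) / 2` would overflow for the same inputs.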

Drona answered 22/11, 2019 at 19:7 Comment(0)
