LINQ Aggregate algorithm explained
Asked Answered
C

12

847

This might sound lame, but I have not been able to find a really good explanation of Aggregate.

Good means short, descriptive, comprehensive with a small and clear example.

Coppola answered 18/8, 2011 at 9:51 Comment(0)
M
1176

The easiest-to-understand definition of Aggregate is that it performs an operation on each element of the list taking into account the operations that have gone before. That is to say it performs the action on the first and second element and carries the result forward. Then it operates on the previous result and the third element and carries forward. etc.

Example 1. Summing numbers

var nums = new[]{1,2,3,4};
var sum = nums.Aggregate( (a,b) => a + b);
Console.WriteLine(sum); // output: 10 (1+2+3+4)

This adds 1 and 2 to make 3. Then adds 3 (result of previous) and 3 (next element in sequence) to make 6. Then adds 6 and 4 to make 10.

Example 2. create a csv from an array of strings

var chars = new []{"a","b","c","d"};
var csv = chars.Aggregate( (a,b) => a + ',' + b);
Console.WriteLine(csv); // Output a,b,c,d

This works in much the same way. Concatenate a a comma and b to make a,b. Then concatenates a,b with a comma and c to make a,b,c. and so on.

Example 3. Multiplying numbers using a seed

For completeness, there is an overload of Aggregate which takes a seed value.

var multipliers = new []{10,20,30,40};
var multiplied = multipliers.Aggregate(5, (a,b) => a * b);
Console.WriteLine(multiplied); //Output 1200000 ((((5*10)*20)*30)*40)

Much like the above examples, this starts with a value of 5 and multiplies it by the first element of the sequence 10 giving a result of 50. This result is carried forward and multiplied by the next number in the sequence 20 to give a result of 1000. This continues through the remaining 2 element of the sequence.

Live examples: http://rextester.com/ZXZ64749
Docs: http://msdn.microsoft.com/en-us/library/bb548651.aspx


Addendum

Example 2, above, uses string concatenation to create a list of values separated by a comma. This is a simplistic way to explain the use of Aggregate which was the intention of this answer. However, if using this technique to actually create a large amount of comma separated data, it would be more appropriate to use a StringBuilder, and this is entirely compatible with Aggregate using the seeded overload to initiate the StringBuilder.

var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate(new StringBuilder(), (a,b) => {
    if(a.Length>0)
        a.Append(",");
    a.Append(b);
    return a;
});
Console.WriteLine(csv);

Updated example: http://rextester.com/YZCVXV6464

Mainstay answered 18/8, 2011 at 9:59 Comment(9)
Another explanation for the first description is that the function you provide always combines the first two members until the array is shrinked to one element. So [1,2,3,4] will be [3,3,4] then [6,4] and at last [10]. But instead of returning an array of a single value you just get the value itself.Chippy
Can I early break/exit from an Aggregate function? For example, chars.Aggregate((a, b) => {if(a == 'a') break the whole aggregate else return a + ', ' + b})Unconventionality
@JeffTian - I would suggest chaining a TakeWhile then an Aggregate - thats the beatuty of Enumerable extensions - they are easily chainable. So you end up with TakeWhile(a => a == 'a').Aggregate(....). See this example: rextester.com/WPRA60543Mainstay
Two suggestions: show how seed could be use to do concatenation without the quadratic behaviour of your example, and explain the asymmetric roles of the two arguments (the first is the accumulator, the second is the value).Hasa
@Clément - do you mean sead with a StringBuilder? and for your second point, I think i've done that - but if you think it could be improved feel free to edit :)Mainstay
As a sidenote on the addendum, the whole block could easily be replaced by var csv = string.Join(",", chars) (no need of aggregate or stringbuilders) - but yeah, I know the point of the answer was to give example usage of aggregate so it's cool. But I still wanted to mention it's not recommended for just joining strings, there's already a method dedicated for that....Theater
another common usage (so far the only i have even seen in production code) is to get min or max items like var biggestAccount = Accounts.Aggregate((a1, a2) => a1.Amount >= a2.Amount ? a1 : a2);Quinquennial
Feels like aggregate in .net is reduce in javascriptGustation
@Gustation that's exactly what it is, yes.Mainstay
P
150

It partly depends on which overload you're talking about, but the basic idea is:

  • Start with a seed as the "current value"
  • Iterate over the sequence. For each value in the sequence:
    • Apply a user-specified function to transform (currentValue, sequenceValue) into (nextValue)
    • Set currentValue = nextValue
  • Return the final currentValue

You may find the Aggregate post in my Edulinq series useful - it includes a more detailed description (including the various overloads) and implementations.

One simple example is using Aggregate as an alternative to Count:

// 0 is the seed, and for each item, we effectively increment the current value.
// In this case we can ignore "item" itself.
int count = sequence.Aggregate(0, (current, item) => current + 1);

Or perhaps summing all the lengths of strings in a sequence of strings:

int total = sequence.Aggregate(0, (current, item) => current + item.Length);

Personally I rarely find Aggregate useful - the "tailored" aggregation methods are usually good enough for me.

Portuguese answered 18/8, 2011 at 9:54 Comment(3)
@Jon Are there asynchronous variations of Aggregate that split the items into a tree so that work can be split between cores? It seems the design of the method is consistent with the concepts of "reduce" or "fold", but I don't know if it really is doing that under the hood, or simply iterating through the list of items.Agonizing
@Jon : the edulink mentioned above is not working can you redirect me to the right link. And can you please be more specific about the term "tailored" aggregation functions that you used in your answer.Nitrification
@Koushik: I've fixed the link in the post. By "tailored" aggregation functions I mean things like Max/Min/Count/Sum.Portuguese
P
65

Super short Aggregate works like fold in Haskell/ML/F#.

Slightly longer .Max(), .Min(), .Sum(), .Average() all iterates over the elements in a sequence and aggregates them using the respective aggregate function. .Aggregate () is generalized aggregator in that it allows the developer to specify the start state (aka seed) and the aggregate function.

I know you asked for a short explaination but I figured as others gave a couple of short answers I figured you would perhaps be interested in a slightly longer one

Long version with code One way to illustrate what does it could be show how you implement Sample Standard Deviation once using foreach and once using .Aggregate. Note: I haven't prioritized performance here so I iterate several times over the colleciton unnecessarily

First a helper function used to create a sum of quadratic distances:

static double SumOfQuadraticDistance (double average, int value, double state)
{
    var diff = (value - average);
    return state + diff * diff;
}

Then Sample Standard Deviation using ForEach:

static double SampleStandardDeviation_ForEach (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var state = seed;
    foreach (var value in ints)
    {
        state = SumOfQuadraticDistance (average, value, state);
    }
    var sumOfQuadraticDistance = state;

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Then once using .Aggregate:

static double SampleStandardDeviation_Aggregate (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var sumOfQuadraticDistance = ints
        .Aggregate (
            seed,
            (state, value) => SumOfQuadraticDistance (average, value, state)
            );

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Note that these functions are identical except for how sumOfQuadraticDistance is calculated:

var state = seed;
foreach (var value in ints)
{
    state = SumOfQuadraticDistance (average, value, state);
}
var sumOfQuadraticDistance = state;

Versus:

var sumOfQuadraticDistance = ints
    .Aggregate (
        seed,
        (state, value) => SumOfQuadraticDistance (average, value, state)
        );

So what .Aggregate does is that it encapsulates this aggregator pattern and I expect that the implementation of .Aggregate would look something like this:

public static TAggregate Aggregate<TAggregate, TValue> (
    this IEnumerable<TValue> values,
    TAggregate seed,
    Func<TAggregate, TValue, TAggregate> aggregator
    )
{
    var state = seed;

    foreach (var value in values)
    {
        state = aggregator (state, value);
    }

    return state;
}

Using the Standard deviation functions would look something like this:

var ints = new[] {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
var average = ints.Average ();
var sampleStandardDeviation = ints.SampleStandardDeviation_Aggregate ();
var sampleStandardDeviation2 = ints.SampleStandardDeviation_ForEach ();

Console.WriteLine (average);
Console.WriteLine (sampleStandardDeviation);
Console.WriteLine (sampleStandardDeviation2);

IMHO

So does .Aggregate help readability? In general I love LINQ because I think .Where, .Select, .OrderBy and so on greatly helps readability (if you avoid inlined hierarhical .Selects). Aggregate has to be in Linq for completeness reasons but personally I am not so convinced that .Aggregate adds readability compared to a well written foreach.

Pires answered 18/8, 2011 at 10:39 Comment(2)
+1 Excellent! But extension methods SampleStandardDeviation_Aggregate() and SampleStandardDeviation_ForEach() cannot be private (by default in absence of an access qualifier), so should have been accrued by either public or internal, it seems to meVoltz
FYI: If I remember correctly the extension methods in my sample was part of the same class that used them ==> private works in this case.Pires
F
57

A picture is worth a thousand words

Reminder:
Func<X, Y, R> is a function with two inputs of type X and Y, that returns a result of type R.

Enumerable.Aggregate has three overloads:

Overload 1:

A Aggregate<A>(this IEnumerable<A> a, Func<A, A, A> f)

Aggregate1

Example:

new[]{1,2,3,4}.Aggregate((x, y) => x + y);  // 10

This overload is simple, but it has the following limitations:

  • the sequence must contain at least one element,
    otherwise the function will throw an InvalidOperationException.
  • elements and result must be of the same type.


Overload 2:

B Aggregate<A, B>(this IEnumerable<A> a, B bIn, Func<B, A, B> f)

Aggregate2

Example:

var hayStack = new[] {"straw", "needle", "straw", "straw", "needle"};
var nNeedles = hayStack.Aggregate(0, (n, e) => e == "needle" ? n+1 : n);  // 2

This overload is more general:

  • a seed value must be provided (bIn).
  • the collection can be empty,
    in this case, the function will yield the seed value as result.
  • elements and result can have different types.


Overload 3:

C Aggregate<A,B,C>(this IEnumerable<A> a, B bIn, Func<B,A,B> f, Func<B,C> f2)

The third overload is not very useful IMO.
The same can be written more succinctly by using overload 2 followed by a function that transforms its result.


The illustrations are adapted from this excellent blogpost.

Furlough answered 13/4, 2017 at 12:10 Comment(3)
This would be a great answer.... on a question about Haskel. But there is no overload of Aggegate in .net which takes a Func<T, T, T>.Mainstay
Yes there is. You use it in your own answer!Furlough
Upvoting because you carefully describe what happens when the sequence is empty. Let N be the number of elements in the source. We observe that the overload that does not take a seed, applies the accumulator function N-1 times; while the other overloads (that do take a seed) apply the accumulator function N times.Ditheism
L
18

Aggregate is basically used to Group or Sum up data.

According to MSDN "Aggregate Function Applies an accumulator function over a sequence."

Example 1: Add all the numbers in a array.

int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate((total, nextValue) => total + nextValue);

*important: The initial aggregate value by default is the 1 element in the sequence of collection. i.e: the total variable initial value will be 1 by default.

variable explanation

total: it will hold the sum up value(aggregated value) returned by the func.

nextValue: it is the next value in the array sequence. This value is than added to the aggregated value i.e total.

Example 2: Add all items in an array. Also set the initial accumulator value to start adding with from 10.

int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate(10, (total, nextValue) => total + nextValue);

arguments explanation:

the first argument is the initial(starting value i.e seed value) which will be used to start addition with the next value in the array.

the second argument is a func which is a func that takes 2 int.

1.total: this will hold same as before the sum up value(aggregated value) returned by the func after the calculation.

2.nextValue: : it is the next value in the array sequence. This value is than added to the aggregated value i.e total.

Also debugging this code will give you a better understanding of how aggregate work.

Lemnos answered 19/9, 2014 at 1:11 Comment(0)
N
11

In addition to all the great answers here already, I've also used it to walk an item through a series of transformation steps.

If a transformation is implemented as a Func<T,T>, you can add several transformations to a List<Func<T,T>> and use Aggregate to walk an instance of T through each step.

A more concrete example

You want to take a string value, and walk it through a series of text transformations that could be built programatically.

var transformationPipeLine = new List<Func<string, string>>();
transformationPipeLine.Add((input) => input.Trim());
transformationPipeLine.Add((input) => input.Substring(1));
transformationPipeLine.Add((input) => input.Substring(0, input.Length - 1));
transformationPipeLine.Add((input) => input.ToUpper());

var text = "    cat   ";
var output = transformationPipeLine.Aggregate(text, (input, transform)=> transform(input));
Console.WriteLine(output);

This will create a chain of transformations: Remove leading and trailing spaces -> remove first character -> remove last character -> convert to upper-case. Steps in this chain can be added, removed, or reordered as needed, to create whatever kind of transformation pipeline is required.

The end result of this specific pipeline, is that " cat " becomes "A".


This can become very powerful once you realize that T can be anything. This could be used for image transformations, like filters, using BitMap as an example;

Nanine answered 8/9, 2017 at 20:1 Comment(0)
P
7

Learned a lot from Jamiec's answer.

If the only need is to generate CSV string, you may try this.

var csv3 = string.Join(",",chars);

Here is a test with 1 million strings

0.28 seconds = Aggregate w/ String Builder 
0.30 seconds = String.Join 

Source code is here

Piddock answered 3/9, 2015 at 20:43 Comment(3)
When I ran the same code in dotnetfiddle.net as provided in the link, I got "Fatal Error: Memory usage limit was exceeded" for "string.Join" but Aggregate always worked as expected. So I believe this is not recommended to use String.JoinMillstone
Strange? When I commented First stop watch which was for Aggregate; then I am not getting any "Fatal Error: Memory usage limit was exceeded". Please explain! Link : dotnetfiddle.net/6YyumSMillstone
dotnetfiddle.net has a memory limit, when reaching execution stop. if you move aggregate code before String.Join code, you may get error for aggregate.Piddock
N
7

Definition

Aggregate method is an extension method for generic collections. Aggregate method applies a function to each item of a collection. Not just only applies a function, but takes its result as initial value for the next iteration. So, as a result, we will get a computed value (min, max, avg, or other statistical value) from a collection.

Therefore, Aggregate method is a form of safe implementation of a recursive function.

Safe, because the recursion will iterate over each item of a collection and we can’t get any infinite loop suspension by wrong exit condition. Recursive, because the current function’s result is used as a parameter for the next function call.

Syntax:

collection.Aggregate(seed, func, resultSelector);
  • seed - initial value by default;
  • func - our recursive function. It can be a lambda-expression, a Func delegate or a function type T F(T result, T nextValue);
  • resultSelector - it can be a function like func or an expression to compute, transform, change, convert the final result.

How it works:

var nums = new[]{1, 2};
var result = nums.Aggregate(1, (result, n) => result + n); //result = (1 + 1) + 2 = 4
var result2 = nums.Aggregate(0, (result, n) => result + n, response => (decimal)response/2.0); //result2 = ((0 + 1) + 2)*1.0/2.0 = 3*1.0/2.0 = 3.0/2.0 = 1.5

Practical usage:

  1. Find Factorial from a number n:

int n = 7;
var numbers = Enumerable.Range(1, n);
var factorial = numbers.Aggregate((result, x) => result * x);

which is doing the same thing as this function:

public static int Factorial(int n)
{
   if (n < 1) return 1;

   return n * Factorial(n - 1);
}
  1. Aggregate() is one of the most powerful LINQ extension method, like Select() and Where(). We can use it to replace the Sum(), Min(). Max(), Avg() functionality, or to change it by implementing addition context:
    var numbers = new[]{3, 2, 6, 4, 9, 5, 7};
    var avg = numbers.Aggregate(0.0, (result, x) => result + x, response => (double)response/(double)numbers.Count());
    var min = numbers.Aggregate((result, x) => (result < x)? result: x);
  1. More complex usage of extension methods:
    var path = @“c:\path-to-folder”;

    string[] txtFiles = Directory.GetFiles(path).Where(f => f.EndsWith(“.txt”)).ToArray<string>();
    var output = txtFiles.Select(f => File.ReadAllText(f, Encoding.Default)).Aggregate<string>((result, content) => result + content);

    File.WriteAllText(path + “summary.txt”, output, Encoding.Default);

    Console.WriteLine(“Text files merged into: {0}”, output); //or other log info
Neral answered 11/10, 2019 at 8:46 Comment(1)
Pretty good first answer. Well done! Shame it's such an old question or you would have got lots of upvotesMainstay
W
1

This is an explanation about using Aggregate on a Fluent API such as Linq Sorting.

var list = new List<Student>();
var sorted = list
    .OrderBy(s => s.LastName)
    .ThenBy(s => s.FirstName)
    .ThenBy(s => s.Age)
    .ThenBy(s => s.Grading)
    .ThenBy(s => s.TotalCourses);

and lets see we want to implement a sort function that take a set of fields, this is very easy using Aggregate instead of a for-loop, like this:

public static IOrderedEnumerable<Student> MySort(
    this List<Student> list,
    params Func<Student, object>[] fields)
{
    var firstField = fields.First();
    var otherFields = fields.Skip(1);

    var init = list.OrderBy(firstField);
    return otherFields.Skip(1).Aggregate(init, (resultList, current) => resultList.ThenBy(current));
}

And we can use it like this:

var sorted = list.MySort(
    s => s.LastName,
    s => s.FirstName,
    s => s.Age,
    s => s.Grading,
    s => s.TotalCourses);
Wincer answered 17/9, 2016 at 17:41 Comment(0)
B
1

Aggregate used to sum columns in a multi dimensional integer array

        int[][] nonMagicSquare =
        {
            new int[] {  3,  1,  7,  8 },
            new int[] {  2,  4, 16,  5 },
            new int[] { 11,  6, 12, 15 },
            new int[] {  9, 13, 10, 14 }
        };

        IEnumerable<int> rowSums = nonMagicSquare
            .Select(row => row.Sum());
        IEnumerable<int> colSums = nonMagicSquare
            .Aggregate(
                (priorSums, currentRow) =>
                    priorSums.Select((priorSum, index) => priorSum + currentRow[index]).ToArray()
                );

Select with index is used within the Aggregate func to sum the matching columns and return a new Array; { 3 + 2 = 5, 1 + 4 = 5, 7 + 16 = 23, 8 + 5 = 13 }.

        Console.WriteLine("rowSums: " + string.Join(", ", rowSums)); // rowSums: 19, 27, 44, 46
        Console.WriteLine("colSums: " + string.Join(", ", colSums)); // colSums: 25, 24, 45, 42

But counting the number of trues in a Boolean array is more difficult since the accumulated type (int) differs from the source type (bool); here a seed is necessary in order to use the second overload.

        bool[][] booleanTable =
        {
            new bool[] { true, true, true, false },
            new bool[] { false, false, false, true },
            new bool[] { true, false, false, true },
            new bool[] { true, true, false, false }
        };

        IEnumerable<int> rowCounts = booleanTable
            .Select(row => row.Select(value => value ? 1 : 0).Sum());
        IEnumerable<int> seed = new int[booleanTable.First().Length];
        IEnumerable<int> colCounts = booleanTable
            .Aggregate(seed,
                (priorSums, currentRow) =>
                    priorSums.Select((priorSum, index) => priorSum + (currentRow[index] ? 1 : 0)).ToArray()
                );

        Console.WriteLine("rowCounts: " + string.Join(", ", rowCounts)); // rowCounts: 3, 1, 2, 2
        Console.WriteLine("colCounts: " + string.Join(", ", colCounts)); // colCounts: 3, 2, 1, 2
Brunn answered 28/2, 2017 at 20:16 Comment(0)
E
1

Everyone has given his explanation. My explanation is like that.

Aggregate method applies a function to each item of a collection. For example, let's have collection { 6, 2, 8, 3 } and the function Add (operator +) it does (((6+2)+8)+3) and returns 19

var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: (result, item) => result + item);
// sum: (((6+2)+8)+3) = 19

In this example there is passed named method Add instead of lambda expression.

var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: Add);
// sum: (((6+2)+8)+3) = 19

private static int Add(int x, int y) { return x + y; }
Emend answered 16/11, 2017 at 13:49 Comment(0)
P
0

A short and essential definition might be this: Linq Aggregate extension method allows to declare a sort of recursive function applied on the elements of a list, the operands of whom are two: the elements in the order in which they are present into the list, one element at a time, and the result of the previous recursive iteration or nothing if not yet recursion.

In this way you can compute the factorial of numbers, or concatenate strings.

Particiaparticipant answered 12/7, 2016 at 8:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.