LINQ and `IEnumerable<T>` are pull-based. This means that the predicates and actions that are part of a LINQ statement are in general not executed until values are pulled. Furthermore, the predicates and actions will execute each time values are pulled (i.e. there is no secret caching going on).

Pulling from an `IEnumerable<T>` is done by the `foreach` statement, which really is syntactic sugar for getting an enumerator by calling `IEnumerable<T>.GetEnumerator()` and repeatedly calling `IEnumerator<T>.MoveNext()` to pull the values.
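To make this concrete, a loop like `foreach (var n in numbers) { ... }` expands to roughly the following (the real compiler output uses a try/finally to dispose the enumerator, which the `using` statement stands in for here):

```csharp
using System;
using System.Collections.Generic;

class ForeachDesugared
{
    static void Main ()
    {
        IEnumerable<int> numbers = new[] { 1, 2, 3 };

        using (IEnumerator<int> e = numbers.GetEnumerator ())
        {
            while (e.MoveNext ())       // each MoveNext() pulls one value
            {
                int n = e.Current;
                Console.WriteLine (n);
            }
        }
    }
}
```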
LINQ operators like `ToList()`, `ToArray()`, `ToDictionary()` and `ToLookup()` wrap a `foreach` statement, so these methods will do a pull. The same can be said about operators like `Aggregate()`, `Count()` and `First()`. What these methods have in common is that they produce a single result that has to be created by executing a `foreach` statement.
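A small demonstration (not from your code) that nothing executes until one of these methods pulls:

```csharp
using System;
using System.Linq;

class PullDemo
{
    static void Main ()
    {
        var source = Enumerable.Range (1, 3)
            .Select (n => { Console.WriteLine ("projecting " + n); return n * n; });

        Console.WriteLine ("Nothing has executed yet.");
        var list = source.ToList ();    // the pull happens here; only now does "projecting" print
    }
}
```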
Many LINQ operators produce a new `IEnumerable<T>` sequence. When an element is pulled from the resulting sequence, the operator pulls one or more elements from the source sequence. The `Select()` operator is the most obvious example, but others are `SelectMany()`, `Where()`, `Concat()`, `Union()`, `Distinct()`, `Skip()` and `Take()`. These operators don't do any caching. When the Nth element is pulled from a `Select()`, it pulls the Nth element from the source sequence, applies the projection using the supplied action and returns it. Nothing secret going on here.
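You can observe the lack of caching by iterating the same `Select()` twice; the projection runs again for every element:

```csharp
using System;
using System.Linq;

class NoCachingDemo
{
    static void Main ()
    {
        var squares = Enumerable.Range (1, 2)
            .Select (n => { Console.WriteLine ("projecting " + n); return n * n; });

        foreach (var s in squares) { }  // prints "projecting 1", "projecting 2"
        foreach (var s in squares) { }  // prints both again: no cached results
    }
}
```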
Other LINQ operators also produce new `IEnumerable<T>` sequences, but they are implemented by actually pulling the entire source sequence, doing their job and then producing a new sequence. These methods include `Reverse()`, `OrderBy()` and `GroupBy()`. However, the pull done by the operator is only performed when the operator itself is pulled, meaning that you still need a `foreach` loop "at the end" of the LINQ statement before anything is executed. You could argue that these operators use a cache because they immediately pull the entire source sequence. However, that cache is rebuilt each time the operator is iterated, so it is really an implementation detail and not something that will magically detect that you are applying the same `OrderBy()` operation multiple times to the same sequence.
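A sketch of that behavior: the source below is pulled in full on the first iteration of the `OrderByDescending()`, and pulled in full again on the second, because the internal buffer is rebuilt each time:

```csharp
using System;
using System.Linq;

class OrderByDemo
{
    static void Main ()
    {
        var ordered = Enumerable.Range (1, 3)
            .Select (n => { Console.WriteLine ("pulling " + n); return n; })
            .OrderByDescending (n => n);

        Console.WriteLine ("Still nothing executed.");
        foreach (var n in ordered) { }  // buffers and sorts the entire source
        foreach (var n in ordered) { }  // buffers and sorts it all over again
    }
}
```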
In your example the `ToList()` will do a pull. The action in the outer `Select` will execute 100 times. Each time this action is executed, the `Aggregate()` will do another pull that will parse the XML attributes. In total your code will call `Int32.Parse()` 200 times.
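(For reference, the code being described presumably looked something like this; a reconstruction from the numbers above, not your exact original:)

```csharp
var x = XElement.Parse (@"
<ROOT>
  <MUL v='2' />
  <MUL v='3' />
</ROOT>
");

Enumerable.Range (1, 100)
  .Select (s => x.Elements ()
    .Aggregate (s, (t, u) => t * Int32.Parse (u.Attribute ("v").Value)))
  .ToList ()
  .ForEach (s => Console.WriteLine (s));
```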
You can improve this by pulling the attributes once instead of on each iteration:
```csharp
var x = XElement.Parse (@"
<ROOT>
  <MUL v='2' />
  <MUL v='3' />
</ROOT>
")
  .Elements ()
  .Select (t => Int32.Parse (t.Attribute ("v").Value))
  .ToList ();

Enumerable.Range (1, 100)
  .Select (s => x.Aggregate (s, (t, u) => t * u))
  .ToList ()
  .ForEach (s => Console.WriteLine (s));
```
Now `Int32.Parse()` is only called 2 times. However, the cost is that a list of attribute values has to be allocated, stored and eventually garbage collected. (Not a big concern when the list contains two elements.)

Note that if you forget the first `ToList()` that pulls the attributes, the code will still run, but with the exact same performance characteristics as the original code: no space is used to store the attributes, but they are parsed on each iteration.