LINQ and `IEnumerable<T>` are pull-based. This means that the predicates and actions that are part of a LINQ statement are in general not executed until values are pulled. Furthermore, the predicates and actions will execute each time values are pulled (i.e. there is no secret caching going on).

Pulling from an `IEnumerable<T>` is done by the `foreach` statement, which really is syntactic sugar for getting an enumerator by calling `IEnumerable<T>.GetEnumerator()` and repeatedly calling `IEnumerator<T>.MoveNext()` to pull the values.
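To make this concrete, a loop like `foreach (var n in numbers) { ... }` expands to roughly the following (the real compiler output uses a try/finally to dispose the enumerator, which the `using` statement stands in for here):

```csharp
using System;
using System.Collections.Generic;

class ForeachDesugared
{
    static void Main ()
    {
        IEnumerable<int> numbers = new[] { 1, 2, 3 };

        using (IEnumerator<int> e = numbers.GetEnumerator ())
        {
            while (e.MoveNext ())       // each MoveNext() pulls one value
            {
                int n = e.Current;
                Console.WriteLine (n);
            }
        }
    }
}
```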
LINQ operators like `ToList()`, `ToArray()`, `ToDictionary()` and `ToLookup()` wrap a `foreach` statement, so these methods will do a pull. The same can be said about operators like `Aggregate()`, `Count()` and `First()`. What these methods have in common is that they produce a single result that has to be created by executing a `foreach` statement.
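A small demonstration (not from your code) that nothing executes until one of these methods pulls:

```csharp
using System;
using System.Linq;

class PullDemo
{
    static void Main ()
    {
        var source = Enumerable.Range (1, 3)
            .Select (n => { Console.WriteLine ("projecting " + n); return n * n; });

        Console.WriteLine ("Nothing has executed yet.");
        var list = source.ToList ();    // the pull happens here; only now does "projecting" print
    }
}
```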
Many LINQ operators produce a new `IEnumerable<T>` sequence. When an element is pulled from the resulting sequence, the operator pulls one or more elements from the source sequence. The `Select()` operator is the most obvious example, but others are `SelectMany()`, `Where()`, `Concat()`, `Union()`, `Distinct()`, `Skip()` and `Take()`. These operators don't do any caching. When the Nth element is pulled from a `Select()`, it pulls the Nth element from the source sequence, applies the projection using the supplied action and returns it. Nothing secret going on here.
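You can observe the lack of caching by iterating the same `Select()` twice; the projection runs again for every element:

```csharp
using System;
using System.Linq;

class NoCachingDemo
{
    static void Main ()
    {
        var squares = Enumerable.Range (1, 2)
            .Select (n => { Console.WriteLine ("projecting " + n); return n * n; });

        foreach (var s in squares) { }  // prints "projecting 1", "projecting 2"
        foreach (var s in squares) { }  // prints both again: no cached results
    }
}
```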
Other LINQ operators also produce new `IEnumerable<T>` sequences, but they are implemented by actually pulling the entire source sequence, doing their job and then producing a new sequence. These methods include `Reverse()`, `OrderBy()` and `GroupBy()`. However, the pull done by the operator is only performed when the operator itself is pulled, meaning that you still need a `foreach` loop "at the end" of the LINQ statement before anything is executed. You could argue that these operators use a cache because they immediately pull the entire source sequence. However, that cache is rebuilt each time the operator is iterated, so it is really an implementation detail and not something that will magically detect that you are applying the same `OrderBy()` operation multiple times to the same sequence.
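A sketch of that behavior: the source below is pulled in full on the first iteration of the `OrderByDescending()`, and pulled in full again on the second, because the internal buffer is rebuilt each time:

```csharp
using System;
using System.Linq;

class OrderByDemo
{
    static void Main ()
    {
        var ordered = Enumerable.Range (1, 3)
            .Select (n => { Console.WriteLine ("pulling " + n); return n; })
            .OrderByDescending (n => n);

        Console.WriteLine ("Still nothing executed.");
        foreach (var n in ordered) { }  // buffers and sorts the entire source
        foreach (var n in ordered) { }  // buffers and sorts it all over again
    }
}
```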
In your example the `ToList()` will do a pull. The action in the outer `Select` will execute 100 times. Each time this action is executed, the `Aggregate()` will do another pull that will parse the XML attributes. In total your code will call `Int32.Parse()` 200 times.
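(For reference, the code being described presumably looked something like this; a reconstruction from the numbers above, not your exact original:)

```csharp
var x = XElement.Parse (@"
<ROOT>
  <MUL v='2' />
  <MUL v='3' />
</ROOT>
");

Enumerable.Range (1, 100)
  .Select (s => x.Elements ()
    .Aggregate (s, (t, u) => t * Int32.Parse (u.Attribute ("v").Value)))
  .ToList ()
  .ForEach (s => Console.WriteLine (s));
```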
You can improve this by pulling the attributes once instead of on each iteration:
```csharp
var x = XElement.Parse (@"
<ROOT>
  <MUL v='2' />
  <MUL v='3' />
</ROOT>
")
  .Elements ()
  .Select (t => Int32.Parse (t.Attribute ("v").Value))
  .ToList ();

Enumerable.Range (1, 100)
  .Select (s => x.Aggregate (s, (t, u) => t * u))
  .ToList ()
  .ForEach (s => Console.WriteLine (s));
```
Now `Int32.Parse()` is only called 2 times. However, the cost is that a list of attribute values has to be allocated, stored and eventually garbage collected. (Not a big concern when the list contains two elements.)

Note that if you forget the first `ToList()` that pulls the attributes, the code will still run, but with the exact same performance characteristics as the original code: no space is used to store the attributes, but they are parsed on each iteration.