How do I get lazy loading with PLINQ?

One of the nice things about LINQ is that infinite data sources are processed lazily, on request. When I tried parallelizing my queries, I found that lazy loading stopped working. For example:

using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var source = Generator();
        var next = source.AsParallel().Select(i => ExpensiveCall(i));
        foreach (var i in next)
        {
            System.Console.WriteLine(i);
        }
    }

    public static IEnumerable<int> Generator()
    {
        int i = 0;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    public static int ExpensiveCall(int arg)
    {
        System.Threading.Thread.Sleep(5000);
        return arg*arg;
    }
}

This program never prints any results, presumably because at each step it is waiting for the generator to be exhausted, which of course never happens. If I take out the AsParallel call, it works just fine. So how do I keep lazy loading while using PLINQ to improve the performance of my applications?

Juliettajuliette answered 4/2, 2013 at 4:18 Comment(0)

Take a look at ParallelMergeOptions:

var next = source.AsParallel()
                 .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                 .Select(i => ExpensiveCall(i));
Ylem answered 4/2, 2013 at 4:28 Comment(0)

I think you're confusing two different things. The problem here is not lazy loading (i.e. loading only as much as is necessary), the problem here is output buffering (i.e. not returning results immediately).

In your case, you will get your results eventually, although it might take a while (on my machine, it takes something like 500 results before the first batch is returned). The buffering is done for performance reasons, but in your case that trade-off doesn't make sense. As Ian correctly pointed out, you should use .WithMergeOptions(ParallelMergeOptions.NotBuffered) to disable output buffering.
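
To make the effect of NotBuffered concrete, here is a minimal sketch built on the Generator and ExpensiveCall from the question. The NotBufferedDemo class and the FirstResults helper are hypothetical names added for illustration, and the delay is shortened to 100 ms so the demo finishes quickly. It stops by breaking out of the foreach; as far as I know, disposing the enumerator that way is what makes PLINQ cancel the rest of the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class NotBufferedDemo
{
    // Infinite source, same shape as the Generator in the question.
    public static IEnumerable<int> Generator()
    {
        int i = 0;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    // Stand-in for the expensive call; the delay is shortened for the demo.
    public static int ExpensiveCall(int arg)
    {
        Thread.Sleep(100);
        return arg * arg;
    }

    // Hypothetical helper: collects the first `count` results, then stops.
    public static List<int> FirstResults(int count)
    {
        var results = new List<int>();
        if (count <= 0) return results;

        var query = Generator()
            .AsParallel()
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .Select(i => ExpensiveCall(i));

        foreach (var value in query)
        {
            results.Add(value);
            // Leaving the loop disposes the enumerator, which ends the query.
            if (results.Count >= count) break;
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var value in NotBufferedDemo.FirstResults(8))
        {
            Console.WriteLine(value);
        }
    }
}
```

The query is unordered, so the squares can arrive in any order; add AsOrdered() after AsParallel() if the source order matters.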

But, as far as I know, PLINQ doesn't do lazy loading and there is no way to change that. What that means is that if your consumer (in your case, the foreach loop) is too slow, PLINQ will generate results faster than necessary and it will stop only when you finish iterating the results. This means PLINQ can be wasting CPU time and memory.

Vinnie answered 4/2, 2013 at 12:59 Comment(2)
very good point... the buffering in PLINQ only masks the problems of no lazy loading. Maybe a way to go is have an extension method that batches the next n items, and executes those in parallel, then yields the results back. That could produce pseudo lazy behavior... – Juliettajuliette
@Juliettajuliette Yeah, something like that would work. Another option would be to use BlockingCollection with BoundedCapacity set. – Vinnie
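
The batching idea from the comments could be sketched roughly as follows. This is only one possible shape, not a built-in PLINQ feature: SelectInBatches, LazyBatching and the batchSize parameter are hypothetical names. The method pulls batchSize items from the source, runs the selector over that batch in parallel, yields the results in source order, and only then pulls the next batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyBatching
{
    // Hypothetical extension method: parallel within a batch, lazy between batches.
    public static IEnumerable<TResult> SelectInBatches<T, TResult>(
        this IEnumerable<T> source, int batchSize, Func<T, TResult> selector)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count < batchSize) continue;

            // Process the full batch in parallel, keeping source order.
            var results = batch.AsParallel().AsOrdered().Select(selector).ToArray();
            batch.Clear();
            foreach (var result in results) yield return result;
        }

        // Flush a final partial batch when the source is finite.
        if (batch.Count > 0)
        {
            foreach (var result in batch.AsParallel().AsOrdered().Select(selector).ToArray())
                yield return result;
        }
    }
}

public static class Program
{
    // Infinite source, same shape as the Generator in the question.
    static IEnumerable<int> Generator()
    {
        int i = 0;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    public static void Main()
    {
        // Only two batches of four are ever pulled from the infinite source.
        foreach (var value in Generator().SelectInBatches(4, x => x * x).Take(6))
        {
            Console.WriteLine(value);
        }
    }
}
```

The trade-off is that each batch waits for its slowest item before anything is yielded, so a larger batchSize gives more parallelism but more read-ahead and latency.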
