Where to call .AsParallel() in a LINQ query

The question

In a LINQ query I can correctly (as in: the compiler won't complain) call .AsParallel() like this:

(from l in list.AsParallel() where <some_clause> select l).ToList();

or like this:

(from l in list where <some_clause> select l).AsParallel().ToList();

What exactly is the difference?

What I've tried

In the official documentation I've almost always seen the first form used, so I assumed that was the way to go.
Today, though, I ran a benchmark myself and the result was surprising. Here's the code I ran:

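using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Fill a list with 100,000 random integers.
var list = new List<int>();
var rand = new Random();
for (int i = 0; i < 100000; i++)
    list.Add(rand.Next());

var threshold = 1497234;

var sw = new Stopwatch();

// Variant 1: AsParallel() on the source, so the where clause is part of the parallel query.
sw.Restart();
var result = (from l in list.AsParallel() where l > threshold select l).ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() before: {sw.ElapsedMilliseconds}");

// Variant 2: AsParallel() applied to the finished sequential query.
sw.Restart();
result = (from l in list where l > threshold select l).AsParallel().ToList();
sw.Stop();

Console.WriteLine($"call .AsParallel() after: {sw.ElapsedMilliseconds}");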

Output

call .AsParallel() before: 49
call .AsParallel() after: 4

So, apparently, despite what the documentation says, the second method is much faster. What exactly is happening here?

Gout answered 23/10, 2016 at 16:54 Comment(4)
Is your machine single-core or multi-core? – Stereochemistry
@viveknuna Multi-core. – Gout
It will give you different results every time. – Stereochemistry
@viveknuna Yep, but the difference is always the same. – Gout

The trick to using AsParallel in general is to decide if the savings from parallelism outweigh the overhead of doing things in parallel.

When the condition is cheap to evaluate, as yours is, the overhead of setting up multiple parallel streams and collecting their results at the end greatly outweighs the benefit of performing the comparisons in parallel.

When the condition is computationally intensive, calling AsParallel early speeds things up quite a bit, because the overhead is now small compared to the benefit of running multiple Where computations in parallel.

For an example of a computationally hard condition, consider a method that decides whether a number is prime or not. Doing this in parallel on a multi-core CPU will show significant improvement over the non-parallelised implementation.
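As a minimal sketch of that prime-number case (the PrimeDemo class and its method names are made up for illustration, and the trial-division test is deliberately naive so that each call is expensive), the two variants differ only in the AsParallel() call on the source:

```csharp
using System;
using System.Linq;

public static class PrimeDemo
{
    // Naive trial division: slow on purpose, so the predicate dominates the cost.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Sequential LINQ: every IsPrime call runs on the calling thread.
    public static int CountPrimesSequential(int start, int count) =>
        Enumerable.Range(start, count).Count(IsPrime);

    // PLINQ: AsParallel() comes first, so the IsPrime calls can be spread over cores.
    public static int CountPrimesParallel(int start, int count) =>
        Enumerable.Range(start, count).AsParallel().Count(IsPrime);
}
```

Both methods return the same count. Whether the parallel one is actually faster depends on the range size and the machine, so it is worth timing both with a Stopwatch, ideally after one untimed warm-up run.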

Casas answered 23/10, 2016 at 17:4 Comment(2)
Thank you for your answer, but I still don't quite understand the difference between the two invocations. Do you mean that when I call .AsParallel() the second way (at the end of the query) I'm not actually parallelizing anything? – Gout
@Mahatma Yes. In that form the where clause belongs to the sequential query, so it is evaluated one item at a time as the source is read. All that is left for PLINQ to do is collect the already-filtered items from its parallel streams into a single list. – Casas
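One way to check this for yourself is to count how many predicate calls are in flight at the same moment. The sketch below is only an illustration: PlacementDemo, Run and the Thread.Sleep(1) stand-in for real work are all made up here. It relies on my understanding that PLINQ serializes access to a plain IEnumerable source, since such a source is not thread-safe.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class PlacementDemo
{
    // Filters 0..39 for even numbers and reports the highest number of
    // predicate calls that were running at the same time.
    public static (int[] Results, int MaxConcurrency) Run(bool parallelFirst)
    {
        int inFlight = 0, maxInFlight = 0;

        Func<int, bool> isEven = n =>
        {
            int now = Interlocked.Increment(ref inFlight);
            int seen;
            // Record the peak with a compare-and-swap loop.
            while (now > (seen = Volatile.Read(ref maxInFlight)))
                Interlocked.CompareExchange(ref maxInFlight, now, seen);
            Thread.Sleep(1); // stand-in for real work
            Interlocked.Decrement(ref inFlight);
            return n % 2 == 0;
        };

        var source = Enumerable.Range(0, 40);
        int[] results = parallelFirst
            ? source.AsParallel().Where(isEven).ToArray()   // filter is part of the parallel query
            : source.Where(isEven).AsParallel().ToArray();  // filter runs while the source is read

        // PLINQ does not preserve order by default, so sort before returning.
        return (results.OrderBy(x => x).ToArray(), maxInFlight);
    }
}
```

On a multi-core machine I would expect Run(true) to report a MaxConcurrency above 1, while Run(false) should report 1, because the filter only ever runs while one thread at a time pulls the next item from the sequential query.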

The second use of AsParallel is unnecessary: it does not affect some_clause, which is still evaluated by the sequential part of the query.

See also the test code below:

[TestMethod]
public void Test()
{
    var items = Enumerable.Range(0, 10);
    int sleepMs;
    for (int i = 0; i <= 4; i++)
    {
        sleepMs = i * 25;
        var elapsed1 = CalcDurationOfCalculation(() => items.AsParallel().Select(SomeClause).ToArray());
        var elapsed2 = CalcDurationOfCalculation(() => items.Select(SomeClause).AsParallel().ToArray());

        Trace.WriteLine($"{sleepMs}: T1={elapsed1} T2={elapsed2}");
    }

    long CalcDurationOfCalculation(Action calculation)
    {
        var watch = new Stopwatch();
        watch.Start();
        calculation();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    int SomeClause(int value)
    {
        Thread.Sleep(sleepMs);
        return value * 2;
    }
}

and the output:

0: T1=77 T2=11
25: T1=103 T2=272
50: T1=202 T2=509
75: T1=303 T2=758
100: T1=419 T2=1010

Tamis answered 23/10, 2018 at 17:49 Comment(0)
