Ordered PLINQ ForAll
Asked Answered
P

6

13

The msdn documentation about order preservation in PLINQ states the following about ForAll().

  • Result when the source sequence is ordered: Executes nondeterministically in parallel
  • Result when the source sequence is unordered: Executes nondeterministically in parallel

Does this mean that ordered execution of the ForAll method is never guaranteed?

I haven't used PLINQ before, but the following Code Review question seemed like an appropriate usage for it. At the bottom of my answer I write:

Events.AsParallel().AsOrdered().ForAll( eventItem =>
{
    ...
} );    

After reading the documentation I believe the AsOrdered() wouldn't change anything?
I'm also suspecting the previous query can't replace a simple for loop where order is important?
Probably parallel calls to the StringBuilder will also occur, resulting in a wrong output?

Permanent answered 18/3, 2011 at 13:16 Comment(1)
What use would parallel linq be if the code happened sequentially?Ratite
F
18

Order preservation is usually only applied to results - i.e. the input can be processed in any order, but is returned in the original order.

As ForAll doesn't return anything, it doesn't really have any effect that I'm aware of.

The only way of making ordering apply to the processing would be to finish item 0 before processing item 1, before processing item 2 etc... at which point you've got no parallelism.

Fossil answered 18/3, 2011 at 13:23 Comment(2)
K, so my suspicions were correct. Thanks for the explanation. I believe parallism should still be an option in the given example when all strings can be parsed separately and then afterwards merged in order. What would be the most suitable construct for this?Permanent
Steven: You could use PLINQ for just the parsing, make it give the results in order, and use a normal foreach loop to do the actual appending.Fossil
B
8

As others have rightly answered, the ForAll method is never guaranteed to execute an action for enumerable elements in any particular order, and will ignore the AsOrdered() method call silently.

For the benefit of readers having a valid reason to execute an action for enumerable elements in a way that stay's as close to the original order (as far as is reasonable in a parallel processing context) the extension methods below might help.

public static void ForAllInApproximateOrder<TSource>(this ParallelQuery<TSource> source, Action<TSource> action) {

    Partitioner.Create( source )
               .AsParallel()
               .AsOrdered()
               .ForAll( e => action( e ) );

}

This can then be used as follows:

orderedElements.AsParallel()
               .ForAllInApproximateOrder( e => DoSomething( e ) );

It should be noted that the above extension method uses PLINQ ForAll and not Parallel.ForEach and so inherits the threading model used interally by PLINQ (which is different to that used by Parallel.ForEach -- less aggressive by default in my experience). A similar extension method using Parallel.ForEach is below.

public static void ForEachInApproximateOrder<TSource>(this ParallelQuery<TSource> source, Action<TSource> action) {

    source = Partitioner.Create( source )
                        .AsParallel()
                        .AsOrdered();

    Parallel.ForEach( source , e => action( e ) );

}

This can then be used as follows:

orderedElements.AsParallel()
               .ForEachInApproximateOrder( e => DoSomething( e ) );

There is no need to chain AsOrdered() to your query when using either of the above extension methods, it gets called internally anyway.

I have found these methods useful in processing elements that have coarse grained significance. It can be useful, for example, to process records starting at the oldest and working towards the newest. In many cases the exact order of records isn't required - so far as older records generally get processed before newer records. Similarly, records having low/med/high priority levels can be processed such that high priority records will be processed before lower priority records for the majority of cases, with the edge cases not far behind.

Bivalent answered 5/1, 2014 at 1:42 Comment(1)
Mr. Fantastic, is it necessary to use the explicit Partitioner.Create(), vs just all inline like Parallel.ForEach(source.AsParallel().AsOrdered(), e => ...)? Thanks!Graziano
A
6

AsOrdered() wouldn't change anything - if you want to enforce order on the result of a parallel query you can simply use foreach() ForAll() is there to take advantage of parallelism, that means executing the side effect on more than one item in the collection at a time. In fact ordering only applies to the results of a query (the order of items in the result collection), but this has nothing to do with ForAll(), since ForAll() does not affect the order at all.

In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved

Note that ForAll() is not transforming the collection (it's not i.e projecting to a new collection), it's purely for executing side effects on the results of a PLINQ query.

Antonomasia answered 18/3, 2011 at 13:22 Comment(0)
C
4

Does this mean that ordered execution of the ForAll method is never guaranteed?

Yes - order is not guaranteed.

The parallelisation means that the work is allocated to different threads and their separate outputs are then later combined.

If you need to order the output then don't use PLinq - or add some later step to put the ordering back in.


Also, if you are accessing objects like a StringBuilder from within the plinq execution, then please ensure that those objects are threadsafe - and also be aware that this thread safety may in fact make the plinq slower than the non-parallel linq.

Corker answered 18/3, 2011 at 13:22 Comment(0)
V
2

Now as an extension method:

It will process on multiple cores then will order the results, so there's the overhead of ordering. Here's an answer on benchmarking simple for vs parallel.

 public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
    {
        var unorderedResult = new ConcurrentBag<(long, T1)>();
        Parallel.ForEach(list, (o, state, i) =>
        {
            unorderedResult.Add((i, action.Invoke(o)));
        });
        var ordered = unorderedResult.OrderBy(o => o.Item1);
        return ordered.Select(o => o.Item2);
    }

use like:

var result = Events.OrderedParallel(eventItem => ...);

Hope this will save you some time.

Vedette answered 14/3, 2020 at 10:45 Comment(0)
H
-1

The ForAll runs the action in multiple threads, in parallel. At any given moment multiple actions will be running concurrently, and in these circumstances the notion of "order" is not applicable. To run the actions in order you must run them sequentially, and the simplest way to do it is to run them in a single thread. This can be achieved by just enumerating the result of the query in a standard foreach loop:

var query = Events.AsParallel().AsOrdered();
foreach (var eventItem in query)
{
    // do something with the eventItem
}

If you prefer the fluent ForAll syntax, you can add a static class in your project with the ForEach extension method below:

public static void ForEach<TSource>(this IEnumerable<TSource> source,
    Action<TSource> action)
{
    foreach (TSource item in source)
    {
        action(item);
    }
}

And use it like this:

Events.AsParallel().AsOrdered().ForEach(eventItem =>
{
    // do something with the eventItem
});

It should be noted though that in the given example the use of Parallel LINQ is redundant. The query Events.AsParallel().AsOrdered() performs no transformation to the source enumerable, so no actual computation is taking place. You could remove the .AsParallel().AsOrdered() part and get the same outcome.

Hsiuhsu answered 14/3, 2020 at 12:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.