I want to execute a query over a stream of data while processing items in parallel with a certain degree of parallelism. Normally I'd use PLINQ for that, but my work items are IO-bound rather than CPU-bound, and I want to use async IO. PLINQ does not support async work.
What's the smartest way of running a PLINQ-style query, but with async work items?
Here's a more detailed illustration of the problem:
My goal is to process a potentially infinite stream of "items" in a way that is logically described by the following query:
```csharp
using System.Linq;

var items = new int[10]; // simulate data
var results =
    from x in items.AsParallel().WithDegreeOfParallelism(100)
    where Predicate(x)             // placeholder
    select ComputeSomeValue(x);    // placeholder

foreach (var result in results)
    PerformSomeAction(result);     // placeholder
```
This query is only a sketch of the real one. I want each of the placeholder functions to be asynchronous (returning a Task and internally based on async IO).
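As a sketch of what async versions of the three placeholders could look like: the Async-suffixed names, the AsyncPlaceholders class and the method bodies are hypothetical, and Task.Delay stands in for real async IO such as a slow web-service call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical async counterparts of the placeholders in the query.
// Task.Delay simulates IO latency; a real version would await network/disk IO.
public static class AsyncPlaceholders
{
    // where Predicate(x)  ->  await PredicateAsync(x)
    public static async Task<bool> PredicateAsync(int x)
    {
        await Task.Delay(10);   // simulated IO
        return x % 2 == 0;      // arbitrary example filter
    }

    // select ComputeSomeValue(x)  ->  await ComputeSomeValueAsync(x)
    public static async Task<int> ComputeSomeValueAsync(int x)
    {
        await Task.Delay(10);   // simulated IO
        return x * x;           // arbitrary example projection
    }

    // PerformSomeAction(result)  ->  await PerformSomeActionAsync(result)
    public static async Task PerformSomeActionAsync(int result)
    {
        await Task.Delay(10);   // simulated IO
        Console.WriteLine(result);
    }
}
```

With these signatures, `where Predicate(x)` would have to consume a Task<bool>, which neither LINQ's nor PLINQ's Where accepts (both expect a synchronous Func<T, bool>); that is the core of the problem.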
Note that there might be far more items than fit in memory. I also need to control the degree of parallelism in order to max out the underlying network and disk hardware.
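One way to get PLINQ-style bounded parallelism with async work items, assuming .NET 6 or later, is Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism. This is a swapped-in API that postdates PLINQ, not something the question prescribes. The sketch maps the where/select/foreach of the query above onto one async body per item; AsyncQuery and RunAsync are hypothetical names. ForEachAsync takes the next item from the enumerator only when a worker frees up, so the source is never materialized, which matters for an unbounded stream.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncQuery
{
    // Hypothetical helper: "where predicate / select compute / foreach action"
    // over a lazily enumerated source, with at most maxDop items in flight.
    // Requires .NET 6+ for Parallel.ForEachAsync.
    public static Task RunAsync<T, TResult>(
        IEnumerable<T> source,
        int maxDop,
        Func<T, Task<bool>> predicate,
        Func<T, Task<TResult>> compute,
        Func<TResult, Task> action,
        CancellationToken cancellationToken = default)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDop,
            CancellationToken = cancellationToken
        };

        // Items are pulled from the enumerator one at a time as workers free up.
        return Parallel.ForEachAsync(source, options, async (item, _) =>
        {
            if (!await predicate(item)) return;     // where
            TResult value = await compute(item);    // select
            await action(value);                    // foreach body
        });
    }
}
```

Unlike the original foreach, the action here runs concurrently and results arrive in no particular order; if PerformSomeAction must run sequentially, it would need its own synchronization.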
This question is not about multi-core. It applies equally to a machine with a single CPU core, because IO can still benefit from parallelism. Think of slow web-service calls and the like.
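Because the gain comes from overlapping outstanding IO rather than from extra cores, the throttling can also be written without a parallel-loop API: the loop starts every operation on the calling thread, and the concurrency comes from several operations awaiting IO at the same time. A minimal sketch, assuming each item's work is a Func<T, Task> (ThrottledRunner and RunAsync are hypothetical names). It keeps at most maxConcurrency tasks in memory and uses Task.WhenAny to wait for a free slot, so it also suits an endless source and frameworks older than .NET 6.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Hypothetical helper: invokes body(item) for each item, keeping at most
    // maxConcurrency operations in flight. Only those tasks are held in memory.
    public static async Task RunAsync<T>(
        IEnumerable<T> source, int maxConcurrency, Func<T, Task> body)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        var inFlight = new List<Task>(maxConcurrency);
        foreach (T item in source)
        {
            if (inFlight.Count >= maxConcurrency)
            {
                // Wait for any operation to finish before starting another.
                Task finished = await Task.WhenAny(inFlight);
                inFlight.Remove(finished);
                // Rethrows if that operation failed; any tasks still in flight
                // are then left running unobserved (acceptable for a sketch).
                await finished;
            }
            inFlight.Add(body(item));
        }
        // Drain whatever is still running once the source is exhausted.
        await Task.WhenAll(inFlight);
    }
}
```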
So Predicate, ComputeSomeValue and PerformSomeAction should all be async and return Task<X>, right? – Hubbub