TPL DataFlow vs BlockingCollection

I understand that a BlockingCollection is best suited for a producer/consumer pattern. However, when should I use an ActionBlock from the TPL Dataflow library?

My initial understanding is that for I/O operations you keep the BlockingCollection, while CPU-intensive operations are best suited to an ActionBlock. But I feel like this isn't the whole story... Any additional insight?

Plush answered 16/1, 2014 at 13:44 Comment(1)
BlockingCollection is not better for I/O; it is in fact worse because it does not support async. - Legere

TPL Dataflow is better suited to an actor-based design. That means that if you want to chain producers and consumers, it's much easier with TDF.
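To illustrate the chaining point, here is a minimal sketch (not from the original answer) that links two TransformBlocks and an ActionBlock with LinkTo. The class name `PipelineSketch` and the parse/square stages are made up for illustration, and it assumes the System.Threading.Tasks.Dataflow assembly is available (on .NET Framework that usually means adding the NuGet package).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical example class; the name and the stages are illustrative only.
public static class PipelineSketch
{
    // Parses each line to an int, squares it, and collects the results.
    public static async Task<int[]> RunAsync(string[] lines)
    {
        var results = new ConcurrentQueue<int>();

        // Stage 1: a propagator block that turns strings into ints.
        var parse = new TransformBlock<string, int>(line => int.Parse(line));

        // Stage 2: another propagator block that squares each value.
        var square = new TransformBlock<int, int>(n => n * n);

        // Stage 3: a target block that consumes the final values.
        var collect = new ActionBlock<int>(n => results.Enqueue(n));

        // PropagateCompletion lets Complete() on the first block flow down the chain.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        parse.LinkTo(square, link);
        square.LinkTo(collect, link);

        foreach (var line in lines)
            parse.Post(line);

        parse.Complete();          // no more input
        await collect.Completion;  // wait until the last stage has drained

        return results.ToArray();
    }
}
```

Calling `await PipelineSketch.RunAsync(new[] { "1", "2", "3" })` pushes three items through all three stages. Doing the same with BlockingCollection would need one collection and one hand-written consumer loop per stage.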

Another big plus for TPL Dataflow is that it was built with async in mind. You can produce and consume both synchronously and asynchronously (even mixing the two at the same time), which is very useful. (I mostly produce synchronously and consume in a non-blocking, async way.)

You can also very easily set a bounded capacity and degree of parallelism.
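As a rough sketch of those two points (async consumption plus bounded capacity and parallelism), something like the following should work. The class name `BoundedSketch`, the numbers 10 and 4, and the `Task.Delay` stand-in for real I/O are all assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical example class; names and numbers are illustrative only.
public static class BoundedSketch
{
    // Sends itemCount items through a bounded, parallel ActionBlock
    // and returns how many were processed.
    public static async Task<int> RunAsync(int itemCount)
    {
        int processed = 0;

        var options = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 10,        // the block holds at most 10 items at a time
            MaxDegreeOfParallelism = 4   // up to 4 items are handled concurrently
        };

        // The delegate is async, so no thread is blocked while the "I/O" is pending.
        var worker = new ActionBlock<int>(async item =>
        {
            await Task.Delay(5);         // stand-in for a real I/O call
            Interlocked.Increment(ref processed);
        }, options);

        for (int i = 0; i < itemCount; i++)
        {
            // SendAsync waits asynchronously while the block is full;
            // Post would return false instead of waiting.
            await worker.SendAsync(i);
        }

        worker.Complete();
        await worker.Completion;
        return processed;
    }
}
```

The producer side could just as well call `Post` synchronously; `SendAsync` is used here so that the bounded capacity applies back-pressure without blocking a thread.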

TL;DR: BlockingCollection is a simple, general-purpose tool. TPL Dataflow is much more robust, but can be overkill or a bad fit for specific problems.

Bandolier answered 16/1, 2014 at 15:10 Comment(6)
@i3amon Can you give guidance or examples? How could TPL Dataflow be overkill and a bad fit? Robustness can't hurt even in simple situations. Is there a performance problem? Thanks. - Bimetallic
@Bimetallic TPL Dataflow makes you operate in a certain way, while BlockingCollection is open for you to do with as you wish. The problem with BlockingCollection is that it's only synchronous. But nowadays you have the System.Threading.Channels library to avoid that. So it depends on whether you want structured or unstructured concurrency. - Bandolier
@Bimetallic I would avoid BlockingCollection in almost all cases now. - Bandolier
@i3amon Can we use Dataflow to process an SQS queue with a multithreaded approach, mostly I/O operations? - Rely
@pankysharma It depends on how you use it, but sure. SQS queue messages need to be explicitly deleted once handled, so you need to pass that context through your data flow and delete when done. - Bandolier
Thanks, can you please answer my question here #68961503 - Rely

Not sure if the repeated use of the word Block is causing confusion here. They are very different things.

You're right, a BlockingCollection is well suited to a producer/consumer situation, in that it will block an attempt to read from it until data is available. However, BlockingCollection is not a part of TPL Dataflow. It was introduced in .NET 4.0 as one of the new thread-safe collection types.
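For comparison, a minimal producer/consumer sketch with BlockingCollection might look like this. The class name `BlockingCollectionSketch` and the capacity of 5 are assumptions made for the example, not something from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example class; the name and capacity are illustrative only.
public static class BlockingCollectionSketch
{
    // Produces the numbers 1..itemCount and returns the sum seen by the consumer.
    public static int Run(int itemCount)
    {
        int sum = 0;

        using (var queue = new BlockingCollection<int>(boundedCapacity: 5))
        {
            var consumer = Task.Run(() =>
            {
                // Blocks the consumer thread while the collection is empty and
                // ends once CompleteAdding has been called and the queue is drained.
                foreach (var item in queue.GetConsumingEnumerable())
                    sum += item;
            });

            for (int i = 1; i <= itemCount; i++)
                queue.Add(i);        // blocks the producer thread while the queue is full

            queue.CompleteAdding();  // signal that no more items will arrive
            consumer.Wait();
        }

        return sum;
    }
}
```

Note that both `Add` and `GetConsumingEnumerable` block a thread while they wait, which is the "only synchronous" limitation mentioned in the comments.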

An ActionBlock, however, is a type of 'Block' defined by TPL Dataflow, and can be used to perform an action. Block, in this sense, refers more to its use as a part of a data flow.

Data flows, as defined in TPL Dataflow, are made up of blocks, of which there are three main kinds. From the documentation:

The TPL Dataflow Library consists of dataflow blocks, which are data structures that buffer and process data. The TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks. A source block acts as a source of data and can be read from. A target block acts as a receiver of data and can be written to. A propagator block acts as both a source block and a target block, and can be read from and written to. The TPL defines the System.Threading.Tasks.Dataflow.ISourceBlock<TOutput> interface to represent sources, System.Threading.Tasks.Dataflow.ITargetBlock<TInput> to represent targets, and System.Threading.Tasks.Dataflow.IPropagatorBlock<TInput,TOutput> to represent propagators. IPropagatorBlock<TInput,TOutput> inherits from both ISourceBlock<TOutput> and ITargetBlock<TInput>. The TPL Dataflow Library provides several predefined dataflow block types that implement the ISourceBlock<TOutput>, ITargetBlock<TInput>, and IPropagatorBlock<TInput,TOutput> interfaces. These dataflow block types are described in this document in the section Predefined Dataflow Block Types.

An ActionBlock is a type of ITargetBlock: it takes an input, performs an action on it, and passes nothing further down the flow.
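To make the three interfaces a little more concrete, here is a small sketch under my own naming (`BlockKindsSketch` is hypothetical). It treats a BufferBlock as a propagator, viewed through both its target and source interfaces, and an ActionBlock purely as a target.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical example class; the name is illustrative only.
public static class BlockKindsSketch
{
    public static async Task<string> RunAsync()
    {
        var log = new StringBuilder();

        // BufferBlock<T> is a propagator: it can be written to (target)
        // and read from (source).
        IPropagatorBlock<int, int> buffer = new BufferBlock<int>();
        ITargetBlock<int> asTarget = buffer;
        ISourceBlock<int> asSource = buffer;

        // ActionBlock<T> is only a target: it consumes items and emits nothing.
        ITargetBlock<int> sink = new ActionBlock<int>(n => log.Append(n).Append(';'));

        asSource.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        asTarget.Post(1);
        asTarget.Post(2);
        asTarget.Complete();

        await sink.Completion;
        return log.ToString();
    }
}
```

Because an ActionBlock is not a source, there is nothing to link after it; it is the end of the flow, and awaiting its Completion is how you know the whole flow has finished.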

To answer your first question, I would think that you may use a BlockingCollection when your process is simple. You would use TPL Dataflow when your process is more complicated, and in that case, you probably wouldn't need a BlockingCollection.

There are examples of the Producer-Consumer problem using BlockingCollection here: http://blogs.msdn.com/b/csharpfaq/archive/2010/08/12/blocking-collection-and-the-producer-consumer-problem.aspx?Redirected=true and here: http://programmerfindings.blogspot.co.uk/2012/07/producer-consumer-problem-using-tpl-and.html

Neither of these use Dataflow. There is an example of one using Dataflow here:

http://msdn.microsoft.com/en-us/library/hh228601(v=vs.110).aspx

Plus, I would strongly suggest reading the TPL Dataflow documentation here:

http://msdn.microsoft.com/en-us/library/hh228601(v=vs.110).aspx

if you are implementing anything complex.

Respectability answered 9/2, 2014 at 11:51 Comment(0)
