TPL Dataflow: design for parallelism while keeping order

3

5

I have never worked with TPL before, so I was wondering whether this can be done with it. My application creates an animated GIF file from a large number of frames. I start with a list of Bitmap objects that represent the frames of the GIF file and need to do the following for each frame:

  1. paint a number of text/bitmaps onto the frame
  2. crop the frame
  3. resize the frame
  4. reduce the image to 256 colors

Obviously this process can be done in parallel for all the frames in the list, but for each frame the order of steps needs to be the same. After that, I need to write all the frames to the GIF file, so the frames must arrive in the same order they had in the original list. On top of that, writing can start as soon as the first frame is ready; there is no need to wait until all frames are processed.

So that's the situation. Is TPL Dataflow suitable for this? If yes, can anyone give me a hint in the right direction on how to design the TPL block structure to reflect the process explained above? It seems quite complex to me compared to some samples I've found.

Viburnum answered 3/2, 2014 at 0:35 Comment(0)
M
5

I think it makes sense to use TPL Dataflow for this, especially since it automatically keeps the processed elements in the right order, even with parallelism turned on.
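
The ordering guarantee can be checked with a small experiment. The sketch below is only an illustration, not part of the actual GIF pipeline: `OrderDemo` and `RunAsync` are hypothetical names, integers stand in for frames, and the artificial delay is chosen so that earlier items take longer than later ones.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OrderDemo
{
    // Hypothetical stand-in for a frame: just its index in the original list.
    public static async Task<List<int>> RunAsync(int frameCount)
    {
        var processor = new TransformBlock<int, int>(
            async index =>
            {
                // Earlier items wait longer, so items finish out of order.
                await Task.Delay((frameCount - index) * 10);
                return index;
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        var received = new List<int>();
        // An ActionBlock handles one item at a time by default,
        // so the list does not need a lock here.
        var collector = new ActionBlock<int>(index => received.Add(index));

        processor.LinkTo(
            collector,
            new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < frameCount; i++)
        {
            processor.Post(i);
        }

        processor.Complete();
        await collector.Completion;
        return received;
    }
}
```

Calling `await OrderDemo.RunAsync(8)` should return the indices 0 through 7 in ascending order, even though the items finish processing out of order.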

You could create a separate block for each step in the process, but I think there is no need for that here; one block for processing the frames and one for writing them will be enough:

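using System.Collections.Generic;
using System.Drawing;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public Task CreateAnimationFileAsync(IEnumerable<Bitmap> frames)
{
    // Processes frames in parallel; the output order still matches the input order.
    var frameProcessor = new TransformBlock<Bitmap, Bitmap>(
        frame => ProcessFrame(frame),
        new ExecutionDataflowBlockOptions
        { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

    // Writes one frame at a time, in the order the frames arrive.
    var animationWriter = new ActionBlock<Bitmap>(frame => WriteFrame(frame));

    // PropagateCompletion completes the writer once the processor is done.
    frameProcessor.LinkTo(
        animationWriter,
        new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var frame in frames)
    {
        frameProcessor.Post(frame);
    }

    // Signal that no more frames will be posted.
    frameProcessor.Complete();

    // Completes when the last frame has been written.
    return animationWriter.Completion;
}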

private Bitmap ProcessFrame(Bitmap frame)
{
    …
}

private async Task WriteFrame(Bitmap frame)
{
    …
}
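
If unbounded parallelism and unbounded queues are a concern (many large bitmaps in memory at once), a capped variant is possible. This is a minimal sketch under assumptions, not a drop-in replacement: `BoundedPipeline` and `RunAsync` are hypothetical names, the method is generic so that any frame type can be used, and the limits based on `Environment.ProcessorCount` are illustrative values rather than tuned ones.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedPipeline
{
    public static async Task RunAsync<TFrame>(
        IEnumerable<TFrame> frames,
        Func<TFrame, TFrame> processFrame,
        Func<TFrame, Task> writeFrame)
    {
        // Illustrative limit on how many frames a block may hold at once.
        int capacity = Environment.ProcessorCount * 2;

        var frameProcessor = new TransformBlock<TFrame, TFrame>(
            processFrame,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = capacity
            });

        var animationWriter = new ActionBlock<TFrame>(
            writeFrame,
            new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        frameProcessor.LinkTo(
            animationWriter,
            new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var frame in frames)
        {
            // SendAsync waits while the block is full; with a bounded block,
            // Post would return false and the frame would be skipped.
            await frameProcessor.SendAsync(frame);
        }

        frameProcessor.Complete();
        await animationWriter.Completion;
    }
}
```

With bounded blocks the producer loop slows down to the pace of the writer, so only a limited number of frames is in flight at any time.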
Metaplasia answered 17/2, 2014 at 10:54 Comment(4)
The benefit to using multiple blocks is that you can easily change them to process it in a different way. - Arbitrage
@MattCarkci Yeah, that could be useful in some cases. But here, I think changing ProcessFrame() is as simple as changing the blocks and it results in simpler code. - Metaplasia
You should return animationWriter.Completion and not the TransformBlock's completion. - Fenwick
The MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded is a bit scary. It means that the parallelism will be throttled essentially by the starved ThreadPool. I would prefer MaxDegreeOfParallelism = Environment.ProcessorCount, or maybe MaxDegreeOfParallelism = Environment.ProcessorCount * 2 to balance the chunky workload, at the cost of some overhead caused by thread switching. In the latter case I would also increase the minimum number of threads that the thread pool spawns immediately: ThreadPool.SetMinThreads(Environment.ProcessorCount * 2, 10) - Hanna
A
3

Your problem is a perfect example of where dataflow excels.

Here is the simplest code that can get you started.

// Try increasing MaxDegreeOfParallelism
var opt = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 };

// Create the blocks
// You must define the functions to do what you want
var paintBlock = new TransformBlock<Bitmap, Bitmap>(fnPaintText, opt);
var cropBlock = new TransformBlock<Bitmap, Bitmap>(fnCrop, opt);
var resizeBlock = new TransformBlock<Bitmap, Bitmap>(fnResize, opt);
var reduceBlock = new TransformBlock<Bitmap, Bitmap>(fnReduce, opt);

// Link the blocks together
paintBlock.LinkTo(cropBlock);
cropBlock.LinkTo(resizeBlock);
resizeBlock.LinkTo(reduceBlock);

// Send data to the first block
// ListOfImages contains your original frames
foreach (var img in ListOfImages) { 
   paintBlock.Post(img);
}

// Receive the modified images
var outputImages = new List<Bitmap>();
for (int i = 0; i < ListOfImages.Count; i++) {
   outputImages.Add(reduceBlock.Receive());
}

// outputImages now holds all of the frames, already in their original order,
// so they can be written to the GIF file as they are
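
Counting `Receive()` calls works only if every frame makes it through every step; a variation is to propagate completion through the links and drain the last block until it reports that it is finished. The sketch below assumes this variation. `StepPipeline` and `RunAsync` are hypothetical names, and strings stand in for bitmaps so that each placeholder step can simply append its own name.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class StepPipeline
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> frames)
    {
        var opt = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Placeholder steps: each one only tags the frame with its name.
        var paintBlock = new TransformBlock<string, string>(f => f + ">paint", opt);
        var cropBlock = new TransformBlock<string, string>(f => f + ">crop", opt);
        var resizeBlock = new TransformBlock<string, string>(f => f + ">resize", opt);
        var reduceBlock = new TransformBlock<string, string>(f => f + ">reduce", opt);

        // Completion (or a fault) flows down the chain along with the data.
        paintBlock.LinkTo(cropBlock, link);
        cropBlock.LinkTo(resizeBlock, link);
        resizeBlock.LinkTo(reduceBlock, link);

        foreach (var frame in frames)
        {
            paintBlock.Post(frame);
        }
        paintBlock.Complete();

        var output = new List<string>();
        // OutputAvailableAsync yields false once the last block is done and empty.
        while (await reduceBlock.OutputAvailableAsync())
        {
            while (reduceBlock.TryReceive(out string frame))
            {
                output.Add(frame);
            }
        }

        // Awaiting Completion rethrows any exception raised inside a step.
        await reduceBlock.Completion;
        return output;
    }
}
```

For the input `f0`, `f1`, `f2`, the returned list should hold `f0>paint>crop>resize>reduce`, then the same for `f1` and `f2`, in that order.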
Arbitrage answered 3/2, 2014 at 8:14 Comment(1)
This is wrong. Even with MaxDegreeOfParallelism set, Dataflow always keeps messages in the right order automatically, so there is no need for the Wrapper. - Metaplasia
H
2

I think you will find that Dataflow is the right way to go. For every frame in your frame list, try to create one TransformBlock. For each of the four steps, chain together the frames in the correct order. If you want to process the frame list concurrently, you might use a BufferBlock for the frame list.

Please find the full sample on how to use TransformBlock on MSDN:

Hampden answered 3/2, 2014 at 3:8 Comment(1)
I think that using one block per frame doesn't make much sense. One block per step in the process might. - Metaplasia
