How to synchronize TPL Tasks, by using Monitor / Mutex / Semaphore? Or should one use something else entirely?

I'm trying to move some of my old projects from ThreadPool and standalone Thread to TPL Task, because it supports some very handy features, like continuations with Task.ContinueWith (and, from C# 5, async/await), better cancellation, exception capturing, and so on. I'd love to use them in my project. However, I already see potential problems, mostly with synchronization.

I've written some code which shows a Producer / Consumer problem, using a classic stand-alone Thread:

using System;
using System.Collections.Generic;
using System.Threading;

class ThreadSynchronizationTest
{
    private int CurrentNumber { get; set; }
    private object Synchro { get; set; }
    private Queue<int> WaitingNumbers { get; set; }

    public void TestSynchronization()
    {
        Synchro = new object();
        WaitingNumbers = new Queue<int>();

        var producerThread = new Thread(RunProducer);
        var consumerThread = new Thread(RunConsumer);

        producerThread.Start();
        consumerThread.Start();

        producerThread.Join();
        consumerThread.Join();
    }

    private int ProduceNumber()
    {
        CurrentNumber++;
        // Long running method. Sleeping as an example
        Thread.Sleep(100);
        return CurrentNumber;
    }

    private void ConsumeNumber(int number)
    {
        Console.WriteLine(number);
        // Long running method. Sleeping as an example
        Thread.Sleep(100);
    }

    private void RunProducer()
    {
        while (true)
        {
            int producedNumber = ProduceNumber();

            lock (Synchro)
            {
                WaitingNumbers.Enqueue(producedNumber);
                // Notify consumer about a new number
                Monitor.Pulse(Synchro);
            }
        }
    }

    private void RunConsumer()
    {
        while (true)
        {
            int numberToConsume;
            lock (Synchro)
            {
                // Ensure our wait condition is met
                while (WaitingNumbers.Count == 0)
                {
                    // Wait for pulse
                    Monitor.Wait(Synchro);
                }
                numberToConsume = WaitingNumbers.Dequeue();
            }
            ConsumeNumber(numberToConsume);
        }
    }
}

In this example, ProduceNumber generates a sequence of increasing integers, while ConsumeNumber writes them to the Console. If producing runs faster, numbers will be queued for consumption later. If consumption runs faster, the consumer will wait until a number is available. All synchronization is done using Monitor and lock (internally also Monitor).

When trying to 'TPL-ify' similar code, I already see a few issues I'm not sure how to address. If I replace new Thread().Start() with Task.Run():

  1. TPL Task is an abstraction, which does not even guarantee that the code will run on a separate thread. In my example, if the producer control method runs synchronously, the infinite loop will cause the consumer to never even start. According to MSDN, providing a TaskCreationOptions.LongRunning parameter when running the task should hint the TaskScheduler to run the method appropriately, however I didn't find any way to ensure that it does. Supposedly TPL is smart enough to run tasks the way the programmer intended, but that just seems like a bit of magic to me. And I don't like magic in programming.
  2. If I understand correctly, a TPL Task is not guaranteed to resume on the same thread it started on. If it resumes on a different thread, in this case that thread would try to release a lock it doesn't own while the original thread holds the lock forever, resulting in a deadlock. I remember Eric Lippert writing a while ago that this is the reason why await is not allowed in a lock block. Going back to my example, I'm not even sure how to go about solving this issue.
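The thread-affinity concern in point 2 can be checked without any Task at all. Monitor (and therefore lock) records which thread acquired it. Monitor.Exit called from any other thread throws SynchronizationLockException instead of releasing the lock, so the lock stays held by its original owner. The C# compiler sidesteps the problem by rejecting await inside a lock statement. Here is a minimal sketch. MonitorOwnershipDemo is a hypothetical name used only for illustration:

```csharp
using System.Threading;

// Hypothetical demo class: shows that Monitor is tied to the thread that entered it.
static class MonitorOwnershipDemo
{
    // Enters a monitor on the calling thread, then tries to exit it from a
    // second thread. Returns true if that second thread was rejected.
    public static bool ExitFromOtherThreadThrows()
    {
        var gate = new object();
        bool threw = false;

        Monitor.Enter(gate); // the calling thread now owns the monitor
        try
        {
            var other = new Thread(() =>
            {
                try
                {
                    Monitor.Exit(gate); // this thread never entered the monitor
                }
                catch (SynchronizationLockException)
                {
                    threw = true;
                }
            });
            other.Start();
            other.Join();
        }
        finally
        {
            Monitor.Exit(gate); // only the owning thread can release it
        }
        return threw;
    }
}
```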

These are the few issues that crossed my mind, although there may be (probably are) more. How should I go about solving them?

Also, this made me think, is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code? Perhaps I'm missing something that I should be using instead?

Chrysalis asked 2/2, 2016 at 0:22. Comments (3):
Olathe: I believe your assumptions are correct. Because your code is producer-consumer, you may want to look into TPL Dataflow instead. By the way, spinning up an explicit Thread rather than using a Task still has its place for certain things, long-running non-IO-bound jobs for example.
Chrysalis: Long-running non-IO-bound jobs are exactly what my project does, mostly. However, I thought that since TPL uses the ThreadPool internally, one would be able to do everything you can do with ThreadPool, plus the "sugar". That's why I decided to invest in switching over. And the problem is called producer-consumer, not provider-consumer... doh! Now I feel silly.
Olathe: lol :) Generally, TPL Tasks are for short-lived nuggets of work, whilst a Thread is more for the long-running non-I/O stuff. Also, another really great tech for consumers is Microsoft's Reactive Extensions (Rx). It can look a bit "what-tha?" at first, but it is arguably elegant. This site has some jolly good tutorials. Check them all out and pick the tech you think is suitable for you.

Your question pushes the limits of broadness for Stack Overflow. Moving from plain Thread implementations to something based on Task and other TPL features involves a wide variety of considerations. Taken individually, each concern has almost certainly been addressed in a prior Stack Overflow Q&A, and taken in aggregate there are too many considerations to address competently and comprehensively in a single Stack Overflow Q&A.

So, with that said, let's look just at the specific issues you've asked about here.

  1. TPL Task is an abstraction, which does not even guarantee that the code will run on a separate thread. In my example, if the producer control method runs synchronously, the infinite loop will cause the consumer to never even start. According to MSDN, providing a TaskCreationOptions.LongRunning parameter when running the task should hint the TaskScheduler to run the method appropriately, however I didn't find any way to ensure that it does. Supposedly TPL is smart enough to run tasks the way the programmer intended, but that just seems like a bit of magic to me. And I don't like magic in programming.

It is true that the Task object itself does not guarantee asynchronous behavior. For example, an async method which returns a Task object could contain no asynchronous operations at all, and could run for an extended period of time before returning an already-completed Task object.
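Here is a small illustration of that point, assuming nothing beyond the base class library. The hypothetical SyncAsyncDemo class below has an async method with no await in it, and the compiler warns about this with CS1998. The whole body, including the simulated long-running work, executes on the caller's thread. The Task it returns is already completed by the time the caller gets it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: an "async" method that never actually goes asynchronous.
static class SyncAsyncDemo
{
#pragma warning disable CS1998 // async method lacks 'await': intentional here
    // No await, so nothing here ever yields: the body runs to completion on the
    // caller's thread before the Task is handed back.
    public static async Task<int> ComputeWithoutAwaitAsync()
    {
        Thread.Sleep(200); // stands in for long-running CPU work
        return Environment.CurrentManagedThreadId;
    }
#pragma warning restore CS1998

    // True if the returned Task was already complete and its body ran on the
    // calling thread.
    public static bool RanSynchronously()
    {
        int callerId = Environment.CurrentManagedThreadId;
        Task<int> task = ComputeWithoutAwaitAsync();
        return task.IsCompleted && task.Result == callerId;
    }
}
```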

On the other hand, Task.Run() is guaranteed to operate asynchronously. It is documented as such:

Queues the specified work to run on the ThreadPool and returns a task or Task<TResult> handle for that work

While the Task object itself abstracts the idea of a "future" or "promise" (to use synonymous terms found in programming), the specific implementation is very much tied to the thread pool. When used correctly, you can be assured of asynchronous operation.
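The following short sketch shows the difference between the guarantee and the hint. SchedulingDemo is a hypothetical name. Task.Run uses TaskScheduler.Default, so its delegate runs on a ThreadPool thread. TaskCreationOptions.LongRunning, the option mentioned in the question, is only a hint. Current .NET implementations of the default scheduler honor it by starting a dedicated thread outside the pool. That is an implementation detail rather than a documented contract, so treat the second method as an observation, not a guarantee.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: where do Task.Run and LongRunning tasks execute?
static class SchedulingDemo
{
    // Task.Run queues the delegate to the ThreadPool (TaskScheduler.Default).
    public static bool TaskRunUsesPoolThread()
    {
        return Task.Run(() => Thread.CurrentThread.IsThreadPoolThread).Result;
    }

    // LongRunning is a hint. The default scheduler currently answers it with a
    // dedicated thread outside the pool; this is observed behavior, not a contract.
    public static bool LongRunningUsesDedicatedThread()
    {
        Task<bool> task = Task.Factory.StartNew(
            () => !Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
        return task.Result;
    }
}
```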

  2. If I understand correctly, a TPL Task is not guaranteed to resume on the same thread it started on. If it resumes on a different thread, in this case that thread would try to release a lock it doesn't own while the original thread holds the lock forever, resulting in a deadlock. I remember Eric Lippert writing a while ago that this is the reason why await is not allowed in a lock block. Going back to my example, I'm not even sure how to go about solving this issue.

Only some synchronization objects are thread-specific. For example, Monitor is, but Semaphore is not. Whether this is useful to you depends on what you are trying to implement. For example, you can implement the producer/consumer pattern with a long-running thread that uses BlockingCollection<T>, without needing any explicit synchronization objects at all. If you did want to use TPL techniques, you could use SemaphoreSlim and its WaitAsync() method.
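Here is one way the question's Thread-based producer/consumer might look with BlockingCollection<T>. It is a sketch rather than a drop-in replacement. BlockingCollectionDemo and its count parameter are hypothetical. The count parameter only exists so the demo terminates, whereas the question's loops run forever. The producer calls CompleteAdding() when it is done, which ends the consumer's GetConsumingEnumerable() loop. Both loops run as LongRunning tasks, matching the "long-running thread" suggestion above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: producer/consumer with no explicit lock or Monitor calls.
static class BlockingCollectionDemo
{
    public static List<int> Run(int count)
    {
        var queue = new BlockingCollection<int>(); // backed by ConcurrentQueue<T> (FIFO)
        var consumed = new List<int>();            // touched only by the consumer until WaitAll

        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= count; i++)
            {
                Thread.Sleep(10);   // stands in for ProduceNumber()
                queue.Add(i);
            }
            queue.CompleteAdding(); // lets the consumer's loop finish
        }, TaskCreationOptions.LongRunning);

        Task consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the collection is empty; ends after CompleteAdding.
            foreach (int number in queue.GetConsumingEnumerable())
            {
                Console.WriteLine(number); // stands in for ConsumeNumber()
                consumed.Add(number);
            }
        }, TaskCreationOptions.LongRunning);

        Task.WaitAll(producer, consumer);
        return consumed;
    }
}
```

No lock, Monitor.Wait or Monitor.Pulse appears anywhere. BlockingCollection<T> handles both blocking while the collection is empty and waking the consumer when an item arrives.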

Of course, you could also use the Dataflow API. For some scenarios this would be preferable. For very simple producer/consumer, it would probably be overkill. :)

Also, this made me think, is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code? Perhaps I'm missing something that I should be using instead?

IMHO, this is the crux of the matter. Moving from Thread-based programming to the TPL is not simply a matter of a straightforward mapping from one construct to another. In some cases, doing so would be inefficient, and in other cases it simply won't work.

Indeed, I would say a key feature of TPL and especially of async/await is that synchronization of threads is much less necessary. The general idea is to perform operations asynchronously, with minimal interaction between threads. Data flows between threads only at well-defined points (i.e. retrieved from the completed Task objects), reducing or even eliminating the need for explicit synchronization.
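Here is a minimal sketch of that style, using a hypothetical NoSharedStateDemo class. Two Task.Run calls each sum half of a range using only their own local variables. The results meet only when Task.WhenAll hands back the completed values. Summing a range is just a placeholder for any CPU-bound work that can be split into independent pieces.

```csharp
using System.Threading.Tasks;

// Hypothetical demo class: results flow out of completed tasks, so no locks are needed.
static class NoSharedStateDemo
{
    // Works only on its own locals; nothing is shared between tasks.
    static long SumRange(int from, int to)
    {
        long sum = 0;
        for (int i = from; i <= to; i++) sum += i;
        return sum;
    }

    public static async Task<long> SumInParallelAsync()
    {
        Task<long> lower = Task.Run(() => SumRange(1, 500_000));
        Task<long> upper = Task.Run(() => SumRange(500_001, 1_000_000));

        // The only coordination is waiting for both results to be ready.
        long[] parts = await Task.WhenAll(lower, upper);
        return parts[0] + parts[1];
    }
}
```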

It's impossible to suggest specific techniques, as how best to implement something will depend on what exactly the goal is. But the short version is to understand that when using TPL, very often it is simply unnecessary to use synchronization primitives such as what you're used to using with the lower-level API. You should strive to develop enough experience with the TPL idioms that you can recognize which ones apply to which programming problems, so that you apply them directly rather than trying to mentally map your old knowledge.

In a way, this is (I think) analogous to learning a new human language. At first, one spends a lot of time mentally translating literally, possibly remapping to adjust to grammar, idioms, etc. But ideally at some point, one internalizes the language and is able to express oneself in that language directly. Personally, I've never gotten to that point when it comes to human languages, but I understand the concept in theory :). And I can tell you firsthand, it works quite well in the context of programming languages.


By the way, if you are interested in seeing how TPL ideas taken to extremes work out, you might like to read through Joe Duffy's recent blog articles on the topic. Indeed, the most recent version of .NET and associated languages have borrowed heavily from concepts developed in the Midori project he's describing.

Kolva answered 2/2, 2016 at 3:33. Comments (1):
Chrysalis: You are right, the bit about other gotchas would make my question too broad. I removed it, leaving the specifics. I loved the analogy with learning new human languages :) Great answer, thank you.

Tasks in .NET are a hybrid. TPL introduced tasks in .NET 4.0, but async-await only came with .NET 4.5.

There's a difference between the original tasks and the truly asynchronous tasks that came with async-await. The former are simply an abstraction of a "unit of work" that runs on some thread, whereas asynchronous tasks don't need a thread and don't run anywhere at all.
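Here is a sketch that makes the distinction concrete. PromiseTaskDemo is a hypothetical name. It creates a promise task with TaskCompletionSource<T>, the standard way to produce a Task that has no delegate behind it. No thread runs on its behalf. It simply stays incomplete until some code calls SetResult (or SetException/SetCanceled) on the source.

```csharp
using System.Threading.Tasks;

// Hypothetical demo class: a task with no code to run and no thread behind it.
static class PromiseTaskDemo
{
    public static (bool Before, bool After, int Result) Demo()
    {
        var tcs = new TaskCompletionSource<int>();
        Task<int> promise = tcs.Task;

        bool before = promise.IsCompleted; // false: nothing has completed it yet
        tcs.SetResult(42);                 // completed from the outside
        bool after = promise.IsCompleted;  // true

        return (before, after, promise.Result);
    }
}
```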

The regular tasks (or Delegate Tasks) are queued on some TaskScheduler (usually by Task.Run that uses the ThreadPool) and are executed by the same thread throughout the task's lifetime. There's no problem at all in using a traditional lock here.
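For delegate tasks, the familiar lock works as it always did, as long as the delegate itself contains no await. In this sketch, LockInDelegateTasksDemo is a hypothetical name. Several Task.Run delegates increment a shared counter under lock. Each delegate acquires and releases the monitor on the pool thread it runs on.

```csharp
using System.Linq;
using System.Threading.Tasks;

// Hypothetical demo class: a plain lock inside delegate tasks.
static class LockInDelegateTasksDemo
{
    public static int IncrementConcurrently(int tasks, int incrementsPerTask)
    {
        var gate = new object();
        int counter = 0;

        Task[] workers = Enumerable.Range(0, tasks)
            .Select(_ => Task.Run(() =>
            {
                for (int i = 0; i < incrementsPerTask; i++)
                {
                    lock (gate) { counter++; } // no await inside: same thread enters and exits
                }
            }))
            .ToArray();

        Task.WaitAll(workers);
        return counter;
    }
}
```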

The asynchronous tasks (or Promise Tasks) usually don't have code to execute; they just represent an asynchronous operation that will complete in the future. Take Task.Delay(10000), for example: the task is created and completes after 10 seconds, but nothing is running in the meantime. Here you can still use a traditional lock when appropriate (but not with an await inside the critical section), but you can also lock asynchronously with SemaphoreSlim.WaitAsync (or other async synchronization constructs).
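Here is a sketch of the SemaphoreSlim.WaitAsync approach, with hypothetical AsyncCounter and AsyncCounterDemo classes. A SemaphoreSlim with an initial and maximum count of 1 admits one caller at a time. Because it has no thread affinity, it can be released after an await that resumed on a different thread. The Task.Yield() call stands in for real asynchronous work inside the critical section. Without the semaphore, the read-then-write around it could lose updates.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example class: a counter whose update spans an await.
sealed class AsyncCounter
{
    // Admits one caller at a time; unlike Monitor, not tied to a thread.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _value;

    public int Value => _value;

    public async Task IncrementAsync()
    {
        await _gate.WaitAsync();  // waits asynchronously instead of blocking a thread
        try
        {
            int read = _value;
            await Task.Yield();   // an await inside the critical section
            _value = read + 1;
        }
        finally
        {
            _gate.Release();      // may run on a different thread than WaitAsync did
        }
    }
}

// Hypothetical driver: starts many increments concurrently and returns the final count.
static class AsyncCounterDemo
{
    public static async Task<int> RunAsync(int operations)
    {
        var counter = new AsyncCounter();
        await Task.WhenAll(Enumerable.Range(0, operations).Select(_ => counter.IncrementAsync()));
        return counter.Value;
    }
}
```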

Is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code?

It may be; that depends on what the code actually does and whether it uses TPL (i.e. Tasks) or async-await. However, there are many other tools you can now use, like async synchronization constructs (AsyncLock) and async data structures (TPL Dataflow).
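Stephen Cleary's AsyncEx library provides a full AsyncLock. The core idea can be sketched in a few lines on top of SemaphoreSlim. SimpleAsyncLock below is a hypothetical, minimal wrapper and not the AsyncEx API. It omits cancellation and the other features a real implementation would have. LockAsync completes once the lock is held, and disposing the returned object releases it, so it fits a using block.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal AsyncLock-style wrapper (not the AsyncEx API):
// no cancellation, no timeouts, not reentrant.
sealed class SimpleAsyncLock
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    // Completes once the lock is held; disposing the result releases it.
    public async Task<IDisposable> LockAsync()
    {
        await _semaphore.WaitAsync();
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private SemaphoreSlim _semaphore;

        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;

        // Exchange with null so a second Dispose call does not release twice.
        public void Dispose() => Interlocked.Exchange(ref _semaphore, null)?.Release();
    }
}
```

Usage would then read using (await myLock.LockAsync()) { ... }, where myLock is a hypothetical SimpleAsyncLock instance. An await is allowed inside that block, which is exactly what a lock statement forbids.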

Pandora answered 2/2, 2016 at 6:59. Comments (2):
Chrysalis: Thanks for your answer. I read the blogs by Stephen Cleary, a very useful read. I also read the article about constructing an AsyncLock, and while doing so, found a library that seemingly does exactly that and more: github.com/StephenCleary/AsyncEx. It has async implementations of Monitor, Mutex, and so on. Since you recommended using AsyncLock, you may find this library useful.
Pandora: @GediminasMasaitis I'm familiar with Stephen and his work. You might also want to check out Microsoft.VisualStudio.Threading.
