A call to CancellationTokenSource.Cancel never returns
G

3

18

I have a situation where a call to CancellationTokenSource.Cancel never returns. Instead, after Cancel is called (and before it returns), execution continues with the cancellation handling of the code that is being cancelled. If the cancelled code does not subsequently await anything, the caller that originally called Cancel never gets control back. This is very strange. I would expect Cancel to simply record the cancellation request and return immediately, regardless of what the cancelled code does. The fact that the thread calling Cancel ends up executing code that belongs to the operation being cancelled, and does so before returning to the caller of Cancel, looks like a bug in the framework.

Here is how this goes:

  1. There is a piece of code, let’s call it “the worker code”, that is waiting on some async code. To keep things simple, let’s say this code is awaiting a Task.Delay:

    try
    {
        await Task.Delay(5000, cancellationToken);
        // … 
    }
    catch (OperationCanceledException)
    {
        // ….
    }
    

Just before “the worker code” invokes Task.Delay it is executing on thread T1. The continuation (that is the line following the “await” or the block inside the catch) will be executed later on either T1 or maybe on some other thread depending on a series of factors.

  2. There is another piece of code, let’s call it “the client code”, that decides to cancel the Task.Delay. This code calls cancellationTokenSource.Cancel. The call to Cancel is made on thread T2.

I would expect thread T2 to continue by returning to the caller of Cancel. I also expect to see the content of catch (OperationCanceledException) executed very soon on thread T1 or on some thread other than T2.

What happens next is surprising. I see that on thread T2, after Cancel is called, execution continues immediately with the block inside catch (OperationCanceledException). And that happens while Cancel is still on the call stack. It is as if the call to Cancel is hijacked by the code that is being cancelled. Here's a screenshot of Visual Studio showing this call stack:

[Screenshot: call stack]

More context

Here is some more context about what the actual code does: There is a “worker code” that accumulates requests. Requests are being submitted by some “client code”. Every few seconds “the worker code” processes these requests. The requests that are processed are eliminated from the queue. Once in a while however, “the client code” decides that it reached a point where it wants requests to be processed immediately. To communicate this to “the worker code” it calls a method Jolt that “the worker code” provides. The method Jolt that is being called by “the client code” implements this feature by cancelling a Task.Delay that is executed by the worker’s code main loop. The worker’s code has its Task.Delay cancelled and proceeds to process the requests that were already queued.

The actual code was stripped down to its simplest form and the code is available on GitHub.

Environment

The issue can be reproduced in console apps, background agents for Universal Apps for Windows and background agents for Universal Apps for Windows Phone 8.1.

The issue cannot be reproduced in Universal apps for Windows where the code works as I would expect and the call to Cancel returns immediately.

Gardas answered 18/7, 2015 at 20:44 Comment(1)
"The issue cannot be reproduced in Universal apps" - because in this case there's a synchronization context on the thread where you call await Task.Delay(...), so the continuation triggered by CancellationTokenSource.Cancel is asynchronously posted to that context. Hence, there's no deadlock. - Eldwun
G
19

CancellationTokenSource.Cancel doesn't simply set the IsCancellationRequested flag.

The CancellationToken struct has a Register method, which lets you register callbacks that will be called on cancellation. These callbacks are invoked by CancellationTokenSource.Cancel.
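
As a minimal sketch of that behaviour (the class and method names here are made up for illustration), the following console program registers a callback and compares the thread it ran on with the thread that called Cancel. It assumes no SynchronizationContext is involved, as in a plain console app:

```csharp
using System;
using System.Threading;

static class RegisterDemo
{
    // Hypothetical helper: returns true when the registered callback
    // ran on the same thread that called Cancel.
    public static bool CallbackRunsOnCancellingThread()
    {
        var cts = new CancellationTokenSource();
        int callbackThreadId = -1;

        // Register(Action) does not capture a SynchronizationContext.
        cts.Token.Register(() => callbackThreadId = Environment.CurrentManagedThreadId);

        int callerThreadId = Environment.CurrentManagedThreadId;
        cts.Cancel(); // the callback is invoked before this call returns

        return callbackThreadId == callerThreadId;
    }

    static void Main()
    {
        Console.WriteLine(CallbackRunsOnCancellingThread());
    }
}
```

By the time Cancel returns, the callback has already run, so the comparison reflects the thread that did the work.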

Let's take a look at the source code:

public void Cancel()
{
    Cancel(false);
}

public void Cancel(bool throwOnFirstException)
{
    ThrowIfDisposed();
    NotifyCancellation(throwOnFirstException);            
}

Here's the NotifyCancellation method:

private void NotifyCancellation(bool throwOnFirstException)
{
    // fast-path test to check if Notify has been called previously
    if (IsCancellationRequested)
        return;

    // If we're the first to signal cancellation, do the main extra work.
    if (Interlocked.CompareExchange(ref m_state, NOTIFYING, NOT_CANCELED) == NOT_CANCELED)
    {
        // Dispose of the timer, if any
        Timer timer = m_timer;
        if(timer != null) timer.Dispose();

        //record the threadID being used for running the callbacks.
        ThreadIDExecutingCallbacks = Thread.CurrentThread.ManagedThreadId;

        //If the kernel event is null at this point, it will be set during lazy construction.
        if (m_kernelEvent != null)
            m_kernelEvent.Set(); // update the MRE value.

        // - late enlisters to the Canceled event will have their callbacks called immediately in the Register() methods.
        // - Callbacks are not called inside a lock.
        // - After transition, no more delegates will be added to the 
        // - list of handlers, and hence it can be consumed and cleared at leisure by ExecuteCallbackHandlers.
        ExecuteCallbackHandlers(throwOnFirstException);
        Contract.Assert(IsCancellationCompleted, "Expected cancellation to have finished");
    }
}

Ok, now the catch is that ExecuteCallbackHandlers can execute the callbacks either on the target context, or in the current context. I'll let you take a look at the ExecuteCallbackHandlers method source code as it's a bit too long to include here. But the interesting part is:

if (m_executingCallback.TargetSyncContext != null)
{

    m_executingCallback.TargetSyncContext.Send(CancellationCallbackCoreWork_OnSyncContext, args);
    // CancellationCallbackCoreWork_OnSyncContext may have altered ThreadIDExecutingCallbacks, so reset it. 
    ThreadIDExecutingCallbacks = Thread.CurrentThread.ManagedThreadId;
}
else
{
    CancellationCallbackCoreWork(args);
}

I guess now you're starting to understand where I'm going to look next... Task.Delay of course. Let's look at its source code:

// Register our cancellation token, if necessary.
if (cancellationToken.CanBeCanceled)
{
    promise.Registration = cancellationToken.InternalRegisterWithoutEC(state => ((DelayPromise)state).Complete(), promise);
}

Hmmm... what's that InternalRegisterWithoutEC method?

internal CancellationTokenRegistration InternalRegisterWithoutEC(Action<object> callback, Object state)
{
    return Register(
        callback,
        state,
        false, // useSyncContext=false
        false  // useExecutionContext=false
     );
}

Argh. useSyncContext=false - this explains the behavior you're seeing, as the TargetSyncContext property used in ExecuteCallbackHandlers will be null. Since no synchronization context is used, the cancellation callback (and with it the continuation of the awaiting code) is executed synchronously on the thread that called CancellationTokenSource.Cancel.
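
The effect can be observed with a small console program. This is only a sketch (DelayCancelDemo and its log are hypothetical names), and it assumes a context without a SynchronizationContext. It records when the catch block runs relative to the return of Cancel:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class DelayCancelDemo
{
    static async Task WorkerAsync(CancellationToken token, ConcurrentQueue<string> log)
    {
        try
        {
            await Task.Delay(5000, token);
        }
        catch (OperationCanceledException)
        {
            log.Enqueue($"catch block on thread {Environment.CurrentManagedThreadId}");
        }
    }

    // Hypothetical helper: returns the recorded events in the order they happened.
    public static string[] Run()
    {
        var log = new ConcurrentQueue<string>();
        var cts = new CancellationTokenSource();

        // Runs synchronously up to the await, then returns the pending task.
        Task worker = WorkerAsync(cts.Token, log);

        log.Enqueue($"calling Cancel on thread {Environment.CurrentManagedThreadId}");
        cts.Cancel();
        log.Enqueue("Cancel returned");

        worker.Wait(); // make sure the catch block has finished
        return log.ToArray();
    }

    static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

In a console app the "catch block" entry would typically appear before "Cancel returned" and carry the same thread ID as the caller. The exact ordering is an implementation detail of the runtime, not a documented guarantee.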

Gustave answered 18/7, 2015 at 23:7 Comment(1)
Good explanation but what's the way around the SyncContext issue? - Toh
A
11

This is the expected behavior of CancellationToken/Source.

Somewhat similar to how TaskCompletionSource works, CancellationToken registrations are executed synchronously using the calling thread. You can see that in CancellationTokenSource.ExecuteCallbackHandlers that gets called when you cancel.

It's much more efficient to use that same thread than to schedule all these continuations on the ThreadPool. Usually this behavior isn't a problem, but it can be if you call CancellationTokenSource.Cancel inside a lock as the thread is "hijacked" while the lock is still taken. You can solve such issues by using Task.Run. You can even make it an extension method:

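// Extension methods must be declared in a static class.
public static void CancelWithBackgroundContinuations(this CancellationTokenSource cancellationTokenSource)
{
    // Run Cancel (and therefore all registered callbacks) on a thread-pool thread.
    Task.Run(() => cancellationTokenSource.Cancel());
    // Make sure to only continue when the cancellation completed (without waiting for all the callbacks).
    cancellationTokenSource.Token.WaitHandle.WaitOne();
}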
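
A self-contained sketch of the same idea follows. The names are hypothetical and the extra event is only there so the demo can inspect the result. It checks that the callback runs on a thread other than the one requesting cancellation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class BackgroundCancelDemo
{
    // Hypothetical helper: returns true when the callback ran on a
    // different thread than the one that requested cancellation.
    public static bool CallbackRunsOffCallerThread()
    {
        var cts = new CancellationTokenSource();
        var callbackDone = new ManualResetEventSlim();
        int callerThreadId = Environment.CurrentManagedThreadId;
        int callbackThreadId = -1;

        cts.Token.Register(() =>
        {
            callbackThreadId = Environment.CurrentManagedThreadId;
            callbackDone.Set();
        });

        // Cancel runs on a thread-pool thread, so callbacks run there too.
        Task cancelTask = Task.Run(() => cts.Cancel());

        // Returns once cancellation has been requested, not once callbacks finish.
        cts.Token.WaitHandle.WaitOne();

        // Demo only: wait for the callback so its thread ID can be compared.
        callbackDone.Wait();
        cancelTask.Wait(); // also surfaces any exception thrown by a callback
        return callbackThreadId != callerThreadId;
    }

    static void Main()
    {
        Console.WriteLine(CallbackRunsOffCallerThread());
    }
}
```

Keeping a reference to the task returned by Task.Run, as done here, is one way to avoid silently losing exceptions thrown by callbacks. Whether that matters depends on the callbacks involved.
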
Avenge answered 18/7, 2015 at 22:55 Comment(7)
Oh god, not another reentrancy problem in the TPL. Bad choices. I'm glad somebody else stepped on this mine before I did. - Gabble
Thank you i3arnon. Your answer explains what is going on here. BTW, I don't think I can simply remove the lock. The lock was there to make sure that GetCurrentCancellationToken does not get an obsolete cancellation token at a point in time when a more recent one is already in effect. However, I can apply your suggestion about using Task.Run. And I don't have to wait until the cancellation has completed. - Gardas
Did you mean requested rather than completed here: // make sure to only continue when the cancellation completed? - Otherwise, any cancellation callbacks possibly registered via Token.Register may be called after Token.WaitHandle has been signaled. Another potential issue with using Task.Run like this is that any exceptions thrown by those callbacks will be lost. I'd rather use QueueUserWorkItem. This might not be the case with @Ladi's logic, but generally I think it'd be more appropriate to do it where the Token is observed, with something like this. - Eldwun
@Gardas This wait only waits for the actual cancellation, not all the callbacks. This should be extremely quick. You don't necessarily need to wait, but it's safer. - Avenge
@Noseratio no. I meant completed, not counting the callbacks. Just the cancellation itself. - Avenge
I don't understand how it is possible to call CancellationTokenSource.Cancel() from a different thread. The documentation clearly states "Only the requesting object can issue the cancellation request." Is the documentation wrong? - Scenery
@JérômeMEVEL it means that you can only cancel the token if you're holding the CancellationTokenSource. If you're only holding the CancellationToken you can't cancel the operation, only be notified when a cancellation is requested. The requesting object here is the source. - Avenge
C
3

Because of the reasons already listed here, I believe you want to actually utilize the CancellationTokenSource.CancelAfter method with a zero millisecond delay. This will allow the cancellation to propagate in a different context.

The source code for CancelAfter is here.

Internally it uses a TimerQueueTimer to make the cancel request. This is not documented, but it should resolve the OP's issue.

Documentation here.
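
A rough sketch of this approach, with hypothetical names, is below. Which thread ends up running the callbacks after CancelAfter(0) is an implementation detail and is assumed here rather than guaranteed, so the code only reports the thread IDs and checks that cancellation is eventually observed:

```csharp
using System;
using System.Threading;

static class CancelAfterDemo
{
    // Hypothetical helper: returns true if the callback ran within the timeout.
    public static bool CancellationObserved()
    {
        var cts = new CancellationTokenSource();
        var callbackDone = new ManualResetEventSlim();
        int callerThreadId = Environment.CurrentManagedThreadId;
        int callbackThreadId = -1;

        cts.Token.Register(() =>
        {
            callbackThreadId = Environment.CurrentManagedThreadId;
            callbackDone.Set();
        });

        // Schedules the cancellation through a timer instead of calling Cancel directly.
        cts.CancelAfter(0);

        bool observed = callbackDone.Wait(TimeSpan.FromSeconds(5));
        Console.WriteLine($"caller thread {callerThreadId}, callback thread {callbackThreadId}");
        return observed;
    }

    static void Main()
    {
        Console.WriteLine(CancellationObserved());
    }
}
```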

Cavalierly answered 4/10, 2020 at 17:58 Comment(2)
Where is it documented that using the CancelAfter with zero delay will invoke the callbacks on a different context? - Lemniscate
Looking at the source code, it utilizes a TimerQueueTimer to execute after a specified time. CancelAfter does not block. - Cavalierly
