Do asynchronous context managers need to protect their cleanup code from cancellation?
Asked Answered
L

2

15

The problem (I think)

The contextlib.asynccontextmanager documentation gives this example:

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

It looks to me like this can leak resources. If this code's task is cancelled while this code is on its await release_db_connection(conn) line, the release could be interrupted. The asyncio.CancelledError will propagate up from somewhere within the finally block, preventing subsequent cleanup code from running.

So, in practical terms, if you're implementing a web server that handles requests with a timeout, a timeout firing at the exact wrong time could cause a database connection to leak.

Full runnable example

import asyncio
from contextlib import asynccontextmanager

async def acquire_db_connection():
    await asyncio.sleep(1)
    print("Acquired database connection.")
    return "<fake connection object>"

async def release_db_connection(conn):
    await asyncio.sleep(1)
    print("Released database connection.")

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

async def do_stuff_with_connection():
    async with get_connection() as conn:
        await asyncio.sleep(1)
        print("Did stuff with connection.")

async def main():
    task = asyncio.create_task(do_stuff_with_connection())

    # Cancel the task just as the context manager running
    # inside of it is executing its cleanup code.
    await asyncio.sleep(2.5)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    print("Done.")

asyncio.run(main())

Output on Python 3.7.9:

Acquired database connection.
Did stuff with connection.
Done.

Note that Released database connection is never printed.

My questions

  • This is a problem, right? Intuitively to me, I expect .cancel() to mean "cancel gracefully, cleaning up any resources used along the way." (Otherwise, why would they have implemented cancellation as exception propagation?) But I could be wrong. Maybe, for example, .cancel() is meant to be fast instead of graceful. Is there an authoritative source that clarifies what .cancel() is supposed to do here?
  • If this is indeed a problem, how do I fix it?
Lamanna answered 2/8, 2021 at 15:32 Comment(1)
It seems that the very fact that you can catch the CancelledError implies that you can still do the cleanup yourself -- but you'd have to cache the equivalent of the conn state somewhere accessible to other parts of the code.Synsepalous
P
2

Focusing on protecting the cleanup from cancellation is a red herring. There is a multitude of things that can go wrong and the context manager has no way to know

  • which errors can occur, and
  • which errors must be protected against.

It is the responsibility of the resource handling utilities to properly handle errors.

  • If release_db_connection must not be cancelled, it must protect itself against cancellation.
  • If acquire/release must be run as a pair, it must be a single async with context manager. Further protection, e.g. against cancellation, may be involved internally as well.
async def release_db_connection(conn):
    """
    Cancellation safe variant of `release_db_connection`

    Internally protects against cancellation by delaying it until cleanup.
    """
    # cleanup is run in separate task so that it
    # cannot be cancelled from the outside.
    shielded_release = asyncio.create_task(asyncio.sleep(1))
    # Wait for cleanup completion – unlike `asyncio.shield`,
    # delay any cancellation until we are done.
    try:
        await shielded_release
    except asyncio.CancelledError:
        await shielded_release
        # propagate cancellation when we are done
        raise
    finally:
        print("Released database connection.")

Note: Asynchronous cleanup is tricky. For example, a simple asyncio.shield is not sufficient if the event loop does not wait for shielded tasks. Avoid inventing your own protection and rely on the underlying frameworks to do the right thing.


The cancellation of a task is a graceful shutdown that a) still allows async operations and b) may be delayed/suppressed. Coroutines being prepared to handle the CancelledError for cleanup is explicitly allowed.

Task.cancel

The coroutine then has a chance to clean up or even deny the request by suppressing the exception with a try … … except CancelledError … finally block. […] Task.cancel() does not guarantee that the Task will be cancelled, although suppressing cancellation completely is not common and is actively discouraged.

A forceful shutdown is coroutine.close/GeneratorExit. This corresponds to an immediate, synchronous shutdown and forbids suspension via await, async for or async with.

coroutine.close

[…] it raises GeneratorExit at the suspension point, causing the coroutine to immediately clean itself up.

Peasecod answered 3/8, 2021 at 11:42 Comment(5)
Could you clarify what you mean by "it's the responsibility of the resource handling utilities" and "avoid inventing your own protection"? If you're in the position of writing a context manager to manage a resource's lifetime, you are writing the resource handling utilities.Lamanna
@Lamanna The contextlib example is about using existing resource handlers and giving them a nicer coat of paint. In that case, you should trust that the handler is robust unless it is explicitly documented not to be – you simply have no way to know which recovery (cancellation, retries, timeouts, ...) is needed otherwise. That also ties into not trying to add protection unless you know how – the naive case of asyncio.shield leaks on shutdown, the means shown in the answer can suppress cancellations if the task raises exceptions – because they depend on the resource handler's details.Peasecod
@Lamanna Now if we are talking about creating your own, safe resource handler – that's only slightly related to contextlib.contextmanager (though it is definitely advisable). Most importantly, it is about a) what kind of resource you are handling – e.g. if it can be cleaned up synchronously as well – and what kind of event loop you are using – e.g. asyncio has a lot of sharp edges. Either way, if the individual acquire/release actions are not safe, an outer context manager has a hard time making them safe – at least not in a manner that would better fit into the acquire/release.Peasecod
This approach feels wrong to me: whether or not a coroutine " must not be cancelled" depends on the situation. For example, a file deletion coroutine could be called as a context manager's cleanup (which should not be cancelled) or as part of the main operation of a program (which should be cancellable). Wouldn't your approach force all coroutines that could possibly be used in a cleanup to use this try: await x; except CancelledError: await x construct, which makes them unusable in contexts where they should be cancellable?Sprain
@Sprain There are operations that literally "must not be cancelled" once they have started. These are generally multi-step cleanup procedures, where partial cleanup may leave a system in undefined state (for example, finishing the compression of a journal). If your cleanup sometimes is fine to cancel, then its not one that "must not be cancelled" in itself and does not need such protection. If there is calling code that requires it not to be cancelled (e.g. delete multiple files at once, to avoid partial construction/cleanup to look the same) then the calling code must shield its it.Peasecod
I
1

You can protect the task with asyncio.shield to guarantee graceful shutdown of the context manager, I did changes only in main():

async def main():
    task = asyncio.create_task(do_stuff_with_connection())
    # shield context manager from cancellation
    sh_task = asyncio.shield(task)
    # Cancel the task just as the context manager running
    # inside of it is executing its cleanup code.
    await asyncio.sleep(2.5)
    sh_task.cancel()  # cancel shielded task
    try:
        await sh_task
    except asyncio.CancelledError:
        pass

    await asyncio.sleep(5)  # wait till shielded task is done

    print("Done.")
Impassioned answered 3/8, 2021 at 11:18 Comment(6)
But this still doesn't call release_db_connection(conn), does it?Synsepalous
@AndrewJaffe I got the following output when I ran the code Acquired database connection. Did stuff with connection. Released database connection. Done. So it looks like it runs release_db_connectionImpassioned
Oops, I left off the final await asyncio.sleep(5) which meant that the code ended before it could run release_db_connection (but which is an interesting failure mode, too!).Synsepalous
Yes, you have to make loop busy with some work, otherwise it does not wait for tasksImpassioned
In place of await asyncio.sleep(5), you could use await task (not await sh_task) to make it more deterministic, right? That would also ensure that exceptions from the task propagate properly.Lamanna
@Lamanna I wanted to demonstrate that despite task was cancelled, It still could be gracefully stopped, while app (e.g. some server) can continue working with other tasks. Since this code snippet is not real server, which listens for some connections all the time, I used asyncio.sleep to simulate some server work.Impassioned

© 2022 - 2024 — McMap. All rights reserved.