asyncio, wrapping a normal function as asynchronous
Is a function like:

import time

async def f(x):
    time.sleep(x)

await f(5)

properly asynchronous/non-blocking?

Is the sleep function provided by asyncio any different?

And finally, is aiorequests a viable asynchronous replacement for requests?

(to my mind it basically wraps main components as asynchronous)

https://github.com/pohmelie/aiorequests/blob/master/aiorequests.py

Destitution answered 3/8, 2019 at 8:6 Comment(0)

The provided function is not a correctly written async function because it invokes a blocking call, which is forbidden in asyncio. (A quick hint that there's something wrong with the "coroutine" is that it doesn't contain a single await.) The reason that it is forbidden is that a blocking call such as sleep() will pause the current thread without giving other coroutines a chance to run. In other words, instead of pausing the current coroutine, it will pause the whole event loop, i.e. all coroutines.

In asyncio (and other async frameworks) blocking primitives like time.sleep() are replaced with awaitables like asyncio.sleep(), which suspend the awaiter and resume it when the time is right. Other coroutines and the event loop are not only unaffected by suspension of a coroutine, but that's precisely when they get the chance to run. Suspension and resumption of coroutines is the core of async-await cooperative multitasking.
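The difference is easy to observe: two coroutines awaiting asyncio.sleep() overlap, so the total wall time is that of the longest sleep, not the sum. A minimal sketch (the names `nap` and `main` are just for illustration):

```python
import asyncio
import time

async def nap(i):
    # suspends only this coroutine; the event loop keeps running others
    await asyncio.sleep(0.2)
    return i

async def main():
    start = time.perf_counter()
    # both sleeps overlap, so this takes ~0.2 s, not ~0.4 s
    results = await asyncio.gather(nap(1), nap(2))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```

Replacing asyncio.sleep() with time.sleep() in `nap` would serialize the two coroutines and roughly double the elapsed time.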

Asyncio supports running legacy blocking functions in a separate thread, so that they don't block the event loop. This is achieved by calling run_in_executor which will hand off the execution to a thread pool (executor in the parlance of Python's concurrent.futures module) and return an asyncio awaitable:

import asyncio
import time

async def f(x):
    # get_running_loop() requires Python 3.7+ and must be called
    # from inside a coroutine; older code used get_event_loop()
    loop = asyncio.get_running_loop()
    # start time.sleep(x) in a separate thread, suspend
    # the current coroutine, and resume when it's done
    await loop.run_in_executor(None, time.sleep, x)

This is the technique used by aiorequests to wrap requests' blocking functions. Native asyncio functions like asyncio.sleep() do not use this approach; they directly tell the event loop to suspend them and how to wake them up (source).
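Since Python 3.9, asyncio.to_thread() provides the same hand-off to the default thread pool with less ceremony. A small sketch showing several wrapped blocking calls overlapping (`blocking_io` is a stand-in name):

```python
import asyncio
import time

def blocking_io(x):
    time.sleep(x)        # stands in for a blocking call like requests.get
    return x

async def main():
    start = time.perf_counter()
    # three blocking calls run on separate worker threads, so the
    # total wall time is ~0.2 s rather than ~0.6 s
    results = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2),
        asyncio.to_thread(blocking_io, 0.2),
        asyncio.to_thread(blocking_io, 0.2),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
```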

run_in_executor is useful and effective for quick wrapping of legacy blocking code, and not much else. It is always inferior to a native async implementation, for several reasons:

  • It doesn't implement cancellation. Unlike threads, asyncio tasks are fully cancelable, but this doesn't extend to run_in_executor, which shares the limitations of threads.

  • It doesn't provide light-weight tasks which may number in tens of thousands and run in parallel. run_in_executor uses a thread pool under the hood, so if you await more functions than the maximum number of workers, some functions will have to wait their turn to even start working. The alternative, to increase the number of workers, will swamp the OS with too many threads. Asyncio allows the number of parallel operations to match what you'd have in a hand-written state machine using poll to listen for events.

  • It is likely incompatible with more complex APIs, such as those that expose user-provided callbacks or iterators, or that provide their own thread-based async functionality.
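The cancellation limitation is easy to demonstrate: cancelling the awaitable returned by run_in_executor only cancels the asyncio side, while the worker thread keeps running to completion. A sketch (the `state` dict and `blocking_work` are illustrative names):

```python
import asyncio
import time

state = {"finished": False}

def blocking_work():
    # once started, this thread cannot be interrupted from asyncio
    time.sleep(0.2)
    state["finished"] = True

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, blocking_work)
    await asyncio.sleep(0.05)   # let the worker thread start
    fut.cancel()                # "cancels" only the asyncio wrapper...
    try:
        await fut
        cancelled = False
    except asyncio.CancelledError:
        cancelled = True
    await asyncio.sleep(0.4)    # ...while the thread runs to completion
    return cancelled, state["finished"]

cancelled, finished = asyncio.run(main())
```

A task awaiting asyncio.sleep(), by contrast, stops doing any work the moment it is cancelled.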

It is recommended to avoid crutches like aiorequests and dive directly into aiohttp. The API is very similar to that of requests, and it is almost as pleasant to use.

Catalepsy answered 3/8, 2019 at 12:24 Comment(17)
I wasn't even hoping to get such an exhaustive answer, thank you very much. – Donegan
The only thing I still don't get is why run_in_executor uses threads under the hood. Doesn't that completely violate the idea of asyncio in Python? Because of the GIL, async seems to be an alternative approach to threading (I would even call it a replacement). What about context-switching overhead and locking? I thought asyncio was especially beneficial in Python because it lowers unnecessary threading overhead in an anyway single-threaded environment. – Donegan
@KubaChrabański run_in_executor uses threads because threads are the only way to get sync functions to cooperate with an async code base. But native asyncio code doesn't use run_in_executor. Libraries such as aiohttp are built using async functions (also known as coroutines) which are designed from the ground up to suspend themselves instead of blocking, and then you get the benefits. That's why run_in_executor is a "crutch" and should be avoided. – Catalepsy
Ok, got it now; "...which are designed from the ground up" explains everything, thank you. – Donegan
Let's say asyncio.sleep() does not exist, and I want to implement it. I would need to use the Python C API, am I right? – Donegan
@KubaChrabański No, asyncio.sleep() is written in Python, but it uses the primitives provided by the event loop (also written in Python). The way it works is that it allows the current task to suspend, but only after instructing the event loop to resume it after the specified delay elapses. Either way, it's definitely not some kind of simple wrapper over time.sleep(). – Catalepsy
@KubaChrabański If you wish to understand how the whole thing works, I warmly recommend this lecture by David Beazley, where he implements a small but fully functional event loop in front of a live audience. The code uses the older yield from syntax, but don't let that put you off; await is just a tiny bit of syntactic sugar over it and works in exactly the same way under the hood. – Catalepsy
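To make the comment above concrete, here is a rough pure-Python re-implementation (the name `my_sleep` is hypothetical), built only from event-loop primitives: ask the loop to complete a future after the delay, then suspend by awaiting that future.

```python
import asyncio
import time

async def my_sleep(delay, result=None):
    # schedule our own wake-up, then suspend until it fires
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    handle = loop.call_later(delay, future.set_result, result)
    try:
        return await future
    finally:
        handle.cancel()   # tidy up if we were cancelled early

start = time.perf_counter()
asyncio.run(my_sleep(0.2))
elapsed = time.perf_counter() - start
```

No C API is needed; the event loop's call_later() and futures do all the work.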
What's the difference if I did this instead: async def async_time(): time.sleep(1), then async def time(): await async_time(), then time()? – Nesto
@Renan Can you elaborate - did what? – Catalepsy
Nvm, I was trying to be cheeky. – Nesto
Really awesome answer! Asyncio tasks on the same event loop are naturally thread-safe because the event loop is indeed running on a single thread. But with run_in_executor(), do we need to do anything explicitly for thread safety? @Catalepsy – Dharna
@Dharna Not sure what kind of thread safety concerns you? run_in_executor is primarily for running blocking code without breaking everything else, and it uses threads as a (necessary) implementation detail. It doesn't care whether the function it invokes creates additional threads or uses them internally; it only cares about it completing and returning a value. But perhaps I misunderstand the question. – Catalepsy
@Catalepsy Suppose there are two legacy sync functions which do some "blocking" IO and then access the same global variable. Originally they ran in sync mode and the global variable would never be accessed concurrently. But after I wrap them in run_in_executor(), they will run on different threads and the global variable can be accessed concurrently. That may cause issues, I think. This is different from native asyncio, where all the tasks belonging to the same event loop essentially run on the same thread. – Dharna
"...which are designed from the ground up" -- it seems all existing IO libs need to be reworked if we want to leverage the asyncio paradigm... That's exactly what I wondered before. @Catalepsy – Dharna
@Dharna Re threads, I now see what you're saying. Yes, you should treat functions run by run_in_executor() in different tasks the same as functions run from different threads in blocking code. Re "from the ground up", yes, but that has largely already happened. Asyncio is no longer a new thing, and most modern network libs are async-aware. – Catalepsy
@Catalepsy Thanks. Just to double-confirm about threading: you mean run_in_executor() itself is also guaranteed to be thread-safe? – Dharna
@Dharna I've noticed the final question only now. run_in_executor itself doesn't need to be thread-safe because it is always run from the event loop thread. The function passed to run_in_executor has to be thread-safe by definition, since it will be run outside the main thread, but its required level of thread safety will depend on what it actually does. (For example, it might never need any explicit locking if it, say, just opens a file and reads from it.) – Catalepsy
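The shared-global scenario discussed in the comments above can be made safe with an ordinary threading.Lock, since the wrapped functions really do run on different threads. A sketch with hypothetical names (`counter`, `bump`):

```python
import asyncio
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    # runs on an executor worker thread, so shared state needs a lock
    global counter
    for _ in range(n):
        with lock:
            counter += 1

async def main():
    loop = asyncio.get_running_loop()
    # the two wrapped blocking functions run on different threads,
    # unlike native coroutines, which all share the event-loop thread
    await asyncio.gather(
        loop.run_in_executor(None, bump, 10_000),
        loop.run_in_executor(None, bump, 10_000),
    )

asyncio.run(main())
```

Without the lock, the unsynchronized `counter += 1` read-modify-write could lose increments under concurrent access.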

© 2022 - 2024 — McMap. All rights reserved.