Working on replacing my implementation of a server query tool that uses ThreadPoolExecutors with fully asynchronous calls using asyncio and aiohttp. Most of the transition is straightforward since network calls are non-blocking IO; it's the saving of the responses that has me in a conundrum.
All the examples I am using, even the docs for both libraries, use asyncio.gather(), which collects all the awaitable results. In my case, these results can be files in the many-GB range, and I don't want to store them in memory.
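For concreteness, the pattern those examples follow looks roughly like this (my own sketch, not code from either doc; fetch and the URL list are placeholders):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        # resp.read() buffers the entire response body in memory --
        # exactly what I want to avoid for multi-GB files
        return await resp.read()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        # gather() keeps every completed result alive until all finish
        results = await asyncio.gather(*(fetch(session, u) for u in urls))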
What's an appropriate way to solve this? Is it to use asyncio.as_completed() and then do something like this:
for f in asyncio.as_completed(aws):
    earliest_result = await f
    # Get the loop from inside the running coroutine
    loop = asyncio.get_running_loop()
    # Run the blocking file IO in an executor and write to file
    await loop.run_in_executor(None, save_result, earliest_result)
Doesn't this introduce a thread (assuming I use the default ThreadPoolExecutor), thus making this an asynchronous, multi-threaded program rather than an asynchronous, single-threaded one?
Further, does this ensure that only one earliest_result is being written to file at any time? I don't want the call to await loop.run_in_executor(...) to still be running when another result comes in and I start writing to the same file; I suppose I could limit this with a semaphore.
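To make the semaphore idea concrete, this is roughly what I had in mind (a sketch; save_result is the same blocking writer as above, and the single-permit semaphore is my assumption):

import asyncio

write_sem = asyncio.Semaphore(1)  # at most one writer at a time

async def save_one(result):
    loop = asyncio.get_running_loop()
    # Only one coroutine can hold the permit, so the executor is never
    # writing two results to the file at the same time
    async with write_sem:
        # save_result is my blocking file writer, defined elsewhere
        await loop.run_in_executor(None, save_result, result)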