Should using cache between Python coroutines be blocking?
I needed caching in a web application built in Python. Since I couldn't use lru_cache directly on a coroutine, I built a simple decorator that allows me to do so:

from asyncio import Lock, sleep
from functools import lru_cache
from typing import Any, Awaitable, Callable


async def acquire_lock_async(lock: Lock) -> None:
    await lock.acquire()


class AsyncCacheable:
    """Wraps a no-argument coroutine function and caches its result.
    Intended to be safe to await from several coroutines at once."""

    def __init__(self, coro_func: Callable[[], Awaitable[Any]]) -> None:
        self.coro_func = coro_func
        self.done = False
        self.result = None
        self.lock = Lock()

    def __await__(self):
        while True:
            if self.done:
                return self.result
            if not self.lock.locked():
                try:
                    yield from acquire_lock_async(self.lock).__await__()
                    self.result = yield from self.coro_func().__await__()
                    self.done = True
                finally:
                    self.lock.release()
                return self.result
            else:
                # another caller is computing the result; poll until it is done
                yield from sleep(0.05).__await__()


def async_cacheable(coro_func: Callable[..., Awaitable[Any]]) -> Callable[..., AsyncCacheable]:
    def wrapper(*args, **kwargs):
        return AsyncCacheable(lambda: coro_func(*args, **kwargs))
    return wrapper

    @lru_cache(maxsize=8)
    @async_cacheable
    async def get_company_id(self, simulation_id: int):
        simulation_in_db = await self.get_by_id(_id=simulation_id)
        if not simulation_in_db:
            raise ValueError("Simulation not found")
        company_id = simulation_in_db["company_id"]
        return company_id
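For context, the reason lru_cache cannot be applied directly to a coroutine function is that it caches the coroutine *object*, and a coroutine can only be awaited once. A minimal reproduction (the `fetch` function is a hypothetical stand-in):

```python
import asyncio
from functools import lru_cache


@lru_cache(maxsize=8)
async def fetch(x):
    # lru_cache stores the coroutine object, not its eventual result
    return x * 2


async def main():
    first = await fetch(1)      # first await works normally
    try:
        await fetch(1)          # cache hit returns the *same* coroutine object
        reused = True
    except RuntimeError:        # "cannot reuse already awaited coroutine"
        reused = False
    return first, reused


outcome = asyncio.run(main())
print(outcome)
```

The second call is a cache hit that hands back the already-awaited coroutine, so awaiting it raises RuntimeError.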

I tested it and it works fine, but now I have doubts about whether I'm on the right path. Does what I've done with the Lock make sense to make it safe between coroutines? Should it be done like that?

Thanks!

EDIT:

In these cases, it may be more convenient/simpler to use lru_cache and store Future instances. It would be something like this:

async def get_company_id(self, simulation_id: int):
    simulation_task = self.task_get_company_id(simulation_id)
    simulation_in_db = await simulation_task
    if not simulation_in_db:
        raise ValueError("Simulation not found")
    company_id = simulation_in_db["company_id"]
    return company_id

@lru_cache(maxsize=8)
def task_get_company_id(self, simulation_id: int):
    return create_task(self.get_by_id(_id=simulation_id))
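As a sanity check on this EDIT approach, here is a runnable sketch with module-level stand-ins (the `get_by_id` query and its return shape are hypothetical): lru_cache memoizes the Task object, and a Task, unlike a bare coroutine, can be awaited any number of times, so concurrent callers share one execution.

```python
import asyncio
from functools import lru_cache

call_count = 0  # counts how often the underlying query really runs


async def get_by_id(_id):
    # hypothetical stand-in for the database call
    global call_count
    call_count += 1
    await asyncio.sleep(0.01)
    return {"company_id": _id * 10}


@lru_cache(maxsize=8)
def task_get_by_id(_id):
    # must be called while the event loop is running
    return asyncio.create_task(get_by_id(_id))


async def main():
    # two concurrent awaits resolve to the same cached Task
    rows = await asyncio.gather(task_get_by_id(7), task_get_by_id(7))
    return rows, call_count


rows, runs = asyncio.run(main())
print(rows, runs)  # the query body ran only once
```

One caveat: applying lru_cache to a method also caches `self` (keeping the instance alive), and cached Tasks stay bound to the event loop that created them.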
Foushee answered 11/2 at 19:50

Comments:
- Casavant: What exactly should the lock prevent from happening? Is it not safe to run the same coroutine multiple times concurrently? Or do you think it is only inefficient because of the cache? In the latter case, the lock should be acquired before self.done is checked. Otherwise, one instance of the coroutine function may run while a second waits; the first finishes and the cache is stored, but the second runs anyway because the cache was checked beforehand and wasn't present then.
- Foushee: For this specific case, I think it would not be a problem for two coroutines to run concurrently, since I am only reading from the database (although in other cases it might be). So at the moment it is just an issue of inefficiency. Your observation is correct: I should acquire the lock before checking self.done.
- Foushee: However, it may be more convenient/simpler to use lru_cache and store Future instances (as I show in the question below EDIT). Could you give me your opinion on this approach? @MichaelButscher
- Casavant: The EDIT variant is much simpler and looks much easier to understand to me.
