Django Channels Redis: Exception inside application: Lock is not acquired
Asked Answered
E

1

7

Fully loaded multi-tenant Django application with 1000's of WebSockets using Daphne/Channels, running fine for a few months and suddenly tenants all calling it the support line the application running slow or outright hanging. Narrowed it down to WebSockets as HTTP REST API hits came through fast and error free.

None of the application logs or OS logs indicate some issue, so only thing to go on is the exception noted below. It happened over and over again here and there throughout 2 days.

I don't expect any deep debugging help, just some off-the-cuff advice on possibilities.

AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13

Split configuration HTTP served by uwsgi, daphne serves asgi, Nginx

May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
    real_channel
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
    await self.receive_clean_locks.acquire(channel_key)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
    return await self.locks[channel].acquire()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
    yield from fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
    return await self.inner(receive, self.send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
    await inner_instance(receive, send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
    [receive, self.channel_receive], self.dispatch
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
    await task
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
    self.receive_lock.release()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
    raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
Elizabethelizabethan answered 15/5, 2021 at 22:11 Comment(4)
How are you managing python dependencies?Fukuoka
Are you getting this above error when you interrupt via a key event? Because the above stacktrace is pointing to self.stop() which is only done in a cleanup activity of the application and there was an exception of type KeyboardInterrupt. How often do you see the above stakctrace getting printing?Elliotelliott
Is the traffic pattern still the same? How are you doing on file locks? Is redis holding up, are there any connections to redis that are stuck closing or similar? You can use ss or netstat to check for those.Publish
this looks like a multithreading issue.Chock
C
3

First, lets have a look at the source of the RuntimeError: Lock is not acquired. error. As given by the traceback, the release() method in the file /opt/python3.6/lib/python3.6/asyncio/locks.py is defined like so:

    def release(self):
        """Release a lock.

        When the lock is locked, reset it to unlocked, and return.
        If any other coroutines are blocked waiting for the lock to become
        unlocked, allow exactly one of them to proceed.

        When invoked on an unlocked lock, a RuntimeError is raised.

        There is no return value.
        """
        if self._locked:
            self._locked = False
            self._wake_up_first()
        else:
            raise RuntimeError('Lock is not acquired.')

A primitive lock is a synchronization primitive that is not owned by a particular thread when locked.

When attempting to release an unlocked lock by calling the release() method, the RuntimeError will be raised, as the method should only be called in the locked state. The state changes to unlocked when called in the locked state.

Now for the previous error raised in the acquire() method in the same file, the acquire() method is defined like so:

    async def acquire(self):
        """Acquire a lock.

        This method blocks until the lock is unlocked, then sets it to
        locked and returns True.
        """
        if (not self._locked and (self._waiters is None or
                all(w.cancelled() for w in self._waiters))):
            self._locked = True
            return True

        if self._waiters is None:
            self._waiters = collections.deque()
        fut = self._loop.create_future()
        self._waiters.append(fut)

        # Finally block should be called before the CancelledError
        # handling as we don't want CancelledError to call
        # _wake_up_first() and attempt to wake up itself.
        try:
            try:
                await fut
            finally:
                self._waiters.remove(fut)
        except exceptions.CancelledError:
            if not self._locked:
                self._wake_up_first()
            raise

        self._locked = True
        return True

So in order for the concurrent.futures._base.CancelledError error you're getting to be raised, the await fut must've caused the issue.

To fix it, you can have a look at Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set

Basically, you might have an awaitable in your code that you didn't await, and by not awaiting it, you never handed control back to the event loop or store the awaitable, causing it to be immediately cleaned up, completely cancelling it (and all of the awaitables it controlled).

Simply make sure you await the results of the awaitables in your code, finding any you missed.

Carboy answered 27/6, 2021 at 13:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.