Celery Redis instance filling up despite queue looking empty

We have a Django app that needs to fetch lots of data using Celery. There are 20 or so Celery workers running tasks every few minutes. We're running on Google Kubernetes Engine with a Redis queue on Cloud Memorystore.

The Redis instance we're using for Celery is filling up, even though the queue looks empty according to Flower. Eventually the Redis DB is full and Celery starts throwing errors.

In Flower I see tasks coming in and out, and I've increased the number of workers to the point where the queue is always empty now.

If I run redis-cli --bigkeys I see:


# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest set    found so far '_kombu.binding.my-queue-name-queue' with 1 members
[00.00%] Biggest list   found so far 'default' with 611 items
[00.00%] Biggest list   found so far 'my-other-queue-name-queue' with 44705 items
[00.00%] Biggest set    found so far '_kombu.binding.celery.pidbox' with 19 members
[00.00%] Biggest list   found so far 'my-queue-name-queue' with 727179 items
[00.00%] Biggest set    found so far '_kombu.binding.celeryev' with 22 members

-------- summary -------

Sampled 12 keys in the keyspace!
Total key length in bytes is 271 (avg len 22.58)

Biggest   list found 'my-queue-name-queue' has 727179 items
Biggest    set found '_kombu.binding.celeryev' has 22 members

4 lists with 816144 items (33.33% of keys, avg size 204036.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
8 sets with 47 members (66.67% of keys, avg size 5.88)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

If I inspect the queue using LRANGE I see lots of objects like this:

"{\"body\": \"W1syNDQ0NF0sIHsicmVmZXJlbmNlX3RpbWUiOiBudWxsLCAibGF0ZXN0X3RpbWUiOiBudWxsLCAicm9sbGluZyI6IGZhbHNlLCAidGltZWZyYW1lIjogIjFkIiwgIl9udW1fcmV0cmllcyI6IDF9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ==\", \"content-encoding\": \"utf-8\", \"content-type\": \"application/json\", \"headers\": {\"lang\": \"py\", \"task\": \"MyDataCollectorClass\", \"id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"shadow\": null, \"eta\": \"2019-08-20T02:31:05.113875+00:00\", \"expires\": null, \"group\": null, \"retries\": 0, \"timelimit\": [null, null], \"root_id\": \"beeff557-66be-451d-9c0c-dc622ca94493\", \"parent_id\": \"374d8e3e-92b5-423e-be58-e043999a1722\", \"argsrepr\": \"(24444,)\", \"kwargsrepr\": \"{'reference_time': None, 'latest_time': None, 'rolling': False, 'timeframe': '1d', '_num_retries': 1}\", \"origin\": \"gen1@celery-my-queue-name-worker-6595bd8fd8-8vgzq\"}, \"properties\": {\"correlation_id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"reply_to\": \"e55a31ed-cbba-3d79-9ffc-c19a29e77aac\", \"delivery_mode\": 2, \"delivery_info\": {\"exchange\": \"\", \"routing_key\": \"my-queue-name-queue\"}, \"priority\": 0, \"body_encoding\": \"base64\", \"delivery_tag\": \"a83074a5-8787-49e3-bb7d-a0e69ba7f599\"}}"

We're using django-celery-results to store results, so task results shouldn't be going into this Redis instance, and we're using a separate Redis instance for Django's cache.
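
With django-celery-results the result backend points at the Django database rather than Redis, so the relevant settings look roughly like this (a sketch; the broker host is a placeholder):

# settings.py (sketch)
CELERY_RESULT_BACKEND = "django-db"  # django-celery-results: results stored in the Django DB
CELERY_BROKER_URL = "redis://10.0.0.3:6379/0"  # broker on the dedicated Memorystore instance (placeholder)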

If I clear Redis with FLUSHALL, it slowly fills up again.

I'm kind of stumped about where to go next. I don't know Redis well; maybe I can do something to inspect the data and see what's filling it up? Maybe Flower isn't reporting properly? Maybe Celery keeps completed tasks around for a while despite us using the Django DB for results?
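
One way to double-check Flower is to ask Redis directly for the queue depths. A quick sketch with redis-py (host is a placeholder, queue names taken from the --bigkeys output above):

import redis

r = redis.Redis(host="10.0.0.3", port=6379, db=0)  # placeholder host

# LLEN reports how many messages are actually sitting in each list,
# independent of what Flower shows
for queue in ("default", "my-queue-name-queue", "my-other-queue-name-queue"):
    print(queue, r.llen(queue))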

Thanks loads for any help.

Frodin answered 20/8, 2019 at 10:46
Comment from Ut: wondering if you ever found a solution to this?

It sounds like Redis is not set up to delete completed items or to report and delete failed items; that is, it may be putting the tasks on the list, but it's not taking them off.

Check out the PyPI packages rq, django-rq, and django-rq-scheduler.

You can read a bit about how this should work here: https://python-rq.org/docs/
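
As a rough illustration (the function name and host are placeholders), the basic rq flow is a Redis-backed queue plus a worker that pops jobs off it and removes them once they finish:

from redis import Redis
from rq import Queue

def fetch_data(record_id):
    # stand-in for the real data-collection work; rq workers need to be
    # able to import this function from a module
    return record_id

q = Queue(connection=Redis(host="10.0.0.3"))  # placeholder host

# Enqueue a job; a separate `rq worker` process picks it up, runs it,
# and removes it from the queue when it completes
job = q.enqueue(fetch_data, 24444)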

Fichtean answered 8/9, 2019 at 13:18

This seems to be a known (or intentional) issue with Celery, with various solutions/workarounds proposed: https://github.com/celery/celery/issues/436
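
The workarounds proposed there vary by cause. As a hedged sketch, these are Celery settings that commonly come up in this situation; the values are illustrative, and which ones apply depends on what is actually growing in Redis:

from celery import Celery

app = Celery("myapp", broker="redis://10.0.0.3:6379/0")  # placeholder broker URL

app.conf.result_expires = 3600        # drop stored results after an hour
app.conf.task_ignore_result = True    # or don't store results at all

# With the Redis broker, ETA/countdown tasks interact with the visibility
# timeout: if it is shorter than the longest ETA, messages can be
# redelivered and pile up
app.conf.broker_transport_options = {"visibility_timeout": 43200}  # 12 hours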

Sindhi answered 2/7, 2021 at 14:59
