We have a Django app that needs to fetch lots of data using Celery. There are 20 or so celery workers running every few minutes. We're running on Google Kubernetes Engine with a Redis queue using Cloud memorystore.
The Redis instance we're using for celery is filling up, even when the queue is empty according to Flower. This results in the Redis DB eventually being full and Celery throwing errors.
In Flower I see tasks coming in and out, and I have increased workers to the point where the queue is always empty now.
If I run redis-cli --bigkeys
I see:
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
[00.00%] Biggest set found so far '_kombu.binding.my-queue-name-queue' with 1 members
[00.00%] Biggest list found so far 'default' with 611 items
[00.00%] Biggest list found so far 'my-other-queue-name-queue' with 44705 items
[00.00%] Biggest set found so far '_kombu.binding.celery.pidbox' with 19 members
[00.00%] Biggest list found so far 'my-queue-name-queue' with 727179 items
[00.00%] Biggest set found so far '_kombu.binding.celeryev' with 22 members
-------- summary -------
Sampled 12 keys in the keyspace!
Total key length in bytes is 271 (avg len 22.58)
Biggest list found 'my-queue-name-queue' has 727179 items
Biggest set found '_kombu.binding.celeryev' has 22 members
4 lists with 816144 items (33.33% of keys, avg size 204036.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
8 sets with 47 members (66.67% of keys, avg size 5.88)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
If I inspect the queue using LRANGE I see lots of objects like this:
"{\"body\": \"W1syNDQ0NF0sIHsicmVmZXJlbmNlX3RpbWUiOiBudWxsLCAibGF0ZXN0X3RpbWUiOiBudWxsLCAicm9sbGluZyI6IGZhbHNlLCAidGltZWZyYW1lIjogIjFkIiwgIl9udW1fcmV0cmllcyI6IDF9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ==\", \"content-encoding\": \"utf-8\", \"content-type\": \"application/json\", \"headers\": {\"lang\": \"py\", \"task\": \"MyDataCollectorClass\", \"id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"shadow\": null, \"eta\": \"2019-08-20T02:31:05.113875+00:00\", \"expires\": null, \"group\": null, \"retries\": 0, \"timelimit\": [null, null], \"root_id\": \"beeff557-66be-451d-9c0c-dc622ca94493\", \"parent_id\": \"374d8e3e-92b5-423e-be58-e043999a1722\", \"argsrepr\": \"(24444,)\", \"kwargsrepr\": \"{'reference_time': None, 'latest_time': None, 'rolling': False, 'timeframe': '1d', '_num_retries': 1}\", \"origin\": \"gen1@celery-my-queue-name-worker-6595bd8fd8-8vgzq\"}, \"properties\": {\"correlation_id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"reply_to\": \"e55a31ed-cbba-3d79-9ffc-c19a29e77aac\", \"delivery_mode\": 2, \"delivery_info\": {\"exchange\": \"\", \"routing_key\": \"my-queue-name-queue\"}, \"priority\": 0, \"body_encoding\": \"base64\", \"delivery_tag\": \"a83074a5-8787-49e3-bb7d-a0e69ba7f599\"}}"
We're using django-celery-results to store results, so these shouldn't be going in there, and we're using a separate Redis instance for Django's cache.
If I clear Redis with a FLUSHALL
it slowly fills up again.
I'm kind of stumped at where to go next. I don't know Redis well - maybe I can do something to inspect the data to see what's filling this? Maybe it's Flower not reporting properly? Maybe Celery keeps completed tasks for a bit despite us using the Django DB for results?
Thanks loads for any help.