Redis Queue + python-rq: Right pattern to prevent high memory usage?

We are currently using Redis To Go with our Heroku-hosted Python application.

We use Redis with python-rq purely as a task queue, to allow delayed execution of some time-intensive tasks. A task retrieves some data from a PostgreSQL database and writes the results back to it, so no valuable data is stored in the Redis instance at all. We notice that, depending on the number of jobs executed, Redis consumes more and more memory (growing at ~10 MB/hour). A FLUSHDB command on the CLI fixes this (bringing it down to ~700 kB of RAM used) until RAM fills up again.

According to our (unchanged standard) settings, a job result is kept for 500 seconds. Over time, some jobs of course fail, and they are moved to the failed queue.

  • What do we have to do differently to get our tasks done with a stable amount of RAM?
  • Where does the RAM consumption come from?
  • Can I turn off persistence at all?
  • From the docs I know that the 500 sec TTL means that a key is then "expired", but not really deleted. Does the key still consume memory at this point? Can I somehow change this behavior?
  • Does it have something to do with the failed queue (which apparently does not have a TTL attached to the jobs, meaning (I think) that these are kept forever)?
  • Just curious: When using RQ purely as a queue, what is saved in the Redis DB? Is it actual executable code or just a reference to where the function to be executed can be found?

Sorry for the pretty noobish questions, but I'm new to the topic of queuing and after researching for 2+ days I've reached a point where I don't know what to do next. Thanks, KH

Tijerina answered 21/1, 2014 at 22:54

After two more days of playing around, I have found the problem. I would like to share this with you, along with the tools that were helpful:

Core Problem

The actual problem was that we had forgotten to cast an object to a string before saving it to the PostgreSQL database. The bug went unnoticed in the DB, because the string representation still ended up there (the object's __str__() method returned exactly the representation we wanted); to Redis, however, the whole object was passed. Once there, the associated task crashed with an UnpickleError exception. This consumed 5 MB of RAM that was not freed after the crash.
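
To illustrate the kind of fix, here is a minimal sketch with a made-up Report class and task standing in for our real code (the names and the enqueue call are assumptions, not what we actually run):

from redis import Redis
from rq import Queue

class Report:
    # Hypothetical stand-in for the object we forgot to cast.
    def __init__(self, rows):
        self.rows = rows

    def __str__(self):
        return "report with %d rows" % len(self.rows)

def generate_report(record_id):
    report = Report(rows=[record_id])
    # The missing cast: return str(report) so only a plain string is pickled
    # into Redis (and written to PostgreSQL), not the whole object.
    return str(report)

q = Queue(connection=Redis())
q.enqueue(generate_report, 42)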

Additional Actions

To reduce the memory footprint further, we implemented the following supplementary actions (note that we save everything to a separate DB, so the results that Redis stores are not used by our application at all):

  • We set the TTL of the task result to 0 with the call enqueue_call([...], result_ttl=0) (see the sketch after this list)
  • We defined a custom exception handler - black_hole - that swallows all exceptions and returns False. This prevents RQ from moving a failed task to the failed queue, where it would still use a bit of memory. The exceptions are e-mailed to us beforehand so we can keep track of them.
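
A minimal sketch of such an enqueue call (the task function and arguments here are made up; only result_ttl=0 reflects the actual change):

from redis import Redis
from rq import Queue

def my_task(record_id):
    # placeholder for our real PostgreSQL-heavy task
    return None

q = Queue(connection=Redis())

# result_ttl=0 tells RQ to discard the job result immediately instead of
# keeping it in Redis for the default 500 seconds.
q.enqueue_call(func=my_task, args=(42,), result_ttl=0)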

Useful tools along the way:

We worked entirely with redis-cli (a rough redis-py equivalent is sketched after the list).

  • redis-cli info | grep used_memory_human --> shows current memory usage. Ideal for comparing the memory footprint before and after a task has executed.
  • redis-cli keys '*' --> lists all keys that currently exist. This overview led me to the insight that some jobs were not deleted even though they should have been (as written above, they crashed with an UnpickleError and were therefore never removed).
  • redis-cli monitor --> shows a realtime overview of what is happening in Redis. This helped me see that the objects being passed back and forth were far too large.
  • redis-cli debug object <key> --> shows a dump of the key's value.
  • redis-cli hgetall <key> --> shows a more readable dump of the key's value (especially useful for our use case of using Redis purely as a task queue, since python-rq appears to create the jobs as hashes in this format).
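
The same checks can also be run from Python with the redis-py client; this is just a rough equivalent of the commands above (the connection URL and job key are placeholders):

from redis import Redis

r = Redis.from_url('redis://localhost:6379/0')  # assumed local instance

print(r.info()['used_memory_human'])        # current memory usage
for key in r.keys('*'):                     # all keys that currently exist
    print(key)
print(r.hgetall('rq:job:<some-job-id>'))    # dump of one RQ job hash (placeholder id)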

Furthermore, I can answer some of the questions I had posted above:

From the docs I know that the 500 sec TTL means that a key is then "expired", but not really deleted. Does the key still consume memory at this point? Can I somehow change this behavior?

Actually, they are deleted, just as the docs imply.

Does it have something to do with the failed queue (which apparently does not have a TTL attached to the jobs, meaning (I think) that these are kept forever)?

Surprisingly, the jobs that crashed with the UnpickleError were not moved to the failed queue; they were just "abandoned", meaning the values remained in Redis but RQ did not handle them the way it normally handles failed jobs.


Tijerina answered 24/1, 2014 at 0:19

If you are using the "Black Hole" exception handler from http://python-rq.org/docs/exceptions/, you should also add job.cancel() there:

def black_hole(job, *exc_info):
    # Delete the job hash on redis, otherwise it will stay on the queue forever
    job.cancel()
    return False
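
For completeness, a rough sketch of attaching the handler above to a worker; the registration API differs between RQ versions (push_exc_handler vs. an exception_handlers constructor argument), so verify against the docs for your version:

from redis import Redis
from rq import Queue, Worker

conn = Redis()
q = Queue(connection=conn)

# Register black_hole (defined above) so it runs instead of RQ's default
# move-to-failed-queue behaviour for this worker.
w = Worker([q], connection=conn)
w.push_exc_handler(black_hole)
w.work()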
Kirschner answered 2/12, 2014 at 20:44

A thing that wasn't immediately obvious to me is that an RQ job has both a 'description' and a 'data' property. If not specified, the description is set to a string representation of the data, which in my case was unnecessarily verbose. Explicitly setting the description to a short summary saved me that overhead.

enqueue(func, longdata, description='short job summary')
Conservancy answered 6/12, 2016 at 9:14
