Compress Python objects before saving to cache

What's the fastest method to compress Python objects (list, dictionary, string, etc.) before saving them to the cache, and to decompress them after reading from the cache?

I'm using Django, and I hope to add compress/decompress support directly in Django's cache backend so that it's available to all my Django apps.

I looked into django/core/cache/backends/memcached.py:

from django.core.cache.backends.base import BaseCache
from django.utils.encoding import smart_str

import cmemcache as memcache

class CacheClass(BaseCache):

    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def get(self, key, default=None):
        val = self._cache.get(smart_str(key))
        if val is None:
            return default
        return val

    def set(self, key, value, timeout=0):
        self._cache.set(smart_str(key), value, self._get_memcache_timeout(timeout))

It looks like pickling/unpickling is done by the cmemcache library. I don't know where to put the compress/decompress code.

Skepticism answered 18/8, 2010 at 7:35 Comment(0)

I looked further into python-memcache's source code.

It already supports compressing values with zlib before sending them to memcached. The set path goes through

def _set(self, cmd, key, val, time, min_compress_len = 0):

which contains this check:

lv = len(val)
# We should try to compress if min_compress_len > 0 and we could
# import zlib and this string is longer than our min threshold.
if min_compress_len and _supports_compress and lv > min_compress_len:
    comp_val = compress(val)
    # Only retain the result if the compression result is smaller
    # than the original.
    if len(comp_val) < lv:
        flags |= Client._FLAG_COMPRESSED
        val = comp_val
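
So with python-memcache used directly, enabling compression is just a matter of passing the threshold (a sketch; the key, value, and 1 KB threshold below are illustrative):

import memcache

client = memcache.Client(['127.0.0.1:11211'])
# Values longer than min_compress_len bytes are zlib-compressed
# client-side before being sent to memcached.
client.set('big-key', 'x' * 100000, time=300, min_compress_len=1024)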

Here is Django's implementation of the "set" command in its memcached backend:

def set(self, key, value, timeout=0):
    self._cache.set(smart_str(key), value, self._get_memcache_timeout(timeout))

Apparently it does not pass a min_compress_len parameter, so python-memcache's compression never kicks in.
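
One way around this is a small subclass that forwards the threshold (a sketch: the class name and the 1 KB value are mine, and it assumes the backend ends up using python-memcache rather than cmemcache, which may not accept min_compress_len):

from django.core.cache.backends.memcached import CacheClass as MemcachedCacheClass
from django.utils.encoding import smart_str

class CompressingCacheClass(MemcachedCacheClass):
    def set(self, key, value, timeout=0):
        # Forward min_compress_len so python-memcache zlib-compresses
        # any value longer than 1 KB before sending it to memcached.
        self._cache.set(smart_str(key), value,
                        self._get_memcache_timeout(timeout),
                        min_compress_len=1024)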

Skepticism answered 19/8, 2010 at 1:28 Comment(0)

Firstly, are you sure you need it? Are your data structures too big to fit uncompressed in the cache? Compression and decompression add overhead that may void any gains you've made by caching in the first place.

If you really do need compression, then you probably want to use zlib.

If you are going to use zlib, you might want to experiment with the different compression levels available in the compress method, balancing CPU time against compression ratio:

zlib.compress(string[, level])
Compresses the data in string, returning a string containing compressed data. level is an integer from 1 to 9 controlling the level of compression; 1 is fastest and produces the least compression, 9 is slowest and produces the most. The default value is 6. Raises the zlib.error exception if any error occurs.
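
A quick way to see the trade-off on your own data (a sketch; the payload here is synthetic and the resulting sizes are illustrative):

import zlib

data = b'some fairly repetitive cache payload ' * 5000

fastest = zlib.compress(data, 1)   # least CPU, least compression
default = zlib.compress(data, 6)   # zlib's default trade-off
smallest = zlib.compress(data, 9)  # most CPU, best compression

print(len(data), len(fastest), len(default), len(smallest))
assert zlib.decompress(smallest) == data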

Endocrine answered 18/8, 2010 at 7:39 Comment(3)
My server is IO-bound and RAM-bound, not CPU-bound. The current memcached allocation uses 1.3 GB of RAM, so compressing the data by 50% saves 650 MB of RAM, or makes it possible to store twice as many items in the cache. - Skepticism
Thanks, I voted up your answer. But I hope to find a more generic solution that modifies the cache backend itself. - Skepticism
@Skepticism I wonder if your best bet is a custom cache backend that wraps around memcached, compressing on set and decompressing on get - see docs.djangoproject.com/en/dev/topics/cache/… - Endocrine
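
A rough sketch of the wrapper backend that last comment suggests (the class name is hypothetical; it assumes the Django-1.2-era CacheClass from the question and simply stores zlib-compressed pickles as the cached values):

import pickle
import zlib

from django.core.cache.backends.memcached import CacheClass as MemcachedCacheClass

class ZlibCacheClass(MemcachedCacheClass):
    def set(self, key, value, timeout=0):
        # Pickle and compress before handing the value to memcached.
        blob = zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
        MemcachedCacheClass.set(self, key, blob, timeout)

    def get(self, key, default=None):
        blob = MemcachedCacheClass.get(self, key)
        if blob is None:
            return default
        # Decompress and unpickle on the way back out.
        return pickle.loads(zlib.decompress(blob))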

If you've landed here because you're upgrading from Django 3 to Django 4 or 5, and PyMemcacheCache is storing objects larger than Memcached's default object size limit of 1 MB (which did not happen with MemcachedCache in Django 3), you can make PyMemcacheCache use compression via the OPTIONS key, as in the example below:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
        'OPTIONS': {
            'serde': pymemcache.serde.compressed_serde,
        },
    }
}

Of course, remember to import pymemcache in your Django settings file. This sufficed for my particular case; if you need a higher compression level, a custom serde can be passed the same way, as sketched after the list below, without modifying Django's memcached backend.

Source: PyMemcache Documentation

To clarify:

  • Django 3: an in-memory Python object of 3.5 MB saves fine in memcached using MemcachedCache
  • Django 5: the same 3.5 MB object raises a b'Object too large' error when saved to memcached using PyMemcacheCache

For this case, add the option given above to your CACHES setting in the Django settings file.
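
A sketch of the higher-compression variant (this assumes pymemcache 3.x, where serde.CompressedSerde accepts compress/decompress callables and a min_size threshold; the level and threshold below are illustrative):

import functools
import zlib

from pymemcache import serde

# zlib level 9 is the slowest but gives the smallest output; values
# shorter than min_size bytes are stored uncompressed.
high_compression_serde = serde.CompressedSerde(
    compress=functools.partial(zlib.compress, level=9),
    decompress=zlib.decompress,
    min_size=400,
)

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
        'OPTIONS': {
            'serde': high_compression_serde,
        },
    }
}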

Armenian answered 1/5, 2024 at 20:34 Comment(0)
