Storing images and thumbnails on s3 in django
I'm trying to get my images thumbnailed and stored on s3 using django-storages, boto, and sorl-thumbnail. I have it working, but it's very slow, even with small images. I don't mind it being slow when I save the form and upload the images to s3, but I'd like it to display the image quickly after that.

The answer to this SO question explains that the thumbnail won't be created until first access, but that you can use get_thumbnail() to create it beforehand.

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I'm doing that, and now it seems that all entries into the thumbnail_kvstore table are created when uploading the image, rather than when it is displayed.

The problem is that the page displaying the image is still really slow. Looking at the logging panel in the debug toolbar, there is still a lot of communication with s3. Once the image and thumbnails are uploaded and cached, I would expect the page to render quickly without communicating with s3 at all.

What am I doing wrong? Thanks!

Update: weak hack seems to have gotten it working, but I'd love to know how to do this properly:

https://github.com/asciitaxi/sorl-thumbnail/commit/545cce3f5e719a91dd9cc21d78bb973b2211bbbf

Update: more information for @sorl

I'm working with 2 views:

ADD VIEW: In this view I submit the form to create the model with the image in it. The image is uploaded to s3. In a post_save signal, I call get_thumbnail() to generate the thumbnail before it's needed:

im = get_thumbnail(instance.image, '360x360')

DISPLAY VIEW: In this view I display the thumbnail generated in the add view:

    {% thumbnail object.image "360x360" as im %}
    <img src="{{ im.url }}" width="{{ im.width }}" height="{{ im.height }}">
    {% endthumbnail %}

Without the patch:

ADD VIEW: creates 3 entries in the kvstore table, accesses the cache 10 times (6 sets, 4 gets), logging tab of debug toolbar says "establishing HTTP connection" 12 times

DISPLAY VIEW: still just 3 entries in the kvstore table, just 1 get from cache, but debug toolbar says "establishing HTTP connection" 3 times still

With only the change on line 122:

ADD VIEW: same as above, except the logging only says "establishing HTTP connection" 2 times

DISPLAY VIEW: same as above, except the logging only says "establishing HTTP connection" 1 time

Also adding the change on line 118:

ADD VIEW: same as above, but now we are down to 2 "establishing HTTP connection" messages

DISPLAY VIEW: same as above, with no logging messages at all
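The connection counts above were read off the debug toolbar. A minimal sketch of counting the same messages in code follows, using only the standard logging module. It assumes boto logs under the "boto" logger name; PhraseCounter and count_phrase are hypothetical helpers, not part of boto or sorl-thumbnail.

```python
import logging


class PhraseCounter(logging.Handler):
    """Count log records whose message contains a given phrase."""

    def __init__(self, phrase):
        super().__init__(level=logging.DEBUG)
        self.phrase = phrase
        self.count = 0

    def emit(self, record):
        if self.phrase in record.getMessage():
            self.count += 1


def count_phrase(logger_name, phrase):
    """Attach a PhraseCounter to the named logger and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)  # make sure DEBUG records are not filtered out
    counter = PhraseCounter(phrase)
    logger.addHandler(counter)
    return counter


# Assumption: boto emits its connection messages on the "boto" logger.
counter = count_phrase("boto", "establishing HTTP connection")
```

After rendering a view, counter.count holds the number of matching messages seen so far, which makes it easier to compare runs with and without a patch.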

UPDATE: It looks like storage._setup() is called twice, and storage.url() is called once. Based on the timing, I'd say each one makes connections to s3:

1304711315.4
_setup
1304711317.84
1304711317.84
_setup
1304711320.3
1304711320.39
_url
1304711323.66

This seems to be reflected by the boto logging, which says "establishing HTTP connection" 3 times.
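Timings like the ones above can be collected by wrapping the storage methods. The following is only a sketch of that idea: timed, instrument and FakeStorage are hypothetical names, and FakeStorage stands in for whatever storage class is actually in use.

```python
import functools
import time

calls = []  # (method name, elapsed seconds), in call order


def timed(method):
    """Wrap a method so that each call is recorded in `calls`."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return method(*args, **kwargs)
        finally:
            calls.append((method.__name__, time.perf_counter() - start))
    return wrapper


def instrument(cls, *method_names):
    """Replace the named methods on cls with timed versions."""
    for name in method_names:
        setattr(cls, name, timed(getattr(cls, name)))


class FakeStorage:
    """Hypothetical stand-in for a remote storage backend."""

    def _setup(self):
        time.sleep(0.01)  # pretend to open a connection

    def url(self, name):
        return "https://example.invalid/" + name


instrument(FakeStorage, "_setup", "url")
storage = FakeStorage()
storage._setup()
storage.url("a.jpg")
```

Printing calls after a request shows which methods ran and how long each took, so a slow call stands out even when the log only reports connections.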

Carl answered 4/5, 2011 at 1:24 Comment(3)
I have the same problem, please keep me updated. - Passed
Any update on this? Also, what are you using as your S3_UPLOAD_URL in that patch? - Lent
I know it's quite old but I am experiencing the same slowness, just curious if there is an update for this? - Juxtapose
G
7

As the author of sorl thumbnail I am really interested in solving this if it is not working as I intended. If the key value store is populated it will currently store: name, storage and size. I have made the assumption that the url is based on the name and thus should not cause any storage calls. Looking at django storages, https://github.com/e-loue/django-storages/blob/master/storages/backends/s3boto.py#L214 it seems like a safe assumption to make. In your patch you have patched the read method for some reason. When creating a thumbnail, an ImageFile instance is fetched from the cache (or created if it is not there). You can of course call read, which will read the file, but the intended use is .url, which calls url on the storage with the cached name, which in turn should not need any storage access. Could you try to isolate exactly where in your code this storage access happens?

Also make sure you have THUMBNAIL_DEBUG on and that you have the key value store properly set up.
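As a rough illustration of those two settings, a settings.py fragment might look like the sketch below. The kvstore path shown is, as far as I know, sorl-thumbnail's default cached database store; treat the exact values as assumptions and check them against the version you have installed.

```python
# settings.py (fragment)

# Raise thumbnail errors instead of failing silently while debugging.
THUMBNAIL_DEBUG = True

# Assumption: the default cached-database key value store, which backs
# the thumbnail_kvstore table and puts Django's cache in front of it.
THUMBNAIL_KVSTORE = "sorl.thumbnail.kvstores.cached_db_kvstore.KVStore"
```

This store also relies on a working Django cache backend, so a dummy or misconfigured cache would send every lookup to the database.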

Gunpaper answered 5/5, 2011 at 19:47 Comment(1)
Thanks for your answer and the great piece of software. There seemed to be 2 series of storage calls, each taking 2 seconds or so (my dev server is far from the s3 data center). Each of those lines I added addresses one of those series of calls. I'll look into this a bit more closely and comment again. - Carl
L
2

I'm not sure if your problem is the same as mine, but I found that accessing the width or height property of a normal Django ImageField would read the file from the storage backend, load it into PIL, and return the dimensions from there. This is especially costly with a remote backend like the one we're using, and we have very media-heavy pages.

https://code.djangoproject.com/ticket/8307 was opened to address this, but the Django devs closed it as wontfix because they want the width and height properties to always return the true values. So I just monkeypatch _get_image_dimensions() to use the values stored in the model's width_field and height_field when they are set, which prevents a large number of the boto messages and improves my page-load times.

Below is my code modified from the patch attached to that ticket. I stuck this in a place which gets executed early, such as a models.py.

from numbers import Number

from django.core.files.images import ImageFile, get_image_dimensions


def _get_image_dimensions(self):
    if not hasattr(self, '_dimensions_cache'):
        close = self.closed
        # Plain ImageFile objects have no field/instance; only files that
        # belong to a model ImageField do.
        field = getattr(self, 'field', None)
        if field is not None and field.width_field and field.height_field:
            width = getattr(self.instance, field.width_field)
            height = getattr(self.instance, field.height_field)
            # Use the stored values only if both are numeric.
            if isinstance(width, Number) and isinstance(height, Number):
                self._dimensions_cache = (width, height)
            else:
                self.open()
                self._dimensions_cache = get_image_dimensions(self, close=close)
        else:
            # No usable stored dimensions: fall back to reading the file.
            self.open()
            self._dimensions_cache = get_image_dimensions(self, close=close)

    return self._dimensions_cache


ImageFile._get_image_dimensions = _get_image_dimensions
Lent answered 9/9, 2011 at 14:50 Comment(2)
Hi, what is get_image_dimensions (without the leading underscore)? - Kuvasz
That is a function in django.core.files.images. github.com/django/django/blob/… - Lent
S
0

After looking at the @shadfc django ticket, I reimplemented the monkeypatch as follows:

from django.core.files.images import ImageFile, get_image_dimensions


def _get_image_dimensions(self):
    if not hasattr(self, '_dimensions_cache'):
        # Storages that opt in are never read just to measure an image.
        if getattr(self.storage, 'IGNORE_IMAGE_DIMENSIONS', False):
            self._dimensions_cache = (0, 0)
        else:
            close = self.closed
            self.open()
            self._dimensions_cache = get_image_dimensions(self, close=close)
    return self._dimensions_cache


ImageFile._get_image_dimensions = _get_image_dimensions

To use it, just add IGNORE_IMAGE_DIMENSIONS = True to your storage class, and the storage will not be touched to get image dimensions. For example:

from storages.backends.s3boto import S3BotoStorage
S3BotoStorage.IGNORE_IMAGE_DIMENSIONS = True

I still need to investigate where the numbers are used, to know whether simply returning (0, 0) can lead to any problems, but no bug has come up so far.

Somatic answered 13/1, 2012 at 21:23 Comment(1)
Btw: this problem strikes easy-thumbnails too. - Somatic
