Pointers on using celery with sorl-thumbnails with remote storages?

I'm surprised I don't see anything but "use celery" when searching for how to use celery tasks with sorl-thumbnails and S3.

The problem: using remote storages causes massive delays when generating thumbnails (think 100s+ for a page with many thumbnails) while the thumbnail engine downloads originals from remote storage, crunches them, then uploads the results back to S3.

Where is a good place to set up the celery task within sorl, and what should I call?

Any of your experiences / ideas would be greatly appreciated.

I will start digging around Sorl internals to find a more useful place to delay this task, but there are a few more things I'm curious about if this has been solved before.

  1. What image is returned immediately? Sorl must be told somehow that the image returned is not the real thumbnail. The cache must be invalidated when celery finishes the task.

  2. Handle multiple thumbnail generation requests cleanly (only need the first one for a given cache key)

For now, I've temporarily solved this by using an nginx reverse proxy cache that can serve hits while the backend spends time generating expensive pages (resizing huge PNGs on a huge product grid) but it's a very manual process.

Tomb answered 3/5, 2012 at 2:48 Comment(7)
djangosnippets.org/snippets/1562 might help – An
@An thanks, but that is 3 years old - sorl already works with remote storages. What I need help with is asynchronously generating remote storage thumbnails... – Cheryllches
@YujiTomita Have you had any progress with this? Would be good to hear your discoveries. – Floristic
@jamesc, nothing yet! I may add a bounty here.. – Cheryllches
@Yuji'Tomita'Tomita Any updates on this highly relevant question? +1'd – Emergent
@JosvicZammit, unfortunately I haven't had the time to review it. I will most likely be going back to hosting files myself (have run into many issues when trying to clean up the filesystem / creating local development environments) and using something like CloudFront for the CDN + 99.99% of hosting needs. That post below by Aidan looks very promising. Somehow, a cached HTML page including dummy images must be invalidated when the celery task completes. – Cheryllches
@Yuji'Tomita'Tomita Thank you for your reply. I will try to keep the setup with sorl-thumbnail as simple as possible, as I need to use it in conjunction with CloudFiles (Rackspace)... we'll see how it goes. Thanks. – Emergent

I think what you want to do is set THUMBNAIL_BACKEND to a custom class that overrides the _create_thumbnail method. Instead of generating the thumbnail in that method, kick off a celery task that calls the parent class's _create_thumbnail with the same arguments. The thumbnail won't be available during the request, but it will get generated in the background.
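To make the control flow concrete, here is a minimal sketch of that idea. It does not import sorl or celery: BaseBackend is a simplified stand-in for sorl's backend class, pending_jobs stands in for the celery queue, and the argument names are my own assumptions, so check the real _create_thumbnail signature in your installed version before adapting it.

```python
from collections import deque

# Stand-in for the Celery queue. In a real project this would be a task
# function whose .delay(...) is called instead of appending to a deque.
pending_jobs = deque()


class BaseBackend:
    """Simplified stand-in for sorl's thumbnail backend (hypothetical)."""

    def __init__(self):
        self.created = []

    def _create_thumbnail(self, source_name, geometry_string, options, thumbnail_name):
        # The real method downloads the source, resizes it and uploads the result.
        self.created.append(thumbnail_name)


class QueuedBackend(BaseBackend):
    """Defers the expensive work instead of doing it during the request."""

    def _create_thumbnail(self, source_name, geometry_string, options, thumbnail_name):
        # Queue only plain values (names, strings, dicts) so they can be serialized.
        pending_jobs.append((source_name, geometry_string, options, thumbnail_name))

    def run_deferred(self, job):
        # Called by the worker: run the inherited, synchronous implementation.
        BaseBackend._create_thumbnail(self, *job)


def worker(backend):
    """Drain the queue, as a worker process would."""
    while pending_jobs:
        backend.run_deferred(pending_jobs.popleft())


backend = QueuedBackend()
backend._create_thumbnail("photos/a.png", "100x100", {"crop": "center"}, "cache/a-100.png")
print(backend.created)  # still empty: nothing was generated during the "request"
worker(backend)
print(backend.created)  # the thumbnail name appears once the worker has run
```

One thing to watch in a real implementation: image objects and open file handles generally cannot be passed to a task, so the task usually has to receive names and reopen the source from storage itself.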

Macronucleus answered 14/6, 2012 at 18:16 Comment(0)

As I understand it, Sorl works correctly with S3 storage, but it's very slow.

I assume you know which image sizes you need.

You should launch the celery task after the image has been uploaded. In the task, call sorl.thumbnail.default.backend.get_thumbnail(file, geometry_string, **options)

Sorl will generate a thumbnail and upload it to S3. The next time you request the image from a template, it's already cached and served directly from Amazon's servers.
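A rough sketch of that warm-up step, using stand-ins rather than the real libraries: get_thumbnail below only imitates the "generate once, then reuse" behaviour, THUMBNAIL_SIZES is a hypothetical list, and store stands in for sorl's key-value store. In a real project warm_thumbnails would be the body of the celery task.

```python
# Geometry strings the templates are known to request (hypothetical values).
THUMBNAIL_SIZES = ["100x100", "300x200"]


def get_thumbnail(store, file_name, geometry_string, **options):
    """Stand-in for sorl's get_thumbnail: generate once, then reuse the cached entry."""
    key = (file_name, geometry_string, tuple(sorted(options.items())))
    if key not in store:
        # The real backend would download, resize and upload here.
        store[key] = "cache/%s/%s" % (geometry_string, file_name)
    return store[key]


def warm_thumbnails(store, file_name, **options):
    """Body of the task to queue once an upload has been saved."""
    return [get_thumbnail(store, file_name, size, **options) for size in THUMBNAIL_SIZES]


store = {}  # stand-in for sorl's key-value store
warm_thumbnails(store, "photos/a.png", crop="center")
print(len(store))  # one entry per configured size
```

The stand-in includes the options in the cache key on purpose. As far as I know sorl does something similar, so the task should pass exactly the same geometry string and options that the template tag uses, otherwise the template will miss the pre-generated thumbnail and generate it again.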

Regarding "a clean way to handle a placeholder thumbnail image while the image is being processed":

For this you will need to override the Sorl backend. Add a new argument to the get_thumbnail function, e.g. generate=False. When you call this function from celery, pass generate=True.

Then change the function's logic: if the thumbnail is not present and generate is True, it works just like the standard backend, but if generate is False it returns your placeholder image with text like "We are processing your image now, come back later" and does not call backend._create_thumbnail. You can also launch a task in this case, if you think the thumbnail may have been accidentally deleted.
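The branching described above could look roughly like this. It is a sketch with stand-ins only: PlaceholderBackend is not sorl's class, PLACEHOLDER_URL is a hypothetical path, and the store and queued attributes stand in for sorl's key-value store and the celery queue.

```python
PLACEHOLDER_URL = "/static/img/processing.png"  # hypothetical placeholder path


class PlaceholderBackend:
    """Simplified stand-in for a customised sorl backend."""

    def __init__(self):
        self.store = {}   # stand-in for sorl's key-value store
        self.queued = []  # stand-in for jobs sent to Celery

    def _create_thumbnail(self, file_name, geometry_string):
        # The real method would download, resize and upload the image.
        return "cache/%s/%s" % (geometry_string, file_name)

    def get_thumbnail(self, file_name, geometry_string, generate=False):
        key = (file_name, geometry_string)
        if key in self.store:
            return self.store[key]  # already generated: serve it
        if generate:
            # Worker path: behave like the standard backend.
            self.store[key] = self._create_thumbnail(file_name, geometry_string)
            return self.store[key]
        # Request path: never resize here, and queue at most one job per key.
        if key not in self.queued:
            self.queued.append(key)
        return PLACEHOLDER_URL

    def run_worker(self):
        # What the Celery task would do for each queued job.
        while self.queued:
            file_name, geometry_string = self.queued.pop(0)
            self.get_thumbnail(file_name, geometry_string, generate=True)


backend = PlaceholderBackend()
print(backend.get_thumbnail("a.png", "100x100"))  # placeholder while nothing exists yet
backend.get_thumbnail("a.png", "100x100")         # repeated request adds no second job
backend.run_worker()
print(backend.get_thumbnail("a.png", "100x100"))  # the generated thumbnail path
```

The "queue at most one job per key" check also covers the second point in the question. With several web processes the in-memory list would not be enough, so that check would need a shared store.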

I hope this helps

Dragster answered 15/6, 2012 at 9:32 Comment(0)

You can use Sorlery. It combines sorl and celery to create thumbnails via workers. It's very careful not to do any filesystem access outside of the worker thread.

The thumbnail returned immediately (before the worker has had a chance) can be controlled by setting your THUMBNAIL_DUMMY_SOURCE to an appropriate placeholder.

The job is created the first time the thumbnail is requested; subsequent requests are served the dummy image until the worker thread completes.

Briannebriano answered 25/5, 2013 at 17:8 Comment(2)
long overdue comment: this looks fantastic. Thank you so much! I will check it out as soon as my next project. This has been the major bottleneck with using remote storages.. the engineering challenges added by using remote storages with the django ecosystem vs the operations hurdles solved by having app servers truly standalone. – Cheryllches
Although this was an interesting experiment I think you should use something like Cloudinary or Imgix. – Briannebriano
