How to tell CloudFront not to cache 302 responses from S3 redirects, or how else to work around this image cache generation issue

I'm using Imagine via LiipImagineBundle for Symfony2 to create cached versions of images stored in S3.

Cached images are stored in an S3 bucket with static website hosting enabled and are served through CloudFront. However, LiipImagineBundle's default S3 cache resolver is far too slow for me (it checks whether the file exists on S3 and then builds a URL either to the cached file or to the resolve action), so I've worked out my own workflow:

  1. Pass the client the CloudFront URL where the cached image should exist.
  2. The client requests the image via the CloudFront URL. If it does not exist, the S3 bucket has a redirect rule that 302-redirects the user to an Imagine webserver path, which generates the cached version of the file and saves it to the appropriate location on S3.
  3. The webserver 301-redirects the user back to the CloudFront URL, where the image is now stored, and the client is served the image.

This works fine as long as I don't use CloudFront. The problem appears to be that CloudFront caches the 302 redirect response (even though the HTTP spec states that it shouldn't). So when CloudFront is in front, the client is sent into an endless redirect loop between the webserver and CloudFront, and every subsequent request for the file still redirects to the webserver even after the file has been generated.
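To see why a cached redirect produces exactly this behavior, here is a toy model. It is not CloudFront's implementation: `Origin`, `EdgeCache`, `simulate` and the `generator.example` host are hypothetical, and time advances one tick per request. It assumes the S3 redirect carries no caching headers, so the edge keeps it for its default TTL (CloudFront's default TTL is 86400 seconds unless you change it).

```python
class Origin:
    """Hypothetical stand-in for the S3 website endpoint with a 302 routing rule."""

    def __init__(self):
        self.objects = {}

    def get(self, key):
        if key in self.objects:
            return (200, self.objects[key])
        # Missing key: the routing rule redirects to the (hypothetical) generator.
        return (302, "https://generator.example/" + key)


class EdgeCache:
    """Toy edge cache: stores every response for `default_ttl` ticks."""

    def __init__(self, origin, default_ttl):
        self.origin = origin
        self.default_ttl = default_ttl
        self.entries = {}  # key -> (response, expires_at)
        self.now = 0       # one tick per request

    def get(self, key):
        self.now += 1
        cached = self.entries.get(key)
        if cached is not None and cached[1] > self.now:
            return cached[0]  # served from cache, origin not contacted
        response = self.origin.get(key)
        if self.default_ttl > 0:
            self.entries[key] = (response, self.now + self.default_ttl)
        return response


def simulate(default_ttl):
    origin = Origin()
    cdn = EdgeCache(origin, default_ttl)
    first = cdn.get("thumb/image.jpg")           # miss -> 302 to the generator
    origin.objects["thumb/image.jpg"] = b"jpeg"  # the generator writes the file to S3
    second = cdn.get("thumb/image.jpg")          # client comes back after the 301
    return first[0], second[0]


print(simulate(default_ttl=86400))  # (302, 302): the cached redirect is replayed
print(simulate(default_ttl=0))      # (302, 200): the new image is fetched
```

With a non-zero default TTL the second request never reaches S3, which is the loop described above; with a default TTL of 0 the redirect is not stored and the freshly generated image is served.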

If I use S3 directly instead of CloudFront, there are no issues and this solution is solid.

According to Amazon's documentation, S3 redirect rules don't let me specify custom headers (to set Cache-Control headers or the like), and I don't believe CloudFront lets me control the caching of redirects (if it does, it's well hidden). CloudFront's invalidation options are so limited that I don't think they will work (I can only invalidate 3 objects at any time). I could pass an argument back to CloudFront on the first redirect from the Imagine webserver (e.g. image.jpg?1) to break the endless loop, but subsequent requests for the same object would still 302 to the webserver and then 301 back to CloudFront even though the object exists.

I feel like there should be an elegant solution to this problem, but it's eluding me. Any help would be appreciated!

Piero answered 12/3, 2015 at 15:42 Comment(2)
Ha, I'm on the same train of thought as you. – Fluorosis
Years later... did you ever find an elegant solution for this? The solution below (set the default TTL to 0 and have your service set cache metadata when writing the images to S3) is still the best I've seen, but it feels clunky :-/ – Shoup

I'm solving this same issue by setting the "Default TTL" in the CloudFront "Cache Behavior" settings to 0, while still allowing my resized images to be cached by setting the Cache-Control metadata on each S3 object to max-age=12313213.

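This way the redirects, which carry no Cache-Control header (S3 redirect rules can't add one), fall back to the default TTL of 0 and are not cached, while the resized images are cached according to their Cache-Control max-age once they exist in S3.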
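Why this works can be made concrete with a simplified model of CloudFront's expiration rules: a `max-age` (or `s-maxage`, which takes precedence) sent by the origin is clamped between the behavior's minimum and maximum TTL, and the default TTL applies only when the origin sends no such header. `cloudfront_ttl` is a hypothetical approximation that ignores `Expires`, `no-cache`, `private` and other directives; it assumes the maximum TTL is left at its default of 31536000 seconds, and the `app.example.com` URL is a placeholder.

```python
import re


def cloudfront_ttl(headers, min_ttl=0, default_ttl=0, max_ttl=31536000):
    """Approximate how many seconds CloudFront keeps a response (simplified model)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "")
    # s-maxage is preferred over max-age for shared caches such as CloudFront.
    match = re.search(r"s-maxage=(\d+)", cache_control) or re.search(
        r"max-age=(\d+)", cache_control
    )
    if match is None:
        # No caching header from the origin: the behavior's default TTL applies.
        return default_ttl
    # Origin-supplied lifetime, clamped to the behavior's [min_ttl, max_ttl] range.
    return max(min_ttl, min(max_ttl, int(match.group(1))))


s3_redirect = {"Location": "https://app.example.com/media/resolve/thumb/image.jpg"}
cached_image = {"Cache-Control": "max-age=12313213", "Content-Type": "image/jpeg"}

print(cloudfront_ttl(s3_redirect))   # 0: the redirect is not cached
print(cloudfront_ttl(cached_image))  # 12313213: the image is cached
```

The design relies on the two response types being distinguishable only by their headers: the redirect has none, so a default TTL of 0 affects it, while the image brings its own lifetime.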

Afterlife answered 23/12, 2016 at 0:28 Comment(2)
I got the CloudFront default TTL set to 0, but I'm stuck on the S3 part. Is this something you have to do when you upload the image to the bucket? I'm having trouble figuring out how to set that metadata. – Boudreaux
@TurnerHoughton You need to set the Cache-Control metadata on each S3 object to e.g. "max-age=31536000" (or however many seconds you want it cached). You can set this metadata in the S3 console or at upload time. – Ezar
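As a minimal sketch of setting the metadata at upload time, here is a version using boto3's `put_object`, which accepts a `CacheControl` argument. This is an assumption about tooling: the original setup writes from PHP, but `CacheControl` is a parameter of the S3 `PutObject` API that the AWS SDKs wrap. `upload_cached_image` is a hypothetical helper, and the client is passed in so it can be swapped for a stub.

```python
import mimetypes


def upload_cached_image(s3_client, bucket, key, data, max_age=31536000):
    """Store a generated image in S3 with a Cache-Control header CloudFront will honor."""
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ContentType=content_type,
        # S3 returns this value as the Cache-Control response header.
        CacheControl="max-age={}".format(max_age),
    )
    return content_type
```

Usage might look like `upload_cached_image(boto3.client("s3"), "my-image-cache", "cache/media/thumb/image.jpg", jpeg_bytes)`, where the bucket name, key and `jpeg_bytes` are placeholders.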

If you really need to use CloudFront here, the only thing I can think of is not to subject the user directly to the 302/301 dance. Could you introduce some sort of proxy script/page in front of S3 that handles that whole process? (Or does that defeat the point?)

So a cache miss would look like this:

  • Visitor requests the proxy page through CloudFront.
  • Proxy page requests the image from S3.
  • Proxy page receives a 302 from S3 and follows it to the Imagine webserver.
  • Ideally, just return the image from there (while letting it update S3), or follow the 301 back to S3.
  • Proxy page returns the image to the visitor.
  • Image is cached by CloudFront.
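The cache-miss flow above can be sketched as a framework-agnostic handler. `serve_through_proxy`, `fetch_from_s3` and `generate_image` are hypothetical names: the two callables stand in for an S3 read and for the Imagine generation step, which is assumed to also write the result to S3. The point of the design is that the visitor only ever receives a 200 or an uncacheable error, never a redirect that CloudFront could cache.

```python
ONE_YEAR = 31536000


def serve_through_proxy(key, fetch_from_s3, generate_image):
    """Return (status, headers, body) for CloudFront to cache.

    fetch_from_s3(key) -> (status, body) and generate_image(key) -> bytes are
    hypothetical callables supplied by the application.
    """
    status, body = fetch_from_s3(key)
    if status == 200:
        # Cache hit on S3: return the stored rendition.
        return 200, {"Cache-Control": "max-age={}".format(ONE_YEAR)}, body
    # Any miss (403/404 from the REST endpoint, 3xx from a website endpoint):
    # render the image here instead of redirecting the visitor.
    try:
        body = generate_image(key)
    except Exception:
        # Never let CloudFront cache a failed generation.
        return 502, {"Cache-Control": "no-store"}, b""
    return 200, {"Cache-Control": "max-age={}".format(ONE_YEAR)}, body
```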
Mongoloid answered 13/3, 2015 at 20:44 Comment(1)
Interesting solution, thank you for sharing! Too bad this is the only way to do it through CloudFront; it seems I'm better off serving directly from S3 than using this, but thanks again for the input. – Piero

TL;DR: Make use of Lambda@Edge

We face the same problem using LiipImagineBundle.


For development, NGINX serves the content from the local filesystem and resolves images that are not yet stored using a simple proxy_pass:

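# Serve the cached file if it already exists, otherwise fall back to the application
location ~ ^/files/cache/media/ {
    try_files $uri @public_cache_fallback;
}

# Map the cache path to the image-filter route and proxy it to the application,
# which generates the missing image
location @public_cache_fallback {
    rewrite ^/files/cache/media/(.*)$ media/image-filter/$1 break;
    proxy_set_header X-Original-Host $http_host;
    proxy_set_header X-Original-Scheme $scheme;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://0.0.0.0:80/$uri;
}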

As soon as you want to integrate CloudFront, things get more complicated due to caching. While you can easily add S3 (static website hosting, see below) as an origin, CloudFront itself will not follow the resulting redirects but returns them to the client. In the default configuration, CloudFront will then cache this redirect and NOT the desired image (see https://mcmap.net/q/678807/-how-to-tell-cloudfront-to-not-cache-302-responses-from-s3-redirects-or-how-else-to-workaround-this-image-caching-generation-issue for a workaround with S3).

The best way would be to use a proxy as described here. However, this adds another layer, which might be undesirable. Another solution is to use Lambda@Edge functions (see here). In our case, we use S3 as a normal origin and make use of the "Origin Response" event (you can edit the triggers in the "Behaviors" tab of your distribution). Our Lambda function just checks whether the request to S3 was successful. If it was, we simply forward the response. If it was not, we assume that the desired object has not been created yet. The Lambda function then calls our application, which generates the object and stores it in S3. For simplicity, the application also replies with a redirect (to CloudFront again), so we can just forward that to the client. A drawback is that the client itself will see one redirect. Also make sure to set the cache headers so that CloudFront does not cache the Lambda redirect.


Here is an example Lambda function. This one just redirects the client to the resolve URL (which then redirects to CloudFront again). Keep in mind that this results in more round trips for the client, which is not ideal; however, it reduces the execution time of your Lambda function. The configuration is hard-coded in the env dict because Lambda@Edge functions do not support environment variables. Make sure to attach the basic Lambda@Edge policy (related tutorial).

env = {
    'Protocol': 'http',
    'HostName': 'localhost:8000',
    'HttpErrorCodeReturnedEquals': '404',
    'HttpRedirectCode': '307',
    'KeyPrefixEquals': '/cache/media/',
    'ReplaceKeyPrefixWith': '/media/resolve-image-filter/'
}


def lambda_handler(event, context):
    response = event['Records'][0]['cf']['response']
    
    if int(response['status']) == int(env['HttpErrorCodeReturnedEquals']):
        request = event['Records'][0]['cf']['request']
        original_path = request['uri']
        
        if original_path.startswith(env['KeyPrefixEquals']):
            new_path = env['ReplaceKeyPrefixWith'] + original_path[len(env['KeyPrefixEquals']):]
        else:
            new_path = original_path
            
        location = '{}://{}{}'.format(env['Protocol'], env['HostName'], new_path)
        
        response['status'] = env['HttpRedirectCode']
        response['statusDescription'] = 'Resolve Image'
        response['headers']['location'] = [{
            'key': 'Location',
            'value': location
        }]
        # Prevent CloudFront from caching this redirect; also make sure the
        # distribution's minimum TTL is set to 0 so this header is honored.
        response['headers']['cache-control'] = [{
            'key': 'Cache-Control',
            'value': 'no-cache'
        }]
        
    return response
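The variant described earlier, in which the function itself asks the application to render the image before answering, could look roughly like the sketch below. `make_origin_response_handler` and the `generate` callable are hypothetical; in practice `generate` would be an HTTP call to your application, and it has to finish within the origin-response trigger's time limit. The sketch treats both 403 and 404 as "not created yet", because an S3 REST origin answers 403 for a missing key when CloudFront is not allowed to list the bucket. It then redirects the client back to the same URI with `Cache-Control: no-store`, so with a minimum TTL of 0 the redirect itself should not be cached.

```python
CACHE_PREFIX = "/cache/media/"  # same prefix as in the example above
MISSING_STATUSES = {"403", "404"}


def make_origin_response_handler(generate):
    """Build an origin-response handler; generate(path) must store the image in S3."""

    def lambda_handler(event, context):
        cf = event["Records"][0]["cf"]
        request, response = cf["request"], cf["response"]
        uri = request["uri"]

        # S3 found the object (or this is not a cache path): pass it through.
        if str(response["status"]) not in MISSING_STATUSES or not uri.startswith(CACHE_PREFIX):
            return response

        # Render the missing image synchronously so it exists in S3 afterwards.
        generate(uri[len(CACHE_PREFIX):])

        # Send the client back to the same CloudFront URI; this time S3 has the object.
        response["status"] = "302"
        response["statusDescription"] = "Found"
        response["headers"]["location"] = [{"key": "Location", "value": uri}]
        response["headers"]["cache-control"] = [
            {"key": "Cache-Control", "value": "no-store"}
        ]
        return response

    return lambda_handler
```

In the deployed function you would bind it once at module level, e.g. `lambda_handler = make_origin_response_handler(call_my_application)`, where `call_my_application` is your own (hypothetical) call to the application.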

If you just want to use S3 as a cache (without CloudFront), static website hosting with a redirect rule will redirect clients to the resolve URL when cache files are missing (you will need to rewrite the S3 cache resolver URLs to the website endpoint, though):

 <RoutingRules>
  <RoutingRule>
    <Condition><HttpErrorCodeReturnedEquals>403</HttpErrorCodeReturnedEquals>
      <KeyPrefixEquals>cache/media/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>http</Protocol>
      <HostName>localhost</HostName>
      <ReplaceKeyPrefixWith>media/image-filter/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>307</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
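The same routing rule can also be applied programmatically. This is a minimal sketch assuming boto3's `put_bucket_website` (the answer configures it by hand). `website_configuration` mirrors the XML above field by field, `apply_website_configuration` is a hypothetical helper, and `index.html` is a placeholder because S3 website hosting requires an index document. Note that this call replaces the bucket's existing website configuration as a whole.

```python
def website_configuration(app_host="localhost"):
    """Website config equivalent to the XML routing rule above."""
    return {
        "IndexDocument": {"Suffix": "index.html"},  # placeholder; required by S3
        "RoutingRules": [
            {
                "Condition": {
                    "HttpErrorCodeReturnedEquals": "403",
                    "KeyPrefixEquals": "cache/media/",
                },
                "Redirect": {
                    "Protocol": "http",
                    "HostName": app_host,
                    "ReplaceKeyPrefixWith": "media/image-filter/",
                    "HttpRedirectCode": "307",
                },
            }
        ],
    }


def apply_website_configuration(bucket, app_host="localhost"):
    import boto3  # imported here so the builder above works without boto3

    boto3.client("s3").put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(app_host),
    )
```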
Bohrer answered 4/10, 2020 at 14:40 Comment(0)
