Set cache-control for entire S3 bucket automatically (using bucket policies?)

I need to set cache-control headers for an entire s3 bucket, both existing and future files and was hoping to do it in a bucket policy. I know I can edit the existing ones and I know how to specify them on put if I upload them myself but unfortunately the app that uploads them cannot set the headers as it uses s3fs to copy the files there.

Guyenne answered 3/5, 2012 at 16:15 Comment(1)
For anyone looking at this question in the context of setting cache-control: max-age on a CloudFront distribution: see this answer on another thread; another solution, for anyone pushing assets via a deploy script with aws s3: set a value via --cache-control max-age=.Fancher

There are now three ways to get this done: via the AWS console, via the AWS CLI, or via the s3cmd command line tool.


AWS Console Instructions

This is now the recommended solution. It is straightforward, but it can take some time.

  • Log in to AWS Management Console
  • Go into S3 bucket
  • Select all the files (or the folder) you want to change
  • Choose "More" from the menu
  • Select "Change metadata"
  • In the "Key" field, select "Cache-Control" from the drop down menu max-age=604800 Enter (7 days) for Value
  • Press "Save" button

(thanks to @biplob - please give him some love below)


AWS Command Line Solution

Originally, when I wrote this answer, bucket policies were a no-go, so I figured out how to do it using aws-cli, and it is pretty slick. While researching I couldn't find any examples in the wild, so I thought I would post some of my solutions to help those in need.

NOTE: By default, aws-cli only copies a file's current metadata, EVEN IF YOU SPECIFY NEW METADATA.

To use the metadata that is specified on the command line, you need to add the '--metadata-directive REPLACE' flag. Here are some examples.

For a single file:

aws s3 cp s3://mybucket/file.txt s3://mybucket/file.txt --metadata-directive REPLACE \
--expires 2034-01-01T00:00:00Z --acl public-read --cache-control max-age=2592000,public

For an entire bucket (note --recursive flag):

aws s3 cp s3://mybucket/ s3://mybucket/ --recursive --metadata-directive REPLACE \
--expires 2034-01-01T00:00:00Z --acl public-read --cache-control max-age=2592000,public

A little gotcha I found: if you only want to apply it to a specific file type, you need to exclude all files, then include the ones you want.

Only jpgs and pngs:

aws s3 cp s3://mybucket/ s3://mybucket/ --exclude "*" --include "*.jpg" --include "*.png" \
--recursive --metadata-directive REPLACE --expires 2034-01-01T00:00:00Z --acl public-read \
--cache-control max-age=2592000,public

Here is a link to the manual if you need more info: https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html

Known Issues:

"Unknown options: --metadata-directive, REPLACE"

This can be caused by an out-of-date awscli; see @eliotRosewater's answer below


S3cmd tool

S3cmd is a "Command line tool for managing Amazon S3 and CloudFront services". While this solution requires a git clone, it might be a simpler and more comprehensive solution.

For full instructions, see @ashishyadaveee11's post below


Nepos answered 26/3, 2015 at 14:13 Comment(16)
Thanks for the actual examples of exactly what to do. I was having trouble discovering what was even possible just reading the docs.Huang
I think some browsers do not accept dates later than 2035. Otherwise, very cool, thanks for the examples.Flatt
I am getting "Unknown options: --metadata-directive, REPLACE" when running any of the above commands. Please help me.Societal
@Societal the '\' is used to extend a command to multiple lines. Removing it will make it work identically, you will just need to run the entire command on a single line. Hope that helps explain things!Nepos
Note, THERE IS NOW AN EASIER WAY. You can now change the meta data for ALL files in a bucket via AWS Console. See CoderBoy's answer below: https://mcmap.net/q/143184/-set-cache-control-for-entire-s3-bucket-automatically-using-bucket-policiesHutment
Note that using --metadata-directive REPLACE will overwrite any previous metadata that is not re-specified in the command! For instance, "content-encoding gzip" will be removed when not explicitly added to the cp command.Theatrics
Does the cp download and re-upload everything?Herzberg
@HarmenJanssen mentions this but I missed it. If you run this, you remove all prior metadata. If you're using S3 as a CDN, this means for example your images will download rather than render in the browser. You cannot use --cache-control without REPLACE, but can you use --metadata-directive COPY --metadata {"cache-control": "max-age=31536000"} to add rather than replace?Adrianadriana
You can also use the CacheControl property of the S3 client when uploading the objects, according to the SDK docs docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/…Sesquicarbonate
I believe it's cache-control or expires, not 'and'... Right?Krenn
Is there any way to set the cache policy of a file automatically if it gets replaced with a new version?Daisie
In my case, I sync the folder with the s3 bucket instead of a single file. so I used aws s3 cp s3://BucketName/index.html s3://BucketName/index.html --metadata-directive REPLACE --cache-control max-age=0,no-store,must-revalidatePaternal
Cool but why does it need to touch expires? developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires If there is a Cache-Control header with the max-age or s-maxage directive in the response, the Expires header is ignored.Vulcanism
Oh they are gonna love me when I run this recursive over and over again for tens of thousands of files. My upload software can't set this kind of metadata, so it'll just run your entire bucket line even if I just synced a single file :D Why can't there be a default... Many use S3 as an origin for CF. I presume the need for a bucket-wide cache control directive is not that far fetched...Vulcanism
Just skip --metadata-directive entirely and only use --cache-control max-age=31536000 it's a valid option and this way you preserve the content-type...Vulcanism
The OP asked "for an entire s3 bucket, both existing and future files". What do the solutions above do for new files?Fancher

Now, it can be changed easily from the AWS console.

  • Log in to AWS Management Console
  • Go into S3 bucket
  • Select all files by route
  • Choose "More" from the menu
  • Select "Change metadata"
  • In the "Key" field, select "Cache-Control" from the drop down menu
  • max-age=604800 Enter (7 days) for Value
  • Press "Save" button

It takes time to execute, depending on how many files are in your bucket. Redo from the beginning if you accidentally close the browser.

Mauritamauritania answered 2/11, 2017 at 10:17 Comment(8)
What does "select all files by route" mean?Monoplegia
Select all/some files from the directory you want to set metaMauritamauritania
Sorry for late reply. No, it doesn't. You should set it from your application.Mauritamauritania
Does this replace previous metadata or add to it? (I don't want to lose all my content types!)Adrianadriana
I just confirmed it does NOT remove existing values. Only sets the keys you specify (overwriting key if it exists)Budgerigar
max-age=604800Enter (7 days) for Value what does this mean??Indulgence
@Adrianadriana it did reset all of my content types (Content-Type: binary/octet-stream) UGHVulcanism
"If REPLACE is used, the copied object will only have the metadata values that were specified by the CLI command." docs.aws.amazon.com/cli/latest/reference/s3/cp.htmlVulcanism

Steps

  1. git clone https://github.com/s3tools/s3cmd
  2. Run s3cmd --configure (You will be asked for the two keys - copy and paste them from your confirmation email or from your Amazon account page. Be careful when copying them! They are case sensitive and must be entered accurately or you'll keep getting errors about invalid signatures or similar. Remember to add s3:ListAllMyBuckets permissions to the keys or you will get an AccessDenied error while testing access.)
  3. ./s3cmd --recursive modify --add-header="Cache-Control:public, max-age=31536000" s3://your_bucket_name/
Pastry answered 4/2, 2016 at 13:27 Comment(3)
Super answer. Thank you! Is there a way to ONLY update those that don't already have this header setting?Ectosarc
Anyone have a similar solution for use with window's S3Express?Carnation
You can also install with HomebrewCoonhound

I had been banging my head on this problem for a while, until I found and read the docs. Sharing that here in case it helps anyone else:

What ended up reliably working for me was this command. I chose a 1 second expiration time for testing to verify expected results:

aws s3 cp \
  --metadata-directive REPLACE \
  --cache-control max-age=1,s-maxage=1 \
  s3://bucket/path/file \
  s3://bucket/path/file
  • --metadata-directive REPLACE is required when using cp to modify metadata on an existing file in S3
  • max-age sets the browser caching age, in seconds
  • s-maxage sets the CloudFront caching age, in seconds

Likewise, if setting these Cache-Control header values on a file while uploading to S3, the command would look like:

aws s3 cp \
  --cache-control max-age=1,s-maxage=1 \
  /local/path/file \
  s3://bucket/path/file
Calves answered 1/2, 2018 at 21:53 Comment(0)

I don't think you can specify this at the bucket level but there are a few workarounds for you.

  1. Copy the object to itself on S3, setting the appropriate cache-control headers for the copy operation.

  2. Specify response headers in the URL to the files. You need to use pre-signed URLs for this to work, but you can specify certain response headers in the query string, including cache-control and expires (a short sketch follows below). For a full list of the available options see: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html?r=5225
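
If it helps, here is a minimal sketch of option 2 using boto3 in Python. The bucket and key are placeholders, and ResponseCacheControl is the parameter that becomes the response-cache-control query string on the pre-signed URL:

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key, purely for illustration.
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "mybucket",
        "Key": "file.txt",
        "ResponseCacheControl": "max-age=604800",
    },
    ExpiresIn=3600,  # the URL itself is valid for one hour
)
print(url)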

Starryeyed answered 4/5, 2012 at 8:28 Comment(2)
Thanks Geoff, I knew about (1) but not (2). Not what I had hoped for (though I fear it's not possible)Guyenne
Do you have an example AWS CLI command on how to do #1? docs.aws.amazon.com/cli/latest/reference/s3/cp.htmlFurlough

You can always configure a Lambda with a trigger on S3 PutObject; the Lambda will simply change the headers of the particular object that was just put.

Then you can run the copy command mentioned above one last time to fix all the existing objects, and every new object will be fixed by the Lambda.

UPDATE:

Here is a good place to start from: https://www.aaronfagan.ca/blog/2017/how-to-configure-aws-lambda-to-automatically-set-cache-control-headers-on-s3-objects/
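
For reference, here is a minimal sketch of such a Lambda in Python with boto3. The handler name, the max-age value, and the loop guard are my assumptions, not code from the linked post. It copies the just-uploaded object onto itself with a REPLACE metadata directive, and skips objects that already carry the header so the self-copy (which is itself a PUT) doesn't re-trigger the function forever:

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")
CACHE_CONTROL = "max-age=604800, public"  # assumed value, adjust as needed

def handler(event, context):
    # Invoked by an s3:ObjectCreated:* trigger on the bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        # Loop guard: skip objects that already have the header.
        if head.get("CacheControl") == CACHE_CONTROL:
            continue
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            MetadataDirective="REPLACE",
            ContentType=head["ContentType"],
            Metadata=head.get("Metadata", {}),
            CacheControl=CACHE_CONTROL,
        )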

Burck answered 3/6, 2017 at 21:35 Comment(3)
Can you provide some more details on how to make this lamba? Sounds like a great solution.Opaque
@Opaque sure, I found you a link that actually can help you get there easily, aaronfagan.ca/blog/2017/… If you need any help or support I would be more than happy to help.Burck
i know this is old however i found this resource to be useful for what i was looking for. thank you for this. upvotes for you my dude!Melaniamelanic

To those attempting to use Dan's answer and getting the error:

"Unknown options: --metadata-directive, REPLACE"

I ran into the issue, and the problem was that I installed awscli using

sudo apt-get install awscli

This installed an old version of the awscli which is missing the --metadata-directive option, so I used sudo apt-get remove awscli to remove it.

Then reinstalled following the procedure from amazon: http://docs.aws.amazon.com/streams/latest/dev/kinesis-tutorial-cli-installation.html

The only difference is that I had to use sudo -H because of permission issues which others might run into also.

Verbiage answered 16/10, 2016 at 21:11 Comment(0)

Bucket policies grant permissions to the bucket and the objects stored inside, so this road won't yield the results you are looking for. The other answers modify the object metadata using automated means, but you can also use Lambda@Edge if you are willing to move the bucket behind CloudFront.

With Lambda@Edge you can run arbitrary code for each client request and it can change the headers returned from the origin (S3 bucket in this case). It requires a bit more configuration and it costs some money, but here's a blueprint of the solution:

  • create a CloudFront distribution
  • add the S3 bucket as the origin
  • create a lambda function that modifies the response header
  • use the CloudFront distribution's URL to access the files

The AWS documentation has an example of how to modify response headers. If you happen to use Terraform to manage the infrastructure, I've written an article on how to do it.
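
As a rough illustration (not the exact code from the linked docs), a Python Lambda@Edge function attached to the distribution's origin-response event could look like this; the max-age value is an assumption:

def handler(event, context):
    # Runs on CloudFront's "origin response" event, i.e. once per
    # cache miss, before CloudFront caches and returns the object.
    response = event["Records"][0]["cf"]["response"]
    response["headers"]["cache-control"] = [
        {"key": "Cache-Control", "value": "max-age=604800, public"}
    ]
    return response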

Epoch answered 31/8, 2020 at 17:38 Comment(0)

Previous answers either don't really correspond with the question or incur a cost (Lambda).

What you should do is set the "cache-control" header when you upload the file (PutObject or MultipartUpload).

Depending on your language, it can be somewhat different. The documentation is not very clear (presumably because AWS would rather you pay for the other solutions).

An example with PHP:

$uploader = new MultipartUploader($s3, $filename, [
    ...,
    'before_initiate' => function (\Aws\Command $command) {
        $command['CacheControl'] = 'max-age=31536000,public';
    },
    ...
]);

Another example with Go:

cc := "max-age=31536000,public"
input := &s3.PutObjectInput{
    ...,
    CacheControl: &cc,
}
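
And, for completeness, a sketch of the same idea in Python with boto3 (bucket, key, and body are placeholders):

import boto3

s3 = boto3.client("s3")

# CacheControl is set at upload time, so no follow-up copy is needed.
s3.put_object(
    Bucket="mybucket",
    Key="file.txt",
    Body=b"hello",
    CacheControl="max-age=31536000,public",
)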
Kamala answered 7/9, 2021 at 19:54 Comment(0)

Figured I'd share my usage since previous answers misled me. Only two commands with AWS CLI:

aws s3 cp s3://bucketname/ s3://bucketname/ --cache-control max-age=12345 --recursive

That's it for the already existing files, using cp. Setting --cache-control like that is a valid option.

If you are uploading, you might as well sync, for which the command is:

aws s3 sync z:\source\folder s3://bucketname/folder --delete --cache-control max-age=12345 --acl public-read

Notice that I do not use --metadata-directive AT ALL, since by using it you'll lose your guessed content types, which will make things like images download instantly instead of displaying in the browser. My solution preserves the guessed value and still allows the guessing during the sync.

Vulcanism answered 4/11, 2021 at 16:26 Comment(0)

Adding to @roens' answer.

If you use S3 with CloudFront, you can easily get not just CloudFront caching but also browser caching. For that to work, just have CloudFront set the cache-control header on the response.

  1. Go to Cloudfront
  2. Go to your distribution and edit behaviour
  3. Go to response headers policy
  4. Define a custom policy with cache-control "max-age=604800" (7 days) and origin override enabled (a scripted version is sketched after this list).
  5. Now you should have a 7 day browser cache on all your files within this S3 bucket.

Remark: In S3 itself you would have to specify this for every file, which might not be suitable for your application. Also, you would have to define a response headers policy to pass the cache-control header if you use S3 together with CloudFront.
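
If you'd rather script step 4 than click through the console, here is a sketch with boto3; the policy name and max-age value are assumptions:

import boto3

cloudfront = boto3.client("cloudfront")

# Creates a response headers policy that adds Cache-Control to every
# response, overriding whatever the origin (S3) sent.
cloudfront.create_response_headers_policy(
    ResponseHeadersPolicyConfig={
        "Name": "s3-browser-cache",  # hypothetical policy name
        "CustomHeadersConfig": {
            "Quantity": 1,
            "Items": [
                {
                    "Header": "Cache-Control",
                    "Value": "max-age=604800",
                    "Override": True,  # the "origin override" toggle
                }
            ],
        },
    }
)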

Wiegand answered 24/6, 2023 at 9:25 Comment(0)
