Ideal HTTP cache control headers for different types of resources

I want to find a minimal set of headers that works with "all" caches and browsers (also when using HTTPS!).

On my web site, I'll have three kinds of resources:

(1) Forever cacheable (public / equal for all users)

Example: 0A470E87CC58EE133616F402B5DDFE1C.cache.html (auto generated by GWT)

  • These files are automatically assigned a new name (based on the MD5 of their content) whenever their content changes.

  • They should be cached as much as possible, even when using HTTPS (so I assume I should set Cache-Control: public, especially for Firefox?)

  • They shouldn't require the client to make a round trip to the server to check whether the content has changed.

(2) Changing occasionally (public / equal for all users)

Examples: index.html, mymodule.nocache.js

  • These files change their content without changing their URL when a new version of the site is deployed.

  • They can be cached, but probably need a round-trip to be revalidated every time.

(3) Individual for each request (private / user specific)

Example: JSON responses

  • These resources should never be cached unencrypted to disk under any circumstances. (Except that I may have a few specific requests that could be cached.)

I have a general idea on which headers I would probably use for each type, but there's always something I could be missing.
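To make the three kinds concrete, here is a rough sketch of how a server might tell them apart by path. The function name `classify`, the labels, and the patterns are made up for illustration; they only mirror the GWT file names mentioned above and would need adjusting for a real site.

```python
import re

# Hypothetical pattern: a 32-hex-digit MD5 name followed by ".cache.<ext>",
# as in the GWT example 0A470E87CC58EE133616F402B5DDFE1C.cache.html.
_FOREVER = re.compile(r"(^|/)[0-9A-F]{32}\.cache\.\w+$")

# Hypothetical suffixes for files that change without changing their URL.
_REVALIDATE = ("index.html", ".nocache.js")


def classify(path: str) -> str:
    """Return 'forever', 'revalidate', or 'private' for a request path."""
    if _FOREVER.search(path):
        return "forever"        # kind (1): content-addressed, never changes
    if path == "/" or path.endswith(_REVALIDATE):
        return "revalidate"     # kind (2): same URL, content may change
    return "private"            # kind (3): everything else, e.g. JSON responses
```

Treating everything unrecognised as private is the cautious default here: a misclassified file is then merely uncached rather than cached when it shouldn't be.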

Constringe answered 4/6, 2010 at 1:23 Comment(3)
Thanks for your answers and the comments and the links. I'm still experimenting a bit, but I think I will be able to derive a solution! – Constringe
Achieving #3 is generally not possible. – Snatchy
See also: #6492289 – Burn

I would probably use these settings:

  1. Cache-Control: max-age=31556926 – Representations may be cached by any cache. The cached representation is to be considered fresh for 1 year:

    To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.

  2. Cache-Control: no-cache – Representations are allowed to be cached by any cache. But caches must submit the request to the origin server for validation before releasing a cached copy.
  3. Cache-Control: no-store – Caches must not cache the representation under any condition.
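As a minimal sketch, the three settings above could be kept in one lookup table. The labels `forever`, `revalidate` and `private` and the function `cache_headers` are hypothetical names for the three resource kinds in the question; the value 31556926 is assumed to come from a mean year of about 365.2422 days, which is what the arithmetic below reproduces.

```python
# One mean (tropical) year in seconds: 365.2422 days * 86400 s/day.
# This is an assumption about where 31556926 comes from.
SECONDS_PER_YEAR = round(365.2422 * 24 * 60 * 60)

# Hypothetical labels for the three resource kinds in the question,
# mapped to the Cache-Control values suggested in this answer.
CACHE_CONTROL = {
    "forever": f"max-age={SECONDS_PER_YEAR}",   # fresh for one year
    "revalidate": "no-cache",                   # may store, must revalidate
    "private": "no-store",                      # must not store at all
}


def cache_headers(kind: str) -> dict:
    """Return the response headers for one of the three resource kinds."""
    return {"Cache-Control": CACHE_CONTROL[kind]}
```

For example, `cache_headers("revalidate")` gives `{"Cache-Control": "no-cache"}`; an unknown kind raises `KeyError` rather than silently falling back to a cacheable default.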

See Mark Nottingham’s Caching Tutorial for further information.

Mathildemathis answered 8/6, 2010 at 21:49 Comment(12)
Makes sense, and looks very minimal. Question: Isn't Cache-Control an HTTP 1.1 header, while HTTP 1.0 only understands the Expires header? Should I still care about HTTP 1.0 proxies? And: Can I generally skip the "must-revalidate" directive? – Constringe
@chris_l: I understand the values s-maxage, must-revalidate and public to only be useful when HTTP authentication/authorization takes place. Because if HTTP authentication/authorization takes place, a representation is automatically considered private, and these three values can change that. – Mathildemathis
@Gumbo: One thing I'm pretty sure about is that I need to set public when I want Firefox 3+ to cache public files to disk while using HTTPS: #174848 – Constringe
@chris_l: Sorry, but I don’t know anything about browser quirks. – Mathildemathis
Some browsers, such as IE, are starting to treat Cache-Control: no-cache as if it were no-store. This is admittedly not according to the RFC, but it is knowingly done to "fix" the mistake, made by MANY, of using no-cache to prevent sensitive data from being stored unencrypted on disk. – Marc
@Gumbo: Your answer is already very helpful, and serves as a great starting point. Thanks also for the link! I also know way too little about browser quirks, legacy proxies etc. Maybe somebody will come to our rescue :-) +1 – Constringe
@AviD: Which versions of IE are doing that, and where can I find more information about this? And: Do you think that simply using "Cache-Control: max-age=0" could work better for my use case (2)? – Constringe
@chris_l, I happened across this link: palisade.plynt.com/issues/2008Jul/cache-control-attributes . I don't remember how previous versions behaved, though I think IE7 did this too. – Marc
IE has always treated no-cache as no-store, although neither token guarantees that the response will not be written to disk, simply that it will never be reused, even after validation. See blogs.msdn.com/b/ie/archive/2010/07/14/… for further discussion of caching. – Snatchy
Also, Firefox no longer requires PUBLIC in the Cache-Control to cache HTTPS resources. But your best bet overall is to just test your site while watching the traffic, e.g. with Fiddler. – Snatchy
Setting a cache control value of 100 years is not advised. First off, the spec recommends a max of 1 year. Secondly, any value over 68 years results in immediate expiration for IE8 and below: blogs.msdn.com/b/ieinternals/archive/2010/01/26/… – Snatchy
@Snatchy -MSFT-: Thanks for your remark. Updated it accordingly. – Mathildemathis
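A quick back-of-the-envelope check of the "68 years" figure from the comments. It lines up with the largest value of a signed 32-bit seconds counter; that this is the actual cause of the IE behaviour is an assumption here, not something stated in the thread. The name `overflows_int32` is made up for illustration.

```python
SECONDS_PER_YEAR = 31556926   # the one-year max-age used in the answer
INT32_MAX = 2**31 - 1         # largest signed 32-bit integer


def overflows_int32(max_age_seconds: int) -> bool:
    """True if the max-age would not fit in a signed 32-bit integer."""
    return max_age_seconds > INT32_MAX


# Roughly 68.05 years fit into a signed 32-bit number of seconds.
years_at_limit = INT32_MAX / SECONDS_PER_YEAR
```

With these numbers a one-year max-age is far below the limit, while 100 years (3155692600 seconds) is above it.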

Cases one and two are actually the same scenario. You should set Cache-Control: public and then generate a URL that includes the build number / version of the site, so that you have immutable resources that could potentially last forever. You also want to set the Expires header a year or more in the future so that the client will not need to issue a freshness check.

For case 3, you could use all of the following for maximum flexibility:

"Cache-Control", "no-cache, must-revalidate"
"Expires", 0
"Pragma", "no-cache"
Beginner answered 8/6, 2010 at 1:36 Comment(4)
Different URLs for new builds are probably not an option: a) This would force the client to re-download the forever-cacheable files. They get unique names to avoid that. b) The main URL to my site should be just https://www.example.com/ c) I want bookmarks to always refer to the newest version of my site (imagine if the bookmarks to a stackoverflow question contained the build number of the site). – Constringe
Hi Chris, this approach is generally used for CSS and JS resources rather than documents. I agree it's not applicable for document identifiers, in which case you should simply set Cache-Control: public, Last-Modified and ETag on the response, which will cause a freshness check each time, and only a 304 will be sent back if there are no changes since the last download. Alternatively, you could download the actual dynamic page content in each page via JS, so you preserve the URL while still allowing effective caching. – Beginner
Yes, that's pretty much the way GWT handles this for me: My index.html (changing occasionally) includes mymodule.nocache.js (changing occasionally), which automatically includes the correct forever-cacheable files (large parts of js, GWT managed image bundles, ...). The only thing it leaves to me is setting the correct HTTP headers for each type. I want to reduce these headers to a minimum, since they account for a large percentage of the transfer volume. So do I need e.g. both Last-Modified and ETag etc.? – Constringe
"Expires" actually needs to be a date, not the number 0. It should have the same value as the "Date" header. See mnot.net/cache_docs – Brachycephalic
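Following up on that last comment, a small sketch of how the case 3 headers from this answer might be built with Expires as a real HTTP date equal to Date. The function name `no_cache_headers` and its `now` parameter are hypothetical; most servers add the Date header themselves, so setting it by hand here is only to show the two values matching.

```python
from email.utils import formatdate


def no_cache_headers(now=None):
    """Build the case 3 headers with Expires set to the same date as Date.

    `now` is a Unix timestamp; None means the current time.
    """
    # usegmt=True yields the HTTP date form, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    date = formatdate(timeval=now, usegmt=True)
    return {
        "Date": date,
        "Expires": date,  # already expired at the moment it is sent
        "Cache-Control": "no-cache, must-revalidate",
        "Pragma": "no-cache",
    }
```

For instance, `no_cache_headers(0)` returns an Expires value of "Thu, 01 Jan 1970 00:00:00 GMT", identical to Date.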
