What heuristics do browsers use to cache resources not explicitly set to be cachable?

13.2.2 Heuristic Expiration

Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time. The HTTP/1.1 specification does not provide specific algorithms, but does impose worst-case constraints on their results. Since heuristic expiration times might compromise semantic transparency, they ought to be used cautiously, and we encourage origin servers to provide explicit expiration times as much as possible. (HTTP/1.1, RFC 2616)

What are the algorithms used by browsers to estimate plausible expiration times?

The ideal answer will cover all major browsers with evidence from source code or official blog posts.

Gery answered 15/1, 2013 at 20:4 Comment(2)
I would be curious whether the detected MIME type of the resource plays a role as well. Note that the same RFC also says: "Also, if the response does have a Last-Modified time, the heuristic expiration value SHOULD be no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%." One way to test this would be to place a file on a server with none of those headers, request it in each browser, see what responses they return, and then check the local browser cache for an expiration. - Impossibility
If the browser exposes that info. - Impossibility

Let's assume all browsers we are interested in are Internet Explorer 8 or newer (e.g. IE5 has some terrible behaviour with caching headers).

There is only ONE standards-based way of controlling caching (introduced with HTTP/1.1): the Cache-Control HTTP header.

Since at least 1996 IE has been using an opt-out policy for caching HTTPS content.

Seemingly since its introduction Chrome has done opt-out for HTTPS (i.e. it will cache it unless told not to). In 2011 Firefox 4 (but not Safari) switched to opt-out caching for HTTPS content. Source.

Recommendations

  1. Only use HTTP headers to control browser caching. If you decide to go against this, be aware that IE only recognizes two cache control directives set inside HTML:

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">
    <META HTTP-EQUIV="Expires" CONTENT="-1">
    

    and seemingly only the former is useful in the HTTPS scenario. Further, there can be problems when trying to use Pragma in IE. Finally, Chrome ignores cache directives in meta tags, which reduces their usefulness even further.

  2. Don't use the Expires header. In modern browsers Expires is superseded by Cache-Control. Expires: 0 and Pragma: no-cache are technically invalid response headers. Yes, they have existed since the beginning, but not all modern browsers (e.g. Chrome) use them.

  3. The Vary header is a minefield: it behaves differently in older versions of IE and with XHR. Finding out the details is left as an exercise for the reader, and leaves the impression that it is preferable to use different URLs for different content.

  4. Allow the browser to make conditional requests by setting ETags. ETags let a browser do a lightweight check (an If-None-Match request) to see whether the content has changed; if it hasn't, the server answers 304 Not Modified and the full response body is not transferred again.

  5. Be aware some browsers are just broken and need hacks. IE 8 can have issues downloading files which it has been told not to cache.
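
To illustrate recommendation 4, here is a minimal sketch of the server-side decision behind a conditional GET. The function names are hypothetical and the logic is deliberately simplified: it handles a single entity tag or "*", not a comma-separated list of tags.

```cpp
#include <string>

// Hypothetical helper: drops the weak-validator prefix, e.g. W/"abc" -> "abc".
std::string StripWeakPrefix(const std::string& etag) {
    return etag.rfind("W/", 0) == 0 ? etag.substr(2) : etag;
}

// Returns the status code for a GET, given the request's If-None-Match value
// (empty if the header is absent) and the resource's current ETag.
int StatusForConditionalGet(const std::string& if_none_match,
                            const std::string& current_etag) {
    if (if_none_match.empty()) return 200;  // unconditional request: send the body
    if (if_none_match == "*") return 304;   // "*" matches any existing representation
    // If-None-Match uses weak comparison, so the W/ prefix is ignored on both sides.
    return StripWeakPrefix(if_none_match) == StripWeakPrefix(current_etag) ? 304 : 200;
}
```

A 304 result means the browser can keep using its cached copy; a 200 means the full body has to be sent again.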

Poignant answered 6/8, 2015 at 9:36 Comment(2)
Great post, +1. Just a correction: Pragma is an HTTP/1.0 header and is indeed superseded, but the Expires header is perfectly valid and accepted in all browsers. Also, IE <9 is only interesting as a history lesson. - Modie
NB: Firefox's ComputeFreshnessLifetime() has changed: there is now a one-week upper bound for the Last-Modified heuristic. - Gateshead

From Chromium's source code: https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_response_headers.cc&l=1082&rcl=1421094684

  if ((response_code_ == 200 || response_code_ == 203 ||
       response_code_ == 206) && !must_revalidate) {
    // TODO(darin): Implement a smarter heuristic.
    Time last_modified_value;
    if (GetLastModifiedValue(&last_modified_value)) {
      // The last-modified value can be a date in the future!
      if (last_modified_value <= date_value) {
        lifetimes.freshness = (date_value - last_modified_value) / 10;
        return lifetimes;
      }
    }
  }
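
To make the arithmetic concrete, here is a minimal standalone sketch of the same rule. It is not Chromium's code: CachedResponse, HeuristicFreshness and IsFresh are hypothetical names, times are plain seconds since the epoch, and the freshness check ignores the Age header and clock-skew corrections. It assumes the Date and Last-Modified values are the ones stored with the cached response, so the computed lifetime stays fixed until the entry is fetched or revalidated again.

```cpp
#include <chrono>
#include <optional>

using Seconds = std::chrono::seconds;

// Header values as stored with the cached response (seconds since the epoch).
struct CachedResponse {
    Seconds date;           // value of the Date header
    Seconds last_modified;  // value of the Last-Modified header
};

// One tenth of (Date - Last-Modified), as in the quoted snippet.
// Returns nullopt when Last-Modified lies after Date.
std::optional<Seconds> HeuristicFreshness(const CachedResponse& r) {
    if (r.last_modified > r.date) return std::nullopt;
    return (r.date - r.last_modified) / 10;
}

// Simplified check: the entry is fresh while its age (now - Date)
// is strictly below the heuristic lifetime.
bool IsFresh(const CachedResponse& r, Seconds now) {
    const auto lifetime = HeuristicFreshness(r);
    return lifetime && (now - r.date) < *lifetime;
}
```

For example, a resource last modified 10 days before it was served gets a heuristic lifetime of 1 day under this rule.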
Gery answered 14/1, 2015 at 8:54 Comment(2)
I wonder if Darin ever thought that their TODO comment would end up in an SO answer :p - Phenomenalism
I'm confused here: is date_value the value of the Date field? Its value seems to change every time I issue a request, and it is always the date on the origin server. Does that mean lifetimes.freshness also keeps increasing (because date_value increases while last_modified_value stays the same)? - Peerage

It seems that WebKit ("...the OS X system framework version of the engine that's used by Safari...") uses the same heuristic as Chromium.

The following is taken from CacheValidation.cpp:

return (creationTime - lastModifiedValue) * 0.1;
Compressive answered 15/1, 2015 at 21:1 Comment(0)

This blog post says that Internet Explorer 9 uses max-age = (DownloadTime - LastModified) * 0.1: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx

This is effectively the same as Mozilla's heuristic (that post is rather old, and I don't know whether it has changed since): https://developer.mozilla.org/en-US/docs/HTTP_Caching_FAQ

Celestinecelestite answered 15/1, 2014 at 11:13 Comment(0)

Gecko estimates expiration at now + (now - lastModified)/10, last I checked.

Accuracy answered 16/1, 2013 at 2:52 Comment(1)
This is no longer true. Nowadays it's that, with an upper bound of one week from now: bugzilla.mozilla.org/show_bug.cgi?id=277813 - Modie
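
A minimal sketch of the capped rule as described in this answer and comment, i.e. now + min((now - lastModified) / 10, one week). This is not Gecko's code: EstimateExpiration and kOneWeek are hypothetical names, times are seconds since the epoch, and treating a Last-Modified in the future as "expires immediately" is my own assumption.

```cpp
#include <algorithm>
#include <chrono>

using Seconds = std::chrono::seconds;

// Upper bound on the heuristic lifetime, as described in the comment above.
constexpr Seconds kOneWeek{7 * 24 * 60 * 60};

// Returns the estimated expiration time for a response with no explicit
// freshness information.
Seconds EstimateExpiration(Seconds now, Seconds last_modified) {
    // Assumption: a Last-Modified in the future gives no heuristic freshness.
    if (last_modified > now) return now;
    const Seconds tenth = (now - last_modified) / 10;
    return now + std::min(tenth, kOneWeek);
}
```

With this cap, a file last modified several years ago is cached for one week rather than for months.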
