Cache Expire Control with Last Modification

4

15

In Apache's mod_expires module, the Expires* directives accept two base times: access and modification.

ExpiresByType text/html "access plus 30 days"

understandably means that the cache will request fresh content 30 days after it last fetched the page.

However,

ExpiresByType text/html "modification plus 2 hours"

doesn't make intuitive sense.

How does the browser cache know that the file has been modified unless it makes a request to the server? And if it is making a call to the server, what is the use of this caching directive? It seems to me that I am not understanding some crucial part of caching. Please enlighten me.

Asparagus answered 18/2, 2009 at 20:55 Comment(0)
40

An Expires* directive with "modification" as its base refers to the modification time of the file on the server. So if you set, say, "modification plus 2 hours", any browser that requests content within 2 hours after the file is modified (on the server) will cache that content until 2 hours after the file's modification time. And the browser knows when that time is because the server sends an Expires header with the proper expiration time.

Let me explain with an example: say your Apache configuration includes the line

ExpiresDefault "modification plus 2 hours"

and you have a file index.html on the server, to which the ExpiresDefault directive applies. Suppose you upload a version of index.html at 9:53 GMT, overwriting the previously existing index.html (if there was one). So now the modification time of index.html is 9:53 GMT. If you were running ls -l on the server (or dir on Windows), you would see it in the listing:

-rw-r--r--  1 apache apache    4096  Feb 18 09:53 index.html

Now, with every request, Apache sends the Last-Modified header with the last modification time of the file. Since you have that ExpiresDefault directive, it will also send the Expires header with a time equal to the modification time of the file (9:53) plus two hours. So here is part of what the browser sees:

Last-Modified: Wed, 18 Feb 2009 09:53:00 GMT
Expires: Wed, 18 Feb 2009 11:53:00 GMT
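
To make the arithmetic concrete, here is a minimal Python sketch of how those two header values relate. It is not what Apache runs internally; the function name expiry_headers is made up for illustration, and only the standard library is assumed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(mtime, lifetime):
    """Build the two headers for a "modification plus <lifetime>" rule.

    Hypothetical helper: mtime is the file's modification time (UTC),
    lifetime is the offset given in the directive.
    """
    return {
        "Last-Modified": format_datetime(mtime, usegmt=True),
        "Expires": format_datetime(mtime + lifetime, usegmt=True),
    }


# index.html was uploaded at 09:53 GMT on 18 Feb 2009
mtime = datetime(2009, 2, 18, 9, 53, tzinfo=timezone.utc)
headers = expiry_headers(mtime, timedelta(hours=2))

for name, value in headers.items():
    print(f"{name}: {value}")
```

Note that the time of the request does not enter the calculation at all; that is the whole difference from the "access" base.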

If the time at which the browser makes this request is before 11:53 GMT, the browser will cache the page, because it has not yet expired. So if the user first visits the page at 11:00 GMT, and then goes to the same page again at 11:30 GMT, the browser will see that its cached version is still valid and will not (or rather, is allowed not to) make a new HTTP request.

If the user goes to the page a third time at 12:00 GMT, the browser sees that its cached version has now expired (it's after 11:53), so it attempts to validate the page by sending a request to the server with an If-Modified-Since header. A 304 (Not Modified) response with no body will be returned, since the page's modification date has not changed since it was first served. Because the expiry date has passed (the page is 'stale'), a validation request will be made on every subsequent visit to the page until validation fails.
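
The three visits above can be sketched in Python like this. It is only an illustration of the decision a cache is allowed to make, not how any particular browser is implemented; cache_action is a hypothetical name.

```python
from datetime import datetime, timezone


def cache_action(now, expires):
    """Return what a cache may do with its stored copy at time `now`."""
    if now < expires:
        # Still fresh: the cache is allowed (not required) to skip the request.
        return "use cached copy"
    # Stale: ask the server whether the copy is still good.
    return "revalidate with If-Modified-Since"


# Expires value from the example: 11:53 GMT
expires = datetime(2009, 2, 18, 11, 53, tzinfo=timezone.utc)

for hour, minute in [(11, 0), (11, 30), (12, 0)]:
    now = datetime(2009, 2, 18, hour, minute, tzinfo=timezone.utc)
    print(f"{hour:02d}:{minute:02d} GMT -> {cache_action(now, expires)}")
```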

Now, let's pretend instead that you uploaded a new version of the page at 11:57. In this case, the browser's attempt to validate the old version of the page at 12:00 fails and it receives in the response, along with the new page, these two new headers:

Last-Modified: Wed, 18 Feb 2009 11:57:00 GMT
Expires: Wed, 18 Feb 2009 13:57:00 GMT

(The last modification time of the file becomes 11:57 upon upload of the new version, and Apache calculates the expiration time as 11:57 + 2:00 = 13:57 GMT.)

Validation (using the more recent date) will not be required now until 13:57.

(Note, of course, that many other things are sent along with the two headers I listed above; I just trimmed out all the rest for simplicity.)

Tom answered 18/2, 2009 at 21:50 Comment(9)
Hi David, this makes sense; however, I am still not sure why and how the server knows what to send the browser. So if I understand correctly, the next time the browser requests the resource, the server somehow sends information to the browser about the file's modification status, but isn't this a get - Asparagus
I figured this would be easiest to explain with an example, so I edited one in... - Tom
The whole point is, when you say "the browser will see that its cached version is still valid and will not (or rather, is allowed not to) make a new HTTP request", what do you mean by "or rather, is allowed not to"? I opened another question about caching issues, and it seems that even with an expiry date far in the future, the damn browser behaves the same, making an HTTP request to ask the server for a newer version instead of taking it from the cache; see: #10049240 - Desideratum
@Marco it's just what I said: the HTTP specification does not require a browser to use its cached version under any circumstances. - Tom
Thanks for your reply, but I really don't understand now. Why does Google suggest all this mod_expires stuff to leverage browser caching (developers.google.com/speed/docs/best-practices/caching) if the browser in the end does not follow the server's directive to cache stuff and still comes back making HTTP requests to the server for resources it was told to cache for a long time? - Desideratum
I think that page is a little misleading, in that it says that browsers will not issue any GET requests for fresh cached content, but as you've seen, sometimes they do. But even giving the browser the option of not sending a new request can save network traffic. - Tom
So you are telling me it's well known that the browser still sends GET requests even if it was told to store a certain file in cache and not to bother the server with useless GET requests until the Expires time?! Do you know of any articles explaining this counterintuitive behavior? Most of the stuff I read on the web explains mod_expires as the big thing that will kill latency and stop the browser from ever requesting cached stuff (as you pointed out in the Google article). But that does not seem to be true. You could also answer my question if you want. - Desideratum
I'm not sure about "well known," but it is allowed by the HTTP specification, and evidently it does happen. - Tom
In other words, what would be the right way to force the browser to get the page from the server and not from the cache, let's say every 6 hours? - Giantess
4

The server sends a header such as: "Last-Modified: Wed, 18 Feb 2009 00:00:00 GMT". With mod_expires, the expiry time is computed from either this modification time or the access time.

Say the content is expected to be refreshed every day; then you want it to expire at "modification plus 24 hours".

If you don't know when the content will be refreshed, then it's better to base it on the access time.
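
As a rough illustration of the difference between the two bases, here is a small Python sketch. It only models the arithmetic; expires_at and its parameters are hypothetical names, not part of mod_expires.

```python
from datetime import datetime, timedelta, timezone


def expires_at(base, lifetime, mtime, request_time):
    """Compute an expiry time for "<base> plus <lifetime>".

    base is "access" (counted from the time of the request) or
    "modification" (counted from the file's modification time).
    """
    if base == "access":
        return request_time + lifetime
    if base == "modification":
        return mtime + lifetime
    raise ValueError(f"unknown base: {base!r}")


mtime = datetime(2009, 2, 18, 0, 0, tzinfo=timezone.utc)           # file last changed
request_time = datetime(2009, 2, 18, 21, 29, tzinfo=timezone.utc)  # client asks for it
day = timedelta(hours=24)

print("modification:", expires_at("modification", day, mtime, request_time))
print("access:      ", expires_at("access", day, mtime, request_time))
```

With the modification base every client gets the same expiry instant, which suits content refreshed on a schedule; with the access base each client gets its own 24-hour window.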

Rase answered 18/2, 2009 at 21:29 Comment(1)
Hi Andrew, thanks for your answer. When and how often does the server send the Last-Modified header? Or does it happen during a browser session? - Asparagus
0

My understanding is that modification asks the browser to base the cache time on the Last-Modified HTTP header's value. So, modification plus 2 hours would be the Last-Modified time + 2 hours.

Actin answered 18/2, 2009 at 21:0 Comment(0)
0

First of all, thanks to David Z for the detailed explanation above. As for bushman's question about why it makes sense to use caching if the browser is still required to make a request: the time is saved in what the server returns. If the cached copy turns out to be still current, the server returns a 304 code with an empty response body instead of the content. That is where the time is saved.

A better explanation than I've given is here, from https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers :

Though conditional requests do invoke a call across the network, unmodified resources result in an empty response body – saving the cost of transferring the resource back to the end client. The backend service is also often able to very quickly determine a resource’s last modified date without accessing the resource which itself saves non-trivial processing time.

Time-based

A time-based conditional request ensures that only if the requested resource has changed since the browser’s copy was cached will the contents be transferred. If the cached copy is the most up-to-date then the server returns the 304 response code.

To enable conditional requests the application specifies the last modified time of a resource via the Last-Modified response header.

Cache-Control: public, max-age=31536000
Last-Modified: Mon, 03 Jan 2011 17:45:57 GMT

The next time the browser requests this resource it will only ask for the contents of the resource if they have changed since this date, using the If-Modified-Since request header:

If-Modified-Since: Mon, 03 Jan 2011 17:45:57 GMT

If the resource hasn’t changed since Mon, 03 Jan 2011 17:45:57 GMT the server will return with an empty body with the 304 response code.
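
For what it's worth, the server side of that exchange can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not Heroku's or Apache's code; status_for is a hypothetical helper, and real servers handle more cases (ETags, malformed dates, and so on).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: the resource's modification time (timezone-aware).
    """
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            return 304  # empty body, client keeps its cached copy
    return 200  # full response with the current contents


header = "Mon, 03 Jan 2011 17:45:57 GMT"
unchanged = datetime(2011, 1, 3, 17, 45, 57, tzinfo=timezone.utc)
changed = datetime(2011, 1, 4, 8, 0, 0, tzinfo=timezone.utc)

print(status_for(header, unchanged))
print(status_for(header, changed))
print(status_for(None, unchanged))
```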

Consortium answered 9/10, 2014 at 21:32 Comment(0)
