Stop browser from making HTTP requests for images that should stay cached - mod_expires
After reading many articles and some questions on here, I finally succeeded in activating the Apache mod_expires module to tell the browser it MUST cache images for 1 year.

<FilesMatch "\.(ico|gif|jpg|png)$">
  ExpiresActive On
  ExpiresDefault "access plus 1 year"
  Header append Cache-Control "public"
</FilesMatch>

And thankfully server responses seem to be correct:

HTTP/1.1 200 OK 
Date: Fri, 06 Apr 2012 19:25:30 GMT 
Server: Apache 
Last-Modified: Tue, 26 Jul 2011 18:50:14 GMT 
Accept-Ranges: bytes 
Content-Length: 24884 
Cache-Control: max-age=31536000, public 
Expires: Sat, 06 Apr 2013 19:25:30 GMT
Connection: close
Content-Type: image/jpeg 
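As a quick sanity check, the Expires and Cache-Control values above are consistent with each other: the Expires date is exactly the max-age interval after the Date header. A minimal sketch (the two date strings are copied from the response above):

```python
from email.utils import parsedate_to_datetime

# Dates copied from the response headers above
date_hdr = "Fri, 06 Apr 2012 19:25:30 GMT"
expires_hdr = "Sat, 06 Apr 2013 19:25:30 GMT"

delta = parsedate_to_datetime(expires_hdr) - parsedate_to_datetime(date_hdr)
seconds = int(delta.total_seconds())

# 365 days * 24 h * 3600 s = 31536000, the value in Cache-Control: max-age
print(seconds)
```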

Well, I thought this would stop the browser from downloading, or even asking the server about, the images for 1 year. But it's only partially true: because if you close and reopen the browser, the browser does NOT download the images from the server anymore, but it still sends the server an HTTP request for each image.

How do I force the browser to stop making HTTP requests for each image? Even if these HTTP requests are not followed by an image being downloaded, they are still requests made to the server that unnecessarily increase latency and slow down page rendering!

I already told the browser it MUST keep the images in cache for 1 year! Why does the browser still ask the server about each image (even if it does not download it)?!


Looking at the network graphs in FireBug (menu FireBug > Net > Images) I can see different caching behaviours (I obviously started with the browser cache completely empty; I forced a cache delete on the browser using "Clear All History"):

  • When the page is loaded for the 1st time all images are downloaded (and same thing happens if I force a page reload by clicking on the browser's reload page button). This makes sense!

  • When I navigate the site and get back to the same page the images are not downloaded at all and the browser does NOT even inquire the server for any of the images. This makes sense, (and I would like to see this behaviour also when browser is closed)!

  • When I close the browser and open it again on the same page, the silly browser makes an HTTP request to the server anyway, once per image: it does NOT download the image, but it still makes an HTTP request; it's as if the browser asks the server about the image (the server replies with 200 OK). This is the one that irritates me!

I also attach the graphs below if you are interested:

[two FireBug > Net waterfall screenshots]

EDIT: I just tested with Firefox 11.0 as well, to make sure it wasn't an issue of my Firefox 3.6 being too old. The same thing happens!!! I also tested the Google and Stack Overflow sites; they both send Cache-Control: max-age=..., but the browser still makes an HTTP request to the server for each image once the browser is closed and opened again on the same page. After the server response the browser does NOT download the image (as I explained above), but it still makes the damn request that increases the time to see the page.

EDIT2: removing the Last-Modified header as suggested here does not solve the problem; it makes no difference.

Inspect answered 6/4, 2012 at 20:13 Comment(8)
Default behaviour is download if newer maybe?Leopardi
@Tony Hopkinson: but I tell the browser ExpiresDefault "access plus 1 year" (i.e. Cache-Control: max-age=31536000), so the browser should not hit the server asking for such a resource again; I already told it to keep it in the cache for 1 year from last access.Inspect
That will be why what you've done is working as you'd expect, is it? Expires means "may be deleted from browser cache", not "don't check whether the cache is up to date" for one year....Leopardi
@Tony Hopkinson: sorry, but I'm missing your point. I want the browser NOT to download the image and NOT to even inquire the server EVER AGAIN for 1 year. From my test, it seems the browser does not download the image again, but it still inquires the server. I would expect the browser to get the image from its own cache and to not hit the server anymore for 1 year.Inspect
My point? My point is your problem is the mismatch between your expectation and reality. If the image was changed on the server, I wouldn't expect to have to wait a year or clear my browser cache to get the latest version. I'd expect it to check to see if it had changed, and you haven't mentioned nostore...Leopardi
@Tony Hopkinson, I didn't mean to be rude, I simply did not understand what you were trying to explain. You say: "I'd expect it to check to see if it had changed". I agree, but not when I have told the browser the image does not expire for 1 year. That's why I deliberately added Cache-Control: max-age=31536000 to each image request made to the server. Google suggests doing this to speed up pages.Inspect
From what I've seen around this behaviour, expiry time seems to have been interpreted as now you can delete me, and then if the image is needed again, I'll get it. Bit of a fast and loose woolly area, open to huge abuse by providers. Some might take a dim view of you using up their resources for year so your site looks better, for instance....Leopardi
one thing to note is that actual HTTP requests are made when you refresh the browser, no matter what headers have been set. The server will still respond with a 304 and not many bytes will go over the wire, but you still get that latency hit. When following links and navigating otherwise, the browser local cache is hit (no HTTP requests whatsoever). Just something to be aware of when debugging.Hiro

You were using the wrong tool for analysing the requests.

I'd recommend the really useful Firefox addon Live HTTP headers so you can see what is really going on on the network.

And just to be sure, you can ssh/putty your server and do something like

tail -f /var/log/apache2/access.log
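If live tailing is inconvenient, the same check can be scripted after the fact by filtering the access log for image requests. A hedged sketch (the sample lines and the "combined"-style format are hypothetical; real output depends on your LogFormat directive):

```python
import re

# Hypothetical sample lines in Apache "combined"-style log format
log_lines = [
    '1.2.3.4 - - [06/Apr/2012:19:25:30 +0000] "GET /img/logo.png HTTP/1.1" 200 24884',
    '1.2.3.4 - - [06/Apr/2012:19:25:31 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '1.2.3.4 - - [06/Apr/2012:19:25:32 +0000] "GET /img/icon.gif HTTP/1.1" 304 0',
]

# Same extension list as the <FilesMatch> block in the question
image_req = re.compile(r'"GET \S+\.(?:ico|gif|jpg|png) HTTP')
hits = [line for line in log_lines if image_req.search(line)]

print(len(hits))  # image requests that actually reached the server
```

If a supposedly cached image keeps showing up here after a browser restart, the server really is being hit; if not, the traffic you saw in Firebug never left the browser.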
Lactoscope answered 16/5, 2012 at 14:52 Comment(2)
Absolutely right! By using the tool you suggested I can see the HTTP requests are not sent again, as supposed. Many thanks! I'm clueless why "FireBug > Net" shows all those requests that are NOT made at all!!!Inspect
This may have been true at some time -- but my version of Firebug clearly shows cached requests as grey/white hatched. Nothing wrong with analyzing Apache logs, but nothing wrong with Firebug either.Larainelarboard

The behavior you are seeing is the intended, specified behavior (see RFC 7234 for more details):

All modern browsers will send HTTP requests to the server for every page element displayed, regardless of cache status. This was a design decision made at the request of web services (especially advertising networks) to ensure that HTTP servers were able to maintain records of every display of every element.

If the browsers did not make these requests, the server would never be notified that an image had been displayed to the user. For advertising networks, this would be catastrophic. Early on, advertising networks 'hacked' their way around this by serving the same ad image using randomly generated names (ex: 'coke_ad_1_98719283719283.gif'). However, for ISPs this practice caused a huge increase in data transfers, because every one of their users was re-downloading these identical ad images, bypassing any caching/proxy servers their ISP was operating.

So a truce was reached: Browsers would always send HTTP requests, even for un-expired cached elements. Servers would respond with HTTP 304 status codes ("not modified"). This allows the servers to record the fact that the image was displayed to the client. As a result, advertising networks generally stopped using randomized image names to bypass network cache servers.

This gave the ad networks what they wanted - a record of every image displayed - and it gave ISPs what they wanted - cache-able images and static content.

That is why there isn't much you can do to prevent browsers from sending HTTP requests for cached page elements.

But if you look at other client-side solutions that came along with HTML5, there is scope to prevent resource loading:

  1. Cache Manifest (in spite of its gotchas)
  2. IndexedDB (nice asynchronous features, allows blob storage)
  3. Local Storage (not async)
Mudslinging answered 16/4, 2012 at 15:21 Comment(12)
I receive 200 OK (see the screenshots in my question) and NOT 304! I know browsers would normally receive 304 "Not Modified"; it's what you receive back when you do NOT use Cache-Control: max-age=.... But I'm using Cache-Control: max-age=..., and the WHOLE point of using it is to NOT inquire the server...Inspect
...What surprised me is that the server seemed to be inquired anyway, replying with 200 OK. But as explained by Peter Lundsby (and according to this stackoverflow.com/questions/6797361 ) it's probably just FireBug showing the request; the request is made to the browser CACHE and not really to the server, and that's why it is shown in grey color.Inspect
Jason, do you have any references to additional info? — This was very interesting, I was looking for something like this.Ketch
@jason-buberel As KajMagnus said, any references? The behavior stated by OP seems not to be the case based on further analysis of FireBug's color coding. Maybe a combination of your explanation and the findings in FireBug?Limacine
This behavior is detrimental to my html5 game that is constantly displaying and hiding the same images. Yeah I return a 304 code but the server is getting hammered with these requests that are hundreds of bytes in length with all the referrer, user agent, and other headers.Policlinic
Very interesting! But I am not sure this is the specified behavior. When I navigate to Google.com on chrome, I can see that all the images are served from local cache and no network request is made. Yes, it is a problem for ad-servers but they have ways around it (Similar to the cache-busting that you mentioned). For example, this is what Google's DoubleClick suggests to get around the caching problem support.google.com/dfp_premium/answer/1116933?hl=en If what you are saying is true then we will not need cache-busters for modern browsers.Waynant
-1: This is utterly wrong. An image sent with the correct cache control headers WILL NOT be re-requested by a modern browser until the cache expiry time, with some exceptions like the user forcing a page reload. The advertising problem is also trivially solvable by telling the browser not to cache the image.Lipread
To echo what @romkyns said, unless Google and many other developers are flat-out wrong about using server rewrites and version numbers to prevent server hits, this answer is incorrect.Anamorphism
Why did 35 people upvote this utter and easily testable nonsense? sighHarlan
"For every page element"? Damn that would be slow.Lardon
Nonsense. If the advertisers wanted an HTTP request for each page view all they had to do was set max-age=0.Subinfeudate
It's all wrong; read chugadie's answer about reloading and refreshing.Majewski

There's a difference between "reloading" and "refreshing". Just navigating to a page with the back and forward buttons usually doesn't initiate new HTTP requests, but specifically hitting F5 to "refresh" the page will cause the browser to double-check its cache. This is browser-dependent but seems to be the norm for FF and Chrome (i.e. the browsers that make it easy to watch their network traffic). Hitting F6 then Enter should focus the URL address bar and then "go" to it, which should reload the page but not double-check the assets on the page.

Update: clarification of back and forward navigation behavior. It's called the "Back Forward Cache", or BFCache, in browsers. When you navigate with the back/forward buttons, the intent is to show you the page exactly as it was when you saw it in your own timeline. No server requests are made when using back and forward, even if a server cache header says that a particular item has expired.

If you see (200 OK BFCache) in your developer network panel, then the server was never hit - even to ask if-modified-since.

http://www.softwareishard.com/blog/firebug/firebug-tip-what-the-heck-is-bfcache/

Ravioli answered 2/2, 2013 at 17:43 Comment(1)
The terms "reload" and "refresh" have been used interchangeably in the user interfaces of browsers for years. I am pretty sure Netscape Navigator 4 had a "Reload" button whereas IE 6 had a "Refresh" button. In each case the button sent an HTTP request to the server. Other than that, I believe your answer is correct.Subinfeudate

If I force a refresh using F5 or Ctrl + F5, a request is sent. However, if I close the browser and enter the URL again then NO request is sent. The way I tested whether a request is sent was by using breakpoints on begin request on the server; even when a request is not sent it still shows up in Firebug as having done a 7 ms wait, so beware of this.

Overexert answered 12/4, 2012 at 15:32 Comment(3)
It does not work; I mean it does not make any difference, as David Merrilees correctly commented on the same article: "Does this work? I've removed both the Etag and Last-Modified headers, and added an expires header, but it always revalidates with a 200 response."Inspect
It actually worked for me! If I force a refresh using F5 or Ctrl + F5, a request is sent. However, if I close the browser and enter the URL again then NO request is sent. The way I tested whether a request is sent was by using breakpoints on begin request on the server; even when a request is not sent it still shows up in Firebug as having done a 7 ms wait, so beware of this.Overexert
edit your answer by replacing your suggestion with your last comment and I'll accept your answer. I even found this explaining well what you are saying: #6797861Inspect

What you are describing here does not reflect my experience. If content is served with a no-store directive or you do an explicit refresh, then yes, I'd expect it to go back to the origin server otherwise it should be cached across browser restarts (assuming it is allowed to, and can write a cache file).

Looking at your waterfalls in a bit more detail (which is tricky because they are a bit small & blurry) the browser appears to be doing exactly what it should - it has entries for the images - but these are just loading from the local cache not from the origin server - check the 'Date' header in the response (why do you think it's taking milliseconds instead of seconds?). That's why they are coloured differently.

Yeseniayeshiva answered 7/4, 2012 at 22:1 Comment(5)
Exactly. Firebug shows requests in a light grey colour if the response is already cached. To confirm, go to Firebug > Net > Request URL > Cache. Look at the fetch count. You should see that field increment.Chainman
Guys, thanks for taking the time to reply. Probably my question is too long. TRUE that FF takes the file from the cache, but the point is that before doing that it does contact the server. The server replies with a 200 OK, FF does not download the file, and gets it from the cache. I'm not surprised FF gets the file from the cache; I'm surprised FF contacts the server first. I already told FF that the file does not expire (Cache-Control: max-age=31536000), so why does FF keep contacting the server? A server request for each image adds an appreciable latency (even if the image is not downloaded)Inspect
@symcbean: you can easily test this behavior by opening this Stack Overflow site (the server does send Cache-Control: max-age=604800), so the image should get cached for 7 days. Well, if you navigate the SO site you will see in "Firebug > Net" that the image http:...stackoverflow/img/tag-adobe.png does not even appear; I assume that's because the image is taken from the cache. But if you close the browser and open it again, you will see in "Firebug > Net" that the server is contacted again (grey color) for that image; the image is not downloaded, but the server is still hit.Inspect
@Marco: if you don't believe me / firebug then use wireshark to see what's actually being sent to the server.Yeseniayeshiva
@symcbean: but how do you explain that, when navigating the site without closing and reopening the browser, Firebug > Net does not even show those greyed requests?Inspect

After spending considerable time myself looking for a reasonable answer, I found the link below most useful, and it does answer the question asked here.

https://webmasters.stackexchange.com/questions/25342/headers-to-prevent-304-if-modified-since-head-requests

Libradalibrarian answered 4/2, 2014 at 0:16 Comment(0)

If it is a matter of life or death (if you want to optimise page loading this way, or you want to reduce the load on the server as much as possible, no matter what), then there IS a workaround.

Use HTML5 local storage to cache images after they were requested for the first time.

  • [+] You can prevent browser from sending HTTP requests, which in 99% would return 304 (Not Modified), no matter how hard user tries (F5, ctrl+F5, simply revisiting page, etc.)

  • [-] You have to put some extra efforts in javascript support for this.

  • [-] Images are stored in base64 (we cannot store binary data), that's why they are decoded each time on the client side. This is usually pretty fast and not a big deal, but it is still some extra CPU usage on the client side and should be kept in mind.

  • [-] Local storage is limited. You can aim at using ~5 MB of data per domain (note: base64 adds ~33% to the original size of an image).

  • [?] Supported by majority of browsers. http://caniuse.com/#search=localstorage
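The base64 overhead mentioned in the list above is easy to quantify: the encoding maps every 3 input bytes to 4 output characters, so the stored size grows by about a third. A small sketch (the payload is a dummy buffer, not a real image; 24884 bytes matches the Content-Length from the response in the question):

```python
import base64

# Hypothetical image payload the size of the JPEG from the question
raw = b"\x00" * 24884
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw)
print(round(overhead, 3))  # base64 adds roughly a third to the original size
```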

Example

Test

Subinfeudation answered 1/12, 2014 at 5:5 Comment(0)

What you are seeing in Chrome is not a record of the actual HTTP requests - it's a record of asset requests. Chrome does this to show you that an asset is actually being requested by the page. However, this view does not actually indicate whether the request is being made. If an asset is cached, Chrome will never actually create the underlying HTTP request.

You can also confirm this by hovering over the purple segments in the timeline. Cached resources will have a (from cache) in the tooltip.

In order to see the actual HTTP requests, you need to look on a lower level. In some browsers this can be done with a plugin (like Live HTTP Headers).

In reality though, to verify the requests are not actually being made you need to check your server logs or use a debugging proxy like Charles or Fiddler. This will work on an HTTP level to make sure the requests are not actually happening.

Bloodworth answered 18/12, 2014 at 16:15 Comment(0)

Cache Validation and the 304 response

There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:

  • The cached entry has no expiration date and the content is being accessed for the first time in a browser session

  • The cached entry has an expiration date but it has expired

  • The user has requested a page update by clicking the Refresh button or pressing F5

If the cached entry has a last modification date, IE sends it in the If-Modified-Since header of a GET request message:

GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com/
Accept-Encoding: gzip, deflate
If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Host: www.google.com

The server checks the If-Modified-Since header and responds accordingly. If the content has not been changed since the date/time specified, it replies with a status code of 304 and a response message that just contains headers:

HTTP/1.1 304 Not Modified
Content-Type: text/html
Server: GWS/2.1
Content-Length: 0
Date: Thu, 04 Oct 2004 12:00:00 GMT

The response can be quickly downloaded because it contains no content and causes IE to read the data it requires from the cache. In effect, it is like a redirection to the local browser cache.

If the requested object has actually changed since the date/time in the If-Modified-Since header, the server responds with a status code of 200 and supplies the modified version of the resource.
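The validation decision described above can be sketched in a few lines (a hypothetical illustration of the logic, not IE's or any real server's code; the first date is taken from the example request above):

```python
from email.utils import parsedate_to_datetime

def status_for(if_modified_since, last_modified):
    """Sketch of the validation described above: reply 304 when the
    resource has not changed since the client's cached copy."""
    if if_modified_since is not None:
        client_copy = parsedate_to_datetime(if_modified_since)
        resource = parsedate_to_datetime(last_modified)
        if resource <= client_copy:
            return 304  # Not Modified: headers only, client reuses its cache
    return 200  # changed, or no validator sent: full response with body

# Resource unchanged since the client fetched it -> 304
print(status_for("Thu, 23 Sep 2004 17:42:04 GMT",
                 "Mon, 26 Jul 2004 18:50:14 GMT"))
```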

Monegasque answered 22/2, 2015 at 17:13 Comment(0)

This question has a better answer here on the Webmasters Stack Exchange site.

More information, which is also cited in the above link, is on httpwatch

According to the article:

There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:

  • The cached entry has no expiration date and the content is being accessed for the first time in a browser session
  • The cached entry has an expiration date but it has expired
  • The user has requested a page update by clicking the Refresh button or pressing F5
Stole answered 12/11, 2015 at 23:28 Comment(0)
