Will this protect me from ETag tracking?
Background: ETag tracking is well explained here and also mentioned on Wikipedia.

An answer I wrote in response to "How can I prevent tracking by ETags?" prompted me to write this question.

I have a browser-side solution which prevents ETag tracking. It works without modifying the current HTTP protocol. Is this a viable solution to ETag tracking?

Instead of telling the server our ETag, we ASK the server about its ETag and compare it to the one we already have.

Pseudo code:

if (file_not_in_cache)
{
    page=http_get_request();     
    page.display();
    page.put_in_cache();
}
else
{
    page=load_from_cache();
    client_etag=page.extract_etag();
    server_etag=http_HEAD_request().extract_etag();

    //Instead of saying "my etag is xyz",
    //the client says: "what is YOUR etag, server?"

    if (server_etag==client_etag)
    {
        page.display();
    }
    else
    {
        page.remove_from_cache();
        page=http_get_request();     
        page.display();
        page.put_in_cache();
    }
}
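A minimal runnable sketch of the pseudocode above in Python. The helper names are hypothetical, and the HEAD/GET operations are injected as functions so the example needs no real network:

```python
def fetch(url, cache, head, get):
    """HEAD-before-GET: ask the server for ITS ETag instead of
    revealing the ETag we have cached.

    head(url) -> the server's current ETag (headers only)
    get(url)  -> (etag, body) from a full GET
    """
    if url in cache:
        cached_etag, cached_body = cache[url]
        if head(url) == cached_etag:   # unchanged: serve from cache
            return cached_body
        del cache[url]                 # stale: drop the cached copy
    etag, body = get(url)              # a normal GET conversation begins
    cache[url] = (etag, body)
    return body
```

The client never transmits its stored ETag, so the server only ever sees a plain HEAD followed (sometimes) by a plain GET.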

HTTP conversation example with my solution:

Client:

HEAD /posts/46328
host: security.stackexchange.com

Server:

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
ETag: "EVIl_UNIQUE_TRACKING_ETAG"
Content-Type: text/html
Content-Length: 131

Case 1, Client has an identical ETag:

Connection closes, client loads page from cache.

Case 2, client has a mismatching ETag:

GET...... //and a normal http conversation begins.

Extras that do require modifying the HTTP specification

Think of the following as theoretical material, the HTTP spec probably won't change any time soon.

1. Removing HEAD overhead

It is worth noting that there is minor overhead: the server has to send the HTTP headers twice, once in response to the HEAD and once in response to the GET. One theoretical workaround is to modify the HTTP protocol and add a new method that requests header-less content. The client would then request the HEAD only, and afterwards the content only, if the ETags mismatch.

2. Preventing cache based tracking (or at least making it a lot harder)

Although the workaround suggested by Sneftel is not an ETag tracking technique, it still tracks people even when they use the "HEAD, GET" sequence I suggested. The solution would be to restrict the possible values of ETags: instead of being an arbitrary sequence, the ETag has to be a checksum of the content. The client verifies this, and if the checksum of the content does not match the value sent by the server, the cache is not used.
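A sketch of this checksum constraint, assuming the standardized ETag is the SHA-256 hex digest of the response body (the particular hash is my assumption, not part of the proposal):

```python
import hashlib

def etag_is_valid(etag: str, body: bytes) -> bool:
    """Checksum-constrained ETag: accept a cache entry only if the
    ETag equals the SHA-256 hex digest of the content. A server that
    hands out per-user tracking ETags will fail this check, and the
    client then falls back to uncached requests."""
    return etag == hashlib.sha256(body).hexdigest()
```

A tracking ETag cannot satisfy this check for two users receiving identical content, since both would have to be served the same digest.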

Side note: fix 2 would also eliminate the following Evercookie tracking techniques: pngData, etagData, cacheData. Combining that with Chrome's "Keep local data only until I quit my browser" eliminates all evercookie tracking techniques except Flash and Silverlight cookies.

Kovar answered 2/12, 2013 at 18:36 Comment(6)
Given that you posted this on StackOverflow, what is the actual programming problem you're trying to solve? This seems like a request for comments and opinions, which is not what SO is for and will probably get your question closed under the "asking for an opinion" reason.Trenna
I am trying to prevent ETag tracking by modifying the way browsers ask for pages. This is a programming issue, because implementing it involves modifying the way browsers work, not the HTTP protocol. I am not asking for opinions; I am asking for objective objections to this fix and looking for possible flaws that would prevent it from working. However, this is highly interrelated with security and networking, and I agree that it may be more suitable on a different site. I can do nothing but wait for the decision of the SO guys.Kovar
I have omitted the word "opinion" from the question.Kovar
How are you implementing load_from_cache()? I'm not familiar with any JavaScript mechanism to allow direct access to the cache. Also, if you don't supply an ETag or any cookies (or any other means of identifying yourself) in your HEAD request, you're likely to get served a new ETag, which seems just about as useful as clearing your cache.Cloak
Note this is pseudocode; I haven't implemented load_from_cache yet. The idea is to modify the source of the browser; this has nothing to do with JavaScript. Regarding your second argument: one is not supposed to get a new ETag unless the content changed, regardless of what the HEAD request looks like. If you are getting a new ETag for each request, then the server is doing something nasty, and not using the cache for that specific request would be the safe thing to do. This is more useful than clearing the cache because it is equivalent to clearing the cache only for ETag-tracking servers.Kovar
The best solution would be to disable ETag caching altogether in the browser's private mode (at the moment you can set ETags in normal mode and identify users after they have started private mode). I see no workaround that would prevent this kind of tracking; only the tracking implementation will differ.Pendent
It sounds reasonable, but workarounds exist. Suppose the front page was always given the same etag (so that returning visitors would always load it from cache), but the page itself referenced a differently-named image each time it was loaded. Your GET or HEAD request for this image would then uniquely identify you. Arguably this isn't an etag-based attack, but it still uses your cache to identify you.

Connivent answered 2/12, 2013 at 20:52 Comment(8)
Wonderful idea! I think I've found a defense against that too. I will modify my question to take this into account.Kovar
Question updated. Assuming the HTTP protocol changes are applied, would people become immune to cache tracking? I firmly believe the answer is yes.Kovar
Couple of problems: (1) the mtime is sometimes used as the etag; this would prevent proper caching, since it could not be properly verified. (2) MD5 is sometimes used for the etag; this is susceptible to collision attacks.Connivent
(1) What I proposed in "2. Preventing cache based tracking" is standardizing what an Etag should be. (2) I don't see how this is related to collision attacks, could you explain further?Kovar
(1) Yes, if you standardize it as a particular hash then that's fine, but good luck getting every website you'd like cached to go along with it.Connivent
(2) The existence of collision attacks means that the host could serve you one of many different pages, all of which had the same hash value. This would convince you to use the cached (but unique to you) page to request the linked resources.Connivent
(1) The subtitle says "Extras that do require modifying the HTTP specification". (2) +1 Agreed, however that would require huge computational power, especially with big hashes; I don't think it's practical.Kovar
I chose this as the best answer because it is a valid, simple, minimalist workaround (Although it is NOT an ETag-based attack). And the HTTP spec isn't changing any time soon.Kovar
As long as any caching is used there's a potential exploit, even with the HTTP changes. Suppose the main page includes 100 images, each one randomly drawn from a potential pool of 2 images.

When a user returns to the site, her browser reloads the page (since the checksum doesn't match). On average, 25 of the 100 images will be cached from before. This combination can (almost certainly) be used to individually fingerprint the user.

Interestingly, this is almost exactly how DNA paternity testing works.
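A rough simulation of this fingerprinting scheme (the slot count and two-variant pool follow the example above; the helper function is hypothetical):

```python
import random

def fingerprint(n_slots=100, rng=random):
    """One page load: each of n_slots image slots is randomly served
    one of 2 variants. The set of variants left in the browser cache
    acts as an n_slots-bit identifier for that browser."""
    return tuple(rng.randrange(2) for _ in range(n_slots))

# On a revisit, each slot independently matches the cached variant with
# probability 1/2, so roughly half the images come from cache, and two
# independent users collide on all 100 slots with probability 2**-100.
```

Which subset of images is re-requested (versus silently served from cache) is exactly the observable signal the server uses to re-identify the visitor.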

Connivent answered 3/12, 2013 at 11:26 Comment(5)
Thank you, that is very enlightening. However it's about exploiting the cache and not the ETag directly. My solution (Without the HTTP changes) still works against pure ETag-attacks. You've proven that cache tracking is indeed much harder to stop, even with the HTTP changes. I will post a separate question regarding cache-based tracking.Kovar
Very non critical for your point, but just wondering: How did you get to the number 25?Kovar
Sorry, that should read 50. The 25 was from a previous idea I was considering, where one of the items in each pair was randomly generated per-load.Connivent
This specific technique would fail in its current form. On average, 50 images will be requested after the first visit, 25 after the second, etc. After a couple of visits the browser will almost certainly not request any images, and the tracking signal is lost. Though your point is still valid, and I see the problem.Kovar
For maximum practicality, several sets of images would be used, with round-robin cache expiry dates. That would ensure that, for a reasonable range of revisit frequencies, at least one of the sets would provide effective fingerprinting.Connivent
The server could detect that, for a number of resources, you issue a HEAD request that is not followed by a GET for the same resource. That's a tell, in poker terms.

Just by having some resources cached, you are storing information. That information can be deduced by the server any time you do not re-request a resource named on the page.

Protecting your privacy in this manner comes at the cost of having to download every resource on the page with every visit. If you ever cache anything then you are storing information that can be inferred from your requests to the server.

Especially on mobile, where your bandwidth is more expensive and often slower, downloading all page resources on every visit could be impractical. I think at some level you have to accept that there are patterns in your interaction with the website which could be detected and profiled to identify you.

Terrier answered 2/12, 2013 at 18:36 Comment(0)