Ignoring cookies list efficiently in NGINX reverse proxy setup

I am currently testing the microcache feature in an NGINX reverse proxy setup for dynamic content.

One big issue is that requests carrying session cookies must bypass the cache; otherwise people will end up logged in to random accounts on the site(s).

Currently I am ignoring popular CMS cookies like this:

if ($http_cookie ~* "(joomla_[a-zA-Z0-9_]+|userID|wordpress_(?!test_)[a-zA-Z0-9_]+|wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+|comment_author_[a-zA-Z0-9_]+|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_[a-zA-Z0-9]+|sid_customer_|sid_admin_|PrestaShop-[a-zA-Z0-9]+)")
    {
        # Set the ignore variable to 1; it is used later in:
        #   proxy_no_cache      $IGNORE_VARIABLE;
        #   proxy_cache_bypass  $IGNORE_VARIABLE;
        # Does this make sense?
        set $IGNORE_VARIABLE 1;
    }

However this becomes a problem if I want to add more cookies to the ignore list. Not to mention that using too many "if" statements in NGINX is not recommended as per the docs.

My question is whether this could be done using a map instead. I saw that the regex syntax in a map looks different (or maybe I am wrong).

Or is there another way to efficiently ignore/bypass cookies?

I have searched a lot on Stack Overflow, and whilst there are many different examples, I could not find anything specific to my needs.

Thank you

Update:

After a lot of reading and "digging" on the internet (we might as well just say Google), I found some interesting examples.

However I am very confused with these, as I do not fully understand the regex usage and I am afraid to implement such without understanding it.

Example 1:

map $http_cookie $cache_uid {
  default nil;
  ~SESS[[:alnum:]]+=(?<session_id>[[:alnum:]]+) $session_id;
}

  1. In this example the regex looks very different from the ones used in "if" blocks. I don't understand why the pattern is not wrapped in "" and starts directly with just a ~ sign.

  2. I don't understand what [[:alnum:]]+ means. I searched for it but was unable to find documentation (or maybe I missed it).

  3. I can see that the author sets "nil" as the default; this would not apply in my case.

Example 2:

map $http_cookie $cache_uid {
  default  '';
  ~SESS[[:alnum:]]+=(?<session_id>[[:graph:]]+)  $session_id;
}

  1. Same questions as in Example 1, but this time I see [[:graph:]]+. What is that?

My Example (not tested):

map $http_cookie $bypass_cache {

    "~*wordpress_(?!test_)[a-zA-Z0-9_]+"  1;
    "~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+"  1;
    "~*comment_author_[a-zA-Z0-9_]+"  1;
    "~*[a-zA-Z0-9]+_session"  1;

    default      0;
}

In my pseudo example, the regex must be wrong since I did not find any map cookie examples with such regex.

So once again my goal is to have a map style list of cookies that I can bypass the cache for, with proper regex.

Any advice/examples much appreciated.

Siderolite answered 11/7, 2019 at 9:10 Comment(3)
So you want to bypass the cache if any of these cookies are found, or do you want to remove or tamper with the cookies? – Wendellwendi
@TarunLalwani - Yes, I want to bypass the cache if those cookies are present. Otherwise the micro-cache will cause terrible problems, and people will log in and see others' accounts and data. – Siderolite
thanks for all the fish, +1! – Majolica
M
5

What exactly are you trying to do?

The way you're doing it, blacklisting only certain cookies from being cached through if ($http_cookie …, is the wrong approach: one day someone will find a cookie that is not blacklisted but which your backend nonetheless accepts, and that will cause cache poisoning or other security issues down the line.

There's also no reason to use the http://nginx.org/r/map approach to get the values of the individual cookies, either — all of this is already available through the http://nginx.org/r/$cookie_ paradigm, making the map code for parsing out $http_cookie rather redundant and unnecessary.

Are there any cookies which you actually want to cache? If not, why not just use proxy_no_cache $http_cookie; to disallow caching when any cookies are present?
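
As a rough illustration of that catch-all approach, here is a minimal sketch. The zone name microcache, the upstream name backend, the cache path, and the timings are placeholders I made up, not anything from the question; the directives themselves are standard ngx_http_proxy_module ones.

```nginx
# Goes in the http context.
# Assumed names: "microcache" zone, "backend" upstream, cache path below.
proxy_cache_path /var/cache/nginx/microcache levels=1:2
                 keys_zone=microcache:10m max_size=100m inactive=10m;

upstream backend {
    server 127.0.0.1:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_cache microcache;
        proxy_cache_valid 200 1s;

        # A non-empty Cookie request header (other than the literal "0")
        # prevents the response from being saved ...
        proxy_no_cache     $http_cookie;
        # ... and prevents the request from being answered from the cache.
        proxy_cache_bypass $http_cookie;
    }
}
```

Both directives are needed: proxy_no_cache only controls saving, while proxy_cache_bypass controls whether an already cached response may be served. Separately, nginx by default does not cache responses that carry a Set-Cookie header, unless that is overridden with proxy_ignore_headers.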


What you'd probably want to do is first have a spec of what must be cached and under what circumstances, only then resorting to expressing such logic in a programming language like nginx.conf.

For example, a better approach would be to see which URLs should always be cached, clearing out the Cookie header for them to ensure that cache poisoning isn't possible (proxy_set_header Cookie "";). Otherwise, if any cookies are present, it may make sense either to not cache anything at all (proxy_no_cache $http_cookie;), or to structure the cache such that certain combinations of authentication credentials are used for http://nginx.org/r/proxy_cache_key; in this case, it might also make sense to reconstruct the Cookie request header manually through a whitelist-based approach to avoid cache-poisoning issues.
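
A sketch of what such a split could look like, under loud assumptions: the /news/ prefix, the sessionid cookie name, the microcache zone and the backend upstream are all hypothetical, and whether a per-session cache is appropriate depends entirely on your backend.

```nginx
# Goes in the http context. All names below are illustrative assumptions.
proxy_cache_path /var/cache/nginx/microcache levels=1:2
                 keys_zone=microcache:10m max_size=100m inactive=10m;

upstream backend {
    server 127.0.0.1:8080;
}

# Whitelist: rebuild the Cookie header from the single cookie the backend
# is assumed to need. Mixing text and a variable in a map value requires
# nginx 1.11.0 or newer.
map $cookie_sessionid $forwarded_cookie {
    ""      "";
    default "sessionid=$cookie_sessionid";
}

server {
    listen 80;

    # URLs that are identical for everyone: drop cookies, always cache.
    location /news/ {
        proxy_pass http://backend;
        proxy_cache microcache;
        proxy_cache_valid 200 10m;
        # An empty value means the header is not passed to the backend.
        proxy_set_header Cookie "";
    }

    # Everything else: forward only the whitelisted cookie and make it
    # part of the cache key, so each session has its own cache entries.
    location / {
        proxy_pass http://backend;
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
        proxy_set_header Cookie $forwarded_cookie;
        proxy_cache_key "$scheme$proxy_host$request_uri|$cookie_sessionid";
    }
}
```

The first three components of the key are nginx's default proxy_cache_key; only the trailing cookie value is added. Any cookie that influences the response but is missing from the key would reintroduce the cross-user leak, which is why the forwarded cookies and the key are built from the same whitelist.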

Majolica answered 15/7, 2019 at 3:11 Comment(4)
Thank you for your clear explanation. My ultimate purpose would be to exclude all requests with cookies from the micro-cache. I had no idea "proxy_no_cache $http_cookie;" would actually work and ignore all cookies. Is this correct? – Siderolite
Also, I really do not see a reason why one would cache "some" cookies at all. I don't understand why all the tutorials and articles show how to exclude only specific cookies from the microcache... – Siderolite
@NorbertBoros if the user is logged in, why not have a user-specific cache? Or cookies may be used for no good reason; for example, I get rid of cookies from OpenGrok on BXR.SU and cache every single response; see https://mcmap.net/q/1470052/-is-it-possible-to-set-up-nginx-without-cookies. As for proxy_no_cache $http_cookie;, yes, I think that'll do exactly what you want; see nginx.org/r/proxy_no_cache, where the doc is very clear: "If at least one value of the string parameters is not empty and is not equal to “0” then the response will not be saved". :-) – Majolica
@NorbertBoros, well, you gotta know what you're caching: if it's all static resources, then you should just serve them statically, and there's no need for caching; if it's dynamic content, chances are there are cookies involved, and then you can't really cache it without introducing issues. – Majolica
W
1

Your own example is what you actually need:

map $http_cookie $bypass_cache {

    "~*wordpress_(?!test_)[a-zA-Z0-9_]+"  1;
    "~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+"  1;
    "~*comment_author_[a-zA-Z0-9_]+"  1;
    "~*[a-zA-Z0-9]+_session"  1;

    default      0;
}

Basically, what you are saying here is that the value of $bypass_cache will be 1 if any of the regexes matches the Cookie header, and 0 otherwise.

So as long as you get the patterns right, it will work. Only you can put that list together, since only you know which cookies should bypass the cache.
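
To show where the mapped variable would then be used, here is a minimal sketch. It assumes the map above sits in the http context, that a cache zone called microcache has been declared with proxy_cache_path, and that the upstream is called backend; those two names are placeholders of mine.

```nginx
# Inside a server block. "microcache" and "backend" are assumed names.
location / {
    proxy_pass http://backend;
    proxy_cache microcache;
    proxy_cache_valid 200 1s;

    # $bypass_cache is "1" when one of the map's regexes matched the
    # Cookie header: do not save the response ...
    proxy_no_cache     $bypass_cache;
    # ... and do not serve this request from the cache either.
    proxy_cache_bypass $bypass_cache;
}
```

Both directives treat an empty value or "0" as "cache normally", which is why the map's default of 0 works here.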

Wendellwendi answered 13/7, 2019 at 15:43 Comment(7)
But there is literally no example on the internet with this regex style, [a-zA-Z0-9_]+, inside a map. Are you sure? – Siderolite
Basically it's a regex pattern, and people use only what they need; in most cases they just want a value that contains something, not a very specific rule such as that it has to be a number. – Wendellwendi
There are also other uses; see this post, for example: serverfault.com/questions/482372/… – Wendellwendi
Thank you for your answer; however, I do know it's a regex pattern... my question is different. Why does nobody use the [a-zA-Z0-9_]+ style inside a map directive, and how does [[:alnum:]]+ differ? – Siderolite
They are nearly the same; [[:alnum:]] is the POSIX character-class way of writing [a-zA-Z0-9] (note that it does not include the underscore). See regular-expressions.info/posixbrackets.html. I have never used the POSIX classes in nginx; I usually prefer writing [a-zA-Z0-9_]+ explicitly. – Wendellwendi
See this: github.com/AntonRiab/slim_middle_samples/blob/…. It's not that people don't use it; it's just not used a lot. – Wendellwendi
OK then, let me test my example in production and see. – Siderolite
