Server-side caching of dynamic content with Nginx and ETags
I have a CouchDB DB, with an Nginx reverse proxy in front of it. Some responses from CouchDB take a long time to generate (yes, it was a bad choice, but I need to stick with it for now), and I would like to cache them with Nginx. (Currently Nginx only does SSL.)

CouchDB supports ETags, so ideally I would like Nginx to cache the ETags as well, on behalf of dumb clients. The clients do not use ETags themselves; they just query Nginx, which goes to CouchDB with its cached ETag, and then sends back either the cached response or the new one.

My understanding based on the docs is that Nginx cannot do this currently. Have I missed something? Is there an alternative that supports this setup? Or is the only solution to invalidate the Nginx cache by hand?

Hornbill answered 25/2, 2015 at 16:48
I am assuming that you already looked at Varnish and did not find it suitable for your case. There are two ways you can achieve what you want.

With nginx

Nginx has a default caching mechanism that you can configure for your use.
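As a starting point, the built-in proxy cache may already be enough. Here is a minimal sketch; the cache path, zone name, upstream address (127.0.0.1:5984), and timings are illustrative assumptions, not taken from the question:

```nginx
# Hypothetical sketch: cache successful CouchDB responses for a short TTL.
proxy_cache_path /var/cache/nginx/couchdb levels=1:2
                 keys_zone=couch_cache:10m max_size=100m inactive=10m;

server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key as in your existing SSL setup

    location / {
        proxy_pass http://127.0.0.1:5984;
        proxy_cache couch_cache;
        proxy_cache_valid 200 5m;                  # keep 200 responses for 5 minutes
        proxy_cache_use_stale error timeout updating;
        add_header X-Cache-Status $upstream_cache_status;  # HIT/MISS for debugging
    }
}
```

Note this caches purely by time, without consulting CouchDB's ETags, so it can serve responses up to 5 minutes stale.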

If that does not help, you should give Nginx compiled with the third-party Ngx_Lua module a try. This is also conveniently packaged, along with other useful modules and the required Lua environment, as OpenResty.

With Ngx_Lua, you can use a shared dictionary to cache your CouchDB responses. As the name suggests, a shared dictionary uses a shared memory zone in Ngx_Lua's execution environment. This is similar to the way proxy_cache works in Nginx (which also defines a shared memory zone in Nginx's execution environment), but comes with the added advantage that you can program it.

The steps required to build a CouchDB cache are pretty simple (with this approach you don't need to send ETags to the client):

  1. Make a request to CouchDB.
  2. Save the {Url-Etag:response} pair.
  3. The next time a request comes in for the same URL, query CouchDB for the ETag using a HEAD request.
  4. If the returned ETag matches the saved {Url-Etag:response} pair, send the cached response; otherwise query CouchDB again (with GET/POST), update the {Url-Etag:response} pair, and send the fresh response to the client.

Of course, if you program a cache by hand you will have to define a maximum cache size and a mechanism to remove old items from the cache. The lua_shared_dict directive lets you define the amount of memory in which the responses will be cached. When saving a value in the shared dictionary, you can specify the time for which the value will remain in the memory zone, after which it automatically expires. Combining the maximum cache size and the expiry time of the shared dictionary, you should be able to program a fairly complex caching mechanism for your users.
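The steps above can be sketched in OpenResty roughly as follows. This is a simplified illustration, not production code: the dictionary name (couch_cache), the internal /couch/ location, the upstream address, and the 10-minute expiry are all assumptions for the sketch:

```nginx
http {
    lua_shared_dict couch_cache 50m;   # maximum cache size

    server {
        # internal location that actually talks to CouchDB
        location /couch/ {
            internal;
            proxy_pass http://127.0.0.1:5984/;
        }

        location / {
            content_by_lua_block {
                local cache = ngx.shared.couch_cache
                local key   = ngx.var.uri

                -- step 3: cheap HEAD subrequest to get the current ETag
                local head = ngx.location.capture("/couch" .. key,
                                                  { method = ngx.HTTP_HEAD })
                local etag = head.header and head.header["ETag"]

                -- step 4: on a matching {Url-Etag:response} pair, serve from cache
                local cached = etag and cache:get(key .. ":" .. etag)
                if cached then
                    ngx.print(cached)
                    return
                end

                -- steps 1-2: miss, so fetch the full response and cache it
                local res = ngx.location.capture("/couch" .. key)
                if etag and res.status == 200 then
                    cache:set(key .. ":" .. etag, res.body, 600)  -- expire after 10 min
                end
                ngx.status = res.status
                ngx.print(res.body)
            }
        }
    }
}
```

A real implementation would also forward request headers and handle POST bodies; the sketch only shows the ETag round-trip.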

With erlang

Since CouchDB is written in Erlang, you already have an Erlang environment on your machine. So if you can program in it, you can create a very robust distributed cache with Mnesia. The steps are the same. Erlang timers can be combined with gen_* behaviours to give you automatic expiry of items, and Mnesia has functions to monitor its memory usage and notify you about it. The two approaches are almost equivalent, the only difference being that Mnesia can be distributed.

Update

As @abyz suggested, Redis is also a good choice when it comes to caching.

Counteraccusation answered 13/3, 2015 at 7:47
Thanks, this is a lot of info. I'll dig through it. Btw. I think my approach might be fundamentally wrong. :( More soon. – Hornbill
@AkshatJiwanSharma I suggest that you also add "Redis-based cache using Lua" to your answer. It adds another layer of managing Redis to the solution, but nearly all of the caching requirements (TTL, memory management, ...) are already implemented. – Prussiate
@AkshatJiwanSharma Thanks again, I think you are right and all these solutions work. They all require custom code, but it seems that I cannot avoid custom code anyway. I haven't yet implemented any of your suggestions, and the reason is that CouchDB might actually go to the disk and recalculate everything, even if the ETag is the same. Older CouchDB versions did this, and newer ones might still do the same. I'll have to test it for myself. – Hornbill
@GaborCsardi for CouchDB, first try the dev mailing list. CouchDB might go to disk, but recalculation is a very small operation. For views, the ETag generation depends on the query string parameters and the update seq of the database; only the update seq should require a disk seek, or maybe not even that if update seqs are kept in memory. For individual docs, ETags are just the _rev numbers, which do require a disk seek but no calculation, since _rev is the ETag. But to be sure, ask the devs on the mailing lists, or try their IRC channel. – Counteraccusation
@AkshatJiwanSharma Yeah, for me recalculation is actually costly, because a list function goes over a lot of documents. Thanks for the mailing list tip. – Hornbill
