How to avoid database query storms when using the cache-aside pattern
Q
We are using a PostgreSQL database and AppFabric Server, running a moderately busy ASP.NET MVC e-commerce site.

Following the cache-aside pattern we request data from our cache, and if it is not available, we query the database.
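
For concreteness, the lookup looks roughly like the sketch below (ICache, Product, and LoadProductFromDb are illustrative stand-ins, not our actual code):

    using System;

    public class ProductReader
    {
        private readonly ICache _cache;              // our distributed cache client
        public ProductReader(ICache cache) { _cache = cache; }

        public Product GetProduct(int id)
        {
            string key = "product:" + id;
            var product = _cache.Get(key) as Product;   // 1. try the cache first
            if (product == null)
            {
                // 2. cache miss: query the database and repopulate the cache.
                // Every request that misses before the Put below runs the same
                // query concurrently; that concurrency is the query storm.
                product = LoadProductFromDb(id);
                _cache.Put(key, product, TimeSpan.FromMinutes(5));
            }
            return product;
        }

        private Product LoadProductFromDb(int id)
        {
            return new Product();                    // placeholder for the real query
        }
    }

    public interface ICache
    {
        object Get(string key);
        void Put(string key, object value, TimeSpan ttl);
    }

    public class Product { }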

This approach results in 'query storms', where the database receives multiple queries for the same data in a short space of time while a given object in the cache is being refreshed. The problem is exacerbated by longer-running queries, and since multiple concurrent requests for the same data make the query run longer still, an unpleasant feedback loop forms.

One solution to this problem is to use read-locking on the cache. However, this can itself cause performance issues in a web-farm situation (or even on a single busy web server), since web servers end up blocked on reads for no reason, just in case a database query happens to be in flight.

Another solution is to drop the cache-aside pattern and seed the cache independently. This is the approach we have taken to mitigate the immediate problem, but it is not possible for all of our data.

Am I missing something here? And what other approaches have people taken to avoid this behaviour?

Quarterdeck answered 10/3, 2011 at 11:6
A

Depending on the number of servers you have and your current cache architecture, it may be worthwhile to evaluate adding a server-level (in-process) cache as well. In effect it acts as a fallback cache, and it is especially helpful where hitting the primary storage (the database) is resource-intensive or slow.

When I've used this, I've used the cache-aside pattern for the primary cache and a read-through design for the secondary, in which the secondary locks and ensures the database isn't over-saturated by repeats of the same request. With this architecture, a primary cache miss results in at most one query per entity per server (or process) reaching the database.

So the basic workflow (sketched in code after the list) is:

1) Try to retrieve from primary / shared cache pool

* If successful, return
* If unsuccessful, continue

2) Check in-process cache for value

* If successful, return (optionally seeding primary cache)
* If unsuccessful, continue

3) Get lock by cache key (and double-check in-process cache, in case it's been added by another thread)

4) Retrieve object from primary persistence (db)

5) Seed in-process cache and return
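
Putting those steps together, a C# sketch might look like the following; the shared-cache delegate, the one-minute local expiry, and the loader are assumptions for illustration, not a definitive implementation:

    using System;
    using System.Collections.Concurrent;
    using System.Runtime.Caching;

    public class TwoLevelCache
    {
        // One lock object per cache key, so unrelated keys never block each
        // other and at most one thread per process queries the db for a key.
        private static readonly ConcurrentDictionary<string, object> KeyLocks =
            new ConcurrentDictionary<string, object>();
        private static readonly MemoryCache LocalCache = MemoryCache.Default;

        private readonly Func<string, object> _getFromSharedCache;  // primary pool read
        private readonly Func<string, object> _loadFromDatabase;    // must return non-null

        public TwoLevelCache(Func<string, object> getFromSharedCache,
                             Func<string, object> loadFromDatabase)
        {
            _getFromSharedCache = getFromSharedCache;
            _loadFromDatabase = loadFromDatabase;
        }

        public object Get(string key)
        {
            var item = _getFromSharedCache(key);     // 1) primary / shared pool
            if (item != null) return item;

            item = LocalCache.Get(key);              // 2) in-process fallback
            if (item != null) return item;

            var keyLock = KeyLocks.GetOrAdd(key, _ => new object());
            lock (keyLock)                           // 3) lock by cache key
            {
                item = LocalCache.Get(key);          //    double-check under the lock
                if (item == null)
                {
                    item = _loadFromDatabase(key);   // 4) at most one query per process
                    LocalCache.Set(key, item,        // 5) seed the in-process cache
                        DateTimeOffset.Now.AddMinutes(1));
                }
            }
            return item;
        }
    }

With a short local expiry, the in-process copy only has to live long enough to absorb the storm; the shared pool remains the authority on freshness.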

I've done this using injectable wrappers: my cache layers all implement the relevant IRepository interface, and StructureMap injects the correct stack of caches. This keeps the individual cache behaviors flexible, focused, and easy to maintain, even though the overall stack is fairly complex.
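
To illustrate the shape of that layering (the type names here are invented for the example, not my actual classes), each cache layer is a decorator around the next IRepository down, and the container simply composes the stack:

    using System.Collections.Concurrent;

    public interface IRepository<T>
    {
        T Get(string key);
    }

    // Innermost layer: the real database-backed repository.
    public class DbRepository<T> : IRepository<T> where T : new()
    {
        public T Get(string key)
        {
            return new T();   // placeholder for the actual query
        }
    }

    // A cache layer wraps whatever repository sits beneath it.
    public class InProcessCacheRepository<T> : IRepository<T>
    {
        private readonly IRepository<T> _inner;
        private readonly ConcurrentDictionary<string, T> _cache =
            new ConcurrentDictionary<string, T>();

        public InProcessCacheRepository(IRepository<T> inner) { _inner = inner; }

        public T Get(string key)
        {
            // GetOrAdd keeps the example short; the real layer would use the
            // per-key locking shown above to guarantee a single db query.
            return _cache.GetOrAdd(key, k => _inner.Get(k));
        }
    }

    // Composition that a container such as StructureMap would perform; a
    // shared-cache decorator would wrap this stack the same way:
    // IRepository<Product> repo =
    //     new InProcessCacheRepository<Product>(new DbRepository<Product>());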

Accusal answered 21/8, 2013 at 16:24
A

We've used AppFabric successfully with the seeding strategy you mention above. In fact, we use both solutions:

  1. Seed known data where possible (we have a limited set, so this is actually easy for us to figure out)
  2. Within each cache-access method, do a look-aside as necessary, and populate the cache on retrieval from the data store.

The look-aside is necessary because items may be evicted due to memory pressure, or simply because they were missed in the seeding operation. We have a "warming" service that pulses on an interval (an hour) and keeps the cache populated with the necessary data. We keep statistics on cache misses, and use them to tweak our warming strategy if we see frequent misses during the warming interval.
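
A minimal sketch of such a warming service follows; the key list and the LoadAndCache call are placeholders for whatever "known data" means in your system:

    using System;
    using System.Threading;

    public class CacheWarmer : IDisposable
    {
        private readonly Timer _timer;

        public CacheWarmer()
        {
            // Fire immediately, then re-warm once an hour.
            _timer = new Timer(_ => WarmCache(), null,
                TimeSpan.Zero, TimeSpan.FromHours(1));
        }

        private void WarmCache()
        {
            foreach (var key in KeysToWarm())
                LoadAndCache(key);               // query the db, seed the cache
        }

        private static string[] KeysToWarm()
        {
            // Placeholder: return the known, seedable key set.
            return new[] { "product:1", "product:2" };
        }

        private static void LoadAndCache(string key)
        {
            // Placeholder: run the query and put the result into the
            // distributed cache here.
        }

        public void Dispose()
        {
            _timer.Dispose();
        }
    }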

Monetmoneta answered 6/4, 2011 at 2:5
