Caffeine: How to come up with an appropriate cache size

Asked 15/9, 2016 at 4:26 Answered 15/9, 2016 at 5:3

Solved java caching garbage-collection jvm caffeine

I have a computationally intensive, one-off offline processing task that takes a few hours to run, and I am using Caffeine as my in-memory cache. What is a good heuristic for setting the maximum cache size? I am running my Java program with 8 GB of RAM and am willing to give about 4 GB of it to the cache, but I am unsure how that memory translates into an actual size for my cache in terms of entries. I decided to go with .softValues() to let the JVM decide, but then I ran into the following words in Caffeine's Javadoc:

Warning: in most circumstances it is better to set a per-cache maximum size instead of using soft references. You should only use this method if you are well familiar with the practical consequences of soft references.

Kaddish asked 15/9, 2016 at 4:26

Soft references are conceptually attractive, but they typically hurt performance in long-running JVMs. This is because they create heap pressure by filling up the old generation and are only collected during a full GC. This can result in GC thrashing, where each time enough memory is freed it is quickly consumed again and another full GC is required. Latency-sensitive applications suffer further because eviction is global: there is no way to hint which caches are the most critical.

Soft references shouldn't be a default, go-to strategy. They might be a reasonable simplification in a throughput-oriented, non-user-facing task, but when GC time, latency, and predictable performance are important they can be dangerous.

Unfortunately, the best answer for sizing is to guess, measure, and repeat. Export the statistics, try a setting, and adjust accordingly. The hit rate curve can be obtained by capturing an access trace (a log of key hashes) and replaying it against different cache sizes. It's interesting data, but usually a few simple tuning runs are good enough.
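A minimal sketch of the trace-replay idea, assuming the trace is a list of key hashes (`Long`). The class name `TraceSimulator` and the sample trace are hypothetical. It replays the trace against a plain LRU cache built on the JDK's `LinkedHashMap`; this is a stand-in, not Caffeine's own policy (Caffeine uses Window TinyLFU), so real hit rates will differ, but the shape of the hit-rate-versus-size curve is the point.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Replays an access trace of key hashes against an LRU cache of a given
 * capacity and reports the hit rate. LRU is only a stand-in for the real
 * cache's eviction policy (hypothetical helper, not a Caffeine API).
 */
public class TraceSimulator {

    /** Bounded LRU map using LinkedHashMap's access-order mode. */
    static final class LruCache<K> extends LinkedHashMap<K, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is LRU first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
            return size() > capacity; // evict the least recently used entry when over capacity
        }
    }

    /** Fraction of accesses in the trace that hit the cache. */
    public static double hitRate(List<Long> trace, int capacity) {
        LruCache<Long> cache = new LruCache<>(capacity);
        long hits = 0;
        for (Long key : trace) {
            if (cache.get(key) != null) { // get() also marks the key as recently used
                hits++;
            } else {
                cache.put(key, Boolean.TRUE); // miss: load the key into the cache
            }
        }
        return trace.isEmpty() ? 0.0 : (double) hits / trace.size();
    }

    public static void main(String[] args) {
        // Hypothetical trace; in practice read it from the logged key hashes.
        List<Long> trace = List.of(1L, 2L, 3L, 1L, 2L, 4L, 1L, 2L, 3L, 4L);
        for (int size = 1; size <= 4; size++) {
            System.out.printf("size=%d hitRate=%.2f%n", size, hitRate(trace, size));
        }
    }
}
```

For measuring a real run rather than a simulation, Caffeine's builder has `recordStats()`, after which `cache.stats().hitRate()` reports the observed hit rate.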

Ambages answered 15/9, 2016 at 5:3

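Soft references allow the VM to reclaim an object when it runs low on memory. This is a somewhat different strategy from a cache. You could simply use a WeakHashMap (note, however, that SoftReference and WeakReference differ: a weakly reachable object can be reclaimed at the next GC, whereas a softly reachable one is normally kept until memory is needed; also, WeakHashMap holds its keys weakly, not its values).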
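Because WeakHashMap weakens keys rather than values, a soft-value map has to be hand-rolled on top of `java.lang.ref`. The sketch below is one way to do that, assuming a single-threaded caller; the class name `SoftValueMap` is hypothetical. Values are wrapped in `SoftReference`s registered with a `ReferenceQueue`, and entries whose values the GC has cleared are purged on each access. Note that there is no eviction policy at all: what gets dropped is entirely up to the GC.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Map whose values are held through SoftReferences, so the GC may reclaim
 * them under memory pressure. Not thread-safe; a sketch only.
 */
public class SoftValueMap<K, V> {

    /** Soft reference that remembers its key so the stale entry can be removed. */
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;

        SoftValue(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    public void put(K key, V value) {
        purgeCleared();
        map.put(key, new SoftValue<>(key, value, queue));
    }

    /** Returns the value, or null if absent or already reclaimed by the GC. */
    public V get(K key) {
        purgeCleared();
        SoftValue<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    public int size() {
        purgeCleared();
        return map.size();
    }

    /** Drops entries whose values were cleared by the GC and enqueued. */
    @SuppressWarnings("unchecked")
    private void purgeCleared() {
        SoftValue<K, V> ref;
        while ((ref = (SoftValue<K, V>) queue.poll()) != null) {
            // Remove only if the map still maps the key to this exact reference.
            map.remove(ref.key, ref);
        }
    }
}
```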

One big difference is that a cache typically lets you choose an eviction strategy (LRU, FIFO, etc.), while soft/weak references don't.
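To make "choosing an eviction strategy" concrete, here is a sketch using only the JDK: `LinkedHashMap` evicts in LRU order when constructed with `accessOrder = true` and in FIFO (insertion) order when `false`, via its `removeEldestEntry` hook. The class name `BoundedMap` is hypothetical, and this is not how Caffeine works internally (it uses Window TinyLFU); it only illustrates that the policy is an explicit choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Bounded map: LRU eviction if accessOrder is true, FIFO if false. */
public class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedMap(int maxEntries, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the eldest entry,
        // i.e. the least recently used (LRU) or first inserted (FIFO).
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedMap<String, Integer> lru = new BoundedMap<>(2, true);
        BoundedMap<String, Integer> fifo = new BoundedMap<>(2, false);
        for (BoundedMap<String, Integer> m : List.of(lru, fifo)) {
            m.put("a", 1);
            m.put("b", 2);
            m.get("a");    // touching "a" only changes the order in LRU mode
            m.put("c", 3); // over capacity: one entry is evicted
        }
        System.out.println("LRU keeps " + lru.keySet());   // LRU keeps [a, c]
        System.out.println("FIFO keeps " + fifo.keySet()); // FIFO keeps [b, c]
    }
}
```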

You should be able to guess the size of an object at least to an order of magnitude. Is it 1 KB, 1 MB, 10 MB?
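Once you have an order-of-magnitude guess, the arithmetic for an entry-count limit is just budget divided by per-entry size. The sketch below assumes the question's 4 GB budget and ignores the cache's own per-entry overhead, so the result is an upper bound; the class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope sizing: entries that fit in a memory budget given an
 * estimated per-entry size. Ignores cache bookkeeping overhead (upper bound).
 */
public class CacheSizeEstimate {

    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    /** Maximum number of entries of avgEntryBytes that fit in budgetBytes. */
    public static long maxEntries(long budgetBytes, long avgEntryBytes) {
        if (avgEntryBytes <= 0) {
            throw new IllegalArgumentException("avgEntryBytes must be positive");
        }
        return budgetBytes / avgEntryBytes; // integer division rounds down
    }

    public static void main(String[] args) {
        long budget = 4 * GIB; // the ~4 GB the question is willing to give the cache
        System.out.println("1 KiB entries:  " + maxEntries(budget, KIB));      // 4194304
        System.out.println("1 MiB entries:  " + maxEntries(budget, MIB));      // 4096
        System.out.println("10 MiB entries: " + maxEntries(budget, 10 * MIB)); // 409
    }
}
```

If entry sizes vary widely, Caffeine can bound by an estimated byte total instead of a count, using `maximumWeight(...)` together with a `weigher(...)` on its builder.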

If you really have no idea how big your objects are, most caches let you attach a listener to evictions and log them. Combined with logging cache misses on lookup, that should give you a good idea of how the cache is performing.
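A minimal, JDK-only sketch of that instrumentation, assuming a single-threaded caller: an LRU cache (again `LinkedHashMap`, standing in for a real cache) that reports each eviction to a callback and counts hits and misses. `InstrumentedCache` and its methods are hypothetical names, not a library API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** LRU cache that reports evictions to a listener and counts hits/misses. Not thread-safe. */
public class InstrumentedCache<K, V> {
    private final Map<K, V> map;
    private long hits;
    private long misses;

    public InstrumentedCache(int maxEntries, BiConsumer<K, V> evictionListener) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > maxEntries;
                if (evict) {
                    // Notify before the eldest (least recently used) entry is dropped.
                    evictionListener.accept(eldest.getKey(), eldest.getValue());
                }
                return evict;
            }
        };
    }

    /** Returns the cached value, computing and caching it on a miss. */
    public V get(K key, Function<? super K, ? extends V> loader) {
        V value = map.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++; // a real program might log the key here
        value = loader.apply(key);
        map.put(key, value);
        return value;
    }

    public long hits() { return hits; }
    public long misses() { return misses; }

    public static void main(String[] args) {
        InstrumentedCache<Integer, String> cache =
                new InstrumentedCache<>(2, (k, v) -> System.out.println("evicted " + k));
        for (int key : new int[] {1, 2, 1, 3, 2}) {
            cache.get(key, k -> "value-" + k);
        }
        // Prints "evicted 2", "evicted 1", then "hits=1 misses=4".
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses());
    }
}
```

With Caffeine itself the equivalent hook is `removalListener(...)` on the builder, which also reports why an entry was removed.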

Protrude answered 15/9, 2016 at 4:51
