Stop Reentrancy on MemoryCache Calls
Asked Answered
S

2

7

The app needs to load data and cache it for a period of time. I would expect that if multiple parts of the app want to access the same cache key at the same time, the cache should be smart enough to only load the data once and return the result of that call to all callers. However, MemoryCache is not doing this. If you hit the cache in parallel (which often happens in the app) it creates a task for each attempt to get the cache value. I thought that this code would achieve the desired result, but it doesn't. I would expect the cache to only run one GetDataAsync task, wait for it to complete, and use the result to get the values for other calls.

using Microsoft.Extensions.Caching.Memory;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    class Program
    {
        private const string Key = "1";
        private static int number = 0;

        static async Task Main(string[] args)
        {
            var memoryCache = new MemoryCache(new MemoryCacheOptions { });

            var tasks = new List<Task>();
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));

            await Task.WhenAll(tasks);

            Console.WriteLine($"The cached value was: {memoryCache.Get(Key)}");
        }

        public static async Task<int> GetDataAsync()
        {
            //Simulate getting a large chunk of data from the database
            await Task.Delay(3000);
            number++;
            Console.WriteLine(number);
            return number;
        }
    }
}

That's not what happens. The above displays these results (not necessarily in this order):

2

1

3

The cached value was: 3

It creates a task for each cache request and discards the values returned from the other two.

This needlessly spends time and it makes me wonder if you can say this class is even thread-safe. ConcurrentDictionary has the same behaviour. I tested it and the same thing happens.

Is there a way to achieve the desired behaviour where the task doesn't run 3 times?

Schmaltzy answered 9/1, 2021 at 8:15 Comment(7)
You may find this interesting: Async threadsafe Get from MemoryCacheBryannabryansk
And this as wellApricot
Use Lazy (or preferably LazyWithNoExceptionCaching - https://mcmap.net/q/67008/-system-lazy-lt-t-gt-with-different-thread-safety-mode-duplicate) entries in the cache.Telefilm
dotnetfiddle.net/4Z0pwoTelefilm
@mjwills: Using Lazy<Task> is tricky. If the task fails the lazy value is still created as far as Lazy<T> is concerned. The exception is only thrown when the task is awaited and Lazy<T> will not await the task. However, when your code awaits the lazy task the exception is thrown and this will keep on happening. Lazy<T> will not try to create the value (the task) again even though the task is failed.Redraft
Good point @MartinLiversage.Telefilm
@Telefilm you may want to take a look at Stephen Cleary's AsyncLazy<T> implementation, and how they use the RetryOnFailure flag.Bryannabryansk
R
5

MemoryCache leaves it to you to decide how to handle races to populate a cache key. In your case you don't want multiple threads to compete to populate a key presumably because it's expensive to do that.

To coordinate the work of multiple threads like that you need a lock, but using a C# lock statement in asynchronous code can lead to thread pool starvation. Fortunately, SemaphoreSlim provides a way to do async locking so it becomes a matter of creating a guarded memory cache that wraps an underlying IMemoryCache.

My first solution only had a single semaphore for the entire cache putting all cache population tasks in a single line which isn't very smart so instead here is more elaborate solution with a semaphore for each cache key. Another solution could be to have a fixed number of semaphores picked by a hash of the key.

sealed class GuardedMemoryCache : IDisposable
{
    readonly IMemoryCache cache;
    readonly ConcurrentDictionary<object, SemaphoreSlim> semaphores = new();

    public GuardedMemoryCache(IMemoryCache cache) => this.cache = cache;

    public async Task<TItem> GetOrCreateAsync<TItem>(object key, Func<ICacheEntry, Task<TItem>> factory)
    {
        var semaphore = GetSemaphore(key);
        await semaphore.WaitAsync();
        try
        {
            return await cache.GetOrCreateAsync(key, factory);
        }
        finally
        {
            semaphore.Release();
            RemoveSemaphore(key);
        }
    }

    public object Get(object key) => cache.Get(key);

    public void Dispose()
    {
        foreach (var semaphore in semaphores.Values)
            semaphore.Release();
    }

    SemaphoreSlim GetSemaphore(object key) => semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1));

    void RemoveSemaphore(object key)
    {
        if (semaphores.TryRemove(key, out var semaphore))
            semaphore.Dispose();
    }
}

If multiple threads try to populate the same cache key only a single thread will actually do it. The other threads will instead return the value that was created.

Assuming that you use dependency injection, you can let GuardedMemoryCache implement IMemoryCache by adding a few more methods that forward to the underlying cache to modify the caching behavior throughout your application with very few code changes.

Redraft answered 9/1, 2021 at 14:4 Comment(10)
Using a single SemaphoreSlim for the whole memory cache means that while creating asynchronously a single cache key, all other keys will be blocked. Which is a pretty serious flaw, hence my downvote. You may be able to fix it by using one SemaphoreSlim per key, but I think that the AsyncLazy solutions are easier to implement.Bryannabryansk
Martin your updated solution has a race condition: Thread A acquires and releases the semaphore of a specific key. Thread B immediately acquires the semaphore. Thread A removes the semaphore from the dictionary (RemoveSemaphore(key)). Thread C creates a new semaphore for the specific key. Result: B and C threads are concurrently invoking the factory for the specific key. Doing it correctly is quite tricky. Take a look at this question if you want.Bryannabryansk
At first glance, I thought this was wrong, but I can see how this works. Yes, I figured that keeping a dictionary of semaphores might be an approach that works. It just seems crazy that we have to go do this extent. Why isn't this just something builtin to the memory cache class?Schmaltzy
This is looking nice. I will try this out soon.Schmaltzy
Arguably, the whole point of MemoryCache is to deal with race conditions...Schmaltzy
@ChristianFindlay agreed, MemoryCache is garbageOutcross
Just that, this is not ironclad. ConcurrentDictionary.GetOrAdd(TKey, Func<TKey,TValue>) is not completely thread-safe (the authors didn't want to lock around the delegate). @Martin Liversage, could you please update your answer accordingly. See learn.microsoft.com/en-us/dotnet/api/…. I'd say use lazyCache, fusionCache, or handroll your own critical section impl (see github.com/rmandvikar/threading#cachewithtaskandlockpool-impl).Spoilage
@Spoilage In the case where two calls to GetOrAdd executes concurrently an extra unused SemaphoreSlim instance can be created. This instance will eventually be finalized. There will never be two semaphores for the same key in the dictionary. valueFactory may be called multiple times, but only one key/value pair will be added to the dictionary.Redraft
@MartinLiversage i understand that but the ask is valueFactory must be called only once, and this impl doesn't guarantee that. btw, your impl is also mentioned in this github issue (github.com/dotnet/runtime/issues/36499).Spoilage
I think this implementation has a problem, as it can dispose a SemaphoreSlim for a given key before subsequent waiters have called .Release on the same reference, hence causing an ObjectDisposedException when they do. But still found this useful, thanks.Indemnification
S
4

There are different solutions available, the most famous of which is probably LazyCache: it's a great library.

Another one that you may find useful is FusionCache ⚡🦥, which I recently released: it has the exact same feature (although implemented differently) and much more.

The feature you are looking for is described here and you can use it like this:

var result = await fusionCache.GetOrSetAsync(
  Key,
  _ => await GetDataAsync(),
  TimeSpan.FromMinutes(2)
);

You may also find some of the other features interesting, like fail-safe, advanced timeouts with background factory completion and support for an optional, distributed 2nd level.

If you will give it a chance please let me know what you think.

/shameless-plug

Schonfield answered 14/1, 2021 at 13:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.