AppFabric Cache concurrency issue?

While stress testing a prototype of our brand new primary system, I ran into a concurrency issue with AppFabric Cache. When many threads concurrently call DataCache.Get() and Put() with the same cache key, and the stored object is relatively large, I receive "ErrorCode:SubStatus:There is a temporary failure. Please retry later." It is reproducible with the following code:

        var dcfc = new DataCacheFactoryConfiguration
        {
            Servers = new[] {new DataCacheServerEndpoint("localhost", 22233)},
            SecurityProperties = new DataCacheSecurity(DataCacheSecurityMode.None, DataCacheProtectionLevel.None),
        };

        var dcf = new DataCacheFactory(dcfc);
        var dc = dcf.GetDefaultCache();

        const string key = "a";
        var value = new int [256 * 1024]; // 1MB

        for (int i = 0; i < 300; i++)
        {
            var putT = new Thread(() => dc.Put(key, value));
            putT.Start();               

            var getT = new Thread(() => dc.Get(key));
            getT.Start();
        }

When Get() is called with a different key, or when access to the DataCache is synchronized, the issue does not appear. Obtaining a new DataCache from the DataCacheFactory on each call (DataCache is supposed to be thread-safe) or increasing the timeouts has no effect; the error is still received. It seems very strange to me that MS would leave such a bug. Has anybody faced a similar issue?

Deprivation answered 28/7, 2011 at 13:12 Comment(1)
Retry later is a very generic error. Try looking at the inner exception or the substatus of the exception; that can give you a hint of what's going on. The exception may still need to be handled, but this will at least make it rational.Orola

I see the same behavior, and my understanding is that this is by design. The cache supports two concurrency models:

  • Optimistic Concurrency Model methods: Get, Put, ...
  • Pessimistic Concurrency Model methods: GetAndLock, PutAndUnlock, Unlock

If you use optimistic concurrency model methods such as Get, you have to be ready to receive DataCacheErrorCode.RetryLater and handle it appropriately. I also use a retry approach.
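The retry approach mentioned above can be sketched as a small generic helper. This is only an illustration, not code from AppFabric: the names TransientRetry, Execute, isTransient, maxAttempts and baseDelayMs are made up for this example, and the helper deliberately does not reference the caching assembly so that it compiles on its own. With AppFabric, the isTransient predicate would presumably check for a DataCacheException whose ErrorCode is DataCacheErrorCode.RetryLater or DataCacheErrorCode.Timeout. It assumes C# 6 or later (exception filters, nameof).

```csharp
using System;
using System.Threading;

// Hypothetical helper: retries an operation while it fails with a transient error.
public static class TransientRetry
{
    // Runs operation up to maxAttempts times in total. A failure is retried only
    // when isTransient(ex) returns true; any other exception, or the failure of
    // the last attempt, propagates to the caller.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts = 10, int baseDelayMs = 100)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Linear back-off: wait a little longer after each failed attempt
                // so that contending callers are less likely to collide again.
                Thread.Sleep(baseDelayMs * attempt);
            }
        }
    }
}
```

A call could then look like TransientRetry.Execute(() => dc.Get(key), ex => /* transient check */ ..., 10, 100), where dc and key are the variables from the question. Letting the final failure propagate keeps the decision about what to do next (log, fall back, fail) with the caller.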

You might find more information at MSDN: Concurrency Models

Koheleth answered 1/9, 2011 at 5:27 Comment(0)

We have seen this problem in our code as well. We solve it by wrapping the Get method so that it catches exceptions and retries the call N times before falling back to a direct request to SQL.

Here is the code that we use to get data from the cache:

    // Requires: using System.Diagnostics; using System.Threading; using Microsoft.ApplicationServer.Caching;
    private static bool TryGetFromCache(string cacheKey, string region, out GetMappingValuesToCacheResult cacheResult, int counter = 0)
    {
        cacheResult = new GetMappingValuesToCacheResult();

        try
        {
            if (_cache == null) return false;

            // Use "as" instead of a cast: it yields null rather than throwing when the type does not match.
            cacheResult = _cache.Get(cacheKey, region) as GetMappingValuesToCacheResult;

            return cacheResult != null;
        }
        catch (DataCacheException dataCacheException)
        {
            switch (dataCacheException.ErrorCode)
            {
                case DataCacheErrorCode.KeyDoesNotExist:
                case DataCacheErrorCode.RegionDoesNotExist:
                    return false;
                case DataCacheErrorCode.Timeout:
                case DataCacheErrorCode.RetryLater:
                    if (counter > 9) return false; // give up after 10 retries

                    // Transient failure: wait briefly, then try again.
                    counter++;
                    Thread.Sleep(100);
                    return TryGetFromCache(cacheKey, region, out cacheResult, counter);
                default:
                    EventLog.WriteEntry(EventViewerSource, "TryGetFromCache: DataCacheException caught:\n" +
                            dataCacheException.Message, EventLogEntryType.Error);

                    return false;
            }
        }
    }

Then when we need to get something from the cache we do:

    TryGetFromCache(key, region, out cachedMapping)

This allows us to use Try methods that encapsulate the exceptions. If it returns false, we know something is wrong with the cache and we can access SQL directly.
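The fallback step described above (use the cached value when the Try method succeeds, otherwise go to the source directly) might look roughly like the following. This is a sketch under assumptions: the delegate TryGet<T>, the class CacheAside and the method GetOrLoad are hypothetical names introduced for illustration, and loadFromSource stands in for whatever SQL access code the application already has.

```csharp
using System;

// Hypothetical delegate matching the shape of a Try-style cache lookup.
public delegate bool TryGet<T>(string key, out T value);

public static class CacheAside
{
    // Returns the cached value when the lookup succeeds; otherwise loads the
    // value from the underlying source (for example, a SQL query).
    public static T GetOrLoad<T>(string key, TryGet<T> tryGetFromCache, Func<string, T> loadFromSource)
    {
        if (tryGetFromCache == null) throw new ArgumentNullException(nameof(tryGetFromCache));
        if (loadFromSource == null) throw new ArgumentNullException(nameof(loadFromSource));

        T value;
        if (tryGetFromCache(key, out value))
        {
            return value;
        }

        // Cache miss or cache failure: the Try method has already absorbed
        // the exception, so simply fall back to the source.
        return loadFromSource(key);
    }
}
```

Keeping the fallback in one place means callers never see cache exceptions; whether the loaded value should also be written back to the cache is left open here.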

Recessive answered 29/7, 2011 at 8:12 Comment(3)
Thanks for your reply, I am happy that I am not alone :-) But I cannot reconcile myself to such a workaround, and I cannot imagine it in a mission-critical production application. I will try to report it to Microsoft, or maybe use Memcached instead.Deprivation
I get you. We use this setup in one of our most important web services, with over a million hits per day. One server may be processing over 4000 transactions per minute. This setup will make sure that the cache has time to respond. (as well as deal with exceptions as locally as possible.) I love Try methods :)Shamble
Please read appfabriccat.com/2011/07/…Shamble
