Using a hashtable inside a Parallel.ForEach?
I have a Parallel.ForEach loop running an intensive operation inside the body.

The operation computes a value that can be stored in a Hashtable and reused by later loop iterations. I add to the Hashtable after the intensive operation completes, so the next loop item can look the object up and reuse it instead of running the intensive operation again.

However, because I am using Parallel.ForEach there is a thread-safety issue: the Hashtable.Add and ContainsKey(key) calls can go out of sync, as they might be running in parallel. Introducing locks may cause perf issues.

Here's the sample code:

Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    object myObj;
    // If it exists in myTable use it, else add it to the hashtable
    if (myTable.ContainsKey(item.Key))
    {
        myObj = myTable[item.Key];
    }
    else
    {
        myObj = SomeIntensiveOperation();
        myTable.Add(item.Key, myObj); // Issue is here: breaks with an exception at runtime
    }
    // Do something with myObj
    // some code here
});

There must be some API or property setting inside the TPL that can handle this scenario. Is there?

Bullis answered 1/11, 2009 at 18:20 Comment(0)
You're looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform excellently in parallel algorithms.

Edit: The result might look like this:

ConcurrentDictionary<T,K> cache = ...;
Parallel.ForEach(items, (item, loopState) =>
{
    K value;
    if (!cache.TryGetValue(item.Key, out value))
    {
        value = SomeIntensiveOperation();
        cache.TryAdd(item.Key, value);
    }

    // Do something with value
} );

Word of warning: if the elements in items do not all have a unique item.Key, then SomeIntensiveOperation could get called twice for that key. In the example the key isn't passed to SomeIntensiveOperation, but it means that the "Do something with value" code could execute with key/valueA and key/valueB pairs, and only one result would get stored in the cache (not necessarily the first one computed by SomeIntensiveOperation, either). You'd need a parallel lazy factory to handle this if it's a problem. Also, for obvious reasons, SomeIntensiveOperation should be thread safe.
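One common way to build such a lazy factory (a sketch, not part of the original answer; the string key type and MyObject value type are assumptions for illustration) is to store Lazy<T> wrappers in the dictionary, so the expensive call runs at most once per key:

ConcurrentDictionary<string, Lazy<MyObject>> cache =
    new ConcurrentDictionary<string, Lazy<MyObject>>();

Parallel.ForEach(items, item =>
{
    // GetOrAdd publishes exactly one Lazy wrapper per key; a losing thread's
    // wrapper is discarded before its factory ever runs. Reading .Value on the
    // stored wrapper then executes SomeIntensiveOperation at most once per key.
    MyObject value = cache.GetOrAdd(
        item.Key,
        key => new Lazy<MyObject>(() => SomeIntensiveOperation(),
                                  LazyThreadSafetyMode.ExecutionAndPublication)).Value;

    // Do something with value
});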

Janenejanenna answered 1/11, 2009 at 18:41 Comment(3)
@AdamRalph: since he is using the TPL, he is already using .NET 4.0 – Bronchitis
@Adam & Yassir: correct, the new collections were designed with Parallel LINQ in mind. – Janenejanenna
Yup, thanks for the answers and comments – Bullis
Check the System.Collections.Concurrent namespace; I think you need ConcurrentDictionary<TKey, TValue>.
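For instance (a hypothetical sketch; the string/MyObject types and the item/SomeIntensiveOperation names are carried over from the question):

// GetOrAdd collapses the ContainsKey/Add pair from the question into a
// single thread-safe call. Note the value factory may still run more than
// once if two threads race on the same key; only one result is kept.
var cache = new ConcurrentDictionary<string, MyObject>();
MyObject myObj = cache.GetOrAdd(item.Key, key => SomeIntensiveOperation());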

Bronchitis answered 1/11, 2009 at 18:42 Comment(0)
Use a ReaderWriterLockSlim; it has good performance for work that has many reads and only a few short-duration writes. Your problem seems to fit this specification.

All read operations will run quickly and without blocking one another; the only time anyone is blocked is while a write is happening, and that write lasts only as long as it takes to put something into a Hashtable.

ReaderWriterLockSlim on MSDN

I guess I'll throw down some code...

ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    cacheLock.EnterReadLock();
    MyObject myObj = (MyObject)myTable.TryGet(item.Key);
    cacheLock.ExitReadLock();

    // If the object isn't cached, calculate it and cache it
    if (myObj == null)
    {
        myObj = SomeIntensiveOperation();
        cacheLock.EnterWriteLock();
        try
        {
            // Indexer assignment rather than Add: if two threads raced past
            // the read above, the second write overwrites instead of throwing.
            myTable[item.Key] = myObj;
        }
        finally
        {
            cacheLock.ExitWriteLock();
        }
    }
    // Do something with myObj
    // some code here
});

// (Extension methods must be declared in a static class.)
static object TryGet(this Hashtable table, object key)
{
    if (table.Contains(key))
        return table[key];
    else
        return null;
}
Recollected answered 1/11, 2009 at 18:42 Comment(4)
"The .NET Framework has two reader-writer locks, ReaderWriterLockSlim and ReaderWriterLock. ReaderWriterLockSlim is recommended for all new development. ReaderWriterLockSlim is similar to ReaderWriterLock, but it has simplified rules for recursion and for upgrading and downgrading lock state. ReaderWriterLockSlim avoids many cases of potential deadlock. In addition, the performance of ReaderWriterLockSlim is significantly better than ReaderWriterLock."Janenejanenna
That advice seems sound, so I've updated my answer. For those interested take a look at this MSDN magazine article: msdn2.microsoft.com/en-us/magazine/cc163599.aspxRecollected
Why doesn't this have the same issue that Hashtable.Synchronized() causes, with a two-thread race condition where both get a null return from TryGet and then both calculate myObj and attempt to add it? – Irritate
This is only meant to make accesses to the hashtable atomic; it does no key-collision detection, but the worst case should be a less efficient last-write-wins (> 1 threads do the same work). It's also possible that the lock used by Synchronized() is static for back-compat reasons, while ReaderWriterLockSlim has apparently been updated to avoid many old deadlock cases through a different locking model. – Recollected
I see no other correct choice than to use (more or less explicit) locks (a synchronized Hashtable just overrides all methods with locks).

Another option could be to allow the dictionary to go out of sync. The race condition will not corrupt the dictionary; it will just cause the code to do some superfluous computations. Profile the code to check whether the lock or the missing memoization has the worse effect.
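A minimal sketch of that trade-off (hypothetical; it reuses the question's Hashtable, item, and SomeIntensiveOperation names): a plain lock guards only the dictionary accesses, while the intensive work runs outside the lock and may occasionally be duplicated:

Hashtable myTable = new Hashtable();
object tableLock = new object();

Parallel.ForEach(items, item =>
{
    object myObj;
    lock (tableLock)
    {
        myObj = myTable[item.Key]; // returns null when the key is absent
    }
    if (myObj == null)
    {
        // Benign race: two threads may both reach this point for the same
        // key and compute the value twice; correctness is unaffected.
        myObj = SomeIntensiveOperation();
        lock (tableLock)
        {
            myTable[item.Key] = myObj; // indexer overwrites, so no duplicate-key exception
        }
    }
    // Do something with myObj
});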

Meetly answered 1/11, 2009 at 18:33 Comment(0)