Haskell: Concurrent data structure guidelines

I've been trying to get an understanding of concurrency, and I've been trying to work out what's better: one big IORef lock or many TVars. I've come to the following guidelines; comments would be appreciated on whether these are roughly right or whether I've missed the point.


Let's assume our concurrent data structure is a map m, accessed like m[i]. Let's also say we have two functions, f_easy and f_hard. f_easy is quick, f_hard takes a long time. We'll assume the arguments to f_easy/f_hard are elements of m.

(1) If your transactions look roughly like this m[f_easy(...)] = f_hard(...), use an IORef with atomicModifyIORef. Laziness ensures that m is only locked for a short time, as it is updated with a thunk. Calculating the index effectively locks the structure (since something is going to get updated, but we don't know what yet), but once the element is known, the thunk over the entire structure collapses to a thunk over only that particular element, and then only that element is "locked".
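
A minimal sketch of pattern (1), assuming the map lives in a Data.Map behind a single IORef (fEasy and fHard are placeholders for the question's f_easy and f_hard):

    import Data.IORef (IORef, atomicModifyIORef)
    import qualified Data.Map as Map

    -- Pattern (1): cheap index, expensive value. atomicModifyIORef swaps
    -- in the new map without evaluating it, so fHard x is stored as an
    -- unevaluated thunk and the reference is only "locked" for the
    -- duration of the pointer swap.
    updateEasyIndex :: (a -> Int) -> (a -> v) -> a
                    -> IORef (Map.Map Int v) -> IO ()
    updateEasyIndex fEasy fHard x ref =
      atomicModifyIORef ref $ \m ->
        (Map.insert (fEasy x) (fHard x) m, ())

Whoever later forces the map pays for fEasy and the rebuilt spine, while fHard x remains a thunk sitting in that one slot, which is the per-element "lock" the guideline describes.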

(2) If your transactions look roughly like this m[f_hard(...)] = f_easy(...), and they don't conflict too much, use lots of TVars. Using an IORef in this case will effectively make the app single-threaded, as you can't calculate two indexes at the same time (there will be an unresolved thunk over the entire structure). TVars let you work out two indexes at the same time; the downside is that if two concurrent transactions both access the same element, and one of them is a write, one transaction must be scrapped, which wastes time that could have been used elsewhere. If this happens a lot, you may be better off with the locks that come (via blackholing) from IORef, but if it doesn't happen much, you'll get better parallelism with TVars.
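
A sketch of pattern (2) under the same assumptions, with one TVar per element (a fixed key set here, for simplicity) so that transactions on different indices commit independently:

    import Control.Concurrent.STM (TVar, atomically, writeTVar)
    import qualified Data.Map as Map

    -- Pattern (2): expensive index, cheap value. The index is computed
    -- before the transaction starts, so other threads are never blocked
    -- on it; the transaction writes a single element's TVar and thus
    -- conflicts only with transactions touching that same element.
    updateHardIndex :: (a -> Int) -> (a -> v) -> a
                    -> Map.Map Int (TVar v) -> IO ()
    updateHardIndex fHard fEasy x m =
      case Map.lookup (fHard x) m of   -- forces the long computation
        Nothing  -> pure ()            -- index outside the key set
        Just var -> atomically (writeTVar var (fEasy x))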

Basically, in case (2), with IORef you may get 100% efficiency (no wasted work) but only use about 1.1 threads, whereas with TVar and a low number of conflicts you might get 80% efficiency but use 10 threads: 0.8 × 10 = 8 effective threads versus 1.0 × 1.1 = 1.1, so you still end up roughly 7 times faster even with the wasted work.

Essayistic answered 19/4, 2012 at 1:56 Comment(2)
Is there a question here? And I'm surprised MVar is entirely absent. – Tyrothricin
One other way to get "safe" variables when you have multiple threads is to use channels (haskell.org/ghc/docs/latest/html/libraries/base/…) and have one thread that supervises all the values; the other threads have to contact that one to get access to a value. – Scissile

Your guidelines are somewhat similar to the findings of [1] (Section 6), where the performance of the Haskell STM is analyzed:

"In particular, for programs that do not perform much work inside transactions, the commit overhead appears to be very high. To further observe this overhead, an analysis needs to be conducted on the performance of commit-time course-grain and fine-grain STM locking mechanisms."

I use atomicModifyIORef or an MVar when simple locking provides all the synchronization I need. When looking at concurrent accesses to a data structure, it also depends on how that data structure is implemented. For example, if you store your data inside an IORef Data.Map and frequently perform read/write access, then I think atomicModifyIORef will degrade to single-threaded performance, as you have conjectured, but the same will be true for a TVar Data.Map. My point is that it's important to use a data structure that is suitable for concurrent programming (balanced trees aren't).

That said, in my opinion the winning argument for using STM is composability: you can combine multiple operations into a single transaction without headaches. In general, this isn't possible using IORef or MVar without introducing new locks.
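
For illustration, a sketch in the classic style (the transfer example is mine, not from the answer): two reads and two writes composed into one transaction with no extra locks, so no other thread can ever observe the debit without the credit:

    import Control.Concurrent.STM (TVar, atomically, readTVar, writeTVar)

    -- Both updates commit together or not at all.
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to amount = atomically $ do
      a <- readTVar from
      writeTVar from (a - amount)
      b <- readTVar to
      writeTVar to (b + amount)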

[1] The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment. http://dx.doi.org/10.1145/1366230.1366241

Answer to @Clinton's comment:

If a single IORef contains all your data, you can simply use atomicModifyIORef for composition. But if you need to process lots of parallel read/write requests to that data, the performance loss might become significant, since every pair of parallel read/write requests might conflict.

The approach that I would try is to use a data structure where the entries themselves are stored inside a TVar (vs putting the whole data structure into a single TVar). That should reduce the possibility of livelocks, as transactions won't conflict that often.
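
A sketch of that layout, assuming a key set that is fixed up front (a growing map would also need a TVar around the spine):

    import Control.Concurrent.STM (TVar, atomically, newTVarIO, modifyTVar')
    import qualified Data.Map as Map

    -- One TVar per entry: transactions on different keys touch disjoint
    -- TVars and therefore never conflict with each other.
    newStore :: [Int] -> IO (Map.Map Int (TVar Int))
    newStore keys = Map.fromList <$> mapM (\k -> (,) k <$> newTVarIO 0) keys

    bump :: Map.Map Int (TVar Int) -> Int -> IO ()
    bump store k = case Map.lookup k store of
      Nothing  -> pure ()
      Just var -> atomically (modifyTVar' var (+ 1))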

Of course, you still want to keep your transactions as small as possible and use composability only if it's absolutely necessary to guarantee consistency. So far I haven't encountered a scenario where combining more than a few insert/lookup operations into a single transaction was necessary.

Desired answered 19/4, 2012 at 2:45 Comment(2)
Peter: I thought the problem with TVar is composability? You can always compose multiple operations using the atomicModifyIORef approach (if that IORef contains all your data). Using TVars, though, if you have transactions TA1, TA2, ... that run every second and each time increment X, and a transaction T2 that reads X and then does work for 2 seconds before writing something, T2 never completes. You can't compose TA1, ... and T2 using TVars. You could prevent livelock by giving transactions priority based on creation time, but TVar doesn't seem to do this. – Essayistic
@Clinton: I've added an answer to your comment. – Desired

Beyond performance, I see a more fundamental reason to use TVar: the type system ensures you don't do any "unsafe" operations like readIORef or writeIORef. That your data is shared is a property of the type, not of the implementation. EDIT: unsafePerformIO is always unsafe. readIORef is only unsafe if you are also using atomicModifyIORef. At the very least, wrap your IORef in a newtype and only expose a wrapped atomicModifyIORef.
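
A sketch of that newtype wrapper (the names Shared, newShared, and modifyShared are invented for the example); the module would export Shared abstractly, so readIORef and writeIORef are simply unreachable:

    import Data.IORef (IORef, newIORef, atomicModifyIORef)

    -- Callers never see the raw IORef, so the only possible
    -- operation on the shared value is the atomic one.
    newtype Shared a = Shared (IORef a)

    newShared :: a -> IO (Shared a)
    newShared = fmap Shared . newIORef

    modifyShared :: Shared a -> (a -> (a, b)) -> IO b
    modifyShared (Shared ref) = atomicModifyIORef ref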

Beyond that, don't use IORef; use MVar or TVar:

  1. The first usage pattern you describe probably does not have nice performance characteristics. You likely end up (almost) entirely single-threaded: because of laziness, no actual work happens each time you update the shared state, but whenever you need to use this shared state, the entire accumulated pile of thunks needs to be forced, and it has a linear data dependency structure (see the MVar sketch after this list for one way to avoid the pile-up).
  2. Having 80% efficiency but substantially higher parallelism allows you to exploit a growing number of cores. You can expect minimal performance improvements on single-threaded code over the coming years.
  3. Multi-word CAS is likely coming to a processor near you in the form of "Hardware Transactional Memory", allowing STMs to become far more efficient.
  4. Your code will be more modular: when your design has all shared state behind a single reference, every piece of code has to be changed whenever you add more shared state. TVars, and to a lesser extent MVars, support natural modularity.
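
On point 1, a small sketch of the MVar alternative (updateStrictly is my name for it): modifyMVar_ holds the MVar while the update runs, and evaluate forces each new state before it is stored, so thunks cannot pile up:

    import Control.Concurrent.MVar (MVar, modifyMVar_)
    import Control.Exception (evaluate)

    -- The MVar is held for the duration of the update (a real lock),
    -- and the new state is forced before being put back.
    updateStrictly :: MVar Int -> (Int -> Int) -> IO ()
    updateStrictly var f = modifyMVar_ var (evaluate . f)
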
Fictional answered 19/4, 2012 at 2:39 Comment(7)
Regarding "unsafe" operations, there are always unsafe operations, e.g. "performUnsafeIO". It's just another unsafe operation we know to avoid unless we're really careful.Essayistic
How do you use MVar? By continually calling takeMVar and putMVar? Isn't that similar to using an IORef and atomicModifyIORef anyway? – Essayistic
1. You can easily fork off the work in parallel; there's no need to wait for these thunks to be forced. I believe TVars are also lazy anyway; won't you get the same problem? – Essayistic
The fear I have with TVar is livelock: namely, that every transaction has to take less time to complete than the maximum gap between conflicts with that transaction. If there's a transaction T1 that runs every second and touches X, and another transaction T2 that reads X and takes 2 seconds to complete, T2 will never complete. The type system doesn't protect against this; code can randomly fail depending on usage patterns. This is more difficult to avoid than simply not using readIORef and writeIORef. – Essayistic
Although in principle livelock is a problem, I'm not sure anyone has evidence of it actually causing the kinds of problems you describe with the current STM. It is also possible for STM implementations to ensure some degree of fairness. – Fictional
The potential advantage of MVar is that you can use more than one, so long as you have a global order for access (to prevent deadlock), since takeMVar and putMVar are synchronized. – Fictional
"You can easily fork off the work in parallel" my concern is this is not true. All your updates depend on a single IORef, so have a global ordering which asserts a single linear data dependency path through all your data. The parallelism is only local to a single operation, which you would be getting anyway from the spark pool.Fictional
