I'm writing a program where a large number of agents listen for events and react to them. Since Control.Concurrent.Chan.dupChan is deprecated, I decided to use TChans as advertised.
However, the performance of TChan is much worse than I expected. The following program illustrates the issue:
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Concurrent.STM
import Control.Concurrent
import System.Random (randomRIO)
import Control.Monad (forever, when)

allCoords :: [(Int, Int)]
allCoords = [(x, y) | x <- [0..99], y <- [0..99]]

randomCoords :: IO (Int, Int)
randomCoords = do
  x <- randomRIO (0, 99)
  y <- randomRIO (0, 99)
  return (x, y)

main :: IO ()
main = do
  chan <- newTChanIO :: IO (TChan ((Int, Int), Int))
  -- One watcher thread per coordinate, each reading its own duplicate of
  -- the channel and printing only the events that match its coordinate.
  let watcher p = do
        chan' <- atomically $ dupTChan chan
        forkIO $ forever $ do
          r@(p', _counter) <- atomically $ readTChan chan'
          when (p == p') (print r)
        return ()
  mapM_ watcher allCoords
  -- Writer loop: publish random events to the shared channel as fast as
  -- possible, tagging each with a running counter.
  let go !cnt = do
        xy <- randomCoords
        atomically $ writeTChan chan (xy, cnt)
        go (cnt + 1)
  go 1
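(An aside on the dupTChan pattern above: newer versions of the stm package, 2.4 and later if I recall correctly, provide newBroadcastTChanIO, which creates a channel that can only be consumed through dupTChan'd copies, so items are never retained by an unread original write end. A minimal sketch of that variant, assuming such an stm version is available:)

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forM_)

main :: IO ()
main = do
  -- A broadcast channel: readTChan on 'chan' itself always retries; the
  -- only way to consume it is through dupTChan'd duplicates.
  chan <- newBroadcastTChanIO :: IO (TChan Int)
  done <- newEmptyMVar
  forM_ [1 .. 3 :: Int] $ \i -> do
    chan' <- atomically $ dupTChan chan  -- each reader gets its own copy
    forkIO $ do
      x <- atomically $ readTChan chan'
      putStrLn ("reader " ++ show i ++ " got " ++ show x)
      putMVar done ()
  atomically $ writeTChan chan 42        -- one write reaches all duplicates
  forM_ [1 .. 3 :: Int] $ \_ -> takeMVar done

Note that this only addresses retention of unread items, not the cost of waking every reader on each write.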
When the original program is compiled (-O) and run, it first outputs something like this:
./tchantest
((0,25),341) ((0,33),523) ((0,33),654) ((0,35),196) ((0,48),181) ((0,48),446)
((1,15),676) ((1,50),260) ((1,78),561) ((2,30),622) ((2,38),383) ((2,41),365)
((2,50),596) ((2,57),194) ((3,19),259) ((3,27),344) ((3,33),65) ((3,37),124)
((3,49),109) ((3,72),91) ((3,87),637) ((3,96),14) ((4,0),34) ((4,17),390)
((4,73),381) ((4,74),217) ((4,78),150) ((5,7),476) ((5,27),207) ((5,47),197)
((5,49),543) ((5,53),641) ((5,58),175) ((5,70),497) ((5,88),421) ((5,89),617)
((6,0),15) ((6,4),322) ((6,16),661) ((6,18),405) ((6,30),526) ((6,50),183)
((6,61),528) ((7,0),74) ((7,28),479) ((7,66),418) ((7,72),318) ((7,79),101)
((7,84),462) ((7,98),669) ((8,5),126) ((8,64),113) ((8,77),154) ((8,83),265)
((9,4),253) ((9,26),220) ((9,41),255) ((9,63),51) ((9,64),229) ((9,73),621)
((9,76),384) ((9,92),569) ...
Then, at some point, the program stops printing anything while still consuming 100% CPU; the last output before the lockup looks like this:
((20,56),186) ((20,58),558) ((20,68),277) ((20,76),102) ((21,5),396) ((21,7),84)
With -threaded, the lockup happens even faster, after only a handful of lines, and the program consumes however many cores are made available through the RTS's -N flag.
Additionally, the performance seems rather poor: only about 100 events per second are processed.
Is this a bug in STM, or am I misunderstanding something about the semantics of STM?
Chan will wake up a single reader, while STM's TChan will wake all readers for each individual write. Aside from that, Neil Brown has a good suggestion for you in his answer. – Icecold
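The suggestion referenced above isn't reproduced here, but one way to act on the wake-all observation is to stop broadcasting entirely: since each watcher only ever matches a single coordinate, give every coordinate its own TChan and write each event only to the matching channel. A minimal sketch of that restructuring (my illustration, not necessarily the referenced answer):

{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad (forever)
import qualified Data.Map as Map
import System.Random (randomRIO)

main :: IO ()
main = do
  let coords = [(x, y) | x <- [0..99], y <- [0..99]]
  -- One private TChan per coordinate, kept in a Map for dispatch.
  chans <- Map.fromList <$> mapM (\p -> (,) p <$> newTChanIO) coords
  let watcher p = forkIO $ forever $ do
        cnt <- atomically $ readTChan (chans Map.! p)
        print (p, cnt :: Int)
  mapM_ watcher coords
  let go !cnt = do
        x <- randomRIO (0, 99)
        y <- randomRIO (0, 99)
        -- A write touches exactly one channel, so it wakes exactly one
        -- watcher instead of all 10,000.
        atomically $ writeTChan (chans Map.! (x, y)) cnt
        go (cnt + 1)
  go 1

Each write now wakes a single blocked transaction; every other watcher stays asleep, which removes the per-event cost that grows with the number of readers.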