Repa and Accelerate API Similarity
The Haskell repa library provides automatically parallel array computation on CPUs. The accelerate library provides automatic data parallelism on GPUs. The APIs are quite similar, with identical representations of N-dimensional arrays. One can even switch between accelerate and repa arrays with fromRepa and toRepa in Data.Array.Accelerate.IO:
fromRepa :: (Shapes sh sh', Elt e) => Array A sh e -> Array sh' e
toRepa :: Shapes sh sh' => Array sh' e -> Array A sh e
There are multiple backends for accelerate, including LLVM, CUDA and FPGA (see Figure 2 of http://www.cse.unsw.edu.au/~keller/Papers/acc-cuda.pdf). I've spotted a repa backend for accelerate, though the library doesn't appear to be maintained. Given that the repa and accelerate programming models are similar, I am hopeful that there is an elegant way of switching between them, i.e. that functions written once can be executed either with repa's R.computeP or with one of accelerate's backends, e.g. with the CUDA run function.
Two very similar functions: Repa and Accelerate on a Pumpkin
Take a simple image processing thresholding function. If a grayscale pixel value is 50 or below, it is set to 0; otherwise it retains its value. Here's what it does to a pumpkin:
The following code presents repa and accelerate implementations:
module Main where
import qualified Data.Array.Repa as R
import qualified Data.Array.Repa.IO.BMP as R
import qualified Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.IO as A
import qualified Data.Array.Accelerate.Interpreter as A
import Data.Word

-- Apply threshold over image using accelerate (interpreter)
thresholdAccelerate :: IO ()
thresholdAccelerate = do
  img <- either (error . show) id `fmap` A.readImageFromBMP "pumpkin-in.bmp"
  let newImg = A.run $ A.map evalPixel (A.use img)
  A.writeImageToBMP "pumpkin-out.bmp" newImg
  where
    -- *** Exception: Prelude.Ord.compare applied to EDSL types
    evalPixel :: A.Exp A.Word32 -> A.Exp A.Word32
    evalPixel p = if p > 50 then p else 0

-- Apply threshold over image using repa
thresholdRepa :: IO ()
thresholdRepa = do
  let arr :: IO (R.Array R.U R.DIM2 (Word8,Word8,Word8))
      arr = either (error . show) id `fmap` R.readImageFromBMP "pumpkin-in.bmp"
  img <- arr
  newImg <- R.computeP (R.map applyAtPoint img)
  R.writeImageToBMP "pumpkin-out.bmp" newImg
  where
    applyAtPoint :: (Word8,Word8,Word8) -> (Word8,Word8,Word8)
    applyAtPoint (r,g,b) =
      let [r',g',b'] = map applyThresholdOnPixel [r,g,b]
      in (r',g',b')
    applyThresholdOnPixel x = if x > 50 then x else 0

data BackendChoice = Repa | Accelerate

main :: IO ()
main = do
  let userChoice = Repa -- pretend this is a command line flag
  case userChoice of
    Repa       -> thresholdRepa
    Accelerate -> thresholdAccelerate
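The exception noted in evalPixel comes from applying Haskell's own > and if to embedded Exp values. A minimal sketch of the same threshold using accelerate's embedded comparison and conditional follows. It assumes accelerate 1.0 or later, where the comparison is spelled A.> (older releases spelled it A.>*), and it uses a hypothetical 2x2 sample array in place of the BMP file:

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I
import           Data.Word                         (Word32)

-- Threshold written against embedded expressions: A.> builds an Exp Bool
-- and A.cond is the embedded if-then-else, so Prelude's Ord is never called.
evalPixel :: A.Exp Word32 -> A.Exp Word32
evalPixel p = A.cond (p A.> 50) p 0

-- Hypothetical stand-in for the image: a tiny 2x2 array of pixel values.
sample :: A.Array A.DIM2 Word32
sample = A.fromList (A.Z A.:. 2 A.:. 2) [10, 50, 51, 200]

-- Run the embedded program with the reference interpreter backend.
thresholded :: [Word32]
thresholded = A.toList (I.run (A.map evalPixel (A.use sample)))
```

Evaluating thresholded in GHCi should leave values above 50 untouched and zero the rest.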
Question: can I write this only once?
The implementations of thresholdAccelerate and thresholdRepa are very similar. Is there an elegant way to write array processing functions once, then switch programmatically between multicore CPUs (repa) and GPUs (accelerate)? I can think of choosing my import according to whether I want CPU or GPU, i.e. importing either Data.Array.Accelerate.CUDA or Data.Array.Repa, to execute an action of type Acc a with:
run :: Arrays a => Acc a -> a
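Because a backend's run has this shape, one option is to write the Acc program once and pass the run function in as an ordinary argument. The following is only a sketch under assumptions: it targets accelerate 1.0 or later, wires up just the reference interpreter, and the names Image, Runner, runnerFor and thresholdWith are hypothetical. The LLVM backend modules named in the comment are assumed to export a run of the same shape and are not imported here.

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I
import           Data.Word                         (Word32)

type Image = A.Array A.DIM2 Word32

-- The array program, written once and independent of any backend.
thresholdAcc :: A.Acc Image -> A.Acc Image
thresholdAcc = A.map (\p -> A.cond (p A.> 50) p 0)

-- Hypothetical flag type; only the interpreter is wired up in this sketch.
data BackendChoice = Interpreter

-- Anything shaped like a backend's run, specialised to images.
type Runner = A.Acc Image -> Image

-- Map the flag to a run function. Further constructors could select
-- e.g. Data.Array.Accelerate.LLVM.Native.run (multicore CPU) or
-- Data.Array.Accelerate.LLVM.PTX.run (GPU); these are assumptions here.
runnerFor :: BackendChoice -> Runner
runnerFor Interpreter = I.run

-- Execute the shared program on whichever backend the flag selects.
thresholdWith :: BackendChoice -> Image -> Image
thresholdWith choice img = runnerFor choice (thresholdAcc (A.use img))
```

With this layout the choice of backend is a value-level decision, while thresholdAcc itself never mentions a backend.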
Or, to use a type class e.g. something roughly like:
main :: IO ()
main = do
  let userChoice = Repa -- pretend this is a command line flag
  action <- case userChoice of
    Repa       -> applyThreshold :: RepaBackend ()
    Accelerate -> applyThreshold :: CudaBackend ()
  action
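To make the type class idea a little more concrete, here is a toy sketch that writes the per-pixel function once against a class and interprets it two ways. Everything in it is hypothetical: Pixel, thresholdPx, Expr and eval are illustrative names, and Expr is only a stand-in for an embedded type such as accelerate's Exp, not its real representation.

```haskell
import Data.Word (Word8)

-- Operations a pixel function needs, abstracted over the representation.
class Pixel e where
  lit       :: Word8 -> e
  -- ifGreater a b t f means: if a > b then t else f
  ifGreater :: e -> e -> e -> e -> e

-- The threshold, written once against the class.
thresholdPx :: Pixel e => e -> e
thresholdPx p = ifGreater p (lit 50) p (lit 0)

-- Interpretation 1: plain values, usable directly with a map over Word8s.
instance Pixel Word8 where
  lit = id
  ifGreater a b t f = if a > b then t else f

-- Interpretation 2: a toy expression tree standing in for an EDSL type.
data Expr
  = Lit Word8
  | Var
  | IfGreater Expr Expr Expr Expr
  deriving (Eq, Show)

instance Pixel Expr where
  lit       = Lit
  ifGreater = IfGreater

-- Evaluate the toy tree for a given value of its single variable.
eval :: Word8 -> Expr -> Word8
eval _ (Lit n) = n
eval x Var     = x
eval x (IfGreater a b t f) =
  if eval x a > eval x b then eval x t else eval x f
```

The same thresholdPx runs immediately at type Word8 and builds a tree at type Expr, which is roughly the split between repa's ordinary Haskell functions and accelerate's embedded expressions.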
Or is it the case that, for each parallel array function I wish to express for both CPUs and GPUs, I must implement it twice: once with the repa library and again with the accelerate library?
accelerate, but I guess there's not much interest in that for whatever reason. – Lemuellemuela
accelerate allows us to write the function once and have it run on parallel CPU or on GPU depending on which you want. It's a module switch instead of flipping a constructor, but you can write the latter on top of the former. – Armure