To be more specific, I have the following innocuous-looking little Repa 3 program:
{-# LANGUAGE QuasiQuotes #-}
import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2
main = do
[s] <- getArgs
img <- runIL $ readImage s
let out = output x where RGB x = img
runIL . writeImage "out.bmp" . Grey =<< computeP out
output img = map cast . blur . blur $ blur grey
where
grey = traverse img to2D luminance
cast n = floor n :: Word8
to2D (Z:.i:.j:._) = Z:.i:.j
---------------------------------------------------------------
luminance f (Z:.i:.j) = 0.21*r + 0.71*g + 0.07*b :: Float
where
(r,g,b) = rgb (fromIntegral . f) i j
blur = map (/ 9) . convolve kernel
where
kernel = [stencil2| 1 1 1
1 1 1
1 1 1 |]
convolve = mapStencil2 BoundClamp
rgb f i j = (r,g,b)
where
r = f $ Z:.i:.j:.0
g = f $ Z:.i:.j:.1
b = f $ Z:.i:.j:.2
Which takes this much time to process a 640x420 image on my 2Ghz core 2 duo laptop:
real 2m32.572s
user 4m57.324s
sys 0m1.870s
I know something must be quite wrong, because I have gotten much better performance on much more complex algorithms using Repa 2. Under that API, the big improvement I found came from adding a call to 'force' before every array transform (which I understand to mean every call to map, convolve, traverse etc). I cannot quite make out the analogous thing to do in Repa 3 - in fact I thought the new manifestation type parameters are supposed to ensure there is no ambiguity about when an array needs to be forced? And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that are little discussed online AFAIK.
More simply, is there a minimally impactful way to fix the above program's efficiency?