Although I have a good LSFR C implementation I thought I'd try the same in Haskell - just to see how it goes. What I came up with, so far, is two orders of magnitude slower than the C implementation, which begs the question: How can the performance be improved? Clearly, the bit-fiddling operations are the bottleneck, and the profiler confirms this.
Here's the baseline Haskell code using lists and Data.Bits
:
import Control.Monad (when)
import Data.Bits (Bits, shift, testBit, xor, (.&.), (.|.))
import System.Environment (getArgs)
import System.Exit (exitFailure, exitSuccess)
tap :: [[Int]]
tap = [
[], [], [], [3, 2],
[4, 3], [5, 3], [6, 5], [7, 6],
[8, 6, 5, 4], [9, 5], [10, 7], [11, 9],
[12, 6, 4, 1], [13, 4, 3, 1], [14, 5, 3, 1], [15, 14],
[16,15,13,4], [17, 14], [18, 11], [19, 6, 2, 1],
[20, 17], [21, 19], [22, 21], [23, 18],
[24,23,22,17], [25, 22], [26, 6, 2, 1], [27, 5, 2, 1],
[28, 25], [29, 27], [30, 6, 4, 1], [31, 28],
[32,22,2,1], [33,20], [34,27,2,1], [35,33],
[36,25], [37,5,4,3,2,1],[38,6,5,1], [39,35],
[40,38,21,19], [41,38], [42,41,20,19], [43,42,38,37],
[44,43,18,17], [45,44,42,41], [46,45,26,25], [47,42],
[48,47,21,20], [49,40], [50,49,24,23], [51,50,36,35],
[52,49], [53,52,38,37], [54,53,18,17], [55,31],
[56,55,35,34], [57,50], [58,39], [59,58,38,37],
[60,59], [61,60,46,45], [62,61,6,5], [63,62] ]
xor' :: [Bool] -> Bool
xor' = foldr xor False
mask :: (Num a, Bits a) => Int -> a
mask len = shift 1 len - 1
advance :: Int -> [Int] -> Int -> Int
advance len tap lfsr
| d0 = shifted
| otherwise = shifted .|. 1
where
shifted = shift lfsr 1 .&. mask len
d0 = xor' $ map (testBit lfsr) tap'
tap' = map (subtract 1) tap
main :: IO ()
main = do
args <- getArgs
when (null args) $ fail "Usage: lsfr <number-of-bits>"
let len = read $ head args
when (len < 8) $ fail "No need for LFSR"
let out = last $ take (shift 1 len) $ iterate (advance len (tap!!len)) 0
if out == 0 then do
putStr "OK\n"
exitSuccess
else do
putStr "FAIL\n"
exitFailure
Basically it tests whether the LSFR defined in tap :: [[Int]]
for any given bit-length is of maximum-length. (More precisely, it just checks whether the LSFR reaches the initial state (zero) after 2n iterations.)
According to the profiler the most costly line is the feedback bit d0 = xor' $ map (testBit lfsr) tap'
.
What I've tried so far:
- use
Data.Array
: Attempt abandoned because there's no foldl/r - use
Data.Vector
: Slightly faster than the baseline
The compiler options I use are: -O2
, LTS Haskell 8.12 (GHC-8.0.2)
.
The reference C++ program can be found on gist.github.com.
The Haskell code can't be expected (?) to run as fast as the C code, but two orders of magnitude is too much, there must be a better way to do the bit-fiddling.
Update: Results of applying the optimisations suggested in the answers
- The reference C++ program with input 28, compiled with LLVM 8.0.0, runs in 0.67s on my machine (the same with clang 3.7 is marginally slower, 0.68s)
- The baseline Haskell code runs about 100x slower (because of the space inefficiency don't try it with inputs larger than 25)
- With the rewrite of @Thomas M. DuBuisson, still using the default GHC backend, the execution time goes down to 5.2s
- With the rewrite of @Thomas M. DuBuisson, now using the LLVM backend (GHC option
-O2 -fllvm
), the execution time goes down to 1.7s- Using GHC option
-O2 -fllvm -optlc -mcpu=native
brings this to 0.73s
- Using GHC option
- Replacing
iterate
withiterate'
of @cirdec makes no difference when Thomas' code is used (both with the default 'native' backend and LLVM). However, it does make a difference when the baseline code is used.
So, we've come from 100x to 8x to 1.09x, i.e. only 9% slower than C!
Note
The LLVM backend to GHC 8.0.2 requires LLVM 3.7. On Mac OS X this means installing this version with brew
and then symlinking opt
and llc
. See 7.10. GHC Backends.
shift 1 len
with the more straight-forward2^len
and re-benchmark. – Tabithatablatureadvance ... = shifted .|. fromEnum d0
. – Tabithatablatureshift 1 len
by2 ^ len
: 2.259s -> 0.178s - that is bizarre, 2^len is needed exactly once, why would this make a difference at all? – Blondfoldr
vs.foldl
: It makes almost no difference. Empiricallyfoldr
was slightly faster! – Blond