Two simple codes to generate divisors of a number. Why is the recursive one faster?

C

2

3

While solving a problem, I had to calculate the divisors of a number. I have two implementations that produce all divisors > 1 for a given number.

The first is using simple recursion:

divisors :: Int64 -> [Int64]
divisors k = divisors' 2 k
  where
    divisors' n k | n*n > k = [k]
                  | n*n == k = [n, k]
                  | k `mod` n == 0 = (n:(k `div` n):result)
                  | otherwise = result
      where result = divisors' (n+1) k

The second one uses list processing functions from the Prelude:

divisors2 :: Int64 -> [Int64]
divisors2 k = k : (concatMap (\x -> [x, k `div` x]) $!
                  filter (\x -> k `mod` x == 0) $! 
                  takeWhile (\x -> x*x <= k) [2..])

I find that the first implementation is faster (I printed the whole list returned, so that no part of the result remains unevaluated due to laziness). The two implementations produce differently ordered divisors, but that is not a problem for me. (In fact, if k is a perfect square, the square root is output twice in the second implementation - again not a problem).

In general are such recursive implementations faster in Haskell? Also, I would appreciate any pointers to make either of these codes faster. Thanks!

EDIT:

Here is the code I am using to compare these two implementations for performance: https://gist.github.com/3414372

Here are my timing measurements:

Using divisor2 with strict evaluation ($!)

$ ghc --make -O2 div.hs 
[1 of 1] Compiling Main             ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1

real    0m7.651s
user    0m7.604s
sys 0m0.012s

Using divisors2 with lazy evaluation ($):

$ ghc --make -O2 div.hs 
[1 of 1] Compiling Main             ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1

real    0m7.461s
user    0m7.444s
sys 0m0.012s

Using function divisors

$ ghc --make -O2 div.hs 
[1 of 1] Compiling Main             ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1

real    0m7.058s
user    0m7.036s
sys 0m0.020s

Clintonclintonia answered 21/8, 2012 at 9:37 Comment(9)

I think it has something to do with Haskell being Tail Call Optimised, so recursion isn't as big a hit as it would be with other languages. But I don't know exactly how this helps, so I'm leaving this as a comment instead of an answer. – Shoring 21/8, 2012 at 9:45

Nope, tail call optimization does not generally affect the performance of a Haskell program, and tail call optimization is very seldom even used by the compiler in Haskell, since it is a lazily evaluated language and doesn't need it most of the time. – Mcadams 21/8, 2012 at 9:57

@Mcadams As I said - I wasn't sure. Thanks for putting me right :) – Shoring 21/8, 2012 at 10:15

(misread your code there ...) try replacing $! with $, the culprit in the 2nd might be excessive strictness. How much is the difference anyway? 5%? 1%? – Strumpet 21/8, 2012 at 10:15

@WillNess, good call, I came to the same conclusion after you deleted your answer. – Mcadams 21/8, 2012 at 10:20

@Mcadams yeah I misread the OP code for prime factorization, but it isn't. Must ... pace ... myself. I had a hunch, but you've provided links and definite arguments there. :) – Strumpet 21/8, 2012 at 10:26

Better algorithm should be used for that. Generate a prime factorization first, then construct the divisors from it. If you're interested, ask this in a new question. :) – Strumpet 21/8, 2012 at 10:29

@WillNess Thanks, will ask in another question. Also, please see the edits in my question. – Clintonclintonia 21/8, 2012 at 10:57

since you've asked how to make it faster, I've added newly re-written answer here. – Strumpet 21/8, 2012 at 13:21

M

5

The recursive version is not in general faster than the list-based version. This is because the GHC compiler employs List fusion optimizations when a computation follows a certain pattern. This means that list generators and "list transformers" might be fused into one big generator instead.

However, when you use $!, you basically tell the compiler to "Please produce the first cons of this list before performing the next step." This means that GHC is forced to at least compute one intermediate list element, which disables the whole fusion optimization entirely.

So, the second algorithm is slower, because you produce intermediate lists that have to be constructed and destructed, while the recursive algorithm simply produces a single list straight away.

Mcadams answered 21/8, 2012 at 10:19 Comment(5)

Thanks for answering, but this seems to be only part of the explanation, as there still seems to be significant performance difference. But I learned about list fusion now! Please see the edits in my question. – Clintonclintonia 21/8, 2012 at 10:49

I'll update my answer in 5 minutes when I have a Haskell compiler again. – Mcadams 21/8, 2012 at 10:55

I can't reproduce your results. For me, the first version runs in 3.446s, while the second version runs in 0.831s. I'm using GHC 7.4.2. NOTE: you are using sum in your benchmark, which leads to a full list fusion. That's why the second version of the algorithm is so much faster. You should e.g. force the whole list instead of using sum to get a better benchmark. By the way, the first version of the algorithm uses plain recursion, and is therefore not being fused for that reason. You could rewrite it in terms of unfoldr or something. – Mcadams 21/8, 2012 at 11:25

@Clintonclintonia do you and dflemstr use same optimization flags? – Strumpet 21/8, 2012 at 13:22

I am using only -O2 as in the edits. However, my ghc version is 7.4.1. Could that be it? – Clintonclintonia 21/8, 2012 at 13:23

S

5

Since you asked, to make it faster a different algorithm should be used. Simple and straightforward is to find a prime factorization first, then construct the divisors from it somehow.

Standard prime factorization by trial division is:

factorize :: Integral a => a -> [a]
factorize n = go n (2:[3,5..])    -- or: `go n primes`
   where
     go n ds@(d:t)
        | d*d > n    = [n]
        | r == 0     =  d : go q ds
        | otherwise  =      go n t
            where  (q,r) = quotRem n d

-- factorize 12348  ==>  [2,2,3,3,7,7,7]

Equal prime factors can be grouped and counted:

import Data.List (group)

primePowers :: Integral a => a -> [(a, Int)]
primePowers n = [(head x, length x) | x <- group $ factorize n]
-- primePowers = map (head &&& length) . group . factorize

-- primePowers 12348  ==>  [(2,2),(3,2),(7,3)]

Divisors are usually constructed, though out of order, with:

divisors :: Integral a => a -> [a]
divisors n = map product $ sequence 
                    [take (k+1) $ iterate (p*) 1 | (p,k) <- primePowers n]

Hence, we have

numDivisors :: Integral a => a -> Int
numDivisors n = product  [ k+1                   | (_,k) <- primePowers n]

The product here comes from the sequence in the definition above it, because sequence :: Monad m => [m a] -> m [a] for list monad m ~ [] constructs lists of all possible combinations of elements picked by one from each member list, sequence_lists = foldr (\xs rs -> [x:r | x <- xs, r <- rs]) [[]], so that length . sequence_lists === product . map length, and or course length . take n === n for infinite argument lists.

In-order generation is possible, too:

ordDivisors :: Integral a => a -> [a]
ordDivisors n = foldr (\(p,k)-> foldi merge [] . take (k+1) . iterate (map (p*)))
                      [1] $ reverse $ primePowers n

foldi :: (a -> a -> a) -> a -> [a] -> a
foldi f z (x:xs) = f x (foldi f z (pairs xs))  where
         pairs (x:y:xs) = f x y:pairs xs
         pairs xs       = xs
foldi f z []     = z

merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = case (compare y x) of 
           LT -> y : merge (x:xs)  ys
           _  -> x : merge  xs  (y:ys)
merge  xs     []    = xs
merge  []     ys    = ys

{- ordDivisors 12348  ==>  
[1,2,3,4,6,7,9,12,14,18,21,28,36,42,49,63,84,98,126,147,196,252,294,343,441,588,
686,882,1029,1372,1764,2058,3087,4116,6174,12348] -}

This definition is productive, too, i.e. it starts producing the divisors right away, without noticeable delay:

{- take 20 $ ordDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
(0.00 secs, 525068 bytes)

numDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> 362797056  -}

Strumpet answered 21/8, 2012 at 10:5 Comment(0)

M

5

The recursive version is not in general faster than the list-based version. This is because the GHC compiler employs List fusion optimizations when a computation follows a certain pattern. This means that list generators and "list transformers" might be fused into one big generator instead.

However, when you use $!, you basically tell the compiler to "Please produce the first cons of this list before performing the next step." This means that GHC is forced to at least compute one intermediate list element, which disables the whole fusion optimization entirely.

So, the second algorithm is slower, because you produce intermediate lists that have to be constructed and destructed, while the recursive algorithm simply produces a single list straight away.

Mcadams answered 21/8, 2012 at 10:19 Comment(5)

Thanks for answering, but this seems to be only part of the explanation, as there still seems to be significant performance difference. But I learned about list fusion now! Please see the edits in my question. – Clintonclintonia 21/8, 2012 at 10:49

I'll update my answer in 5 minutes when I have a Haskell compiler again. – Mcadams 21/8, 2012 at 10:55

I can't reproduce your results. For me, the first version runs in 3.446s, while the second version runs in 0.831s. I'm using GHC 7.4.2. NOTE: you are using sum in your benchmark, which leads to a full list fusion. That's why the second version of the algorithm is so much faster. You should e.g. force the whole list instead of using sum to get a better benchmark. By the way, the first version of the algorithm uses plain recursion, and is therefore not being fused for that reason. You could rewrite it in terms of unfoldr or something. – Mcadams 21/8, 2012 at 11:25

@Clintonclintonia do you and dflemstr use same optimization flags? – Strumpet 21/8, 2012 at 13:22

I am using only -O2 as in the edits. However, my ghc version is 7.4.1. Could that be it? – Clintonclintonia 21/8, 2012 at 13:23

Recommended topics

Hot tags