F# parallelizing issue when calculating perfect numbers?
Asked Answered
I

2

2

I am trying to optimize a small program which calculates perfect numbers from a given exponent.

The program runs (almost) perfectly, but when I open the task manager, it still runs on a single thread. That means that I must be doing something wrong, but my knowledge of F# is still in a 'beginning' phase.

I will try to put this question as clear as I possibly can, but if I fail in doing so, please let me know.

A perfect number is a number where the sum of all it's divisors (except for the number itself) is equal to the number itself (e.g. 6 is perfect, since the sum of it's divisors 1, 2 and 3 are 6).

I use prime numbers to speed up the calculation, that is I am not interested in (huge) lists where all the divisors are stored. To do so, I use the formula that Euclid proved to be correct: (2*(power of num - 1)) * ( 2* (power of num - 1)) where the latter is a Mersenne prime. I used a very fast algorithm from stackoverflow (by @Juliet) to determine whether a given number is a prime.

As I have been reading through several articles (I have not yet purchased a good book, so shame on me) on the Internet, I found out that sequences perform better than lists. So that is why I first started to create a function which generates a sequence of perfect numbers:

   let perfectNumbersTwo (n : int) =  
    seq { for i in 1..n do 
           if (PowShift i) - 1I |> isPrime 
           then yield PowShift (i-1) * ((PowShift i)-1I)
        } 

The helperfunction PowShift is implemented as following:

    let inline PowShift (exp:int32) = 1I <<< exp ;;

I use a bit shift operator, since the base of all power calculations is from 2, hence this could be an easy way. Of course I am still grateful for the contributions on the question I asked about this on: F# Power issues which accepts both arguments to be bigints>F# Power issues which accepts both arguments to be bigints

The function Juliet created (borrowed here) is as following:

let isPrime ( n : bigint) = 
    let maxFactor = bigint(sqrt(float n))
    let rec loop testPrime tog =
        if testPrime > maxFactor then true
        elif n % testPrime = 0I then false
        else loop (testPrime + tog) (6I - tog)
    if n = 2I || n = 3I || n = 5I then true
    elif n <= 1I || n % 2I = 0I || n % 3I = 0I || n % 5I = 0I then false
    else loop 7I 4I;;

Using this code, without parallel, it takes about 9 minutes on my laptop to find the 9th perfect number (which consists of 37 digits, and can be found with value 31 for the exponent). Since my laptop has a CPU with two cores, and only one is running at 50 percent (full load for one core) I though that I could speed up the calculations by calculating the results parallel.

So I changed my perfectnumber function as following:

//Now the function again, but async for parallel computing
let perfectNumbersAsync ( n : int) =
    async {
        try
            for x in 1.. n do
                if PowShift x - 1I |> isPrime then
                    let result = PowShift (x-1) * ((PowShift x)-1I)
                    printfn "Found %A as a perfect number" result
        with
            | ex -> printfn "Error%s" (ex.Message);
    }

To call this function, I use a small helper function to run it:

 let runPerfects n =
    [n]
        |> Seq.map perfectNumbersAsync
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore

Where the result of async calculation is ignored, since I am displaying it within the perfectNumbersAsync function.

The code above compiles and it runs, however it still uses only one core (although it runs 10 seconds faster when calculating the 9th perfect number). I am afraid that it has to do something with the helper functions PowShift and isPrime, but I am not certain. Do I have to put the code of these helper functions within the async block of perfectNumbersAsync? It does not improve readability...

The more I play with F#, the more I learn to appreciate this language, but as with this case, I am in need of some experts sometimes :).

Thanks in advance for reading this, I only hope that I made myself a bit clear...

Robert.

Ingroup answered 3/12, 2011 at 13:53 Comment(2)
There are optimizations you can do that don't involve parallellization. In order for 2^p-1 to be a Mersenne prime, p must be prime as well, so you can test for that first. Also, the Lucas-Rehmer test is much more efficient for Mersenne primes. See en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_testGatlin
@JeffreySax Thank you for your comment Jeffrey, you have more experience with numbers (and how to deal with them properly) than I have. I will take your advice with me and see how I can optimize it. Reminds me of being a Math hating kid, which turns out that I am starting to like it more and more. It really starts to fascinate me!Ingroup
O
3

@Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:

let lucasLehmer p =
    let m = (PowShift p) - 1I
    let rec loop i acc =
        if i = p-2 then acc
        else loop (i+1) ((acc*acc - 2I)%m)
    (loop 0 4I) = 0I

With the Lucas-Lehmer test, I can get first few perfect numbers very fast:

let mersenne (i: int) =     
    if i = 2 || (isPrime (bigint i) && lucasLehmer i) then
        let p = PowShift i
        Some ((p/2I) * (p-1I))
    else None

let runPerfects n =
    seq [1..n]
        |> Seq.choose mersenne
        |> Seq.toArray

let m1 = runPerfects 2048;; // Real: 00:00:07.839, CPU: 00:00:07.878, GC gen0: 112, gen1: 2, gen2: 1

The Lucas-Lehmer test helps to reduce the time checking prime numbers. Instead of testing divisibility of 2^p-1 which takes O(sqrt(2^p-1)), we use the primality test which is at most O(p^3). With n = 2048, I am able to find first 15 Mersenne numbers in 7.83 seconds. The 15th Mersenne number is the one with i = 1279 and it consists of 770 digits.

I tried to parallelize runPerfects using PSeq module in F# Powerpack. PSeq doesn't preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the primality test is quite balance among indices, the result is quite encouraging:

#r "FSharp.Powerpack.Parallel.Seq.dll"    
open Microsoft.FSharp.Collections

let runPerfectsPar n =
    seq [1..n]
        |> PSeq.choose mersenne
        |> PSeq.sort (* align with sequential version *)
        |> PSeq.toArray 

let m2 = runPerfectsPar 2048;; // Real: 00:00:02.288, CPU: 00:00:07.987, GC gen0: 115, gen1: 1, gen2: 0

With the same input, the parallel version took 2.28 seconds which is equivalent to 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use Parallel.For construct and partition the input range sensibly.

Olympia answered 3/12, 2011 at 14:23 Comment(10)
Thanks again my friend, you helped me out again. Though your code had a few minor errors (which is good, so that it prevents copy paste without understanding what your are doing). I have another pc, which has four cores, but you are right, the speed increase will only be marginal. However, I've learned again :D.Ingroup
Did you get any speedup? What is your finding after parallelizing this example? It will be interesting to see the actual results.Olympia
The speed up turns out to be minimal (1 second faster), it still uses one core, which is a bit strange... Could that be caused by the fact that I use this example in a Console Application? I can hardly believe that though...Ingroup
@Ingroup - this only uses 1 core because almost all the time is spent checking primality of 2^61 - 1 which is only single threadedDettmer
@RvdV79: I got a very good speedup using Lucas-Lehmer test and Parallel Sequence. You may want to look at it.Olympia
@Olympia Astonishing, I have tested your code and it turns out to be enormously fast! Now I need some time to study it before I will actually use it. One more (off topic) question, can you recommend books on F#?Ingroup
@RvdV79: Realworld Functional Programming (rads.stackoverflow.com/amzn/click/1933988924) is a good book for learning F# and functional programming concepts. Expert F# is good for reference (amazon.com/dp/1590598504/?tag=stackoverfl08-20).Olympia
You are really starting out to be my personal hero :-). I have ordered the book Realworld Functional Programming (at Bol.com, since that suits me better), in this way I can support @thomaspetricek a little as well. That book Expert F# has a 2.0 here in the Netherlands and is quite expensive (60 euro's, which is about 80 USD if I am not mistaken). I'll leave it up for my wife to decide whether that would be a nice Christmas present.Ingroup
My pleasure :). It's great to see your enthusiasm for learning F#.Olympia
@Olympia Small update: Yesterday I received the book: Real-World Functional Programming. It is a very helpful book indeed, so again I need to express my gratitude!Ingroup
D
3

One quick comment on speed and parallelisability,

Your isPrime is O(sqrt(n)), and each succesive n is about 2 x as big as the last one, so will take approx 1.5 x as long to calculate, which means that calculating the last numbers will take much longer

I have done some hacking with testing for primality and some things I have found which are useful are:

  1. For big N, (you are testing numbers with 20 digits), the prime density is actually quite low, so you will be doing alot of divisions by composite numbers. A better approach is to precalculate a table of primes (using a sieve) up to some maximum limit (probably determined by amount of memory). Note that you are most likely to find factors with small numbers. Once you run out of memory for your table, you can test the rest of the numbers with your existing function, with a larger starting point.

  2. Another approach is to use multiple threads in the checking. For example, you currently check x,x+4,x+6... as factors. By being slightly cleverer, you can do the number congruent to 1 mod 3 in 1 thread and the numbers congruent to 2 mod 3 in another thread.

No. 2 is simplest, but No. 1 is more effective, and provides potential for doing control flow with OutOfMemoryExceptions which can always be interesting

EDIT: So I implemented both of these ideas, it finds 2305843008139952128 almost instantly, finding 2658455991569831744654692615953842176 takes 7 minutes on my computer (quad core AMD 3200). Most of the time is spent checking 2^61 is prime, so a better algorithm would probably be better for checking the prime numbers: Code here

let swatch = new System.Diagnostics.Stopwatch()
swatch.Start()
let inline PowShift (exp:int32) = 1I <<< exp ;;
let limit = 10000000 //go to a limit, makes table gen slow, but should pay off
printfn "making table"
//returns an array of all the primes up to limit
let table =
    let table = Array.create limit true //use bools in the table to save on memory
    let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
    table.[1] <- false //special case
    [2..tlimit] 
    |> List.iter (fun t -> 
        if table.[t]  then //simple optimisation
            let mutable v = t*2
            while v < limit do
                table.[v] <- false
                v <- v + t)
    let out = Array.create (50847534) 0I //wolfram alpha provides pi(1 billion) - want to minimize memory
    let mutable idx = 0
    for x in [1..(limit-1)] do
        if table.[x] then
            out.[idx] <- bigint x
            idx <- idx + 1
    out |> Array.filter (fun t -> t <> 0I) //wolfram no is for 1 billion as limit, we use a smaller number
printfn "table made"

let rec isploop testprime incr max n=
    if testprime > max then true
    else if n % testprime = 0I then false
    else isploop (testprime + incr) incr max n

let isPrime ( n : bigint) = 
    //first test the table
    let maxFactor = bigint(sqrt(float n))
    match table |> Array.tryFind (fun t -> n % t = 0I && t <= maxFactor) with
    |Some(t) -> 
        false
    |None -> //now slow test
        //I have 4 cores so
        let bases = [|limit;limit+1;limit+3;limit+4|] //uses the fact that 10^x congruent to 1 mod 3
        //for 2 cores, drop last 2 terms above and change 6I to 3I
        match bases |> Array.map (fun t -> async {return isploop (bigint t) 6I maxFactor n}) |> Async.Parallel |> Async.RunSynchronously |> Array.tryFind (fun t -> t = false) with
        |Some(t) -> false
        |None -> true


let pcount = ref 0
let perfectNumbersTwo (n : int) =  
    seq { for i in 2..n do 
           if (isPrime (bigint i)) then
                if (PowShift i) - 1I |> isPrime then
                    pcount := !pcount + 1
                    if !pcount = 9 then
                        swatch.Stop()
                        printfn "total time %f seconds, %i:%i m:s"  (swatch.Elapsed.TotalSeconds) (swatch.Elapsed.Minutes) (swatch.Elapsed.Seconds)
                    yield PowShift (i-1) * ((PowShift i)-1I)
        } 


perfectNumbersTwo 62 |> Seq.iter (printfn "PERFECT: %A") //62 gives 9th number

printfn "done"
System.Console.Read() |> ignore
Dettmer answered 3/12, 2011 at 22:50 Comment(5)
Thank you for this elegant approach! It finds the number in 543 seconds on my Duo Core Centrino (9 minutes, 03 seconds), which is almost as fast as my earlier solution. Both cores are working hard, so it proves that this solution works a bit better than the one @Olympia provided above. The memory usuage (for storing the table) is quite enormous though, almost 700mb, while my approach is stable (but then again, it does not generate a lookup table) and using 3.8 mb of memory. When I find the time, I will see what the benefits are in terms of speed.Ingroup
@Ingroup - there is actually a trivial optimisation in the above code which will make it 2x faster, which I will leave for you to find . You can shrink the table, which probably won't affect the running time, which is mainly due to primality checking the final term.Dettmer
Note that I am just a beginner with F#, studying the code you have placed above reminds me of something I have studied in a paper (cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) which I found as recommended also on Stackoverflow. I am also not the brightest mathematician, but I think the optimization you are pointing out, might have to do with the elimination of the known factors 2,3,5 and 7. Am I right? Since that is what I have been doing in the algorithm I started out with. I will study some more, so please not give an answer straight away, just a hint might be enough :-)Ingroup
It is a pity that I cannot leave longer comments, so forgive me that I write two, which in fact should have been one. Another handicap of mine is that English is not my motherlanguage, so I have to read things over and over to get a grip of what people are actually writing and meaning. The reactions and answers here are given faster than I can deal with (by means of replying and studying the given information), however that doesn't mean that I owe you all a lot of gratitude!Ingroup
Here is a hint: we check numbers starting from [|limit;limit+1;limit+3;limit+4|], incrementing by 6, but this is not necersarry as some of these numbers we can very easily eliminate as compositeDettmer

© 2022 - 2024 — McMap. All rights reserved.