Efficient Prime Factorization for large numbers

Asked 13/10, 2014 at 15:54 Answered 20/6, 2018 at 15:4

c++multithreading algorithm primes prime-factoring

I've been working on a little problem where I need to compute 18-digit numbers into their respective prime factorization. Everything compiles and it runs just fine, considering that it actually works, but I am looking to reduce the run time of the prime factorization. I have implemented recursion and threading but I think I might need some help in understanding possible algorithms for large number computation.

Every time I run this on the 4 numbers I have pre-made, it takes about 10 seconds. I would like to reduce this to possibly 0.06 seconds if there are any ideas out there.

I noticed a few algorithms like Sieve of Eratosthenes and producing a list of all the prime numbers prior to computing. I'm just wondering if someone could elaborate on it. For instance, I'm having issues understanding how to implement Sieve of Eratosthenes into my program or if it would even be a good idea. Any and all pointers on how to approach this better would be really helpful!

Here is my code:

#include <iostream>
#include <thread>
#include <vector>
#include <chrono>

using namespace std;
using namespace std::chrono;

vector<thread> threads;
vector<long long> inputVector;
bool developer = false; 
vector<unsigned long long> factor_base;
vector<long long> primeVector;

class PrimeNumber
{
    long long initValue;        // the number being prime factored
    vector<long long> factors;  // all of the factor values
public:
    void setInitValue(long long n)
    {
        initValue = n;
    }
    void addToVector(long long m)
    {
        factors.push_back(m);
    }
    void setVector(vector<long long> m)
    {
        factors = m;
    }
    long long getInitValue()
    {
        return initValue;
    }
    vector<long long> getVector()
    {
        return factors;
    }
};

vector<PrimeNumber> primes;

// find primes recursively and have them returned in vectors
vector<long long> getPrimes(long long n, vector<long long> vec)
{
    double sqrt_of_n = sqrt(n);

    for (int i = 2; i <= sqrt_of_n; i++)
    {
        if (n % i == 0) 
        {
            return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
        }
    }

    // pick up the last prime factorization number
    vec.push_back(n);

    //return the finished vector
    return vec;
}

void getUserInput()
{
    long long input = -1;
    cout << "Enter all of the numbers to find their prime factors. Enter 0 to compute" << endl;
    do
    {
        cin >> input;
        if (input == 0)
        {
            break;
        }
        inputVector.push_back(input);
    } while (input != 0);
}

int main() 
{

    vector<long long> temp1;   // empty vector
    vector<long long> result1; // temp vector

    if (developer == false)
    {
        getUserInput();
    }
    else
    {
        cout << "developer mode active" << endl;
        long long a1 = 771895004973090566;
        long long b1 = 788380500764597944;
        long long a2 = 100020000004324000;
        long long b2 = 200023423420000000;
        inputVector.push_back(a1);
        inputVector.push_back(b2);
        inputVector.push_back(b1);
        inputVector.push_back(a2);
    }

    high_resolution_clock::time_point time1 = high_resolution_clock::now();

    // give each thread a number to comput within the recursive function
    for (int i = 0; i < inputVector.size(); i++)
    {   
        PrimeNumber prime;
        prime.setInitValue(inputVector.at(i));
        threads.push_back(thread([&]{
            prime.setVector(result1 = getPrimes(inputVector.at(i), temp1));
            primes.push_back(prime);
        }));
    }

    // allow all of the threads to join back together.
    for (auto& th : threads)
    {
        cout << th.get_id() << endl;
        th.join();
    }

    high_resolution_clock::time_point time2 = high_resolution_clock::now();

    // print all of the information
    for (int i = 0; i < primes.size(); i++)
    {
        vector<long long> temp = primes.at(i).getVector();

        for (int m = 0; m < temp.size(); m++)
        {
            cout << temp.at(m) << " ";
        }
        cout << endl;
    }

    cout << endl;

    // so the running time
    auto duration = duration_cast<microseconds>(time2 - time1).count();

    cout << "Duration: " << (duration / 1000000.0) << endl;

    return 0;
}

Granados answered 13/10, 2014 at 15:54 Comment(5)

This question is better suited for Code Review – Colorblind 13/10, 2014 at 15:55

oh shoot haha didnt even know they had a section for this. thanks! – Granados 13/10, 2014 at 15:56

possible duplicate of Problems with prime numbers – Sold 13/10, 2014 at 16:1

It's good that you stop looking for factors when you reach sqrt(n), but it's not so good that when you recurse, you start again at 2. If no number smaller than i is a factor of n, then no number smaller than i is a factor of n/i either, and there's no need checking all of those again. In particular, if i is the smallest factor of n and i is greater than the cube root of n, then i and n/i are the only factors of n. It's not necessary to do that specific check, though; it will be automatic if you start the next search at i. – Chenay 13/10, 2014 at 18:55

Hmm. Your first test number, a1 = 771895004973090566, can be factored in less than 1/2000 second (or better), because it is 2 x 385947502486545283. The factor 2 is of course found instantly. Then, 385947502486545283 is easily determined to be prime using Miller–Rabin. Similarly, a2 = 788380500764597944 can be factored almost instantly to 2 x 2 x 2 x 7 x 14078223227939249. The challenge is actually to factor hard semiprimes like 18436839306515468081 = 2988873347 x 6168491323, and for that you want Shanks's Square Forms Factorization, Hart's One-Line Factorization, or Brent–Pollard Rho. – Michey 6/6, 2015 at 6:12

Trial division is only suitable for factoring small numbers. For n up to 2^64, you'll need a better algorithm: I recommend starting with wheel factorization to get the small factors, followed by Pollard's rho algorithm to get the rest. Where trial division is O(sqrt(n)), rho is O(sqrt(sqrt(n))), so it's much faster. For 2^64, sqrt(n) = 2^32, but sqrt(sqrt(n)) = 2^16, which is a huge improvement. You should expect to factor your numbers in a few milliseconds, at most.

I don't have C++ code for factoring, but I do have readable Python code. Let me know if you want me to post it. If you want to know more about wheel factorization and the rho algorithm, I have lots of prime number stuff at my blog.

Delamare answered 13/10, 2014 at 16:45 Comment(2)

While I agree with this answer (+1), Pollard rho works in expected O(sqrt(p)) time where p is the prime factor. So why not just use Pollard rho the whole way through to make things simpler? It will find the small prime factors very fast, so probably no noticeable slow down. Also, if someone wants guaranteed running time, Pollard rho only gives expected running time, and worse, the "expected" time relies on the assumption that a pseudo-random polynomial mod n will behave enough like a random sequence to make the random assumption valid to compute expected running time. – Compendium 13/10, 2014 at 18:14

To use Pollard rho, you have to at least remove factors of 2. Pollard rho may yield composite factors, so you have to test primality, which slows things down, and trial division lets you get small prime factors without need of a primality test. I agree with you that the crossover point from trial division to rho is low; I typically use a 2,3,5-wheel to 10000, then cut over to rho. – Delamare 13/10, 2014 at 19:24

I found that the Sieve of Eratosthenes on your modern processor thrashes the cache, so that main memory bandwidth is the limiting factor. I found this when trying to run multiple threads and failing to speed things up by as much as I was hoping for.

So, I recommend breaking the sieve into segments which will fit in the L3 cache. Also, if you exclude multiples of 2, 3 and 5 from the bit vector, then an 8 bit byte can represent 30 numbers on the number line, with 1 bit for each number which is 1, 7, 11, 13, 17, 19, 23 or 29 modulo 30 -- so that a bit map for primes up to 10^9 takes ~32MB -- 10^9 / (30 * 1024 * 1024). This is almost half the size of a bit map which just excludes multiples of 2, which is ~60MB -- 10^9 / (2 * 8 * 1024 * 1024).

Obviously, to run the sieve up to 10^9 you need the primes up to sqrt(10^9) -- which requires some 1,055 bytes, from which you can generate any part of the full sieve up to 10^9.

FWIW, the results I get on a modest AMD Phenom II x6 1090T (8MB L3 cache), for primes up to 10^9 are:

  1. 1 core,   1 segment    3.260 seconds elapsed
  2. 5 cores,  1 segment    1.830 seconds elapsed
  3. 1 core,   8 segments   1.800 seconds elapsed
  4. 5 cores, 40 segments   0.370 seconds elapsed

where by "segment" I mean a part of the sieve. In this case the sieve is ~32MB, so where there are multiple segments they are using about 4MB of L3 cache at any one time.

Those times include the time required to scan the completed sieve and generate all the primes as an array of integers. That takes about 0.5 secs of CPU ! So, to run the sieve without actually extracting the primes from it, takes 0.270 seconds elapsed in case (4) above.

FWIW, I get a small improvement -- to 0.240 seconds in case (4) -- by initialising each segment using a precalculated pattern that removes multiples of 7, 11, 13 and 17. That pattern is 17,017 bytes.

Clearly, to do a single factorization in 0.06 secs... you need the sieve to be pre-computed !

Lafayette answered 14/10, 2014 at 10:43 Comment(1)

Packing the primes mod 30 (sans 2,3,5) into a single byte is brilliant. Did you think that up? Very nice. – Michey 6/6, 2015 at 6:25

for(int i  = 2; i * i <= n; ++i) //no sqrt, please
{
    while(n%i == 0) //while, not if
    {
         factors.push_back(i);
         n/=i;
    }
}
if(n != 1)
{
    factors.push_back(n);
}

This is basically a neater implementation of your algorithm. Its complexity is sqrt of N. It will work pretty quickly even for a 18-digit number, but only if the prime factors are all small. If it's a product of two large prime numbers, or worse, is prime itself, this will run for approximately 10 seconds.

Myrtia answered 13/10, 2014 at 16:49 Comment(7)

It can be faster to compute sqrt once than to compute i*i repeatedly. – Cubiculum 13/10, 2014 at 16:51

@MarkRansom It is better to take i*i than to take sqrt as multiplication is faster than taking square root. – Matthus 10/5, 2015 at 6:0

@Matthus it depends. If you only need to do sqrt once, but you need to do i*i 1000 times, the sqrt can be faster. – Cubiculum 10/5, 2015 at 12:45

@MarkRansom Can you explain this comment please ? How are they differ ? If you take sqrt(n) and check with i - or if you take i^2 and check with n ... Mathematically these are equivalent am I wrong ? – Faa 21/11, 2016 at 1:40

@BeTa yes they're equivalent, that's the point. The question is which is faster. If you look at the construction of the loop, i*i must be recomputed each time i changes, which is each iteration of the loop. On the other hand sqrt(n) must be computed only once. Even though the computation of sqrt(n) is slower than i*i, once you factor in all the repetitions sqrt is faster overall. – Cubiculum 21/11, 2016 at 3:40

Very neat. Thank you. – Siobhansion 24/9, 2018 at 14:8

Instead of computing i*i or sqrt(n) would it be even faster to create a separate variable named i_2 and adding i_2 += i + i + 1 to the end of the loop? – Dupery 27/10, 2022 at 9:26

A simple speedup of two can easily be achieved by changing your loop:

if (n % 2) {
    return vec.push_back(i), getPrimes(n / i, vec);
}

for (int i = 3; i <= sqrt_of_n; i += 2)
{
    if (n % i == 0) 
    {
        return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
    }
}

You first should test the number by two. Then, starting from 3 you test again incrementing your loop by two at a time. You already know thay 4, 6, 8, ... are even numbers and have 2 as a factor. Testing against even numbers you're reducing your complexity by half.

To factor a number N you only need the prime numbers <= sqrt(N). For a 18 digit number you only need to test against all primes less than 1e9, and since there are 98 millon primes less than 2e9 you can easily store 100 millon numbers on today's computers and run the factoring in parallel. If each number takes 8 bytes of RAM (int64_t), 100 millon primes would take 800 MB of memory. This algorithm is the classic solution to SPOJ problem #2, Prime Generator.

The best way to list all the small primes that can fit on a 32-bit int is to build a Sieve of Eratostenes. I told you that we need the primes less than sqrt(N) to factor any N, so to factor 64 bit integers you need all the primes that fit as a 32-bit number.

Metaprotein answered 13/10, 2014 at 16:5 Comment(15)

Are you suggesting that I generate all of the primes less than 2e9 before checking if they are prime? – Granados 13/10, 2014 at 16:7

Yes. First build or get a list of all the primes less than 1e9. Then use that primes list to factor each of your numbers. For each prime P test if it divides your N store the result somewhere and keep going with N' = N / P again. You're on the edge of today's computers capabilities but I believe it may work. – Metaprotein 13/10, 2014 at 16:10

This algorithm is capable of producing correct results, but it's akin to suggesting someone use a Bubble Sort for sorting a list. Outside of certain learning scenarios, it's just not the right tool for the job. – Sold 13/10, 2014 at 16:13

@SamHarwell you're wrong. Read the OP source code on how he is testing if a number if prime. He goes from 2 to sqrt(N) meaning his bigoh is O(sqrt(N)) and my solution algo goes to sqrt(N) but only on the number of primes between 2 and sqrt(N), which is waaaay lower. We have ~50M primes between 2 and sqrt(N) so my bigoh is O(1/20 sqrt(N)). – Metaprotein 13/10, 2014 at 16:23

O(1/20 sqrt(N)) = O(sqrt(N)). – Dashiell 13/10, 2014 at 16:28

Interesting that it takes much more memory to store the list of prime numbers vs. a bitmask that indicates the primality of each number in the range. – Cubiculum 13/10, 2014 at 16:28

@Metaprotein the recursion increases the bigO of the loop. – Cubiculum 13/10, 2014 at 16:32

@YvesDaoust Yes, of course ;) But OP is interested on wall clock run time. He is measuring his efficiency on seconds, therefore constants are important. – Metaprotein 13/10, 2014 at 16:33

@MarkRansom You're right! A bitmask on all the 32 bit numbers takes ~536 MB. That would also be a nice speed trick, not only using less memory but also faster for locality stuff. But again we need to measure exactly what's the gain. – Metaprotein 13/10, 2014 at 16:36

@vz0: this is why the bigO notation is not so appropriate. – Dashiell 13/10, 2014 at 16:43

yes having the primes list it's a 20x speedup (so 10 secs -> 0.5 secs). But to calculate the primes it'd take 1-2 seconds with a simple enough code (e.g. this code takes 10 secs on ideone, and it's not even segmented - though it's on odds only). So for 0.06 time the trial division and the sieve of Eratosthenes are no go. – Yodle 13/10, 2014 at 18:7

Running a Sieve of Eratosthenes on a modern processor thrashes the cache. The trick is to break the sieve into chunks, each of which will fit in the L3 cache (leaving space for other cpus). Also, if you exclude multiples of 2, 3 and 5 from the bit vector, then an 8 bit byte can represent multiples of 1, 7, 11, 13, 17, 19, 23 and 29, so that a bit map for primes up to 10^9 takes ~32MB. – Lafayette 14/10, 2014 at 3:1

Sorry... not multiples of 1, 7, etc., but numbers which are 1, 7, 11, 13, 17, 19, 23 and 29 modulo 30 -- so each byte represents 30 numbers on the number line, excluding all multiples of 2, 3 and 5. Obviously, to run the sieve up to 10^9 you need the primes up to sqrt(10^9) -- which requires some 1,055 bytes, from which you can generate any part of the full sieve up to 10^9. Using 5 cores, and limiting each to ~800K byte segments of the sieve, I get primes up to 10^9 in ~0.25 secs elapsed time for the sieve. So, to do the factorization in 0.06 secs... you need the sieve to be pre-computed ! – Lafayette 14/10, 2014 at 9:16

@gmch You should definitively create a new answer on this question! – Metaprotein 14/10, 2014 at 9:28

OK... I've expanded what I found when playing with the Sieve of Eratosthenes into an answer. – Lafayette 14/10, 2014 at 10:57

Algorithm :

Divide number by 2 until it is not divisible by it ,store the result and display it .
Divide number by 3 until it is not divisible by 3 and display the result,
Repeat same process for 5,7... etc till square root of n.

If at the end resultant number is prime , display its count as 1.

int main() {
long long n;
cin >> n;
int count =0 ;
while(!(n%2)){
    n = n / 2;
    count++;
}
if(count > 0) {
    cout<<"2^"<<count<<" ";
}
for(long long i=3;i<=sqrt(n) ; i+=2){
    count=0;
    while(n%i == 0){
        count++;
        n = n/i;
    }
    if(count){
        cout << i <<"^" <<count<<" ";
    }

}
if(n>2){
    cout<<n <<"^1";
}



return 0;
}

Input : 100000000 Output 2^8 5^8

Capercaillie answered 20/6, 2018 at 15:4 Comment(0)

-2

Algorithm for factoring integers, so simple, that it may be realized even on an abacus.

void start()
{ 
   int a=4252361;    //  integer to factorize
   int b=1,  c,  d=a-1;  

   while ((a > b) && (d != b))
  {
    if (d > b)
     {
       c=c+b;
       a=a-1;
       d=a-c-1;
     }
    if (d < b)
     {
       c=b-d;
       b=b+1;
       a=a-1;
       d=a-c-1;
     }         
  }
    if ((d == b)&&(a > b)) 
  Alert ("a = ",  a-1,  " * ", b+1); 

    if ((d < b)&&(a <= b)) 
  Alert ("a  is a prime");

  return;
}

The algorithm is composed by me, Miliaev Viktor Konstantinovich, born 26, July, 1950. It is written in MQL4 15.12. 2017. E-mail: [email protected]

Adman answered 22/12, 2017 at 18:11 Comment(4)

This causes my eyes to bleed, when I try to read it. Can you use some meaningful variable names opposed to "a", "b", "c" and "d"? Or add some comments on what is supposed to happen at certain parts. It's like reading a minified code now. – Gaytan 17/3, 2018 at 15:55

There seems to be a small mistake in the code above: c needs to be initialized to zero before entering the loop, otherwise the expression c = c + b is undefined. When I change the code to initialize c = 0, the code behaves as expected. Very cool little algorithm! It's certainly not the fastest way to factor a number (it appears to be O(n) runtime), but it's quite interesting that it can find a factor using only addition, subtraction, and comparisons (no multiplication, division, or modulus operations). – Michey 14/7, 2019 at 1:29

This fun snippet of code assumes a >= 3. It answers the question of prime or composite, but doesn't fully prime-factorize if the number has 3 or more prime factors. e.g. a=8 is reported as 4*2 – Goetz 5/7, 2020 at 4:31

@ToddLehman re: appearance of O(n) I see that a counts down from n by 1, but it only goes as far as sqrt(n) as b climbs up to sqrt(n). So, would that make it O(n - sqrt(n)) -- and is that technically different from O(n)? The big-O notation is not natural in my head. – Goetz 6/7, 2020 at 1:40

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags