Fastest way to prime factorise a number up to 10^18

Asked 9/5, 2018 at 10:48 Answered 10/5, 2018 at 17:25

Solved algorithm primes prime-factoring sieve-of-eratosthenes factorization

Given a number 1 <= n <= 10^18, how can I factorise it in least time complexity?

There are many posts on the internet addressing how you can find prime factors but none of them (at least from what I've seen) state their benefits, say in a particular situation.

I use Pollard's rho algorithm in addition to Eratosthenes' sieve:

Using sieve, find all prime numbers in the first 10⁷ numbers, and then divide n with these primes as much as possible.
Now use Pollard's rho algorithm to try and find the rest of the primes until n is equal to 1.

My Implementation:

#include <iostream>
#include <vector>
#include <cstdio>
#include <ctime>
#include <cmath>
#include <cstdlib>
#include <algorithm>
#include <string>

using namespace std;

typedef unsigned long long ull;
typedef long double ld;
typedef pair <ull, int> pui;
#define x first
#define y second
#define mp make_pair

bool prime[10000005];
vector <ull> p;

void initprime(){
    prime[2] = 1;
    for(int i = 3 ; i < 10000005 ; i += 2){
        prime[i] = 1;
    }
    for(int i = 3 ; i * i < 10000005 ; i += 2){
        if(prime[i]){
            for(int j = i * i ; j < 10000005 ; j += 2 * i){
                prime[j] = 0;
            }
        }
    }
    for(int i = 0 ; i < 10000005 ; ++i){
        if(prime[i]){
            p.push_back((ull)i);
        }
    }
}

ull modularpow(ull base, ull exp, ull mod){
    ull ret = 1;
    while(exp){
        if(exp & 1){
            ret = (ret * base) % mod;
        }
        exp >>= 1;
        base = (base * base) % mod;
    }
    return ret;
}

ull gcd(ull x, ull y){
    while(y){
        ull temp = y;
        y = x % y;
        x = temp;
    }
    return x;
}

ull pollardrho(ull n){
    srand(time(NULL));
    if(n == 1)
        return n;
    ull x = (rand() % (n - 2)) + 2;
    ull y = x;
    ull c = (rand() % (n - 1)) + 1;
    ull d = 1;
    while(d == 1){
        x = (modularpow(x, 2, n) + c + n) % n;
        y = (modularpow(y, 2, n) + c + n) % n;
        y = (modularpow(y, 2, n) + c + n) % n;
        d = gcd(abs(x - y), n);
        if(d == n){
            return pollardrho(n);
        }
    }
    return d;
}
int main ()
{
ios_base::sync_with_stdio(false);
cin.tie(0);
initprime();
ull n;
cin >> n;
ull c = n;
vector <pui> o;
for(vector <ull>::iterator i = p.begin() ; i != p.end() ; ++i){
    ull t = *i;
    if(!(n % t)){
        o.push_back(mp(t, 0));
    }
    while(!(n % t)){
        n /= t;
        o[o.size() - 1].y++;
    }
}
while(n > 1){
    ull u = pollardrho(n);
    o.push_back(mp(u, 0));
    while(!(n % u)){
        n /= u;
        o[o.size() - 1].y++;
    }
    if(n < 10000005){
        if(prime[n]){
            o.push_back(mp(n, 1));
        }
    }
}
return 0;
}

Is there any faster way to factor such numbers? If possible, please explain why along with the source code.

Sketch answered 9/5, 2018 at 10:48 Comment(8)

There is some elliptical curve stuff. I don't remember the name, but there is code available on the Internet. – Tameshatamez 9/5, 2018 at 11:19

You will spend a lot of time trying to factor any large primes (> 10^14) that fall into your range. You could try a quick primality test after you have reached the end of your sieve to save trying to find factors of a large prime. – Righthander 9/5, 2018 at 12:39

@Righthander I have been thinking about that. But I wasn't able to find any algorithm to test primality that has complexity lesser than O(sqrt(n)) and that doesn't use a giant amount of memory – Apophasis 9/5, 2018 at 14:14

quora.com/What-is-the-fastest-deterministic-primality-test – Squinch 9/5, 2018 at 14:37

@Sketch Does Miller-Rabin use a lot of memory? IIRC the test can be repeated until the chance of a false positive is less than the chance of a hardware failure in your computer. – Righthander 9/5, 2018 at 15:31

While there are advanced algorithms that are asymptotically faster, for integers of the size you are looking at a carefully written pollard-rho may well be the fastest. Actually, a combination of trial division followed by pollard-rho, just like you are doing, is close to the fastest. The experts claim that a fairly obscure algorithm called SQUFOF is the fastest for integers of your size, but only if you implement it carefully to avoid multiprecision integers. – Saransarangi 9/5, 2018 at 23:17

I believe that you can factor numbers up to 10^18 just by using Eratosthenes sieve. But if you have only one such number, Pollard's rho algorithm should be faster, as you already observed. Also, keep in mind that C++ uses 8bits for bool variable. Use std::vector<bool> instead, it will consume lot less of memory, thus caching better. – Lashonda 10/5, 2018 at 9:47

For numbers in that range, you should use trial division by small primes followed by Pollard's rho algorithm, with a Miller-Rabin test for primality. It is possible to do better, but will be a lot of work. See this essay at my blog for details and an implementation in C/GMP. – Simmer 10/5, 2018 at 13:59

Approach

Lets say you have a number n that goes up to 10¹⁸ and you want to prime factorise it. Since this number can be as small as unity and as big as 10¹⁸, all along it can be prime as well as composite, this would be my approach -

Using miller rabin primality testing, make sure that the number is composite.
Factorise n using primes up to 10⁶, which can be calculated using sieve of Eratosthenes.

Now the updated value of n is such that it has prime factors only above 10⁶ and since the value of n can still be as big as 10¹⁸, we conclude that the number is either prime or it has exactly two prime factors (not necessarily distinct).

Run Miller Rabin again to ensure the number isn't prime.
Use Pollard rho algorithm to get one prime factor.

You have the complete factorisation now.

Lets look at the time-complexity of the above approach:

Miller Rabin takes O(log n)
Sieve of Eratosthenes takes O(n*log n)
The implementation of Pollard rho I shared takes O(n^0.25)

Time Complexity

Step 2 takes maximum time which is equal to O(10^7), which is in turn the complexity of the above algorithm. This means you can find the factorisation within a second for almost all programming languages.

Space Complexity

Space is used only in the step 2 where sieve is implemented and is equal to O(10^6). Again, very practical for the purpose.

Implementation

Complete Code implemented in C++14. The code has a hidden bug. You can either reveal it in the next section, or skip towards the challenge ;)

Bug in the code

In line 105, iterate till i<=np. Otherwise, you may miss the cases where prime[np]=999983 is a prime factor

Challenge

Give me a value of n, if any, where the shared code results in wrong prime factorisation.

Bonus

How many such values of n exist ?

Hint

For such value of n, assertion in Line 119 may fail.

Solution

Lets call P=999983. All numbers of the form n = p*q*r where p, q, r are primes >= P such that at least one of them is equal to P will result in wrong prime factorisation.

Bonus Solution

There are exactly four such numbers: {P₀³, P₀²P₁, P₀²P₂, P₀P₁²}, where P₀ = P = 999983, P₁ = next_prime(P₀) = 1000003, P₂ = next_prime(P₁) = 1000033.

Wylen answered 10/5, 2018 at 17:25 Comment(4)

It looks really good. Do you perhaps have an implementation of that algorithm? – Apophasis 10/5, 2018 at 17:45

@Stef yes I added code snippets from my github repo for all the algorithms I used above. If you ask, I can add a complete working C++14 code in an hour as I'm travelling right now. – Wylen 10/5, 2018 at 17:47

It doesn't have to be in c++14, any language is fine, if I have to I can translate it to c++14 myself – Apophasis 10/5, 2018 at 17:53

Added! Please give any valuable feedback :) – Wylen 10/5, 2018 at 18:28

The fastest solution for 64-bit inputs on modern processors is a small amount of trial division (the amount will differ, but something under 100 is common) followed by Pollard's Rho. You will need a good deterministic primality test using Miller-Rabin or BPSW, and a infrastructure to handle multiple factors (e.g. if a composite is split into more composites). For 32-bit you can optimize each of these things even more.

You will want a fast mulmod, as it is the core of both Pollard's Rho, Miller-Rabin, and the Lucas test. Ideally this is done as a tiny assembler snippet.

Times should be under 1 millisecond to factor any 64-bit input. Significantly faster under 50 bits.

As shown by Ben Buhrow's spBrent implementation, algorithm P2'' from Brent's 1980 paper seems to be as fast as the other implementations I'm aware of. It uses Brent's improved cycle finding as well as the useful trick of delaying GCDs with the necessary added backtracking.

See this thread on Mersenneforum for some messy details and benchmarking of various solutions. I have a number of benchmarks of these and other implementations at different sizes, but haven't published anything (partly because there are so many ways to look at the data).

One of the really interesting things to come out of this was that SQUFOF, for many years believed to be the better solution for the high end of the 64-bit range, no longer is competitive. SQUFOF does have the advantage of only needing a fast perfect-square detector for best speed, which doesn't have to be in asm to be really fast.

Parturifacient answered 10/5, 2018 at 16:56 Comment(2)

Thanks for the info on SQUFOF, my information was outdated. To be honest, I'm somewhat outdated. – Saransarangi 10/5, 2018 at 21:53

I think it was a surprise to most of the people on that thread, because we'd long followed the same logic, and hadn't tested against a really well optimized P-R. It was to me as well. SQUFOF is still interesting and if I recall correctly not more than 2x slower. Once above 64-bit things get a bit more complicated assuming one wants to really optimize (P-1 and some tinyQS implementations start looking interesting). – Parturifacient 10/5, 2018 at 21:59