Part 1
Very interesting question you have, thanks!
Decided to implement my own very complex C++ solution of your task from scratch. Although you asked not to write code, still I did it fully only to have proof of concept, to check real speed and timings.
To tell in advance, I improved speed of you program 250-500x times (see Part 2).
Besides well known algorithm described in Wikipedia I did following extra optimizations:
Made most of computations in compile time. This is main feature of my code. Input number N is provided at compile time as long macro constant. This ensures that compiler does half of optimizations at compile time like inlining constants and doing optimizing division and other arithmetics. As a drawback, you have to re-compile program every time when you change a number that you want to factor.
Additionally to 1.
I also did support of runtime-only value of N. This is needed to do real comparison of speed in different environments.
One more very important speed improvement is that I used Montgomery Reduction to speedup modulus division. This Montgomery speeds up all computations 2-2.5x times. Besides Montgomery you can also use Barrett Reduction. Both Montgomery and Barrett replace single expensive division with several multiplications and additions, which makes division very fast.
Unlike in your code I do GCD (Greatest Common Divisor) computation very rarely, once in 8 000 - 16 000 iterations. Because GCD is very expensive, it needs around 128 expensive divisions for 128-bit numbers. Instead of computing GCD(x - y, N)
every time you can notice that it is enough to accumulate product prod = (prod * (x - y)) % N
and later after thousand of such iterations just compute GCD(prod, N)
. This is easily derived from fact that GCD((a * b * c) % N, N) = GCD(GCD(a, N) * GCD(b, N) * GCD(c, N), N)
.
One very advanced and fast optimization that I did is implemented my own uint128
and uint256
with all necessary sub-optimizations needed for my task. This optimization is only posted in code by me in Part 2 of my answer, see this part after first code.
As a result of above steps, total speed of Pollard Rho is increased 50-100x times, especially due to doing GCD only once in thousand steps. This speed is enough even to compute your biggest number provided in your question.
Besides algorithms described above I also used following extra algorithms: Extended Euclidean Algorithm (for computing coefficients for modular inverse), Modular Multiplicative Inverse, Euclidean Algorithm (for computing GCD), Modular Binary Exponentiation, Trial Division (for checking primality of small numbers), Fermat Primality Test, as already mentioned Montgomery Reduction and Pollard Rho itself.
I did following timings and speed measurements:
N 1000036317378699858851366323 (90 bits)
1000003357 (30 bits) * 1000014599 (30 bits) * 1000018361 (30 bits)
PollardRho time 0.1 secs, tries 1 iterations 25599 (2^14.64)
N 780002082420246798979794021150335143 (120 bits)
244300526707007 (48 bits) * 3192797383346267127449 (72 bits)
PollardRho time 32 secs, tries 1 iterations 25853951 (2^24.62)
NO-Montgomery time 70 secs
N 614793320656537415355785711660734447 (120 bits)
44780536225818373 (56 bits) * 13729029897191722339 (64 bits)
PollardRho time 310 secs, tries 1 iterations 230129663 (2^27.78)
N 1000000000002322140000000048599822299 (120 bits)
1000000000000021121 (60 bits) * 1000000000002301019 (60 bits)
PollardRho time 2260 secs, tries 1 iterations 1914068991 (2^30.83)
As you can see above your smaller number takes just 0.1 second to factor, while your bigger number (that you failed to factor at all) takes quite affordable time, 2260 seconds (a bit more than half hour). Also you can see that I created myself a number with 48-bit smallest factor, and another number with 56-bit smaller factor.
In general a rule is such that if you have smallest factor of K-bit then it takes 2^(K/2)
iterations of Pollard Rho to compute this factor. Unlike for example Trial division algorithm which needs square times bigger time of 2^K
for K-bit factor.
In my code see very start of file, there is a bunch of lines #define NUM
, each defining compile time constant containing a number. You can comment out any line or change value of a number or add a new line with new number. Then re-compile program and run it to see results.
Before below code don't forget to click on Try it online!
link to check code run on GodBolt server. Also see example Console Output after code.
Try it online!
#include <cstdint>
#include <tuple>
#include <iostream>
#include <iomanip>
#include <chrono>
#include <random>
#include <stdexcept>
#include <string>
#include <mutex>
#include <cmath>
#include <type_traits>
#include <boost/multiprecision/cpp_int.hpp>
//#define NUM "1000036317378699858851366323" // 90 bits, 1000003357 (30 bits) * 1000014599 (30 bits) * 1000018361 (30 bits), PollardRho time 0.1 secs, tries 1 iterations 25599 (2^14.64)
#define NUM "780002082420246798979794021150335143" // 120 bits, 244300526707007 (48 bits) * 3192797383346267127449 (72 bits), PollardRho time 32 secs, tries 1 iterations 25853951 (2^24.62), NO-Montgomery time 70 secs
//#define NUM "614793320656537415355785711660734447" // 120 bits, 44780536225818373 (56 bits) * 13729029897191722339 (64 bits), PollardRho time 310 secs, tries 1 iterations 230129663 (2^27.78)
//#define NUM "1000000000002322140000000048599822299" // 120 bits, 1000000000000021121 (60 bits) * 1000000000002301019 (60 bits), PollardRho time 2260 secs, tries 1 iterations 1914068991 (2^30.83)
#define IS_DEBUG 0
#define IS_COMPILE_TIME 1
bool constexpr use_montg = 1;
size_t constexpr gcd_per_nloops = 1 << 14;
#if defined(_MSC_VER) && !defined(__clang__)
#define HAS_INT128 0
#else
#define HAS_INT128 1
#endif
#define ASSERT_MSG(cond, msg) { if (!(cond)) throw std::runtime_error("Assertion (" #cond ") failed at line " + std::to_string(__LINE__) + "! Msg: '" + std::string(msg) + "'."); }
#define ASSERT(cond) ASSERT_MSG(cond, "")
#define COUT(code) { std::unique_lock<std::mutex> lock(cout_mux); std::cout code; std::cout << std::flush; }
#if IS_DEBUG
#define LN { COUT(<< "LN " << __LINE__ << std::endl); }
#define DUMP(var) { COUT(<< __LINE__ << " : " << #var << " = (" << (var) << ")" << std::endl); }
#else
#define LN
#define DUMP(var)
#endif
#define bisizeof(x) (sizeof(x) * 8)
using u32 = uint32_t;
using i64 = int64_t;
using u64 = uint64_t;
using u128 = boost::multiprecision::uint128_t;
using i128 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<128, 128, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u192 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<192, 192, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using i192 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<192, 192, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u256 = boost::multiprecision::uint256_t;
using i256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u384 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<384, 384, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using i384 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<384, 384, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
#if HAS_INT128
using u128_cl = unsigned __int128;
using i128_cl = signed __int128;
#endif
template <typename T> struct DWordOf;
template <> struct DWordOf<u64> : std::type_identity<u128> {};
template <> struct DWordOf<i64> : std::type_identity<i128> {};
template <> struct DWordOf<u128> : std::type_identity<u256> {};
template <> struct DWordOf<i128> : std::type_identity<i256> {};
#if HAS_INT128
template <> struct DWordOf<u128_cl> : std::type_identity<u256> {};
template <> struct DWordOf<i128_cl> : std::type_identity<i256> {};
#endif
template <typename T>
using DWordOfT = typename DWordOf<T>::type;
template <typename T> struct SignedOf;
template <> struct SignedOf<u64> : std::type_identity<i64> {};
template <> struct SignedOf<i64> : std::type_identity<i64> {};
template <> struct SignedOf<u128> : std::type_identity<i128> {};
template <> struct SignedOf<i128> : std::type_identity<i128> {};
#if HAS_INT128
template <> struct SignedOf<u128_cl> : std::type_identity<i128> {};
template <> struct SignedOf<i128_cl> : std::type_identity<i128> {};
#endif
template <typename T>
using SignedOfT = typename SignedOf<T>::type;
template <typename T> struct BiSizeOf;
template <> struct BiSizeOf<u64> : std::integral_constant<size_t, 64> {};
template <> struct BiSizeOf<u128> : std::integral_constant<size_t, 128> {};
template <> struct BiSizeOf<u192> : std::integral_constant<size_t, 192> {};
template <> struct BiSizeOf<u256> : std::integral_constant<size_t, 256> {};
template <> struct BiSizeOf<u384> : std::integral_constant<size_t, 384> {};
#if HAS_INT128
template <> struct BiSizeOf<u128_cl> : std::integral_constant<size_t, 128> {};
#endif
template <typename T>
size_t constexpr BiSizeOfT = BiSizeOf<T>::value;
static std::mutex cout_mux;
double Time() {
static auto const gtb = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::duration<double>>(
std::chrono::high_resolution_clock::now() - gtb).count();
}
template <typename T, typename DT = DWordOfT<T>>
constexpr DT MulD(T const & a, T const & b) {
return DT(a) * b;
}
template <typename T>
constexpr auto EGCD(T const & a, T const & b) {
using ST = SignedOfT<T>;
using DST = DWordOfT<ST>;
T ro = 0, r = 0, qu = 0, re = 0;
ST so = 0, s = 0;
std::tie(ro, r, so, s) = std::make_tuple(a, b, 1, 0);
while (r != 0) {
std::tie(qu, re) = std::make_tuple(ro / r, ro % r);
std::tie(ro, r) = std::make_tuple(r, re);
std::tie(so, s) = std::make_tuple(s, ST(so - DST(s) * ST(qu)));
}
ST const to = ST((DST(ro) - DST(a) * so) / ST(b));
return std::make_tuple(ro, so, to);
}
template <typename T>
constexpr T ModInv(T x, T mod) {
using ST = SignedOfT<T>;
using DT = DWordOfT<T>;
x %= mod;
auto [g, s, t] = EGCD(x, mod);
//ASSERT(g == 1);
if (s < 0) {
//ASSERT(ST(mod) + s >= 0);
s += mod;
} else {
//ASSERT(s < mod);
}
//ASSERT((DT(x) * s) % mod == 1);
return T(s);
}
template <typename ST>
constexpr std::tuple<ST, ST, ST, ST> MontgKRR(ST n) {
size_t constexpr ST_bisize = BiSizeOfT<ST>;
using DT = DWordOfT<ST>;
DT constexpr r = DT(1) << ST_bisize;
ST const rmod = ST(r % n), rmod2 = ST(MulD<ST>(rmod, rmod) % n), rinv = ModInv<ST>(rmod, n);
DT const k0 = (r * DT(rinv) - 1) / n;
//ASSERT(k0 < (DT(1) << ST_bisize));
ST const k = ST(k0);
return std::make_tuple(k, rmod, rmod2, rinv);
}
template <typename T>
constexpr T GCD(T a, T b) {
while (b != 0)
std::tie(a, b) = std::make_tuple(b, a % b);
return a;
}
template <typename T>
T PowMod(T a, T b, T const & c) {
// https://en.wikipedia.org/wiki/Modular_exponentiation
using DT = DWordOfT<T>;
T r = 1;
while (b != 0) {
if (u32(b) & 1)
r = T(MulD<T>(r, a) % c);
a = T(MulD<T>(a, a) % c);
b >>= 1;
}
return r;
}
template <typename T>
std::pair<bool, bool> IsProbablyPrime_TrialDiv(T const n, u64 limit = u64(-1)) {
// https://en.wikipedia.org/wiki/Trial_division
if (n <= 16)
return {n == 2 || n == 3 || n == 5 || n == 7 || n == 11 || n == 13, true};
if ((n & 1) == 0)
return {false, true};
u64 d = 0;
for (d = 3; d < limit && d * d <= n; d += 2)
if (n % d == 0)
return {false, true};
return {true, d * d > n};
}
template <typename T>
bool IsProbablyPrime_Fermat(T const n, size_t ntrials = 32) {
// https://en.wikipedia.org/wiki/Fermat_primality_test
if (n <= 16)
return n == 2 || n == 3 || n == 5 || n == 7 || n == 11 || n == 13;
thread_local std::mt19937_64 rng{123};
u64 const rand_end = n - 3 <= u64(-5) ? u64(n - 3) : u64(-5);
for (size_t trial = 0; trial < ntrials; ++trial)
if (PowMod<T>(rng() % rand_end + 2, n - 1, n) != 1)
return false;
return true;
}
template <typename T>
bool IsProbablyPrime(T const n) {
if (n < (1 << 12))
return IsProbablyPrime_TrialDiv(n).first;
return IsProbablyPrime_Fermat(n);
}
template <typename T>
std::string IntToStr(T n) {
if (n == 0)
return "0";
std::string r;
while (n != 0) {
u32 constexpr mod = 1'000'000'000U;
std::ostringstream ss;
auto const nm = u32(n % mod);
n /= mod;
if (n != 0)
ss << std::setw(9) << std::setfill('0');
ss << nm;
r = ss.str() + r;
}
return r;
}
template <typename T>
constexpr T ParseNum(char const * s) {
size_t len = 0;
for (len = 0; s[len]; ++len);
T r = 0;
for (size_t i = 0; i < len; ++i) {
r *= 10;
r += s[i] - '0';
}
return r;
}
template <typename T>
std::tuple<T, std::vector<T>, std::vector<T>> Factor_PollardRho(
#if !IS_COMPILE_TIME
T const & n,
#endif
u64 limit = u64(-1), size_t ntrials = 6) {
size_t constexpr T_bisize = BiSizeOfT<T>;
// https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
using DT = DWordOfT<T>;
#if IS_COMPILE_TIME
static auto constexpr n = ParseNum<T>(NUM);
#endif
if (n <= 1)
return {n, {}, {}};
if (IsProbablyPrime<T>(n))
return {n, {n}, {}};
#if IS_COMPILE_TIME
static auto constexpr montg_krr = MontgKRR(n);
static T constexpr mk = std::get<0>(montg_krr), mrm = std::get<1>(montg_krr), mrm2 = std::get<2>(montg_krr), mri = std::get<3>(montg_krr),
mone = use_montg ? mrm : 1, mone2 = use_montg ? mrm2 : 1;
#else
static auto const montg_krr = MontgKRR(n);
static T const mk = std::get<0>(montg_krr), mrm = std::get<1>(montg_krr), mrm2 = std::get<2>(montg_krr), mri = std::get<3>(montg_krr),
mone = use_montg ? mrm : 1, mone2 = use_montg ? mrm2 : 1;
#endif
auto AdjustL = [&](T x) -> T {
if constexpr(1) {
while (x >= n)
x -= n;
return x;
} else {
using SiT = SignedOfT<T>;
return x - (n & (~T(SiT(x - n) >> (T_bisize - 1))));
}
};
auto MontgModL = [&](DT const & x) -> T {
if constexpr(!use_montg)
return T(x % n);
else
return T((x + MulD<T>(n, T(x) * mk)) >> T_bisize);
};
auto ToMontgL = [&](T const & x) -> T {
if constexpr(!use_montg)
return x;
else
return MontgModL(MulD<T>(x, mrm2));
};
auto FromMontgL = [&](T const & x) -> T {
if constexpr(!use_montg)
return x;
else
return AdjustL(MontgModL(x));
};
auto DumpMontgX = [&](char const * name, T const & x, bool from = true){
if constexpr(1) {
COUT(<< __LINE__ << " : " << name << " = " << IntToStr(from ? FromMontgL(x) : x) << std::endl);
}
};
auto f = [&](T x){ return MontgModL(MulD<T>(x, x) + mone2); };
#if IS_DEBUG
#define DUMPM(x) DumpMontgX(#x, x)
#define DUMPI(x) DumpMontgX(#x, x, false)
#else
#define DUMPM(x)
#define DUMPI(x)
#endif
ASSERT(3 <= n);
size_t cnt = 0;
u64 const distr_end = n - 2 <= u64(-5) ? u64(n - 2) : u64(-5);
thread_local std::mt19937_64 rng{123};
for (size_t itry = 0; itry < ntrials; ++itry) {
bool failed = false;
u64 const rnd = rng() % distr_end + 1;
T x = ToMontgL(rnd);
u64 sum_cycles = 0;
for (u64 cycle = 1;; cycle <<= 1) {
T y = x, m = mone, xstart = x, ny = 0;
while (ny < y)
ny += n;
ny -= y;
auto ILast = [&](auto istart){
size_t ri = istart + gcd_per_nloops - 1;
if (ri < cycle)
return ri;
else
return cycle - 1;
};
for (u64 i = 0, istart = 0, ilast = ILast(istart); i < cycle; ++i) {
x = f(x);
m = MontgModL(MulD<T>(m, ny + x));
if (i < ilast)
continue;
cnt += ilast + 1 - istart;
if (cnt >= limit)
return {n, {}, {n}};
if (GCD<T>(n, FromMontgL(m)) == 1) {
istart = i + 1;
ilast = ILast(istart);
xstart = x;
continue;
}
T x2 = xstart;
for (u64 i2 = istart; i2 <= i; ++i2) {
x2 = f(x2);
auto const g = GCD<T>(n, FromMontgL(ny + x2));
if (g == 1) {
continue;
}
sum_cycles += i + 1;
if (g == n) {
failed = true;
break;
}
#if 0
auto res0 = Factor_PollardRho<T>(g, limit, ntrials);
auto res1 = Factor_PollardRho<T>(n / g, limit, ntrials);
res0.first.insert(res0.first.end(), res1.first.begin(), res1.first.end());
res0.second.insert(res0.second.end(), res1.second.begin(), res1.second.end());
#endif
ASSERT(n % g == 0);
COUT(<< "PollardRho tries " << (itry + 1) << " iterations " << sum_cycles << " (2^" << std::fixed << std::setprecision(2) << std::log2(std::max<size_t>(1, sum_cycles)) << ")" << std::endl);
if (IsProbablyPrime<T>(n / g))
return {n, {g, n / g}, {}};
else
return {n, {g}, {n / g}};
}
if (failed)
break;
ASSERT(false);
}
sum_cycles += cycle;
if (failed)
break;
}
}
return {n, {}, {n}};
}
template <typename T>
void ShowFactors(std::tuple<T, std::vector<T>, std::vector<T>> fs) {
auto [N, a, b] = fs;
std::cout << "Factors of " << IntToStr(N) << " (2^" << std::fixed << std::setprecision(3) << std::log2(double(std::max<T>(1, N))) << "):" << std::endl;
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
for (auto const & x: a)
std::cout << x << " ";
std::cout << std::endl;
if (!b.empty()) {
std::cout << "Unfactored:" << std::endl;
for (auto const & x: b)
std::cout << x << " ";
std::cout << std::endl;
}
}
int main() {
try {
using T = u128;
#if !IS_COMPILE_TIME
std::string s;
COUT(<< "Enter number: ");
std::cin >> s;
auto const N = ParseNum<T>(s.c_str());
#endif
auto const tim = Time();
ShowFactors(Factor_PollardRho<T>(
#if !IS_COMPILE_TIME
N
#endif
));
COUT(<< "Time " << std::fixed << std::setprecision(3) << (Time() - tim) << " sec" << std::endl);
return 0;
} catch (std::exception const & ex) {
std::cout << "Exception: " << ex.what() << std::endl;
return -1;
}
}
Console Output:
PollardRho tries 1 iterations 25853951 (2^24.62)
Factors of 780002082420246798979794021150335143 (2^119.231):
244300526707007 3192797383346267127449
Time 35.888 sec
Part 2
Decided to do even further improvements, by implementing my own highly optimal uint128
and uint256
, meaning long arithmetics, same like done in Boost or GMP.
Not to dwell into all the details, I optimized every line and every method of these classes. Especially all those methods that deal with operations needed for factorization.
This improved version gives 6x times more speed, compared to Part 1 of my answer, meaning if first version takes 30 seconds to finish, this second version takes 5 seconds.
As you can see in Console Output at the very end of my post, your biggest number takes just 420 seconds to factor. This is 5.4x times faster compared to 2260 seconds of the first part of this answer.
CODE GOES HERE. Only because of StackOverflow limit of 30 000 symbols per post, I can't inline 2nd code here, because it alone is 26 KB in size. Hence I'm providing this code as Gist link below and also through Try it online!
link (to run online on GodBolt server):
Github Gist source code
Try it online!
Console Output:
PollardRho tries 1 iterations 32767 (2^15.00)
Factors of 1000036317378699858851366323 (2^89.692):
1000014599
Unfactored:
1000021718061637877
Time 0.086 sec
PollardRho tries 1 iterations 25853951 (2^24.62)
Factors of 780002082420246798979794021150335143 (2^119.231):
244300526707007 3192797383346267127449
Time 5.830 sec
PollardRho tries 1 iterations 230129663 (2^27.78)
Factors of 614793320656537415355785711660734447 (2^118.888):
44780536225818373 13729029897191722339
Time 49.446 sec
PollardRho tries 1 iterations 1914077183 (2^30.83)
Factors of 1000000000002322140000000048599822299 (2^119.589):
1000000000000021121 1000000000002301019
Time 419.680 sec
f(x)
will change it's behaviour. Maybe that could help you progress faster on larger primes? – Shellieshellproof